- +

Maneage: managing data lineage

Copyright (C) 2018-2020 Mohammad Akhlaghi mohammad@akhlaghi.org
@@ -533,145 +545,145 @@ cd my-project # Go into
 git remote rename origin origin-maneage   # Rename current/only remote to "origin-maneage".
 git checkout -b master                    # Create and enter your own "master" branch.
 pwd                                       # Just to confirm where you are.
-
-

Prepare to build project: The ./project configure command of the
- next step will build the different software packages within the
- "build" directory (that you will specify). Nothing else on your system
- will be touched. However, since it takes a long time, it is useful to see
- what is being built at every instant (it's almost impossible to tell
- from the torrent of commands that are produced!). So open another
- terminal on your desktop and navigate to the same project directory
- that you cloned (output of last command above). Then run the following
- command. Once every second, this command will just print the date
- (possibly followed by a non-existent directory notice). But as soon as
- the next step starts building software, you'll see the names of
- software get printed as they are being built. Once any software is
- installed in the project build directory, its name will be removed. Again,
- don't worry, nothing will be installed outside the build directory.

Prepare to build project: The ./project configure command of the
+ next step will build the different software packages within the
+ "build" directory (that you will specify). Nothing else on your system
+ will be touched. However, since it takes a long time, it is useful to see
+ what is being built at every instant (it's almost impossible to tell
+ from the torrent of commands that are produced!). So open another
+ terminal on your desktop and navigate to the same project directory
+ that you cloned (output of last command above). Then run the following
+ command. Once every second, this command will just print the date
+ (possibly followed by a non-existent directory notice). But as soon as
+ the next step starts building software, you'll see the names of
+ software get printed as they are being built. Once any software is
+ installed in the project build directory, its name will be removed. Again,
+ don't worry, nothing will be installed outside the build directory.


+                        
 # On another terminal (go to top project source directory, last command above)
 ./project --check-config
-

Test Maneage: Before making any changes, it is important to test it
-                                and see if everything works properly with the commands below. If there
-                                is any problem in the ./project configure or ./project make steps,
-                                please contact us to fix the problem before continuing. Since the
-                                building of dependencies in configuration can take a long time, you can take
-                                the next few steps (editing the files) while it's working (they don't
-                                affect the configuration). After ./project make is finished, open
-                                paper.pdf. If it looks fine, you are ready to start customizing
-                                Maneage for your project. But before that, clean all the extra Maneage
-                                outputs with make clean as shown below.
+

Test Maneage: Before making any changes, it is important to test it
+ and see if everything works properly with the commands below. If there
+ is any problem in the ./project configure or ./project make steps,
+ please contact us to fix the problem before continuing. Since the
+ building of dependencies in configuration can take a long time, you can take
+ the next few steps (editing the files) while it's working (they don't
+ affect the configuration). After ./project make is finished, open
+ paper.pdf. If it looks fine, you are ready to start customizing
+ Maneage for your project. But before that, clean all the extra Maneage
+ outputs with make clean as shown below.


+                        
 ./project configure     # Build the project's software environment (can take an hour or so).
 ./project make          # Do the processing and build paper (just a simple demo).
-                        # Open 'paper.pdf' and see if everything is ok.
-

Set up the remote: You can use any hosting
-                                    facility
-                                that supports Git to keep an online copy of your project's version
-                                controlled history. We recommend GitLab because
-                                it is more ethical (although not
-                                    perfect),
-                                and later you can also host GitLab on your own server. Anyway, create
-                                an account in your favorite hosting facility (if you don't already
-                                have one), and define a new project there. Please make sure the newly
-                                    created project is empty (some services ask to include a README in
-                                a new project which is bad in this scenario, and will not allow you to
-                                push to it). It will give you a URL (usually starting with git@ and
-                                ending in .git); put this URL in place of XXXXXXXXXX in the first
-                                command below. With the second command, "push" your master branch to
-                                your origin remote, and (with the --set-upstream option) set them
-                                to track/follow each other. However, the maneage branch is currently
-                                tracking/following your origin-maneage remote (automatically set
-                                when you cloned Maneage). So when pushing the maneage branch to your
-                                origin remote, you shouldn't use --set-upstream. With the last
-                                command, you can actually check this (which local and remote branches
-                                are tracking each other).
+# Open 'paper.pdf' and see if everything is ok.
+

Set up the remote: You can use any hosting
+ facility
+ that supports Git to keep an online copy of your project's version
+ controlled history. We recommend GitLab because
+ it is more ethical (although not
+ perfect),
+ and later you can also host GitLab on your own server. Anyway, create
+ an account in your favorite hosting facility (if you don't already
+ have one), and define a new project there. Please make sure the newly
+ created project is empty (some services ask to include a README in
+ a new project which is bad in this scenario, and will not allow you to
+ push to it). It will give you a URL (usually starting with git@ and
+ ending in .git); put this URL in place of XXXXXXXXXX in the first
+ command below. With the second command, "push" your master branch to
+ your origin remote, and (with the --set-upstream option) set them
+ to track/follow each other. However, the maneage branch is currently
+ tracking/following your origin-maneage remote (automatically set
+ when you cloned Maneage). So when pushing the maneage branch to your
+ origin remote, you shouldn't use --set-upstream. With the last
+ command, you can actually check this (which local and remote branches
+ are tracking each other).


+                        
 git remote add origin XXXXXXXXXX        # Newly created repo is now called 'origin'.
 git push --set-upstream origin master   # Push 'master' branch to 'origin' (with tracking).
 git push origin maneage                 # Push 'maneage' branch to 'origin' (no tracking).
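
One way to see which local branch follows which remote branch is `git branch -vv`, which prints each local branch with its upstream in square brackets. The sketch below is self-contained and only illustrative: it builds a throwaway repository and a bare "remote" in a temporary directory (the paths and the demo identity are assumptions, not part of Maneage), pushes master with --set-upstream, and then queries the tracking information.

```shell
# Throwaway sandbox: nothing here touches your real project.
tmpdir=$(mktemp -d)
git init -q --bare "$tmpdir/remote.git"        # Stand-in for the hosting service.
git init -q "$tmpdir/work"
cd "$tmpdir/work"
git symbolic-ref HEAD refs/heads/master        # Make sure the branch is called 'master'.
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Demo commit"   # Demo identity (assumption).
git remote add origin "$tmpdir/remote.git"
git push -q --set-upstream origin master       # Push with tracking.

# Ask Git which remote branch 'master' is tracking.
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "master tracks: $upstream"
git branch -vv                                 # Human-readable overview.
```

In a real project, only the last command is needed; a branch pushed without --set-upstream keeps whatever upstream it had before.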
-


-                                Title, short description and author: The title and basic
-                                    information of your project's output PDF paper should be added in
-                                    paper.tex. You should see the relevant place in the preamble (prior
-                                    to \begin{document}). After you are done, run the ./project make
-                                    command again to see your changes in the final PDF, and make sure that
-                                    your changes don't cause a crash in LaTeX. Of course, if you use a
-                                    different LaTeX package/style for managing the title and authors (in
-                                    particular a specific journal's style), please feel free to use
-                                    your own methods after finishing this checklist and doing your first
-                                    commit.
-

Delete dummy parts: Maneage contains some parts that are only for
-                                    the initial/test run, mainly as a demonstration of important steps,
-                                    which you can use as a reference in your own project. But they
-                                    are not for any real analysis, so you should remove these parts as
-                                    described below:
+

Title, short description and author: The title and basic
+ information of your project's output PDF paper should be added in
+ paper.tex. You should see the relevant place in the preamble (prior
+ to \begin{document}). After you are done, run the ./project make
+ command again to see your changes in the final PDF, and make sure that
+ your changes don't cause a crash in LaTeX. Of course, if you use a
+ different LaTeX package/style for managing the title and authors (in
+ particular a specific journal's style), please feel free to use
+ your own methods after finishing this checklist and doing your first
+ commit.

Delete dummy parts: Maneage contains some parts that are only for
+ the initial/test run, mainly as a demonstration of important steps,
+ which you can use as a reference in your own project. But they
+ are not for any real analysis, so you should remove these parts as
+ described below:

paper.tex: 1) Delete the text of the abstract (from
- \includeabstract{ to \vspace{0.25cm}) and write your own (a
- single sentence can be enough now, you can complete it later). 2)
- Add some keywords under it in the keywords part. 3) Delete
- everything between %% Start of main body. and %% End of main
- body.. 4) Remove the notice in the "Acknowledgments" section (in
- \new{}) and acknowledge your funding sources (this can also be
- done later). Just don't delete the existing acknowledgment
- statement: Maneage is possible thanks to funding from several
- grants. Since Maneage is being used in your work, it is necessary to
- acknowledge them in your work also.
reproduce/analysis/make/top-make.mk: Delete the delete-me line
- in the makesrc definition. Just make sure there is no empty line
- between the download \ and verify \ lines (they should be
- directly under each other).
reproduce/analysis/make/verify.mk: In the final recipe, under the
- commented line Verify TeX macros, remove the full line that
- contains delete-me, and set the value of s in the line for
- download to XXXXX (any temporary string, you'll fix it in the
- end of your project, when it's complete).

Delete all delete-me* files in the following directories:


+                        
+                            paper.tex: 1) Delete the text of the abstract (from
+                                \includeabstract{ to \vspace{0.25cm}) and write your own (a
+                                single sentence can be enough now, you can complete it later). 2)
+                                Add some keywords under it in the keywords part. 3) Delete
+                                everything between %% Start of main body. and %% End of main
+                                    body.. 4) Remove the notice in the "Acknowledgments" section (in
+                                \new{}) and acknowledge your funding sources (this can also be
+                                done later). Just don't delete the existing acknowledgment
+                                statement: Maneage is possible thanks to funding from several
+                                grants. Since Maneage is being used in your work, it is necessary to
+                                acknowledge them in your work also.
+                            reproduce/analysis/make/top-make.mk: Delete the delete-me line
+                                in the makesrc definition. Just make sure there is no empty line
+                                between the download \ and verify \ lines (they should be
+                                directly under each other).
+                            reproduce/analysis/make/verify.mk: In the final recipe, under the
+                                commented line Verify TeX macros, remove the full line that
+                                contains delete-me, and set the value of s in the line for
+                                download to XXXXX (any temporary string, you'll fix it in the
+                                end of your project, when it's complete).
+                            Delete all delete-me* files in the following directories:
+                                
 rm tex/src/delete-me*
 rm reproduce/analysis/make/delete-me*
 rm reproduce/analysis/config/delete-me*
-                                            
-                                            Disable verification of outputs by removing the yes from
-                                                reproduce/analysis/config/verify-outputs.conf. Later, when you are
-                                                ready to submit your paper, or publish the dataset, activate
-                                                verification and make the proper corrections in this file (described
-                                                under the "Other basic customizations" section below). This is a
-                                                critical step and only takes a few minutes when your project is
-                                                finished. So DON'T FORGET to activate it at the end.
-                                            Re-make the project (after a cleaning) to see if you haven't
-                                                introduced any errors.
+

Disable verification of outputs by removing the yes from
+ reproduce/analysis/config/verify-outputs.conf. Later, when you are
+ ready to submit your paper, or publish the dataset, activate
+ verification and make the proper corrections in this file (described
+ under the "Other basic customizations" section below). This is a
+ critical step and only takes a few minutes when your project is
+ finished. So DON'T FORGET to activate it at the end.

Re-make the project (after a cleaning) to make sure you haven't
+ introduced any errors.


+                                
 ./project make clean
 ./project make
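
Disabling verification, as described in the list above, amounts to emptying the value of verify-outputs. A minimal sketch follows, assuming the configuration file holds a Make-style line of the form `verify-outputs = yes` (this layout is an assumption; look at your copy of the file first). It runs on a temporary stand-in file, not on the real reproduce/analysis/config/verify-outputs.conf.

```shell
# Stand-in for the real configuration file (assumed layout).
tmpdir=$(mktemp -d)
conf="$tmpdir/verify-outputs.conf"
printf '# Verify outputs before building the paper?\nverify-outputs = yes\n' > "$conf"

# Keep the variable name and '=' but drop the value, so verification is off.
sed 's/^\(verify-outputs[[:space:]]*=\).*/\1/' "$conf" > "$conf.tmp" \
    && mv "$conf.tmp" "$conf"

# Show the resulting line.
value=$(grep '^verify-outputs' "$conf")
echo "$value"
```

Editing the file by hand in a text editor is just as good; the point is only that the value after `=` ends up empty until you re-enable verification.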
-

Don't merge some files in future updates: As described below, you
-                                        can later update your infrastructure (for example to fix bugs) by
-                                        merging your master branch with maneage. For files that you have
-                                        created in your own branch, there will be no problem. However if you
-                                        modify an existing Maneage file for your project, next time it's
-                                        updated on maneage you'll have an annoying conflict. The commands
-                                        below show how to fix this future problem. With them, you can
-                                        configure Git to ignore the changes in maneage for some of the files
-                                        you have already edited and deleted above (and will edit below). Note
-                                        that only the first echo command has a > (to write over the file),
-                                        the rest are >> (to append to it). If you want to prevent any other
-                                        set of files from being imported from Maneage into your project's branch,
-                                        you can follow a similar strategy. We recommend only doing it when you
-                                        encounter the same conflict in more than one merge and there is no
-                                        other change in that file. Also, don't add core Maneage Makefiles,
-                                        otherwise Maneage can break on the next run.
+

+ +

Don't merge some files in future updates: As described below, you
+ can later update your infrastructure (for example to fix bugs) by
+ merging your master branch with maneage. For files that you have
+ created in your own branch, there will be no problem. However if you
+ modify an existing Maneage file for your project, next time it's
+ updated on maneage you'll have an annoying conflict. The commands
+ below show how to fix this future problem. With them, you can
+ configure Git to ignore the changes in maneage for some of the files
+ you have already edited and deleted above (and will edit below). Note
+ that only the first echo command has a > (to write over the file),
+ the rest are >> (to append to it). If you want to prevent any other
+ set of files from being imported from Maneage into your project's branch,
+ you can follow a similar strategy. We recommend only doing it when you
+ encounter the same conflict in more than one merge and there is no
+ other change in that file. Also, don't add core Maneage Makefiles,
+ otherwise Maneage can break on the next run.


+                            
 echo "paper.tex merge=ours" > .gitattributes
 echo "tex/src/delete-me.mk merge=ours" >> .gitattributes
 echo "tex/src/delete-me-demo.mk merge=ours" >> .gitattributes
@@ -679,50 +691,50 @@ echo "reproduce/analysis/make/delete-me.mk merge=ours" >> .gitattributes
 echo "reproduce/software/config/TARGETS.conf merge=ours" >> .gitattributes
 echo "reproduce/analysis/config/delete-me-num.conf merge=ours" >> .gitattributes
 git add .gitattributes
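
A note on how `merge=ours` takes effect: in Git, `ours` in this position names a custom merge driver, and such a driver has to be defined in the Git configuration before the attribute does anything. Maneage may already take care of this for you, so treat the following only as a hedged sketch of how to define and check it. It runs in a throwaway repository (the temporary path is an assumption for illustration); `true` is the standard shell command that always succeeds, which makes Git keep your branch's version of the file.

```shell
# Throwaway sandbox repository.
tmpdir=$(mktemp -d)
git init -q "$tmpdir/demo"
cd "$tmpdir/demo"

# Same kind of line as in the commands above.
echo "paper.tex merge=ours" > .gitattributes

# Define the 'ours' driver: a command that succeeds without changing the file.
git config merge.ours.driver true

# Check that Git sees the attribute and the driver.
attr=$(git check-attr merge -- paper.tex)
driver=$(git config --get merge.ours.driver)
echo "$attr"
echo "merge.ours.driver = $driver"
```

In your own project, the two checking commands at the end are enough to confirm that the setting is in place before the next merge with maneage.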
-


-                                        Copyright and License notice: It is necessary that all the
-                                            "copyright-able" files in your project (those longer than 10 lines)
-                                            have a copyright and license notice. Please take a moment to look at
-                                            several existing files to see a few examples. The copyright notice is
-                                            usually close to the start of the file; it is the line starting with
-                                            Copyright (C) and containing a year and the author's name (like the
-                                            examples below). The License notice is a short description of the
-                                            copyright license, usually one or two paragraphs with a URL to the
-                                            full license. Don't forget to add these two notices to any new
-                                                file you add in your project (you can just copy-and-paste). When you
-                                            modify an existing Maneage file (which already has the notices), just
-                                            add a copyright notice in your name under the existing one(s), like
-                                            the line with capital letters below. To start with, add this line with
-                                            your name and email address to paper.tex,
-                                            tex/src/preamble-header.tex, reproduce/analysis/make/top-make.mk,
-                                            and generally, all the files you modified in the previous step.
-
-                                            
+                        
+                        Copyright and License notice: It is necessary that all the
+                            "copyright-able" files in your project (those longer than 10 lines)
+                            have a copyright and license notice. Please take a moment to look at
+                            several existing files to see a few examples. The copyright notice is
+                            usually close to the start of the file; it is the line starting with
+                            Copyright (C) and containing a year and the author's name (like the
+                            examples below). The License notice is a short description of the
+                            copyright license, usually one or two paragraphs with a URL to the
+                            full license. Don't forget to add these two notices to any new
+                                file you add in your project (you can just copy-and-paste). When you
+                            modify an existing Maneage file (which already has the notices), just
+                            add a copyright notice in your name under the existing one(s), like
+                            the line with capital letters below. To start with, add this line with
+                            your name and email address to paper.tex,
+                            tex/src/preamble-header.tex, reproduce/analysis/make/top-make.mk,
+                            and generally, all the files you modified in the previous step.
+
+                            
 Copyright (C) 2018-2020 Existing Name <existing@email.address>
 Copyright (C) 2020 YOUR NAME <YOUR@EMAIL.ADDRESS>
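
To find files that still need a notice, a small loop can flag every file longer than 10 lines that has no line containing `Copyright (C)`. This is only a sketch, not a Maneage tool: the file names and the temporary directory are made up for the demonstration, and in a real project you would point the loop at your own source files.

```shell
# Demo files in a temporary directory (hypothetical names).
tmpdir=$(mktemp -d)
printf '# Copyright (C) 2020 YOUR NAME <YOUR@EMAIL.ADDRESS>\n' > "$tmpdir/with.mk"
seq 1 12 >> "$tmpdir/with.mk"        # Long file, has a notice.
seq 1 12 >  "$tmpdir/without.mk"     # Long file, no notice.
seq 1 3  >  "$tmpdir/short.mk"       # Short file, notice not required.

# Collect long files that lack a copyright line.
missing=""
for f in "$tmpdir"/*.mk; do
    nlines=$(wc -l < "$f")
    if [ "$nlines" -gt 10 ] && ! grep -q "Copyright (C)" "$f"; then
        missing="$missing $(basename "$f")"
    fi
done
echo "Files needing a notice:$missing"
rm -r "$tmpdir"
```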
-                                            
-

Configure Git for the first time: If this is the first time you are
-                                                running Git on this system, then you have to configure it with some
-                                                basic information in order to have essential information in the commit
-                                                messages (ignore this step if you have already done it). Git will
-                                                include your name and e-mail address information in each commit. You
-                                                can also specify your favorite text editor for making the commit
-                                                (emacs, vim, nano, etc.).
+

Configure Git for the first time: If this is the first time you are
+ running Git on this system, then you have to configure it with some
+ basic information in order to have essential information in the commit
+ messages (ignore this step if you have already done it). Git will
+ include your name and e-mail address information in each commit. You
+ can also specify your favorite text editor for making the commit
+ (emacs, vim, nano, etc.).


+                            
 git config --global user.name "YourName YourSurname"
 git config --global user.email your-email@example.com
 git config --global core.editor nano
-


-                                                Your first commit: You have already made some small and basic
-                                                    changes in the steps above and you are in your project's master
-                                                    branch. So, you can officially make your first commit in your
-                                                    project's history and push it. But before that, you need to make sure
-                                                    that there are no problems in the project. It is a good habit to
-                                                    always re-build the system before a commit to be sure it works as
-                                                    expected.
-
-                                                    
+                        
+                        Your first commit: You have already made some small and basic
+                            changes in the steps above and you are in your project's master
+                            branch. So, you can officially make your first commit in your
+                            project's history and push it. But before that, you need to make sure
+                            that there are no problems in the project. It is a good habit to
+                            always re-build the system before a commit to be sure it works as
+                            expected.
+
+                            
 git status                 # See which files you have changed.
 git diff                   # Check the lines you have added/changed.
 ./project make             # Make sure everything builds successfully.
@@ -731,13 +743,13 @@ git status                 # Make sure everything is fine.
 git diff --cached          # Confirm all the changes that will be committed.
 git commit                 # Your first commit: put a good description!
 git push                   # Push your commit to your remote.
-                                                    
-                                                    Start your exciting research: You are now ready to add flesh and
-                                                        blood to this raw skeleton by further modifying and adding your
-                                                        exciting research steps. You can use the "published works" section in
-                                                        the introduction (above) as some fully working models to learn
-                                                        from. Also, don't hesitate to contact us if you have any
-                                                        questions.
+

Start your exciting research: You are now ready to add flesh and
+ blood to this raw skeleton by further modifying and adding your
+ exciting research steps. You can use the "published works" section in
+ the introduction (above) as some fully working models to learn
+ from. Also, don't hesitate to contact us if you have any
+ questions.

Other basic customizations

@@ -777,76 +789,76 @@ git push # Push your commit to your remo


 grep -ir wfpc2 ./*
-

README.md: Correct all the XXXXX placeholders (name of your
- project, your own name, address of your project's online/remote
- repository, link to download dependencies, etc.). Generally, read
- over the text and update it where necessary to fit your project. Don't
- forget that this is the first file that is displayed on your online
- repository and also your colleagues will first be drawn to read this
- file. Therefore, make it as easy as possible for them to start
- with. Also check and update this file one last time when you are ready
- to publish your project's paper/source.

Verify outputs: During the initial customization checklist, you
- disabled verification. This is natural because during the project you
- need to make changes all the time and it's a waste of time to enable
- verification every time. But at significant moments of the project
- (for example before submission to a journal, or publication) it is
- necessary. When you activate verification, before building the paper,
- all the specified datasets will be compared with their respective
- checksums and if any file's checksum is different from the one recorded
- in the project, it will stop and print the problematic file and its
- expected and calculated checksums. First set the value of the
- verify-outputs variable in
- reproduce/analysis/config/verify-outputs.conf to yes. Then go to
- reproduce/analysis/make/verify.mk. The verification of all the files
- is only done in one recipe. First the files that go into the
- plots/figures are checked, then the LaTeX macros. Validation of the
- former (inputs to plots/figures) should be done manually. If it's the
- first time you are doing this, you can see two examples of the dummy
- steps (with delete-me, you can use them if you like). These two
- examples should be removed before you can run the project. For the
- latter, you just have to update the checksums. The important thing to
- consider is that a simple checksum can be problematic because some
- file generators print their run-time date in the file (for example as
- commented lines in a text table). When checking text files, this
- Makefile already has this function:
- verify-txt-no-comments-leading-space. As the name suggests, it will
- remove comment lines and empty lines before calculating the MD5
- checksum. For FITS formats (common in astronomy), fortunately there is
- a DATASUM definition which will return the checksum independent of
- the headers. You can use the provided function(s), or define one for
- your special formats.

Feedback: As you use Maneage you will notice many things that if
- implemented from the start would have been very useful for your
- work. This can be in the actual scripting and architecture of Maneage,
- or useful implementation and usage tips, like those below. In any
- case, please share your thoughts and suggestions with us, so we can
- add them here for everyone's benefit.

Re-preparation: Automatic preparation is only run in the first run
- of the project on a system; to re-do the preparation you have to use
- the option below. Here is the reason for this: when it's necessary, the
- preparation process can be slow and will unnecessarily slow down the
- whole project while the project is under development (focus is on the
- analysis that is done after preparation). Because of this, preparation
- will be done automatically for the first time that the project is run
- (when .build/software/preparation-done.mk doesn't exist). After the
- preparation process completes once, future runs of ./project make
- will not do the preparation process anymore (will not call
- top-prepare.mk). They will only call top-make.mk for the
- analysis. To manually invoke the preparation process after the first
- attempt, the ./project make script should be run with the
- --prepare-redo option, or you can delete the special file above.

README.md: Correct all the XXXXX placeholders (name of your
+ project, your own name, address of your project's online/remote
+ repository, link to download dependencies, etc.). Generally, read
+ over the text and update it where necessary to fit your project. Don't
+ forget that this is the first file that is displayed on your online
+ repository and also your colleagues will first be drawn to read this
+ file. Therefore, make it as easy as possible for them to start
+ with. Also check and update this file one last time when you are ready
+ to publish your project's paper/source.

Verify outputs: During the initial customization checklist, you
+ disabled verification. This is natural because during the project you
+ need to make changes all the time and it's a waste of time to enable
+ verification every time. But at significant moments of the project
+ (for example before submission to a journal, or publication) it is
+ necessary. When you activate verification, before building the paper,
+ all the specified datasets will be compared with their respective
+ checksums and if any file's checksum is different from the one recorded
+ in the project, it will stop and print the problematic file and its
+ expected and calculated checksums. First set the value of the
+ verify-outputs variable in
+ reproduce/analysis/config/verify-outputs.conf to yes. Then go to
+ reproduce/analysis/make/verify.mk. The verification of all the files
+ is only done in one recipe. First the files that go into the
+ plots/figures are checked, then the LaTeX macros. Validation of the
+ former (inputs to plots/figures) should be done manually. If it's the
+ first time you are doing this, you can see two examples of the dummy
+ steps (with delete-me, you can use them if you like). These two
+ examples should be removed before you can run the project. For the
+ latter, you just have to update the checksums. The important thing to
+ consider is that a simple checksum can be problematic because some
+ file generators print their run-time date in the file (for example as
+ commented lines in a text table). When checking text files, this
+ Makefile already has this function:
+ verify-txt-no-comments-leading-space. As the name suggests, it will
+ remove comment lines and empty lines before calculating the MD5
+ checksum. For FITS formats (common in astronomy), fortunately there is
+ a DATASUM definition which will return the checksum independent of
+ the headers. You can use the provided function(s), or define one for
+ your special formats.
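
The idea of checksumming a text table while ignoring its comments can be sketched in plain shell. The helper below is hypothetical and is not Maneage's verify-txt-no-comments-leading-space function; it only illustrates the principle under the assumption that comment lines start with `#`. It strips leading whitespace, drops comment and empty lines, and then takes the MD5 checksum of what remains. The two demo tables differ only in their date comment, spacing and an empty line, so their checksums agree.

```shell
# Hypothetical helper: MD5 of a text file, ignoring leading whitespace,
# comment lines (starting with '#') and empty lines.
checksum_no_comments() {
    sed -e 's/^[[:space:]]*//' -e '/^#/d' -e '/^$/d' "$1" \
        | md5sum | cut -d' ' -f1
}

# Two demo tables: same data, different run-time comment and spacing.
tmpdir=$(mktemp -d)
printf '# Created: 2020-01-01\n1 2.5\n\n2 3.5\n' > "$tmpdir/a.txt"
printf '# Created: 2020-06-15\n  1 2.5\n2 3.5\n' > "$tmpdir/b.txt"

sum_a=$(checksum_no_comments "$tmpdir/a.txt")
sum_b=$(checksum_no_comments "$tmpdir/b.txt")
echo "a.txt: $sum_a"
echo "b.txt: $sum_b"
rm -r "$tmpdir"
```

A plain `md5sum` on the two files would give different values because of the date comment, which is exactly the problem the paragraph above warns about.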

Feedback: As you use Maneage you will notice many things that, if implemented from the start, would have been very useful for your work. This can be in the actual scripting and architecture of Maneage, or useful implementation and usage tips, like those below. In any case, please share your thoughts and suggestions with us, so we can add them here for everyone's benefit.

Re-preparation: Automatic preparation is only run in the first run of the project on a system. To re-do the preparation you have to use the option below. The reason is that, when it is necessary, the preparation process can be slow and would unnecessarily slow down the whole project while it is under development (when the focus is on the analysis that is done after preparation). Because of this, preparation is only done automatically the first time that the project is run (when .build/software/preparation-done.mk doesn't exist). After the preparation process completes once, future runs of ./project make will not do the preparation process anymore (will not call top-prepare.mk). They will only call top-make.mk for the analysis. To manually invoke the preparation process after the first attempt, the ./project make script should be run with the --prepare-redo option, or you can delete the special file above.


 ./project make --prepare-redo



Pre-publication: add notice on reproducibility: Add a notice somewhere prominent on the first page of your paper, informing the reader that your research is fully reproducible. For example at the end of the abstract, or under the keywords with a title like "reproducible paper". This will encourage readers to publish their own work in this manner and will also help spread the word.

Tips for designing your project

@@ -960,28 +972,28 @@ grep -ir wfpc2 ./*


 info make "automatic variables"

Debug: Since Make doesn't follow the common top-down paradigm, it can be a little hard to get accustomed to why you get an error or unexpected behavior. In such cases, run Make with the -d option. With this option, Make prints a full list of exactly which prerequisites are being checked for which targets. Looking (patiently) through this output and searching for the faulty file/step will clearly show you any mistake you might have made in defining the targets or prerequisites.
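The full output of -d is very long, so it can help to filter it. The sketch below is only a demonstration on a throw-away Makefile (the file names in.txt and out.txt are hypothetical). It assumes GNU Make, and the filtered phrases are the English messages that GNU Make prints in debug mode, which is why LC_ALL=C is set.

```shell
# Build a tiny stand-alone project in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/Makefile" <<'EOF'
# One rule: out.txt is built by copying in.txt.
out.txt: in.txt ; cp $< $@
EOF
echo "hello" > "$workdir/in.txt"

# Run Make in debug mode and keep only the lines on which targets are
# considered and which ones Make decides to rebuild.
debug=$(cd "$workdir" && LC_ALL=C make -d 2>&1 \
            | grep -E "Considering target file|Must remake target")
echo "$debug"

# Keep the result for inspection, then clean up.
built=$(cat "$workdir/out.txt")
rm -r "$workdir"
```

In a real project you would redirect the full output to a file (for example make -d > debug.txt 2>&1) and search it for the name of the problematic target.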

Large files: If you are dealing with very large files (so having multiple copies of them for intermediate steps is not possible), one solution is the following strategy (also see the next item on "Fast access to temporary files"). Set a small plain-text file as the actual target and delete the large file when it is no longer needed by the project (in the last rule that needs it). Below is a simple demonstration of doing this. In it, we use Gnuastro's Arithmetic program to add 2 to all pixels of the input image and create large1.fits. We then subtract 2 from large1.fits to create large2.fits and delete large1.fits in the same rule (when it is no longer needed). We can later do the same with large2.fits when it is no longer needed, and so on.


 large1.fits.txt: input.fits
 astarithmetic $< 2 + --output=$(subst .txt,,$@)
 echo "done" > $@
 large2.fits.txt: large1.fits.txt
 astarithmetic $(subst .txt,,$<) 2 - --output=$(subst .txt,,$@)
 rm $(subst .txt,,$<)
 echo "done" > $@

A more advanced Make programmer can use Make's call function to define a wrapper in reproduce/analysis/make/initialize.mk. This wrapper will replace $(subst .txt,,XXXXX). It then becomes possible to greatly simplify this repetitive statement and make the code even more readable throughout the whole project.
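As a minimal sketch of such a wrapper (the name strip-txt is hypothetical, it is not defined by Maneage), the commands below write a small stand-alone Makefile and let GNU Make print the result of calling the wrapper on one of the file names used above.

```shell
# Write a small stand-alone Makefile into a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/Makefile" <<'EOF'
# Hypothetical wrapper: remove '.txt' from its first argument.
strip-txt = $(subst .txt,,$(1))

# Print the result while the Makefile is being read.
$(info $(call strip-txt,large1.fits.txt))

# Rule with an empty recipe, so Make has a default target.
all: ;
EOF

# With '-s', Make doesn't print its "Nothing to be done" message, so
# the only output is the line printed by $(info ...).
result=$(make -s -f "$workdir/Makefile")
echo "$result"
rm -r "$workdir"
```

Inside a recipe, the same wrapper would be used as $(call strip-txt,$@) or $(call strip-txt,$<) instead of the longer $(subst .txt,,$@) and $(subst .txt,,$<).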

Fast access to temporary files: Most Unix-like operating systems will give you a special shared-memory device (directory): on systems using the GNU C Library (all GNU/Linux systems), it is /dev/shm. The contents of this directory are actually in your RAM, not in your persistent storage like the HDD or SSD. Reading and writing from/to the RAM is much faster than persistent storage, so if you have enough RAM available, it can be very beneficial for large temporary files to be put there. You can use the mktemp program to give the temporary files a randomly-set name, and use text files as targets to keep that name (as described in the item above under "Large files") for later deletion. For example, see the minimal working example Makefile below (which you can actually put in a Makefile and run if you have an input.fits in the same directory, and Gnuastro is installed).


 .ONESHELL:
 .SHELLFLAGS = -ec
 all: mean-std.txt
@@ -1027,30 +1039,30 @@ mean-std.txt: large2.txt
 input=$$(cat $<)
 aststatistics $$input.fits --mean --std > $@
 rm $$input.fits $$input

The important point here is that the temporary name template (shm-maneage) has no suffix. So you can add the suffix corresponding to your desired format afterwards (for example $$out.fits, or $$out.txt). But more importantly, when mktemp sets the random name, it also checks that no file exists with that name and creates a file with that exact name at that moment. So at the end of each recipe above, you'll have two files in your /dev/shm: one empty file with no suffix and one with a suffix. The role of the file without a suffix is just to ensure that the randomly set name will not be used by other calls to mktemp (when running in parallel), and it should be deleted together with the file containing a suffix. This is the reason behind the rm $$input.fits $$input command above: to make sure that first the file with a suffix is deleted, then the core random file (note that when working in parallel on powerful systems, many things can happen in the time between deleting the two files of a single rm command!). When using Maneage, you can put the definition of shm-maneage in reproduce/analysis/make/initialize.mk to be usable in all the different Makefiles of your analysis, and you won't need the three lines above it. Finally, BE RESPONSIBLE: after you are finished, be sure to clean up any possibly remaining files (due to crashes in the processing while you are working), otherwise your RAM may fill up very fast. You can do it easily with a command like this on your command-line: rm -f /dev/shm/$(whoami)-*.
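The behavior of mktemp described above can be checked directly on the command line. The following is only a sketch: the template name is an assumption (it just starts with your user name, so it matches the clean-up command above), and /tmp is used as a fall-back only so the commands also run on systems without a writable /dev/shm.

```shell
# Use /dev/shm when it is available and writable, otherwise /tmp.
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
    shmdir=/dev/shm
else
    shmdir=/tmp
fi

# Hypothetical template: mktemp replaces the Xs with random characters.
template="$shmdir/$(whoami)-maneage-XXXXXXXXXX"

# mktemp prints the random name AND creates an empty file with it, so
# a parallel call to mktemp can't choose the same name.
base=$(mktemp "$template")
if [ -f "$base" ]; then reserved=yes; else reserved=no; fi

# Put the actual (temporary) data in a file with the desired suffix.
echo "some data" > "$base.fits"
ls -l "$base" "$base.fits"

# When done: first delete the file with a suffix, then the empty one.
rm "$base.fits" "$base"
```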

Software tarballs and raw inputs: It is critically important to document the raw inputs to your project (software tarballs and raw
@@ -1101,91 +1113,91 @@ git log XXXXXX..XXXXXX --reverse # Inspect new work (re
 git log --oneline --graph --decorate --all # General view of branches.
 git checkout master # Go to your top working branch.
 git merge maneage # Import all the work into master.


Adding Maneage to a fork of your project: As you and your colleagues continue your project, it will be necessary to have separate forks/clones of it. But when you clone your own project on a different system, or a colleague clones it to collaborate with you, the clone won't have the origin-maneage remote that you started the project with. As shown in the previous item above, you need this remote to be able to pull recent updates from Maneage. The steps below will set up the origin-maneage remote, and a local maneage branch to track it, on the new clone.


 git remote add origin-maneage https://git.maneage.org/project.git
 git fetch origin-maneage
 git checkout -b maneage --track origin-maneage/maneage

Commit message: The commit message is a very important and useful aspect of version control. To make the commit message useful for others (or yourself, one year later), it is good to follow a consistent style. Maneage already has a consistent formatting (described below), which you can also follow in your project if you like. You can see many examples by running git log in the maneage branch. If you intend to push commits to Maneage, for the consistency of Maneage, it is necessary to follow these guidelines. 1) No line should be more than 75 characters (to enable easy reading of the message when you run git log on the standard 80-character terminal). 2) The first line is the title of the commit and should summarize it (so git log --oneline can be useful). The title should also not end with a point ("."): it is a short single sentence, so a point is not necessary and only wastes space. 3) After the title, leave an empty line and start the body of your message (possibly containing many paragraphs). 4) Describe the context of your commit (the problem it is trying to solve) as much as possible, then go on to how you solved it. One suggestion is to start the main body of your commit with "Until now ...", and continue describing the problem in the first paragraph(s). Afterwards, start the next paragraph with "With this commit ...".
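The first three guidelines are mechanical, so they can be checked automatically. Below is a minimal sketch of such a check. The function name check_commit_msg and the two demo messages are hypothetical, and this is not a tool that Maneage provides. It prints one line per problem and nothing when the message follows the guidelines.

```shell
# Hypothetical checker: the argument is a file containing the message.
check_commit_msg () {
    awk '
        length($0) > 75     { printf "line %d: longer than 75 characters\n", NR }
        NR == 1 && /\.$/    { print "line 1: title ends with a point" }
        NR == 2 && $0 != "" { print "line 2: should be empty" }
    ' "$1"
}

msg=$(mktemp)

# A message that breaks guidelines 2 and 3.
printf '%s\n' 'Fix checksum of demo plot.' 'Until now ...' > "$msg"
problems=$(check_commit_msg "$msg")
echo "$problems"

# A message that follows the guidelines (nothing is reported).
printf '%s\n' 'Fix checksum of demo plot' '' \
       'Until now ...' '' 'With this commit ...' > "$msg"
good=$(check_commit_msg "$msg")
rm "$msg"
```

Since Git passes the name of the message file as the first argument to its commit-msg hook, a check like this could also be called from that hook if you want it to run on every commit.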

Project outputs: During your research, it is possible to check out a specific commit and reproduce its results. However, the processing can be time consuming. Therefore, it is useful to also keep track of the final outputs of your project (at minimum, the paper's PDF) at important points of its history. However, keeping a snapshot of these (most probably large volume) outputs in the main history of the project can unreasonably bloat it. It is thus recommended to make a separate Git repo to keep those files and keep your project's source as small as possible. For example if your project is called my-exciting-project, the name of the outputs repository can be my-exciting-project-output. This enables easy sharing of the output files with your co-authors (with necessary permissions) without bloating your email archive with extra attachments (you can just share the link to the online repo in your communications). After the research is published, you can also release the outputs repository, or you can just delete it if it is too large or unnecessary (it was just for convenience, and fully reproducible after all). For example Maneage's output is available for demonstration in a separate repository.

Full Git history in one file: When you are publishing your project (for example to Zenodo for long-term preservation), it is more convenient to have the whole project's Git history in one file to save with your datasets. After all, you can't be sure that your current Git server (for example GitLab, GitHub, or Bitbucket) will be active forever. While they are good for the immediate future, you can't rely on them for archival purposes. Fortunately, keeping your whole history in one file is easy with Git using the following commands. To learn more about it, run git help bundle.

"Bundle" your project's history into one file (just don't forget to change my-project-git.bundle to a descriptive name of your project):


 git bundle create my-project-git.bundle --all

You can easily upload my-project-git.bundle anywhere. Later, if you need to un-bundle it, you can use the following command.



 git clone my-project-git.bundle
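Before uploading a bundle for archival, it can be reassuring to check that it really contains your history. The sketch below does a full round trip on a throw-away repository (the directory names demo and restored, and the demo commit, are hypothetical). It also uses git bundle verify, which checks that the bundle file is valid.

```shell
# Work in a temporary directory and remember where we started.
startdir=$PWD
workdir=$(mktemp -d)
cd "$workdir"

# Make a throw-away repository with a single commit.
git init -q demo
cd demo
echo "hello" > README
git add README
git -c user.name="Demo" -c user.email="demo@example.org" \
    commit -q -m "First commit"

# Bundle the full history, and check that the bundle is valid.
git bundle create ../my-project-git.bundle --all
git bundle verify ../my-project-git.bundle
cd ..

# Un-bundle into a new clone and compare the latest commit's checksum.
git clone -q my-project-git.bundle restored
orig=$(git -C demo rev-parse HEAD)
new=$(git -C restored rev-parse HEAD)
echo "original: $orig"
echo "restored: $new"

# Clean up.
cd "$startdir"
rm -rf "$workdir"
```

If the two commit checksums are the same, the clone made from the bundle has exactly the same history as the original repository.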