Next: Future improvements, Previous: Customization checklist, Up: About
The following is a list of design points, tips, and recommendations that we have learned from experience with this type of project management. Please don't hesitate to share with us any experience you gain after using it, so that we can add it here (giving full credit) for the benefit of others.
Modularity: Modularity is the key to easy and clean growth of a project. So it is always best to break up a job into as many sub-components as reasonable. Here are some tips to stay modular.
Short recipes: if you see the recipe of a rule becoming more than a handful of lines which involve significant processing, it is probably a good sign that you should break up the rule into its main components. Try to only have one major processing step per rule.
Context-based (many) Makefiles: For maximum modularity, this design
                                allows easy inclusion of many Makefiles: in
                                reproduce/analysis/make/*.mk for analysis steps, and
                                reproduce/software/make/*.mk for building software. So keep the
                                rules for closely related parts of the processing in separate
                                Makefiles.
Descriptive names: Be very clear and descriptive with the naming of the files and the variables because a few months after the processing, it will be very hard to remember what each one was for. Also this helps others (your collaborators or other people reading the project source after it is published) to more easily understand your work and find their way around.
Naming convention: As the project grows, following a single standard
                                or convention in naming the files is very useful. Try your best to use
                                multiple word filenames for anything that is non-trivial (separating
                                the words with a -). For example if you have a Makefile for
                                creating a catalog and another two for processing it under models A
                                and B, you can name them like this: catalog-create.mk,
                                catalog-model-a.mk and catalog-model-b.mk. In this way, when
                                listing the contents of reproduce/analysis/make to see all the
                                Makefiles, those related to the catalog will all be close to each
                                other and thus easily found. This also helps in auto-completions by
                                the shell or text editors like Emacs.
Source directories: If you need to add files in other languages, for
                                example in shell, Python, AWK or C, keep the files of each
                                language in their own directory under reproduce/analysis, with an
                                appropriate name.
Configuration files: If your research uses special programs as part
                                of the processing, put all their configuration files in a devoted
                                directory (with the program's name) within
                                reproduce/software/config, similar to the
                                reproduce/software/config/gnuastro directory (which is included in
                                Maneage as a demo in case you use GNU Astronomy Utilities). It is
                                much cleaner and more readable (thus less buggy) to avoid mixing the
                                configuration files, even if there is no technical necessity.
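The effect of the naming convention above can be seen with a minimal shell sketch. It runs in a throw-away directory; the three catalog-*.mk names come from the text, while download.mk and paper.mk are hypothetical names added only to show the grouping.

```shell
#!/bin/sh
# Create empty Makefiles in a temporary directory (not a real project).
demo_dir=$(mktemp -d)
touch "$demo_dir/paper.mk" \
      "$demo_dir/catalog-model-b.mk" \
      "$demo_dir/download.mk" \
      "$demo_dir/catalog-create.mk" \
      "$demo_dir/catalog-model-a.mk"

# A plain listing is sorted by name, so the catalog-related
# Makefiles end up next to each other.
listing=$(LC_ALL=C ls "$demo_dir")
echo "$listing"

# The shared prefix also lets a shell pattern (or completion in the
# shell or an editor) select all of them at once.
catalog_files=$(cd "$demo_dir" && echo catalog-*.mk)
echo "$catalog_files"

# Remove the temporary directory.
rm -rf "$demo_dir"
```

The same idea applies to any other group of closely related files: a common first word keeps them together in listings and in completions.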
Contents: It is good practice to follow these recommendations on the contents of your files, whether they are source code for a program, Makefiles, scripts or configuration files (copyrights aren't necessary for the latter).
Copyright: Always start a file containing programming constructs
                                    with a copyright statement like the ones that Maneage starts with
                                    (for example in the top level Makefile).
Comments: Comments are vital for readability (by yourself in two months, or by others). Describe everything you can about why you are doing something, how you are doing it, and what you expect the result to be. Write the comments as if they were what you would say to describe the variable, recipe or rule to a friend sitting beside you. When writing the project it is very tempting to just steam ahead with commands and code, but be patient and write comments before the rules or recipes. This will also allow you to think more about what you should be doing, and in several months, when you come back to the code, you will appreciate the effort of writing them. If you later want to change the code (variable, recipe or rule), don't forget to first read and update its comment. As a general rule of thumb: first the comments, then the code.
File title: In general, it is good practice to start all files with a single line description of what that particular file does. If further information about the totality of the file is necessary, add it after a blank line. This will help a fast inspection where you don't care about the details, but just want to remember/see what that file is (generally) for. This information must of course be commented (it is for a human), but this is kept separate from the general recommendation on comments, because this is a comment for the whole file, not each step within it.
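As an illustration of the three recommendations above, here is a minimal sketch of how a small shell script could start. The script itself, the function name count_rows and the copyright line are hypothetical placeholders; for a real copyright notice, copy the form used in the files that come with Maneage.

```shell
#!/bin/sh
# Count the data rows of a plain-text table (hypothetical example).
#
# Lines that are blank or that start with '#' are treated as comments
# and are not counted. The name of the table is the first argument.
#
# Copyright (C) YEAR Your Name <your@email.address>
# (placeholder only: use the same notice as the rest of your project)

# Print the number of data rows in the given file. With '-v', grep
# keeps only the lines that match none of the two patterns (comment
# lines and blank lines), and '-c' prints how many lines were kept.
count_rows () {
    grep -c -v -e '^#' -e '^[[:space:]]*$' "$1"
}
```

The first comment line is the file title, the lines after the blank comment line give the details about the whole file, and the comment directly above the function describes that single step.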
Make programming: Here are some lessons that we have learned over the years of using Make and that are useful in research contexts.
Environment of each recipe: If you need to define a special
                                        environment (or aliases, or scripts to run) for all the recipes in
                                        your Makefiles, you can use a Bash startup file
                                        reproduce/software/shell/bashrc.sh. This file is loaded before every
                                        Make recipe is run, just like the .bashrc in your home directory is
                                        loaded every time you start a new interactive, non-login terminal. See
                                        the comments in that file for more.
Automatic variables: These are wonderful and very useful Make
                                        constructs that greatly shrink the text, while helping
                                        readability, robustness (fewer bugs from typos, for example) and
                                        generalization. For example, even when a rule only has one target or
                                        one prerequisite, always use $@ instead of the target's name, $<
                                        instead of the first prerequisite, $^ instead of the full list of
                                        prerequisites, etc. You can see the full list of automatic
                                        variables in the GNU Make manual. If
                                        you use GNU Make, you can also see this page on your command-line:
info make "automatic variables"
Debug: Since Make doesn't follow the common top-down paradigm, it
                                        can be a little hard to get accustomed to why you get an error or
                                        unexpected behavior. In such cases, run Make with the -d
                                        option. With this option, Make prints a full list of exactly which
                                        prerequisites are being checked for which targets. Looking
                                        (patiently) through this output and searching for the faulty
                                        file/step will clearly show you any mistake you might have made in
                                        defining the targets or prerequisites.
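The automatic variables described earlier can be tried out with the minimal shell sketch below. It writes a two-line Makefile into a temporary directory and runs it, but only if a make program is found. The file names a.txt, b.txt and joined.txt are hypothetical and are not part of Maneage.

```shell
#!/bin/sh
# Work in a temporary directory with two small input files.
demo_dir=$(mktemp -d)
printf 'a\n' > "$demo_dir/a.txt"
printf 'b\n' > "$demo_dir/b.txt"

# Write the Makefile with printf: a recipe line must start with a TAB
# (the '\t' below). In the recipe, '$^' is the full list of
# prerequisites and '$@' is the target, so no file name is repeated.
printf 'joined.txt: a.txt b.txt\n\tcat $^ > $@\n' > "$demo_dir/Makefile"

# Run the rule only when Make is installed.
if command -v make >/dev/null 2>&1; then
    make -s -C "$demo_dir" joined.txt
    cat "$demo_dir/joined.txt"
fi
```

Renaming the target or adding a third prerequisite only needs a change on the first line of the rule; the recipe stays the same. Running the same rule with the -d option on a fresh copy shows the prerequisite checks discussed under "Debug".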
Large files: If you are dealing with very large files (thus having
                                        multiple copies of them for intermediate steps is not possible), one
                                        solution is the following strategy (Also see the next item on "Fast
                                        access to temporary files"). Set a small plain text file as the
                                        actual target and delete the large file when it is no longer needed
                                        by the project (in the last rule that needs it). Below is a simple
                                        demonstration of doing this. In it, we use Gnuastro's Arithmetic
                                        program to add 2 to all pixels of the input image and create
                                        large1.fits. We then subtract 2 from large1.fits to create
                                        large2.fits and delete large1.fits in the same rule (when it is no
                                        longer needed). We can later do the same with large2.fits when it
                                        is no longer needed and so on.
large1.fits.txt: input.fits
        astarithmetic $< 2 + --output=$(subst .txt,,$@)
        echo "done" > $@
large2.fits.txt: large1.fits.txt
        astarithmetic $(subst .txt,,$<) 2 - --output=$(subst .txt,,$@)
        rm $(subst .txt,,$<)
        echo "done" > $@
                                        A wrapper for this substitution can be defined in
                                        reproduce/analysis/make/initialize.mk. This
                                        wrapper will replace $(subst .txt,,XXXXX). Therefore, it will be
                                        possible to greatly simplify this repetitive statement and make the
                                        code even more readable throughout the whole project.
Fast access to temporary files: Most Unix-like operating systems
                                        will give you a special shared-memory device (directory): on systems
                                        using the GNU C Library (all GNU/Linux systems), it is /dev/shm. The
                                        contents of this directory are actually in your RAM, not in your
                                        persistent storage like the HDD or SSD. Reading and writing from/to
                                        the RAM is much faster than persistent storage, so if you have enough
                                        RAM available, it can be very beneficial for large temporary files to
                                        be put there. You can use the mktemp program to give the temporary
                                        files a randomly-set name, and use text files as targets to keep that
                                        name (as described in the item above under "Large files") for later
                                        deletion. For example, see the minimal working example Makefile below
                                        (which you can actually put in a Makefile and run if you have an
                                        input.fits in the same directory, and Gnuastro is installed).
.ONESHELL:
.SHELLFLAGS = -ec
all: mean-std.txt
shm-maneage := /dev/shm/$(shell whoami)-maneage-XXXXXXXXXX
large1.txt: input.fits
        out=$$(mktemp $(shm-maneage))
        astarithmetic $< 2 + --output=$$out.fits
        echo "$$out" > $@
large2.txt: large1.txt
        input=$$(cat $<)
        out=$$(mktemp $(shm-maneage))
        astarithmetic $$input.fits 2 - --output=$$out.fits
        rm $$input.fits $$input
        echo "$$out" > $@
mean-std.txt: large2.txt
        input=$$(cat $<)
        aststatistics $$input.fits --mean --std > $@
        rm $$input.fits $$input
                                        Note that the temporary name template
                                        (shm-maneage) has no suffix. So you can add the suffix
                                        corresponding to your desired format afterwards (for example
                                        $$out.fits, or $$out.txt). But more importantly, when mktemp
                                        sets the random name, it also checks if no file exists with that name
                                        and creates a file with that exact name at that moment. So at the end
                                        of each recipe above, you'll have two files in your /dev/shm, one
                                        empty file with no suffix and one with a suffix. The role of the file
                                        without a suffix is just to ensure that the randomly set name will
                                        not be used by other calls to mktemp (when running in parallel) and
                                        it should be deleted with the file containing a suffix. This is the
                                        reason behind the rm $$input.fits $$input command above: to make
                                        sure that first the file with a suffix is deleted, then the core
                                        random file (note that when working in parallel on powerful systems,
                                        in the time between deleting two files of a single rm command, many
                                        things can happen!). When using Maneage, you can put the definition
                                        of shm-maneage in reproduce/analysis/make/initialize.mk to be
                                        usable in all the different Makefiles of your analysis, and you won't
                                        need the three lines above it. Finally, BE RESPONSIBLE: after you
                                        are finished, be sure to clean up any possibly remaining files (due
                                        to crashes in the processing while you are working), otherwise your
                                        RAM may fill up very fast. You can do it easily with a command like
                                        this on your command-line: rm -f /dev/shm/$(whoami)-*.
Software tarballs and raw inputs: It is critically important to document the raw inputs to your project (software tarballs and raw input data):
Keep the source tarball of dependencies: After configuration
                                            finishes, the .build/software/tarballs directory will contain all
                                            the software tarballs that were necessary for your project. You can
                                            mirror the contents of this directory to keep a backup of all the
                                            software tarballs used in your project (possibly as another version
                                            controlled repository) that is also published with your project. Note
                                            that software web-pages are not written in stone and can suddenly go
                                            offline or not be accessible in some conditions. This backup is thus
                                            very important. If you intend to release your project in a place like
                                            Zenodo, you can upload/keep all the necessary tarballs (and data)
                                            there with your
                                            project. zenodo.1163746 is
                                            one example of how the data, Gnuastro (main software used) and all
                                            of Gnuastro's major dependencies have been uploaded with the project's
                                            source. Just note that this is only possible for free and open-source
                                            software.
Keep your input data: The input data is also critical to the project's reproducibility, so like the above for software, make sure you have a backup of them, or their persistent identifiers (PIDs).
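Mirroring the tarballs directory mentioned above can be as simple as the shell sketch below. The function name mirror_tarballs and the backup location are hypothetical; only .build/software/tarballs comes from the text. It is a starting point under these assumptions, not a part of Maneage.

```shell
#!/bin/sh
# mirror_tarballs SRC DEST (hypothetical helper): copy every regular
# file of SRC into DEST. Files that already exist in DEST are left
# untouched, so it is safe to run again after a later configuration.
mirror_tarballs () {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for f in "$src"/*; do
        # Skip sub-directories (and the unexpanded pattern when SRC
        # is empty); symbolic links to files are followed and copied.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        [ -e "$dest/$name" ] || cp -p "$f" "$dest/$name"
    done
}

# Example call from the top project directory (the second path is an
# arbitrary choice for this example):
#   mirror_tarballs .build/software/tarballs ../my-project-tarballs
```

The backup directory can then be put under its own version control, or uploaded together with the project.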
Version control: Version control is a critical component of Maneage. Here are some tips to help in effectively using it.
Regular commits: It is important (and extremely useful) to have the history of your project under version control. So try to make commits regularly (after any meaningful change/step/result).
Keep Maneage up-to-date: In time, Maneage is going to become more and more mature and robust (thanks to your feedback and the feedback of other users). Bugs will be fixed and new/improved features will be added. So every once in a while, you can run the commands below to pull new work that is done in Maneage. If the changes are useful for your work, you can merge them with your project to benefit from them. Just pay very close attention to resolving possible conflicts which might happen in the merge (updated settings that you have customized in Maneage).
git checkout maneage
git pull                            # Get recent work in Maneage
git log XXXXXX..XXXXXX --reverse    # Inspect new work (replace XXXXXXs with hashes mentioned in output of previous command).
git log --oneline --graph --decorate --all # General view of branches.
git checkout master                 # Go to your top working branch.
git merge maneage                   # Import all the work into master.
Adding Maneage to a fork of your project: As you and your colleagues
                                                continue your project, it will be necessary to have separate
                                                forks/clones of it. But when you clone your own project on a
                                                different system, or a colleague clones it to collaborate with you,
                                                the clone won't have the origin-maneage remote that you started the
                                                project with. As shown in the previous item above, you need this
                                                remote to be able to pull recent updates from Maneage. The steps
                                                below will set up the origin-maneage remote, and a local maneage
                                                branch to track it, on the new clone.
git remote add origin-maneage https://git.maneage.org/project.git
git fetch origin-maneage
git checkout -b maneage --track origin-maneage/maneage
Commit message: The commit message is a very important and useful
                                                aspect of version control. To make the commit message useful for
                                                others (or yourself, one year later), it is good to follow a
                                                consistent style. Maneage already has a consistent formatting
                                                (described below), which you can also follow in your project if you
                                                like. You can see many examples by running git log in the maneage
                                                branch. If you intend to push commits to Maneage, for the consistency
                                                of Maneage, it is necessary to follow these guidelines. 1) No line
                                                should be more than 75 characters (to enable easy reading of the
                                                message when you run git log on the standard 80-character
                                                terminal). 2) The first line is the title of the commit and should
                                                summarize it (so git log --oneline can be useful). The title should
                                                also not end with a point (., because it is a short single sentence,
                                                so a point is not necessary and only wastes space). 3) After the
                                                title, leave an empty line and start the body of your message
                                                (possibly containing many paragraphs). 4) Describe the context of
                                                your commit (the problem it is trying to solve) as much as possible,
                                                then go onto how you solved it. One suggestion is to start the main
                                                body of your commit with "Until now ...", and continue describing the
                                                problem in the first paragraph(s). Afterwards, start the next
                                                paragraph with "With this commit ...".
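The first three commit message guidelines above are mechanical, so they can be checked with a few lines of shell. The sketch below is only an illustration: the function name check_commit_msg is hypothetical and nothing like it is claimed to exist in Maneage. It expects the name of a file holding the message, and counts characters as awk does in the current locale.

```shell
#!/bin/sh
# check_commit_msg FILE (hypothetical helper): return 0 if the commit
# message in FILE follows guidelines 1 to 3, otherwise print the
# reason and return 1.
check_commit_msg () {
    msg_file=$1

    # 1) No line should be more than 75 characters. The awk command
    #    exits with 0 only when it has seen a line that is too long.
    if awk 'length($0) > 75 { found = 1 } END { exit !found }' "$msg_file"
    then
        echo "a line is longer than 75 characters"
        return 1
    fi

    # 2) The first line is the title: not empty, no point at the end.
    title=$(sed -n '1p' "$msg_file")
    if [ -z "$title" ]; then
        echo "the title is empty"
        return 1
    fi
    case $title in
        *.) echo "the title ends with a point"; return 1 ;;
    esac

    # 3) The title is followed by an empty line (when there is a body).
    second=$(sed -n '2p' "$msg_file")
    if [ -n "$second" ]; then
        echo "the second line is not empty"
        return 1
    fi

    return 0
}
```

For example, after saving a draft message in a file called msg.txt, running check_commit_msg msg.txt prints nothing when the draft passes. The fourth guideline (describing the context and then the solution) still needs a human reader.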
Project outputs: During your research, it is possible to check out a
                                                specific commit and reproduce its results. However, the processing
                                                can be time consuming. Therefore, it is useful to also keep track of
                                                the final outputs of your project (at minimum, the paper's PDF) at
                                                important points of history. However, keeping a snapshot of these
                                                (most probably large volume) outputs in the main history of the
                                                project can unreasonably bloat it. It is thus recommended to make a
                                                separate Git repo to keep those files and keep your project's source
                                                as small as possible. For example if your project is called
                                                my-exciting-project, the name of the outputs repository can be
                                                my-exciting-project-output. This enables easy sharing of the output
                                                files with your co-authors (with necessary permissions) and
                                                avoids bloating your email archive with extra attachments (you
                                                can just share the link to the online repo in your
                                                communications). After the research is published, you can also
                                                release the outputs repository, or you can just delete it if it is
                                                too large or un-necessary (it was just for convenience, and fully
                                                reproducible after all). For example Maneage's output is available
                                                for demonstration in a
                                                    separate repository.
Full Git history in one file: When you are publishing your project
                                                (for example to Zenodo for long term preservation), it is more
                                                convenient to have the whole project's Git history in one file to
                                                save with your datasets. After all, you can't be sure that your
                                                current Git server (for example GitLab, GitHub, or Bitbucket) will be
                                                active forever. While they are good for the immediate future, you
                                                can't rely on them for archival purposes. Fortunately keeping your
                                                whole history in one file is easy with Git using the following
                                                commands. To learn more about it, run git help bundle.
First, bundle your project's history into one file (change
                                                        my-project-git.bundle to a descriptive name of your
                                                        project):
git bundle create my-project-git.bundle --all
You can then store my-project-git.bundle anywhere. Later, if
                                                        you need to un-bundle it, you can use the following command.
git clone my-project-git.bundle
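Before archiving a bundle, it can be reassuring to check that it really restores. The shell sketch below does a full round trip on a throw-away repository, and only runs when git is installed. All directory names, the README file and the demo identity are hypothetical; git bundle verify is the only command used here that is not already shown above.

```shell
#!/bin/sh
if command -v git >/dev/null 2>&1; then
    # Build a tiny repository with one commit in a temporary directory.
    work=$(mktemp -d)
    repo="$work/my-project"
    git init -q "$repo"
    echo "hello" > "$repo/README"
    git -C "$repo" add README
    git -C "$repo" -c user.name="Demo" -c user.email="demo@example.org" \
        commit -q -m "First commit"

    # Put the whole history in one file, then ask Git to check it.
    git -C "$repo" bundle create "$work/my-project-git.bundle" --all
    git -C "$repo" bundle verify "$work/my-project-git.bundle"

    # Restore the history from the bundle into a new directory.
    git clone -q "$work/my-project-git.bundle" "$work/restored"
    git -C "$work/restored" log --oneline
fi
```

If the clone shows the same history as the original repository, the bundle file is all that needs to be uploaded with your datasets.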