From 6b87843fc38c1646615ab0342a703f7ab3caf1cb Mon Sep 17 00:00:00 2001
From: Mohammad Akhlaghi
Date: Thu, 26 Nov 2020 03:45:54 +0000
Subject: The long about.hml is now broken up into smaller pages

The "About" page ('about.html') was effectively a full copy of Maneage's 'README-hacking.md', so it was very long. To help in readability it has now been broken down into smaller pages (one for each section). Also the indentation of Make recipes was corrected, both in the about pages, and also in the tutorial.
---
 about-make.html | 221 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 221 insertions(+)
 create mode 100644 about-make.html

diff --git a/about-make.html b/about-make.html
new file mode 100644
index 0000000..4474075
--- /dev/null
+++ b/about-make.html
@@ -0,0 +1,221 @@

Maneage -- Managing data lineage

Next: Maneage architecture, Previous: Citation and published papers, Up: About


Why Make?


When batch processing is necessary (no manual intervention, as in a reproducible project), shell scripts are usually the first solution that comes to mind. However, the inherent complexity and non-linearity of progress in a scientific project (where experimentation is key) make it hard to manage the script(s) as the project evolves. For example, a script starts from the top every time it is run. So if you have already completed 90% of a research project and want to run the remaining 10% that you have newly added, you have to run the whole script from the start again. Only then will you see the effects of the new steps (to find possible errors, better solutions, etc.).


It is possible to manually ignore/comment parts of a script to only run a particular part. However, such checks/comments only add to the complexity of the script and discourage you from playing with or changing an already completed part of the project when an idea suddenly comes up. This approach is also prone to very serious bugs in the end (when trying to reproduce from scratch). Such bugs are very hard to notice during the work and frustrating to find in the end.


The Make paradigm, on the other hand, starts from the end: the final target. It builds a dependency tree internally, and finds where it should start each time the project is run. Therefore, in the scenario above, a researcher who has just added the final 10% of the steps of her research to her Makefile will only have to run those extra steps. With Make, it is also trivial to change the processing of any intermediate (already written) rule (or step) in the middle of an analysis: the next time Make is run, only rules that are affected by the changes/additions will be re-run, not the whole analysis/project.
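
The selective re-run described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how GNU Make is implemented: the `RULES` table, the file names and the `stale_targets` function are all hypothetical, and modification times are passed in as plain numbers instead of being read from the file system.

```python
# Hypothetical rule table: each target maps to its prerequisites.
# A sketch of the idea only; not GNU Make's actual implementation.
RULES = {
    "paper.pdf": ["paper.tex", "plot.pdf"],
    "plot.pdf": ["table.txt"],
    "table.txt": ["input.dat"],
}


def stale_targets(goal, mtimes, rules=RULES):
    """Return the targets that must be rebuilt to update `goal`, in build order.

    `mtimes` maps every existing file to a modification time (any comparable
    number). A target missing from it is treated as not yet built; every
    source file (a file without a rule) is assumed to be present in it.
    """
    times = dict(mtimes)  # working copy, updated as targets get "rebuilt"
    rebuilt = []

    def visit(target):
        if target not in rules:
            return  # a source file: nothing to build
        prereqs = rules[target]
        for prereq in prereqs:
            visit(prereq)  # bring the prerequisites up to date first
        missing = target not in times
        outdated = not missing and any(times[p] > times[target] for p in prereqs)
        if missing or outdated:
            rebuilt.append(target)
            # The rebuilt target is now newer than every other file.
            times[target] = max(times.values(), default=0) + 1

    visit(goal)
    return rebuilt


# Only 'paper.tex' is newer than the final PDF, so only the last step re-runs.
mtimes = {"input.dat": 1, "table.txt": 2, "plot.pdf": 3, "paper.tex": 5, "paper.pdf": 4}
print(stale_targets("paper.pdf", mtimes))  # ['paper.pdf']
```

If `input.dat` were the newest file instead, the same call would list `table.txt`, `plot.pdf` and `paper.pdf`, because the change propagates through every step that depends on it.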


This greatly speeds up the processing (enabling creative changes), while keeping all the dependencies clearly documented (as part of the Make language) and, most importantly, enabling full reproducibility from scratch with no changes in the project code that was working during the research. This allows robust results and lets scientists get to what they do best: experiment and be critical of the methods/analysis without having to waste energy and time on technical problems that come up as a result of that experimentation in scripts.


Since the dependencies are clearly demarcated in Make, it can identify independent steps and run them in parallel, which further speeds up the processing. Make was designed for this purpose. It is how huge projects like all Unix-like operating systems (including GNU/Linux and Mac OS) and their core components are built. Make is therefore a highly mature paradigm/system, with robust and highly efficient implementations in various operating systems, and is perfectly suited for a complex non-linear research project.
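
To illustrate how independent steps can be identified from the dependencies alone, here is a minimal Python sketch using the standard library's `graphlib` module (Python 3.9 or later). The `DEPENDENCIES` table and the `parallel_batches` function are hypothetical. A real Make run with the `-j` option does not wait for whole batches like this: it starts each job as soon as that job's own prerequisites are finished. The grouping is only meant to show which steps never depend on each other.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical project: each target maps to the targets it depends on.
DEPENDENCIES = {
    "paper.pdf": ["plot-a.pdf", "plot-b.pdf"],
    "plot-a.pdf": ["table-a.txt"],
    "plot-b.pdf": ["table-b.txt"],
    "table-a.txt": [],
    "table-b.txt": [],
}


def parallel_batches(dependencies):
    """Group targets into batches; targets inside one batch are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        # Everything whose prerequisites are already done can run together.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the batch as finished, unlocking the next
    return batches


for batch in parallel_batches(DEPENDENCIES):
    print(batch)
# ['table-a.txt', 'table-b.txt']
# ['plot-a.pdf', 'plot-b.pdf']
# ['paper.pdf']
```

The two tables (and then the two plots) share no prerequisites, so nothing prevents building them at the same time; only the final PDF has to wait for everything else.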


Make is a small language with the aim of defining rules containing targets, prerequisites and recipes. It comes with some nice features like functions or automatic variables to greatly facilitate the management of text (filenames for example) or any of those constructs. For a more detailed (yet still general) introduction see the article on Wikipedia:

+ + + +
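
As a rough picture of the three parts of a rule and of what automatic variables do, the following Python sketch models a rule as a small data structure. The `Rule` class and the file names are hypothetical. Only three of GNU Make's automatic variables are covered (`$@` for the target, `$<` for the first prerequisite and `$^` for all prerequisites without duplicates), and Make's other expansion rules (for example `$$`) are ignored.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A simplified model of a Make rule (not Make's real data structure)."""

    target: str
    prerequisites: list
    recipe: list  # shell command lines that may use $@, $< and $^

    def expanded_recipe(self):
        """Return the recipe lines with the automatic variables filled in."""
        first = self.prerequisites[0] if self.prerequisites else ""
        # dict.fromkeys drops duplicates while keeping order, as $^ does.
        unique = " ".join(dict.fromkeys(self.prerequisites))
        values = {"$@": self.target, "$<": first, "$^": unique}
        lines = []
        for line in self.recipe:
            for name, value in values.items():
                line = line.replace(name, value)
            lines.append(line)
        return lines


rule = Rule("all.txt", ["a.txt", "b.txt", "a.txt"], ["cat $^ > $@"])
print(rule.expanded_recipe())  # ['cat a.txt b.txt > all.txt']
```

Because the recipe never spells out a file name, the same recipe text can be reused for any target and prerequisites, which is what makes these variables convenient when managing many files.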

Make is software that is more than 40 years old and still evolving, so many implementations of Make exist. They differ only in the extra features that each adds over the standard definition (which is shared by all of them). Maneage is primarily written in GNU Make, which is the most common, most actively developed, and most advanced implementation. Note that Maneage downloads, builds, internally installs, and uses its own dependencies (including GNU Make), so you don't have to have it installed before you try it out.


How can I learn Make?


The GNU Make book/manual (links below) is arguably the best place to learn Make. It is an excellent book to help get started, and its first few chapters are non-technical so that you can get started easily. It is freely available and always up to date with the current GNU Make release. It also clearly explains which features are specific to GNU Make and which are general to all implementations, so the first few chapters regarding the generalities are useful for all implementations.


The first link below points to the GNU Make manual in various formats, and the second lets you download it as a PDF (which may be easier for a first-time reading).

+ + + +

If you use GNU Make, you also have the whole GNU Make manual on the command-line with the following command (you can come out of the "Info" environment by pressing q).

info make

If you aren't familiar with the Info documentation format, we strongly recommend running $ info info and reading along. In less than an hour, you will become highly proficient in it (it is very simple and has a great manual of its own). Info greatly simplifies your access (without taking your hands off the keyboard!) to many manuals that are installed on your system, allowing you to be much more efficient as you work. If you use the GNU Emacs text editor (or any of its variants), you also have access to all Info manuals while you are writing your projects (again, without taking your hands off the keyboard!).


