From 3fd41afb4bf67d0b2b2aae76b133d97d024ddcbe Mon Sep 17 00:00:00 2001 From: Pedram Ashofteh Ardakani Date: Wed, 29 Apr 2020 15:27:13 +0430 Subject: about: Fix code blocks Maybe we should create a side navigation toolbar. --- about.html | 2508 +++++++++++++++++++++++++++++++----------------------------- 1 file changed, 1278 insertions(+), 1230 deletions(-) diff --git a/about.html b/about.html index fbfc83c..8a4a069 100644 --- a/about.html +++ b/about.html @@ -1,1240 +1,1288 @@ -

Maneage: managing data lineage

- -

Copyright (C) 2018-2020 Mohammad Akhlaghi mohammad@akhlaghi.org\ -Copyright (C) 2020 Raul Infante-Sainz infantesainz@gmail.com\ -See the end of the file for license conditions.

- -

Maneage is a fully working template for doing reproducible research (or -writing a reproducible paper) as defined in the link below. If the link -below is not accessible at the time of reading, please see the appendix at -the end of this file for a portion of its introduction. Some -slides are also available -to help demonstrate the concept implemented here.

- -

http://akhlaghi.org/reproducible-science.html

- -

Maneage is created with the aim of supporting reproducible research by -making it easy to start a project in this framework. As shown below, it is -very easy to customize Maneage for any particular (research) project and -expand it as it starts and evolves. It can be run with no modification (as -described in README.md) as a demonstration and customized for use in any -project as fully described below.

- -

A project designed using Maneage will download and build all the necessary -libraries and programs for working in a closed environment (highly -independent of the host operating system) with fixed versions of the -necessary dependencies. The tarballs for building the local environment are -also collected in a separate -repository. The final -output of the project is a -paper. Notice the -last paragraph of the Acknowledgments where all the necessary software are -mentioned with their versions.

- -

Below, we start with a discussion of why Make was chosen as the high-level -language/framework for project management and how to learn and master Make -easily (and freely). The general architecture and design of the project is -then discussed to help you navigate the files and their contents. This is -followed by a checklist for the easy/fast customization of Maneage to your -exciting research. We continue with some tips and guidelines on how to -manage or extend your project as it grows based on our experiences with it -so far. The main body concludes with a description of possible future -improvements that are planned for Maneage (but not yet implemented). As -discussed above, we end with a short introduction on the necessity of -reproducible science in the appendix.

- -

Please don't forget to share your thoughts, suggestions and -criticisms. Maintaining and designing Maneage is itself a separate project, -so please join us if you are interested. Once it is mature enough, we will -describe it in a paper (written by all contributors) for a formal -introduction to the community.

- -

Why Make?

- -

When batch processing is necessary (no manual intervention, as in a reproducible project), shell scripts are usually the first solution that comes to mind. However, the inherent complexity and non-linearity of progress in a scientific project (where experimentation is key) make it hard to manage the script(s) as the project evolves. For example, a script will start from the top every time it is run. So if you have already completed 90% of a research project and want to run the remaining 10% that you have newly added, you have to run the whole script from the start again. Only then will you see the effects of the last new steps (to find possible errors, better solutions, etc.).

- -

It is possible to manually ignore/comment parts of a script to only do a special part. However, such checks/comments will only add to the complexity of the script and will discourage you from playing with or changing an already completed part of the project when an idea suddenly comes up. It is also prone to very serious bugs in the end (when trying to reproduce from scratch). Such bugs are very hard to notice during the work and frustrating to find in the end.

- -

The Make paradigm, on the other hand, starts from the end: the final -target. It builds a dependency tree internally, and finds where it should -start each time the project is run. Therefore, in the scenario above, a -researcher that has just added the final 10% of steps of her research to -her Makefile, will only have to run those extra steps. With Make, it is -also trivial to change the processing of any intermediate (already written) -rule (or step) in the middle of an already written analysis: the next -time Make is run, only rules that are affected by the changes/additions -will be re-run, not the whole analysis/project.
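
As a minimal sketch of this behavior (not part of Maneage itself), the commands below write a two-rule Makefile into a temporary directory and run Make twice. The file names `input.txt`, `step1.txt` and `result.txt` are hypothetical and chosen only for illustration, and a `make` program is assumed to be available on the host.

```shell
# Hypothetical demo: all file names here are illustrative only.
set -e
demo=$(mktemp -d)
cd "$demo"

# Two chained rules: result.txt depends on step1.txt, which depends on input.txt.
printf 'result.txt: step1.txt\n\ttr a-z A-Z < step1.txt > result.txt\n\n' >  Makefile
printf 'step1.txt: input.txt\n\tsort input.txt > step1.txt\n'             >> Makefile

printf 'b\na\n' > input.txt
make                                  # First run: both recipes are executed.
make -q && echo "nothing to re-run"   # '-q' only asks whether the target is up to date.
cat result.txt
```

On the second call nothing is rebuilt, because no prerequisite is newer than its target. Only after `input.txt` or one of the rules changes would the affected steps run again.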

- -

This greatly speeds up the processing (enabling creative changes), while -keeping all the dependencies clearly documented (as part of the Make -language), and most importantly, enabling full reproducibility from scratch -with no changes in the project code that was working during the -research. This will allow robust results and let the scientists get to what -they do best: experiment and be critical to the methods/analysis without -having to waste energy and time on technical problems that come up as a -result of that experimentation in scripts.

- -

Since the dependencies are clearly demarcated in Make, it can identify -independent steps and run them in parallel. This further speeds up the -processing. Make was designed for this purpose. It is how huge projects -like all Unix-like operating systems (including GNU/Linux or Mac OS -operating systems) and their core components are built. Therefore, Make is -a highly mature paradigm/system with robust and highly efficient -implementations in various operating systems perfectly suited for a complex -non-linear research project.

- -

Make is a small language with the aim of defining rules containing -targets, prerequisites and recipes. It comes with some nice features -like functions or automatic-variables to greatly facilitate the management -of text (filenames for example) or any of those constructs. For a more -detailed (yet still general) introduction see the article on Wikipedia:

- -

https://en.wikipedia.org/wiki/Make_(software)

- -

Make is more than 40 years old and still evolving, so many implementations of it exist. They differ only in the extra features they add over the standard definition (which all of them share). Maneage is primarily written in GNU Make, the most common, most actively developed, and most advanced implementation. Maneage downloads, builds, internally installs, and uses its own dependencies (including GNU Make), so you don't need to have it installed before you try it out.

- -

How can I learn Make?

- -

The GNU Make book/manual (links below) is arguably the best place to learn -Make. It is an excellent and non-technical book to help get started (it is -only non-technical in its first few chapters to get you started easily). It -is freely available and always up to date with the current GNU Make -release. It also clearly explains which features are specific to GNU Make -and which are general in all implementations. So the first few chapters -regarding the generalities are useful for all implementations.

- -

The first link below points to the GNU Make manual in various formats and -in the second, you can download it in PDF (which may be easier for a first -time reading).

- -

https://www.gnu.org/software/make/manual/

- -

https://www.gnu.org/software/make/manual/make.pdf

- -

If you use GNU Make, you also have the whole GNU Make manual on the -command-line with the following command (you can come out of the "Info" -environment by pressing q).

- -

```shell
$ info make
```

- -

If you aren't familiar with the Info documentation format, we strongly -recommend running $ info info and reading along. In less than an hour, -you will become highly proficient in it (it is very simple and has a great -manual for itself). Info greatly simplifies your access (without taking -your hands off the keyboard!) to many manuals that are installed on your -system, allowing you to be much more efficient as you work. If you use the -GNU Emacs text editor (or any of its variants), you also have access to all -Info manuals while you are writing your projects (again, without taking -your hands off the keyboard!).

- -

Published works using Maneage

- -

The list below shows some of the works that have already been published -with (earlier versions of) Maneage. Previously it was simply called -"Reproducible paper template". Note that Maneage is evolving, so some -details may be different in them. The more recent ones can be used as a -good working example.

- - - -

Citation

- -

A paper to fully describe Maneage has been submitted. Until then, if you -used it in your work, please cite the paper that implemented its first -version: Akhlaghi & Ichikawa -(2015, ApJS, 220, 1).

- -

Also, when your paper is published, don't forget to add a notice in your -own paper (in coordination with the publishing editor) that the paper is -fully reproducible and possibly add a sentence or paragraph in the end of -the paper shortly describing the concept. This will help spread the word -and encourage other scientists to also manage and publish their projects in -a reproducible manner.

- -

Project architecture

- -

In order to customize Maneage to your research, it is important to first understand its architecture so you can navigate your way in the directories and understand how to implement your research project within its framework: where to add new files and which existing files to modify for what purpose. But if this is the first time you are using Maneage, before reading this theoretical discussion, please run Maneage once from scratch without any changes (described in README.md). You will see how it works (note that the configure step builds all necessary software, so it can take a long time, but you can continue reading while it's working).

- -

The project has two top-level directories: reproduce and -tex. reproduce hosts all the software building and analysis -steps. tex contains all the final paper's components to be compiled into -a PDF using LaTeX.

- -

The reproduce directory has two sub-directories: software and analysis. As the names suggest, the former contains all the instructions to download, build and install (independent of the host operating system) the necessary software (these are called by the ./project configure command). The latter contains instructions on how to use that software to do your project's analysis.

- -

After it finishes, ./project configure will create the following symbolic -links in the project's top source directory: .build which points to the -top build directory and .local for easy access to the custom built -software installation directory. With these you can easily access the build -directory and project-specific software from your top source directory. For -example if you run .local/bin/ls you will be using the ls of Maneage, -which is probably different from your system's ls (run them both with ---version to check).

- -

Once the project is configured for your system, ./project make will do the basic preparations and run the project's analysis with the custom version of software. The project script is just a wrapper, and with the make argument, it will call top-prepare.mk first and then top-make.mk (both are in the reproduce/analysis/make directory).

- -

In terms of organization, top-prepare.mk and top-make.mk have a nearly identical design, with only minor differences. So, let's continue the description of Maneage's architecture with top-make.mk. Once you understand that, you'll clearly understand top-prepare.mk also. These very high-level files are relatively short and heavily commented, so hopefully the descriptions in each comment will be enough to understand the general details. As you read this section, please also look at the contents of the mentioned files and directories to fully understand what is going on.

- -

Before starting to look into top-make.mk, it is important to recall that Make defines dependencies by files. Therefore, the input/prerequisite and output of every step/rule must be a file. Also recall that Make will use the modification date of the prerequisite(s) and target files to see if the target must be re-built or not. Therefore during the processing, many intermediate files will be created (see the tips section below on a good strategy to deal with large/huge files).

- -

To keep the source and (intermediate) built files separate, the user must -define a top-level build directory variable (or $(BDIR)) to host all the -intermediate files (you defined it during ./project configure). This -directory doesn't need to be version controlled or even synchronized, or -backed-up in other servers: its contents are all products, and can be -easily re-created any time. As you define targets for your new rules, it is -thus important to place them all under sub-directories of $(BDIR). As -mentioned above, you always have fast access to this "build"-directory with -the .build symbolic link. Also, beware to never make any manual change -in the files of the build-directory, just delete them (so they are -re-built).
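
The separation of source and build files can be sketched with the commands below. This is only an illustration under stated assumptions: the `build-demo` directory stands in for `$(BDIR)`, the file names are hypothetical, and the order-only prerequisite (`| $(BDIR)`) is a GNU Make feature used here to create the directory; it is not claimed to be how Maneage's own Makefiles do it.

```shell
# Hypothetical demo: 'build-demo' plays the role of $(BDIR).
set -e
src=$(mktemp -d)
cd "$src"
printf 'x\n' > input.txt

# The target lives under $(BDIR); the directory itself is an order-only prerequisite.
printf 'BDIR = build-demo\n$(BDIR)/result.txt: input.txt | $(BDIR)\n\tcp input.txt $@\n' >  Makefile
printf '$(BDIR):\n\tmkdir -p $@\n'                                                       >> Makefile

make                        # Creates build-demo/ and build-demo/result.txt.
rm build-demo/result.txt    # Never edit a built file by hand: delete it instead...
make                        # ...and Make re-creates it from the sources.
```

Since everything under the build directory is a product of the rules, deleting any part of it (or all of it) loses nothing that cannot be re-created.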

- -

In this architecture, we have two types of Makefiles that are loaded into -the top Makefile: configuration-Makefiles (only independent -variables/configurations) and workhorse-Makefiles (Makefiles that -actually contain analysis/processing rules).

- -

The configuration-Makefiles are those that satisfy these two wildcards: -reproduce/software/config/*.conf (for building the necessary software -when you run ./project configure) and reproduce/analysis/config/*.conf -(for the high-level analysis, when you run ./project make). These -Makefiles don't actually have any rules, they just have values for various -free parameters throughout the configuration or analysis. Open a few of -them to see for yourself. These Makefiles must only contain raw Make -variables (project configurations). By "raw" we mean that the Make -variables in these files must not depend on variables in any other -configuration-Makefile. This is because we don't want to assume any order -in reading them. It is also very important to not define any rule, or -other Make construct, in these configuration-Makefiles.

- -

Following this rule-of-thumb enables you to set these configuration-Makefiles as a prerequisite to any target that depends on their variable values. Therefore, if you change any of their values, all targets that depend on those values will be re-built. This is very convenient as your project scales up and gets more complex.
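
The following sketch shows the idea in isolation, assuming a `make` program is available. The names `params.conf`, `demo-threshold` and `out.txt` are hypothetical and do not exist in Maneage; they only mimic a configuration-Makefile that is used as a prerequisite.

```shell
# Hypothetical demo: a configuration file as a prerequisite of the target that uses it.
set -e
cfg=$(mktemp -d)
cd "$cfg"

# Configuration-Makefile: only a raw variable, no rules.
printf 'demo-threshold = 10\n' > params.conf

# Workhorse rule: out.txt depends on the file that defines the value it uses.
printf 'include params.conf\nout.txt: params.conf\n\techo $(demo-threshold) > out.txt\n' > Makefile

make                                          # Builds out.txt with the value 10.
sleep 1                                       # Ensure the next edit has a strictly newer date.
printf 'demo-threshold = 20\n' > params.conf
make                                          # params.conf is newer, so out.txt is re-built.
cat out.txt
```

Without `params.conf` in the prerequisite list, the second `make` would report that `out.txt` is up to date and the stale value would silently remain.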

- -

The workhorse-Makefiles are those satisfying these wildcards: reproduce/software/make/*.mk and reproduce/analysis/make/*.mk. They contain the details of the processing steps (Makefiles containing rules). Therefore, in this phase order is important, because the prerequisites of most rules will be the targets of other rules that will be defined prior to them (not a fixed name like paper.pdf). The lower-level rules must be imported into Make before the higher-level ones.

- -

All processing steps are assumed to ultimately (usually after many rules) -end up in some number, image, figure, or table that will be included in the -paper. The writing of these results into the final report/paper is managed -through separate LaTeX files that only contain macros (a name given to a -number/string to be used in the LaTeX source, which will be replaced when -compiling it to the final PDF). So the last target in a workhorse-Makefile -is a .tex file (with the same base-name as the Makefile, but in -$(BDIR)/tex/macros). As a result, if the targets in a workhorse-Makefile -aren't directly a prerequisite of other workhorse-Makefile targets, they -can be a prerequisite of that intermediate LaTeX macro file and thus be -called when necessary. Otherwise, they will be ignored by Make.
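
What the recipe of such a final target might do can be sketched as below. The macro name `\demomean`, the input numbers and the temporary directory (a stand-in for `$(BDIR)/tex/macros`) are all assumptions made for this illustration; they are not taken from Maneage.

```shell
# Hypothetical demo: write an analysis result as a LaTeX macro.
set -e
texdir=$(mktemp -d)                  # Stand-in for $(BDIR)/tex/macros.
printf '3\n5\n10\n' > "$texdir/values.txt"

# "Analysis": the mean of the three numbers, with one decimal.
mean=$(awk '{ s += $1 } END { printf "%.1f", s / NR }' "$texdir/values.txt")

# The macro file only holds a name for the number, to be used in the paper's text.
printf '\\newcommand{\\demomean}{%s}\n' "$mean" > "$texdir/demo.tex"
cat "$texdir/demo.tex"
```

The paper's LaTeX source would then use `\demomean` instead of a hand-typed number, so the text follows the analysis whenever the result changes.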

- -

Maneage also has a mode to share the build directory between several -users of a Unix group (when working on large computer clusters). In this -scenario, each user can have their own cloned project source, but share the -large built files between each other. To do this, it is necessary for all -built files to give full permission to group members while not allowing any -other users access to the contents. Therefore the ./project configure and -./project make steps must be called with special conditions which are -managed in the --group option.

- -

Let's see how this design is implemented. Please open and inspect top-make.mk as we go along here. The first step (un-commented line) is to import the local configuration (your answers to the questions of ./project configure). They are defined in the configuration-Makefile reproduce/software/config/LOCAL.conf which was also built by ./project configure (based on the LOCAL.conf.in template of the same directory).

- -

The next non-commented set of the top Makefile defines the ultimate -target of the whole project (paper.pdf). But to avoid mistakes, a sanity -check is necessary to see if Make is being run with the same group settings -as the configure script (for example when the project is configured for -group access using the ./for-group script, but Make isn't). Therefore we -use a Make conditional to define the all target based on the group -permissions.

- -

Having defined the top/ultimate target, our next step is to include all the -other necessary Makefiles. However, order matters in the importing of -workhorse-Makefiles and each must also have a TeX macro file with the same -base name (without a suffix). Therefore, the next step in the top-level -Makefile is to define the makesrc variable to keep the base names -(without a .mk suffix) of the workhorse-Makefiles that must be imported, -in the proper order.

- -

Finally, we import all the necessary remaining Makefiles: 1) All the analysis configuration-Makefiles with a wildcard. 2) The software configuration-Makefile that contains the software versions (just in case it's necessary). 3) All workhorse-Makefiles in the proper order using a Make foreach loop.
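
The ordered import can be sketched as follows, assuming a `make` program is available. The Makefile names `first.mk` and `second.mk` and the variable `first-out` are hypothetical; only the `makesrc` name and the use of `foreach` follow the description above. The prerequisite of the second rule is a variable defined in the first file, and Make expands prerequisites as it reads them, which is why the order of inclusion matters.

```shell
# Hypothetical demo: include workhorse-Makefiles in a fixed order with 'foreach'.
set -e
inc=$(mktemp -d)
cd "$inc"

# Lower-level Makefile: defines a variable holding its target's name.
printf 'first-out = first.txt\n$(first-out):\n\techo 1 > $@\n'    > first.mk

# Higher-level Makefile: its prerequisite is the variable defined above.
printf 'second.txt: $(first-out)\n\tcat $(first-out) > $@\n'      > second.mk

# Top Makefile: ultimate target first, then the ordered list of base names.
printf 'all: second.txt\nmakesrc = first second\n'                >  Makefile
printf 'include $(foreach s,$(makesrc),$(s).mk)\n'                >> Makefile

make
cat second.txt
```

If the two names in `makesrc` were swapped, `$(first-out)` would still be empty when `second.mk` is read, and `second.txt` would end up with no prerequisite.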

- -

In short, to keep things modular, readable and manageable, follow these recommendations: 1) Set clear-to-understand names for the configuration-Makefiles and workhorse-Makefiles, 2) Only import other Makefiles from the top Makefile. These will help you know/remember which step comes before or after another. Projects will scale up very fast. Thus if you don't start and continue with a clean and robust convention like this, in the end it will become very dirty and hard to manage/understand (even for yourself). As a general rule of thumb, break your rules into as many logically-similar but independent steps as possible.

- -

The reproduce/analysis/make/paper.mk Makefile must be the final Makefile -that is included. This workhorse Makefile ends with the rule to build -paper.pdf (final target of the whole project). If you look in it, you -will notice that this Makefile starts with a rule to create -$(mtexdir)/project.tex (mtexdir is just a shorthand name for -$(BDIR)/tex/macros mentioned before). As you see, the only dependency of -$(mtexdir)/project.tex is $(mtexdir)/verify.tex (which is the last -analysis step: it verifies all the generated results). Therefore, -$(mtexdir)/project.tex is the connection between the -processing/analysis steps of the project, and the steps to build the final -PDF.

- -

During the research, it often happens that you want to test a step that is not a prerequisite of any higher-level operation. In such cases, you can (temporarily) define that processing as a rule in the most relevant workhorse-Makefile and set its target as a prerequisite of its TeX macro. If your test gives a promising result and you want to include it in your research, set it as a prerequisite of other rules and remove it from the list of prerequisites of the TeX macro file. In fact, this is how a project is designed to grow in this framework.

- -

File modification dates (meta data)

- -

While Git does an excellent job at keeping a history of the contents of files, it makes no effort in keeping the file meta data, and in particular the dates of files. Therefore when you check out a different branch, files that are re-written by Git will have a newer date than the other project files. However, file dates are important in the current design of Maneage: Make checks the dates of the prerequisite files and target files to see if the target should be re-built.

- -

To fix this problem, for Maneage we use a forked version of Metastore. Metastore uses a binary database file (which is called .file-metadata) to keep the modification dates of all the files under version control. This file is also under version control, but is hidden (because it shouldn't be modified by hand). During the project's configuration, Maneage installs two Git hooks to run Metastore 1) before making a commit, to update its database with the file dates in a branch, and 2) after doing a checkout, to set the file dates back to what they were.

- -

In practice, Metastore should work almost fully invisibly within your project. The only place you might notice its presence is that you'll see .file-metadata in the list of modified/staged files (commonly after merging your branches). Since it's a binary file, Git also won't show you the changed contents. In a merge, you can simply accept any changes with git add -u. But if Git is telling you that it has changed without a merge (for example if you started a commit, but canceled it in the middle), you can just do git checkout .file-metadata and set it back to its original state.

- -

Summary

- -

Based on the explanation above, some major design points you should have in -mind are listed below.

- - - -

Customization checklist

- -

Take the following steps to fully customize Maneage for your research project. After finishing the list, be sure to run ./project configure and ./project make to see if everything works correctly. If you notice anything missing or any incorrect part (probably a change that has not been explained here), please let us know so we can correct it.

- -

As described above, the concept of reproducibility (during a project) -heavily relies on version -control. Currently Maneage -uses Git as its main version control system. If you are not already -familiar with Git, please read the first three chapters of the ProGit -book which provides a wonderful practical -understanding of the basics. You can read later chapters as you get more -advanced in later stages of your work.

- -

First custom commit

- -
    -
  1. Get this repository and its history (if you don't already have it): Arguably the easiest way to start is to clone Maneage and prepare for your customizations as shown below. After cloning, you first rename the default origin remote server to specify that this is Maneage's remote server. This will allow you to use the conventional origin name for your own project as shown in the next steps. Second, you will create and go into the conventional master branch to start committing in your project later.

    - -

    ```shell
    $ git clone https://git.maneage.org/project.git    # Clone/copy the project and its history.
    $ mv project my-project                            # Change the name to your project's name.
    $ cd my-project                                    # Go into the cloned directory.
    $ git remote rename origin origin-maneage          # Rename current/only remote to "origin-maneage".
    $ git checkout -b master                           # Create and enter your own "master" branch.
    $ pwd                                              # Just to confirm where you are.
    ```

  2. -
  3. Prepare to build project: The ./project configure command of the next step will build the different software packages within the "build" directory (that you will specify). Nothing else on your system will be touched. However, since it takes a long time, it is useful to see what is being built at every instant (it's almost impossible to tell from the torrent of commands that are produced!). So open another terminal on your desktop and navigate to the same project directory that you cloned (output of last command above). Then run the following command. Once every second, this command will just print the date (possibly followed by a non-existent directory notice). But as soon as the next step starts building software, you'll see the names of software get printed as they are being built. Once any software is installed in the project build directory, its name will be removed from this output. Again, don't worry, nothing will be installed outside the build directory.

    - -

    ```shell
    # On another terminal (go to top project source directory, last command above)
    $ ./project --check-config
    ```

  4. -
  5. Test Maneage: Before making any changes, it is important to test it and see if everything works properly with the commands below. If there is any problem in the ./project configure or ./project make steps, please contact us to fix the problem before continuing. Since the building of dependencies in configuration can take a long time, you can take the next few steps (editing the files) while it's working (they don't affect the configuration). After ./project make is finished, open paper.pdf. If it looks fine, you are ready to start customizing Maneage for your project. But before that, clean all the extra Maneage outputs with ./project make clean.

    - -

    ```shell
    $ ./project configure    # Build the project's software environment (can take an hour or so).
    $ ./project make         # Do the processing and build paper (just a simple demo).

    # Open 'paper.pdf' and see if everything is ok.
    ```

  6. -
  7. Setup the remote: You can use any hosting - facility - that supports Git to keep an online copy of your project's version - controlled history. We recommend GitLab because - it is more ethical (although not - perfect), - and later you can also host GitLab on your own server. Anyway, create - an account in your favorite hosting facility (if you don't already - have one), and define a new project there. Please make sure the newly - created project is empty (some services ask to include a README in - a new project which is bad in this scenario, and will not allow you to - push to it). It will give you a URL (usually starting with git@ and - ending in .git), put this URL in place of XXXXXXXXXX in the first - command below. With the second command, "push" your master branch to - your origin remote, and (with the --set-upstream option) set them - to track/follow each other. However, the maneage branch is currently - tracking/following your origin-maneage remote (automatically set - when you cloned Maneage). So when pushing the maneage branch to your - origin remote, you shouldn't use --set-upstream. With the last - command, you can actually check this (which local and remote branches - are tracking each other).

    - -

    ```shell
    git remote add origin XXXXXXXXXX        # Newly created repo is now called 'origin'.
    git push --set-upstream origin master   # Push 'master' branch to 'origin' (with tracking).
    git push origin maneage                 # Push 'maneage' branch to 'origin' (no tracking).
    ```

  8. -
  9. Title, short description and author: The title and basic information of your project's output PDF paper should be added in paper.tex. You should see the relevant place in the preamble (prior to \begin{document}). After you are done, run the ./project make command again to see your changes in the final PDF, and make sure that your changes don't cause a crash in LaTeX. Of course, if you use a different LaTeX package/style for managing the title and authors (in particular a specific journal's style), please feel free to use your own methods after finishing this checklist and doing your first commit.

  10. -
  11. Delete dummy parts: Maneage contains some parts that are only for the initial/test run, mainly as a demonstration of important steps, which you can use as a reference in your own project. But they are not for any real analysis, so you should remove these parts as described below:

    - -
      -
    • paper.tex: 1) Delete the text of the abstract (from -\includeabstract{ to \vspace{0.25cm}) and write your own (a -single sentence can be enough now, you can complete it later). 2) -Add some keywords under it in the keywords part. 3) Delete -everything between %% Start of main body. and %% End of main -body.. 4) Remove the notice in the "Acknowledgments" section (in -\new{}) and Acknowledge your funding sources (this can also be -done later). Just don't delete the existing acknowledgment -statement: Maneage is possible thanks to funding from several -grants. Since Maneage is being used in your work, it is necessary to -acknowledge them in your work also.

    • -
    • reproduce/analysis/make/top-make.mk: Delete the delete-me line -in the makesrc definition. Just make sure there is no empty line -between the download \ and verify \ lines (they should be -directly under each other).

    • -
    • reproduce/analysis/make/verify.mk: In the final recipe, under the commented line Verify TeX macros, remove the full line that contains delete-me, and set the value of s in the line for download to XXXXX (any temporary string, you'll fix it at the end of your project, when it's complete).

    • -
    • Delete all delete-me* files in the following directories:

      - -

      ```shell
      $ rm tex/src/delete-me*
      $ rm reproduce/analysis/make/delete-me*
      $ rm reproduce/analysis/config/delete-me*
      ```

    • -
    • Disable verification of outputs by removing the yes from -reproduce/analysis/config/verify-outputs.conf. Later, when you are -ready to submit your paper, or publish the dataset, activate -verification and make the proper corrections in this file (described -under the "Other basic customizations" section below). This is a -critical step and only takes a few minutes when your project is -finished. So DON'T FORGET to activate it in the end.

    • -
    • Re-make the project (after a cleaning) to check that you haven't introduced any errors.

      - -

      ```shell
      $ ./project make clean
      $ ./project make
      ```

    • -
  12. -
  13. Don't merge some files in future updates: As described below, you can later update your infrastructure (for example to fix bugs) by merging your master branch with maneage. For files that you have created in your own branch, there will be no problem. However if you modify an existing Maneage file for your project, next time it's updated on maneage you'll have an annoying conflict. The commands below show how to fix this future problem. With them, you can configure Git to ignore the changes in maneage for some of the files you have already edited and deleted above (and will edit below). Note that only the first echo command has a > (to write over the file), the rest are >> (to append to it). If you want to avoid any other set of files to be imported from Maneage into your project's branch, you can follow a similar strategy. We recommend only doing it when you encounter the same conflict in more than one merge and there is no other change in that file. Also, don't add core Maneage Makefiles, otherwise Maneage can break on the next run.

    - -

    ```shell
    $ echo "paper.tex merge=ours" > .gitattributes
    $ echo "tex/src/delete-me.mk merge=ours" >> .gitattributes
    $ echo "tex/src/delete-me-demo.mk merge=ours" >> .gitattributes
    $ echo "reproduce/analysis/make/delete-me.mk merge=ours" >> .gitattributes
    $ echo "reproduce/software/config/TARGETS.conf merge=ours" >> .gitattributes
    $ echo "reproduce/analysis/config/delete-me-num.conf merge=ours" >> .gitattributes
    $ git add .gitattributes
    ```

  14. -
  15. Copyright and License notice: It is necessary that all the - "copyright-able" files in your project (those larger than 10 lines) - have a copyright and license notice. Please take a moment to look at - several existing files to see a few examples. The copyright notice is - usually close to the start of the file; it is the line starting with - Copyright (C) and containing a year and the author's name (like the - examples below). The License notice is a short description of the - copyright license, usually one or two paragraphs with a URL to the - full license. Don't forget to add these two notices to any new - file you add to your project (you can just copy-and-paste). When you - modify an existing Maneage file (which already has the notices), just - add a copyright notice in your name under the existing one(s), like - the line with capital letters below. To start with, add this line with - your name and email address to paper.tex, - tex/src/preamble-header.tex, reproduce/analysis/make/top-make.mk, - and generally, all the files you modified in the previous step.

    - -

    -
-    Copyright (C) 2018-2020 Existing Name <existing@email.address>
-    Copyright (C) 2020 YOUR NAME <YOUR@EMAIL.ADDRESS>
-
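To find files that still lack a notice, a small shell function can list every file longer than 10 lines that does not contain the string `Copyright (C)`. This is only a sketch: the function name `missing_notice` and the demo files are hypothetical, and the search is a plain text match, so it says nothing about whether the license notice is also present.

```shell
# Hypothetical helper: print files under directory $1 that have more
# than 10 lines and no line containing "Copyright (C)".
missing_notice () {
    find "$1" -type f | while IFS= read -r f; do
        n=$(($(wc -l < "$f")))      # Arithmetic strips wc's padding.
        if [ "$n" -gt 10 ] && ! grep -qF "Copyright (C)" "$f"; then
            echo "$f"
        fi
    done
}

# Demonstration on three throwaway files.
demo=$(mktemp -d)
i=0
while [ "$i" -lt 12 ]; do echo "line $i"; i=$((i+1)); done > "$demo/a.txt"
{ echo "# Copyright (C) 2020 YOUR NAME <YOUR@EMAIL.ADDRESS>"
  cat "$demo/a.txt"; } > "$demo/b.txt"
echo "short" > "$demo/c.txt"

out=$(missing_notice "$demo")       # Only a.txt is long and un-noticed.
echo "$out"
```

In a project you would point it at a source directory (for example `missing_notice reproduce`) rather than at the top directory, to avoid scanning `.git` and build outputs.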

  16. -
  17. Configure Git for the first time: If this is the first time you are - running Git on this system, then you have to configure it with some - basic information in order to have essential information in the commit - messages (ignore this step if you have already done it). Git will - include your name and e-mail address information in each commit. You - can also specify your favorite text editor for making the commit - (emacs, vim, nano, etc.).

    - -

    shell
-    $ git config --global user.name "YourName YourSurname"
-    $ git config --global user.email your-email@example.com
-    $ git config --global core.editor nano
-

  18. -
  19. Your first commit: You have already made some small and basic - changes in the steps above and you are in your project's master - branch. So, you can officially make your first commit in your - project's history and push it. But before that, you need to make sure - that there are no problems in the project. It is a good habit to - always re-build the system before a commit to be sure it works as - expected.

    - -

    shell
-    $ git status                 # See which files you have changed.
-    $ git diff                   # Check the lines you have added/changed.
-    $ ./project make             # Make sure everything builds successfully.
-    $ git add -u                 # Put all tracked changes in staging area.
-    $ git status                 # Make sure everything is fine.
-    $ git diff --cached          # Confirm all the changes that will be committed.
-    $ git commit                 # Your first commit: put a good description!
-    $ git push                   # Push your commit to your remote.
-

  20. -
  21. Start your exciting research: You are now ready to add flesh and - blood to this raw skeleton by further modifying and adding your - exciting research steps. You can use the "published works" section in - the introduction (above) as some fully working models to learn - from. Also, don't hesitate to contact us if you have any - questions.

  22. -
- -

Other basic customizations

- - - -

Tips for designing your project

- -

The following is a list of design points, tips, or recommendations that -have been learned after some experience with this type of project -management. Please don't hesitate to share any experience you gain after -using it with us. In this way, we can add it here (giving full credit) -for the benefit of others.

- -

+ +

Future improvements

+ +

This is an evolving project and it will become more robust as time + goes on. Some of the most prominent improvements we plan to implement in the + future are listed below; please join us if you are interested.

+ +

Package management

+ +

It is important to have control of the environment of the project. Maneage + currently builds the higher-level programs (for example GNU Bash, GNU Make, + GNU AWK and domain-specific software) it needs, then sets PATH so the + analysis is done only with the project's built software. But currently the + configuration of each program is in the Makefile rules that build it. This + is not good because a change in the build configuration does not + automatically cause a re-build. Also, each separate project on a system + needs to have its own built tools (that can waste a lot of space).

+ +

A good solution is based on the Nix package manager: a separate file is present for + each software package, containing all the necessary info to build it (including its + URL, its tarball MD5 hash, dependencies, configuration parameters, build + steps, etc.). Using this file, a script can automatically generate the + Make rules to download, build and install the program and its dependencies + (along with the dependencies of those dependencies, and so on).

+ +

All the software is installed in a "store". The name of each installed file (library + or executable) is prefixed by a hash of this configuration (and the OS + architecture), followed by the standard program name. For example (from the Nix + webpage):

+ +

 /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1/
-

- -

The important thing is that the "store" is not in the project's search -path. After the complete installation of the software, symbolic links are -made to populate each project's program and library search paths without a -hash. This hash will be unique to that particular software and its -particular configuration. So simply by searching for this hash in the -installed directory, we can find the installed files of that software to -generate the links.

- -

This scenario has several advantages: 1) a change in a software's build -configuration triggers a rebuild. 2) a single "store" can be used in many -projects, thus saving space and configuration time for new projects (that -commonly have large overlaps in lower-level programs).

- -

Appendix: Necessity of exact reproduction in scientific research

- -

In case the link above is -not accessible at the time of reading, here is a copy of the introduction -of that link, describing the necessity for a reproducible project like this -(copied on February 7th, 2018):

- -

The most important element of a "scientific" statement/result is the fact -that others should be able to falsify it. The tsunami of data that has -engulfed astronomers in the last two decades, combined with faster -processors and faster internet connections, has made it much easier to -obtain a result. However, these factors have also increased the complexity -of a scientific analysis, such that it is no longer possible to describe -all the steps of an analysis in the published paper. Citing this -difficulty, many authors settle for describing the generalities of their -analysis in their papers.

- -

However, it is impossible to falsify (or even study) a result if you can't -exactly reproduce it. The complexity of modern science makes it vitally -important to exactly reproduce the final result, because even a small -deviation can be due to many different parts of an analysis. Nature is -already a black box which we are trying so hard to comprehend. Not letting -other scientists see the exact steps taken to reach a result, or not -allowing them to modify it (do experiments on it) is a self-imposed black -box, which only exacerbates our ignorance.

- -

Other scientists should be able to reproduce, check and experiment on the -results of anything that is to carry the "scientific" label. Any result -that is not reproducible (due to incomplete information by the author) is -not scientific: the readers have to have faith in the subjective experience -of the authors in the very important choice of configuration values and -order of operations: this is contrary to the scientific spirit.

- -

Copyright information

- -

This file is part of Maneage's core: https://git.maneage.org/project.git

- -

Maneage is free software: you can redistribute it and/or modify it under -the terms of the GNU General Public License as published by the Free -Software Foundation, either version 3 of the License, or (at your option) -any later version.

- -

Maneage is distributed in the hope that it will be useful, but WITHOUT ANY -WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS -FOR A PARTICULAR PURPOSE. See the GNU General Public License for more -details.

- -

You should have received a copy of the GNU General Public License along -with Maneage. If not, see https://www.gnu.org/licenses/.

+
+ +

The important thing is that the "store" is not in the project's search + path. After the complete installation of the software, symbolic links are + made to populate each project's program and library search paths without a + hash. This hash will be unique to that particular software and its + particular configuration. So simply by searching for this hash in the + installed directory, we can find the installed files of that software to + generate the links.

+ +

This scenario has several advantages: 1) a change in a software's build + configuration triggers a rebuild. 2) a single "store" can be used in many + projects, thus saving space and configuration time for new projects (that + commonly have large overlaps in lower-level programs).
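The store-and-links idea can be sketched in a few lines of shell. Everything here is an assumption made for illustration: the directory layout, the `hello` program, its configure option, and the use of the POSIX `cksum` checksum as a stand-in for a proper cryptographic hash. It is not how Maneage or Nix actually compute their hashes.

```shell
# Throwaway "store" and project directories (illustrative layout).
work=$(mktemp -d)
store="$work/store"
project="$work/project"
mkdir -p "$store" "$project/bin"

# Hypothetical software and build configuration.
name=hello
version=1.0
config="--disable-nls"
arch=$(uname -m)

# Hash of the configuration and architecture (cksum is only a stand-in).
hash=$(printf '%s\n' "$name $version $config $arch" | cksum | cut -d' ' -f1)

# "Install" into a hash-prefixed directory inside the store.
inst="$store/$hash-$name-$version"
mkdir -p "$inst/bin"
printf '#!/bin/sh\necho hello\n' > "$inst/bin/$name"
chmod +x "$inst/bin/$name"

# Populate the project's search path with links that carry no hash,
# found simply by searching the store for the hash.
for f in "$store/$hash-"*/bin/*; do
    ln -s "$f" "$project/bin/$(basename "$f")"
done

# A different configuration gives a different prefix (barring a
# checksum collision), so it would be built separately.
hash2=$(printf '%s\n' "$name $version --enable-nls $arch" | cksum | cut -d' ' -f1)
echo "$hash $hash2"
ls -l "$project/bin"
```

Only `$project/bin` would be added to `PATH`; the store itself stays outside the search path, so a second project can link to the same installed directory instead of rebuilding it.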

+ +

Appendix: Necessity of exact reproduction in scientific research

+ +

In case the link above is + not accessible at the time of reading, here is a copy of the introduction + of that link, describing the necessity for a reproducible project like this + (copied on February 7th, 2018):

+ +

The most important element of a "scientific" statement/result is the fact + that others should be able to falsify it. The tsunami of data that has + engulfed astronomers in the last two decades, combined with faster + processors and faster internet connections, has made it much easier to + obtain a result. However, these factors have also increased the complexity + of a scientific analysis, such that it is no longer possible to describe + all the steps of an analysis in the published paper. Citing this + difficulty, many authors settle for describing the generalities of their + analysis in their papers.

+ +

However, it is impossible to falsify (or even study) a result if you can't + exactly reproduce it. The complexity of modern science makes it vitally + important to exactly reproduce the final result, because even a small + deviation can be due to many different parts of an analysis. Nature is + already a black box which we are trying so hard to comprehend. Not letting + other scientists see the exact steps taken to reach a result, or not + allowing them to modify it (do experiments on it) is a self-imposed black + box, which only exacerbates our ignorance.

+ +

Other scientists should be able to reproduce, check and experiment on the + results of anything that is to carry the "scientific" label. Any result + that is not reproducible (due to incomplete information by the author) is + not scientific: the readers have to have faith in the subjective experience + of the authors in the very important choice of configuration values and + order of operations: this is contrary to the scientific spirit.

+ +

Copyright information

+ +

This file is part of Maneage's core: https://git.maneage.org/project.git

+ +

Maneage is free software: you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation, either version 3 of the License, or (at your option) + any later version.

+ +

Maneage is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for more + details.

+ +

You should have received a copy of the GNU General Public License along + with Maneage. If not, see https://www.gnu.org/licenses/.

+ + -- cgit v1.2.1