author: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-05-01 22:36:45 +0100
committer: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-05-01 22:36:45 +0100
commit: df878cc2dc80ea09e09617949869ab7044d3c3c3
tree: 4ffb0d73d6e6b3403d01923b94b27628f566398e /README-hacking.md
parent: a6f5fcd6177b8f6319ffccddda1627f8b1dad415
parent: 82666074e0c921e53c21b9e2c444e9a2d407d092
Imported recent changes in Maneage, minor conflicts fixed
A few small conflicts showed up here and there. They are fixed with this merge.
Diffstat (limited to 'README-hacking.md')
-rw-r--r-- README-hacking.md | 663
1 files changed, 325 insertions, 338 deletions
diff --git a/README-hacking.md b/README-hacking.md
index c1efccc..149c6a2 100644
--- a/README-hacking.md
+++ b/README-hacking.md
@@ -1,54 +1,53 @@
-Reproducible paper template
-===========================
+Maneage: managing data lineage
+==============================
Copyright (C) 2018-2020 Mohammad Akhlaghi <mohammad@akhlaghi.org>\
Copyright (C) 2020 Raul Infante-Sainz <infantesainz@gmail.com>\
See the end of the file for license conditions.
-This project contains a **fully working template** for doing reproducible
-research (or writing a reproducible paper) as defined in the link below. If
-the link below is not accessible at the time of reading, please see the
-appendix at the end of this file for a portion of its introduction. Some
+Maneage is a **fully working template** for doing reproducible research (or
+writing a reproducible paper) as defined in the link below. If the link
+below is not accessible at the time of reading, please see the appendix at
+the end of this file for a portion of its introduction. Some
[slides](http://akhlaghi.org/pdf/reproducible-paper.pdf) are also available
to help demonstrate the concept implemented here.
http://akhlaghi.org/reproducible-science.html
-This template is created with the aim of supporting reproducible research
-by making it easy to start a project in this framework. As shown below, it
-is very easy to customize this reproducible paper template for any
-particular (research) project and expand it as it starts and evolves. It
-can be run with no modification (as described in `README.md`) as a
-demonstration and customized for use in any project as fully described
-below.
-
-A project designed using this template will download and build all the
-necessary libraries and programs for working in a closed environment
-(highly independent of the host operating system) with fixed versions of
-the necessary dependencies. The tarballs for building the local environment
-are also collected in a [separate
-repository](https://gitlab.com/makhlaghi/reproducible-paper-dependencies). The
-final output of the project is [a
-paper](https://gitlab.com/makhlaghi/reproducible-paper-output/raw/master/paper.pdf).
-Notice the last paragraph of the Acknowledgments where all the necessary
-software are mentioned with their versions.
+Maneage is created with the aim of supporting reproducible research by
+making it easy to start a project in this framework. As shown below, it is
+very easy to customize Maneage for any particular (research) project and
+expand it as it starts and evolves. It can be run with no modification (as
+described in `README.md`) as a demonstration and customized for use in any
+project as fully described below.
+
+A project designed using Maneage will download and build all the necessary
+libraries and programs for working in a closed environment (highly
+independent of the host operating system) with fixed versions of the
+necessary dependencies. The tarballs for building the local environment are
+also collected in a [separate
+repository](http://git.maneage.org/tarballs-software.git/tree/). The final
+output of the project is [a
+paper](http://git.maneage.org/output-raw.git/plain/paper.pdf). Notice the
+last paragraph of the Acknowledgments where all the necessary software
+packages are mentioned with their versions.
Below, we start with a discussion of why Make was chosen as the high-level
language/framework for project management and how to learn and master Make
easily (and freely). The general architecture and design of the project is
then discussed to help you navigate the files and their contents. This is
-followed by a checklist for the easy/fast customization of this template to
-your exciting research. We continue with some tips and guidelines on how to
+followed by a checklist for the easy/fast customization of Maneage to your
+exciting research. We continue with some tips and guidelines on how to
manage or extend your project as it grows based on our experiences with it
so far. The main body concludes with a description of possible future
-improvements that are planned for the template (but not yet
-implemented). As discussed above, we end with a short introduction on the
-necessity of reproducible science in the appendix.
-
-Please don't forget to share your thoughts, suggestions and criticisms on
-this template. Maintaining and designing this template is itself a separate
-project, so please join us if you are interested. Once it is mature enough,
-we will describe it in a paper (written by all contributors) for a formal
+improvements that are planned for Maneage (but not yet implemented). As
+discussed above, we end with a short introduction on the necessity of
+reproducible science in the appendix.
+
+Please don't forget to share your thoughts, suggestions and
+criticisms. Maintaining and designing Maneage is itself a separate project,
+so please join us if you are interested. Once it is mature enough, we will
+describe it in a paper (written by all contributors) for a formal
introduction to the community.
@@ -117,9 +116,10 @@ Make is a +40 year old software that is still evolving, therefore many
implementations of Make exist. The only difference in them is some extra
features over the [standard
definition](https://pubs.opengroup.org/onlinepubs/009695399/utilities/make.html)
-(which is shared in all of them). This template has been created for GNU
-Make which is the most common, most actively developed, and most advanced
-implementation. Just note that this template downloads, builds, internally
+(which is shared in all of them). Maneage is primarily written in GNU Make
+(which it installs itself, you don't have to have it on your system). GNU
+Make is the most common, most actively developed, and most advanced
+implementation. Just note that Maneage downloads, builds, internally
installs, and uses its own dependencies (including GNU Make), so you don't
have to have it installed before you try it out.
@@ -168,13 +168,14 @@ your hands off the keyboard!).
-Published works using this template
------------------------------------
+Published works using Maneage
+-----------------------------
The list below shows some of the works that have already been published
-with (earlier versions of) this template. Note that this template is
-evolving, so some details may be different in them. The more recent ones
-can be used as a good working example besides the default template.
+with (earlier versions of) Maneage. Previously it was simply called
+"Reproducible paper template". Note that Maneage is evolving, so some
+details may be different in them. The more recent ones can be used as a
+good working example.
- Infante-Sainz et
al. ([2020](https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.5317I),
@@ -208,11 +209,11 @@ can be used as a good working example besides the default template.
([2015](http://adsabs.harvard.edu/abs/2015ApJS..220....1A), ApJS, 220,
1): The version controlled project is available [on
GitLab](https://gitlab.com/makhlaghi/NoiseChisel-paper). This is the
- very first (and much less mature!) implementation of this template: the
- history of this template started more than two years after this paper
- was published. It is a very rudimentary/initial implementation, thus it
- is only included here for historical reasons. However, the project
- source is complete, accurate and uploaded to arXiv along with the paper.
+ very first (and much less mature!) incarnation of Maneage: the history
+ of Maneage started more than two years after this paper was
+ published. It is a very rudimentary/initial implementation, thus it is
+ only included here for historical reasons. However, the project source
+ is complete, accurate and uploaded to arXiv along with the paper.
@@ -221,15 +222,11 @@ can be used as a good working example besides the default template.
Citation
--------
-A paper will be published to fully describe this reproducible paper
-template. Until then, if you used this template in your work, please cite
-the paper that implemented its first version: Akhlaghi & Ichikawa
+A paper to fully describe Maneage has been submitted. Until then, if you
+used it in your work, please cite the paper that implemented its first
+version: Akhlaghi & Ichikawa
([2015](http://adsabs.harvard.edu/abs/2015ApJS..220....1A), ApJS, 220, 1).
-The experience gained with this template after several more implementations
-will be used to make it robust enough for a complete and useful paper to
-introduce to the community afterwards.
-
Also, when your paper is published, don't forget to add a notice in your
own paper (in coordination with the publishing editor) that the paper is
fully reproducible and possibly add a sentence or paragraph in the end of
@@ -249,15 +246,15 @@ a reproducible manner.
Project architecture
====================
-In order to customize this template to your research, it is important to
-first understand its architecture so you can navigate your way in the
-directories and understand how to implement your research project within
-its framework: where to add new files and which existing files to modify
-for what purpose. But if this the first time you are using this template,
-before reading this theoretical discussion, please run the template once
-from scratch without any chages (described in `README.md`). You will see
-how it works (note that the configure step builds all necessary software,
-so it can take long, but you can continue reading while its working).
+In order to customize Maneage to your research, it is important to first
+understand its architecture so you can navigate your way in the directories
+and understand how to implement your research project within its framework:
+where to add new files and which existing files to modify for what
+purpose. But if this is the first time you are using Maneage, before reading
+this theoretical discussion, please run Maneage once from scratch without
+any changes (described in `README.md`). You will see how it works (note that
+the configure step builds all necessary software, so it can take long, but
+you can continue reading while it's working).
The project has two top-level directories: `reproduce` and
`tex`. `reproduce` hosts all the software building and analysis
@@ -276,9 +273,9 @@ links in the project's top source directory: `.build` which points to the
top build directory and `.local` for easy access to the custom built
software installation directory. With these you can easily access the build
directory and project-specific software from your top source directory. For
-example if you run `.local/bin/ls` you will be using the `ls` of the
-template, which is problably different from your system's `ls` (run them
-both with `--version` to check).
+example if you run `.local/bin/ls` you will be using the `ls` of Maneage,
+which is probably different from your system's `ls` (run them both with
+`--version` to check).
Once the project is configured for your system, `./project make` will do
the basic preparations and run the project's analysis with the custom
@@ -287,7 +284,7 @@ version of software. The `project` script is just a wrapper, and with the
(both are in the `reproduce/analysis/make` directory).
In terms of organization, `top-prepare.mk` and `top-make.mk` have an
-identical design, only minor differences. So, let's continue the template's
+identical design, only minor differences. So, let's continue Maneage's
architecture with `top-make.mk`. Once you understand that, you'll clearly
understand `top-prepare.mk` also. These very high-level files are
relatively short and heavily commented so hopefully the descriptions in
@@ -356,10 +353,10 @@ compiling it to the final PDF). So the last target in a workhorse-Makefile
is a `.tex` file (with the same base-name as the Makefile, but in
`$(BDIR)/tex/macros`). As a result, if the targets in a workhorse-Makefile
aren't directly a prerequisite of other workhorse-Makefile targets, they
-can be a pre-requisite of that intermediate LaTeX macro file and thus be
+can be a prerequisite of that intermediate LaTeX macro file and thus be
called when necessary. Otherwise, they will be ignored by Make.
-This template also has a mode to share the build directory between several
+Maneage also has a mode to share the build directory between several
users of a Unix group (when working on large computer clusters). In this
scenario, each user can have their own cloned project source, but share the
large built files between each other. To do this, it is necessary for all
@@ -441,27 +438,27 @@ files, it makes no effort in keeping the file meta data, and in particular
the dates of files. Therefore when you checkout to a different branch,
files that are re-written by Git will have a newer date than the other
project files. However, file dates are important in the current design of
-the template: Make uses file dates of the pre-requisits and targets to see
-if the target should be re-built.
+Maneage: Make checks the dates of the prerequisite files and target files
+to see if the target should be re-built.
-To fix this problem, for this template we use a forked version of
+To fix this problem, for Maneage we use a forked version of
[Metastore](https://github.com/mohammad-akhlaghi/metastore). Metastore uses
a binary database file (which is called `.file-metadata`) to keep the
modification dates of all the files under version control. This file is
also under version control, but is hidden (because it shouldn't be modified
-by hand). During the project's configuration, the template installs to Git
-hooks to run Metastore 1) before making a commit to update its database
-with the file dates in a branch, and 2) after doing a checkout, to reset
-the file-dates after the checkout is complete and re-set the file dates
-back to what they were.
+by hand). During the project's configuration, Maneage installs two Git
+hooks to run Metastore 1) before making a commit, to update its database
+with the file dates in a branch, and 2) after doing a checkout, to set
+the file dates back to what they were before Git re-wrote the files in
+the checkout.
-In practice, Metastore should work almost fully invisiablly within your
+In practice, Metastore should work almost fully invisibly within your
project. The only place you might notice its presence is that you'll see
`.file-metadata` in the list of modified/staged files (commonly after
merging your branches). Since it's a binary file, Git also won't show you
the changed contents. In a merge, you can simply accept any changes with
`git add -u`. But if Git is telling you that it has changed without a merge
-(for example if you started a commit, but cancelled it in the middle), you
+(for example if you started a commit, but canceled it in the middle), you
can just do `git checkout .file-metadata` and set it back to its original
state.
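
The date problem described above can be seen without Git or Make at all. The lines below are only a minimal sketch under stated assumptions: `input.txt` and `output.txt` are hypothetical scratch files (not part of Maneage), the shell's `-nt` ("newer than") test stands in for Make's date comparison, and a plain `touch` stands in for Git re-writing a file during a checkout.

```shell
# Hypothetical scratch files, only to illustrate the date comparison.
tmpdir=$(mktemp -d)
cd "$tmpdir"
touch -t 202001010000 input.txt     # Prerequisite, dated 2020-01-01.
touch -t 202001020000 output.txt    # Target, built one day later.

# Stand-in for Make's rule: rebuild when the prerequisite is newer.
needs_rebuild() {
    if [ "$1" -nt "$2" ]; then echo yes; else echo no; fi
}
before=$(needs_rebuild input.txt output.txt)

# A checkout that re-writes 'input.txt' gives it the current date,
# even when its contents are identical to what they were before.
touch input.txt
after=$(needs_rebuild input.txt output.txt)

echo "rebuild before checkout: $before, after checkout: $after"
```

The contents of `input.txt` never change here, yet the target is considered out of date after the simulated checkout. Restoring the recorded dates after a checkout is what avoids this needless rebuild.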
@@ -485,7 +482,7 @@ mind are listed below.
in the workhorse-Makefiles or paper's LaTeX source. Define such
constants as logically-grouped, separate configuration-Makefiles in
`reproduce/analysis/config/XXXXX.conf`. Then set this
- configuration-Makefiles file as a pre-requisite to any rule that uses
+ configuration-Makefile as a prerequisite to any rule that uses
the variable defined in it.
- Through any number of intermediate prerequisites, all processing steps
@@ -506,7 +503,7 @@ mind are listed below.
Customization checklist
=======================
-Take the following steps to fully customize this template for your research
+Take the following steps to fully customize Maneage for your research
project. After finishing the list, be sure to run `./project configure` and
 `./project make` to see if everything works correctly. If you notice anything
 missing or any incorrect part (probably a change that has not been
@@ -514,39 +511,35 @@ explained here), please let us know to correct it.
As described above, the concept of reproducibility (during a project)
heavily relies on [version
-control](https://en.wikipedia.org/wiki/Version_control). Currently this
-template uses Git as its main version control system. If you are not
-already familiar with Git, please read the first three chapters of the
-[ProGit book](https://git-scm.com/book/en/v2) which provides a wonderful
-practical understanding of the basics. You can read later chapters as you
-get more advanced in later stages of your work.
+control](https://en.wikipedia.org/wiki/Version_control). Currently Maneage
+uses Git as its main version control system. If you are not already
+familiar with Git, please read the first three chapters of the [ProGit
+book](https://git-scm.com/book/en/v2) which provides a wonderful practical
+understanding of the basics. You can read later chapters as you get more
+advanced in later stages of your work.
First custom commit
-------------------
- - **Get this repository and its history** (if you don't already have it):
- Arguably the easiest way to start is to clone this repository as shown
- below. As you see, after the cloning some further corrections to your
- clone's Git settings are necessary: first, you need to remove all
- possibly existing Git tags from the template's history. Then you need
- to rename the conventional `origin` remote server, and the `master`
- branch. This renaming allows you to use these standard names for your
- own customized project (which greatly helps because this convention is
- widely used).
+ 1. **Get this repository and its history** (if you don't already have it):
+ Arguably the easiest way to start is to clone Maneage and prepare for
+ your customizations as shown below. After cloning, you first rename
+ the default `origin` remote server to specify that this is Maneage's
+ remote server. This will allow you to use the conventional `origin`
+ name for your own project as shown in the next steps. Second, you will
+ create and go into the conventional `master` branch to start
+ committing in your project later.
```shell
- $ git clone git://git.sv.gnu.org/reproduce # Clone/copy the project and its history.
- $ mv reproduce my-project # Change the name to your project's name.
- $ cd my-project # Go into the cloned directory.
- $ git tag | xargs git tag -d # Delete all template tags.
- $ git config remote.origin.tagopt --no-tags # No tags in future fetch/pull from this template.
- $ git remote rename origin template-origin # Rename current/only remote to "template-origin".
- $ git branch -m template # Rename current/only branch to "template".
- $ git checkout -b master # Create and enter new "master" branch.
- $ pwd # Just to confirm where you are.
+ $ git clone https://git.maneage.org/project.git # Clone/copy the project and its history.
+ $ mv project my-project # Change the name to your project's name.
+ $ cd my-project # Go into the cloned directory.
+ $ git remote rename origin origin-maneage # Rename current/only remote to "origin-maneage".
+ $ git checkout -b master # Create and enter your own "master" branch.
+ $ pwd # Just to confirm where you are.
```
- - **Prepare to build project**: The `./project configure` command of the
+ 2. **Prepare to build project**: The `./project configure` command of the
next step will build the different software packages within the
"build" directory (that you will specify). Nothing else on your system
will be touched. However, since it takes long, it is useful to see
@@ -555,36 +548,36 @@ First custom commit
terminal on your desktop and navigate to the same project directory
that you cloned (output of last command above). Then run the following
command. Once every second, this command will just print the date
- (possibly followed by a non-existant directory notice). But as soon as
+ (possibly followed by a non-existent directory notice). But as soon as
the next step starts building software, you'll see the names of
software get printed as they are being built. Once any software is
 installed in the project build directory its name will disappear. Again,
don't worry, nothing will be installed outside the build directory.
```shell
- # On another terminal (go to top project directory)
+ # On another terminal (go to top project source directory, last command above)
$ ./project --check-config
```
- - **Test the template**: Before making any changes, it is important to
- test it and see if everything works properly with the commands
- below. If there is any problem in the `./project configure` or
- `./project make` steps, please contact us to fix the problem before
- continuing. Since the building of dependencies in configuration can
- take long, you can take the next few steps (editing the files) while
- its working (they don't affect the configuration). After `./project
- make` is finished, open `paper.pdf`. If it looks fine, you are ready
- to start customizing the template for your project. But before that,
- clean all the extra template outputs with `make clean` as shown below.
+ 3. **Test Maneage**: Before making any changes, it is important to test it
+ and see if everything works properly with the commands below. If there
+ is any problem in the `./project configure` or `./project make` steps,
+ please contact us to fix the problem before continuing. Since the
+ building of dependencies in configuration can take long, you can take
+ the next few steps (editing the files) while it's working (they don't
+ affect the configuration). After `./project make` is finished, open
+ `paper.pdf`. If it looks fine, you are ready to start customizing
+ Maneage for your project. But before that, clean all the extra Maneage
+ outputs with `make clean` as shown below.
```shell
$ ./project configure # Build the project's software environment (can take an hour or so).
- $ ./project make # Do the processing and build paper (just a simple demo in the template).
+ $ ./project make # Do the processing and build paper (just a simple demo).
# Open 'paper.pdf' and see if everything is ok.
```
- - **Setup the remote**: You can use any [hosting
+ 4. **Setup the remote**: You can use any [hosting
facility](https://en.wikipedia.org/wiki/Comparison_of_source_code_hosting_facilities)
that supports Git to keep an online copy of your project's version
controlled history. We recommend [GitLab](https://gitlab.com) because
@@ -592,25 +585,27 @@ First custom commit
perfect)](https://www.gnu.org/software/repo-criteria-evaluation.html),
and later you can also host GitLab on your own server. Anyway, create
an account in your favorite hosting facility (if you don't already
- have one), and define a new project there. It will give you a URL
- (usually starting with `git@` and ending in `.git`), put this URL in
- place of `XXXXXXXXXX` in the first command below. With the second
- command, "push" your `master` branch to your `origin` remote, and
- (with the `--set-upstream` option) set them to track/follow each
- other. However, the `template` branch is currently tracking/following
- your `template-origin` remote (automatically set when you cloned the
- template). So when pushing the `template` branch to your `origin`
- remote, you _shouldn't_ use `--set-upstream`. With the last command,
- you can actually check this (which local and remote branches are
- tracking each other).
+ have one), and define a new project there. Please make sure *the newly
+ created project is empty* (some services ask to include a `README` in
+ a new project which is bad in this scenario, and will not allow you to
+ push to it). It will give you a URL (usually starting with `git@` and
+ ending in `.git`), put this URL in place of `XXXXXXXXXX` in the first
+ command below. With the second command, "push" your `master` branch to
+ your `origin` remote, and (with the `--set-upstream` option) set them
+ to track/follow each other. However, the `maneage` branch is currently
+ tracking/following your `origin-maneage` remote (automatically set
+ when you cloned Maneage). So when pushing the `maneage` branch to your
+ `origin` remote, you _shouldn't_ use `--set-upstream`. With the last
+ command, you can actually check this (which local and remote branches
+ are tracking each other).
```shell
git remote add origin XXXXXXXXXX # Newly created repo is now called 'origin'.
- git push --set-upstream origin master # Push 'master' branch to 'origin' (enable tracking).
- git push origin template # Push 'template' branch to 'origin' (no tracking).
+ git push --set-upstream origin master # Push 'master' branch to 'origin' (with tracking).
+ git push origin maneage # Push 'maneage' branch to 'origin' (no tracking).
```
- - **Title**, **short description** and **author**: The title and basic
+ 5. **Title**, **short description** and **author**: The title and basic
information of your project's output PDF paper should be added in
`paper.tex`. You should see the relevant place in the preamble (prior
 to `\begin{document}`). After you are done, run the `./project make`
@@ -621,26 +616,28 @@ First custom commit
your own methods after finishing this checklist and doing your first
commit.
- - **Delete dummy parts (can be done later)**: The template contains some
- parts that are only for the initial/test run, mainly as a
- demonstration of important steps. They not for any real analysis. You
- can remove these parts in the file below
+ 6. **Delete dummy parts**: Maneage contains some parts that are only for
+ the initial/test run, mainly as a demonstration of important steps,
+ which you can use as a reference in your own project. But they are
+ not for any real analysis, so you should remove these parts as
+ described below:
- `paper.tex`: 1) Delete the text of the abstract (from
- `\includeabstract{` to `\vspace{0.25cm}`) and start writing your own
- (a single sentence can be enough now). 2) Add some keywords under it
- in the keywords part. 3) Delete everything between `%% Start of main
- body.` and `%% End of main body.`. 4) Remove the notice in the
- "Acknowledgments" section (in `\new{}`) and add Acknowledge your
- funding sources. Just don't delete the existing acknowledgment
- statement: this template was designed by funding from many
- grants. Since you are using it in your work, it is necessary to
+ `\includeabstract{` to `\vspace{0.25cm}`) and write your own (a
+ single sentence can be enough now, you can complete it later). 2)
+ Add some keywords under it in the keywords part. 3) Delete
+ everything between `%% Start of main body.` and `%% End of main
+ body.`. 4) Remove the notice in the "Acknowledgments" section (in
+ `\new{}`) and acknowledge your funding sources (this can also be
+ done later). Just don't delete the existing acknowledgment
+ statement: Maneage is possible thanks to funding from several
+ grants. Since Maneage is being used in your work, it is necessary to
acknowledge them in your work also.
- `reproduce/analysis/make/top-make.mk`: Delete the `delete-me` line
in the `makesrc` definition. Just make sure there is no empty line
between the `download \` and `verify \` lines (they should be
- directly under eachother).
+ directly under each other).
- `reproduce/analysis/make/verify.mk`: In the final recipe, under the
commented line `Verify TeX macros`, remove the full line that
@@ -672,88 +669,89 @@ First custom commit
$ ./project make
```
- - Tell Git _not_ to merge changes in the dummy `delete-me` files, and
- `paper.tex` (its contents are just dummy place holders) from the
- template (by keeping their names in a `.gitattributes` file). Note
- that only the first `echo` command has a `>` (to re-write the file
- with the given line), the rest are `>>` (to append to it). After
- doing this step in your own branch, when future commits in the
- template make any change in the given files, they will not be
- imported into your project's branch (it can be annoying!). You can
- follow a similar strategy if you want to avoid any other set of
- files to be imported from the template into your project's branch.
+ 7. **Don't merge some files in future updates**: As described below, you
+ can later update your infrastructure (for example to fix bugs) by
+ merging your `master` branch with `maneage`. For files that you have
+ created in your own branch, there will be no problem. However if you
+ modify an existing Maneage file for your project, next time it's
+ updated on `maneage` you'll have an annoying conflict. The commands
+ below show how to fix this future problem. With them, you can
+ configure Git to ignore the changes in `maneage` for some of the files
+ you have already edited and deleted above (and will edit below). Note
+ that only the first `echo` command has a `>` (to write over the file),
+ the rest are `>>` (to append to it). If you want to avoid any other
+ set of files to be imported from Maneage into your project's branch,
+ you can follow a similar strategy. We recommend only doing it when you
+ encounter the same conflict in more than one merge and there is no
+ other change in that file. Also, don't add core Maneage Makefiles,
+ otherwise Maneage can break on the next run.
- ```shell
- $ echo "paper.tex" > .gitattributes
- $ echo "tex/src/delete-me.mk merge=ours" >> .gitattributes
- $ echo "tex/src/delete-me-demo.mk merge=ours" >> .gitattributes
- $ echo "reproduce/analysis/make/delete-me.mk merge=ours" >> .gitattributes
- $ echo "reproduce/analysis/config/delete-me-num.conf merge=ours" >> .gitattributes
- $ git add .gitattributes
- ```
+ ```shell
+ $ echo "paper.tex merge=ours" > .gitattributes
+ $ echo "tex/src/delete-me.mk merge=ours" >> .gitattributes
+ $ echo "tex/src/delete-me-demo.mk merge=ours" >> .gitattributes
+ $ echo "reproduce/analysis/make/delete-me.mk merge=ours" >> .gitattributes
+ $ echo "reproduce/software/config/TARGETS.conf merge=ours" >> .gitattributes
+ $ echo "reproduce/analysis/config/delete-me-num.conf merge=ours" >> .gitattributes
+ $ git add .gitattributes
+ ```
- - **Copyright and License notice**: To be usable/modifiable by others
- after publication, _all_ the "copyright-able" files in your project
- (those larger than 10 lines) must have a copyright notice and license
- notice. Please take a moment to look at several existing files to see
- a few examples. The copyright notice is usually close to the start of
- the file, it is the line starting with `Copyright (C)` and containing
- a year and the author's name. The License notice is a short (or full,
- when its not too long, like the MIT license) description of the
- copyright license, usually less than three paragraphs. Don't forget to
- add these _two_ notices to any new file you add to this template for
- your project. When you modify an existing template file (which already
- has the notices), just add a copyright notice in your name under the
- existing one(s), like the line with capital letters below. Please add
- this line with your name and email address to `paper.tex` and
- `tex/src/preamble-header.tex`.
+ 8. **Copyright and License notice**: It is necessary that _all_ the
+ "copyright-able" files in your project (those larger than 10 lines)
+ have a copyright and license notice. Please take a moment to look at
+ several existing files to see a few examples. The copyright notice is
+ usually close to the start of the file, it is the line starting with
+ `Copyright (C)` and containing a year and the author's name (like the
+ examples below). The License notice is a short description of the
+ copyright license, usually one or two paragraphs with a URL to the
+ full license. Don't forget to add these _two_ notices to *any new
+ file* you add in your project (you can just copy-and-paste). When you
+ modify an existing Maneage file (which already has the notices), just
+ add a copyright notice in your name under the existing one(s), like
+ the line with capital letters below. To start with, add this line with
+ your name and email address to `paper.tex`,
+ `tex/src/preamble-header.tex`, `reproduce/analysis/make/top-make.mk`,
+ and generally, all the files you modified in the previous step.
```
- Copyright (C) 2018-2020 Mohammad Akhlaghi <mohammad@akhlaghi.org>
+ Copyright (C) 2018-2020 Existing Name <existing@email.address>
Copyright (C) 2020 YOUR NAME <YOUR@EMAIL.ADDRESS>
```
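As a rough aid for the step above, the sketch below lists files that are longer than 10 lines and contain no `Copyright (C)` line. The function name `list_missing_copyright` is hypothetical (it is not part of Maneage), and the check only looks for the literal string, so it does not confirm that a license notice is also present.

```shell
# Hypothetical helper (not part of Maneage): print every given file that
# has more than 10 lines and no line containing "Copyright (C)".
list_missing_copyright () {
    for f in "$@"; do
        # Skip anything that is not a regular file (for example directories).
        [ -f "$f" ] || continue

        # 'grep -c ""' counts the lines of the file.
        lines=$(grep -c '' "$f")

        # Report files above the 10-line threshold without a notice.
        if [ "$lines" -gt 10 ] && ! grep -q 'Copyright (C)' "$f"; then
            echo "$f"
        fi
    done
}
```

Assuming file names without spaces, it could be run over all tracked files with `list_missing_copyright $(git ls-files)`.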
- - **Configure Git for fist time**: If you have never used Git, then you
- have to configure it with some basic information in order to have
- essential information in the commit messages (ignore this step if you
- have already done it). Git will include your name and e-mail address
- information in each commit. You can also specify your favorite text
- editor for making the commit (`emacs`, `vim`, etc.).
+ 9. **Configure Git for the first time**: If this is the first time you are
+ running Git on this system, then you have to configure it with some
+ basic information in order to have essential information in the commit
+ messages (ignore this step if you have already done it). Git will
+ include your name and e-mail address information in each commit. You
+ can also specify your favorite text editor for making the commit
+ (`emacs`, `vim`, `nano`, etc.).
```shell
$ git config --global user.name "YourName YourSurname"
$ git config --global user.email your-email@example.com
- $ git config --global core.editor vim
+ $ git config --global core.editor nano
```
- - **Your first commit**: You have already made some small and basic
- changes in the steps above and you are in the `master` branch. So, you
- can officially make your first commit in your project's history. But
- before that you need to make sure that there are no problems in the
- project (this is a good habit to always re-build the system before a
- commit to be sure it works as expected).
+ 10. **Your first commit**: You have already made some small and basic
+ changes in the steps above and you are in your project's `master`
+ branch. So, you can officially make your first commit in your
+ project's history and push it. But before that, you need to make sure
+ that there are no problems in the project. It is a good habit to
+ always re-build the system before a commit, to be sure it works as
+ expected.
```shell
$ git status # See which files you have changed.
- $ git diff # See the lines you have added/changed.
+ $ git diff # Check the lines you have added/changed.
$ ./project make # Make sure everything builds successfully.
$ git add -u # Put all tracked changes in staging area.
$ git status # Make sure everything is fine.
- $ git commit # Your first commit, add a nice description.
- $ git tag -a v0 # Tag this as the zero-th version of your project.
- ```
-
- - **Push to the remote**: Push your first commit and its tag to your
- remote repository with these commands. Since we have setup your
- `master` branch to follow `origin/master`, you can just use `git push`
- from now on.
-
- ```shell
- $ git push
- $ git push --tags
+ $ git diff --cached # Confirm all the changes that will be committed.
+ $ git commit # Your first commit: put a good description!
+ $ git push # Push your commit to your remote.
```
- - **Start your exciting research**: You are now ready to add flesh and
+ 11. **Start your exciting research**: You are now ready to add flesh and
blood to this raw skeleton by further modifying and adding your
exciting research steps. You can use the "published works" section in
the introduction (above) as some fully working models to learn
@@ -764,17 +762,17 @@ First custom commit
Other basic customizations
--------------------------
- - **High-level software**: The template installs all the software that
- your project needs. You can specify which software your project needs
- in `reproduce/software/config/TARGETS.conf`. The necessary software
- are classified into two classes: 1) programs or libraries (usually
- written in C/C++) which are run directly by the operating system. 2)
- Python modules/libraries that are run within Python. By default
+ - **High-level software**: Maneage installs all the software that your
+ project needs. You can specify which software your project needs in
+ `reproduce/software/config/TARGETS.conf`. The necessary software are
+ classified into two classes: 1) programs or libraries (usually written
+ in C/C++) which are run directly by the operating system. 2) Python
+ modules/libraries that are run within Python. By default
`TARGETS.conf` only has GNU Astronomy Utilities (Gnuastro) as one
scientific program and Astropy as one scientific Python module. Both
have many dependencies which will be installed into your project
during the configuration step. To see a list of software that are
- currently ready to be built in the template, see
+ currently ready to be built in Maneage, see
`reproduce/software/config/versions.conf` (which has their versions
also), the comments in `TARGETS.conf` describe how to use the software
name from `versions.conf`. Currently the raw pipeline just uses
@@ -793,7 +791,7 @@ Other basic customizations
`reproduce/analysis/make/download.mk`. Have a look there to see how
these values are to be used. This information about the input datasets
is also used in the initial `configure` script (to inform the users),
- so also modify that file. You can find all occurrences of the template
+ so also modify that file. You can find all occurrences of the demo
dataset with the command below and replace it with your input's
dataset.
@@ -802,7 +800,7 @@ Other basic customizations
```
- **`README.md`**: Correct all the `XXXXX` place holders (name of your
- project, your own name, address of the template's online/remote
+ project, your own name, address of your project's online/remote
repository, link to download dependencies, etc.). Generally, read
over the text and update it where necessary to fit your project. Don't
forget that this is the first file that is displayed on your online
@@ -821,7 +819,7 @@ Other basic customizations
checksum and if any file's checksum is different from the one recorded
in the project, it will stop and print the problematic file and its
expected and calculated checksums. First set the value of
- `verify-outputs` valiable in
+ `verify-outputs` variable in
`reproduce/analysis/config/verify-outputs.conf` to `yes`. Then go to
`reproduce/analysis/make/verify.mk`. The verification of all the files
is only done in one recipe. First the files that go into the
@@ -842,24 +840,33 @@ Other basic customizations
the headers. You can use the provided function(s), or define one for
your special formats.
- - **Feedback**: As you use the template you will notice many things that
- if implemented from the start would have been very useful for your
- work. This can be in the actual scripting and architecture of the
- template, or useful implementation and usage tips, like those
- below. In any case, please share your thoughts and suggestions with
- us, so we can add them here for everyone's benefit.
-
- - **Updating TeXLive**: Currently the only software package that the
- template doesn't build is TeXLive (since its not part of the analysis,
- only for demonstration: building the PDf). So when a new version of
- TeXLive comes (once every year), if you would like to build the paper,
- its necessary to update it in your project (otherwise the configure
- script will crash). To do that, just modify the years in
- `reproduce/software/config/texlive.conf`, then delete
- `.build/software/tarballs/install-tl-unx.tar.gz`. The next time you
- run `./project configure`, the new TeXLive will be installed and used.
-
- - **Pre-publication: add notice on reproducibility**: Add a notice
+ - **Feedback**: As you use Maneage you will notice many things that if
+ implemented from the start would have been very useful for your
+ work. This can be in the actual scripting and architecture of Maneage,
+ or useful implementation and usage tips, like those below. In any
+ case, please share your thoughts and suggestions with us, so we can
+ add them here for everyone's benefit.
+
+ - **Re-preparation**: Automatic preparation is only run in the first run
+ of the project on a system; to re-do the preparation you have to use
+ the option below. Here is the reason for this: when it is necessary, the
+ preparation process can be slow and will unnecessarily slow down the
+ whole project while the project is under development (focus is on the
+ analysis that is done after preparation). Because of this, preparation
+ will be done automatically for the first time that the project is run
+ (when `.build/software/preparation-done.mk` doesn't exist). After the
+ preparation process completes once, future runs of `./project make`
+ will not do the preparation process anymore (will not call
+ `top-prepare.mk`). They will only call `top-make.mk` for the
+ analysis. To manually invoke the preparation process after the first
+ attempt, the `./project make` script should be run with the
+ `--prepare-redo` option, or you can delete the special file above.
+
+ ```shell
+ $ ./project make --prepare-redo
+ ```
+
+ - **Pre-publication: add notice on reproducibility**: Add a notice
somewhere prominent in the first page within your paper, informing the
reader that your research is fully reproducible. For example in the
end of the abstract, or under the keywords with a title like
@@ -926,8 +933,8 @@ for the benefit of others.
of the processing, put all their configuration files in a devoted
directory (with the program's name) within
`reproduce/software/config`. Similar to the
- `reproduce/software/config/gnuastro` directory (which is put in the
- template as a demo in case you use GNU Astronomy Utilities). It is
+ `reproduce/software/config/gnuastro` directory (which is put in
+ Maneage as a demo in case you use GNU Astronomy Utilities). It is
much cleaner and readable (thus less buggy) to avoid mixing the
configuration files, even if there is no technical necessity.
@@ -938,8 +945,8 @@ for the benefit of others.
(copyrights aren't necessary for the latter).
- *Copyright*: Always start a file containing programming constructs
- with a copyright statement like the ones that this template starts
- with (for example in the top level `Makefile`).
+ with a copyright statement like the ones that Maneage starts with
+ (for example in the top level `Makefile`).
- *Comments*: Comments are vital for readability (by yourself in two
months, or others). Describe everything you can about why you are
@@ -971,11 +978,11 @@ for the benefit of others.
contexts.
- *Environment of each recipe*: If you need to define a special
- environment (or alises, or scripts to run) for all the recipes in
+ environment (or aliases, or scripts to run) for all the recipes in
your Makefiles, you can use a Bash startup file
`reproduce/software/shell/bashrc.sh`. This file is loaded before every
Make recipe is run, just like the `.bashrc` in your home directory is
- loaded everytime you start a new interactive, non-login terminal. See
+ loaded every time you start a new interactive, non-login terminal. See
the comments in that file for more.
- *Automatic variables*: These are wonderful and very useful Make
@@ -1034,8 +1041,8 @@ for the benefit of others.
will give you a special shared-memory device (directory): on systems
using the GNU C Library (all GNU/Linux systems), it is `/dev/shm`. The
contents of this directory are actually in your RAM, not in your
- persistance storage like the HDD or SSD. Reading and writing from/to
- the RAM is much faster than persistant storage, so if you have enough
+ persistent storage like the HDD or SSD. Reading and writing from/to
+ the RAM is much faster than persistent storage, so if you have enough
RAM available, it can be very beneficial for large temporary files to
be put there. You can use the `mktemp` program to give the temporary
files a randomly-set name, and use text files as targets to keep that
@@ -1047,14 +1054,14 @@ for the benefit of others.
.ONESHELL:
.SHELLFLAGS = -ec
all: mean-std.txt
- shm-template = /dev/shm/$(shell whoami)-XXXXXXXXXX
+ shm-maneage := /dev/shm/$(shell whoami)-maneage-XXXXXXXXXX
large1.txt: input.fits
- out=$$(mktemp $(shm-template))
+ out=$$(mktemp $(shm-maneage))
astarithmetic $< 2 + --output=$$out.fits
echo "$$out" > $@
large2.txt: large1.txt
input=$$(cat $<)
- out=$$(mktemp $(shm-template))
+ out=$$(mktemp $(shm-maneage))
astarithmetic $$input.fits 2 - --output=$$out.fits
rm $$input.fits $$input
echo "$$out" > $@
@@ -1064,7 +1071,7 @@ for the benefit of others.
rm $$input.fits $$input
```
The important point here is that the temporary name template
- (`shm-template`) has no suffix. So you can add the suffix
+ (`shm-maneage`) has no suffix. So you can add the suffix
corresponding to your desired format afterwards (for example
`$$out.fits`, or `$$out.txt`). But more importantly, when `mktemp`
sets the random name, it also checks if no file exists with that name
@@ -1078,15 +1085,14 @@ for the benefit of others.
sure that first the file with a suffix is deleted, then the core
random file (note that when working in parallel on powerful systems,
in the time between deleting two files of a single `rm` command, many
- things can happen!). When using this template, you can put the
- definition of `shm-template` in
- `reproduce/analysis/make/initialize.mk` to be usable in all the
- different Makefiles of your analysis, and you won't need the three
- lines above it. **Finally, BE RESPONSIBLE:** after you are finished,
- be sure to clean up any possibly remaining files (due to crashes in
- the processing while you are working), otherwise your RAM may fill up
- very fast. You can do it easily with a command like this on your
- command-line: `rm -f /dev/shm/$(whoami)-*`.
+ things can happen!). When using Maneage, you can put the definition
+ of `shm-maneage` in `reproduce/analysis/make/initialize.mk` to be
+ usable in all the different Makefiles of your analysis, and you won't
+ need the three lines above it. **Finally, BE RESPONSIBLE:** after you
+ are finished, be sure to clean up any possibly remaining files (due
+ to crashes in the processing while you are working), otherwise your
+ RAM may fill up very fast. You can do it easily with a command like
+ this on your command-line: `rm -f /dev/shm/$(whoami)-*`.
- **Software tarballs and raw inputs**: It is critically important to
@@ -1114,86 +1120,69 @@ for the benefit of others.
project's reproducibility, so like the above for software, make sure
you have a backup of them, or their persistent identifiers (PIDs).
- - **Version control**: Version control is a critical component of this
- template. Here are some tips to help in effectively using it.
+ - **Version control**: Version control is a critical component of
+ Maneage. Here are some tips to help in effectively using it.
- *Regular commits*: It is important (and extremely useful) to have the
history of your project under version control. So try to make commits
regularly (after any meaningful change/step/result).
- - *Keep template up-to-date*: In time, this template is going to become
- more and more mature and robust (thanks to your feedback and the
- feedback of other users). Bugs will be fixed and new/improved
- features will be added. So every once and a while, you can run the
- commands below to pull new work that is done in this template. If the
- changes are useful for your work, you can merge them with your
- project to benefit from them. Just pay **very close attention** to
- resolving possible **conflicts** which might happen in the merge
- (updated settings that you have customized in the template).
+ - *Keep Maneage up-to-date*: In time, Maneage is going to become more
+ and more mature and robust (thanks to your feedback and the feedback
+ of other users). Bugs will be fixed and new/improved features will be
+ added. So every once in a while, you can run the commands below to
+ pull new work that is done in Maneage. If the changes are useful for
+ your work, you can merge them with your project to benefit from
+ them. Just pay **very close attention** to resolving possible
+ **conflicts** which might happen in the merge (updated settings that
+ you have customized in Maneage).
```shell
- $ git checkout template
- $ git pull # Get recent work in the template
+ $ git checkout maneage
+ $ git pull # Get recent work in Maneage
$ git log XXXXXX..XXXXXX --reverse # Inspect new work (replace XXXXXXs with hashes mentioned in output of previous command).
$ git log --oneline --graph --decorate --all # General view of branches.
$ git checkout master # Go to your top working branch.
- $ git merge template # Import all the work into master.
+ $ git merge maneage # Import all the work into master.
```
- - *Adding this template to a fork of your project*: As you and your
- colleagues continue your project, it will be necessary to have
- separate forks/clones of it. But when you clone your own project on a
+ - *Adding Maneage to a fork of your project*: As you and your colleagues
+ continue your project, it will be necessary to have separate
+ forks/clones of it. But when you clone your own project on a
different system, or a colleague clones it to collaborate with you,
- the clone won't have the `template-origin` remote that you started
- the project with. As shown in the previous item above, you need this
- remote to be able to pull recent updates from the template. The steps
- below will setup the `template-origin` remote, and a local `template`
+ the clone won't have the `origin-maneage` remote that you started the
+ project with. As shown in the previous item above, you need this
+ remote to be able to pull recent updates from Maneage. The steps
+ below will set up the `origin-maneage` remote, and a local `maneage`
branch to track it, on the new clone.
```shell
- $ git remote add template-origin git://git.sv.gnu.org/reproduce
- $ git fetch template-origin
- $ git checkout -b template --track template-origin/master
+ $ git remote add origin-maneage https://git.maneage.org/project.git
+ $ git fetch origin-maneage
+ $ git checkout -b maneage --track origin-maneage/maneage
```
- *Commit message*: The commit message is a very important and useful
aspect of version control. To make the commit message useful for
others (or yourself, one year later), it is good to follow a
- consistent style. The template already has a consistent formatting
+ consistent style. Maneage already has a consistent formatting
(described below), which you can also follow in your project if you
- like. You can see many examples by running `git log` in the
- `template` branch. If you intend to push commits to the main
- template, for the consistency of the template, it is necessary to
- follow these guidelines. 1) No line should be more than 75 characters
- (to enable easy reading of the message when you run `git log` on the
- standard 80-character terminal). 2) The first line is the title of
- the commit and should summarize it (so `git log --oneline` can be
- useful). The title should also not end with a point (`.`, because its
- a short single sentence, so a point is not necessary and only wastes
- space). 3) After the title, leave an empty line and start the body of
- your message (possibly containing many paragraphs). 4) Describe the
- context of your commit (the problem it is trying to solve) as much as
- possible, then go onto how you solved it. One suggestion is to start
- the main body of your commit with "Until now ...", and continue
- describing the problem in the first paragraph(s). Afterwards, start
- the next paragraph with "With this commit ...".
-
- - *Tags*: To help manage the history, tag all major commits. This helps
- make a more human-friendly output of `git describe`: for example
- `v1-4-gaafdb04` states that we are on commit `aafdb04` which is 4
- commits after tag `v1`. The output of `git describe` is included in
- your final PDF as part of this project. Also, if you use
- reproducibility-friendly software like Gnuastro, this value will also
- be included in all output files, see the description of `COMMIT` in
- [Output
- headers](https://www.gnu.org/software/gnuastro/manual/html_node/Output-headers.html).
- In the checklist above, you tagged the first commit of your project
- with `v0`. Here is one suggestion on when to tag: when you have fully
- adopted the template and have got the first (initial) results, you
- can make a `v1` tag. Subsequently when you first start reporting the
- results to your colleagues, you can tag the commit as `v2` and
- increment the version on every later circulation, or referee
- submission.
+ like. You can see many examples by running `git log` in the `maneage`
+ branch. If you intend to push commits to Maneage, for the consistency
+ of Maneage, it is necessary to follow these guidelines. 1) No line
+ should be more than 75 characters (to enable easy reading of the
+ message when you run `git log` on the standard 80-character
+ terminal). 2) The first line is the title of the commit and should
+ summarize it (so `git log --oneline` can be useful). The title should
+ also not end with a point (`.`, because it's a short single sentence,
+ so a point is not necessary and only wastes space). 3) After the
+ title, leave an empty line and start the body of your message
+ (possibly containing many paragraphs). 4) Describe the context of
+ your commit (the problem it is trying to solve) as much as possible,
+ then go onto how you solved it. One suggestion is to start the main
+ body of your commit with "Until now ...", and continue describing the
+ problem in the first paragraph(s). Afterwards, start the next
+ paragraph with "With this commit ...".
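The first three guidelines above are mechanical, so they can be checked automatically. The function below is a minimal sketch under stated assumptions: the name `check_commit_msg` is hypothetical (it is not part of Maneage), it takes the path of a file containing the commit message, and it stops at the first kind of violation it finds. Guideline 4 (describing the context) still needs a human reader.

```shell
# Hypothetical helper (not part of Maneage): check a commit message file
# against guidelines 1 to 3. Prints "ok" and returns 0 when all pass,
# otherwise prints the first problem found and returns 1.
check_commit_msg () {
    msg="$1"

    # 1) No line should be longer than 75 characters.
    if awk 'length($0) > 75 { found=1 } END { exit !found }' "$msg"; then
        echo "a line is longer than 75 characters"
        return 1
    fi

    # 2) The title (first line) should not end with a point.
    title=$(head -n 1 "$msg")
    case "$title" in
        *.) echo "title ends with a point"; return 1 ;;
    esac

    # 3) The line after the title (if any) should be empty.
    second=$(sed -n '2p' "$msg")
    if [ -n "$second" ]; then
        echo "second line is not empty"
        return 1
    fi

    echo "ok"
}
```

Git passes the path of the message file as the first argument to the `commit-msg` hook, so one option is to call this function from `.git/hooks/commit-msg` with `"$1"`.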
- *Project outputs*: During your research, it is possible to checkout a
specific commit and reproduce its results. However, the processing
@@ -1212,16 +1201,15 @@ for the benefit of others.
communications). After the research is published, you can also
release the outputs repository, or you can just delete it if it is
too large or un-necessary (it was just for convenience, and fully
- reproducible after all). For example this template's output is
- available for demonstration in the separate
- [reproducible-paper-output](https://gitlab.com/makhlaghi/reproducible-paper-output)
- repository.
+ reproducible after all). For example Maneage's output is available
+ for demonstration in [a
+ separate](http://git.maneage.org/output-raw.git/) repository.
- *Full Git history in one file*: When you are publishing your project
(for example to Zenodo for long term preservation), it is more
convenient to have the whole project's Git history into one file to
- save with your datasets. Afterall, you can't be sure that your
- current Git server (for example Gitlab, Github, or Bitbucket) will be
+ save with your datasets. After all, you can't be sure that your
+ current Git server (for example GitLab, GitHub, or Bitbucket) will be
active forever. While they are good for the immediate future, you
can't rely on them for archival purposes. Fortunately keeping your
whole history in one file is easy with Git using the following
@@ -1236,7 +1224,7 @@ for the benefit of others.
```
- You can easily upload `my-project-git.bundle` anywhere. Later, if
- you need to unbundle it, you can use the following command.
+ you need to un-bundle it, you can use the following command.
```shell
$ git clone my-project-git.bundle
@@ -1261,13 +1249,13 @@ future are listed below, please join us if you are interested.
Package management
------------------
-It is important to have control of the environment of the project. The
-current template builds the higher-level programs (for example GNU Bash,
-GNU Make, GNU AWK and domain-specific software) it needs, then sets `PATH`
-so the analysis is done only with the project's built software. But
-currently the configuration of each program is in the Makefile rules that
-build it. This is not good because a change in the build configuration does
-not automatically cause a re-build. Also, each separate project on a system
+It is important to have control of the environment of the project. Maneage
+currently builds the higher-level programs (for example GNU Bash, GNU Make,
+GNU AWK and domain-specific software) it needs, then sets `PATH` so the
+analysis is done only with the project's built software. But currently the
+configuration of each program is in the Makefile rules that build it. This
+is not good because a change in the build configuration does not
+automatically cause a re-build. Also, each separate project on a system
needs to have its own built tools (that can waste a lot of space).
A good solution is based on the [Nix package
@@ -1346,18 +1334,17 @@ order of operations: this is contrary to the scientific spirit.
Copyright information
---------------------
-This file is part of the reproducible paper template
- http://savannah.nongnu.org/projects/reproduce
+This file is part of Maneage's core: https://git.maneage.org/project.git
-This template is free software: you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the Free
+Maneage is free software: you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option)
any later version.
-This template is distributed in the hope that it will be useful, but
-WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
-or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-more details.
+Maneage is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+details.
You should have received a copy of the GNU General Public License along
-with Template. If not, see <https://www.gnu.org/licenses/>.
+with Maneage. If not, see <https://www.gnu.org/licenses/>.