author Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-04-02 15:51:28 +0100
committer Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-04-02 16:05:03 +0100
commit de3842a14e36b2debb3ad375d95411d15d45dc84
tree 86ae571f85c639c616cf8939cbcec7f1c1be4c9c
parent 646756675566a0907edf143c6b6950e0479d9e7e
parent cbf177e09af6b9d240388d148b0cb5e3488d8b09
Imported recent work on Maneage, minor conflicts fixed
A few minor conflicts occurred and were fixed.
Diffstat (limited to 'README-hacking.md')
-rw-r--r-- README-hacking.md 176
1 files changed, 87 insertions, 89 deletions
diff --git a/README-hacking.md b/README-hacking.md
index 5e202b3..82b40e3 100644
--- a/README-hacking.md
+++ b/README-hacking.md
@@ -1,7 +1,8 @@
Reproducible paper template
===========================
-Copyright (C) 2018-2020 Mohammad Akhlaghi <mohammad@akhlaghi.org>
+Copyright (C) 2018-2020 Mohammad Akhlaghi <mohammad@akhlaghi.org>\
+Copyright (C) 2020 Raul Infante-Sainz <infantesainz@gmail.com>\
See the end of the file for license conditions.
This project contains a **fully working template** for doing reproducible
@@ -176,7 +177,7 @@ evolving, so some details may be different in them. The more recent ones
can be used as a good working example besides the default template.
- Infante-Sainz et
- al. ([2019](https://ui.adsabs.harvard.edu/abs/2019MNRAS.tmp.2729I),
+ al. ([2020](https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.5317I),
MNRAS, 491, 5317): The version controlled project source is available
[on GitLab](https://gitlab.com/infantesainz/sdss-extended-psfs-paper)
and is also archived on Zenodo with all the necessary software tarballs:
@@ -279,28 +280,20 @@ example if you run `.local/bin/ls` you will be using the `ls` of the
template, which is probably different from your system's `ls` (run them
both with `--version` to check).
-Once the project is configured for your system, `./project prepare` and
-`./project make` will do the basic preparations and run the project's
-analysis with the custom version of software. The `project` script is just
-a wrapper, and with the commands above, it will call `top-prepare.mk` and
-`top-make.mk` (both are in the `reproduce/analysis/make` directory).
-
-In the template, no particular preparation is necessary, so it will
-immediately finish and instruct you to run `./project make`. But in some
-projects, it can be very useful to do some very basic preparatory steps on
-the input data that can greatly optimize running of `./project make`. For
-example, you may need to query a server, to find how many input files there
-are. Once that number is known in the preparation phase, `./project make`
-can parallelize the analysis much more effectively.
+Once the project is configured for your system, `./project make` will do
+the basic preparations and run the project's analysis with the custom
+version of software. The `project` script is just a wrapper, and with the
+`make` argument, it will first call `top-prepare.mk` and then `top-make.mk`
+(both are in the `reproduce/analysis/make` directory).
In terms of organization, `top-prepare.mk` and `top-make.mk` have an
-identical design, only a minor difference. So, let's continue the
-template's architecture with `top-make.mk`. Once you understand that,
-you'll clearly understand `top-prepare.mk` also. These very high-level
-files are relatively short and heavily commented so hopefully the
-descriptions in each comment will be enough to understand the general
-details. As you read this section, please also look at the contents of the
-mentioned files and directories to fully understand what is going on.
+identical design, with only minor differences. So, let's continue the template's
+architecture with `top-make.mk`. Once you understand that, you'll clearly
+understand `top-prepare.mk` also. These very high-level files are
+relatively short and heavily commented so hopefully the descriptions in
+each comment will be enough to understand the general details. As you read
+this section, please also look at the contents of the mentioned files and
+directories to fully understand what is going on.
Before starting to look into the top `top-make.mk`, it is important to
recall that Make defines dependencies by files. Therefore, the
@@ -328,17 +321,17 @@ variables/configurations) and _workhorse-Makefiles_ (Makefiles that
actually contain analysis/processing rules).
The configuration-Makefiles are those that satisfy these two wildcards:
-`reproduce/software/config/installation/*.mk` (for building the necessary
+`reproduce/software/config/installation/*.conf` (for building the necessary
software when you run `./project configure`) and
-`reproduce/analysis/config/*.mk` (for the high-level analysis, when you run
-`./project make`). These Makefiles don't actually have any rules, they just
-have values for various free parameters throughout the configuration or
-analysis. Open a few of them to see for yourself. These Makefiles must only
-contain raw Make variables (project configurations). By "raw" we mean that
-the Make variables in these files must not depend on variables in any other
-configuration-Makefile. This is because we don't want to assume any order
-in reading them. It is also very important to *not* define any rule, or
-other Make construct, in these configuration-Makefiles.
+`reproduce/analysis/config/*.conf` (for the high-level analysis, when you
+run `./project make`). These Makefiles don't actually have any rules, they
+just have values for various free parameters throughout the configuration
+or analysis. Open a few of them to see for yourself. These Makefiles must
+only contain raw Make variables (project configurations). By "raw" we mean
+that the Make variables in these files must not depend on variables in any
+other configuration-Makefile. This is because we don't want to assume any
+order in reading them. It is also very important to *not* define any rule,
+or other Make construct, in these configuration-Makefiles.
Following this rule-of-thumb enables you to set these configure-Makefiles
as a prerequisite to any target that depends on their variable
@@ -379,8 +372,8 @@ Let's see how this design is implemented. Please open and inspect
`top-make.mk` as we go along here. The first step (un-commented line) is
to import the local configuration (your answers to the questions of
`./project configure`). They are defined in the configuration-Makefile
-`reproduce/software/config/installation/LOCAL.mk` which was also built by
-`./project configure` (based on the `LOCAL.mk.in` template of the same
+`reproduce/software/config/installation/LOCAL.conf` which was also built by
+`./project configure` (based on the `LOCAL.conf.in` template of the same
directory).
The next non-commented set of the top `Makefile` defines the ultimate
@@ -444,7 +437,7 @@ project is designed to grow in this framework.
File modification dates (meta data)
-----------------------------------
-While git does an excellent job at keeping a history of the contents of
+While Git does an excellent job at keeping a history of the contents of
files, it makes no effort in keeping the file meta data, and in particular
the dates of files. Therefore when you checkout to a different branch,
files that are re-written by Git will have a newer date than the other
@@ -492,7 +485,7 @@ mind are listed below.
- Do not use any constant numbers (or important names like filter names)
in the workhorse-Makefiles or paper's LaTeX source. Define such
constants as logically-grouped, separate configuration-Makefiles in
- `reproduce/analysis/config/XXXXX.mk`. Then set this
+ `reproduce/analysis/config/XXXXX.conf`. Then set this
configuration-Makefiles file as a pre-requisite to any rule that uses
the variable defined in it.
@@ -551,6 +544,27 @@ First custom commit
$ git remote rename origin template-origin # Rename current/only remote to "template-origin".
$ git branch -m template # Rename current/only branch to "template".
$ git checkout -b master # Create and enter new "master" branch.
+ $ pwd # Just to confirm where you are.
+ ```
+
+ - **Prepare to build project**: The `./project configure` command of the
+ next step will build the different software packages within the
+ "build" directory (that you will specify). Nothing else on your system
+ will be touched. However, since the build takes a long time, it is
+ useful to see what is being built at every instant (it's almost
+ impossible to tell from the torrent of commands that are produced!).
+ So open another terminal on your desktop and navigate to the same
+ project directory that you cloned (output of the last command above).
+ Then run the following command. Once every second, it will just print
+ the date (possibly followed by a non-existent directory notice). But
+ as soon as the next step starts building software, you'll see the
+ names of software get printed as they are being built. Once a package
+ is installed in the project build directory, its name is removed from
+ the listing. Again, nothing is installed outside the build directory.
+
+ ```shell
+ # On another terminal (go to top project directory)
+ $ ./project --check-config
```
- **Test the template**: Before making any changes, it is important to
@@ -566,22 +580,11 @@ First custom commit
```shell
$ ./project configure # Build the project's software environment (can take an hour or so).
- $ ./project prepare # Pre-processing preparations (doing nothing in the raw template).
$ ./project make # Do the processing and build paper (just a simple demo in the template).
# Open 'paper.pdf' and see if everything is ok.
```
- - **Software building status**: While the `./project configure` command of
- the step above is busy building all the different software, you can
- check the status by running the following command in another terminal
- (but same project source directory). See the "Inspecting status"
- section below for more.
-
- ```shell
- $ while true; do echo; date; ls .build/software/build-tmp; sleep 1; done
- ```
-
- **Setup the remote**: You can use any [hosting
facility](https://en.wikipedia.org/wiki/Comparison_of_source_code_hosting_facilities)
that supports Git to keep an online copy of your project's version
@@ -637,7 +640,8 @@ First custom commit
- `reproduce/analysis/make/top-make.mk`: Delete the `delete-me` line
in the `makesrc` definition. Just make sure there is no empty line
- between the `download \` and `verify \` lines.
+ between the `download \` and `verify \` lines (they should be
+ directly under each other).
- `reproduce/analysis/make/verify.mk`: In the final recipe, under the
commented line `Verify TeX macros`, remove the full line that
@@ -654,7 +658,7 @@ First custom commit
```
- Disable verification of outputs by removing the `yes` from
- `reproduce/analysis/config/verify-outputs.mk`. Later, when you are
+ `reproduce/analysis/config/verify-outputs.conf`. Later, when you are
ready to submit your paper, or publish the dataset, activate
verification and make the proper corrections in this file (described
under the "Other basic customizations" section below). This is a
@@ -685,7 +689,7 @@ First custom commit
$ echo "tex/src/delete-me.mk merge=ours" >> .gitattributes
$ echo "tex/src/delete-me-demo.mk merge=ours" >> .gitattributes
$ echo "reproduce/analysis/make/delete-me.mk merge=ours" >> .gitattributes
- $ echo "reproduce/analysis/config/delete-me-num.mk merge=ours" >> .gitattributes
+ $ echo "reproduce/analysis/config/delete-me-num.conf merge=ours" >> .gitattributes
$ git add .gitattributes
```
@@ -710,6 +714,19 @@ First custom commit
Copyright (C) 2020 YOUR NAME <YOUR@EMAIL.ADDRESS>
```
+ - **Configure Git for the first time**: If you have never used Git, you
+ have to configure it with some basic information that it records in
+ every commit (ignore this step if you have already done it). Git will
+ include your name and e-mail address in each commit. You can also
+ specify your favorite text editor for writing the commit message
+ (`emacs`, `vim`, etc.).
+
+ ```shell
+ $ git config --global user.name "YourName YourSurname"
+ $ git config --global user.email your-email@example.com
+ $ git config --global core.editor vim
+ ```
+
- **Your first commit**: You have already made some small and basic
changes in the steps above and you are in the `master` branch. So, you
can officially make your first commit in your project's history. But
@@ -750,29 +767,29 @@ Other basic customizations
- **High-level software**: The template installs all the software that
your project needs. You can specify which software your project needs
- in `reproduce/software/config/installation/TARGETS.mk`. The necessary
- software are classified into two classes: 1) programs or libraries
- (usually written in C/C++) which are run directly by the operating
- system. 2) Python modules/libraries that are run within Python. By
- default `TARGETS.mk` only has GNU Astronomy Utilities (Gnuastro) as
- one scientific program and Astropy as one scientific Python
- module. Both have many dependencies which will be installed into your
- project during the configuration step. To see a list of software that
- are currently ready to be built in the template, see
- `reproduce/software/config/installation/versions.mk` (which has their
- versions also), the comments in `TARGETS.mk` describe how to use the
- software name from `versions.mk`. Currently the raw pipeline just uses
- Gnuastro to make the demonstration plots. Therefore if you don't need
- Gnuastro, go through the analysis steps in `reproduce/analysis` and
- remove all its use cases (clearly marked).
+ in `reproduce/software/config/installation/TARGETS.conf`. The
+ necessary software are classified into two classes: 1) programs or
+ libraries (usually written in C/C++) which are run directly by the
+ operating system. 2) Python modules/libraries that are run within
+ Python. By default `TARGETS.conf` only has GNU Astronomy Utilities
+ (Gnuastro) as one scientific program and Astropy as one scientific
+ Python module. Both have many dependencies which will be installed
+ into your project during the configuration step. To see a list of
+ software that are currently ready to be built in the template, see
+ `reproduce/software/config/installation/versions.conf` (which has
+ their versions also). The comments in `TARGETS.conf` describe how to use
+ the software name from `versions.conf`. Currently the raw pipeline just
+ uses Gnuastro to make the demonstration plots. Therefore if you don't
+ need Gnuastro, go through the analysis steps in `reproduce/analysis`
+ and remove all its use cases (clearly marked).
- **Input dataset**: The input datasets are managed through the
- `reproduce/analysis/config/INPUTS.mk` file. It is best to gather all
+ `reproduce/analysis/config/INPUTS.conf` file. It is best to gather all
the information regarding all the input datasets into this one central
file. To ensure that the proper dataset is being downloaded and used
by the project, it is also recommended to get an [MD5
checksum](https://en.wikipedia.org/wiki/MD5) of the file and include
- that in `INPUTS.mk` so the project can check it automatically. The
+ that in `INPUTS.conf` so the project can check it automatically. The
preparation/downloading of the input datasets is done in
`reproduce/analysis/make/download.mk`. Have a look there to see how
these values are to be used. This information about the input datasets
@@ -806,7 +823,7 @@ Other basic customizations
in the project, it will stop and print the problematic file and its
expected and calculated checksums. First set the value of
`verify-outputs` variable in
- `reproduce/analysis/config/verify-outputs.mk` to `yes`. Then go to
+ `reproduce/analysis/config/verify-outputs.conf` to `yes`. Then go to
`reproduce/analysis/make/verify.mk`. The verification of all the files
is only done in one recipe. First the files that go into the
plots/figures are checked, then the LaTeX macros. Validation of the
@@ -956,8 +973,8 @@ for the benefit of others.
- *Environment of each recipe*: If you need to define a special
environment (or aliases, or scripts to run) for all the recipes in
- your Makefiles, you can use the Bash startup file
- `reproduce/software/bash/bashrc.sh`. This file is loaded before every
+ your Makefiles, you can use a Bash startup file
+ `reproduce/software/shell/bashrc.sh`. This file is loaded before every
Make recipe is run, just like the `.bashrc` in your home directory is
loaded every time you start a new interactive, non-login terminal. See
the comments in that file for more.
@@ -1226,25 +1243,6 @@ for the benefit of others.
$ git clone my-project-git.bundle
```
- - **Inspecting software building status**: When you run `./project
- configure`, several programs and libraries start to get configured and
- build (in many cases, simultaneously). To understand the building
- process, or for debugging a strange situation, it is sometimes useful
- to know which programs are being built at every moment. To do this,
- you can look into the `.build/software/build-tmp` directory (from the
- top project directory). This temporary directory is only present while
- building the software. At every moment, it contains the unpacked
- source tarball directories of the all the packages that are being
- built. After a software is successfully installed in your project, it
- is removed from this directory. To automatically get a listing of this
- directory every second, you can run the command below (on another
- terminal while the software are being built). Press `CTRL-C` to stop
- it and return back to the command-line).
-
- ```shell
- $ while true; do echo; date; ls .build/software/build-tmp; sleep 1; done
- ```
-
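
Most hunks in this patch rename configuration files from `*.mk` to `*.conf` in the documentation, while workhorse-Makefiles such as `top-make.mk`, `download.mk` and `verify.mk` keep their `.mk` suffix. A project that branched from the template before this commit may still carry its own configuration files under the old names. That is an assumption here; the commit itself does not say so. The sketch below shows one way such a rename could be done. It works on a scratch directory, and the two stand-in files and their contents are hypothetical.

```shell
#!/bin/sh
# Sketch only: demonstrates a '.mk' -> '.conf' rename on a scratch tree.
# The scratch directory and both stand-in files are hypothetical.
set -eu

scratch=$(mktemp -d)
confdir="$scratch/reproduce/analysis/config"
mkdir -p "$confdir"

# Two stand-in configuration files (raw Make variables only, no rules).
printf 'verify-outputs = yes\n' > "$confdir/verify-outputs.mk"
printf 'demo-number = 3\n'      > "$confdir/demo-num.mk"

# Rename every '*.mk' file in the configuration directory to '*.conf'.
# Workhorse-Makefiles live in a different directory and are not touched.
for f in "$confdir"/*.mk; do
    mv "$f" "${f%.mk}.conf"
done

# List the result so the new names can be checked by eye.
ls "$confdir"
```

Inside a real Git working tree, `git mv` in place of `mv` would record the rename in the index. Any Makefile that includes the old names would also need updating.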