-rw-r--r-- .file-metadata    bin 4005 -> 3947 bytes
-rw-r--r-- .gitignore        2
-rw-r--r-- README-hacking.md 146
-rwxr-xr-x configure         288
4 files changed, 157 insertions, 279 deletions
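
Note on the `configure` changes: the largest part of this commit replaces the option parser so that both `--name=value` and `--name value` are accepted. The pattern can be sketched as follows. This is a reduced, hypothetical version (only two options, a made-up `scriptname`, and `return` instead of `exit` so it can be called repeatedly), not the script's actual code.

```shell
#!/bin/bash
# Reduced sketch of dual-format option parsing. The names 'scriptname' and
# 'parse_opts' are assumptions for this example only.
scriptname=configure-sketch
jobs=0
build_dir=

# Complain if an option that needs a value did not get one.
check_v() {
    if [ x"$2" = x ]; then
        echo "$scriptname: option '$1' requires an argument." >&2
        return 1
    fi
}

parse_opts() {
    while [ $# -gt 0 ]; do
        case "$1" in
            # '--name=value' is ONE argument: strip everything up to the
            # first '=' and shift once.
            -b=*|--build-dir=*) build_dir="${1#*=}"
                                check_v "$1" "$build_dir" || return 1
                                shift;;
            # '--name value' is TWO arguments: shift twice.
            -b|--build-dir)     build_dir="$2"
                                check_v "$1" "$build_dir" || return 1
                                shift; shift;;
            -j=*|--jobs=*)      jobs="${1#*=}"
                                check_v "$1" "$jobs" || return 1
                                shift;;
            -j|--jobs)          jobs="$2"
                                check_v "$1" "$jobs" || return 1
                                shift; shift;;
            # Anything else starting with '-' is an unknown option.
            -*) echo "$scriptname: unknown option '$1'" >&2; return 1;;
            # Plain arguments are not accepted.
            *)  echo "$scriptname: arguments are not accepted" >&2; return 1;;
        esac
    done
}

# Both formats can be mixed on the same command line.
parse_opts --build-dir=/tmp/build -j 4
echo "build_dir=$build_dir jobs=$jobs"
```

The `=*` patterns are listed before the bare option names only for readability; since `-b` never matches `-b=...`, the order of the two patterns of one option does not matter.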
diff --git a/.file-metadata b/.file-metadata
index 9dd95e7..eb42414 100644
--- a/.file-metadata
+++ b/.file-metadata
Binary files differ
diff --git a/.gitignore b/.gitignore
index e6dc938..2808dc1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -30,6 +30,7 @@ mmap_*
.tex
build
.local
+.build
Makefile
tex/tikz
.gnuastro
@@ -38,7 +39,6 @@ tex/tikz
tex/pipeline
LOCAL_tmp.mk
LOCAL_old.mk
-reproduce/build
gnuastro-local.conf
reproduce/config/pipeline/LOCAL.mk
diff --git a/README-hacking.md b/README-hacking.md
index 62ba01e..ed9f153 100644
--- a/README-hacking.md
+++ b/README-hacking.md
@@ -254,29 +254,38 @@ will use). Therefore, the pipeline builds its own dependencies during the
`reproduce/src/make/dependencies-basic.mk` and
`reproduce/src/make/dependencies.mk`. These Makefiles are called by the
`./configure` script and not used afterwards. The first is intended for
-downloading and building the most basic tools like GNU Bash, GNU Make, and
-GNU Tar. Therefore it must only contain very basic and portable Make and
-shell features. The second is called after the first, thus enabling usage
-of the modern and advanced features of GNU Bash and GNU Make, similar to
-the rest of the pipeline. Later, if you add a new program/library for your
-research, you will need to include a rule on how to download and build it
-(in `reproduce/src/make/dependencies.mk`).
-
-After it finishes, `./configure` will create a `Makefile` in the top
-directory (a symbolic link to `reproduce/src/make/top.mk`) and a `.local`
-directory (a link for easy access to the custom built software
-packages). The `.local/bin/make` command will then use our custom version
-of GNU Make to do the analysis. The first file that is read by Make is the
-top-level `Makefile`. Therefore, we'll start our navigation/discussion with
-this file. This file is relatively short and heavily commented so hopefully
-the descriptions in each comment will be enough to understand the general
-details. As you read this section, please also look at the contents of the
-mentioned files and directories to fully understand what is going on.
+downloading and building the most basic tools like GNU Tar, GNU Bash, GNU
+Make, and the GNU Compiler Collection (GCC). Therefore it must only contain
+very basic and portable Make and shell features. The second is called after
+the first, thus enabling usage of the modern and advanced features of GNU
+Bash, GNU Make and other low-level GNU tools, similar to the rest of the
+pipeline. Later, if you add a new program/library for your research, you
+will need to include a rule on how to download and build it (in
+`reproduce/src/make/dependencies.mk`).
+
+After it finishes, `./configure` will create the following symbolic links
+in the project's top source directory: 1) `Makefile`, which points to
+`reproduce/src/make/top.mk`, 2) `.build`, which points to the top build
+directory, and 3) `.local`, for easy access to the installation directory
+of the custom built software packages. The first is a practical necessity
+(so you can run `make` from the top source directory), while the latter
+two are just for convenience (fast access to the built products and
+software).
+
+Therefore, by running `.local/bin/make` we will build the project with the
+project's custom version of GNU Make, not the host system's Make. The first
+file that is read by Make (the template's starting point) is the top-level
+`Makefile` (also created by `./configure`). Therefore, we'll start
+describing the template's architecture with this file. This file is
+relatively short and heavily commented so hopefully the descriptions in
+each comment will be enough to understand the general details. As you read
+this section, please also look at the contents of the mentioned files and
+directories to fully understand what is going on.
Before starting to look into the top `Makefile`, it is important to recall
that Make defines dependencies by files. Therefore, the input/prerequisite
and output of every step/rule must be a file. Also recall that Make will
-use the modification date of the prerequisite and target files to see if
+use the modification date of the prerequisite(s) and target files to see if
the target must be re-built or not. Therefore during the processing, _many_
intermediate files will be created (see the tips section below on a good
strategy to deal with large/huge files).
@@ -287,29 +296,31 @@ intermediate files (it was defined in `./configure`). This directory
doesn't need to be version controlled or even synchronized, or backed-up in
other servers: its contents are all products of the pipeline, and can be
easily re-created any time. As you define targets for your new rules, it is
-thus important to place them all under sub-directories of `$(BDIR)`.
+thus important to place them all under sub-directories of `$(BDIR)`. As
+mentioned above, you always have fast access to this "build"-directory with
+the `.build` symbolic link.
In this architecture, we have two types of Makefiles that are loaded into
the top `Makefile`: _configuration-Makefiles_ (only independent
variables/configurations) and _workhorse-Makefiles_ (Makefiles that
-actually contain rules).
+actually contain analysis/processing rules).
The configuration-Makefiles are those that satisfy this wildcard:
`reproduce/config/pipeline/*.mk`. These Makefiles don't actually have any
rules, they just have values for various free parameters throughout the
-analysis/processing. Open a few of them to see for your self. These
+analysis/processing. Open a few of them to see for yourself. These
Makefiles must only contain raw Make variables (pipeline
-configurations). By raw we mean that the Make variables in these files must
-not depend on variables in any other configuration-Makefile. This is
+configurations). By "raw" we mean that the Make variables in these files
+must not depend on variables in any other configuration-Makefile. This is
because we don't want to assume any order in reading them. It is also very
-important to *not* define any rule, or other Make construct in any of these
+important to *not* define any rule, or other Make construct, in these
configuration-Makefiles.
-These conditions will enable you to set these configure-Makefiles as a
-prerequisite to any target that depends on their variable
-values. Therefore, if you change any of their values, all targets that
-depend on those values will be re-built. This is very convenient as your
-project scales up and gets more complex.
+This enables you to set these configuration-Makefiles as a prerequisite to
+any target that depends on their variable values. Therefore, if you change
+any of their values, all targets that depend on those values will be
+re-built. This is very convenient as your project scales up and gets more
+complex.
The workhorse-Makefiles are those satisfying this wildcard
`reproduce/src/make/*.mk`. They contain the details of the processing steps
@@ -320,17 +331,16 @@ other rules that will be defined prior to them (not a fixed name like
higher-level ones.
All processing steps are assumed to ultimately (usually after many rules)
-end up in some number, image, figure, or table that are to be included in
-the paper. The writing of these results into the final report/paper is
-managed through separate LaTeX files that only contain macros (a name given
-to a number/string to be used in the LaTeX source, which will be replaced
-when compiling it to the final PDF). So the last target in a
-workhorse-Makefile is a `.tex` file (with the same base-name as the
-Makefile, but in `$(BDIR)/tex/macros`). As a result, if the targets in a
-workhorse-Makefile aren't directly a prerequisite of other
-workhorse-Makefile targets, they can be a pre-requisite of that
-intermediate LaTeX macro file and thus be called when necessary. Otherwise,
-they will be ignored by Make.
+end up in some number, image, figure, or table that will be included in the
+paper. The writing of these results into the final report/paper is managed
+through separate LaTeX files that only contain macros (a name given to a
+number/string to be used in the LaTeX source, which will be replaced when
+compiling it to the final PDF). So the last target in a workhorse-Makefile
+is a `.tex` file (with the same base-name as the Makefile, but in
+`$(BDIR)/tex/macros`). As a result, if the targets in a workhorse-Makefile
+aren't directly a prerequisite of other workhorse-Makefile targets, they
+can be a prerequisite of that intermediate LaTeX macro file and thus be
+called when necessary. Otherwise, they will be ignored by Make.
This pipeline also has a mode to share the build directory between several
users of a Unix group (when working on large computer clusters). In this
@@ -339,16 +349,15 @@ the large built files between each other. To do this, it is necessary for
all built files to give full permission to group members while not allowing
any other users access to the contents. Therefore the `./configure` and
Make steps must be called with special conditions which are managed in the
-`for-group` file.
+`for-group` script.
-Let's see how this design is implemented. When the `./configure` finishes,
-it a `Makefile` will be placed in the top directory. This `Makefile` is
-just a symbolic link to `reproduce/src/make/top.mk`. Please open and
-inspect it as we go along here. The first step (un-commented line) is to
-import the local configuration (answers to the questions `./configure`
-asked you). They are defined in the configuration-Makefile
-`reproduce/config/pipeline/LOCAL.mk` which was also built by `./configure`
-(based on the `LOCAL.mk.in` template).
+Let's see how this design is implemented. When `./configure` finishes, it
+creates a `Makefile` in the top directory, which allows us to start
+"making" the project. Please open and inspect it as we go along here. The
+first step (un-commented line) is to import the local configuration
+(answers to the questions `./configure` asked you). They are defined in the
+configuration-Makefile `reproduce/config/pipeline/LOCAL.mk` which was also
+built by `./configure` (based on the `LOCAL.mk.in` template).
The next non-commented set of lines define the ultimate target of the whole
pipeline (`paper.pdf`). But to avoid mistakes, a sanity check is necessary
@@ -358,14 +367,8 @@ the `./for-group` script, but Make isn't). Therefore we use a Make
conditional to define the `all` target based on the group permissions being
consistent between the initial configuration and the current run.
-If there is a problem `all` will not depend on anything and will just print
-a warning to inform you of the problem. When the group conditions are fine,
-`all` will depend on `paper.pdf` (which is defined in
-`reproduce/src/make/paper.mk` and will be imported into this top Makefile
-later).
-
Having defined the top target, our next step is to include all the other
-necessary Makefiles. But order matters in the importing of
+necessary Makefiles. However, order matters in the importing of
workhorse-Makefiles and each must also have a TeX macro file with the same
base name (without a suffix). Therefore, the next step in the top-level
Makefile is to define a `makesrc` variable to keep the base names (without
@@ -389,15 +392,16 @@ your rules into as many logically-similar but independent steps as
possible.
The `reproduce/src/make/paper.mk` Makefile must be the final Makefile that
-is included. It ends with the rule to build `paper.pdf` (final target of
-the whole reproduction pipeline). If look in it, you will notice that it
-starts with a rule to create `$(mtexdir)/pipeline.tex` (`mtexdir` is just a
-shorthand name for `$(BDIR)/tex/macros` mentioned before).
-`$(mtexdir)/pipeline.tex` is the connection between the processing/analysis
-steps of the pipeline, and the steps to build the final PDF. As you see,
-`$(mtexdir)/pipeline.tex` only instruct LaTeX to import the LaTeX macros of
-each high-level processing step during the analysis (the separate
-work-horse Makefiles that you defined and included).
+is included. This workhorse Makefile ends with the rule to build
+`paper.pdf` (final target of the whole reproduction pipeline). If you look
+in it, you will notice that it starts with a rule to create
+`$(mtexdir)/pipeline.tex` (`mtexdir` is just a shorthand name for
+`$(BDIR)/tex/macros` mentioned before). `$(mtexdir)/pipeline.tex` is the
+connection between the processing/analysis steps of the pipeline, and the
+steps to build the final PDF. As you see, `$(mtexdir)/pipeline.tex` only
+instructs LaTeX to import the LaTeX macros of each high-level processing
+step during the analysis (the separate workhorse-Makefiles that you
+defined and included).
During the research, it often happens that you want to test a step that is
not a prerequisite of any higher-level operation. In such cases, you can
@@ -408,16 +412,6 @@ your research, set it as prerequisites to other rules and remove it from
the list of prerequisites for TeX macro file. In fact, this is how a
project is designed to grow in this framework.
-When working within a group, more than one person may want to work with the
-pipeline outputs (in the build directory). For example each person is
-developing part of the higher-level steps of the pipeline in their own Git
-branch of the pipeline, but using the same build directory. Therefore, the
-lower-level parts of the built outputs, can be shared between them. In such
-scenarios, this pipeline comes with a `for-group` script (in the top
-directory) which is just a simple wrapper to run the configure and building
-steps. You can specify a group name within this file. Therefore, when you
-use it (fully described in the comments at the start of the file), it will
-ensure that all group members have write access to the created files.
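
Note on the README-hacking.md changes above: the three symbolic links that `./configure` leaves in the top source directory can be illustrated with the minimal sketch below. It is not the actual `./configure` code. The temporary directory, the variable names `srcdir`, `bdir` and `instdir`, and the `dependencies/installed` layout are assumptions made only for this example.

```shell
#!/bin/bash
# Hypothetical stand-ins for the real source, build and install directories.
top=$(mktemp -d)
srcdir="$top/source"
bdir="$top/build"
instdir="$bdir/dependencies/installed"     # Assumed layout.
mkdir -p "$srcdir/reproduce/src/make" "$instdir/bin"
touch "$srcdir/reproduce/src/make/top.mk"

cd "$srcdir"

# 1) Needed in practice: lets 'make' find its rules from the top directory.
ln -s reproduce/src/make/top.mk Makefile

# 2) Convenience: fast access to the built products.
ln -s "$bdir" .build

# 3) Convenience: fast access to the custom built software.
ln -s "$instdir" .local

# Show where each link points.
for link in Makefile .build .local; do
    echo "$link -> $(readlink "$link")"
done
```

Because `.build` and `.local` only point outside the source directory, deleting them never touches the build products; this is also why this commit adds `.build` to `.gitignore`.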
diff --git a/configure b/configure
index 3e0a7f0..14be525 100755
--- a/configure
+++ b/configure
@@ -22,9 +22,6 @@
# <http://www.gnu.org/licenses/>.
-
-
-
# Script settings
# ---------------
# Stop the script if there are any errors.
@@ -34,58 +31,33 @@ set -e
-# Output of --help
-# ----------------
-me=$0 # Executable file name.
-help_print() {
-
- if [ x"$build_dir" = x ]; then
- bdir_status="NOT SET"
- else
- bdir_status="$build_dir"
- fi
-
- if [ x"$input_dir" = x ]; then
- indir_status="NOT SET"
- else
- indir_status="$input_dir"
- fi
+# Default option values
+jobs=0
+build_dir=
+input_dir=
+software_dir=
+existing_conf=0
+minmapsize=10000000000
- if [ x"$software_dir" = x ]; then
- software_status="NOT SET"
- else
- software_status="$software_dir"
- fi
- if [ $in_minmapsize = 0 ]; then
- mm_status="NOT SET"
- else
- mm_status="$in_minmapsize"
- fi
- if [ $jobs = "0" ]; then
- jobs_status="NUMBER OF THREADS ON SYSTEM"
- else
- jobs_status=$jobs
- fi
- if [ $existing_conf = 1 ]; then
- ec_status="ACTIVATED"
- else
- ec_status="NOT SET"
- fi
+# Output of --help
+# ----------------
+me=$0 # Executable file name.
+print_help() {
# Print the output.
cat <<EOF
Usage: $me [OPTION]...
-Configure the reproducible paper template for this system (do local
-settings). The local settings can be given on the command-line through the
-options below. If not, the configure script will interactively ask for a
-value to each one (with basic necessary background information printed
-before them). Alternatively, if you have already configured this script for
-your system, you can use the '--existing-conf' to use it and avoid
-re-setting the values.
+Configure the reproducible paper template for this system (set the local
+settings). The local settings can be given on the command-line through the
+options below. If not, the configure script will interactively ask for the
+value of each one (with basic necessary background information printed
+before it). Alternatively, if you have already configured the template on
+your system, you can use the '--existing-conf' option to use the existing
+values directly.
RECOMMENDATION: If this is the first time you are running this pipeline,
please don't use the options and let the script explain each parameter in
@@ -100,140 +72,97 @@ download them.
With the options below you can modify the default behavior. Just note that
you should not put an '=' sign between an option name and its value.
-Options:
-
- -b, --build-dir STR Top directory to build the project in.
- Current value: $bdir_status
+Configure options:
+ Top-level directory settings:
+ -b, --build-dir=STR Top directory to build the project in.
+ -i, --input-dir=STR Directory containing input datasets (optional).
+ -s, --software-dir=STR Directory containing necessary software tarballs.
- -i, --input-dir STR Directory containing necessary input datasets.
- Current value: $indir_status
+ Operating mode options:
+ -m, --minmapsize=INT (Gnuastro) Minimum number of bytes to use RAM.
+ -j, --jobs=INT Number of threads to build the software.
+ -e, --existing-conf Use (possibly existing) local configuration.
+ -h, --help Print this help list.
- -s, --software-dir STR Directory containing necessary software tarballs.
- Current value: $software_status
+Mandatory or optional arguments to long options are also mandatory or optional
+for any corresponding short options.
- -m, --minmapsize INT (Specific to Gnuastro) Number of bytes to avoid
- using RAM, use HDD/SSD instead of memory.
- Current value: $mm_status
+Reproducible paper template: https://gitlab.com/makhlaghi/reproducible-paper
- -j, --jobs INT Number of threads to use in building the software
- during the pipeline. Note that on MacOS, currently
- the first phase will be done on a single thread,
- but higher-level software will be built in parallel.
- Current value: $jobs_status
+Report bugs to mohammad@akhlaghi.org
+EOF
+}
- -e, --existing-conf Use (possibly existing) local configuration.
- Current value: $ec_status
- -h, --help Print this help list.
-Mandatory or optional arguments to long options are also mandatory or optional
-for any corresponding short options.
-Reproducible paper template: https://gitlab.com/makhlaghi/reproducible-paper
-Report bugs to mohammad@akhlaghi.org
+# Functions to check option values and complain if necessary.
+function on_off_option_error() {
+ cat <<EOF
+$me: '$1' doesn't take any values.
EOF
+ exit 1
}
+function check_v() {
+ if [ x"$2" = x ]; then
+ echo "$me: option '$1' requires an argument."
+ echo "Try '$me --help' for more information."
+ exit 1;
+ fi
+}
-# Parse the arguments
-# -------------------
-jobs=0
-build_dir=
-input_dir=
-software_dir=
-in_minmapsize=0
-existing_conf=0
+
+# Separate command-line arguments from options. Then put the option
+# value into the respective variable.
+#
+# Each option has two lines because we want to process both these formats:
+# `--name=value' and `--name value'. The former (with `=') is a single
+# command-line argument, so we just need to shift the counter by one. The
+# latter (without `=') is two arguments, so we'll need two shifts.
while [[ $# -gt 0 ]]
do
- key="$1"
- case $key in
- -b|--build-dir)
- build_dir="$2"
- if [ x"$build_dir" = x ]; then
- echo "No argument given to '--build-dir' ('-b')."
- exit 1;
- fi
- shift # past argument
- shift # past value
- ;;
- -i|--input-dir)
- input_dir="$2"
- if [ x"$input_dir" = x ]; then
- echo "No argument given to '--input-dir' ('-i')."
- exit 1;
- fi
- shift # past argument
- shift # past value
- ;;
- -s|--software-dir)
- software_dir="$2"
- if [ x"$software_dir" = x ]; then
- echo "No argument given to '--software-dir' ('-s')."
- exit 1;
- fi
- shift # past argument
- shift # past value
- ;;
- -m|--minmapsize)
- in_minmapsize="$2"
- if [ x"$in_minmapsize" = x ]; then
- echo "No argument given to '--minmapsize' ('-m')."
- exit 1;
- fi
- shift # past argument
- shift # past value
- ;;
- -j|--jobs)
- jobs="$2"
- if [ x"$jobs" = x ]; then
- echo "No argument given to '--jobs' ('-j')."
- exit 1;
- fi
- shift # past argument
- shift # past value
- ;;
- -e|--existing-conf)
- existing_conf=1
- shift # past argument
- ;;
-
- -h|-P|--help|--printparams)
- help_print
- exit 0
- ;;
-# -V|--version)
-# echo $version
-# exit 0
-# ;;
- *) # unknown option
- cat <<EOF
-Usage: $me [OPTION]...
-'$1' isn't a recognized option. Aborted.
-
-Note that for this script, option names (short or long format) and values
-must be separated by atleast one white-space character and MUST NOT have
-an '=' between them.
-EOF
- exit 1
- ;;
- esac
+ case $1 in
+ # Input parameters.
+ -b=*|--build-dir=*) build_dir="${1#*=}"; check_v "$1" "$build_dir"; shift;;
+ -b|--build-dir) build_dir="$2"; check_v "$1" "$build_dir"; shift;shift;;
+ -i=*|--input-dir=*) input_dir="${1#*=}"; check_v "$1" "$input_dir"; shift;;
+ -i|--input-dir) input_dir="$2"; check_v "$1" "$input_dir"; shift;shift;;
+ -s=*|--software-dir=*) software_dir="${1#*=}"; check_v "$1" "$software_dir"; shift;;
+ -s|--software-dir) software_dir="$2"; check_v "$1" "$software_dir"; shift;shift;;
+ -m=*|--minmapsize=*) minmapsize="${1#*=}"; check_v "$1" "$minmapsize"; shift;;
+ -m|--minmapsize) minmapsize="$2"; check_v "$1" "$minmapsize"; shift;shift;;
+
+ # Operating mode options.
+ -j=*|--jobs=*) jobs="${1#*=}"; check_v "$1" "$jobs"; shift;;
+ -j|--jobs) jobs="$2"; check_v "$1" "$jobs"; shift;shift;;
+ -e=*|--existing-conf=*) on_off_option_error --existing-conf;;
+ -e|--existing-conf) existing_conf=1; shift;;
+ -h|-'?'|--help) print_help; exit 0;;
+
+ # Unrecognized option:
+ -*) echo "$me: unknown option '$1'"; exit 1;;
+
+ # Not an option, an argument.
+ *) echo "The configure script doesn't accept arguments."; exit 1;;
+ esac
done
-# Important internal locations
-# ----------------------------
+# Internal directories
+# --------------------
#
# These are defined to help make this script more readable.
topdir=$(pwd)
+lbdir=.build
installedlink=.local
-lbdir=reproduce/build
cdir=reproduce/config
optionaldir="/optional/path"
@@ -339,7 +268,6 @@ if [ -f $pconf ] || [ -f $glconf ]; then
if [ -f $pconf ]; then rewritepconfig=no; fi
if [ -f $glconf ]; then rewritegconfig=no; fi
fi
- echo
fi
@@ -428,12 +356,13 @@ if [ $rewritepconfig = yes ]; then
Build directory
===============
-The "source" (this directory) and "build" directories are treated
+The project's "source" (this directory) and "build" directories are treated
separately. This greatly helps in managing the many intermediate files that
are created during the build. The intermediate build files don't need to be
-archived or backed up: you can always re-build them with this reproduction
-pipeline. The build directory also needs a relatively large amount of free
-space (atleast serveral Giga-bytes).
+archived or backed up: you can always re-build them with the contents of
+the source directory. The build directory also needs a relatively large
+amount of free space (at least several gigabytes), while the source
+directory (all plain text) will usually be a megabyte or less.
'$lbdir' (a symbolic link to the build directory) will also be created
during this configuration. It can help encourage you to set the actual
@@ -564,51 +493,6 @@ fi
-# Memory mapping minimum size
-# ---------------------------
-#
-# This option is specific to GNU Astronomy Utilities. It is primarily
-# included here as a demonstration option for software that need special
-# local settings (that are irrelevant to their processing, but necessary to
-# set based on local settings). If you do not use Gnuastro, please remove
-# this option from this script.
-if [ x"$in_minmapsize" = x ]; then
- minmapsize=10000000000
-else
- minmapsize=$in_minmapsize
-fi
-if [ $rewritegconfig = yes ] && [ $in_minmapsize = 0 ]; then
- cat <<EOF
-
----------------------------
-Minimum memory mapping size
----------------------------
-
-Some programs (for example Gnuastro) can deal with cases where the local
-system doesn't have enough memory (RAM) to keep large files. For example,
-they will create memory-mapped (mmap) files on the HDD or SSD and
-read/write to/from them instead of RAM. This will ofcourse, slow down the
-processing, but atleast the program won't crash.
-
-Since the memory requirements of different systems are different and it has
-no effect on the software's final result, the minimum size of an allocated
-array to warrant a mapping to HDD/SSD instead of RAM must also be defined
-here. This value will be used in the programs that support this feature.
-
-EOF
-
- read -p"Minimum memory mapping size in bytes (default: $minmapsize): " \
- tmpminmapsize
- if [ x"$tmpminmapsize" != x ]; then
- minmapsize=$tmpminmapsize
- echo " -- Using '$minmapsize'"
- fi
-fi
-
-
-
-
-
# Write the parameters into the local configuration file.
if [ $rewritepconfig = yes ]; then