Necessary programs checked at configure time

The mandatory and optional (for example, the downloader) dependencies are now checked at configure time, so users know what they are missing before processing starts. Since the pipeline is recommended to be run in parallel, it can be hard to find what is missing after running it. As part of these checks, the program to use for downloading is now also set at configure time; during Make's processing it is only used as a pre-defined variable (in `LOCAL.mk'). A small title was also added to discuss the pipeline architecture; it will be filled in the next commit.
author: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2018-02-20 14:27:27 +0100
committer: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2018-02-20 14:38:19 +0100
commit: 8ba0292cd9299e415bc9c2c2a3307d61177f0cf5 (patch)
tree: 5eade2f5d05269e767eecf8d9a1fd601c502d7c4
parent: f21104afffe7a5cb28f7824211f90dc778d45b57 (diff)
5 files changed, 187 insertions, 59 deletions
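
The dependency check described in the commit message can be sketched as follows. This is a minimal variation for illustration, not the code that was committed: the function name `check_programs' is hypothetical, it uses `command -v' (the commit uses `type'), and it collects every missing program before failing, where the committed loop exits at the first one.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): report all missing
# programs in one pass instead of stopping at the first one.
check_programs() {
    missing=""
    for prog in "$@"; do
        # `command -v' is the POSIX way to ask whether a name resolves
        # to something runnable; both output streams are silenced.
        if command -v "$prog" > /dev/null 2>&1; then
            echo "  '$prog' was found."
        else
            missing="$missing $prog"
        fi
    done
    if [ -n "$missing" ]; then
        echo "ERROR: not found in search path:$missing" >&2
        return 1
    fi
    return 0
}

# Example call with a shortened, illustrative list of programs.
check_programs cat sed awk grep || echo "Install the missing programs first."
```

Collecting the missing names first means a user only has to run the check once to see everything that needs installing; exiting immediately, as the commit does, keeps the script shorter.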
diff --git a/README b/README
index a322c17..919005b 100644
--- a/README
+++ b/README
@@ -97,16 +97,11 @@ Gnuastro, the version of the other programs will not make a difference.
   -----
 
   This is a small program to manage file locks from the command-line. It is
-  available in most GNU/Linux distributions.
+  available in all GNU/Linux distributions. For those operating systems
+  that don't have it, an implementation that is easy to install is
+  available in the link below.
 
-  If you can't find it in your package manager or on some Mac OS systems,
-  please put a copy of `reproduce/src/flock' file into your search path
-  (this script needs Perl, so have that installed is well). To learn more
-  about the search path and where to install this file, please see the link
-  below. Before running this pipeline you should be able to run the `flock'
-  command on your command-line.
-
-    https://www.gnu.org/software/gnuastro/manual/html_node/Installation-directory.html
+    https://github.com/discoteq/flock
 
 
 
diff --git a/README.md b/README.md
index 11050c4..6308ef9 100644
--- a/README.md
+++ b/README.md
@@ -19,12 +19,15 @@ modification (as described in `README`) as a demonstration and customized
 by editing the existing rules and adding new rules as well as adding new
 Makefiles as the research/project grows.
 
-This file will continue with a discussion of why Make is the perfect
-language/framework for a research reproduction pipeline and how to master
-Make easily. Afterwards, a checklist of actions that are necessary to
-customize this pipeline for your research is provided. The main body of
-this text will finish with some tips and guidelines on how to manage or
-extend it as your research grows. Please share your thoughts and
+This file will continue with a discussion of why Make is a suitable (maybe
+perfect) language/framework for a research reproduction pipeline and how to
+master Make easily (and freely). An introduction is then given to the
+general architecture of the pipeline. It is followed by a checklist of steps
+that are necessary to start customizing this pipeline for your
+research. The main body will finish with some tips and guidelines on how to
+manage or extend it as your research grows based on our experiences with
+it. As discussed above, in the appendix, a short introduction on the
+necessity of reproducible science is given. Please share your thoughts and
 suggestions on this pipeline so we can implement them and make it even more
 easier to use and more robust.
 
@@ -120,6 +123,18 @@ Make manual there also.
 
 
 
+Reproduction pipeline architecture
+==================================
+
+In order to adapt this pipeline to your research, it is important to first
+understand its architecture so you can navigate your research within its
+(very general) framework. The version of
+
+
+
+
+
+
 Checklist to customize the pipeline
 ===================================
 
@@ -172,10 +187,19 @@ been explained here), please let us know to correct it.
      readable.
 
    - Delete the description about Gnuastro in `README`.
-   - Delete marked parts in `configure`.
+   - Delete marked part(s) in `configure`.
    - Delete marked parts in `reproduce/src/make/initialize.mk`.
    - Delete `and Gnuastro \gnuastroversion` from `tex/preamble-style`.
 
+ - **Other dependencies**: If there are any more of the dependencies that
+     you don't use (or others that you need), then remove (or add) them in
+     the respective parts of `configure`. It is commented thoroughly and
+     reading over the comments should guide you on what to add/remove and
+     where. Note that it is always good to have an option to download the
+     necessary datasets in case the user doesn't have them. But in case
+     your pipeline doesn't need any downloads, you can also remove the
+     sections of `configure` that are for `flock` and the downloader.
+
  - **`README`**: Go through this top-level instruction file and make it fit
      to your pipeline: update the text and etc. Don't forget that your
      colleagues or anyone else, will first be drawn to read this file, so
diff --git a/configure b/configure
index 4f13523..cfc724e 100755
--- a/configure
+++ b/configure
@@ -24,45 +24,103 @@
 
 
 
-# Top level locations
+# Important internal locations
+# ----------------------------
+#
+# These are defined to help make this script more readable.
 cdir=reproduce/config
 pdir=$cdir/pipeline
 pconf=$pdir/LOCAL.mk
 ptconf=$pdir/LOCAL_tmp.mk
 poconf=$pdir/LOCAL_old.mk
-gconf=$cdir/gnuastro/gnuastro-local.conf
+glconf=$cdir/gnuastro/gnuastro-local.conf
 
 
 
 
 
-# Functions.
-function add_top_notice() {
-    if echo "# DO NOT EDIT MANUALLY: this is an automatically generated file." > $1
-    then
-        echo "#"                                                   >> $1
-        echo "# This file is generated from the reproduction"      >> $1
-        echo "# pipeline's './configure' script. Please re-run"    >> $1
-        echo "# that command."                                     >> $1
+# Check mandatory dependencies
+# ----------------------------
+#
+# The list of program names you need for this pipeline is in the `for' loop
+# below. In case you don't need Gnuastro, then remove `astnoisechisel' from
+# the list.
+echo "---------------------"
+echo "Checking dependencies"
+echo "---------------------"
+for prog in cat sed make awk grep flock astnoisechisel pdflatex biber; do
+    if type $prog > /dev/null; then
+        echo "  '$prog' was found."
     else
+        echo
+        echo "ERROR: '$prog' not found in search path."
+        if [ $prog = "flock" ]; then
+            echo
+            echo "'flock' is available on GNU/Linux operating system"
+            echo "repositories, please install it through your package"
+            echo "manager. For other OSs, you can install the "
+            echo "implementation at: https://github.com/discoteq/flock"
+        fi
         exit 1
     fi
-}
+done
+
+
+
+
+
+# Identify the downloader tool
+# ----------------------------
+#
+# If cURL is already present, that will be used, otherwise, we'll use
+# Wget. Since the options specifying the output filename are different
+# between the two, we'll also specify the output option within the
+# `downloader' variable. So it is important to first give the output
+# filename after calling `DOWNLOADER' within the Makefiles, and finish the
+# command with the web address.
+print_downloader_notice=1
+if type curl > /dev/null; then
+    downloader="curl -o"
+elif type wget > /dev/null; then
+    downloader="wget -O";
+else
+    echo
+    echo "======="
+    echo "Warning"
+    echo "======="
+    echo "Couldn't find any of the 'curl' or 'wget' programs. They are used for"
+    echo "downloading necessary data if they aren't already present in the"
+    echo "specified directories. Therefore the pipeline will crash if the"
+    echo "necessary data are not already present on the system."
+    echo "======="
+    echo
+    downloader="no-downloader-found"
+    print_downloader_notice=0
+fi;
+if [ $print_downloader_notice = 1 ]; then
+    prog=$(echo "$downloader" | awk '{print $1}')
+    echo "  '$prog' will be used for downloading files if necessary."
+fi
 
 
 
 
 
-# If `LOCAL.mk' already exists, then copy it to an `.old' file.
+# If `LOCAL.mk' already exists
+# ----------------------------
+#
+# `LOCAL.mk' is the top-most local configuration for the pipeline. If it
+# already exists when this script is run, we'll move it to a `LOCAL_old.mk'
+# file as backup. For example the user might have run `./configure' by
+# mistake.
 if [ -f $pconf ]; then
     if mv $pconf $poconf; then
         echo
-        echo "-------"
+        echo "======="
         echo "WARNING"
-        echo "-------"
+        echo "======="
         echo "  Existing configuration moved to '$poconf'."
         echo
-        echo
     else
         exit 1
     fi
@@ -72,27 +130,39 @@ fi
 
 
 
-# Using the base file, prepare the output file.
-cp $pconf.in $ptconf
+# Write values obtained so far
+# ----------------------------
+#
+# We'll start writing the local configuration file with the values that
+# have been found so far.
+sed -e 's|@downloader[@]|'"$downloader"'|g'           \
+    $pconf.in > $ptconf
 
 
 
 
 
-# Tell the user to edit the directories.
+# Inform the user
+# ---------------
+#
+# Print some basic information so the user gets a feeling of what is going
+# on and is prepared on what will happen next.
 echo
 echo "-----------------------------------------"
 echo "Reproduction pipeline local configuration"
 echo "-----------------------------------------"
 echo
 echo "Local settings include things like top-level directories,"
-echo "or processing steps (e.g., if you want a final PDF output)."
+echo "or processing steps."
 echo
 echo "Pressing 'y' will open the local settings file in an editor"
 echo "so you can modify the default values if you want. Each"
 echo "variable is thoroughly described in the comments (lines"
 echo "starting with a '#') above it."
 echo
+echo "It is strongly recommended to inspect/edit/set the best "
+echo "values for your system (where necessary)."
+echo
 while [ "$userread" != "y" -a "$userread" != "n" ]
 do
     read -p"Edit the default local configuration (y/n)? " userread
@@ -102,7 +172,11 @@ done
 
 
 
-# Open an editor if the user wants to edit the file.
+# Let the user edit local settings
+# --------------------------------
+#
+# We'll open a text editor so the user can read the comments of the
+# necessary local settings and set the top directories manually.
 if [ $userread = "y" ]; then
 
     # Open a text editor to set the given directories
@@ -118,7 +192,7 @@ if [ $userread = "y" ]; then
         echo "Please set the values in the following files manually:"
         echo "  - $pconf"
         # --------- Delete for no Gnuastro ---------
-        echo "  - $gconf"
+        echo "  - $glconf"
         # ------------------------------------------
         echo "================="
         echo
@@ -131,20 +205,53 @@ fi
 
 
 
+
+# Notice for top of files
+# -----------------------
+#
+# In case someone opens the files output from the configuration scripts in
+# a text editor and wants to edit them, it is important to let them know
+# that their changes are not going to be permanent.
+function create_file_with_notice() {
+    if echo "# IMPORTANT: file will be RE-WRITTEN after './configure'" > $1
+    then
+        echo "#"                                                      >> $1
+        echo "# This file was created during the reproduction"        >> $1
+        echo "# pipeline's configuration ('./configure'). Therefore," >> $1
+        echo "# it is not under version control and any manual "      >> $1
+        echo "# changes to it will be over-written if the pipeline "  >> $1
+        echo "# is re-configured."                                    >> $1
+        echo "#"                                                      >> $1
+    else
+        exit 1
+    fi
+}
+
+
+
+
+
 # --------- Delete for no Gnuastro ---------
-# From the input file, set the Gnuastro configuration file.
+# Gnuastro's local configuration settings
+#
+# The `minmapsize' parameter has been set by the user, so we can read it
+# and add it to Gnuastro's local configuration file.
+create_file_with_notice $glconf
 mm=$(awk '$1=="MINMAPSIZE"{print $3}' $ptconf)
-add_top_notice $gconf
-echo "minmapsize $mm" >> $gconf
+echo "# Minimum number of bytes to use HDD/SSD instead of RAM." >> $glconf
+echo " minmapsize $mm"                                          >> $glconf
 # ------------------------------------------
 
 
 
 
 
+# Final pipeline local settings
+# -----------------------------
+#
 # Make the final file that will be used and delete the temporary file along
 # with a possible file ending with `~' that is put there by some editors.
-add_top_notice $pconf
+create_file_with_notice $pconf
 cat $ptconf >> $pconf
 rm -f $ptconf $ptconf"~"
 
@@ -152,7 +259,11 @@ rm -f $ptconf $ptconf"~"
 
 
 
-# Print a final notice.
+# Print a final notice
+# --------------------
+#
+# The configuration is now complete, we can inform the user on the next
+# step(s) to take.
 echo
 if [ $ready = 1 ]; then
     echo "This reproduction pipeline has been configured for this system."
diff --git a/reproduce/config/pipeline/LOCAL.mk.in b/reproduce/config/pipeline/LOCAL.mk.in
index ac8e10e..7a29344 100644
--- a/reproduce/config/pipeline/LOCAL.mk.in
+++ b/reproduce/config/pipeline/LOCAL.mk.in
@@ -68,3 +68,17 @@ BDIR = reproduce/BDIR
 # be defined here. This value will be used in the programs that support
 # this feature.
 MINMAPSIZE = 1000000000
+
+
+
+
+
+# Downloader program
+# ------------------
+#
+# The downloader program (and its output option name) that will be used if
+# any of the necessary datasets aren't already available on the
+# system. This is usually set at an early stage of the configuration system
+# automatically before the file is opened for editing by the user. It is
+# thus recommended to not modify it manually.
+DOWNLOADER = @downloader@
diff --git a/reproduce/src/make/download.mk b/reproduce/src/make/download.mk
index 244bd04..378e5eb 100644
--- a/reproduce/src/make/download.mk
+++ b/reproduce/src/make/download.mk
@@ -24,22 +24,6 @@
 
 
 
-# Identify the downloader tool
-# ----------------------------
-#
-# If cURL is already present, that will be used, otherwise, we'll use
-# Wget. Since the options specifying the output filename are different
-# between the two, we'll also specify the output option within the
-# `downloader' variable. So it is important to first give the output
-# filename after calling `downloader', then the web address.
-downloader := $(shell if type curl > /dev/null; then downloader="curl -o";  \
-	              else                           downloader="wget -O";  \
-	              fi; echo "$$downloader";                               )
-
-
-
-
-
 # Download SURVEY data
 # --------------------
 #
@@ -52,7 +36,7 @@ all-survey = $(foreach f, $(filters-survey),                                 \
                           $(SURVEY)/a-possibly-additional-$(f)-format.fits )
 $(SURVEY):; mkdir $@
 $(all-survey): $(SURVEY)/%: | $(SURVEY) $(lockdir)
-	flock $(lockdir)/download -c "$(downloader) $@ $(web-survey)/$*"
+	flock $(lockdir)/download -c "$(DOWNLOADER) $@ $(web-survey)/$*"
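
The way the `configure' hunk above hands the downloader to Make can be tried in isolation with the sketch below. It assumes nothing beyond standard `sed' and `mktemp'; the function name `fill_placeholder' and the throwaway directory are hypothetical stand-ins for the real `LOCAL.mk.in' and `LOCAL_tmp.mk' files. The downloader choice mirrors the commit (cURL first, then Wget), and the sed expression is the one used in the commit, where `[@]' is a bracket expression matching a literal `@'.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of a template with the
# `@downloader@' placeholder replaced.
#   $1: template file, $2: output file, $3: replacement value.
# The value must not contain `|' or `&', which are special in this
# sed expression.
fill_placeholder() {
    sed -e 's|@downloader[@]|'"$3"'|g' "$1" > "$2"
}

# Choose the program together with its output-file option, since the
# option differs between the two (`-o' for cURL, `-O' for Wget).
if command -v curl > /dev/null 2>&1; then
    downloader="curl -o"
elif command -v wget > /dev/null 2>&1; then
    downloader="wget -O"
else
    downloader="no-downloader-found"
fi

# Demonstrate on a throwaway template (illustrative contents only).
demo=$(mktemp -d)
printf 'MINMAPSIZE = 1000000000\nDOWNLOADER = @downloader@\n' > "$demo/LOCAL.mk.in"
fill_placeholder "$demo/LOCAL.mk.in" "$demo/LOCAL.mk" "$downloader"
cat "$demo/LOCAL.mk"
rm -rf "$demo"
```

Because the output option is stored inside the variable, a Make recipe has to write the target first and the address last, as in `$(DOWNLOADER) $@ $(web-survey)/$*' from `download.mk'.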