Pipeline now downloads and uses an input dataset

In most analysis situations (except for simulations), an input dataset is necessary, but that part of the pipeline was left out: a general `SURVEY' variable was set and never used. With this commit, we download a sample FITS file from the FITS standard webpage, display it (along with its histogram) and do some basic calculations on it. The preparation of input datasets is done in a generic way, so more datasets can easily be added if necessary.
author: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2018-11-25 15:22:48 +0000
committer: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2018-11-25 15:41:00 +0000
commit: e623102768c426e86b0ed73904168006dfea2af9 (patch)
tree: ea5f0d95219398ff47fb0dc8ef92aa5e5173a956
parent: 91eebe85edf38338bc4baed58d6a970c0f6b6b79 (diff)
15 files changed, 414 insertions, 100 deletions
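
The two core steps this commit introduces (reading a dataset's properties from `INPUTS.mk` and checking its MD5 checksum after download) can be tried in isolation with the minimal sketch below. It is not part of the commit: the function names `read_make_var` and `verify_md5` are hypothetical, and it assumes `md5sum` (GNU Coreutils) and `awk` are available in the `PATH`. The `awk` pattern and the `wrong-` renaming mirror what the diff does in `configure` and `reproduce/src/make/download.mk`.

```shell
#!/bin/sh

# Hypothetical helper: print the value of a `NAME = value' line in a
# Make-style file, ignoring commented lines.
#   $1: variable name    $2: file to read
read_make_var() {
    awk -v name="$1" '!/^#/ && $1==name {print $3}' "$2"
}

# Hypothetical helper: compare a file's MD5 checksum with the expected
# value. On a mismatch, rename the file with a `wrong-' prefix (so it is
# not mistaken for a valid input later) and return non-zero.
#   $1: file to check    $2: expected MD5 checksum
verify_md5() {
    sum=$(md5sum "$1" | awk '{print $1}')
    if [ "$sum" != "$2" ]; then
        wrong="$(dirname "$1")/wrong-$(basename "$1")"
        mv "$1" "$wrong"
        echo "Wrong MD5 checksum for '$1' (moved to '$wrong')" >&2
        return 1
    fi
}
```

For example, once the file is in place, `verify_md5 "$indir/$name" "$(read_make_var WFPC2MD5 reproduce/config/pipeline/INPUTS.mk)"` would check it against the recorded checksum (here `$indir` and `$name` stand for whatever directory and file name are in use).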
diff --git a/README-pipeline.md b/README-pipeline.md
index ff15094..6effa30 100644
--- a/README-pipeline.md
+++ b/README-pipeline.md
@@ -516,6 +516,7 @@ advanced in later stages of your work.
      them.
 
    - Delete marked part(s) in `configure`.
+   - Delete the `reproduce/config/gnuastro` directory.
    - Delete `astnoisechisel` from the value of `top-level-programs` in `reproduce/src/make/dependencies.mk`. You can keep the rule to build `astnoisechisel`, since its not in the `top-level-programs` list, it (and all the dependencies that are only needed by Gnuastro) will be ignored.
    - Delete marked parts in `reproduce/src/make/initialize.mk`.
    - Delete `and Gnuastro \gnuastroversion` from `tex/preamble-style.tex`.
@@ -526,51 +527,31 @@ advanced in later stages of your work.
      commented thoroughly and reading over the comments should guide you on
      what to add/remove and where.
 
- - **Input dataset (can be done later)**: The user manages the top-level
-     directory of the input data through the variables set in
-     `reproduce/config/pipeline/LOCAL.mk.in` (the user actually edits a
-     `LOCAL.mk` file that is created by `configure` from the `.mk.in` file,
-     but the `.mk` file is not under version control). Datasets are usually
-     large and the users might already have their copy don't need to
-     download them). So you can define a variable (all in capital letters)
-     in `reproduce/config/pipeline/LOCAL.mk.in`. For example if you are
-     working on data from the XDF survey, use `XDF`. You can use this
-     variable to identify the location of the raw inputs on the running
-     system. Here, we'll assume its name is `SURVEY`. Afterwards, change
-     any occurrence of `SURVEY` in the whole pipeline with the new
-     name. You can find the occurrences with a simple command like the ones
-     shown below. We follow the Make convention here that all
-     `ONLY-CAPITAL` variables are those directly set by the user and all
-     `small-caps` variables are set by the pipeline designer. All variables
-     that also depend on this survey have a `survey` in their name. Hence,
-     also correct all these occurrences to your new name in small-caps. Of
-     course, ignore/delete those occurrences that are irrelevant, like
-     those in this file. Note that in the raw version of this template no
-     target depends on these files, so they are ignored. Afterwards, set
-     the webpage and correct the filenames in
-     `reproduce/src/make/download.mk` if necessary.
-
-     ```shell
-     $ grep -r SURVEY ./
-     $ grep -r survey ./
-     ```
-
- - **Other input datasets (can be done later)**: Add any other input
-     datasets that may be necessary for your research to the pipeline based
-     on the example above.
+ - **Input dataset (can be done later)**: The input datasets are managed
+     through the `reproduce/config/pipeline/INPUTS.mk` file. It is best to
+     gather all the information regarding all the input datasets into this
+     one central file. To ensure that the proper dataset is being
+     downloaded and used by the pipeline, it's best to also get an MD5
+     checksum (https://en.wikipedia.org/wiki/MD5) of the file and include
+     that in this file so you can check it in the pipeline. The preparation
+     of the input datasets is done in
+     `reproduce/src/make/download.mk`. Have a look there to see how these
+     values are to be used. This information about the input datasets is
+     also used in the initial `configure` script (to inform the users), so
+     also modify that file.
 
  - **Delete dummy parts (can be done later)**: The template pipeline
-     contains some parts that are only for the initial/test run, not for
-     any real analysis. The respective files to remove and parts to fix are
-     discussed here.
+     contains some parts that are only for the initial/test run, mainly as
+     a demonstration of important steps. They are not for any real
+     analysis. You can remove these parts in the files below:
 
      - `paper.tex`: Delete the text of the abstract and the paper's main
        body, *except* the "Acknowledgments" section. This reproduction
        pipeline was designed by funding from many grants, so its necessary
        to acknowledge them in your final research.
 
-     - `Makefile`: Delete the two lines containing `delete-me` in the
-       `foreach` loops. Just make sure the other lines that end in `\` are
+     - `Makefile`: Delete the lines containing `delete-me` in the `foreach`
+       loops. Just make sure the other lines that end in `\` are
        immediately after each other.
 
      - Delete all `delete-me*` files in the following directories:
diff --git a/configure b/configure
index c33d646..2922365 100755
--- a/configure
+++ b/configure
@@ -42,6 +42,7 @@ topdir=$(pwd)
 installedlink=.local
 lbdir=reproduce/build
 cdir=reproduce/config
+optionaldir="/optional/path"
 
 pdir=$cdir/pipeline
 pconf=$pdir/LOCAL.mk
@@ -100,7 +101,7 @@ function create_file_with_notice() {
 # Since the build directory will go into a symbolic link, we want it to be
 # an absolute address. With this function we can make sure of that.
 function absolute_dir() {
-    echo "$(cd "$(dirname "$inbdir")" && pwd )/$(basename "$inbdir")"
+    echo "$(cd "$(dirname "$1")" && pwd )/$(basename "$1")"
 }
 
 
@@ -179,7 +180,8 @@ fi
 # the web address.
 if [ $rewritepconfig = yes ]; then
     if type wget > /dev/null 2>/dev/null; then
-        downloader="wget --no-use-server-timestamps -O";
+        wgetname=$(which wget)
+        downloader="$wgetname --no-use-server-timestamps -O";
     else
         cat <<EOF
 
@@ -256,11 +258,59 @@ fi
 
 
 
+# Input directory
+# ---------------
+indir=$optionaldir
+wfpc2name=$(awk '!/^#/ && $1=="WFPC2IMAGE" {print $3}' $pdir/INPUTS.mk)
+wfpc2md5=$(awk  '!/^#/ && $1=="WFPC2MD5"   {print $3}' $pdir/INPUTS.mk)
+wfpc2size=$(awk '!/^#/ && $1=="WFPC2SIZE"  {print $3}' $pdir/INPUTS.mk)
+wfpc2url=$(awk  '!/^#/ && $1=="WFPC2URL"   {print $3}' $pdir/INPUTS.mk)
+if [ $rewritepconfig = yes ]; then
+    cat <<EOF
+
+----------------------------------
+(OPTIONAL) Input dataset directory
+----------------------------------
+
+This pipeline needs the dataset(s) listed below. If you already have them,
+please specify the directory hosting them on this system. If you don't,
+they will be downloaded automatically. Each file is shown with its total
+volume and its 128-bit MD5 checksum in parentheses.
+
+  $wfpc2name ($wfpc2size, $wfpc2md5):
+    A 100x100 Hubble Space Telescope WFPC II image used in the FITS
+    standard webpage as a demonstration of this file format.
+    URL: $wfpc2url/$wfpc2name
+
+  $uitname ($uitsize, $uitmd5):
+    A 512x512 Astro1 Ultraviolet Imaging Telescope image used in the FITS
+    standard webpage as a demonstration of this file format.
+    URL: $uiturl/$uitname
+
+NOTE: This directory, or the datasets above, are optional. If it doesn't
+exist, the files will be downloaded in the build directory and used.
+
+TIP: If you have these files in multiple directories on your system and
+don't want to download them or make duplicates, you can create symbolic
+links to them and put those symbolic links in the given top-level
+directory.
+
+EOF
+    read -p"(OPTIONAL) Input datasets directory ($indir): " inindir
+    if [ x$inindir != x ]; then
+        indir=$inindir
+        echo " -- Using '$indir'"
+    fi
+fi
+
+
+
+
+
 # Dependency tarball directory
 # ----------------------------
 if [ $rewritepconfig = yes ]; then
-    junkddir="/optional/path"
-    ddir=$junkddir
+    ddir=$optionaldir
     cat <<EOF
 
 ---------------------------------------
@@ -282,7 +332,6 @@ EOF
         ddir=$tmpddir
         echo " -- Using '$ddir'"
     fi
-    echo
 fi
 
 
@@ -292,7 +341,7 @@ fi
 # Memory mapping minimum size
 # ---------------------------
 if [ $rewritegconfig = yes ]; then
-    defaultminmapsize=1000000000
+    defaultminmapsize=10000000000
     minmapsize=$defaultminmapsize
     cat <<EOF
 
@@ -329,18 +378,57 @@ fi
 if [ $rewritepconfig = yes ]; then
     create_file_with_notice $pconf
     sed -e's|@bdir[@]|'"$bdir"'|'              \
+        -e's|@indir[@]|'"$indir"'|'            \
         -e's|@ddir[@]|'"$ddir"'|'              \
         -e's|@downloader[@]|'"$downloader"'|'  \
         $pconf.in >> $pconf
 else
     # Read the values from existing configuration file.
-    inbdir=$(awk     '$1=="BDIR"             {print $NF}' $pconf)
-    ddir=$(awk       '$1=="DEPENDENCIES-DIR" {print $NF}' $pconf)
-    downloader=$(awk '$1=="DOWNLOADER"       {print $NF}' $pconf)
+    inbdir=$(awk     '$1=="BDIR"             {print $3}' $pconf)
+    downloader=$(awk '$1=="DOWNLOADER"       {print $3}' $pconf)
+
+    # Make sure all necessary variables have a value
+    err=0
+    verr=0
+    novalue=""
+    if [ x"$inbdir"     = x ]; then novalue="BDIR, ";              fi
+    if [ x"$downloader" = x ]; then novalue="$novalue"DOWNLOADER;  fi
+    if [ x"$novalue"   != x ]; then verr=1; err=1;                 fi
 
     # Make sure `bdir' is an absolute path and it exists.
+    berr=0
+    ierr=0
     bdir=$(absolute_dir $inbdir)
-    if ! [ -d $bdir ]; then mkdir $bdir; fi
+
+    if ! [ -d $bdir  ]; then if ! mkdir $bdir; then berr=1; err=1; fi; fi
+    if [ $err = 1 ]; then
+        cat <<EOF
+
+#################################################################
+########  ERROR reading existing configuration file  ############
+#################################################################
+EOF
+        if [ $verr = 1 ]; then
+            cat <<EOF
+
+These variables have no value: $novalue.
+EOF
+        fi
+        if [ $berr = 1 ]; then
+           cat <<EOF
+
+Couldn't create the build directory '$bdir' (value of 'BDIR') in
+'$pconf'.
+EOF
+        fi
+
+        cat <<EOF
+
+Please run the configure script again (accepting to re-write existing
+configuration file) so all the values can be filled and checked.
+#################################################################
+EOF
+    fi
 fi
 
 
diff --git a/paper.tex b/paper.tex
index 53176cd..32a3465 100644
--- a/paper.tex
+++ b/paper.tex
@@ -53,7 +53,9 @@
 
   \textsl{Keywords}: Add some keywords for your research here.
 
-  \textsl{Reproducible paper}: Reproduction pipeline \pipelineversion{}
+  \textsl{Reproducible paper}: All quantitative results (numbers and plots)
+  in this paper are exactly reproducible with reproduction pipeline
+  \pipelineversion{}
   (\url{https://gitlab.com/makhlaghi/reproducible-paper}).}
 
 %% To add the first page's headers.
@@ -69,8 +71,8 @@ Congratulations on running the reproduction pipeline! You can now follow
 the checklist in the \texttt{README.md} file to customize this pipeline to
 your exciting research project.
 
-Just don't forget to \emph{never} use any numbers or fixed strings (for
-example database urls like \url{\websurvey}) directly within your \LaTeX{}
+Just don't forget to \emph{never} use numbers or fixed strings (for example
+database urls like \url{\wfpctwourl}) directly within your \LaTeX{}
 source. Read them directly from your configuration files or outputs of the
 programs as part of the reproduction pipeline and import them into \LaTeX{}
 as macros through the \texttt{tex/pipeline.tex} file. See the several
@@ -83,14 +85,12 @@ or
 in this way, will let you focus clearly on your science and not have to
 worry about fixing this or that number/name in the text.
 
-Just as a demonstration of creating plots within \LaTeX{} (using the
-{\small PGFP}lots package), in Figure \ref{deleteme} we show a simple
-plot, where the Y axis is the square of the X axis. The minimum value
-in this distribution is $\deletememin$, and $\deletememax$ is the
-maximum. Take a look into the \LaTeX{} source and you'll see these
-numbers are actually macros that were calculated from the same dataset
-(they will change if the dataset, or function that produced it,
-changes).
+Figure \ref{deleteme} shows a simple plot as a demonstration of creating
+plots within \LaTeX{} (using the {\small PGFP}lots package). The minimum
+value in this distribution is $\deletememin$, and $\deletememax$ is the
+maximum. Take a look into the \LaTeX{} source and you'll see these numbers
+are actually macros that were calculated from the same dataset (they will
+change if the dataset, or function that produced it, changes).
 
 The individual {\small PDF} file of Figure \ref{deleteme} is available
 under the \texttt{tex/build/tikz/} directory of your build directory. You
@@ -100,15 +100,6 @@ progress or after publishing the work). If you want to directly use the
   KZ} decide if it should be remade or not, you can also comment the
 \texttt{makepdf} macro at the top of this \LaTeX{} source file.
 
-{\small PGFP}lots is a great tool to build the plots within \LaTeX{} and
-removes the necessity to add further dependencies (to create the plots) to
-your reproduction pipeline. High-level language libraries like Matplotlib
-do exist to also generate plots. However, bare in mind that they require
-many dependencies (Python, Numpy and etc). Installing these dependencies
-from source (after several years when the binaries are no longer available
-in common repositories), is not easy and will harm the reproducibility of
-your paper.
-
 \begin{figure}[t]
   \includetikz{delete-me}
 
@@ -116,10 +107,39 @@ your paper.
     demonstration.}
 \end{figure}
 
+Figure \ref{deleteme-wfpc2} is another demonstration of showing images
+(datasets) using PGFPlots. It shows a small crop of an image from the
+Wide-Field Planetary Camera 2, on board the Hubble Space Telescope from
+1993 to 2009. This cropped image is one of the sample FITS files from the
+FITS file standard
+webpage\footnote{\url{https://fits.gsfc.nasa.gov/fits_samples.html}}. Just
+as another basic reporting of measurements on this dataset within the paper
+without using numbers in the \LaTeX{} source, the mean is
+$\deletemewfpctwomean$ and the median is $\deletemewfpctwomedian$. The
+skewness in the histogram of Figure \ref{deleteme-wfpc2}(b) explains this
+difference between the mean and median. Also, the value of quantile
+$\deletemewfpcquantile$ (set in the pipeline configuration file
+\texttt{delete-me-wfpc2-quant.mk}) is $\deletemewfpctwoquantile$. The
+dataset was prepared for demonstration here with Gnuastro's
+\textsf{Convert\-Type} program and the histogram and basic statistics were
+generated with Gnuastro's \textsf{Statistics} program.
+
+{\small PGFP}lots\footnote{\url{https://ctan.org/pkg/pgfplots}} is a great
+tool to build the plots within \LaTeX{} and removes the necessity to add
+further dependencies (to create the plots) to your reproduction
+pipeline. There are high-level language libraries like Matplotlib which
+also generate plots. However, the problem is that they require many
+dependencies (Python, Numpy, etc.). Installing these dependencies from
+source is not easy and will harm the reproducibility of your paper. Note
+that after several years, the binary files of these high-level libraries,
+that you easily install today, will no longer be available in common
+repositories. Therefore building the libraries from source is the only
+option to reproduce your results.
+
 Furthermore, since {\small PGFP}lots is built by \LaTeX{} it respects all
-the properties of your text (for example line width and fonts and etc), so
-the final plot blends in your paper much more nicely. It also has a
-wonderful
+the properties of your text (for example line width and fonts and
+etc). Therefore the final plot blends in your paper much more nicely. It
+also has a wonderful
 manual\footnote{\url{http://mirrors.ctan.org/graphics/pgf/contrib/pgfplots/doc/pgfplots.pdf}}.
 
 This pipeline also defines two \LaTeX{} macros that allow you to mark text
@@ -135,7 +155,15 @@ existing coauthors (who are just interested in the new parts or notes) and
 new co-authors (who don't want to be distracted by these issues in their
 first time reading).
 
+\begin{figure}[t]
+  \includetikz{delete-me-wfpc2}
 
+  \captionof{figure}{\label{deleteme-wfpc2} (a) An example image of the
+    Wide-Field Planetary Camera 2, on board the Hubble Space Telescope from
+    1993 to 2009. This is one of the sample images from the FITS standard
+    webpage, kept as examples for this file format. (b) Histogram of pixel
+    values in (a).}
+\end{figure}
 
 
 
@@ -177,12 +205,12 @@ SUNDIAL ITN, and from the Spanish Ministry of Economy and Competitiveness
 
 The following free software tools were also critical component of this
 research (in alphabetical order): Bzip2 \bziptwoversion, CFITSIO
-\cfitsioversion, CMake \cmakeversion, cURL \curlversion, Git \gitversion,
-GNU Bash \bashversion, GNU Coreutils \coreutilsversion, GNU AWK
-\gawkversion, GNU Grep \grepversion, GNU Libtool \libtoolversion, GNU Make
-\makeversion, GNU Sed \sedversion, GNU Scientific Library (GSL)
-\gslversion, GNU Tar \tarversion, GNU Which \whichversion, Lzip
-\lzipversion, GPL Ghostscript \ghostscriptversion, Libgit2
+\cfitsioversion, CMake \cmakeversion, cURL \curlversion, Discoteq flock
+\flockversion, Git \gitversion, GNU Bash \bashversion, GNU Coreutils
+\coreutilsversion, GNU AWK \gawkversion, GNU Grep \grepversion, GNU Libtool
+\libtoolversion, GNU Make \makeversion, GNU Sed \sedversion, GNU Scientific
+Library (GSL) \gslversion, GNU Tar \tarversion, GNU Which \whichversion,
+Lzip \lzipversion, GPL Ghostscript \ghostscriptversion, Libgit2
 \libgitwoversion, Libtiff \libtiffversion, WCSLIB \wcslibversion, XZ Utils
 \xzversion, and ZLib \zlibversion. The final paper was produced with \TeX{}
 Live \texliveversion, using the following packages: \TeX{} \textexversion,
diff --git a/reproduce/config/gnuastro/astconvertt.conf b/reproduce/config/gnuastro/astconvertt.conf
new file mode 100644
index 0000000..fc3ba04
--- /dev/null
+++ b/reproduce/config/gnuastro/astconvertt.conf
@@ -0,0 +1,31 @@
+# Default parameters (System) for ConvertType.
+# ConvertType is part of GNU Astronomy Utilities.
+#
+# Use the long option name of each parameter followed by a value. The name
+# and value should be separated by at least one white-space character (for
+# example ` '[space], or tab). Lines starting with `#' are ignored.
+#
+# For more information, please run these commands:
+#
+#  $ astconvertt --help                  # Full list of options, short doc.
+#  $ astconvertt -P                      # Print all options and used values.
+#  $ info astconvertt                    # All options and input/output.
+#  $ info gnuastro "Configuration files" # How to use configuration files.
+#
+# Copying and distribution of this file, with or without modification, are
+# permitted in any medium without royalty provided the copyright notice and
+# this notice are preserved.  This file is offered as-is, without any
+# warranty.
+
+# Input:
+
+# Output:
+ quality              100
+ widthincm            10.0
+ borderwidth          1
+ output               jpg
+
+# Flux:
+ invert               0
+
+# Common options
diff --git a/reproduce/config/gnuastro/aststatistics.conf b/reproduce/config/gnuastro/aststatistics.conf
new file mode 100644
index 0000000..0bf3b83
--- /dev/null
+++ b/reproduce/config/gnuastro/aststatistics.conf
@@ -0,0 +1,34 @@
+# Default parameters (System) for Statistics.
+# Statistics is part of GNU Astronomy Utilities.
+#
+# Use the long option name of each parameter followed by a value. The name
+# and value should be separated by at least one white-space character (for
+# example ` '[space], or tab). Lines starting with `#' are ignored.
+#
+# For more information, please run these commands:
+#
+#  $ aststatistics --help                # Full list of options, short doc.
+#  $ aststatistics -P                    # Print all options and used values.
+#  $ info aststatistics                  # All options and input/output.
+#  $ info gnuastro "Configuration files" # How to use configuration files.
+#
+# Copying and distribution of this file, with or without modification, are
+# permitted in any medium without royalty provided the copyright notice and
+# this notice are preserved.  This file is offered as-is, without any
+# warranty.
+
+# Input image:
+
+# Sky and its STD settings
+ khdu                 1
+ meanmedqdiff     0.005
+ outliersigma        10
+ outliersclip     3,0.2
+ smoothwidth          3
+ sclipparams      3,0.1
+
+# Histogram and CFP settings
+ numasciibins        70
+ asciiheight         10
+ numbins            100
+ mirrordist         1.5
diff --git a/reproduce/config/pipeline/INPUTS.mk b/reproduce/config/pipeline/INPUTS.mk
new file mode 100644
index 0000000..3522ecc
--- /dev/null
+++ b/reproduce/config/pipeline/INPUTS.mk
@@ -0,0 +1,9 @@
+# Input files necessary for this pipeline.
+#
+# This file is read by the configure script and running Makefiles.
+
+
+WFPC2IMAGE = WFPC2ASSNu5780205bx.fits
+WFPC2MD5   = a4791e42cd1045892f9c41f11b50bad8
+WFPC2SIZE  = 62kb
+WFPC2URL   = https://fits.gsfc.nasa.gov/samples
diff --git a/reproduce/config/pipeline/LOCAL.mk.in b/reproduce/config/pipeline/LOCAL.mk.in
index d6bf2c0..89e3e23 100644
--- a/reproduce/config/pipeline/LOCAL.mk.in
+++ b/reproduce/config/pipeline/LOCAL.mk.in
@@ -1,4 +1,8 @@
 # Local pipeline configuration.
+#
+# This is just a template for the `./configure' script to fill in. Please
+# don't make any change to this file.
 BDIR             = @bdir@
+INDIR            = @indir@
 DEPENDENCIES-DIR = @ddir@
 DOWNLOADER       = @downloader@
diff --git a/reproduce/config/pipeline/delete-me-wfpc2-quant.mk b/reproduce/config/pipeline/delete-me-wfpc2-quant.mk
new file mode 100644
index 0000000..2ff7456
--- /dev/null
+++ b/reproduce/config/pipeline/delete-me-wfpc2-quant.mk
@@ -0,0 +1,2 @@
+# Quantile to measure on the WFPC2 image
+delete-me-wfpc2-quantile = 0.65
diff --git a/reproduce/config/pipeline/dependency-versions.mk b/reproduce/config/pipeline/dependency-versions.mk
index f85cdbf..dc45b81 100644
--- a/reproduce/config/pipeline/dependency-versions.mk
+++ b/reproduce/config/pipeline/dependency-versions.mk
@@ -5,6 +5,7 @@ bash-version        = 4.4.18
 bzip2-version       = 1.0.6
 cmake-version       = 3.12.4
 coreutils-version   = 8.30
+flock-version       = 0.2.3
 gawk-version        = 4.2.1
 ghostscript-version = 9.26
 git-version         = 2.19.1
diff --git a/reproduce/config/pipeline/web.mk b/reproduce/config/pipeline/web.mk
deleted file mode 100644
index 5af11a7..0000000
--- a/reproduce/config/pipeline/web.mk
+++ /dev/null
@@ -1,6 +0,0 @@
-# Web server(s) hosting the input data for this pipeline.
-#
-# This is the web page containing the files that must be located in the
-# `SURVEY' directory of `reproduce/config/pipeline/LOCAL.mk' on the local
-# system.
-web-survey = https://some.webpage.com/example/server
diff --git a/reproduce/src/make/delete-me.mk b/reproduce/src/make/delete-me.mk
index 67f0440..9227fde 100644
--- a/reproduce/src/make/delete-me.mk
+++ b/reproduce/src/make/delete-me.mk
@@ -25,8 +25,7 @@
 # Dummy dataset
 # -------------
 #
-# We will use AWK's random number generator to generate a random dataset to
-# be imported by PGFPlots for a plot in the paper.
+# We will use AWK to generate a table showing X and X^2 and draw its plot.
 dmdir = $(texdir)/delete-me
 dm    = $(dmdir)/data.txt
 $(dmdir): | $(texdir); mkdir $@
@@ -43,6 +42,60 @@ $(dm): $(pconfdir)/delete-me-num.mk | $(dmdir)
 
 
 
+# WFPC2 image PDF
+# -----------------
+#
+# For an example image, we'll make a PDF copy of the WFPC II image to
+# display in the paper.
+wfpc2dir = $(texdir)/delete-me-wfpc2
+$(wfpc2dir): | $(texdir); mkdir $@
+wfpc2 = $(wfpc2dir)/wfpc2.pdf
+$(wfpc2): $(indir)/$(WFPC2IMAGE) | $(wfpc2dir)
+
+        # When the plotted values are re-made, it is necessary to also
+        # delete the TiKZ externalized files so the plot is also re-made.
+	rm -f $(tikzdir)/delete-me-wfpc2.pdf
+
+        # Convert the dataset to a PDF.
+	astconvertt --fluxhigh=4 $< -h0 -o$@
+
+
+
+
+
+# Histogram of WFPC2 image
+# ------------------------
+#
+# For an example plot, we'll show the pixel value histogram also.
+wfpc2hist = $(wfpc2dir)/wfpc2-hist.txt
+$(wfpc2hist): $(indir)/$(WFPC2IMAGE) | $(wfpc2dir)
+
+        # When the plotted values are re-made, it is necessary to also
+        # delete the TiKZ externalized files so the plot is also re-made.
+	rm -f $(tikzdir)/delete-me-wfpc2.pdf
+
+        # Generate the pixel value distribution
+	aststatistics --lessthan=5 $< -h0 --histogram -o$@
+
+
+
+
+
+# Basic statistics
+# ----------------
+#
+# This is just a demonstration of how to get analysis configuration
+# parameters from variables defined in `reproduce/config/pipeline'.
+wfpc2stats = $(wfpc2dir)/wfpc2-stats.txt
+$(wfpc2stats): $(indir)/$(WFPC2IMAGE) $(pconfdir)/delete-me-wfpc2-quant.mk \
+              | $(wfpc2dir)
+	aststatistics $< -h0 --mean --median                        \
+	              --quantile=$(delete-me-wfpc2-quantile) > $@
+
+
+
+
+
 # TeX macros
 # ----------
 #
@@ -50,7 +103,7 @@ $(dm): $(pconfdir)/delete-me-num.mk | $(dmdir)
 #
 # NOTE: In LaTeX you cannot use any non-alphabetic character in a variable
 # name.
-$(mtexdir)/delete-me.tex: $(dm)
+$(mtexdir)/delete-me.tex: $(dm) $(wfpc2) $(wfpc2hist) $(wfpc2stats)
 
         # Write the number of random values used.
 	echo "\newcommand{\deletemenum}{$(delete-me-num)}" > $@
@@ -67,6 +120,16 @@ $(mtexdir)/delete-me.tex: $(dm)
 	           {if($$2>max) max=$$2; if($$2<min) min=$$2;}
 	           END{print min, max}' $(dm));
 	v=$$(echo "$$mm" | awk '{printf "%.3f", $$1}');
-	echo "\newcommand{\deletememin}{$$v}"             >> $@;
+	echo "\newcommand{\deletememin}{$$v}"             >> $@
 	v=$$(echo "$$mm" | awk '{printf "%.3f", $$2}');
 	echo "\newcommand{\deletememax}{$$v}"             >> $@
+
+        # Write the statistics of the WFPC2 image as a macro.
+	q=$(delete-me-wfpc2-quantile)
+	echo "\newcommand{\deletemewfpcquantile}{$$q}"            >> $@
+	mean=$$(awk     '{printf("%.2f", $$1)}' $(wfpc2stats))
+	echo "\newcommand{\deletemewfpctwomean}{$$mean}"          >> $@
+	median=$$(awk   '{printf("%.2f", $$2)}' $(wfpc2stats))
+	echo "\newcommand{\deletemewfpctwomedian}{$$median}"      >> $@
+	quantile=$$(awk '{printf("%.2f", $$3)}' $(wfpc2stats))
+	echo "\newcommand{\deletemewfpctwoquantile}{$$quantile}"  >> $@
diff --git a/reproduce/src/make/dependencies.mk b/reproduce/src/make/dependencies.mk
index 8ed359b..a784883 100644
--- a/reproduce/src/make/dependencies.mk
+++ b/reproduce/src/make/dependencies.mk
@@ -43,7 +43,7 @@ ildir  = $(BDIR)/dependencies/installed/lib
 ilidir = $(BDIR)/dependencies/installed/lib/built
 
 # Define the top-level programs to build (installed in `.local/bin').
-top-level-programs = gawk gs grep sed git astnoisechisel texlive-ready
+top-level-programs = gawk gs grep sed git flock astnoisechisel texlive-ready
 all: $(foreach p, $(top-level-programs), $(ibdir)/$(p))
 
 # Other basic environment settings: We are only including the host
@@ -75,6 +75,7 @@ LD_LIBRARY_PATH := $(ildir)
 tarballs = $(foreach t, cfitsio-$(cfitsio-version).tar.gz             \
                         cmake-$(cmake-version).tar.gz                 \
                         curl-$(curl-version).tar.gz                   \
+	                flock-$(flock-version).tar.xz                 \
 	                gawk-$(gawk-version).tar.lz                   \
 	                ghostscript-$(ghostscript-version).tar.gz     \
 	                git-$(git-version).tar.xz                     \
@@ -111,6 +112,7 @@ $(tarballs): $(tdir)/%:
 	    w=https://heasarc.gsfc.nasa.gov/FTP/software/fitsio/c/cfitsio$$v.tar.gz
 	  elif [ $$n = cmake       ]; then w=https://cmake.org/files/v3.12
 	  elif [ $$n = curl        ]; then w=https://curl.haxx.se/download
+	  elif [ $$n = flock       ]; then w=https://github.com/discoteq/flock/releases/download/v$(flock-version)
 	  elif [ $$n = gawk        ]; then w=http://ftp.gnu.org/gnu/gawk
 	  elif [ $$n = ghostscript ]; then w=https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs926
 	  elif [ $$n = git         ]; then w=https://mirrors.edge.kernel.org/pub/software/scm/git
@@ -244,6 +246,9 @@ $(ibdir)/libtool: $(tdir)/libtool-$(libtool-version).tar.xz
 $(ibdir)/gs: $(tdir)/ghostscript-$(ghostscript-version).tar.gz
 	$(call gbuild, $<, ghostscript-$(ghostscript-version))
 
+$(ibdir)/flock: $(tdir)/flock-$(flock-version).tar.xz
+	$(call gbuild, $<, flock-$(flock-version), static)
+
 $(ibdir)/git: $(tdir)/git-$(git-version).tar.xz \
               $(ilidir)/zlib
 	$(call gbuild, $<, git-$(git-version), static)
diff --git a/reproduce/src/make/download.mk b/reproduce/src/make/download.mk
index 9617a45..180d2cf 100644
--- a/reproduce/src/make/download.mk
+++ b/reproduce/src/make/download.mk
@@ -25,20 +25,51 @@
 
 
 
-# Download SURVEY data
+# Download input data
 # --------------------
 #
-# Data from a survey (for example an imaging survey) usually have a special
-# file-name format which should be set here in the `foreach' loop. Note
-# that the `foreach' function needs the backslash (`\') at the end of the
-# line when it is broken into multiple lines.
-all-survey = $(foreach f, $(filters-survey),                                 \
-                          $(SURVEY)/a-special-format-$(f).fits               \
-                          $(SURVEY)/a-possibly-additional-$(f)-format.fits )
-$(SURVEY):; mkdir $@
-$(all-survey): $(SURVEY)/%: | $(SURVEY) $(lockdir)
-	flock $(lockdir)/download -c "$(DOWNLOADER) $@ $(web-survey)/$*"
+# The input dataset properties are defined in `$(pconfdir)/INPUTS.mk'. For
+# this template pipeline we only have one dataset to enable easy
+# processing, so all the extra checks in this rule may seem
+# redundant.
+#
+# However, in a real project, you will need more than one dataset. In that
+# case, just add them to the target list and add an `elif' statement to
+# define it in the recipe.
+#
+# Download lock file: Most systems have a single connection to the
+# internet, therefore downloading is inherently done in series. As a
+# result, when more than one dataset is necessary for download, if they are
+# done in parallel, the speed will be slower than downloading them in
+# series. We thus use the `flock' program to tie/lock the downloading
+# process with a file and make sure that only one downloading event is in
+# progress at every moment.
+$(indir):; mkdir $@
+inputdatasets = $(foreach i, $(WFPC2IMAGE), $(indir)/$(i))
+$(inputdatasets): $(indir)/%: | $(indir) $(lockdir)
+
+        # Set the necessary parameters for this input file.
+	if   [ $* = $(WFPC2IMAGE) ]; then url=$(WFPC2URL); mdf=$(WFPC2MD5);
+	else
+	echo; echo; echo "Not recognized input dataset: '$*'."
+	echo; echo; exit 1
+	fi
+
+        # Download (or make the link to) the input dataset.
+	if [ -f $(INDIR)/$* ]; then
+	  ln -s $(INDIR)/$* $@
+	else
+	  flock $(lockdir)/download $(DOWNLOADER) $@ $$url/$*
+	fi
 
+        # Check the md5 sum to see if this is the proper dataset.
+	sum=$$(md5sum $@ | awk '{print $$1}')
+	if [ $$sum != $$mdf ]; then
+	  wrongname=$(dir $@)/wrong-$(notdir $@)
+	  mv $@ $$wrongname
+	  echo; echo; echo "Wrong MD5 checksum for '$*' in $$wrongname"
+	  echo; echo; exit 1
+	fi
 
 
 
@@ -49,5 +80,5 @@ $(all-survey): $(SURVEY)/%: | $(SURVEY) $(lockdir)
 #
 # It is very important to mention the address where the data were
 # downloaded in the final report.
-$(mtexdir)/download.tex: $(pconfdir)/web.mk | $(mtexdir)
-	@echo "\\newcommand{\\websurvey}{$(web-survey)}" > $@
+$(mtexdir)/download.tex: $(pconfdir)/INPUTS.mk | $(mtexdir)
+	echo "\\newcommand{\\wfpctwourl}{$(WFPC2URL)}" > $@
diff --git a/reproduce/src/make/initialize.mk b/reproduce/src/make/initialize.mk
index 694aca0..41a5e05 100644
--- a/reproduce/src/make/initialize.mk
+++ b/reproduce/src/make/initialize.mk
@@ -34,6 +34,7 @@
 # parallel. Also, some programs may not be thread-safe, therefore it will
 # be necessary to put a lock on them. This pipeline uses the `flock'
 # program to achieve this.
+indir       = $(BDIR)/inputs
 texdir      = $(BDIR)/tex
 srcdir      = reproduce/src
 lockdir     = $(BDIR)/locks
@@ -224,6 +225,14 @@ $(mtexdir)/initialize.tex: | $(mtexdir)
 	fi;                                                                \
 	echo "\newcommand{\\bziptwoversion}{$(bzip2-version)}" >> $@
 
+        # Unfortunately we couldn't find a way to retrieve the version of
+        # the discoteq `flock' that we are using here. So we'll just report
+        # the version we downloaded and installed.
+	echo "\newcommand{\\flockversion}{$(flock-version)}" >> $@
+
+
+
+
 
         # Versions of libraries.
 	$(call lvcheck, fitsio.h, $(cfitsio-version), CFITSIO, cfitsioversion)
diff --git a/tex/delete-me-wfpc2.tex b/tex/delete-me-wfpc2.tex
new file mode 100644
index 0000000..95b3105
--- /dev/null
+++ b/tex/delete-me-wfpc2.tex
@@ -0,0 +1,34 @@
+\begin{tikzpicture}
+
+  %% The displayed WFPC2 image.
+  \node[anchor=south west] (img) at (0,0)
+       {\includegraphics[width=0.5\linewidth]
+         {\bdir/tex/delete-me-wfpc2/wfpc2.pdf}};
+
+  %% Its label
+  \node[anchor=south west] at (0.45\linewidth,0.45\linewidth)
+       {\textcolor{white}{a}};
+
+  %% This histogram.
+  \begin{axis}[at={(0.52\linewidth,0.1\linewidth)},
+      no markers,
+      axis on top,
+      xmode=normal,
+      ymode=normal,
+      yticklabels={},
+      scale only axis,
+      xlabel=Pixel value,
+      width=0.5\linewidth,
+      height=0.412\linewidth,
+      enlarge y limits=false,
+      enlarge x limits=false,
+      ]
+    \addplot [const plot mark mid, fill=red]
+    table [x index=0, y index=1]
+    {\bdir/tex/delete-me-wfpc2/wfpc2-hist.txt}
+    \closedcycle;
+  \end{axis}
+
+  %% The histogram's label
+  \node[anchor=south west] at (0.95\linewidth,0.45\linewidth) {b};
+\end{tikzpicture}