aboutsummaryrefslogtreecommitdiff
path: root/reproduce/analysis/config
diff options
context:
space:
mode:
authorMohammad Akhlaghi <mohammad@akhlaghi.org>2022-05-09 13:32:47 +0200
committerMohammad Akhlaghi <mohammad@akhlaghi.org>2022-05-09 23:52:29 +0200
commit9fdeebaacd06d57c479cd69e9937c4bfe5d0a286 (patch)
tree012e6194ad6e25a81a9c99b4d0bd0852bc9a12af /reproduce/analysis/config
parent480184b3da399fab11b50e67f01d2efa6bea0e3e (diff)
parentf51b5e2e500dd6450a5a3425e85df78245fc5c5c (diff)
Imported recent updates in Maneage, conflicts fixed
Until now, Maneage had undergone some updates. With this commit, those updates have been imported and the conflicts that resulted were fixed. They were all cosmetic and had no effect on the analysis. The most significant one was about the change in the format of 'INPUTS.conf'. In the process, I also noticed that the IEEEtran LaTeX package is now called 'ieeetran' (the 'tlmgr' of TeXLive 2022 was failing).
Diffstat (limited to 'reproduce/analysis/config')
-rw-r--r--reproduce/analysis/config/INPUTS.conf113
-rw-r--r--reproduce/analysis/config/metadata.conf2
-rw-r--r--reproduce/analysis/config/pdf-build.conf2
-rw-r--r--reproduce/analysis/config/verify-outputs.conf2
4 files changed, 73 insertions, 46 deletions
diff --git a/reproduce/analysis/config/INPUTS.conf b/reproduce/analysis/config/INPUTS.conf
index fd8ac53..f3d1cd4 100644
--- a/reproduce/analysis/config/INPUTS.conf
+++ b/reproduce/analysis/config/INPUTS.conf
@@ -1,42 +1,70 @@
-# Input files necessary for this project, the variables defined in this
-# file are primarily used in 'reproduce/analysis/make/download.mk'. See
-# there for precise usage of the variables. But comments are also provided
-# here.
-#
-# Necessary variables for each input dataset are listed below. Its good
-# that all the variables of each file have the same base-name (in the
-# example below 'DEMO') with descriptive suffixes, also put a short comment
-# above each group of variables for each dataset, shortly explaining what
-# it is.
-#
-# 1) Local file name ('DEMO-DATA' below): this is the name of the dataset
-# on the local system (in 'INDIR', given at configuration time). It is
-# recommended that it be the same name as the online version of the
-# file like the case here (note how this variable is used in 'DEMO-URL'
-# for the dataset's full URL). However, this is not always possible, so
-# the local and server filenames may be different. Ultimately, the file
-# name is irrelevant, we check the integrity with the checksum.
-#
-# 2) The MD5 checksum of the file ('DEMO-MD5' below): this is very
-# important for an automatic verification of the file. You can
-# calculate it by running 'md5sum' on your desired file. You can also
-# use any other checksum tool that you prefer, just be sure to correct
-# the respective command in 'reproduce/analysis/make/download.mk'.
-#
-# 3) The human-readable size of the file ('DEMO-SIZE' below): this is an
-# optional variable, mainly to help a reader of your project get a
-# sense of the volume they need to download if they don't already have
-# the dataset. So it is highly recommended to add it (future readers of
-# your project's source will appreciate it!). You can get it from the
-# output of 'ls -lh' command on the file. Optionally you can use it in
-# messages during the configuration phase (when Maneage asks for the
-# input data directory), along with other info about the file(s).
-#
-# 4) The full dataset URL ('DEMO-URL' below): this is the full URL
-# (including the file-name) that can be used to download the dataset
-# when necessary. Also, see the description above on local filename.
-#
-# Copyright (C) 2018-2021 Mohammad Akhlaghi <mohammad@akhlaghi.org>
+# This project's input file information (metadata).
+#
+# For each input (external) data file that is used within the project,
+# three variables are suggested here (two of them are mandatory). These
+# variables will be used by 'reproduce/analysis/make/download.mk' to import
+# the dataset into the project (within the build directory):
+#
+# - If the file already exists locally in '$(INDIR)' (the optional input
+# directory that may have been specified at configuration time with
+# '--input-dir'), a symbolic link will be added in '$(indir)' (in the
+# build directory). A symbolic link is used to avoid extra storage when
+# files are large.
+#
+# - If the file doesn't exist in '$(INDIR)', or no input directory was
+# specified at configuration time, then the file is downloaded from a
+# specific URL.
+#
+# In both cases, before placing the file (or its link) in the build
+# directory, 'reproduce/analysis/make/download.mk' will check the SHA256
+# checksum of the dataset and if it differs from the pre-defined value (set
+# for that file, here), it will abort (since this is not the intended
+# dataset).
+#
+# Therefore, the two variables specifying the URL and SHA256 checksum of
+# the file are MANDATORY. The third variable (INPUT-%-size) showing the
+# human-readable size of the file (from 'ls -lh') is optional (but
+# recommended: because it gives future scientists to get a feeling of the
+# volume of data they need to input: will become important if the
+# size/number of files is large).
+#
+# The naming convension is critical for the input files to be properly
+# imported into the project. In the patterns below, the '%' is the full
+# file name (including its prefix): for example in the demo input of this
+# file in the 'maneage' branch, we have 'INPUT-wfpc2.fits-sha256':
+# therefore, the input file (within the project's '$(indir)') is called
+# 'wfpc2.fits'. This allows you to simply set '$(indir)/wfpc2.fits' as the
+# pre-requisite of any recipe that needs the input file: you will rarely
+# (if at all!) need to use these variables directly.
+#
+# INPUT-%-sha256: The sha256 checksum of the file. You can generate the
+# SHA256 checksum of a file with the 'sha256sum FILENAME'
+# command (where 'FILENAME' is the name of your
+# file). this is very important for an automatic
+# verification of the file: that it hasn't changed
+# between different runs of the project (locally or in
+# the URL). There are more robust checksum algorithms
+# like the 'SHA' standards.
+#
+# INPUT-%-url: The URL to download the file if it is not available
+# locally. It can happen that during the first phases of
+# your project the data aren't yet public. In this case, you
+# set a phony URL like this (just as a clear place-holder):
+# 'https://this.file/is/not/yet/public'.
+#
+# INPUT-%-size: The human-readable size of the file (output of 'ls
+# -lh'). This is not used by default but can help other
+# scientists who would like to run your project get a
+# good feeling of the necessary network and storage
+# capacity that is necessary to start the project.
+#
+# The input dataset's name (that goes into the '%') can be different from
+# the URL's file name (last component of the URL, after the last '/'). Just
+# note that it is assumed that the local copy (outside of your project) is
+# also called '%' (if your local copy of the input dataset and the only
+# repository names are the same, be sure to set '%' accordingly).
+#
+# Copyright (C) 2018-2022 Mohammad Akhlaghi <mohammad@akhlaghi.org>
#
# Copying and distribution of this file, with or without modification, are
# permitted in any medium without royalty provided the copyright notice and
@@ -48,7 +76,6 @@
# Dataset used in this analysis and its checksum for integrity checking.
-MK20DATA = menke20.xlsx
-MK20MD5 = 8e4eee64791f351fec58680126d558a0
-MK20SIZE = 1.9MB
-MK20URL = https://www.biorxiv.org/content/biorxiv/early/2020/01/18/2020.01.15.908111/DC1/embed/media-1.xlsx
+INPUT-menke20.xlsx-size = 1.9M
+INPUT-menke20.xlsx-url = https://www.biorxiv.org/content/biorxiv/early/2020/01/18/2020.01.15.908111/DC1/embed/media-1.xlsx
+INPUT-menke20.xlsx-sha256 = 7839cdc2946134773ffc401cbcc78fb58fc489d2caad65375c85d605b2f8b13e
diff --git a/reproduce/analysis/config/metadata.conf b/reproduce/analysis/config/metadata.conf
index caac5c9..f570340 100644
--- a/reproduce/analysis/config/metadata.conf
+++ b/reproduce/analysis/config/metadata.conf
@@ -15,7 +15,7 @@
# and the copyright license name and standard link to the fully copyright
# license.
#
-# Copyright (C) 2020-2021 Mohammad Akhlaghi <mohammad@akhlaghi.org>
+# Copyright (C) 2020-2022 Mohammad Akhlaghi <mohammad@akhlaghi.org>
#
# Copying and distribution of this file, with or without modification, are
# permitted in any medium without royalty provided the copyright notice and
diff --git a/reproduce/analysis/config/pdf-build.conf b/reproduce/analysis/config/pdf-build.conf
index 015bf2e..a57b529 100644
--- a/reproduce/analysis/config/pdf-build.conf
+++ b/reproduce/analysis/config/pdf-build.conf
@@ -12,7 +12,7 @@
# LaTeX. Otherwise, a notice will just printed that, no PDF will be
# created.
#
-# Copyright (C) 2018-2021 Mohammad Akhlaghi <mohammad@akhlaghi.org>
+# Copyright (C) 2018-2022 Mohammad Akhlaghi <mohammad@akhlaghi.org>
#
# Copying and distribution of this file, with or without modification, are
# permitted in any medium without royalty provided the copyright notice and
diff --git a/reproduce/analysis/config/verify-outputs.conf b/reproduce/analysis/config/verify-outputs.conf
index d96f293..37fc43c 100644
--- a/reproduce/analysis/config/verify-outputs.conf
+++ b/reproduce/analysis/config/verify-outputs.conf
@@ -1,6 +1,6 @@
# To enable verification of output datasets set this variable to 'yes'.
#
-# Copyright (C) 2019-2021 Mohammad Akhlaghi <mohammad@akhlaghi.org>
+# Copyright (C) 2019-2022 Mohammad Akhlaghi <mohammad@akhlaghi.org>
#
# Copying and distribution of this file, with or without modification, are
# permitted in any medium without royalty provided the copyright notice and