diff options
Diffstat (limited to 'reproduce/analysis/config')
-rw-r--r-- | reproduce/analysis/config/INPUTS.conf | 115 |
1 files changed, 77 insertions, 38 deletions
diff --git a/reproduce/analysis/config/INPUTS.conf b/reproduce/analysis/config/INPUTS.conf index 75e24de..5970ae3 100644 --- a/reproduce/analysis/config/INPUTS.conf +++ b/reproduce/analysis/config/INPUTS.conf @@ -1,9 +1,10 @@ # This project's input file information (metadata). # # For each input (external) data file that is used within the project, -# three variables are suggested here (two of them are mandatory). These -# variables will be used by 'reproduce/analysis/make/download.mk' to import -# the dataset into the project (within the build directory): +# three variables are suggested here (only the verification variable is +# strictly mandatory). These variables will be used by the download rule of +# 'reproduce/analysis/make/initialize.mk' to import the dataset into the +# project (within the build directory): # # - If the file already exists locally in '$(INDIR)' (the optional input # directory that may have been specified at configuration time with @@ -12,27 +13,53 @@ # files are large. # # - If the file doesn't exist in '$(INDIR)', or no input directory was -# specified at configuration time, then the file is downloaded from a -# specific URL. +# specified at configuration time, then the file is downloaded from the +# specified URL for that dataset. # # In both cases, before placing the file (or its link) in the build -# directory, 'reproduce/analysis/make/download.mk' will check the SHA256 -# checksum of the dataset and if it differs from the pre-defined value (set -# for that file, here), it will abort (since this is not the intended -# dataset). -# -# Therefore, the two variables specifying the URL and SHA256 checksum of -# the file are MANDATORY. The third variable (INPUT-%-size) showing the -# human-readable size of the file (from 'ls -lh') is optional (but -# recommended: because it gives future scientists to get a feeling of the -# volume of data they need to input: will become important if the -# size/number of files is large). +# directory, the download rule of 'reproduce/analysis/make/initialize.mk' +# will check the verification of the dataset and if it differs from the +# pre-defined value (set for that file, here), it will abort (since this is +# not the intended dataset). # +# Verification (two modes) +# ------------------------ +# - SHA256 checksum. This will check the full contents of the file, and +# is generic to any data format. However, if the server inserts custom +# headers like the query date or query code and etc, this form of +# validation is not useful: because every download will have different +# headers. In such cases, you should use the other verification methods +# below. In other words, this method is only good for files that are +# "static" on the server (and left there unchanged). If the file is +# generated at request time, the server usually inserts custom run-time +# dependent headers; making it impossible to verify with an SHA +# checksum of the whole file. +# - The FITS Standard's 'DATASUM' (which will only check the data, not +# the headers). According to the FITS standard, this sum ignores all +# headers, and is only calculated on a HDU's data. By default, this +# will require Gnuastro (which can easily calculate and return the +# value on the command-line), and it assumes HDU number 1 (counting +# from 0). You can modify the defaults by modifying the rule in +# 'reproduce/analysis/make/initialize.mk'. +# +# Automatic writing of verification +# --------------------------------- +# In case you would like Maneage to find the checksum upon downloading, put +# the string '--auto-replace--' instead of a checksum. This can be helpful +# for large datasets; where downloading only for adding the checksum is not +# easy/possible and can be buggy. In this scenario, upon downloading the +# file its checksum will be calculated and will be replaced with the +# '--auto-replace--' in this file. But since this file is under version +# control, be sure to commit all the updated checksums after your downloads +# are finished! +# +# Variable description +# -------------------- # The naming convension is critical for the input files to be properly -# imported into the project. In the patterns below, the '%' is the full -# file name (including its suffix): for example in the demo input of this -# file in the 'maneage' branch, we have 'INPUT-wfpc2.fits-sha256': -# therefore, the input file (within the project's '$(indir)') is called +# imported into Maneage. In the patterns below, the '%' is the full file +# name (including its suffix): for example in the demo input of this file +# in the 'maneage' branch, we have 'INPUT-wfpc2.fits-sha256': therefore, +# the input file (within the project's '$(indir)') is called # 'wfpc2.fits'. This allows you to simply set '$(indir)/wfpc2.fits' as the # pre-requisite of any recipe that needs the input file: you will rarely # (if at all!) need to use these variables directly. @@ -40,23 +67,17 @@ # INPUT-%-sha256: The sha256 checksum of the file. You can generate the # SHA256 checksum of a file with the 'sha256sum FILENAME' # command (where 'FILENAME' is the name of your -# file). this is very important for an automatic -# verification of the file: that it hasn't changed -# between different runs of the project (locally or in -# the URL). There are more robust checksum algorithms -# like the 'SHA' standards. -# -# AUTOMATIC CHEKSUM CALCULATION: In case you would like -# Maneage to find the checksum upon downloading, put the -# string '--auto-replace--' instead of a checksum. This -# can be helpful for large datasets; where downloading -# only for adding the checksum is not easy/possible and -# can be buggy. In this scenario, upon downloading the -# file its checksum will be calculated and will be -# replaced with the '--auto-replace--' in this file. But -# since this file is under version control, be sure to -# commit all the updated checksums after your downloads -# are finished! +# file). Don't use this if you give the 'fitsdatasum' +# keyvalue. +# +# INPUT-%-fitsdatasum: The FITS standard DATASUM value for HDU number 1 +# of the FITS file (counting from 0). Don't use this +# if you give the 'sha256' keyword. +# +# INPUT-%-fitshdu: The HDU identifier (counter from 0, or name) to use +# for the verification. This is only relevant in the +# 'fitsdatasum' verification method and optional (if not +# given, HDU number 1 is used; counting from 0). # # INPUT-%-url: The URL to download the file if it is not available # locally. It can happen that during the first phases of @@ -70,6 +91,13 @@ # good feeling of the necessary network and storage # capacity that is necessary to start the project. # +# Therefore, the the verification variable is MANDATORY in any case. The +# variable with a URL is only necessary if you do not have the file +# locally. However, The size variable is optional (but recommended: because +# it gives future scientists a feeling of the volume of data they need to +# input to run your project: will become important if the size/number of +# files is large). +# # The input dataset's name (that goes into the '%') can be different from # the URL's file name (last component of the URL, after the last '/'). Just # note that it is assumed that the local copy (outside of your project) is @@ -87,7 +115,18 @@ -# Demo dataset used in the histogram plot (remove when customizing). +# Demo dataset used in the histogram plot +# --------------------------------------- +# +# Remove this part while you are entering your project's datasets. +# +# Since the demonstration dataset is a FITS file, we have also added the +# two '$(INPUT-%-fits*)' variables as a demonstration. But they are +# commented because the SHA256 method is also possible for this file (its +# not generated on the server at query time; it is a static file on the +# server). INPUT-wfpc2.fits-size = 62K INPUT-wfpc2.fits-url = https://fits.gsfc.nasa.gov/samples/WFPC2ASSNu5780205bx.fits INPUT-wfpc2.fits-sha256 = 9851bc2bf9a42008ea606ec532d04900b60865daaff2f233e5c8565dac56ad5f +#INPUT-wfpc2.fits-fitshdu = 0 +#INPUT-wfpc2.fits-fitsdatasum = 2218330266 |