Diffstat (limited to 'reproduce/analysis/config/INPUTS.conf')
-rw-r--r-- | reproduce/analysis/config/INPUTS.conf | 166 |
1 files changed, 122 insertions, 44 deletions
diff --git a/reproduce/analysis/config/INPUTS.conf b/reproduce/analysis/config/INPUTS.conf
index b969945..1090e44 100644
--- a/reproduce/analysis/config/INPUTS.conf
+++ b/reproduce/analysis/config/INPUTS.conf
@@ -1,42 +1,110 @@
-# Input files necessary for this project, the variables defined in this
-# file are primarily used in 'reproduce/analysis/make/download.mk'. See
-# there for precise usage of the variables. But comments are also provided
-# here.
-#
-# Necessary variables for each input dataset are listed below. Its good
-# that all the variables of each file have the same base-name (in the
-# example below 'DEMO') with descriptive suffixes, also put a short comment
-# above each group of variables for each dataset, shortly explaining what
-# it is.
-#
-# 1) Local file name ('DEMO-DATA' below): this is the name of the dataset
-#    on the local system (in 'INDIR', given at configuration time). It is
-#    recommended that it be the same name as the online version of the
-#    file like the case here (note how this variable is used in 'DEMO-URL'
-#    for the dataset's full URL). However, this is not always possible, so
-#    the local and server filenames may be different. Ultimately, the file
-#    name is irrelevant, we check the integrity with the checksum.
-#
-# 2) The MD5 checksum of the file ('DEMO-MD5' below): this is very
-#    important for an automatic verification of the file. You can
-#    calculate it by running 'md5sum' on your desired file. You can also
-#    use any other checksum tool that you prefer, just be sure to correct
-#    the respective command in 'reproduce/analysis/make/download.mk'.
-#
-# 3) The human-readable size of the file ('DEMO-SIZE' below): this is an
-#    optional variable, mainly to help a reader of your project get a
-#    sense of the volume they need to download if they don't already have
-#    the dataset. So it is highly recommended to add it (future readers of
-#    your project's source will appreciate it!). You can get it from the
-#    output of 'ls -lh' command on the file. Optionally you can use it in
-#    messages during the configuration phase (when Maneage asks for the
-#    input data directory), along with other info about the file(s).
-#
-# 4) The full dataset URL ('DEMO-URL' below): this is the full URL
-#    (including the file-name) that can be used to download the dataset
-#    when necessary. Also, see the description above on local filename.
-#
-# Copyright (C) 2018-2021 Mohammad Akhlaghi <mohammad@akhlaghi.org>
+# This project's input file information (metadata).
+#
+# For each input (external) data file that is used within the project,
+# three variables are suggested here (only the verification variable is
+# strictly mandatory). These variables will be used by the download rule of
+# 'reproduce/analysis/make/initialize.mk' to import the dataset into the
+# project (within the build directory):
+#
+#   - If the file already exists locally in '$(INDIR)' (the optional input
+#     directory that may have been specified at configuration time with
+#     '--input-dir'), a symbolic link will be added in '$(indir)' (in the
+#     build directory). A symbolic link is used to avoid extra storage when
+#     files are large.
+#
+#   - If the file doesn't exist in '$(INDIR)', or no input directory was
+#     specified at configuration time, then the file is downloaded from the
+#     specified URL for that dataset.
+#
+# In both cases, before placing the file (or its link) in the build
+# directory, the download rule of 'reproduce/analysis/make/initialize.mk'
+# will verify the dataset and, if the result differs from the
+# pre-defined value (set for that file, here), it will abort (since this is
+# not the intended dataset).
+#
+# Verification (two modes)
+# ------------------------
+#   - SHA256 checksum. This will check the full contents of the file, and
+#     is generic to any data format. However, if the server inserts custom
+#     headers such as the query date or query code, this form of
+#     validation is not useful, because every download will have different
+#     headers. In such cases, you should use the other verification method
+#     below. In other words, this method is only good for files that are
+#     "static" on the server (and left there unchanged). If the file is
+#     generated at request time, the server usually inserts custom run-time
+#     dependent headers, making it impossible to verify with an SHA
+#     checksum of the whole file.
+#   - The FITS Standard's 'DATASUM' (which will only check the data, not
+#     the headers). According to the FITS standard, this sum ignores all
+#     headers, and is only calculated on an HDU's data. By default, this
+#     will require Gnuastro (which can easily calculate and return the
+#     value on the command-line), and it assumes HDU number 1 (counting
+#     from 0). You can modify the defaults by modifying the rule in
+#     'reproduce/analysis/make/initialize.mk'.
+#
+# Automatic writing of verification
+# ---------------------------------
+# In case you would like Maneage to find the checksum upon downloading, put
+# the string '--auto-replace--' instead of a checksum. This can be helpful
+# for large datasets, where downloading only for adding the checksum is not
+# easy/possible and can be buggy. In this scenario, upon downloading the
+# file, its checksum will be calculated and will replace the
+# '--auto-replace--' string in this file. But since this file is under version
+# control, be sure to commit all the updated checksums after your downloads
+# are finished!
+#
+# Variable description
+# --------------------
+# The naming convention is critical for the input files to be properly
+# imported into Maneage. In the patterns below, the '%' is the full file
+# name (including its suffix): for example in the demo input of this file
+# in the 'maneage' branch, we have 'INPUT-wfpc2.fits-sha256': therefore,
+# the input file (within the project's '$(indir)') is called
+# 'wfpc2.fits'. This allows you to simply set '$(indir)/wfpc2.fits' as the
+# pre-requisite of any recipe that needs the input file: you will rarely
+# (if at all!) need to use these variables directly.
+#
+#   INPUT-%-sha256: The sha256 checksum of the file. You can generate the
+#                   SHA256 checksum of a file with the 'sha256sum FILENAME'
+#                   command (where 'FILENAME' is the name of your
+#                   file). Don't use this if you give the 'fitsdatasum'
+#                   variable.
+#
+#   INPUT-%-fitsdatasum: The FITS standard DATASUM value for HDU number 1
+#                   of the FITS file (counting from 0). Don't use this
+#                   if you give the 'sha256' variable.
+#
+#   INPUT-%-fitshdu: The HDU identifier (counting from 0, or name) to use
+#                   for the verification. This is only relevant in the
+#                   'fitsdatasum' verification method and optional (if not
+#                   given, HDU number 1 is used; counting from 0).
+#
+#   INPUT-%-url: The URL to download the file if it is not available
+#                   locally. It can happen that during the first phases of
+#                   your project the data aren't yet public. In this case, you
+#                   can set a phony URL like this (just as a clear place-holder):
+#                   'https://this.file/is/not/yet/public'.
+#
+#   INPUT-%-size: The human-readable size of the file (output of 'ls
+#                   -lh'). This is not used by default but can help other
+#                   scientists who would like to run your project get a
+#                   good feeling of the network and storage
+#                   capacity that is necessary to start the project.
+#
+# Therefore, the verification variable is MANDATORY in any case. The
+# variable with a URL is only necessary if you do not have the file
+# locally. The size variable is optional (but recommended, because
+# it gives future scientists a feeling of the volume of data they need to
+# input to run your project; this becomes important if the size/number of
+# files is large).
+#
+# The input dataset's name (that goes into the '%') can be different from
+# the URL's file name (last component of the URL, after the last '/'). Just
+# note that it is assumed that the local copy (outside of your project) is
+# also called '%' (if your local copy of the input dataset and the online
+# repository names are the same, be sure to set '%' accordingly).
+#
+# Copyright (C) 2018-2023 Mohammad Akhlaghi <mohammad@akhlaghi.org>
 #
 # Copying and distribution of this file, with or without modification, are
 # permitted in any medium without royalty provided the copyright notice and
@@ -47,8 +115,18 @@
-# Demo dataset used in the histogram plot (remove when customizing).
-DEMO-DATA = WFPC2ASSNu5780205bx.fits
-DEMO-MD5 = a4791e42cd1045892f9c41f11b50bad8
-DEMO-SIZE = 62K
-DEMO-URL = https://fits.gsfc.nasa.gov/samples/$(DEMO-DATA)
+# Demo dataset used in the histogram plot
+# ---------------------------------------
+#
+# Remove this part while you are entering your project's datasets.
+#
+# Since the demonstration dataset is a FITS file, we have also added the
+# two '$(INPUT-%-fits*)' variables as a demonstration. But they are
+# commented because the SHA256 method is also possible for this file (it's
+# not generated on the server at query time; it is a static file on the
+# server).
+INPUT-wfpc2.fits-size = 62K
+INPUT-wfpc2.fits-url = https://fits.gsfc.nasa.gov/samples/WFPC2ASSNu5780205bx.fits
+INPUT-wfpc2.fits-sha256 = 9851bc2bf9a42008ea606ec532d04900b60865daaff2f233e5c8565dac56ad5f
+#INPUT-wfpc2.fits-fitshdu = 0
+#INPUT-wfpc2.fits-fitsdatasum = 2218330266
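
For readers who want a feel for the import behaviour described in the new comments (link from the local input directory if the file is there, otherwise download, and verify before placing anything in the build directory), the following is a minimal shell sketch of the SHA256 mode only. It is not the actual rule in 'reproduce/analysis/make/initialize.mk': the function names 'verify_sha256' and 'import_input', their argument order, the '.tmp' suffix, and the use of 'curl' are all assumptions made for illustration, and the '--auto-replace--' and FITS 'DATASUM' cases are left out. It also assumes 'sha256sum' (GNU Coreutils) is available and that the directories are given as absolute paths.

```shell
#!/bin/sh
# Illustrative sketch only (hypothetical helpers, not Maneage's own code).

# Succeed only if the SHA256 checksum of file $1 equals the string in $2.
verify_sha256() {
    actual=$(sha256sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}

# Usage: import_input NAME URL SHA256 LOCAL_INDIR BUILD_INDIR
# Link NAME from LOCAL_INDIR when it exists there, otherwise download it
# from URL. In both cases the file is verified before it is placed.
import_input() {
    name="$1"; url="$2"; expected="$3"; local_indir="$4"; build_indir="$5"
    if [ -n "$local_indir" ] && [ -f "$local_indir/$name" ]; then
        # Local copy found: verify it, then add a symbolic link.
        verify_sha256 "$local_indir/$name" "$expected" || return 1
        ln -sf "$local_indir/$name" "$build_indir/$name"
    else
        # No local copy: download to a temporary name and verify it.
        tmp="$build_indir/$name.tmp"
        if ! curl -sSL -o "$tmp" "$url"; then
            rm -f "$tmp"
            return 1
        fi
        if verify_sha256 "$tmp" "$expected"; then
            mv "$tmp" "$build_indir/$name"
        else
            rm -f "$tmp"
            return 1
        fi
    fi
}
```

With the demo values from this commit, a call could look like 'import_input wfpc2.fits "$url" "$sha256" /path/to/input /path/to/build/inputs', where '$url' and '$sha256' hold the values of 'INPUT-wfpc2.fits-url' and 'INPUT-wfpc2.fits-sha256' and the two directories are placeholders. A non-zero return status corresponds to the "abort" case described in the comments.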