path: root/reproduce/analysis/config/INPUTS.conf
# This project's input file information (metadata).
#
# For each input (external) data file that is used within the project,
# three variables are suggested here (only the verification variable is
# strictly mandatory). These variables will be used by the download rule of
# 'reproduce/analysis/make/initialize.mk' to import the dataset into the
# project (within the build directory):
#
#   - If the file already exists locally in '$(INDIR)' (the optional input
#     directory that may have been specified at configuration time with
#     '--input-dir'), a symbolic link will be added in '$(indir)' (in the
#     build directory). A symbolic link is used to avoid extra storage when
#     files are large.
#
#   - If the file doesn't exist in '$(INDIR)', or no input directory was
#     specified at configuration time, then the file is downloaded from the
#     specified URL for that dataset.
#
# In both cases, before placing the file (or its link) in the build
# directory, the download rule of 'reproduce/analysis/make/initialize.mk'
# will verify the dataset: if its checksum differs from the pre-defined
# value (set for that file, here), it will abort (since this is not the
# intended dataset).
#
# Verification (two modes)
# ------------------------
#   - SHA256 checksum. This will check the full contents of the file, and
#     is generic to any data format. However, if the server inserts custom
#     headers (like the query date or query code), this form of validation
#     is not useful, because every download will have different headers.
#     In such cases, you should use the other verification method below.
#     In other words, this method is only good for files that are "static"
#     on the server (and left there unchanged). If the file is generated
#     at request time, the server usually inserts custom run-time
#     dependent headers, making it impossible to verify with a SHA
#     checksum of the whole file.
#   - The FITS Standard's 'DATASUM' (which will only check the data, not
#     the headers). According to the FITS standard, this sum ignores all
#     headers, and is only calculated on an HDU's data. By default, this
#     will require Gnuastro (which can easily calculate and return the
#     value on the command-line), and it assumes HDU number 1 (counting
#     from 0). You can modify the defaults by modifying the rule in
#     'reproduce/analysis/make/initialize.mk'.
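#
# As a minimal sketch (assuming Gnuastro's Fits program is installed under
# its usual name 'astfits', and with 'FILENAME' as a placeholder for your
# own file), the DATASUM of HDU number 1 can be printed on the command-line
# with a call like this:
#
#     astfits FILENAME -h1 --datasum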
#
# Automatic writing of verification
# ---------------------------------
# In case you would like Maneage to find the checksum upon downloading, put
# the string '--auto-replace--' instead of a checksum. This can be helpful
# for large datasets, where downloading only to add the checksum is not
# easy/possible and can be buggy. In this scenario, upon downloading the
# file, its checksum will be calculated and will replace the
# '--auto-replace--' string in this file. But since this file is under
# version control, be sure to commit all the updated checksums after your
# downloads are finished!
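#
# As an illustration only (the dataset name 'mydata.fits' and its URL are
# hypothetical place-holders, not a real dataset), an entry that is waiting
# for its first download could look like this:
#
#     INPUT-mydata.fits-url    = https://example.org/path/to/mydata.fits
#     INPUT-mydata.fits-sha256 = --auto-replace--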
#
# Variable description
# --------------------
# The naming convention is critical for the input files to be properly
# imported into Maneage. In the patterns below, the '%' is the full file
# name (including its suffix): for example in the demo input of this file
# in the 'maneage' branch, we have 'INPUT-wfpc2.fits-sha256': therefore,
# the input file (within the project's '$(indir)') is called
# 'wfpc2.fits'. This allows you to simply set '$(indir)/wfpc2.fits' as the
# pre-requisite of any recipe that needs the input file: you will rarely
# (if at all!) need to use these variables directly.
#
#   INPUT-%-sha256: The sha256 checksum of the file. You can generate the
#                   SHA256 checksum of a file with the 'sha256sum FILENAME'
#                   command (where 'FILENAME' is the name of your
#                   file). Don't use this if you give the 'fitsdatasum'
#                   variable.
#
#   INPUT-%-fitsdatasum: The FITS standard DATASUM value for HDU number 1
#                        of the FITS file (counting from 0). Don't use this
#                        if you give the 'sha256' variable.
#
#   INPUT-%-fitshdu: The HDU identifier (counting from 0, or name) to use
#                    for the verification. This is only relevant in the
#                    'fitsdatasum' verification method and optional (if not
#                    given, HDU number 1 is used; counting from 0).
#
#   INPUT-%-url: The URL to download the file if it is not available
#                locally. It can happen that during the first phases of
#                your project the data aren't yet public. In this case, you
#                can set a phony URL like this (just as a clear place-holder):
#                'https://this.file/is/not/yet/public'.
#
#   INPUT-%-size: The human-readable size of the file (output of 'ls
#                 -lh'). This is not used by default but can help other
#                 scientists who would like to run your project get a
#                 good feeling of the network and storage capacity that
#                 is necessary to start the project.
#
# Therefore, the verification variable is MANDATORY in any case. The
# variable with a URL is only necessary if you do not have the file
# locally. The size variable is optional, but recommended because it
# gives future scientists a feeling of the volume of data they need to
# input to run your project (this will become important if the size/number
# of files is large).
#
# The input dataset's name (that goes into the '%') can be different from
# the URL's file name (last component of the URL, after the last '/'). Just
# note that it is assumed that the local copy (outside of your project) is
# also called '%' (if your local copy of the input dataset and the online
# repository names are the same, be sure to set '%' accordingly).
#
# Copyright (C) 2018-2022 Mohammad Akhlaghi <mohammad@akhlaghi.org>
#
# Copying and distribution of this file, with or without modification, are
# permitted in any medium without royalty provided the copyright notice and
# this notice are preserved.  This file is offered as-is, without any
# warranty.





# Demo dataset used in the histogram plot
# ---------------------------------------
#
# Remove this part when you are entering your project's datasets.
#
# Since the demonstration dataset is a FITS file, we have also added the
# two '$(INPUT-%-fits*)' variables as a demonstration. But they are
# commented out because the SHA256 method is also possible for this file
# (it's not generated on the server at query time; it is a static file on
# the server).
INPUT-wfpc2.fits-size = 62K
INPUT-wfpc2.fits-url  = https://fits.gsfc.nasa.gov/samples/WFPC2ASSNu5780205bx.fits
INPUT-wfpc2.fits-sha256 = 9851bc2bf9a42008ea606ec532d04900b60865daaff2f233e5c8565dac56ad5f
#INPUT-wfpc2.fits-fitshdu = 0
#INPUT-wfpc2.fits-fitsdatasum = 2218330266