path: root/reproduce/analysis/config/INPUTS.conf
Age | Commit message | Author | Lines
2023-05-07 | Copyright years: updated to 2023, accompanied by some minor fixes | Mohammad Akhlaghi | -1/+1
SUMMARY: just house-cleaning, no need to do anything major in your branch. Just update the copyright years in files that you have added.

Until now, the latest copyright year in the whole Maneage source code was 2022! As of this commit, we have already been in 2023 for 5 months! Furthermore, there were a few other minor issues that needed correction:

- The URL to download input datasets wasn't quoted in 'initialize.mk' or the download script! As a result, when the input URL had characters that are meaningful to the shell (like '&'), the download command would not work.
- The only program that had 'make check' in the 'basic.mk' programs was MPFR. At that stage we still haven't built our own compiler, so the check is not accurate.
- The 'pyerfa' and 'extension-helpers' packages in Python need 'setuptools_scm' on some systems. But until now, it was not in the list of their prerequisites.

With this commit, all the issues above have been corrected.
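The quoting problem in the first item can be reproduced outside of Maneage. The following is only a minimal sketch: the URL is hypothetical, and 'sh -c' stands in for the way Make pastes the URL into the recipe text before the shell parses it. It is not the actual recipe of 'initialize.mk'.

```shell
#!/bin/sh
# Hypothetical URL containing '&', which the shell treats as a
# command separator (run the preceding command in the background).
url='https://example.org/data.cgi?id=1&fmt=fits'

# Unquoted: the inner shell only passes the part before '&' to printf.
unquoted=$(sh -c "printf '%s\n' $url")

# Quoted: the whole URL reaches printf as a single argument.
quoted=$(sh -c "printf '%s\n' '$url'")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

In the unquoted case everything after the '&' is parsed as a separate command, so a downloader in place of 'printf' would request a truncated URL.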
2022-09-02 | Added server authentication and FITS DATASUM for verification | Mohammad Akhlaghi | -38/+77
SUMMARY: Nothing special is necessary for your existing projects. This commit just adds two new features (read the commit description for more):

1. To provide a user and password to servers that need authentication before they allow downloading of proprietary data,
2. To use the FITS Standard's DATASUM for file verification (for cases where the file is not static on the server, and is generated upon receiving your download request).

Until now, Maneage didn't have any infrastructure for databases that require authentication (through a user and password, when calling 'wget'). Furthermore, when the downloaded file is automatically generated by the server upon request, the server usually adds metadata (like the file date, query number, etc.) in the header. Therefore the simple SHA256 checksum of the file would differ on every download! This made it very hard to verify if the data (not headers) are unchanged.

With this commit, both these problems have been addressed:

- Server authentication: the 'reproduce/software/config/LOCAL.conf' now contains three new variables for this purpose. With them, you can give your username and password, along with the authentication method of the server. The comments on top of these three variables give a full description of their usage.
- Verifying only the data in a file (ignoring the headers): the 'reproduce/analysis/config/INPUTS.conf' now accepts two new optional variables for each input file using the FITS standard's DATASUM convention: 'INPUT-%-fitsdatasum' and 'INPUT-%-fitshdu'. If the SHA256 isn't specified for a file, Maneage will use these to verify the file. With the latter, you specify the HDU of the data you want to verify and with the former you give the DATASUM value for that HDU. As the name suggests, this is only valid for FITS files. If we find other formats that support a similar behavior, we can add this feature for those formats also.
This is also thoroughly discussed in the comments of 'reproduce/analysis/config/INPUTS.conf'. This commit was done with the help of Pedram Ashofte Ardakani, Sepideh Eskandarlou and Mohammadreza Khellat.
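The reason a whole-file checksum fails for server-generated files can be seen with a toy example. This sketch assumes GNU Coreutils' 'sha256sum' and uses plain-text files as a stand-in for FITS: the first line plays the role of the header and the rest is the data. The real DATASUM uses its own checksum algorithm defined in the FITS Standard, not SHA256, so this only illustrates the idea of checking the data separately from the header.

```shell
#!/bin/sh
# Toy stand-in for a FITS file: the first line plays the role of the
# header (metadata that changes per download), the rest is the data.
tmp=$(mktemp -d)
printf 'DATE=2022-09-01 QUERY=1001\npixel data\n' > "$tmp/first.txt"
printf 'DATE=2022-09-02 QUERY=1002\npixel data\n' > "$tmp/second.txt"

# Checksum of the whole file: differs, because the "header" differs.
full1=$(sha256sum < "$tmp/first.txt"  | awk '{print $1}')
full2=$(sha256sum < "$tmp/second.txt" | awk '{print $1}')

# Checksum of the data only (skipping the first line): identical.
data1=$(tail -n +2 "$tmp/first.txt"  | sha256sum | awk '{print $1}')
data2=$(tail -n +2 "$tmp/second.txt" | sha256sum | awk '{print $1}')

[ "$full1" != "$full2" ] && echo "whole-file checksums differ"
[ "$data1"  = "$data2" ] && echo "data-only checksums match"
rm -rf "$tmp"
```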
2022-06-11 | IMPORTANT: download.mk removed, content moved to initialize.mk | Mohammad Akhlaghi | -0/+12
SUMMARY: no special action should be necessary; but it's an important update in low-level Maneage infrastructure (related to downloading and setting input checksums).

Until now, we had a separate 'download.mk' as one of the default sub-Makefiles that should have been loaded in all the 'top-*.mk' files after 'initialize.mk'. This was due to historic reasons: until Commit 91799fe4b6d, we had to manually make some changes in 'download.mk' for every input file we defined in 'INPUTS.mk' (which was very inconvenient, and not easily possible for a large number of files!). But since Commit 91799fe4b6d, those manual changes are no longer necessary, and a normal user will hardly ever need to touch the contents of 'download.mk' (which also had one effective rule).

Furthermore, based on shared projects with Zohre Ghaffari and Sepideh Eskandarlou (which involved a large number of large files), we recognized that it is very inconvenient to download a file once, update its checksum, and re-run Maneage (so the validation works). A robust solution was necessary to let project authors download the data and automatically update the checksum.

With this commit, to help in high-level project management in Maneage, the single and generic rule of 'download.mk' has been moved to 'initialize.mk', enabling us to fully remove this extra sub-Makefile from Maneage's source.

Furthermore, with this commit, a usable solution to the automatic updating of the checksum has also been implemented (which has been described in the comments of 'INPUTS.conf'): the users can now set the checksum to '--auto-replace--'. In this case, the download rule (now in 'initialize.mk') will automatically update that line of 'INPUTS.conf' and add the checksum instead. After './project make' is complete, when the user runs 'git diff', they can see all the updated checksums in the source of their project and commit the updated 'INPUTS.conf' into the source so this will not be necessary later.
Two other smaller issues have also been addressed in this commit:

- There was an extra ',' in the call to 'filter-out' when we defined 'prepare-dep' in 'reproduce/analysis/make/prepare.mk'. This would cause a crash (with Make complaining that there is no rule for target 'initialize.mk,': notice the extra ','). With this commit, that extra ',' has been removed and the problem has been solved.
- The build recipe of Imfit (in 'reproduce/software/make/high-level.mk') had two SPACE characters after '--no-openmp', which made it hard to read. They have been reduced to one SPACE.
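The '--auto-replace--' idea described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not the actual rule in 'initialize.mk': the file 'abc.fits' and its contents are hypothetical, the files live in a scratch directory, and GNU Coreutils' 'sha256sum' is assumed to be available.

```shell
#!/bin/sh
# Hypothetical stand-ins for an input file and its configuration.
tmp=$(mktemp -d)
printf 'example data\n' > "$tmp/abc.fits"
printf 'INPUT-abc.fits-sha256 = --auto-replace--\n' > "$tmp/INPUTS.conf"

# Calculate the checksum of the (already downloaded) file.
sum=$(sha256sum < "$tmp/abc.fits" | awk '{print $1}')

# Replace the placeholder on that file's line only; write to a
# temporary file first since 'sed -i' is not portable.
sed "/^INPUT-abc\.fits-sha256/s/--auto-replace--/$sum/" \
    "$tmp/INPUTS.conf" > "$tmp/INPUTS.conf.new"
mv "$tmp/INPUTS.conf.new" "$tmp/INPUTS.conf"

result=$(cat "$tmp/INPUTS.conf")
echo "$result"
rm -rf "$tmp"
```

Because the configuration file itself is edited, a later 'git diff' shows the new checksum as an ordinary source change that can be committed.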
2022-06-10 | Housekeeping: some portability issues fixed; four software updates | Mohammad Akhlaghi | -1/+1
Until now, there were several portability issues in Maneage:

1. Maneage would crash on older operating systems (checked on Debian 6), where Wget didn't have the '--no-use-server-timestamps' option.
2. On a Linux kernel 2.6.32 (of the same Debian 6 above) some features in 'util-linux' (like 'swapon' or 'libmount') wouldn't build and wouldn't let 'util-linux' complete. These features need root permissions to be useful, so they wouldn't be used in Maneage anyway! But they wouldn't let Maneage get built.
3. The './project shell' command would still read the host's '~/.bashrc', letting the host environment leak into Maneage's interactive shell.
4. The building of Flex 2.6.4 wouldn't complete due to a segmentation fault on Ubuntu, but NetPBM (which depends on Flex) would crash with a wrong usage of 'yyunput'. This had actually caused a non-update to Flex in a previous Maneage software update.
5. The updated Astrometry.net would assume SExtractor's executable name is 'source-extractor', causing a crash in usage. This forced the users to manually create a 'source-extractor' symbolic link in the '.local/bin' directory.
6. The 'reproduce/software/shell/tarball-prepare.sh' script (that is used for making Maneage-standard tarballs) wouldn't accept option values with an '=' between the option name and value! It also didn't print sufficiently informative messages and errors (for example it would say "skipping ..." (making the user think there is a problem!), but it was actually that the file already existed!).
7. The 'reproduce/analysis/make/prepare.mk' and 'reproduce/analysis/make/verify.mk' Makefiles that needed to reject some of the 'makesrc' sub-Makefiles would simply substitute their names with nothing. But this would cause problems when the name is part of the name of another sub-Makefile.
8. On the Debian 6 system mentioned above the raw 'df' command's output wasn't in the expected format; so Maneage would fail to properly detect the free space in the disk.
With this commit, all the issues above have been solved: for 1, a check has been added to avoid using that option. For 2, those 'util-linux' features have been disabled. For 3, the '--norc' and '--noprofile' options have been added to the call to Bash. For 4, see below. For 5, the symbolic link is now automatically made with SExtractor. For 6, the option reading components of that script have been fully re-written and more robust sanity checks are also added, with more informative warnings. For 7, the 'subst' function of Make was replaced with 'filter-out' and this fixed the problem. For 8, 'df' is called with the '-P' option so it has a unified format in all versions.

For 4, the versions of 'flex' and 'netpbm' have been updated. Since they were the dependency of 'astrometrynet', that has also been updated. In the process, we discovered that 'lzip' has a new version which claims to be faster, so that is also updated.

lzip           1.22      --> 1.23
astrometrynet  0.85      --> 0.89
flex           2.6.4     --> 2.6.4-410-74a89fd
netpbm         10.73.39  --> 10.73.39

NetPBM needed some manual manipulation in its source (to remove the extra line), so the necessary steps have been added to its build recipe in 'reproduce/software/make/high-level.mk'.
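The fix for item 8 relies on the POSIX output format of 'df'. A minimal sketch of reading the free space this way is shown here; it checks the current directory and is not the exact command used in Maneage's configuration script.

```shell
#!/bin/sh
# With '-P' (POSIX output format) every file system is reported on a
# single line with fixed columns, so the fourth field of the second
# line is the available space, in units of 1024 bytes thanks to '-k'.
free_kb=$(df -P -k . | awk 'NR==2 {print $4}')
echo "Free space in the current directory: $free_kb KiB"
```

Without '-P', some 'df' versions wrap long file system names onto a second line, which shifts the fields that a script expects.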
2022-04-15 | IMPORTANT: more generic, robust and secure INPUTS.conf and download.mk | Mohammad Akhlaghi | -41/+68
SUMMARY: it is necessary to update your 'INPUTS.conf' and 'download.mk'.

Until now, adding an input file involved several steps that needed manual (and inconvenient!) intervention: for every file, you needed to define four variables in 'INPUTS.conf', and in 'reproduce/analysis/make/download.mk' you had to use a shell 'if/elif/else' condition (complex for a large number of files) to link the names of the input files to those variables. Besides inconvenience, this could cause bugs (typos!). Furthermore, a basic MD5 checksum was used for verifying the files.

With this commit, a new structure has been defined for 'INPUTS.conf' that (thanks to some pretty useful GNU Make features) removes the need for users to manually edit 'reproduce/analysis/make/download.mk', and reduces the number of variables necessary for each file to three (from four). Furthermore, we now use the SHA256 checksum for input data validation.

Regarding the trick used in 'INPUTS.conf' (from the newly added description in 'download.mk'): In GNU Make, '.VARIABLES' "... expands to a list of the names of all global variables defined so far" (from the "Other Special Variables" section of the GNU Make manual). Assuming that the pattern 'INPUT-%-sha256' is only used for input files, we find all the variables that contain the input file names (the '%' is the filename). Finally, using the pattern-substitution function ('patsubst'), we remove the fixed string at the start and end of the variable name.

Steps you need to take:

- INPUTS.conf: translate your old format to the new format (after carefully reading the description in the comments at the start of the file). After applying the new standards, you don't need to use the variables of 'INPUTS.conf' directly in your Makefiles! For example if one of your input datasets is called 'abc.fits', the checksum variable will be 'INPUT-abc.fits-sha256' and in your high-level Makefiles, you can simply set '$(indir)/abc.fits' as a prerequisite (like you probably did already).
- reproduce/analysis/make/download.mk: for the definition and rule of 'inputdatasets', simply use the Maneage branch, and remove anything you had added in your project.

In the process, I also noticed that 'README-hacking.md' still referred to 'master' as the main project branch, while we have used 'main' in the paper (which is also the common convention with Git).
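The name extraction described above (keep the 'INPUT-%-sha256' names, then strip the fixed prefix and suffix) is done in Maneage with GNU Make's '.VARIABLES' and 'patsubst'. The following is only a shell analogue of the same idea, for readers who want to see the effect on a small file; the file contents, checksum values and the 'SOME-OTHER-VAR' name are all hypothetical.

```shell
#!/bin/sh
# Hypothetical configuration with two inputs in the new naming scheme.
tmp=$(mktemp -d)
cat > "$tmp/INPUTS.conf" <<'EOF'
# A comment line that must be ignored.
INPUT-abc.fits-sha256 = 0123abcd
SOME-OTHER-VAR = 1
INPUT-table.txt-sha256 = 4567ef01
EOF

# Keep only the 'INPUT-%-sha256' variable names, then strip the fixed
# prefix and suffix so that only the '%' part (the file name) remains.
inputs=$(sed -n 's/^INPUT-\(.*\)-sha256[[:space:]]*=.*$/\1/p' "$tmp/INPUTS.conf")
echo "$inputs"
rm -rf "$tmp"
```

The point of the design is that the list of input files is derived from the variable names themselves, so adding a new input only requires adding its variables.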
2022-01-21 | IMPORTANT: Updates to almost all software | Mohammad Akhlaghi | -1/+1
This commit primarily affects the configuration step of Maneage'd projects; in particular, it updates the versions of much of the software (see P.S.). So it shouldn't affect your high-level analysis other than the version bumps of the software you use (and the software's possibly improved/changed behavior).

The following software (and thus their dependencies) couldn't be updated as described below:

- Cryptography: isn't building because it depends on a new setuptools-rust package that has problems (https://savannah.nongnu.org/bugs/index.php?61731), so it has been commented in 'versions.conf'.
- SecretStorage: because it depends on Cryptography.
- Keyring: because it depends on SecretStorage.
- Astroquery: because it depends on Keyring.

This is a "squashed" commit after rebasing a development branch of 60 commits corresponding to a roughly two-month time interval. The following people contributed to this branch.

- Boudewijn Roukema added all the R software infrastructure and the R packages, as well as greatly helping in fixing many bugs during the update.
- Raul Infante-Sainz helped in testing and debugging the build.
- Pedram Ashofteh Ardakani found and fixed a bug.
- Zahra Sharbaf helped in testing and found several bugs.

Below a description of the most noteworthy points is given.

- Software tarballs: all updated software now have a unified format tarball (ustar; if not possible, pax) and unified compression (Lzip) in Maneage's software repository in Zenodo (https://doi.org/10.5281/zenodo.3883409). For more on this see https://savannah.nongnu.org/task/?15699 . This won't affect any extra software you would like to add; you can use any format recognized by GNU Tar, and all common compression algorithms. This new requirement is only for software that get merged to the core Maneage branch.
- Metastore (and thus libbsd and libmd) moved to highlevel: Metastore (and the packages it depends on) is a high-level product that is only relevant during the project development (like Emacs!): when the user wants the file meta data (like dates) to be unchanged after checking out branches. So it should be considered a high-level software, not basic. Metastore also usually causes many more headaches and error messages, so personally, I have stopped using it! Instead I simply merge my branches in a separate clone, then pull the merge commit: in this way, the files of my project aren't re-written during the checkout phase and therefore their dates are untouched (which can conflict with Make's dates on configuration files).

- The un-official cloned version of Flex (2.6.4-91 until this commit) was causing problems in the building of Netpbm, so with this commit, it has been moved back to version 2.6.4.

- Netpbm's official page had version 10.73.38 as the latest stable tarball that was just released in late 2021. But I couldn't find our previously-used version 10.86.99 anywhere (to see when it was released and why we used it! It's at least more than one year old!). So the official stable version is being used now.

- Improved instructions in 'README.md' for building the software environment in a Docker container (while having project source and output data products on the local system; including the usage of the host's '/dev/shm' to speed up temporary operations).

- Until now, the convention in Maneage was to put eight SPACE characters before the comment lines within recipes. This was done because by default GNU Emacs (also many other editors) show a TAB as eight characters. However, in other text editors, online browsers, or even the Git diff, a TAB can correspond to a different number of characters. In such cases, the Maneage recipes wouldn't look too interesting (the comments and the recipe commands would show a different indentation!).
With this commit, all the comment lines in the Makefiles within the core Maneage branch have a hash ('#') as their first character and a TAB as the second. This allows the comment lines in recipes to have the same indentation as code, making the code much easier to read in a general scenario including a 'git diff' (editor agnostic!).

P.S. List of updated software with their old and new versions
- Software with no version update are not mentioned.
- The old version of newly added software are shown with '--'.

Name (Basic)             Old version    New version
------------             -----------    -----------
Bzip2                    1.0.6          1.0.8
CURL                     7.71.1         7.79.1
Dash                     0.5.10.2       0.5.11.5
File                     5.39           5.41
Flock                    0.2.3          0.4.0
GNU Bash                 5.0.18         5.1.8
GNU Binutils             2.35           2.37
GNU Coreutils            8.32           9.0
GNU GCC                  10.2.0         11.2.0
GNU M4                   1.4.18         1.4.19
GNU Readline             8.0            8.1.1
GNU Tar                  1.32           1.34
GNU Texinfo              6.7            6.8
GNU diffutils            3.7            3.8
GNU findutils            4.7.0          4.8.0
GNU gmp                  6.2.0          6.2.1
GNU grep                 3.4            3.7
GNU gzip                 1.10           1.11
GNU libunistring         0.9.10         1.0
GNU mpc                  1.1.0          1.2.1
GNU mpfr                 4.0.2          4.1.0
GNU nano                 5.2            6.0
GNU ncurses              6.2            6.3
GNU wget                 1.20.3         1.21.2
Git                      2.28.0         2.34.0
Less                     563            590
Libxml2                  2.9.9          2.9.12
Lzip                     1.22-rc2       1.22
OpenSSL                  1.1.1a         3.0.0
Patchelf                 0.10           0.13
Perl                     5.32.0         5.34.0
Podlators                --             4.14

Name (Highlevel)         Old version    New version
----------------         -----------    -----------
Apachelog4cxx            0.10.0-603     0.12.1
Astrometry.net           0.80           0.85
Boost                    1.73.0         1.77.0
CFITSIO                  3.48           4.0.0
Cmake                    3.18.1         3.21.4
Eigen                    3.3.7          3.4.0
Expat                    2.2.9          2.4.1
FFTW                     3.3.8          3.3.10
Flex                     2.6.4-91       2.6.4
Fontconfig               2.13.1         2.13.94
Freetype                 2.10.2         2.11.0
GNU Astronomy Utilities  0.12           0.16.1-e0f1
GNU Autoconf             2.69.200-babc  2.71
GNU Automake             1.16.2         1.16.5
GNU Bison                3.7            3.8.2
GNU Emacs                27.1           27.2
GNU GDB                  9.2            11.1
GNU GSL                  2.6            2.7
GNU Help2man             1.47.11        1.48.5
Ghostscript              9.52           9.55.0
ICU                      --             70.1
ImageMagick              7.0.8-67       7.1.0-13
Libbsd                   0.10.0         0.11.3
Libffi                   3.2.1          3.4.2
Libgit2                  1.0.1          1.3.0
Libidn                   1.36           1.38
Libjpeg                  9b             9d
Libmd                    --             1.0.4
Libtiff                  4.0.10         4.3.0
Libx11                   1.6.9          1.7.2
Libxt                    1.2.0          1.2.1
Netpbm                   10.86.99       10.73.38
OpenBLAS                 0.3.10         0.3.18
OpenMPI                  4.0.4          4.1.1
Pixman                   0.38.0         0.40.0
Python                   3.8.5          3.10.0
R                        4.0.2          4.1.2
SWIG                     3.0.12         4.0.2
Util-linux               2.35           2.37.2
Util-macros              1.19.2         1.19.3
Valgrind                 3.15.0         3.18.1
WCSLIB                   7.3            7.7
Xcb-proto                1.14           1.14.1
Xorgproto                2020.1         2021.5

Name (Python)            Old version    New version
-------------            -----------    -----------
Astropy                  4.0            5.0
Beautifulsoup4           4.7.1          4.10.0
Beniget                  --             0.4.1
Cffi                     1.12.2         1.15.0
Cryptography             2.6.1          36.0.1
Cycler                   0.10.0         0.11.0
Cython                   0.29.21        0.29.24
Esutil                   0.6.4          0.6.9
Extension-helpers        --             0.1
Galsim                   2.2.1          2.3.3
Gast                     --             0.5.3
Jinja2                   --             3.0.3
MPI4py                   3.0.3          3.1.3
Markupsafe               --             2.0.1
Numpy                    1.19.1         1.21.3
Packaging                --             21.3
Pillow                   --             8.4.0
Ply                      --             3.11
Pyerfa                   --             2.0.0.1
Pyparsing                2.3.1          3.0.4
Pythran                  --             0.11.0
Scipy                    1.5.2          1.7.3
Setuptools               41.6.0         58.3.0
Six                      1.12.0         1.16.0
Uncertainties            3.1.2          3.1.6
Wheel                    --             0.37.0

Name (R)                 Old version    New version
--------                 -----------    -----------
Cli                      --             2.5.0
Colorspace               --             2.0-1
Cowplot                  --             1.1.1
Crayon                   --             1.4.1
Digest                   --             0.6.27
Ellipsis                 --             0.3.2
Fansi                    --             0.5.0
Farver                   --             2.1.0
Ggplot2                  --             3.3.4
Glue                     --             1.4.2
GridExtra                --             2.3
Gtable                   --             0.3.0
Isoband                  --             0.2.4
Labeling                 --             0.4.2
Lifecycle                --             1.0.0
Magrittr                 --             2.0.1
MASS                     --             7.3-54
Mgcv                     --             1.8-36
Munsell                  --             0.5.0
Pillar                   --             1.6.1
R-Pkgconfig              --             2.0.3
R6                       --             2.5.0
RColorBrewer             --             1.1-2
Rlang                    --             0.4.11
Scales                   --             1.1.1
Tibble                   --             3.1.2
Utf8                     --             1.2.1
Vctrs                    --             0.3.8
ViridisLite              --             0.4.0
Withr                    --             2.4.2
2021-01-02 | Copyright year updated in all source files | Mohammad Akhlaghi | -1/+1
Having entered 2021, it was necessary to update the copyright years at the top of the source files. We recommend that you do this for all your project-specific source files also.
2020-07-04 | Better names and comments in INPUTS.conf | Mohammad Akhlaghi | -20/+23
Until now, the dataset's configuration names had a 'WFPC2' prefix. But this is very alien to anyone that is not familiar with the history of the Hubble Space Telescope (the camera is no longer used! It's just used here since it's one of the standard FITS files from the FITS standard webpage). With this commit the variable names have been modified to be more readable and clear (having a 'DEMO-' prefix). Also the comments of 'INPUTS.conf' (describing the purpose of each variable) were edited and made clearer.
2020-06-10 | IMPORTANT: bug fix in default data download script of download.mk | Mohammad Akhlaghi | -3/+39
Summary of possible semantic conflicts

1. The recipe to download input datasets has been modified. You have to re-set the old 'origname' variable to 'localname' (to avoid confusion) and the default dataset URL should now be complete (including the actual filename). See the newly added descriptions in 'INPUTS.conf' for more on this.

Until now, when the dataset was already present on the host system, a link couldn't be made to it, causing the project to crash in the checksum phase. This has been fixed by properly naming the main variable as 'localname' to avoid the confusion that caused it.

Some other problems have been fixed in this recipe in the meantime:

- When the checksum is different, the expected and calculated checksums are printed.
- In the default paper, we now print the full URL of the dataset, not just the server, so the checksum of the 'download.tex' step has been updated.
2020-01-20 | IMPORTANT!!! Configuration Makefiles now have a .conf suffix | Mohammad Akhlaghi | -0/+15
Until now, the configuration Makefiles (in `reproduce/software/config/installation' and `reproduce/analysis/config') had a `.mk' suffix, similar to the workhorse Makefiles. Although they are indeed Makefiles, given their nature (to only keep configuration parameters), it is confusing (especially to early users) for them to also have a `.mk' suffix (similar to the analysis or software building Makefiles).

To address this issue, with this commit, all the configuration Makefiles (in those directories) are now given a `.conf' suffix. This is also assumed for all the files that are loaded.

The configuration (software building) and running of the template have been checked with this change from scratch, but please report any error that may not have been noticed.

THIS IS AN IMPORTANT CHANGE AND WILL CAUSE CRASHES OR UNEXPECTED BEHAVIORS FOR PROJECTS THAT HAVE BRANCHED FROM THIS TEMPLATE. PLEASE CORRECT THE SUFFIX OF ALL YOUR PROJECT'S CONFIGURATION MAKEFILES (IN THE DIRECTORIES ABOVE), OTHERWISE THEY AREN'T AUTOMATICALLY LOADED ANYMORE.
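For projects that need to correct the suffix of their own configuration Makefiles, the renaming can be done with a short loop. This is a minimal sketch that works in a scratch directory with hypothetical file names; in a real project the loop would run over the two configuration directories named above, and 'git mv' would be used in place of 'mv' so that Git records the renames.

```shell
#!/bin/sh
# Hypothetical project-specific configuration files in a scratch
# directory (in a real project these would be under
# 'reproduce/analysis/config').
tmp=$(mktemp -d)
touch "$tmp/INPUTS.mk" "$tmp/my-params.mk"

# Swap the '.mk' suffix for '.conf' on every file in that directory.
for f in "$tmp"/*.mk; do
    mv "$f" "${f%.mk}.conf"
done

renamed=$(ls "$tmp")
echo "$renamed"
rm -rf "$tmp"
```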