From 8463df97c6f26ec4d22cd5828bb0574fd5e450d2 Mon Sep 17 00:00:00 2001 From: Mohammad Akhlaghi Date: Mon, 4 Oct 2021 02:51:45 +0200 Subject: IMPORTANT: Updates to almost all software This commit primarily affects the configuration step of Maneage'd projects, and in particular, updated versions of the many of the software (see P.S.). So it shouldn't affect your high-level analysis other than the version bumps of the software you use (and the software's possibly improve/changed behavior). The following software (and thus their dependencies) couldn't be updated as described below: - Cryptography: isn't building because it depends on a new setuptools-rust package that has problems (https://savannah.nongnu.org/bugs/index.php?61731), so it has been commented in 'versions.conf'. - SecretStorage: because it depends on Cryptography. - Keyring: because it depends on SecretStorage. - Astroquery: because it depends on Keyring. This is a "squashed" commit after rebasing a development branch of 60 commits corresponding to a roughly two-month time interval. The following people contributed to this branch. - Boudewijn Roukema added all the R software infrastructure and the R packages, as well as greatly helping in fixing many bugs during the update. - Raul Infante-Sainz helped in testing and debugging the build. - Pedram Ashofteh Ardakani found and fixed a bug. - Zahra Sharbaf helped in testing and found several bugs. Below a description of the most noteworthy points is given. - Software tarballs: all updated software now have a unified format tarball (ustar; if not possible, pax) and unified compression (Lzip) in Maneage's software repository in Zenodo (https://doi.org/10.5281/zenodo.3883409). For more on this See https://savannah.nongnu.org/task/?15699 . This won't affect any extra software you would like to add; you can use any format recognized by GNU Tar, and all common compression algorithms. This new requirement is only for software that get merged to the core Maneage branch. - Metastore (and thus libbsd and libmd) moved to highlevel: Metastore (and the packages it depends on) is a high-level product that is only relevant during the project development (like Emacs!): when the user wants the file meta data (like dates) to be unchanged after checking out branches. So it should be considered a high-level software, not basic. Metastore also usually causes many more headaches and error messages, so personally, I have stopped using it! Instead I simply merge my branches in a separate clone, then pull the merge commit: in this way, the files of my project aren't re-written during the checkout phase and therefore their dates are untouched (which can conflict with Make's dates on configuration files). - The un-official cloned version of Flex (2.6.4-91 until this commit) was causing problems in the building of Netpbm, so with this commit, it has been moved back to version 2.6.4. - Netpbm's official page had version 10.73.38 as the latest stable tarball that was just released in late 2021. But I couldn't find our previously-used version 10.86.99 anywhere (to see when it was released and why we used it! Its at last more than one year old!). So the official stable version is being used now. - Improved instructions in 'README.md' for building software environment in a Docker container (while having project source and output data products on the local system; including the usage of the host's '/dev/shm' to speed up temporary operations). - Until now, the convention in Maneage was to put eight SPACE characters before the comment lines within recipes. This was done because by default GNU Emacs (also many other editors) show a TAB as eight characters. However, in other text editors, online browsers, or even the Git diff, a TAB can correspond to a different number of characters. In such cases, the Maneage recipes wouldn't look too interesting (the comments and the recipe commands would show a different indentation!). With this commit, all the comment lines in the Makefiles within the core Maneage branch have a hash ('#') as their first character and a TAB as the second. This allows the comment lines in recipes to have the same indentation as code; making the code much more easier to read in a general scenario including a 'git diff' (editor agnostic!). P.S. List of updated software with their old and new versions - Software with no version update are not mentioned. - The old version of newly added software are shown with '--'. Name (Basic) Old version New version ------------ ----------- ----------- Bzip2 1.0.6 1.0.8 CURL 7.71.1 7.79.1 Dash 0.5.10.2 0.5.11.5 File 5.39 5.41 Flock 0.2.3 0.4.0 GNU Bash 5.0.18 5.1.8 GNU Binutils 2.35 2.37 GNU Coreutils 8.32 9.0 GNU GCC 10.2.0 11.2.0 GNU M4 1.4.18 1.4.19 GNU Readline 8.0 8.1.1 GNU Tar 1.32 1.34 GNU Texinfo 6.7 6.8 GNU diffutils 3.7 3.8 GNU findutils 4.7.0 4.8.0 GNU gmp 6.2.0 6.2.1 GNU grep 3.4 3.7 GNU gzip 1.10 1.11 GNU libunistring 0.9.10 1.0 GNU mpc 1.1.0 1.2.1 GNU mpfr 4.0.2 4.1.0 GNU nano 5.2 6.0 GNU ncurses 6.2 6.3 GNU wget 1.20.3 1.21.2 Git 2.28.0 2.34.0 Less 563 590 Libxml2 2.9.9 2.9.12 Lzip 1.22-rc2 1.22 OpenSLL 1.1.1a 3.0.0 Patchelf 0.10 0.13 Perl 5.32.0 5.34.0 Podlators -- 4.14 Name (Highlevel) Old version New version ---------------- ----------- ----------- Apachelog4cxx 0.10.0-603 0.12.1 Astrometry.net 0.80 0.85 Boost 1.73.0 1.77.0 CFITSIO 3.48 4.0.0 Cmake 3.18.1 3.21.4 Eigen 3.3.7 3.4.0 Expat 2.2.9 2.4.1 FFTW 3.3.8 3.3.10 Flex 2.6.4-91 2.6.4 Fontconfig 2.13.1 2.13.94 Freetype 2.10.2 2.11.0 GNU Astronomy Utilities 0.12 0.16.1-e0f1 GNU Autoconf 2.69.200-babc 2.71 GNU Automake 1.16.2 1.16.5 GNU Bison 3.7 3.8.2 GNU Emacs 27.1 27.2 GNU GDB 9.2 11.1 GNU GSL 2.6 2.7 GNU Help2man 1.47.11 1.48.5 Ghostscript 9.52 9.55.0 ICU -- 70.1 ImageMagick 7.0.8-67 7.1.0-13 Libbsd 0.10.0 0.11.3 Libffi 3.2.1 3.4.2 Libgit2 1.0.1 1.3.0 Libidn 1.36 1.38 Libjpeg 9b 9d Libmd -- 1.0.4 Libtiff 4.0.10 4.3.0 Libx11 1.6.9 1.7.2 Libxt 1.2.0 1.2.1 Netpbm 10.86.99 10.73.38 OpenBLAS 0.3.10 0.3.18 OpenMPI 4.0.4 4.1.1 Pixman 0.38.0 0.40.0 Python 3.8.5 3.10.0 R 4.0.2 4.1.2 SWIG 3.0.12 4.0.2 Util-linux 2.35 2.37.2 Util-macros 1.19.2 1.19.3 Valgrind 3.15.0 3.18.1 WCSLIB 7.3 7.7 Xcb-proto 1.14 1.14.1 Xorgproto 2020.1 2021.5 Name (Python) Old version New version ------------- ----------- ----------- Astropy 4.0 5.0 Beautifulsoup4 4.7.1 4.10.0 Beniget -- 0.4.1 Cffi 1.12.2 1.15.0 Cryptography 2.6.1 36.0.1 Cycler 0.10.0 0.11.0+} Cython 0.29.21 0.29.24 Esutil 0.6.4 0.6.9 Extension-helpers -- 0.1 Galsim 2.2.1 2.3.3 Gast -- 0.5.3 Jinja2 -- 3.0.3 MPI4py 3.0.3 3.1.3 Markupsafe -- 2.0.1 Numpy 1.19.1 1.21.3 Packaging -- 21.3 Pillow -- 8.4.0 Ply -- 3.11 Pyerfa -- 2.0.0.1 Pyparsing 2.3.1 3.0.4 Pythran -- 0.11.0 Scipy 1.5.2 1.7.3 Setuptools 41.6.0 58.3.0 Six 1.12.0 1.16.0 Uncertainties 3.1.2 3.1.6 Wheel -- 0.37.0 Name (R) Old version New version -------- ----------- ----------- Cli -- 2.5.0 Colorspace -- 2.0-1 Cowplot -- 1.1.1 Crayon -- 1.4.1 Digest -- 0.6.27 Ellipsis -- 0.3.2 Fansi -- 0.5.0 Farver -- 2.1.0 Ggplot2 -- 3.3.4 Glue -- 1.4.2 GridExtra -- 2.3 Gtable -- 0.3.0 Isoband -- 0.2.4 Labeling -- 0.4.2 Lifecycle -- 1.0.0 Magrittr -- 2.0.1 MASS -- 7.3-54 Mgcv -- 1.8-36 Munsell -- 0.5.0 Pillar -- 1.6.1 R-Pkgconfig -- 2.0.3 R6 -- 2.5.0 RColorBrewer -- 1.1-2 Rlang -- 0.4.11 Scales -- 1.1.1 Tibble -- 3.1.2 Utf8 -- 1.2.1 Vctrs -- 0.3.8 ViridisLite -- 0.4.0 Withr -- 2.4.2 --- reproduce/analysis/make/download.mk | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) (limited to 'reproduce/analysis/make/download.mk') diff --git a/reproduce/analysis/make/download.mk b/reproduce/analysis/make/download.mk index 0ea3432..e652c17 100644 --- a/reproduce/analysis/make/download.mk +++ b/reproduce/analysis/make/download.mk @@ -5,7 +5,7 @@ # recipes in this Makefile all use a single file lock to have one download # script running at every instant. # -# Copyright (C) 2018-2021 Mohammad Akhlaghi +# Copyright (C) 2018-2022 Mohammad Akhlaghi # # This Makefile is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -28,12 +28,12 @@ # -------------------- # # The input dataset properties are defined in -# `$(pconfdir)/INPUTS.conf'. For this template we only have one dataset to +# '$(pconfdir)/INPUTS.conf'. For this template we only have one dataset to # enable easy processing, so all the extra checks in this rule may seem # redundant. # # In a real project, you will need more than one dataset. In that case, -# just add them to the target list and add an `elif' statement to define it +# just add them to the target list and add an 'elif' statement to define it # in the recipe. # # Files in a server usually have very long names, which are mainly designed @@ -48,7 +48,7 @@ # internet, therefore downloading is inherently done in series. As a # result, when more than one dataset is necessary for download, if they are # done in parallel, the speed will be slower than downloading them in -# series. We thus use the `flock' program to tie/lock the downloading +# series. We thus use the 'flock' program to tie/lock the downloading # process with a file and make sure that only one downloading event is in # progress at every moment. $(indir):; mkdir $@ @@ -56,7 +56,7 @@ downloadwrapper = $(bashdir)/download-multi-try inputdatasets = $(foreach i, wfpc2, $(indir)/$(i).fits) $(inputdatasets): $(indir)/%.fits: | $(indir) $(lockdir) - # Set the necessary parameters for this input file. +# Set the necessary parameters for this input file. if [ $* = wfpc2 ]; then localname=$(DEMO-DATA); url=$(DEMO-URL); mdf=$(DEMO-MD5); else @@ -64,13 +64,13 @@ $(inputdatasets): $(indir)/%.fits: | $(indir) $(lockdir) echo; echo; exit 1 fi - # Download (or make the link to) the input dataset. If the file - # exists in `INDIR', it may be a symbolic link to some other place - # in the filesystem. To avoid too many links when using these files - # during processing, we'll use `readlink -f' so the link we make - # here points to the final file directly (note that `readlink' is - # part of GNU Coreutils). If its not a link, the `readlink' part - # has no effect. +# Download (or make the link to) the input dataset. If the file +# exists in 'INDIR', it may be a symbolic link to some other place in +# the filesystem. To avoid too many links when using these files +# during processing, we'll use 'readlink -f' so the link we make here +# points to the final file directly (note that 'readlink' is part of +# GNU Coreutils). If its not a link, the 'readlink' part has no +# effect. unchecked=$@.unchecked if [ -f $(INDIR)/$$localname ]; then ln -fs $$(readlink -f $(INDIR)/$$localname) $$unchecked @@ -80,7 +80,7 @@ $(inputdatasets): $(indir)/%.fits: | $(indir) $(lockdir) $(lockdir)/download $$url $$unchecked fi - # Check the md5 sum to see if this is the proper dataset. +# Check the md5 sum to see if this is the proper dataset. sum=$$(md5sum $$unchecked | awk '{print $$1}') if [ $$sum = $$mdf ]; then mv $$unchecked $@ -- cgit v1.2.1 From 91799fe4b6d62230e99a1520a23a0d30c3eb963e Mon Sep 17 00:00:00 2001 From: Mohammad Akhlaghi Date: Fri, 15 Apr 2022 04:57:54 +0200 Subject: IMPORTANT: more generic, robust and secure INPUTS.conf and download.mk SUMMARY: it is necessary to update your 'INPUTS.conf' and 'download.mk'. Until now, adding an input file involved several steps that needed manual (and inconvenient!) intervention: for every file, you needed to define four variables in 'INPUTS.conf', and in 'reproduce/analysis/make/download.mk' you had to use a (complex for large number of files) shell 'if/elif/else' condition to link the names of the input files to those variables. Besides inconvenience, this could cause bugs (typos!). Furthermore, a basic MD5 checksum was used for verifying the files. With this commit, a new structure has been defined for 'INPUTS.conf' that (thanks to some pretty useful GNU Make features), removes the need for users to manually edit 'reproduce/analysis/make/download.mk', and reduces the number of variables necessary for each file to three (from four). Furthermore, we now use the SHA256 checksum for input data validation. Regarding the trick used in 'INPUTS.conf' (form the newly added description in 'download.mk'): In GNU Make, '.VARIABLES' "... expands to a list of the names of all global variables defined so far" (from the "Other Special Variables" section of the GNU Make manual). Assuming that the pattern 'INPUT-%-sha256' is only used for input files, we find all the variables that contain the input file names (the '%' is the filename). Finally, using the pattern-substitution function ('patsubst'), we remove the fixed string at the start and end of the variable name. Steps you need to take: - INPUTS.conf: translate your old format to the new format (after carefully reading the description in the comments at the start of the file). After applying the new standards, you don't need to use the variables of 'INPUTS.conf' directly in your Makefiles! For example if one of your input datasets is called 'abc.fits', the checksum variable will be 'INPUT-abc.fits-sha256' and in your high-level Makefiles, you can simply set '$(indir)/abc.fits' as a prerequisite (like you probably did already). - reproduce/analysis/make/download.mk: for the definition and rule of 'inputdatasets', simply use the Maneage branch, and remove anything you had added in your project. In the process, I also noticed that 'README-hacking.md' still referred to 'master' as the main project branch, while we have used 'main' in the paper (and is the common convention with Git). --- reproduce/analysis/make/download.mk | 64 ++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 33 deletions(-) (limited to 'reproduce/analysis/make/download.mk') diff --git a/reproduce/analysis/make/download.mk b/reproduce/analysis/make/download.mk index e652c17..6e67962 100644 --- a/reproduce/analysis/make/download.mk +++ b/reproduce/analysis/make/download.mk @@ -27,22 +27,20 @@ # Download input data # -------------------- # -# The input dataset properties are defined in -# '$(pconfdir)/INPUTS.conf'. For this template we only have one dataset to -# enable easy processing, so all the extra checks in this rule may seem -# redundant. +# 'reproduce/analysis/config/INPUTS.conf' contains the input dataset +# properties. In most cases, you will not need to edit this rule (or +# file!). Simply follow the instructions of 'INPUTS.conf' and set the +# variables names according to the described standards. # -# In a real project, you will need more than one dataset. In that case, -# just add them to the target list and add an 'elif' statement to define it -# in the recipe. -# -# Files in a server usually have very long names, which are mainly designed -# for helping in data-base management and being generic. Since Make uses -# file names to identify which rule to execute, and the scope of this -# research project is much less than the generic survey/dataset, it is -# easier to have a simple/short name for the input dataset and work with -# that. In the first condition of the recipe below, we connect the short -# name with the raw database name of the dataset. +# TECHNICAL NOTE on the '$(foreach, n ...)' loop of 'inputdatasets': we are +# using several (relatively complex!) features particular to Make: In GNU +# Make, '.VARIABLES' "... expands to a list of the names of all global +# variables defined so far" (from the "Other Special Variables" section of +# the GNU Make manual). Assuming that the pattern 'INPUT-%-sha256' is only +# used for input files, we find all the variables that contain the input +# file name (the '%' is the filename). Finally, using the +# pattern-substitution function ('patsubst'), we remove the fixed string at +# the start and end of the variable name. # # Download lock file: Most systems have a single connection to the # internet, therefore downloading is inherently done in series. As a @@ -53,16 +51,16 @@ # progress at every moment. $(indir):; mkdir $@ downloadwrapper = $(bashdir)/download-multi-try -inputdatasets = $(foreach i, wfpc2, $(indir)/$(i).fits) -$(inputdatasets): $(indir)/%.fits: | $(indir) $(lockdir) +inputdatasets = $(foreach i, \ + $(patsubst INPUT-%-sha256,%, \ + $(filter INPUT-%-sha256,$(.VARIABLES))), \ + $(indir)/$(i)) +$(inputdatasets): $(indir)/%: | $(indir) $(lockdir) -# Set the necessary parameters for this input file. - if [ $* = wfpc2 ]; then - localname=$(DEMO-DATA); url=$(DEMO-URL); mdf=$(DEMO-MD5); - else - echo; echo; echo "Not recognized input dataset: '$*.fits'." - echo; echo; exit 1 - fi +# Set the necessary parameters for this input file as shell variables +# (to help in readability). + url=$(INPUT-$*-url) + sha=$(INPUT-$*-sha256) # Download (or make the link to) the input dataset. If the file # exists in 'INDIR', it may be a symbolic link to some other place in @@ -72,25 +70,25 @@ $(inputdatasets): $(indir)/%.fits: | $(indir) $(lockdir) # GNU Coreutils). If its not a link, the 'readlink' part has no # effect. unchecked=$@.unchecked - if [ -f $(INDIR)/$$localname ]; then - ln -fs $$(readlink -f $(INDIR)/$$localname) $$unchecked + if [ -f $(INDIR)/$* ]; then + ln -fs $$(readlink -f $(INDIR)/$*) $$unchecked else touch $(lockdir)/download $(downloadwrapper) "wget --no-use-server-timestamps -O" \ $(lockdir)/download $$url $$unchecked fi -# Check the md5 sum to see if this is the proper dataset. - sum=$$(md5sum $$unchecked | awk '{print $$1}') - if [ $$sum = $$mdf ]; then +# Check the checksum to see if this is the proper dataset. + sum=$$(sha256sum $$unchecked | awk '{print $$1}') + if [ $$sum = $$sha ]; then mv $$unchecked $@ echo "Integrity confirmed, using $@ in this project." else echo; echo; - echo "Wrong MD5 checksum for input file '$$localname':" + echo "Wrong SHA256 checksum for input file '$*':" echo " File location: $$unchecked"; \ - echo " Expected MD5 checksum: $$mdf"; \ - echo " Calculated MD5 checksum: $$sum"; \ + echo " Expected SHA256 checksum: $$sha"; \ + echo " Calculated SHA256 checksum: $$sum"; \ echo; exit 1 fi @@ -104,4 +102,4 @@ $(inputdatasets): $(indir)/%.fits: | $(indir) $(lockdir) # It is very important to mention the address where the data were # downloaded in the final report. $(mtexdir)/download.tex: $(pconfdir)/INPUTS.conf | $(mtexdir) - echo "\\newcommand{\\wfpctwourl}{$(DEMO-URL)}" > $@ + echo "\\newcommand{\\wfpctwourl}{$(INPUT-wfpc2.fits-url)}" > $@ -- cgit v1.2.1