paper-concept.git - Paper (Towards Long-term and Archivable Reproducibility)

Age	Commit message	Author	Lines
2020-08-28	Plain text editors: nano in basic, emacs and vim in high-level	Mohammad Akhlaghi	-7/+74
	While a project is under development, the raw analysis software are not the only necessary software in a project. We also need tools to edit all the plain-text files within the Maneaged project. Usually people use their operating system's plain-text editor. However, when working on the project on a new computer, or in a container, the plain-text editors will have different versions, or may not be present at all! This can be very annoying and frustrating! With this commit, Maneage now installs GNU Nano as part of the basic tools. GNU Nano is a very simple and small plain text editor (the installed size is only ~3.5MB, and it is friendly to new users). Therefore, any Maneaged project can assume at least Nano will be present (in particular when no editor is available on the running system!). GNU Emacs and VIM (both without extra dependencies, in particular without GUI support) are also optionally available in 'high-level.mk' (by adding them to 'TARGETS.conf'). The basic idea for the more advanced editors (Emacs and VIM) is that project authors can add their favorite editor while they are working on the project, but upon publication they can remove them from 'TARGETS.conf'. A few other minor things came up during this work and are now also fixed: - The 'file' program and its libraries like 'libmagic' were linking to the system's 'libseccomp'! This dependency then leaked into Nano (which depends on 'libmagic'). But this is just an extra feature of 'file', only for the Linux kernel. Also, we have no dependency on it so far. So 'file' is now configured to not build with 'libseccomp'. - A typo was fixed in the line where the physical core information is being read on macOS. - The top-level directories when running './project shell' are now quoted (in case they have special characters).
2020-08-27	Machine architecture and byte-order available as LaTeX macro	Mohammadreza Khellat	-129/+232
	Until now, no machine-related specifications were being documented in the workflow. This information can become helpful when observing differences in the outcome of both software and analysis segments of the workflow by others (some software may behave differently based on host machine). With this commit, the host machine's 'hardware class' and 'byte-order' are collected and now available as LaTeX macros for the authors to use in the paper. Currently it is placed in the acknowledgments, right after mentioning the Maneage commit. Furthermore, the project and configuration scripts are now capable of dealing with input directory names that have SPACE (and other special characters) by putting them inside double-quotes. However, having spaces and metacharacters in the address of the build directory could cause build/install failure for some software source files which are beyond the control of Maneage. So we now check the user's given build directory string, and if the string has any '@', '#', '$', '%', '^', '&', '*', '(', ')', '+', ';', and ' ' (SPACE), it will ask the user to provide a different directory.
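	The build-directory check described above can be sketched as follows. This is only an illustration, not the code in the configure script: the function name 'builddir_is_safe' is hypothetical, and the character list is copied from the commit message.

```shell
# Sketch (hypothetical helper): return 0 when the given build directory
# string contains none of the characters listed in the commit message,
# and 1 otherwise.
builddir_is_safe() {
    # Inside a bracket expression all of these characters are literal;
    # '^' is not the first character, so it does not negate the set.
    if printf '%s' "$1" | grep -q '[@#$%^&*()+; ]'; then
        return 1
    fi
    return 0
}
```

	A caller could loop on this test, for example 'until builddir_is_safe "$bdir"; do ...ask again...; done', to keep asking for a different directory.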
2020-08-20	Imported recent updates in Maneage, minor conflicts fixed	Mohammad Akhlaghi	-354/+524
	Some very minor conflicts came up and were easily corrected. They were mostly in parts that are also shared with the demonstration in the core Maneage branch.
2020-08-17	Minor typo correction in comments of paper.mk	Mohammad Akhlaghi	-1/+1
	The '.bbl' suffix in the comment of one call to LaTeX was incorrectly written as '.bb'.
2020-08-08	Software tarballs saved as symlinks if already in filesystem	Mohammad Akhlaghi	-5/+18
	Until now, if the software source tarballs already existed on the system they would be copied inside the project. However, the software source tarballs are sometimes/mostly larger than their actual product and can consume significant space (~375 MB in the core branch!). With this commit, when the software are present on the system, their symbolic link will be placed in 'BDIR/software/tarballs', not a full copy. Also, because the tarballs in software tarball directory may themselves be links, we use 'realpath' to find the final place of the actual file and link to that location. Therefore if 'realpath' can't be found (prior to installing Coreutils in Maneage), we will copy the tarballs from the given software tarball directory. After Maneage has installed Coreutils, the project's own 'realpath' will be used. Of course, if the software are downloaded, their full downloaded copy will be kept in 'BDIR/software/tarballs', nothing has changed in the downloading scenario.
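	A minimal sketch of the link-or-copy logic described above, assuming a hypothetical helper 'import_tarball' that takes the tarball's location and the destination directory (in the project this would be 'BDIR/software/tarballs'). The actual implementation may differ.

```shell
# Sketch (hypothetical helper): place a tarball in DESTDIR as a symbolic
# link to the real file when 'realpath' is available, otherwise as a copy.
import_tarball() {
    src=$1
    destdir=$2
    name=$(basename "$src")
    if command -v realpath >/dev/null 2>&1; then
        # The given tarball may itself be a link: resolve it to the final
        # file and link to that location.
        ln -sf "$(realpath "$src")" "$destdir/$name"
    else
        # No 'realpath' yet (for example before Coreutils is installed):
        # fall back to a full copy.
        cp "$src" "$destdir/$name"
    fi
}
```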
2020-08-08	IMPORTANT: New software versions (17 basic, 16 high-level and 7 Python)	Mohammad Akhlaghi	-170/+187
	It had been a long time since the Maneage software versions were last updated. With this commit, the versions of all basic software were checked and the 17 that had newer versions were updated. Also, 16 high-level programs and libraries were updated, as well as 7 Python modules. The full list is available below.

	Basic Software (affecting all projects)
	---------------------------------------
	bash            5.0.11  -> 5.0.18
	binutils        2.32    -> 2.35
	coreutils       8.31    -> 8.32
	curl            7.65.3  -> 7.71.1
	file            5.36    -> 5.39
	gawk            5.0.1   -> 5.1.0
	gcc             9.2.0   -> 10.2.0
	gettext         0.20.2  -> 0.21
	git             2.26.2  -> 2.28.0
	gmp             6.1.2   -> 6.2.0
	grep            3.3     -> 3.4
	libbsd          0.9.1   -> 0.10.0
	ncurses         6.1     -> 6.2
	perl            5.30.0  -> 5.32.0
	sed             4.7     -> 4.8
	texinfo         6.6     -> 6.7
	xz              5.2.4   -> 5.2.5

	Custom programs/libraries
	-------------------------
	astrometrynet   0.77    -> 0.80
	automake        0.16.1  -> 0.16.2
	bison           3.6     -> 3.7
	cfitsio         3.47    -> 3.48
	cmake           3.17.0  -> 3.18.1
	freetype        2.9     -> 2.10.2
	gdb             8.3     -> 9.2
	ghostscript     9.50    -> 9.52
	gnuastro        0.11    -> 0.12
	libgit2         0.28.2  -> 1.0.1
	libidn          1.35    -> 1.36
	openmpi         4.0.1   -> 4.0.4
	R               3.6.2   -> 4.0.2
	python          3.7.4   -> 3.8.5
	wcslib          6.4     -> 7.3
	yaml            0.2.2   -> 0.2.5

	Python modules
	--------------
	cython          0.29.6  -> 0.29.21
	h5py            2.9.0   -> 2.10.0
	matplotlib      3.1.1   -> 3.3.0
	mpi4py          3.0.2   -> 3.0.3
	numpy           1.17.2  -> 1.19.1
	pybind11        2.4.3   -> 2.5.0
	scipy           1.3.1   -> 1.5.2
2020-08-08	Configuration fail if gfortran necessary, but not built or available	Boud Roukema	-25/+41
	When the host C compiler is used (either by calling '--host-cc' or on OSs where we can't build the GNU C Compiler), Maneage will also not build the Fortran compiler 'gfortran'. Until now, the './project configure' script would give a big warning about the need for 'gfortran' and the fact that it is missing, and would pause for 5 seconds, but it would continue anyway. For projects that don't need 'gfortran', this can be confusing to the users, and for those that need 'gfortran', it means that a lot of time and CPU cycles are wasted compiling non-Fortran software that are unusable in the end. With this commit, the 'need_gfortran' variable has been added to 'reproduce/software/shell/configure.sh', in a new part that is devoted to project-specific features. If it equals '0', then the 'gfortran' test (and message!) isn't done at all, but if it is set to '1', then the configure stage will halt immediately if 'gfortran' is not found and not built. The default operations of the core Maneage branch don't need 'gfortran', so by default it is set to 0. But 'gfortran' is necessary for all projects that use Numpy (Python's numeric library) for example. So if your project needs 'gfortran', please set this new variable to 1. As mentioned in the comments of 'configure.sh', ideally we should detect this automatically, but we haven't had the time to implement it yet.
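	The behavior of the new flag can be sketched like this. It is a simplified illustration: the function name 'gfortran_check' and its two arguments are hypothetical, and the real test in 'configure.sh' decides by itself whether a usable 'gfortran' exists.

```shell
# Sketch (hypothetical helper).
#   $1: the project's flag (0: project doesn't need Fortran, 1: it does)
#   $2: 1 when a usable 'gfortran' is available or will be built
gfortran_check() {
    if [ "$1" = 1 ] && [ "$2" != 1 ]; then
        # Needed but missing: report and let the caller halt immediately.
        echo "gfortran is needed by this project, but was not found or built" >&2
        return 1
    fi
    # Not needed (no test or message at all), or needed and present.
    return 0
}
```

	In a configure script this could be used as 'gfortran_check "$need_gfortran" "$has_gfortran" || exit 1', where 'has_gfortran' is again only an illustrative name.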
2020-08-02	initialize.mk: accounting for no maneage branch	Boud Roukema	-3/+8
	One of the LaTeX macros reported by 'initialize.mk' is the git commit hash of the most recent 'maneage' branch that the project has been branched from. However, not all projects will retain the 'maneage' reference. This can happen for example when people don't push the 'maneage' reference to their repository and then clone from their own repository to a second computer. Until now, in such situations, Maneage would break with an error. With this commit, in such scenarios, a placeholder string is used instead, clearly highlighting that there is no 'maneage' reference.
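	One way to express the fallback described above is sketched here. The function name and the placeholder text 'NO-MANEAGE-BRANCH' are assumptions for illustration; the string actually written by 'initialize.mk' is not given in this message.

```shell
# Sketch (hypothetical helper): print the commit hash that the 'maneage'
# reference points to, or a placeholder when the reference doesn't exist
# (or when Git can't be run at all).
maneage_commit() {
    if commit=$(git rev-parse --verify --quiet maneage 2>/dev/null) \
       && [ -n "$commit" ]; then
        printf '%s\n' "$commit"
    else
        printf '%s\n' "NO-MANEAGE-BRANCH"
    fi
}
```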
2020-08-02	OpenMPI build with slurm compatibility	Boud Roukema	-1/+4
	Prior to this commit, compilation of OpenMPI used the default OpenMPI choices of deciding which libraries should be used in relation to a job scheduler [1] (such as Slurm [2]). Given that the user on a multi-user cluster has to accept the sysadmin's choice of a job scheduler, the question of whether to (1) link with OpenMPI's own libraries (and increase the reproducibility of the science project) or rather (2) link with the sysadmin-managed libraries (more likely to be compatible with the host's job scheduler) is an open question, and the best strategy for reproducibility needs to be debated and studied. In this commit, strategy (1) is adopted. The options '--with-pmix=internal' and '--with-hwloc=internal' are added to the configure command. The working assumption is that the Maneage version of OpenMPI is likely to be modern enough to be compatible with the native job scheduler such as Slurm. Compilation without any 'pmix' option failed in at least one case; it appears that an external pmix library was sought by the configure script. As of OpenMPI 4.0.1, the internal libevent library is used by default, so there appears to be no option to force it to be chosen internally. This commit also includes the option '--without-verbs'. This option removes a library related to "infiniband", "verbs", "openib" and "BTL"; this library appears to be deprecated. See [3], [4] for discussion. Please add feedback and discussion to the Maneage task about openmpi linking strategies (1) (internal) and (2) (external) at Savannah [5]. [1] https://en.wikipedia.org/wiki/Job_scheduler#Batch_queuing_for_HPC_clusters [2] https://en.wikipedia.org/wiki/Slurm_Workload_Manager - To avoid a name clash, 'slurm-wlm' is the metapackage in Debian for the client commands, the compute node daemon, and the central node daemon. An unrelated package 'slurm' also exists. 
[3] https://www-lb.open-mpi.org/faq/?category=openfabrics#ofa-device-error [4] https://www-lb.open-mpi.org/faq/?category=building [5] https://savannah.nongnu.org/task/index.php?15737
2020-07-21	Printing location when downloaded input data checksum is different	Boud Roukema	-0/+1
	There are many different directory trees involved in the Maneage system: the top directory, the 'reproduce/' directory and its sub-directories, '.build/' (which points to a user-defined build area), and a possibly user-defined input directory. Until now, in the case of a download checksum failure, it was not immediately obvious [1] to the user where the file with a failed checksum is. To clarify to the user where the suspicious file is now located, this commit adds a line to 'reproduce/analysis/make/download.mk' to print out this full path location: '$$unchecked' along with the expected and calculated checksums. [1] Euphemism for me spending lots of time debugging and being confused.
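	As an illustration of the added message, the sketch below compares checksums and prints the file's location on a mismatch. It is not the recipe in 'download.mk': the function name is hypothetical and the use of 'sha512sum' is an assumption.

```shell
# Sketch (hypothetical helper): verify a downloaded file and, on failure,
# report where the suspicious file is, with both checksums.
#   $1: full path of the not-yet-verified file
#   $2: expected checksum
verify_download() {
    unchecked=$1
    expected=$2
    # Read from standard input so only the checksum itself is kept.
    calculated=$(sha512sum < "$unchecked" | awk '{print $1}')
    if [ "$calculated" != "$expected" ]; then
        echo "Checksum mismatch for downloaded file:" >&2
        echo "  Location:   $unchecked" >&2
        echo "  Expected:   $expected" >&2
        echo "  Calculated: $calculated" >&2
        return 1
    fi
}
```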
2020-07-20	README-hacking.md: clarify Zenodo usage in publication checklist	Boud Roukema	-2/+2
	This commit clarifies the initial usage of Zenodo for reserving a Zenodo identifier and starting an 'unpublished' upload. Some other minor wording changes are done here.
2020-07-20	make dist: only archive files that are under version control	Boud Roukema	-17/+31
	Until this commit, the '$(project-package-contents)' rules in 'reproduce/analysis/make/initialize.mk' included a line to provide all contents, recursively, of the directory 'reproduce/' in the package for further distribution. This could potentially lead to the distribution of private working files that are used during development and not intended for general distribution. With this commit, only those files in 'reproduce/' and 'tex/src' that are under version control are copied to the temporary directory (that is later used for creating an archive). With this change, the archiving commands actually became cleaner (we don't have to manually remove 'LOCAL.conf' or other temporary files). Extensive comments have also been added above each step to clarify its purpose and method.
2020-07-07	Fixed typo that led to crash when building healpy	Marius Peper	-1/+1
	Until now, if a project needed the healpy software package, Maneage would crash with the following error message (the full name in the build directory is abridged):
	make: *** No rule to make target '.../version-info/proglib/healpix-'
	This was caused by a typo in the version of 'healpix' (the dependency of 'healpy'). With this commit, the typo in line 334 of 'python.mk' is fixed, so that when '$(ipydir)/healpy-$(healpy-version)' gets called it correctly searches for a rule to make '$(ibidir)/healpix-$(healpix-version)'.
2020-07-07	Project distribution tarball can account for no PDFs in tex/tikz	Mohammad Akhlaghi	-1/+6
	Until now the './project make dist' command implicitly assumed that the 'tex/tikz' directory always contains PDF files (because of the 'cp tex/tikz/*.pdf $$dir/tex/tikz' line). This was annoying for projects that don't use TikZ or PGFPlots to generate their plots, and they had to manually comment out this line. With this commit a check has been placed to see if any PDF files exist in there at all. If there aren't any PDF files, the 'cp' command above is skipped.
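	A minimal sketch of such a check, assuming a hypothetical helper that receives the source and destination directories (the real rule works on 'tex/tikz' and '$$dir/tex/tikz' inside a Make recipe):

```shell
# Sketch (hypothetical helper): copy the PDFs of SRC into DEST only when
# at least one PDF exists. When the glob matches nothing, 'ls' is given
# the literal pattern and fails, so the 'cp' is never run.
copy_tikz_pdfs() {
    src=$1
    dest=$2
    if ls "$src"/*.pdf >/dev/null 2>&1; then
        cp "$src"/*.pdf "$dest"/
    fi
}
```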
2020-07-05	Configure script prefers clang for macOS systems	Mohammad Akhlaghi	-96/+129
	In the previous commit (Commit 1bc00c9: Only using clang in macOS systems that also have GCC) we set the used C compiler for high-level programs to be 'clang' on macOS systems. But I forgot to do the same kind of change in the configure script (to prefer 'clang' when we are testing for a C compiler on the host). With this commit, the compiler checking phases of the configure script have been improved, so on macOS systems, we now first search for 'clang', then search for 'gcc'. While doing this, I also noticed that the 'rpath' checking command was done before we actually define 'instdir'!!! So in effect, the 'rpath' directory was being set to '/lib'! So with this commit, this test has been taken to after defining 'instdir'.
2020-07-05	Removing possibly existing paper.bbl before remaking it	Mohammad Akhlaghi	-0/+6
	Until now, when the bibliography file ('paper.bbl') had a LaTeX-related error (for example the journal name was a LaTeX macro that isn't defined), the first 'pdflatex' command that is run before 'biber' would crash, not allowing the project to reach 'biber'. So the user would have to manually remove 'paper.bbl' before running './project make'. With this commit, we remove any possibly existing 'paper.bbl' file before rebuilding it. Generally, this also helps in keeping things clean during the generation of the new bibliography. This bug was found by Mahdieh Nabavi.
2020-07-05	Only using clang in macOS systems that also have GCC	Mohammad Akhlaghi	-27/+58
	Until now, when Maneage was built on a macOS that had both clang and GCC, we would make links to both. But this caused many conflicts in some high-level programs (for example Numpy and all the other programs where we have explicitly set 'export CC=clang' before the build recipe). This happens because the GCC that is built on a macOS isn't complete for some operations. To fix this problem, when we are on a macOS, we explicitly set 'gcc' to point to 'clang' and 'g++' to point to 'clang++'. We also don't link to the host's C-preprocessor ('cpp') on macOS systems because this is only a GNU feature and using the GNU CPP is also known to have some basic problems. For example this was reported by Mahdieh Nabavi (which was the main trigger for this work):
	ld: Symbol not found: ___keymgr_global
	Referenced from: /Users/Mahdieh/build/software/installed/bin/cpp
	Expected in: /usr/lib/libSystem.B.dylib
	Also, to avoid linking to another link on the host tools (in the 'makelink' function of 'basic.mk'), we are now using 'realpath'.
2020-07-04	Commit hash of Maneage branch used to build project as LaTeX macro	Mohammad Akhlaghi	-0/+6
	To help in the documentation, the Git hash of the Maneage branch commit that the project has most recently merged with (or branched from) is now also provided as a LaTeX macro ('\maneageversion'). It is calculated in 'reproduce/analysis/make/initialize.mk' (in the recipe to 'initialize.tex').
2020-07-04	Better names and comments in INPUTS.conf	Mohammad Akhlaghi	-28/+32
	Until now, the dataset's configuration names had a 'WFPC2' prefix. But this is very alien to anyone who is not familiar with the history of the Hubble Space Telescope (the camera is no longer used! It's just used here since it's one of the standard FITS files from the FITS standard webpage). With this commit the variable names have been modified to be more readable and clear (having a 'DEMO-' prefix). Also the comments of 'INPUTS.conf' (describing the purpose of each variable) were edited and made clearer.
2020-07-04	Improved comments in paper.mk and README.md	Mohammad Akhlaghi	-1/+5
	In 'README.md' I tried to explain a little better that TeXLive will only install its necessary packages, not the full TeXLive library! Also in 'paper.mk', I slightly improved the comments with very minor edits. Both these parts are slated to go into the core Maneage branch, so it's important to maintain them here for now.
2020-07-01	Properly accounting for space characters in host's PATH	Mohammad Akhlaghi	-12/+20
	Until now, when reading the host's PATH environment variable we weren't accounting for directory names with a space character. This was most prominently visible in the 'low-level-links' step where we put links to some core system components into the project's build directory (mainly for proprietary systems like macOS). To address the problem, double quotations have been placed around the part where we extract 'ccache' from the PATH, and the part where we make the symbolic link. In the process the comments above 'makelink' were made clearer and 'low-level-links' now depends on 'grep' (which is the highest-level program it uses). This bug was reported by Mahdieh Nabavi.
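	The effect of the quoting can be illustrated with the sketch below, which searches PATH for a program while keeping directory names with spaces intact. It is not the 'makelink' function of 'basic.mk'; the function name 'find_in_path' is hypothetical.

```shell
# Sketch (hypothetical helper): print the first executable called $1 that
# is found in PATH. The body runs in a subshell, so changing IFS here
# does not affect the caller. Splitting only on ':' keeps spaces inside
# directory names. (The unquoted $PATH is still subject to globbing, which
# is ignored in this sketch.)
find_in_path() (
    IFS=:
    for dir in $PATH; do
        # Quote every use of the directory name.
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 1
)
```

	The link itself then also needs quotes, for example: 'ln -s "$(find_in_path ccache)" "$bindir/ccache"' (with 'bindir' standing for the project's installation directory).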
2020-07-01	Minor typo corrected in referencing Libidn	Raul Infante-Sainz	-1/+2
	Until this commit, once Libidn was installed, instead of its own name and version, the name and version of Libjpeg were saved (in the target of Libidn). This probably came from a copy/paste of the rule. With this commit, this minor bug has been corrected. I also added my name as an author of the `reproduce/software/make/xorg.mk' Makefile since I added some code there.
2020-06-30	Imported Maneage infrastructure, no conflicts	Mohammad Akhlaghi	-3/+5
	There weren't any conflicts in this merge.
2020-06-30	Proper deletion of util-linux source after successfully building it	Mohammad Akhlaghi	-1/+3
	After recently adding util-linux to the Maneage build-tree, we had forgotten to delete the unpacked and built source directory after it was installed! This has been corrected with this commit.
2020-06-30	Entered data and software directories stored as absolute addresses	Mohammad Akhlaghi	-2/+2
	Until now, when the user specified an input and software directory, the raw string they entered was used. But when this string was a relative location, this could be problematic in general scenarios. With this commit, the same function that finds the absolute location of the build directory is used to find the absolute address of the data and software directories.
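	For illustration, a relative directory can be turned into an absolute one as sketched below. The function name 'absolute_dir' is hypothetical and stands in for the function that the configure script already uses for the build directory.

```shell
# Sketch (hypothetical helper): print the absolute location of an existing
# directory. The 'cd' happens in a subshell, so the caller's working
# directory is unchanged.
absolute_dir() {
    if [ -d "$1" ]; then
        (cd -- "$1" >/dev/null && pwd -P)
    else
        return 1
    fi
}
```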
2020-06-30	The distclean target accounts for non-existence of git hooks	Mohammad Akhlaghi	-1/+1
	Until now, when the user wanted to completely remove all built files (including software), the './project make distclean' command would fail if the Git hooks weren't installed. They are present when the project's configuration has been successfully finished, but this bug can happen when trying to re-do an incomplete build. With this commit, this is fixed by adding the '-f' option to the 'rm' command that removes the Git hooks. This commit was also done in the core Maneage branch.
2020-06-30	The distclean target accounts for non-existence of git hooks	Mohammad Akhlaghi	-1/+1
	Until now, when the user wanted to completely remove all built files (including software), the './project make distclean' command would fail if the Git hooks weren't installed. They are present when the project's configuration has been successfully finished, but this bug can happen when trying to re-do an incomplete build. With this commit, this is fixed by adding the '-f' option to the 'rm' command that removes the Git hooks.
2020-06-30	Imported recent improvements in Maneage, no conflicts	Mohammad Akhlaghi	-52/+382
	There weren't any conflicts in this merge.
2020-06-30	Core Xorg libraries necessary for Ghostscript now included	Mohammad Akhlaghi	-50/+380
	Until now, in order to build Ghostscript, the project used the host's Xorg libraries. This was because we hadn't yet added the necessary build rules for them. With this commit, the instructions to build the necessary Xorg libraries for Ghostscript have also been added. Also, the shared Ghostscript library has been built with this commit and two sets of standard fonts are also included, setting us on the path to build TeXLive from source later. This task was done with the help and support of Raul Infante-Sainz.
2020-06-28	Bison installation on macOS fixed by updating to version 3.6	Raul Infante-Sainz	-2/+2
	Until this commit, there was a problem when building Bison in parallel in macOS systems. With this commit, this problem has been fixed by updating Bison to its most recent version (3.6).
2020-06-28	Zenodo identifier is extracted automatically from metadata.conf	Mohammad Akhlaghi	-1/+5
	Until now, the Zenodo identifier was manually written in the paper. But now we have the Zenodo DOI in 'metadata.conf', so it's much more robust to get it from there (in case updated versions of the paper are published).
2020-06-27	Imported recent work in master, minor conflict fixed in paper.mk	Mohammad Akhlaghi	-2224/+2602
	Only two conflicts came up in the newly added comments of 'paper.mk' in the Maneage branch. It happened because in this project we don't use 'pdflatex', but 'latex' alone.
2020-06-27	IMPORTANT: many improvements to low-level software building phase	Mohammad Akhlaghi	-2171/+2405
	POSSIBLE EFFECT ON YOUR PROJECT: The changes in this commit may only cause conflicts to your project if you have changed the software building Makefiles in your project's branch (e.g., 'basic.mk', 'high-level.mk' and 'python.mk'). If your project has only added analysis, it shouldn't be affected. This is a large commit, involving a long series of corrections in a different branch which is now finally being merged into the core Maneage branch. All changes were related and came up naturally as the low-level infrastructure was improved. So separating them in the end for the final merge would have been very time consuming and we are merging them as one commit. In general, the software building Makefiles are now much easier to read, modify and use, along with several new features that have been added. See below for the full list. - Until now, Maneage needed the host to have a 'make' implementation because Make was necessary to build Lzip. Lzip is then used to uncompress the source of our own GNU Make. However, in the minimalist/slim versions of operating systems (for example used to build Docker images) Make isn't included by default. Since Lzip was the only program built before our own GNU Make was installed, we consulted Antonio Diaz Diaz (creator of Lzip) and he kindly added the necessary functionality to a new version of Lzip, which we are using now. Hence we don't need to assume a Make implementation on the host any more. With this commit, Lzip and GNU Make are built without Make, allowing everything else to be safely built with our own custom version of GNU Make and not using the host's 'make' at all. - Until recently (Commit 3d8aa5953c4) GNU Make was built in 'basic.mk'. Therefore 'basic.mk' was written in a way that it can be used with other 'make' implementations also (i.e., important shell commands starting with '&&' and ending in '\' without any comments between them!). 
Furthermore, to help in style uniformity, the rules in 'high-level.mk' and 'python.mk' also followed a similar structure. But due to the point above, we can now guarantee that GNU Make is used from the very first Makefile, so this hard-to-read structure has been removed in the software build recipes and they are much more readable and edit-friendly now. - Until now, the default backup servers were at some fixed URLs, on our own pages or on Gitlab. But recently we uploaded all the necessary software to Zenodo (https://doi.org/10.5281/zenodo.3883409) which is more suitable for this task (it promises longevity, has a fixed DOI, while allowing us to add new content, or new software tarball versions). With this commit, a small script has been written to extract the most recent Zenodo upload link from the Zenodo DOI and use it for downloading the software source codes. - Until now, we primarily used the webpage of each software for downloading its tarball. But this caused many problems: 1) some of them needed Javascript before the download, 2) some URLs had a complex dependency on the version number, 3) some servers would be randomly down for maintenance, etc. So thanks to the point above, we now use the Zenodo server as the primary download location. However, if a user wants to use a custom software that is not (yet!) in Zenodo, the download script gives priority to a custom URL that the users can give as Make variables. If that variable is defined, then the script will use that URL before going on to Zenodo. We now have a special place for such URLs: 'reproduce/software/config/urls.conf'. The old URLs (which are a good documentation themselves) are preserved here, but are commented out by default. - The software source code downloading and checksum verification step has been moved into a Make function called 'import-source' (defined in the 'build-rules.mk' and loaded in all software Makefiles). 
Having taken all the low-level steps there, I noticed that there is no more need for having the tarball as a separate target! So with this commit, a single rule is the only place that needs to be edited/added (greatly simplifying the software building Makefiles). - Following task #15272, a new option has been added to the './project' script called '--all-highlevel'. When this option is given, the contents of 'TARGETS.conf' are ignored and all the software in Maneage are built (selected by parsing the 'versions.conf' file). This new option was added to confirm the extensive changes made in all the software building recipes and is great for development/testing purposes. - Many of the software hadn't been tested for a long time! So after using the newly added '--all-highlevel', we noticed that some need to be updated. In general, with this commit, 'libpaper' and 'pcre' were added as new software, and the versions of the following software were updated: 'boost', 'flex', 'libtirpc', 'openblas' and 'lzip'. A 'run-parts.in' shell script was added in 'reproduce/software/shell/' which is installed with 'libpaper'. - Even though we intentionally add the necessary flags to add RPATH inside the built executable at compilation time, some software don't do it (different software on different operating systems!). Until now, for historical reasons this check was done in different ways for different software on GNU/Linux systems. But now it is unified: if 'patchelf' is present we apply it. Because of this, 'patchelf' has been put as a top-level prerequisite, right after Tar, and is installed before anything else. - In 'versions.conf', GNU Libtool is recognized as 'libtool', but in 'basic.mk', it was 'glibtool'! This caused much confusion and is corrected with this commit (in 'basic.mk', it is also 'libtool'). 
- A new argument is added to the './project' script to allow easy loading of the project's shell and environment for fast/temporary testing of things in the same environment as the project. Before activating the project's shell, we completely remove all host environment variables to simulate the project's environment. It can be called with this command: './project shell'. A simple prompt has also been added to highlight that the user is using the Maneage shell!
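	The download priority described in this commit (a custom URL first, then the backup server) can be sketched as below. All names are hypothetical, and treating the custom URL as a full URL while the backup is a base directory is an assumption; the actual download script may organize this differently.

```shell
# Sketch (hypothetical helper): print the URLs to try for a tarball, in
# order of priority, one per line.
#   $1: tarball file name
#   $2: custom URL given by the user (may be empty)
#   $3: base URL of the backup server
candidate_urls() {
    tarball=$1
    custom_url=$2
    backup_base=$3
    # A user-defined URL, when present, is tried first.
    if [ -n "$custom_url" ]; then
        printf '%s\n' "$custom_url"
    fi
    # The backup server is always available as the next choice.
    printf '%s/%s\n' "$backup_base" "$tarball"
}
```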
2020-06-25	Check if there is enough space available in selected build directory	Pedram Ashofteh Ardakani	-2/+45
	Until now, Maneage would accept the given build directory, regardless of the free storage space available there. This could cause confusing situations for new users who don't know about the minimum storage requirement. With this commit, after all other checks on the given build directory are completed, the configure script will check the available space and warn the user if there is less than about 5GB of free space available in the build directory (with a 5 second delay). It won't cause a crash because some projects may require less than this space (the default only needs roughly 2GB). But we also don't want the host's partition to get too close to being full, causing them problems elsewhere. We can change the behavior as desired in future commits.
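	A sketch of such a free-space check using 'df' is shown below. The function names and the use of KiB units are assumptions for illustration; the commit's own test and threshold handling may differ.

```shell
# Sketch (hypothetical helpers).
# Print the free space (in units of 1024 bytes) of the filesystem that
# holds the given directory: 4th column of the second line of the POSIX
# ('-P') output of 'df'.
free_kb() {
    df -P -k "$1" | awk 'NR==2 {print $4}'
}

# Print a warning and return 1 when less than $2 KiB are free in $1.
# The caller decides what to do next (for example sleep and continue).
warn_if_low_space() {
    avail=$(free_kb "$1")
    if [ "$avail" -lt "$2" ]; then
        echo "WARNING: only $avail KiB free in '$1' (suggested: $2 KiB)" >&2
        return 1
    fi
}
```

	For a threshold of about 5GB and a non-fatal warning, a call could look like 'warn_if_low_space "$bdir" 5000000 || sleep 5'.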
2020-06-19	Removing preparation-done.mk when cleaning by ./project make clean	Raul Infante-Sainz	-0/+1
	Until this commit, the file `BDIR/software/preparation-done.mk' was not removed when cleaning the project with `./project make clean'. This file is generated in the preparation of the data during the analysis step. However, the cleaning is expected to remove anything generated in the analysis process! Step by step, with the commands:
	./project make       ---> Will do the preparation and analysis
	./project make clean ---> Will remove all analysis outputs (but not `preparation-done.mk')
	./project make       ---> Won't do the preparation, only analysis!
	However, in the last step it should do the preparation again, because the input data could have changed for any reason. With this commit, the file `BDIR/software/preparation-done.mk' is removed when cleaning the project, and consequently, in the analysis step the input data is prepared.
2020-06-18	Fixed small bug that was introduced four commits ago	Raul Infante-Sainz	-4/+4
	In Commit 105467fe6402 (Software tarballs are downloaded even if not built), we introduced tests to download the tarballs of software even if they don't need to be built on the respective host. However some small typos in the checks existed that could cause a crash on macOS. In particular in the building of PatchELF and libbsd we had forgotten to add the necessary 'x' before the 'yes' in the conditional to check if we are on macOS or not. With this commit these two checks have been corrected. Also, in the building of 'isl' and 'mpc', we now check for 'host_cc' (signifying that the user wants to use their host C compiler for the high-level step) instead of 'on_mac_os'. The reason is that even on non-macOS systems, a user may not want to build the C compiler from scratch and use the '--host-cc' option. In such cases, they don't need to compile 'isl' and 'mpc'.
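	The 'x' convention mentioned above can be illustrated as follows (the function name is hypothetical). Both sides of the comparison need the same prefix: with 'x' on the variable side only, as in '[ x$on_mac_os = yes ]', the test can never succeed.

```shell
# Sketch (hypothetical helper): succeed only when $1 is exactly 'yes'.
# Prefixing both sides with 'x' keeps the comparison well-formed even
# when the value is empty or starts with a dash.
is_yes() {
    [ x"$1" = xyes ]
}
```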
2020-06-17	Text surrounding software acknowledgements as a configuration file	Boud Roukema	-11/+124
	Until now, the English text that embeds the list of software to acknowledge in the paper was hard-wired into the low-level coding ('reproduce/software/shell/configure.sh' to be more specific). But this file is very low-level, thus discouraging users from modifying this surrounding text. While the list of software packages can be considered to be 'data' and is fixed, the surrounding text to describe the lists is something the authors should decide on. Authors of a scientific research paper take responsibility for the full paper, including for the style of the acknowledgments, even if these may well evolve into some standard text. With this commit, authors who do not modify 'reproduce/software/config/acknowledge_software.sh' will have a default text, with only a minor English correction from earlier versions of Maneage. However, authors choosing to use their own wording should be able to modify the text parameters in `reproduce/software/config/acknowledge_software.sh` in the obvious way. This is much more modular than asking project authors to go looking into the long and technical 'configure.sh' script. Systematic issues: the file `reproduce/software/config/acknowledge_software.sh` is an executable shell script, because it has to be called by `reproduce/software/shell/configure.sh`, which, in principle, does not yet have access to `GNU make` (if I understand the bootstrap sequence correctly). It is placed in `config/` rather than `shell/`, because the user will expect to find configuration files in `config/`, not in `shell/`. A possible alternative to avoid having a shell script as a configure file would be to let `reproduce/software/config/acknowledge_software.sh` appear to be a `make` file, but analyse it in `configure.sh` using `sed` to remove whitespace around `=`, and adding other hacks to switch from `make` syntax to `shell` syntax. However, this risks misleading the user, who will not know whether s/he should follow `make` conventions or `shell` conventions.
2020-06-17	Security risk of LaTeX's -shell-escape option explained in comment	Boud Roukema	-0/+9
	The 'pdflatex' program is used to build the default Maneage-branch paper. But since the default paper uses PGFPlots to build the figures within LaTeX as an external PDF, PGFPlots requires 'pdflatex' to be called with the '-shell-escape' option. Generally, this option can be considered a security risk (in particular when 'pdflatex' is being run on an external LaTeX file: a malicious LaTeX writer may embed commands in the LaTeX source that will be executed on the host if this option is present). This is not too serious an issue in Maneage, because when someone runs Maneage, they intentionally let it run many commands on their system. Hence if someone wants to exploit a host system, they can add the necessary commands long before 'pdflatex' is run. After all, all commands in Maneage are run with the calling user's permissions, hence they have access to many parts of the user's account. If someone is worried about security on a non-trusted Maneage project they should act the same as they do with any software: define a new user for it, and call it with that user (as a weak level of security), or run it in a virtual machine or container. However, since this option has been explicitly mentioned as a security risk before, it helps if we have a comment explaining its usage in 'paper.mk'. With this commit, the concerned user will find a brief explanation there and can read the brief discussion at [1] and possibly re-open the discussion or propose ways of mitigating the security risk(s). [1] https://savannah.nongnu.org/task/?15694
2020-06-17	Software tarballs are downloaded even if not built	Mohammad Akhlaghi	-56/+36
	Some low-level software isn't necessary on some operating systems, for example GCC can't be built on macOS, hence we don't build it and the GCC-only dependencies. Also, on GNU/Linux systems users could configure with '--host-cc' to avoid all the time it takes to build GCC when doing a fast test. Until now, in such cases not only was the software not installed, but the tarballs of the software were also not downloaded. This made the output of '--dist-software' incomplete (as in bug #58561). With this commit, we now import all the necessary tarballs. When the software isn't necessary for the particular system, it won't be built or cited, but its tarball will be present anyway, thus allowing the output of '--dist-software' to be complete.
2020-06-17	New target --dist-software to package all necessary software tarballs	Mohammad Akhlaghi	-6/+2
	When publishing a project, it is necessary to also publish the source code of all necessary software of the project. We had recently added a new './project make' target called 'dist-software' for this job, but had forgotten to add it in the output of './project --help'! There was also a small bug inside of it that didn't allow the successful copying of the created tarball to the top project directory. With this commit, an explanation for this target has been added in the output of './project --help' and that bug has been fixed.
2020-06-17	Corrected symbolic link to Gnuastro's configuration files	Mohammad Akhlaghi	-1/+1
	Until now, when making the link to Gnuastro's configuration files, the 'configure.sh' script would incorrectly link to the old configuration directory under the 'reproduce/software' directory. With this commit, it is moved to the proper directory under 'reproduce/analysis'.
2020-06-16	Imported recent improvements in Maneage, minor conflicts fixed	Mohammad Akhlaghi	-30/+56
	Some minor conflicts came up in 'download.mk' and 'verify.mk' (as expected in the "IMPORTANT" commits) and were easily fixed by choosing this branch's values.
2020-06-16	XLSX I/O properly accounts for local build	Mohammad Akhlaghi	-2/+2
	Until now, when adding the necessary library flags to the build of XLSX I/O, we were effectively over-writing the 'LDFLAGS' variable. So the compiler was effectively not being told where to look for the necessary libraries. With this commit, to fix the problem, we now append the new linking flags to LDFLAGS in XLSX I/O's build, instead of over-writing it.
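	The difference between over-writing and appending can be sketched in shell as follows. The directory and the '-lexample' flag are hypothetical placeholders, not the values used in XLSX I/O's actual build rule.

```shell
# Hypothetical directory holding the project's own libraries.
libdir="/hypothetical/install/lib"

# Linker flags set earlier in the build.
LDFLAGS="-L$libdir"

# Hypothetical extra flag needed by one package.
extra="-lexample"

# Over-writing (LDFLAGS="$extra") would drop the -L entry above.
# Appending keeps it, so the linker still searches $libdir.
LDFLAGS="$LDFLAGS $extra"

echo "$LDFLAGS"
```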
2020-06-15	OpenSSL now built after Perl	Mohammad Akhlaghi	-1/+1
	After trying a clean build of Maneage in a Docker image (with a minimal debian:stable-20200607-slim OS), I noticed that the building of OpenSSL is failing because it doesn't find the proper Perl functionality. To fix it, with this commit, Perl is set as a prerequisite of OpenSSL and this fixed the problem.
2020-06-15	Configure script now accounts for non-interactive shells	Mohammad Akhlaghi	-6/+33
	The project configuration requires a build-directory at configuration time. Two other directories can optionally be given to avoid downloading the project's necessary data and software. It is possible to give these three directories as command-line options, or by interactively giving them after running the configure script. Until now, when these directories weren't given as command-line options, and the running shell was non-interactive, the configure script would crash on the line trying to interactively read the user's given directories (the 'read' command). With this commit, all the 'read' commands for these three directories are now put within an 'if' statement. Therefore, when 'read' fails (the shell is non-interactive), instead of a quiet crash, a descriptive message is printed, telling the user the cause of the problem, and suggesting a fix. This bug was found by Michael R. Crusoe.
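	A minimal sketch of the pattern described above, assuming a hypothetical function name ('ask_build_dir'), variable name ('builddir') and message wording; the actual code in the configure script may differ.

```shell
# Read a directory name from standard input into 'builddir'
# (hypothetical names, for illustration only). If 'read' fails,
# for example because standard input is empty or closed in a
# non-interactive shell, print a descriptive message instead of
# failing silently.
ask_build_dir() {
    if read -r builddir; then
        return 0
    else
        echo "ERROR: could not read the build directory from standard input." >&2
        echo "The shell may be non-interactive; please give the directory" >&2
        echo "as a command-line option instead." >&2
        return 1
    fi
}
```

	Putting 'read' in the condition of the 'if' means its failure is handled explicitly, even when the surrounding script is set to stop on any error.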
2020-06-14	Better description for input data directory, pointing to INPUTS.conf	Mohammad Akhlaghi	-19/+13
	Until now, the description of the input-data directory at configure time included a description of the input data (created by reading the values of 'INPUTS.conf'). Maintaining this is easy for a single dataset, but it becomes hard for a general project which may need many input datasets. To avoid extra complexity (for maintaining this list), the description now points a user of the project to the 'INPUTS.conf' file and asks them to look inside it to see the necessary data. This in fact helps users become familiar with the internal structure of Maneage, and frees the authors from having to worry about updating the low-level 'configure.sh' script.
2020-06-14	Better explanation in the start of project configuration	Mohammad Akhlaghi	-3/+7
	When './project configure' is run, after the basic checks of the compiler, a small statement is printed telling the user that some configuration questions will now be asked to start building Maneage on the system. Until now this description was confusing: it led the reader to think that the local configuration (which was recommended to read before continuing) is in another file. With this commit, the text has been edited to explicitly mention that the description of the steps following this notice should be read carefully, thus avoiding that confusion. This issue was mentioned by Michael R. Crusoe.
2020-06-10	Corrected bug in using local copy of input dataset	Mohammad Akhlaghi	-13/+47
	As described in Maneage's commit 2bd2e2f18 (which I found while testing this project), the existing download recipe had problems when using a local copy of the input dataset. It was first fixed here, then implemented there. Also, to clarify things for a new user, some long comments were added at the top of 'INPUTS.conf' to describe each of the variables. That comment has also been put here (and is also in commit 2bd2e2f18 of Maneage).
2020-06-10	IMPORTANT: bug fix in default data download script of download.mk	Mohammad Akhlaghi	-14/+54
	Summary of possible semantic conflicts: 1. The recipe to download input datasets has been modified. You have to re-set the old 'origname' variable to 'localname' (to avoid confusion) and the default dataset URL should now be complete (including the actual filename). See the newly added descriptions in 'INPUTS.conf' for more on this. Until now, when the dataset was already present on the host system, a link couldn't be made to it, causing the project to crash in the checksum phase. This has been fixed by properly naming the main variable as 'localname' to avoid the confusion that caused it. Some other problems have been fixed in this recipe in the meantime: - When the checksum is different, the expected and calculated checksums are printed. - In the default paper, we now print the full URL of the dataset, not just the server, so the checksum of the 'download.tex' step has been updated.
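	The checksum behaviour mentioned above (printing both values on a mismatch) can be sketched as follows. This is an assumption-laden illustration, not the recipe in 'download.mk': the function name 'verify_checksum' is hypothetical, and it assumes the 'sha256sum' program from GNU Coreutils is available, while the project may use a different checksum program.

```shell
# Compare a file's checksum with the expected value (hypothetical
# function; assumes 'sha256sum' from GNU Coreutils). On a mismatch,
# print both values so the user can see what differs.
verify_checksum() {
    file="$1"
    expected="$2"

    # 'sha256sum' prints "<checksum>  <filename>"; keep the first field.
    calculated=$(sha256sum "$file" | awk '{print $1}')

    if [ "$calculated" = "$expected" ]; then
        return 0
    fi

    echo "Checksum mismatch for $file" >&2
    echo "  expected:   $expected" >&2
    echo "  calculated: $calculated" >&2
    return 1
}
```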
2020-06-09	Minor edit printing arXiv URL in plain text metadata	Mohammad Akhlaghi	-1/+1
	Until now, in the 'print-copyright' function of 'initialize.mk' (that prints a fixed set of common metadata necessary in plain-text files), we were simply printing this line: # Pre-print server: arXiv:1234.56789 But given that all the other elements are click-able URLs, it now prints: # Pre-print server: https://arxiv.org/abs/1234.56789