paper-concept.git - Paper (Towards Long-term and Archivable Reproducibility)

Age	Commit message	Author	Lines
2020-04-13	Installation year removed from TeXLive installation	Mohammad Akhlaghi	-1/+1
	TeXLive recently transitioned from its 2019 version to its 2020 version (found thanks to Elham Saremi's trial of this project). Traditionally, Maneage installed all TeXLive packages in a per-year directory, which was very annoying and required an update in the core Maneage system every year. So I suddenly recognized that we can fix this by setting a different name for the directory holding the release year. This has been implemented with this commit. I have also made this change in the main Maneage branch so that other projects also benefit from this correction.
2020-04-13	Full existing contents summarized, only discussion to go	Mohammad Akhlaghi	-0/+8
	The contents until two commits ago, when I started to summarize the paper, are now in a new and shorter format: previously the discussion started on page 25, but now it starts on page 17. It is still a little longer than 8000 words, but not as significantly as before. I will add the discussion and also try to summarize it further before submission.
2020-04-02	Imported recent work on Maneage, minor conflicts fixed	Mohammad Akhlaghi	-33/+51
	A few minor conflicts occurred and were fixed.
2020-03-23	Analysis and configuration file sections complete	Mohammad Akhlaghi	-18/+74
	With this commit, a description of these two important parts has been added to the project, along with several figures showing various parts of the files that are discussed. I also did some other restructuring of the figures and files to make things fit better into the description of the paper.
2020-03-08	Menke+20 example: properly count number of papers with software	Mohammad Akhlaghi	-4/+15
	Until now, I was mistakenly multiplying the fraction of papers in that journal. This is corrected with this commit.
2020-03-02	Described the first analysis phase with a demo subMakefile	Mohammad Akhlaghi	-11/+11
	Until now, there was no explanation of an actual analysis phase. Therefore, with this commit, an example scenario with a readable Makefile is included. The data lineage graph was also simplified, both to be more readable and to correspond to this new explanation and subMakefile. Some random edits/typos were also corrected and some references added for discussion.
2020-02-29	IMPORTANT: re-preparation can only be done with --prepare-redo	Mohammad Akhlaghi	-13/+2
	Until now, the preparation phase was always executed before the final build phase when running `./project make'. But when it becomes necessary, project preparation can be slow and will unnecessarily slow down the project while the project is growing (focus is on the analysis that is done after preparation). With this commit, preparation will be done automatically the first time that the project is run (that is, when `.build/software/preparation-done.mk' doesn't exist). However, after preparation is complete once, future runs of `./project make' won't do preparation any more (by calling `top-prepare.mk'). They will directly call `top-make.mk' for the analysis. To manually invoke preparation after the first attempt, the `./project make' script should be run with the new `--prepare-redo' option. Also, since the preparation phase is now automatically done before the analysis phase, the long notice that describes running `./project make' at the end of the preparation phase has been removed in `top-prepare.mk'. It now just prints a short line, saying that the preparation is complete. Finally, when the project has not been run with the proper group configuration, it ends with an `exit 1' so the main `./project' script doesn't proceed any further.
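The decision described in this commit message can be sketched roughly as follows. This is only an illustration and not the actual code of the `./project' script: the function name `phases_to_run' and its two arguments are hypothetical, and only the `preparation-done.mk' file name and the `--prepare-redo' option come from the message above.

```shell
#!/bin/sh
# Hypothetical sketch: print the phases that one `./project make' run
# would go through, based on the rule described in the commit message.
#   $1: build directory (assumed to contain `software/preparation-done.mk'
#       once preparation has finished at least once).
#   $2: 1 if `--prepare-redo' was given, 0 otherwise.
phases_to_run() {
    bdir=$1
    prepare_redo=$2
    if [ "$prepare_redo" = 1 ] || ! [ -f "$bdir/software/preparation-done.mk" ]; then
        # First run, or an explicit request: prepare, then analyze.
        echo "prepare make"
    else
        # Preparation was already done: go directly to the analysis.
        echo "make"
    fi
}
```

For example, `phases_to_run .build 0' prints only `make' once the marker file exists, while `phases_to_run .build 1' always includes the preparation phase.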
2020-02-20	Preparation phase: prepare.tex not needed to finish preparation	Mohammad Akhlaghi	-1/+3
	Until now, the final preparation target of the preparation phase depended on all the `$(makesrc)' files. This caused a problem because we were telling it to also depend on `prepare.tex' (which is the same file that is being built). With this commit, we are applying the same solution we have already done in `paper.mk' (for `paper.tex'): we are removing `prepare' from the list of prerequisites. This bug was found by Zahra Sharbaf.
2020-02-16	Two values from the input dataset are now written into the paper	Mohammad Akhlaghi	-2/+15
	This was done just to get going with describing the analysis process.
2020-02-16	Menke+2020 data is now imported and ready for later steps in plain text	Mohammad Akhlaghi	-10/+76
	The main problem with this dataset was the names of the journals (which sometimes have single quotes or apostrophes in them, which is really annoying for SED)! But ultimately, for the simple study we want to do here, the journal names are irrelevant, so in the end I just ignored the names. Later we can set an identifier for the journals if necessary. But now we have the basic information in a form that is usable in a plot to show in this paper.
2020-02-11	Using backup server when original download server fails	Mohammad Akhlaghi	-1/+21
	Until now, the main download script could only check one server for the given URL. However, ultimately the actual server that a file is downloaded from is irrelevant for this project: we actually check its checksum. Especially in the case of software (which is distributed over many servers), relying on one server can be very annoying: the server may not properly communicate with the running system and even the 10 trials won't be enough. With this commit, the download script `reproduce/analysis/bash/download-multi-try' can take a new optional argument (a 5th argument). It assumes this argument is a space-separated list of server(s) to use as backup for the original URL. When downloading from the original URL fails, it will look into this list and try downloading the same file from each given server.
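A minimal sketch of the fallback logic described above, under stated assumptions: it is not the `download-multi-try' script itself. The function name `download_with_backups', its argument order, the pluggable fetch command, and the way a backup URL is formed (server followed by the file name of the original URL) are all hypothetical choices made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: try the original URL first, then each backup server.
#   $1: fetch command, called as `CMD URL OUTPUT' (assumption: it returns
#       non-zero on failure; a wrapper around wget or curl would fit here).
#   $2: original URL.
#   $3: output file name.
#   $4: space-separated list of backup servers (may be empty).
download_with_backups() {
    fetch_cmd=$1
    url=$2
    out=$3
    backups=$4

    # Original server first.
    if "$fetch_cmd" "$url" "$out"; then return 0; fi

    # Assumption: each backup server hosts the same file name at its top.
    file=${url##*/}
    for server in $backups; do
        if "$fetch_cmd" "${server%/}/$file" "$out"; then return 0; fi
    done

    # Nothing worked.
    return 1
}
```

Since the checksum is verified afterwards, it does not matter which of the servers finally provides the file.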
2020-02-01	IMPORTANT: reproduce/software/bash renamed to reproduce/software/shell	Mohammad Akhlaghi	-2/+2
	Until now the shell scripts in the software building phase were in the `reproduce/software/bash' directory. But given our recent change to a POSIX-only start, the `configure.sh' shell script (which is the main component of this directory) is no longer written with Bash. With this commit, to fix that problem, that directory's name has been changed to `reproduce/software/shell'.
2020-01-31	Configure step: compiler checks done before basic settings	Mohammad Akhlaghi	-0/+6
	Until now, the project would first ask for the basic directories, then it would start testing the compiler. But that was problematic because the build directory can come from a previous setting (with `./project configure -e'). Also, it could confuse users to first ask for details, then suddenly tell them that they don't have a working C library! We also need to store the CPATH variable in the `LOCAL.conf' because in some cases, the compiler won't work without it. With this commit, the compiler checking has been moved to the start of the configure script. Instead of putting the test program in the build directory, we now make a temporary hidden directory in the source directory and delete that directory as soon as the tests are done. In the process, I also noticed that the copyright year of the two hidden files wasn't updated and corrected them.
2020-01-20	IMPORTANT!!! Configuration Makefiles now have a .conf suffix	Mohammad Akhlaghi	-19/+20
	Until now, the configuration Makefiles (in `reproduce/software/config/installation' and `reproduce/analysis/config') had a `.mk' suffix, similar to the workhorse Makefiles. Although they are indeed Makefiles, given their nature (to only keep configuration parameters), it is confusing (especially to early users) for them to also have a `.mk' suffix (similar to the analysis or software building Makefiles). To address this issue, with this commit, all the configuration Makefiles (in those directories) are now given a `.conf' suffix. This is also assumed for all the files that are loaded. The configuration (software building) and running of the template have been checked with this change from scratch, but please report any error that may not have been noticed. THIS IS AN IMPORTANT CHANGE AND WILL CAUSE CRASHES OR UNEXPECTED BEHAVIORS FOR PROJECTS THAT HAVE BRANCHED FROM THIS TEMPLATE. PLEASE CORRECT THE SUFFIX OF ALL YOUR PROJECT'S CONFIGURATION MAKEFILES (IN THE DIRECTORIES ABOVE), OTHERWISE THEY AREN'T AUTOMATICALLY LOADED ANYMORE.
2020-01-18	Raw draft (until now as a separate repository) imported	Mohammad Akhlaghi	-3/+4
	Until now, I was writing the paper without the template. But we will soon be adding a tutorial to the template, and I thought it would be good to have an example demonstration here too. So I just brought the whole project into the template structure, allowing us to add the template analysis later when it's ready, and of course also allowing us to easily reproduce this paper (without having to worry about the host's TeXLive installation).
2020-01-18	First set of customizations done	Mohammad Akhlaghi	-135/+2
	The unnecessary parts were removed and the project now runs.
2020-01-18	README-hacking.md: edits and corrections for easier customization	Mohammad Akhlaghi	-2/+2
	The checklist descriptions were slightly edited to be clearer. Also, while following them, I noticed that removing the "delete-me" parts of `verify.mk' would cause an error: once the `if [ $$m == delete-me ];' statement that we were saying to delete was gone, `elif' was the first statement Bash would see. So with this commit, the `download' conditional (which isn't instructed to be deleted) was set to be the top (with an `if') and the `delete-me' conditional now has an `elif'.
2020-01-01	Verification function checks if file exists	Mohammad Akhlaghi	-3/+13
	Until now, if the file to be verified didn't exist, a different checksum would be generated, and it would stop, but it wasn't immediately clear whether the differing checksum was because the file didn't exist at all! With this commit, before calculating the checksum, we first check that the file exists. If it doesn't exist, an explicit error is printed, which will help the project editor find the cause of the problem.
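The order of checks described above (existence first, checksum second) could look like the sketch below. It is not the function in `verify.mk': the name `verify_file', the return codes and the use of `sha256sum' (from GNU Coreutils) are assumptions made here, since the message does not say which checksum program is used.

```shell
#!/bin/sh
# Hypothetical sketch: verify a file against an expected checksum.
#   $1: file to verify.
#   $2: expected SHA-256 checksum (assumed checksum type).
# Returns 0 on success, 2 if the file is missing, 1 on a mismatch.
verify_file() {
    file=$1
    expected=$2

    # Check existence first, so a missing file is reported as such and
    # not as a confusing checksum difference.
    if ! [ -f "$file" ]; then
        echo "verify: '$file' does not exist" >&2
        return 2
    fi

    # Only keep the checksum column of the output.
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" != "$expected" ]; then
        echo "verify: checksum of '$file' differs from the expected value" >&2
        return 1
    fi
    return 0
}
```

Using distinct return codes lets a caller tell a missing file from a changed one without parsing the error message.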
2020-01-01	Verification of output values and data added within template	Mohammad Akhlaghi	-22/+146
	Until now, the only verification that the template provided was the published PDF. Users had to manually compare the published and generated PDFs (numbers, plots, tables) and see if they obtained the same result. However, this type of manual verification is frustrating and prone to missing important differences. With this commit, a new Makefile has been added in the analysis steps: `verify.mk'. It provides facilities to easily verify the results that go into the paper, for example the tables that go into making the paper's plots, or the LaTeX macros that blend into the text. See the updated parts in `README-hacking.md' for a more complete explanation. This completes task #15497.
2020-01-01	Copyright statements updated to include 2020	Mohammad Akhlaghi	-10/+10
	Now that it's 2020, it's necessary to include this year in the copyright statements.
2019-11-29	Download links directly to actual file if it exists in INDIR	Mohammad Akhlaghi	-2/+8
	Until now, when an input dataset already exists in `INDIR', the template would just make a symbolic link to it in the build directory. However, in many cases, the files in INDIR will actually be links to other locations on the filesystem and some programs have problems following too many links. With this commit, the template is now using the `readlink' program (part of GNU Coreutils) to follow a possible link and point the link in the build directory directly to an actual non-link file.
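A rough sketch of this idea, assuming GNU Coreutils is available. The message only says that `readlink' is used; the `-f' option (fully resolve a chain of links), the function name `link_input' and its arguments are choices made here for illustration, not the template's actual rule.

```shell
#!/bin/sh
# Hypothetical sketch: link an input dataset into the build directory,
# pointing directly at the real file instead of at another link.
#   $1: the dataset in INDIR (possibly itself a symbolic link).
#   $2: name of the link to create in the build directory.
link_input() {
    src=$1
    dest=$2

    # `readlink -f' (GNU Coreutils) follows every link in the chain and
    # prints the absolute name of the final target.
    real=$(readlink -f "$src") || return 1

    # Create (or replace) the link so it points at the real file.
    ln -sf "$real" "$dest"
}
```

With this, programs that open the link in the build directory only have to follow a single level of indirection.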
2019-10-31	Minor corrections in distribution and autoconf prerequisite of automake	Mohammad Akhlaghi	-0/+1
	Some minor corrections were made in the template:
	  - When making the distribution, `.swp' files (created by Vim) are also removed.
	  - Autoconf is set as a prerequisite of Automake.
	I was also trying to add the Apache log4cxx, but its default 0.10.0 tarball needs some patches, so I have just left it half done until someone actually needs it and we apply the patch.
2019-10-19	Minor improvements in packaging of project with make dist	Mohammad Akhlaghi	-12/+14
	The steps to package the project have been made slightly more clear and also the temporary directory that is created for packaging is deleted after the tarball is made.
2019-10-11	Properly working make clean when in group mode	Mohammad Akhlaghi	-3/+4
	Until now, when you ran `make clean', all the directories under `$(BDIR)/tex/' would be deleted except for `macros' and `build'. This was good for the single-user mode. But in group mode, this would delete the user-specific TeX build directory because it's called `build-USER', not `build'. With this commit, to fix the problem, we define the new `texbtopdir' based on the group condition, and use that to specify which directory to not delete.
2019-10-02	Possible to use download-multi-try script without locks	Mohammad Akhlaghi	-4/+11
	Until now, this script would always only work with a file-lock. But in some scenarios, we might want to download in parallel. For example when the system has multiple ports to the internet. With this commit, we have added this feature: when the lockfile name is `nolock', it won't lock and will download in parallel.
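The `nolock' convention above could be sketched as follows. This is an illustration only: the function name `run_download' is hypothetical, and the use of the `flock' program (from util-linux) is an assumption, because the message does not say which locking tool the script uses.

```shell
#!/bin/sh
# Hypothetical sketch: run a download command with or without a file lock.
#   $1:  lock file name, or the special value `nolock'.
#   $2+: the download command and its arguments.
run_download() {
    lockfile=$1
    shift
    if [ "$lockfile" = nolock ]; then
        # No lock: several calls can download at the same time.
        "$@"
    else
        # Assumed tool: `flock' waits until the lock file is free, so only
        # one download runs at a time.
        flock "$lockfile" "$@"
    fi
}
```

Keeping the lock as the default and requiring an explicit `nolock' means that existing callers keep their serialized behavior.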
2019-10-01	Minor corrections in configure and prepare phase	Mohammad Akhlaghi	-10/+10
	Since ImageMagick can take long to build, we are now building it in parallel. Also, the part where we replace an `_' with `\_' in the software version at the end of the configure script was removed. It is clearer and more readable for the actual rule that includes such a name to deal with the underscore (as is the case for `sip_tpv', which already dealt with it). Finally, I noticed that the checks at the start of `top-prepare' were missing new-lines. I had forgotten that the Make single-shell variable isn't activated in this stage yet.
2019-10-01	Infrastructure to keep preparation results	Mohammad Akhlaghi	-17/+50
	A special directory is now defined in `initialize.mk' that can be used in both the preparation and build phases. Also, the contents of prepared results can now be conditionally read during `./project make'.
2019-10-01	Preparation phase added before final building	Mohammad Akhlaghi	-0/+126
	In many real-world scenarios, `./project make' can really benefit from having some basic information about the data before being run, for example when querying a server. If we know how many datasets were downloaded and their general properties, it can greatly optimize the process when we are designing the solution to be run in `./project make'. Therefore with this commit, a new phase has been added to the template's design: `./project prepare'. In the raw template this is empty, because the simple analysis done in the template doesn't warrant it. But everything is ready for projects using the template to add preparation phases prior to the analysis.
2019-09-26	Working project when downloaded from arXiv	Mohammad Akhlaghi	-2/+4
	Until now, we were assuming that the users would just clone the project in Git. But after submitting arXiv:1909.11230, and trying to build directly from the arXiv source, I noticed several problems that wouldn't allow users to build it automatically. So I tried the build step by step and was able to find a fix for the several issues that came up. The scripting parts of the fix were primarily related to the fact that the unpacked arXiv tarball isn't under version control, so some checks had to be put there. Also, we wanted to make it easy to remove the extra files, so an extra `--clean-texdit' option was added to `./project'. Finally, some manual corrections were necessary (prior to running `./project'), which are now described in `README.md'. Most of the later steps can be automated and we should do it later; I just don't have enough time now.
2019-09-25	Won't copy previous distribution builds in new distribution	Mohammad Akhlaghi	-1/+1
	Until now, the pipeline was instructed to only ignore the current temporary project distribution directory. So if there were directories from previous builds, they were wrongly included in the current tarball. With this commit, we don't just ignore the directory of the current distribution, but generally, all directories starting with `paper-v*'.
2019-09-16	Git checksum printed even when on a tag	Mohammad Akhlaghi	-2/+2
	Until now, when the commit was tagged, `git describe' would just print the tag and no longer the commit checksum. This is bad because the checksum is a much more robust way to confirm the point in history. With this commit the `--long' option has been added to `git describe' to fix this issue. From now on, when we are on a tag, it will print the tag followed by a `-0-' and the first characters of the checksum.
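As a small illustration of the output format: with `--long', `git describe' prints the tag, the number of commits since that tag, and the abbreviated checksum prefixed by a `g' (for example `TAG-0-gCHECKSUM' when exactly on the tag). The helper below is hypothetical and not part of the project; it only shows how the checksum part can be extracted from such a string.

```shell
#!/bin/sh
# Hypothetical sketch: extract the abbreviated commit checksum from the
# output of `git describe --long', which has the form TAG-N-gCHECKSUM.
#   $1: a string as printed by `git describe --long'.
describe_hash() {
    # Remove everything up to (and including) the last `-g'. The checksum
    # is hexadecimal, so it cannot itself contain `-g'.
    printf '%s\n' "${1##*-g}"
}

# Typical use inside a Git repository (not run here):
#   describe_hash "$(git describe --long)"
```

Because the checksum is always present with `--long', this extraction works the same way whether or not the current commit is tagged.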
2019-09-16	Distribution tarball now builds in arXiv	Mohammad Akhlaghi	-8/+25
	`./project make dist' will package all the LaTeX-specific files (and analysis source files) into one `tar.gz' file that is ready to upload to servers like arXiv. However, it wasn't updated for some time, so running it would complain about not having a `configure' script in the top of the project. With this commit, it now works with the new file-structure of the project and also copies all the BibLaTeX source files and `paper.bbl' into the top tarball directory, which allows arXiv to build the paper as intended. The output of `./project make dist' has been uploaded and tested on arXiv and it is built by arXiv perfectly. Also, a short description of all the special `make' targets was added to the output of `./project --help'.
2019-08-22	OpenMPI environment variable used to disable need for OpenSSH	Mohammad Akhlaghi	-1/+5
	Until now, OpenMPI would complain about not having `ssh' or `rsh' as a remote shell feature. However, such features should not be necessary in a reproducible scenario and they also have major security issues. With this commit, we are now using OpenMPI's `OMPI_MCA_plm_rsh_agent' environment variable to disable any remote shell dependency for it (as suggested by Boud). Therefore, any dependency on OpenSSH has been removed. But I thought to keep the build instructions in case they may be useful under some unforeseen scenario. However, to discourage people from building it, a notice was added on top of the build instructions. This bug was found, tested and solved thanks to Roberto Baena Gallé and Boud Roukema. This fixes bug #56724.
2019-08-01	Git hooks removed after doing a distclean	Mohammad Akhlaghi	-0/+5
	Until now, when you needed to completely clean a project (with `./project make distclean') the Git hooks that are installed during configure time would cause problems when committing (the `pre-commit' hook in particular won't allow you to commit anything!). With this commit, before deleting the software, the template first removes these Git hooks.
2019-08-01	Bash startup script for every recipe	Mohammad Akhlaghi	-0/+4
	Until now the only way to define the environment of the Make recipes was through the exported Make variables (mostly in `initialize.mk' for the analysis steps for example). However, there is only so much you can do with environment variables! In some situations you want slightly more complicated environment control, like setting an alias or running scripts (things that are commonly done in the `~/.bashrc' file of users to configure their interactive, non-login shells). With this commit, a `reproduce/software/bash/bashrc.sh' has been defined for this job (which is currently empty!). Every major Make step of the project adds this file as the `BASH_ENV' environment variable, so the shell that is created to execute a recipe first executes this file, then the recipe. Each top-level Makefile also defines a `PROJECT_STATUS' environment variable that enables users to limit their environment setup based on the condition in which it is being set up (in particular in the early phase of `basic.mk', where the user can't make any assumption about the programs and has to write a portable shell script).
2019-07-28	Corrected typo in environment before running make	Mohammad Akhlaghi	-3/+3
	We recently moved the absolute address of the system's `rm' program to a shell variable that is found during the `./project' script. But I had forgotten to account for the differences between Make and Bash variable naming. I had also forgotten to add a value to the HOME variable. With this commit both are corrected: the system's `rm' path is now called `sys_rm' and the HOME variable is set.
2019-07-28	Single wrapper instead of old ./configure, Makefile and ./for-group	Mohammad Akhlaghi	-18/+20
	Until now, to work on a project, it was necessary to `./configure' it and build the software. Then we had to run `.local/bin/make' to run the project and do the analysis every time. If the project was a shared project between many users on a large server, it was necessary to call the `./for-group' script. This way of managing the project had a major problem: since the user directly called the lower-level `./configure' or `.local/bin/make', it was not possible to provide high-level control (for example limiting the environment variables). This was especially noticed recently with a bug that was related to environment variables (bug #56682). With this commit, this problem is solved using a single script called `project' in the top directory. To configure and build the project, users can now run these commands:
	    $ ./project configure
	    $ ./project make
	To work on the project with other users in a group, these commands can be used:
	    $ ./project configure --group=GROUPNAME
	    $ ./project make --group=GROUPNAME
	The old options to both configure and make the project are still valid. Run `./project --help' to see a list. For example:
	    $ ./project configure -e --host-cc
	    $ ./project make -j8
	The old `configure' script has been moved to `reproduce/software/bash/configure.sh' and is called by the new `./project' script. The `./project' script now just manages the options, then passes control to the `configure.sh' script. For the "make" step, it also reads the options, then calls Make. So at the lower level nothing has changed. Only the `./project' script is now the single/direct user interface of the project. On a parallel note: as part of bug #56682, we also found out that on some macOS systems, the `DYLD_LIBRARY_PATH' environment variable has to be set to blank. This is no problem because RPATH is automatically set in macOS and the executables and libraries contain the absolute address of the libraries they should link with. But having `DYLD_LIBRARY_PATH' can conflict with some low-level system libraries and cause very hard to debug linking errors (like that reported in the bug report). This fixes bug #56682.
2019-07-27	DYLD_LIBRARY_PATH also fixed for macOS systems	Mohammad Akhlaghi	-7/+8
	Until now we were only setting the `LD_LIBRARY_PATH' environment variable for GNU/Linux systems. But macOS systems use the `DYLD_LIBRARY_PATH'. With this commit, for better control over the environment, we are also fixing `DYLD_LIBRARY_PATH' in all the places that we are setting the general environment variables.
2019-06-29	Added citation for TIDES, sorted progs alphabetically	Mohammad Akhlaghi	-6/+10
	While reviewing Prasenjit's commits, I noticed that we had forgotten to add the citation for TIDES. Also, to make things clear, the program/library build rules are now sorted alphabetically. Finally, I noticed that after building the TiKZ PDF figures, it is crashing (like on Prasenjit's computer). After looking around, I noticed it's because we were setting the value of the `TEXINPUTS' environment variable to be the installed TeX Live directory (which was ultimately redundant because by default TeX will look into where it was installed). The important thing is just that we remove any possible value the host system has, not to set new directories.
2019-06-28	tides library added	Prasenjit Saha	-1/+1
	TIDES is an ODE integrator with multiple-precision arithmetic.
2019-06-28	Corrections to basic build	Prasenjit Saha	-1/+1
	Several corrections were necessary in the basic build: 1) the version of GCC on some systems includes an `_' which would cause a crash when building the PDF. 2) libcharset had to be manually added to the Git build.
2019-06-13	Permission flags of top.mk set to 644 like others, not 755	Mohammad Akhlaghi	-0/+0
	Until now, for some reason, the permission flags of `top.mk' were 755 (good for an executable), not 644 (which is what they should be for a plain text file that is run by another program). This is corrected with this commit.
2019-05-21	Imported Matplotlib installation, no conflicts	Mohammad Akhlaghi	-0/+0
	There weren't any conflicts in this merge.
2019-05-21	ImageMagick is now included into the project	Raul Infante-Sainz	-0/+0
	With this commit, the ImageMagick software has been added to the project. This software is useful for processing images from the command line. Since it is widely used and a lot of other programs rely on it, it is worth having it in the project.
2019-05-21	Source directory links to build directory all managed in configure	Mohammad Akhlaghi	-5/+2
	Until now, the `tex/build' symbolic link was put in the clone/source tree when the build-directory's `tex' directory was being built. Thanks to Roberto Baena, we just found a bug because of this behavior: when a second group member is trying to build the pipeline, since the build directory's `tex' directory already exists, no `tex/build' will be put in their clone/source directory. As a result, the PDF building will crash. To fix this (and keep things organized), the two `tex/build' and `tex/tikz' links (to the build directory) are now built in the configure step while it is building all the top-level directories. They are no longer built within the Makefiles. Also, a comment was added on top of every directory built during the configuration phase to be clear. This fixes bug #56362.
2019-05-09	download-multi-try now starts with a /bin/bash shebang	Mohammad Akhlaghi	-0/+2
	Until now, the `download-multi-try' script assumed GNU Bash features (when comparing the number of attempts at downloading), but it didn't explicitly ask the operating system to be run with Bash. As a result, when weaker shells were used (like the default Debian minimalist `dash' shell), the `>' ("larger than" operator in a math context) is interpreted as a redirection and two extra files are created: `1' and `maxcount'! With this commit, we now start this script with `/bin/bash'. Of course, this will assume that the host has GNU Bash installed, but we are also making this assumption in the configure script. So at least for now, Bash (any version) is a critical dependency of this template anyway.
2019-04-30	End-of-line Backslashes no longer right under each other	Mohammad Akhlaghi	-9/+9
	When we need to quote the new-line character we end the line with a backslash (`\'). Until now, our convention has been to put all such backslashes under each other to help in visual inspection. But this causes a lot of confusion in version control: if only one line's length is larger, the whole block will be marked as changed and thus makes it hard to visually see the actual change. It also makes debugging the code (adding some temporary lines) hard. With this commit, I went through all the files and tried to fix all such cases so only a single white space character is between the last command character and the backslash. Where there was an empty line (ending with a backslash, to help in visually separating the code into blocks), I put the backslash right under the previous line's. This completes task #15259.
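For comparison, the same kind of test can be written so that it behaves identically in Bash and in `dash'. This is not what the commit does (it fixes the problem with a `/bin/bash' shebang); it is only a sketch of a POSIX alternative, using the `-gt' operator of `[' instead of `>'. The function name `attempts_exceeded' is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: POSIX-portable "larger than" test for a retry
# counter. Inside `[ ... ]', an unquoted `>' would be parsed by the shell
# as an output redirection (creating a file), while `-gt' is a numeric
# comparison in every POSIX shell.
#   $1: number of attempts so far.
#   $2: maximum number of attempts.
attempts_exceeded() {
    counter=$1
    maxcount=$2
    [ "$counter" -gt "$maxcount" ]
}

# Typical use in a retry loop (illustration only):
#   if attempts_exceeded "$counter" "$maxcount"; then exit 1; fi
```

The function returns a zero (success) status only when the counter is strictly larger than the maximum, so it can be used directly in an `if'.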
2019-04-29	Fixed a few architecture remnants in initialize.mk	Mohammad Akhlaghi	-33/+3
	In a few cases, `reproduce/analysis/make/initialize.mk' still assumed the old architecture. With this commit, they have been corrected.
2019-04-17	Corrected bibtex entry for Astrometry-net and Swarp	Raul Infante-Sainz	-1/+1
	Until now, there were errors in the citation of Astrometry-net and Scamp papers. With this commit, we fix these problems. The Swarp bibtex has also been modified to follow the aesthetic of the citation style we have right now in the project. We also added the `dependency-bib.tex' as a prerequisite of `paper.bbl'.
2019-04-15	New architecture to separate software-building and analysis steps	Mohammad Akhlaghi	-0/+981
	Until now, the software building and analysis steps of the pipeline were intertwined. However, these steps (of how to build software, and how to use it) are logically completely independent. Therefore with this commit, the pipeline now has a new architecture (particularly in the `reproduce' directory) to emphasize this distinction: The `reproduce' directory now has the two `software' and `analysis' subdirectories and the respective parts of the previous architecture have been broken up between these two based on their function. There is also no more `src' directory. The `config' directory for software and analysis is now mixed with the language-specific directories. Some of the software versions were also updated after some checks with their webpages. This new architecture will allow much more focused work on each part of the pipeline (to install the software and to run it for an analysis).