paper-concept.git - Paper (Towards Long-term and Archivable Reproducibility)

Age	Commit message	Author	Lines
2020-06-17	Better color for project branch in branching figure	Mohammad Akhlaghi	-1/+1
	Until now the color for the branching figure's "project" branch was too close to the derived project branch. With this commit, I am using a slightly darker shade of brown that is sufficiently different from the core Maneage branch and the derived project branch.
2020-06-17	Text surrounding software acknowledgements as a configuration file	Boud Roukema	-16/+130
	Until now, the English text that embeds the list of software to acknowledge in the paper was hard-wired into the low-level coding ('reproduce/software/shell/configure.sh' to be more specific). But this file is very low-level, thus discouraging users from modifying this surrounding text. While the list of software packages can be considered to be 'data' and is fixed, the surrounding text to describe the lists is something the authors should decide on. Authors of a scientific research paper take responsibility for the full paper, including for the style of the acknowledgments, even if these may well evolve into some standard text. With this commit, authors who do not modify 'reproduce/software/config/acknowledge_software.sh' will have a default text, with only a minor English correction from earlier versions of Maneage. However, authors choosing to use their own wording should be able to modify the text parameters in `reproduce/software/config/acknowledge_software.sh` in the obvious way. This is much more modular than asking project authors to go looking into the long and technical 'configure.sh' script. Systematic issues: the file `reproduce/software/config/acknowledge_software.sh` is an executable shell script, because it has to be called by `reproduce/software/shell/configure.sh`, which, in principle, does not yet have access to `GNU make` (if I understand the bootstrap sequence correctly). It is placed in `config/` rather than `shell/`, because the user will expect to find configuration files in `config/`, not in `shell/`. A possible alternative to avoid having a shell script as a configuration file would be to let `reproduce/software/config/acknowledge_software.sh` appear to be a `make` file, but analyse it in `configure.sh` using `sed` to remove whitespace around `=`, and add other hacks to switch from `make` syntax to `shell` syntax. However, this risks misleading the user, who will not know whether s/he should follow `make` conventions or `shell` conventions.
2020-06-17	Security risk of LaTeX's -shell-escape option explained in comment	Boud Roukema	-0/+9
	The 'pdflatex' program is used to build the default Maneage-branch paper. But since the default paper uses PGFPlots to build the figures within LaTeX as an external PDF, PGFPlots requires 'pdflatex' to be called with the '-shell-escape' option. Generally, this option can be considered a security risk (in particular when 'pdflatex' is being run on an external LaTeX file: a malicious LaTeX writer may embed commands in the LaTeX source that will be executed on the host if this option is present). This is not too serious an issue in Maneage, because when someone runs Maneage, they intentionally let it run many commands on their system. Hence if someone wants to exploit a host system, they can add the necessary commands long before 'pdflatex' is run. After all, all commands in Maneage are run with the calling user's permissions, hence they have access to many parts of the user's account. If someone is worried about security on a non-trusted Maneage project they should act the same as they do with any software: define a new user for it, and call it with that user (as a weak-level security), or run it in a virtual machine or container. However, since this option has been explicitly mentioned as a security risk before, it helps if we have a comment explaining its usage in 'paper.mk'. With this commit, the concerned user will read a brief explanation and can read the brief discussion at [1] and possibly re-open the discussion or propose ways of mitigating the security risk(s). [1] https://savannah.nongnu.org/task/?15694
2020-06-17	Software tarballs are downloaded even if not built	Mohammad Akhlaghi	-56/+36
	Some low-level software isn't necessary on some operating systems, for example GCC can't be built on macOS, hence we don't build it and the GCC-only dependencies. Also, on GNU/Linux systems users could configure with '--host-cc' to avoid all the time it takes to build GCC when doing a fast test. Until now, in such cases not only was the software not installed, but the tarballs of the software were also not downloaded. This made the output of '--dist-software' incomplete (as in bug #58561). With this commit, we now import all the necessary tarballs. When the software isn't necessary for the particular system, it won't be built or cited, but its tarball will be present anyway, thus allowing the output of '--dist-software' to be complete.
2020-06-17	New target --dist-software to package all necessary software tarballs	Mohammad Akhlaghi	-6/+5
	When publishing a project, it is necessary to also publish the source code of all necessary software of the project. We had recently added a new './project make' target called 'dist-software' for this job, but had forgotten to add it in the output of './project --help'! There was also a small bug inside of it that didn't allow the successful copying of the created tarball to the top project directory. With this commit, an explanation for this target has been added in the output of './project --help' and that bug has been fixed.
2020-06-17	Corrected symbolic link to Gnuastro's configuration files	Mohammad Akhlaghi	-1/+1
	Until now, when making the link to Gnuastro's configuration files, the 'configure.sh' script would incorrectly link to the old configuration directory under the 'reproduce/software' directory. With this commit, it is moved to the proper directory under 'reproduce/analysis'.
2020-06-16	Imported recent improvements in Maneage, minor conflicts fixed	Mohammad Akhlaghi	-30/+56
	Some minor conflicts came up in 'download.mk' and 'verify.mk' (as expected in the "IMPORTANT" commits) and were easily fixed by choosing this branch's values.
2020-06-16	Using type to see if pdftotext exists or not	Mohammad Akhlaghi	-1/+1
	Until now we were using 'which' for this job, but throughout Maneage we have used 'type'. For consistency, this final command of the project now also uses 'type'.
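A minimal sketch of such a check follows. The helper name 'have_program' is hypothetical (it is not taken from Maneage); it only illustrates how the shell's 'type' command can be used to test for a program.

```shell
# 'have_program' is a hypothetical helper name (not taken from Maneage).
# It succeeds when the running shell can find the given program name.
have_program() {
    type "$1" > /dev/null 2>&1
}

# Report whether 'pdftotext' can be found before relying on it.
if have_program pdftotext; then
    echo "pdftotext found"
else
    echo "pdftotext not found"
fi
```

Unlike 'which', 'type' is built into the shell itself, so the check does not depend on an extra program being installed.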
2020-06-16	XLSX I/O properly accounts for local build	Mohammad Akhlaghi	-2/+2
	Until now, when adding the necessary library flags to the build of XLSX I/O, we were effectively over-writing the 'LDFLAGS' variables. So the compiler was effectively not being told where to look for the necessary libraries. With this commit, to fix the problem, we now append the new linking flags to LDFLAGS in XLSX I/O's build, not over-write it.
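The difference between over-writing and appending can be sketched as follows. The directory and library names are hypothetical and are not the flags used in the actual XLSX I/O build.

```shell
# All values below are hypothetical and only for illustration.
LDFLAGS="-L/project/installed/lib"
extra_flags="-lexample"

# Over-writing: the library search path stored in LDFLAGS is lost.
overwritten="$extra_flags"

# Appending: the search path is kept and the new flags are added after it.
appended="$LDFLAGS $extra_flags"

echo "over-written: $overwritten"
echo "appended:     $appended"
```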
2020-06-16	Acknowledged contributions of Marios Karouzos	Mohammad Akhlaghi	-31/+36
	Marios had read the first draft of the paper (Commit f990bba) and provided valuable feedback (shown below) that ultimately helped in the current version. But because of all the work that was necessary in those days, I forgot to actually thank him in the acknowledgment, while I had implemented most of his thoughts. Following Marios' thoughts on the Git branching figure, with this commit, I am also adding a few sentences at the end of the caption with a very rough summary of Git. I also changed the branch commit-colors to shades of brown (incrementally becoming lighter as higher-level branches are shown) to avoid the confusion with the blue and green signs within the schematic papers shown in the figure. Marios' comments (April 28th, 2020, on Commit f990bba) ------------------------------------------------------ I think the structure of the paper is more or less fine. There are two places that I thought could be improved: 1) Section 3 (Principles) was somewhat confusing to me in the way that it was structured. I think the main source of confusion is the mixing of what Maneage is about and what other programs have done. I would suggest to separate the two. I would have short intro for the section, similar to what you have now. However, I would suggest to highlight the underlying goals motivating the principles that follow: reproducibility, open science, something else? Then I would go into the details of the seven principles. Some of the principles are less clear to me than others. For example, why is simplicity a guiding principle? Then some other principles appear to be related, for example modularity, minimal complexity and scalability to my eyes are not necessarily separate. Finally, I would separate the comparison with other software and either dedicate a section to that somewhere toward the end of the paper (perhaps a subsection for section 5) or at least condense it and put it as a closing paragraph for Section 3. 
As it is now I think it draws focus from Maneage and also includes some repetitions. 2) Section 4 (Maneage) was at times confusing because it is written, I think in part as a demonstration of Maneage (i.e., including examples that showed how Maneage was used to write this or other papers) and a manual/description of the software. I wonder whether these two aspects can be more cleanly separated. Perhaps it would be possible to first have a section 4 where each of the modules/units of Maneage are listed and explained and then have the following section discuss a working example of Maneage using this or another paper. 3) I found Figure 7 [the git branching figure] and its explanation not very intuitive. This probably has to do with my zero knowledge of github and how versioning there works, but perhaps the description can be a bit more "user friendly" even for those who are not familiar with the tool. 4) I find Section 6 to be rather inconsequential. It does not add anything and it more or less is just a summary of what was discussed. I would personally remove it and include a very short summary of the ideals/principles/goals of Maneage at the beginning of Section 5, before the discussion.
2020-06-15	OpenSSL now built after Perl	Mohammad Akhlaghi	-1/+1
	After trying a clean build of Maneage in a Docker image (with a minimal debian:stable-20200607-slim OS), I noticed that the building of OpenSSL is failing because it doesn't find the proper Perl functionality. To fix it, with this commit, Perl is set as a prerequisite of OpenSSL and this fixed the problem.
2020-06-15	Configure script now accounts for non-interactive shells	Mohammad Akhlaghi	-6/+33
	The project configuration requires a build-directory at configuration time; two other directories can optionally be given to avoid downloading the project's necessary data and software. It is possible to give these three directories as command-line options, or by interactively giving them after running the configure script. Until now, when these directories weren't given as command-line options, and the running shell was non-interactive, the configure script would crash on the line trying to interactively read the user's given directories (the 'read' command). With this commit, all the 'read' commands for these three directories are now put within an 'if' statement. Therefore, when 'read' fails (the shell is non-interactive), instead of a quiet crash, a descriptive message is printed, telling the user the cause of the problem, and suggesting a fix. This bug was found by Michael R. Crusoe.
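The pattern of guarding 'read' with an 'if' can be sketched as below. The function name 'ask_dir' and the wording of the message are hypothetical; the actual 'configure.sh' may be structured differently.

```shell
# 'ask_dir' is a hypothetical helper (not the code in 'configure.sh').
# It reads one line from standard input and prints it. When nothing can
# be read (for example in a non-interactive shell), it prints a
# descriptive message instead of failing silently.
ask_dir() {
    if read -r answer; then
        printf '%s\n' "$answer"
    else
        echo "Could not read a directory from standard input." >&2
        echo "Is the shell non-interactive? If so, please give the" >&2
        echo "directory as a command-line option instead." >&2
        return 1
    fi
}
```

A caller can then test the return status of 'ask_dir' and stop with a clear error, rather than continuing with an empty directory name.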
2020-06-14	Better description for input data directory, pointing to INPUTS.conf	Mohammad Akhlaghi	-19/+13
	Until now, the description of the input-data directory at configure time included a description of the input data (created by reading the values of 'INPUTS.conf'). Maintaining this is easy for a single dataset, but it becomes hard for a general project which may need many input datasets. To avoid extra complexity (for maintaining this list), the description now points a user of the project to the 'INPUTS.conf' file and asks them to look inside of it to see the necessary data. This in fact helps users become familiar with the internal structure of Maneage and will allow the authors to not worry about updating the low-level 'configure.sh' script.
2020-06-14	Better explanation in the start of project configuration	Mohammad Akhlaghi	-3/+7
	When './project configure' is run, after the basic checks of the compiler, a small statement is printed telling the user that some configuration questions will now be asked to start building Maneage on the system. Until now this description was confusing: it led the reader to think that the local configuration (which was recommended to read before continuing) is in another file. With this commit, the text has been edited to explicitly mention that the description of the steps following this notice should be read carefully, thus avoiding that confusion. This issue was mentioned by Michael R. Crusoe.
2020-06-14	Better comments for the top macros of paper.tex	Mohammad Akhlaghi	-46/+50
	The default 'paper.tex' starts by defining some macros and comments describing them. Until now, the text was not too clear and could be confusing for someone that is not at all familiar with Maneage. With this commit, the comments have been edited to be more clear for a first-time reader. For example they all start with FULL CAPS summaries. Two other small things were corrected in 'tex/src/preamble-necessary.tex': - Until now 'project.tex' was included in this preamble. However, because of its importance in Maneage, and prominent place in the demonstration plot of the paper introducing Maneage, it is now included directly in 'paper.tex'. This also allows users to safely ignore/delete this preamble file if their LaTeX style is different. - I noticed that some macros for some astronomical software names from the very first commits in Maneage were still present here! They are no longer used, so they have been removed.
2020-06-14	Corrected the relation of POSIX and IEEE	Mohammad Akhlaghi	-2/+3
	Until now, we were saying "POSIX is defined by the IEEE", but in issue #12, Michael Crusoe pointed out that this is not accurate. It is actually jointly developed and operated by the IEEE, The Open Group and ISO/IEC JTC 1/SC 22, which together form the Austin Group. So the sentence was modified to say that the IEEE (potential publisher of this paper) is part of the Austin Group that develops the POSIX standard. Thanks a lot for bringing this up Michael.
2020-06-13	Custom-built EPS icons in branching figure	Marjan Akbari	-433/+2271
	Until now, we were using three EPS icons (created from SVG) that were downloaded from https://www.flaticon.com. Therefore it was necessary to acknowledge the creators and put a link to the webpage. This consumed space in the caption and decreased the originality of the plot. Another problem was that the "collaboration" icon (with three people in it) had arrows, and some of those arrows pointed downwards, creating ambiguity in relation to the upward arrows under the commits. With this commit, three alternative icons are added that I made from scratch, using Inkscape. The collaboration icon now is two figures and two speech-bubbles, without any arrows.
2020-06-13	Two small edits in demo listing and paragraph after it	Mohammad Akhlaghi	-3/+3
	Recently, by default, Maneage will not take the title directly in the PDF; the title should be given in the 'metadata.conf' file and it is passed onto LaTeX as a variable. So the comment to "add project title" in the listing could be confusing. To avoid confusion, I edited it to "Set your name as author". The comments above the '\title' part are very complete and users will clearly be able to modify the title if they want. Also, we had an extra ')' in the line just under it which is now corrected.
2020-06-10	Corrected bug in using local copy of input dataset	Mohammad Akhlaghi	-13/+47
	As described in Maneage's commit 2bd2e2f18 (which I found while testing this project), the existing download recipe had problems when using a local copy of the input dataset. It was first fixed here, then implemented there. Also, to clarify things for a new user, some long comments were added at the top of 'INPUTS.conf' to describe each of the variables; those comments have also been put here (and are also in commit 2bd2e2f18 of Maneage).
2020-06-10	Updated text of default paper.tex, putting more recent examples	Mohammad Akhlaghi	-100/+165
	The text of the default paper hadn't been changed for a very long time! In this time, three papers using Maneage have been published (which can be very good as examples), and Maneage also now has a webpage! With this commit these examples and the webpage have been added and generally it was also polished a little to hopefully be more useful.
2020-06-10	IMPORTANT: bug fix in default data download script of download.mk	Mohammad Akhlaghi	-14/+54
	Summary of possible semantic conflicts 1. The recipe to download input datasets has been modified. You have to re-set the old 'origname' variable to 'localname' (to avoid confusion) and the default dataset URL should now be complete (including the actual filename). See the newly added descriptions in 'INPUTS.conf' for more on this. Until now, when the dataset was already present on the host system, a link couldn't be made to it, causing the project to crash in the checksum phase. This has been fixed with properly naming the main variable as 'localname' to avoid the confusion that caused it. Some other problems have been fixed in this recipe in the meantime: - When the checksum is different, the expected and calculated checksums are printed. - In the default paper, we now print the full URL of the dataset, not just the server, so the checksum of the 'download.tex' step has been updated.
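Printing both checksums on a mismatch can be sketched as follows. The helper 'verify_checksum' is hypothetical, and 'cksum' is used only as a widely available stand-in; the checksum program used by the actual recipe in 'download.mk' may differ.

```shell
# 'verify_checksum' is a hypothetical helper; 'cksum' is only a stand-in
# for whichever checksum program the project actually uses.
verify_checksum() {
    file=$1
    expected=$2

    # Keep only the first field of the 'cksum' output (the checksum).
    calculated=$(cksum < "$file" | awk '{print $1}')

    if [ "$calculated" = "$expected" ]; then
        return 0
    fi

    # On a mismatch, print both values to make the problem easy to debug.
    echo "$file: checksum mismatch" >&2
    echo "  expected:   $expected" >&2
    echo "  calculated: $calculated" >&2
    return 1
}
```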
2020-06-09	Two minor typos corrected	Mohammad Akhlaghi	-2/+2
	Two words were corrected in the text that made the sentences grammatically wrong (they were actually typos! historically they were correct, but we later changed the latter part of the sentence without fixing the first part).
2020-06-09	Minor edit printing arXiv URL in plain text metadata	Mohammad Akhlaghi	-1/+1
	Until now, in the 'print-copyright' function of 'initialize.mk' (that prints a fixed set of common meta necessary in plain-text files), we were simply printing this line: # Pre-print server: arXiv:1234.56789 But given that all the other elements are click-able URLs, it now prints: # Pre-print server: https://arxiv.org/abs/1234.56789
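The conversion from an arXiv identifier to a clickable URL can be sketched like this. The helper 'arxiv_url' is hypothetical and is not the Make function in 'initialize.mk'.

```shell
# 'arxiv_url' is a hypothetical helper, not the function in Maneage.
arxiv_url() {
    # Strip an optional leading "arXiv:" and prepend the abstract URL.
    printf 'https://arxiv.org/abs/%s\n' "${1#arXiv:}"
}

echo "# Pre-print server: $(arxiv_url arXiv:1234.56789)"
```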
2020-06-09	Two minor corrections to avoid warnings in make and make clean	Mohammad Akhlaghi	-15/+12
	There were two small warnings that are removed with this commit: - In the end, when we print the number of words in the PDF, we hadn't accounted for the fact that 'paper.pdf' doesn't always exist (for example when './project make clean' is run). So a check was added to only print the number of words when a PDF exists. - I noticed that the '$(texdir)/to-publish' directory was being built both in 'initialize.mk' and in 'demo-plot.mk'. So the one in 'demo-plot.mk' has been removed.
2020-06-09	Imported Maneage, minor conflicts fixed, a bug found and fixed	Mohammad Akhlaghi	-78/+507
	Some minor conflicts came up in 'initialize.mk' and 'verify.mk'. For the former, I chose the version on Maneage, for the latter, I kept the 'master' version on the checksums of this project, but kept the Maneage version for the rest of the improvements there (like printing the verified files as LaTeX comments in 'verify.tex'). While testing the conflicts, I noticed a bug (in the LaTeX macro for the number of years in the Menke+20 paper) in the previous build, thanks to the verification step :-)! Fortunately it wasn't actually printed in the PDF, so a normal reader won't notice it. The bug was caused by the recently added meta-data/commented lines in the 'tools-per-year.txt' file: when calculating the number of years studied in that paper, we were simply counting all the lines and we had forgotten to correct this after adding comments. As a result, the un-used LaTeX macro file was saying that they have studied 47 years instead of the real 31 years! This element was actually used in the very first (+40 page!) draft of the paper that was summarized to fit into the journal limits.
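The kind of counting bug described above can be avoided by skipping commented lines when counting rows. The following is only a sketch: 'count_data_lines' is a hypothetical helper and does not reproduce the project's actual rule.

```shell
# 'count_data_lines' is a hypothetical helper. It counts the lines of a
# plain-text table that are neither comments (starting with '#') nor
# empty, so added metadata lines do not change the result.
count_data_lines() {
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

Simply counting all lines (for example with 'wc -l') would include every comment line in the total, which is how an inflated number of years can end up in a macro.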
2020-06-07	Added SoftwareHeritage link, minor typo corrections and clarifications	Mohammad Akhlaghi	-24/+27
	The git history of the project is now archived on SoftwareHeritage and a link to it was added in the "Reproducible supplement" tag just under the abstract. Some corrections were also made in the text. In particular, the part explaining the separation of software and data reproducibility was slightly clarified.
2020-06-06	IMPORTANT: Added publication checklist, improved relevant infrastructure	Mohammad Akhlaghi	-172/+727
	Possible semantic conflicts (that may not show up as Git conflicts but may cause a crash in your project after the merge): 1) The project title (and other basic metadata) should be set in 'reproduce/analysis/conf/metadata.conf'. Please include this file in your merge (if it is ignored because of '.gitattributes'!). 2) Consider importing the changes in 'initialize.mk' and 'verify.mk' (if you have added all analysis Makefiles to the '.gitattributes' file, thus not merging any change in them with your branch). For example with this command: git diff master...maneage -- reproduce/analysis/make/initialize.mk 3) The old 'verify-txt-no-comments-leading-space' function has been replaced by 'verify-txt-no-comments-no-space'. The new function will also remove all white-space characters between the columns (not just white space characters at the start of the line). Thus the resulting check won't involve spacing between columns. A common set of steps is always necessary to prepare a project for publication. Until now, we would simply look at previous submissions and try to follow them, but that was prone to errors and could cause confusion. The internal infrastructure also didn't have some useful features to make good publication possible. Now that the submission of a paper fully devoted to the founding criteria of Maneage is complete (arXiv:2006.03018), it was time to formalize the necessary steps for easier submission of a project using Maneage and implement some low-level features that can make things easier. With this commit a first draft of the publication checklist has been added to 'README-hacking.md', it was tested in the submission of arXiv:2006.03018 and zenodo.3872248. To help guide users on implementing the good practices for output datasets, the outputs of the default project shown in the paper now use the new features. After reading the checklist, please inspect these. 
Some other relevant changes in this commit: - The publication involves a copy of the necessary software tarballs. Hence a new target ('dist-software') was also added to package all the project's software tarballs in one tarball for easy distribution. - A new 'dist-lzip' target has been defined for those who want to distribute an Lzip-compressed tarball. - The '\includetikz' LaTeX macro now has a second argument to allow configuring the '\includegraphics' call when the plot should not be built, but just imported.
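The idea behind a verification that ignores comments and all white space (as described for 'verify-txt-no-comments-no-space' above) can be sketched as follows. This is not the Maneage function itself: 'normalize_txt' is a hypothetical helper, and 'cksum' in the usage comment is only a stand-in for the project's checksum program.

```shell
# 'normalize_txt' is a hypothetical helper, not the Maneage function.
# It removes all white-space characters, then drops comment lines and
# lines that became empty, so only the actual values remain.
normalize_txt() {
    sed -e 's/[[:space:]]//g' -e '/^#/d' -e '/^$/d' "$1"
}

# Possible usage: a checksum of the normalized text does not depend on
# the spacing between columns.
#   normalize_txt table.txt | cksum
```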
2020-06-06	Summarized abstract to be less than 150 words	Mohammad Akhlaghi	-16/+15
	Upon submission to CiSE we were informed that the abstract has to be less than 150 words to be processed. So with this commit, I am shrinking the abstract slightly, trying to remove some points that are less important and trying to shrink some of the sentences. Also, to avoid confusion and be more clear, the term "temporal provenance" has been replaced by "Recorded history".
2020-06-04	Scale element in includegraphics for roughly similar-sized figures	Mohammad Akhlaghi	-9/+11
	Until now, when the figures were built directly from EPS ('\newcommand{\makepdf}{}' was commented), they would take the full line-width becoming a little too large! I noticed this after letting arXiv build the PDF. With this commit, the 'includetikz' tool takes a second argument to be a parameter given to 'includegraphics' (which is scale in this case).
2020-06-04	Final full reading, and minor edits to submit to Zenodo and arXiv	Mohammad Akhlaghi	-58/+57
	Everything else regarding the submission to arXiv and Zenodo has been completed, so I did a final read, making some minor edits to hopefully make the text easier to read.
2020-06-04	README.md, separated scenarios of building from tarball	Mohammad Akhlaghi	-23/+60
	The previous explanation was not too clear and simply following it was confusing. The issue was that with the tarball you have three scenarios: 1) only build the PDF using existing figures. 2) only build the PDF, but build the figures yourself, 3) build the full Maneaged project. Hopefully this distinction is now more clear from the README.md file.
2020-06-04	README.md: improved points on building from tarball	Mohammad Akhlaghi	-14/+11
	Some extra explanation can help the user understand the difference between a Git-based project and a distributed tarball.
2020-06-04	tex/build and tex/tikz treated properly in tarball	Mohammad Akhlaghi	-1/+14
	When the project is being re-built from the tarball (not the Git repository), the 'tex/build' and 'tex/tikz' addresses are actual directories, not symbolic links. In this case, when someone runs './project configure', it will complain about not being able to delete them (it assumes they are symbolic links!). So with this commit, we first check if they are deletable without '-r'. If not, then they are full directories and we rename them to a backup directory to allow the rest of the project to continue building a link there.
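The check can be sketched as below, relying on the fact that 'rm' without '-r' refuses to delete a directory. The function name and the '-backup' suffix are hypothetical and not taken from 'configure.sh'.

```shell
# 'replace_with_link' is a hypothetical helper: the first argument is the
# link's target, the second is the name that should become a link.
replace_with_link() {
    target=$1
    link=$2

    # 'rm -f' without '-r' succeeds on a symbolic link, a regular file or
    # a missing name, but fails on a directory. In that case, keep the
    # directory by renaming it (the '-backup' suffix is an assumption).
    if ! rm -f "$link" 2> /dev/null; then
        mv "$link" "$link-backup"
    fi

    ln -s "$target" "$link"
}
```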
2020-06-04	Minor improvements in the make dist command for this paper	Mohammad Akhlaghi	-7/+12
	This paper doesn't use pdflatex or biblatex, so it was necessary to make some small corrections in the make-dist rule of initialize.mk. Also, while testing the upload on arXiv, I noticed that it complains about an empty 'verify.tex' file, so that is also corrected.
2020-06-04	Verification activated, README added, Proper metadata in plot data	Mohammad Akhlaghi	-44/+117
	All the steps following the to-be-added (in 'README-hacking.md') publication checklist prior to the final check from new clone have been added: - 'README.md' file has been set. - "Reproducible supplement" was added just above the keywords, pointing to Zenodo. - A link to the to-be-uploaded data underlying the plot was added in the caption of the tools-per-year plot. - A new meta-data configuration file was added to store basic project metadata to be used throughout the project. This will later be taken into Maneage. For example the project title is now stored here and written into the paper's LaTeX source and output datasets automatically. - Verification was activated and plot's data and LaTeX macro files are now automatically verified. - Complete metadata was added for the data underlying the plot. - A generic function was added in 'initialize.mk' that will automatically write project info and copyright in all plain-text outputs.
2020-06-04	README-hacking.md: minor edits in description of merging with Maneage	Mohammad Akhlaghi	-7/+15
	The recently added description for this step in the last commit needed some edits to be more clear and encourage re-building the project from scratch anytime authors merge with Maneage.
2020-06-03	Imported recent updates in Maneage, minor conflict fixed	Mohammad Akhlaghi	-915/+1386
	The minor conflict was with 'reproduce/software/make/high-level.mk', and in particular because we implemented the fix to Maneage's Task #15664 in this project first. After it was moved to the main Maneage branch some minor stylistic corrections were done to it, thus causing the conflict. To resolve the conflict, I simply imported the full Maneage version of the file with this command: git checkout maneage -- reproduce/software/make/high-level.mk The other conflicts were due to the deleted files (that were resolved as described in 'README-hacking.md') and the LaTeX files that I had told '.gitattributes' to ignore from the Maneage branch.
2020-06-03	README-hacking.md: Improved section on ignoring some files in Maneage	Mohammad Akhlaghi	-24/+55
	When some files should not be merged, until now we were suggesting to also add deleted files to the '.gitattributes' file. However, this feature of Git doesn't work for deleted files and they would still show up in the 'master' branch after a merge. So with this commit, we have added a simple AWK command to run after a merge that will automatically detect and delete such files (using the output of 'git status --porcelain'). Also, two minor typos were corrected in the newly added 'servers-backup.conf' file: the copyright year was wrong and there was no new-line at the end of the file (a good convention!).
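The idea of detecting such files from 'git status --porcelain' can be sketched as follows. This is not the command from 'README-hacking.md': the helper name is hypothetical, and it assumes that the files in question appear with the unmerged status 'DU' (deleted by us) in the short-format output.

```shell
# 'deleted_by_us' is a hypothetical helper. It reads the output of
# 'git status --porcelain' on standard input and prints the names of
# files with the two-letter status 'DU' (unmerged, deleted by us).
deleted_by_us() {
    awk '/^DU / { print substr($0, 4) }'
}

# Possible usage inside a repository after a merge (not run here):
#   git status --porcelain | deleted_by_us \
#       | while read -r f; do git rm "$f"; done
```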
2020-06-03	Updated .gitattributes to include all files to not merge	Mohammad Akhlaghi	-4/+3
	Following a test merge, I noticed that the '.gitattributes' file is not doing anything about the deleted files and also that all the files in 'tex/src/*.txt' should be added (they are too project-specific). So now it only includes the files that aren't deleted. For the files that are deleted, in the Maneage 'README-hacking.md' file, I added an AWK command to easily remove them.
2020-06-03	Adding point on small-ness of final product, some summarization	Mohammad Akhlaghi	-83/+65
	I noticed that we hadn't included the publication of the workflow and the advantage that Maneage provides in this regard. So it was added at the end of the proof-of-concept section. However, it was necessary to summarize some other parts to not increase the wordcount.
2020-06-02	Core software build before using Make to build other software	Mohammad Akhlaghi	-364/+635
	Until now, Maneage would only build Flock before building everything else using Make (calling 'basic.mk') in parallel. Flock was necessary to avoid parallel downloads during the building of software (which could cause network problems). But after recently trying Maneage on FreeBSD (which is not yet complete, see bug #58465), we noticed that the BSD implementation of Make couldn't parse 'basic.mk' (in particular, complaining with the 'ifeq' parts) and its shell also had some peculiarities. It was thus decided to also install our own minimalist shell, Make and compressor program before calling 'basic.mk'. In this way, 'basic.mk' can now assume the same GNU Make features that high-level.mk and python.mk assume. The pre-make building of software is now organized in 'reproduce/software/shell/pre-make-build.sh'. Another nice feature of this commit is for macOS users: until now the default macOS Make had problems for parallel building of software, so 'basic.mk' was built in one thread. But now that we can build the core tools with GNU Make on macOS too, it uses all threads. Furthermore, since we now run 'basic.mk' with GNU Make, we can use '.ONESHELL' and don't have to finish every line of a long rule with a backslash to keep variables and such. Generally, the pre-make software are now organized like this: first we build Lzip before anything else: it is downloaded as a simple '.tar' file that is not compressed (only ~400kb). Once Lzip is built, the pre-make phase continues with building GNU Make, Dash (a minimalist shell) and Flock. All of their tarballs are in '.tar.lz'. Maneage then enters 'basic.mk' and the first program it builds is GNU Gzip (itself packaged as '.tar.lz'). Once Gzip is built, we build all the other compression software (all downloaded as '.tar.gz'). Afterwards, any compression standard for other software is fine because we have it. 
In the process, a bug related to using backup servers was found in 'reproduce/analysis/bash/download-multi-try' when it is called outside of 'basic.mk'; it was fixed and Bash-specific features were removed. As a result of that bug-fix, because we now have multiple servers for software tarballs, the backup servers now have their own configuration file in 'reproduce/software/config/servers-backup.conf'. This makes it much easier to maintain the backup server list across the multiple places that we need it. Some other minor fixes: - In building Bzip2, we need to specify 'CC' so it doesn't use 'gcc'. - In building Zip, the 'generic_gcc' Make option caused a crash on FreeBSD (which doesn't have GCC). - We are now using 'uname -s' to specify if we are on a Linux kernel or not, if not, we are still using the old 'on_mac_os' variable. - While I was trying to build on FreeBSD, I noticed some further corrections that could help. For example the 'makelink' Make-function now takes a third argument which can be a different name compared to the actual program (used for example to make a link to '/usr/bin/cc' from 'gcc'). - Until now we didn't know if the host's Make implementation supports placing a '@' at the start of the recipe (to avoid printing the actual commands to standard output). Especially in the tarball download phase, there are many lines that are printed for each download which was really annoying. We already used '@' in 'high-level.mk' and 'python.mk' before, but now that we also know that 'basic.mk' is called with our custom GNU Make, we can use it at the start for a cleaner stdout. - Until now, WCSLIB assumed a Fortran compiler, but when the user is on a system where we can't install GCC (or has activated the '--host-cc' option), it may not be present and the project shouldn't break because of this. So with this commit, when a Fortran compiler isn't present, WCSLIB will be built with the '--disable-fortran' configuration option. 
This commit (task #15667) was completed with help/checks by Raul Infante-Sainz and Boud Roukema.
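Two of the minor fixes in the message above (the 'uname -s' check and the WCSLIB Fortran fallback) can be illustrated with the following POSIX shell sketch. The function names and the 'linux'/'macos'/'other' labels are hypothetical and are not the variables used in Maneage; only 'uname -s' and '--disable-fortran' come from the commit message.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel name as printed by 'uname -s'.
kernel_class() {
    case "$1" in
        Linux)  echo "linux" ;;
        Darwin) echo "macos" ;;   # 'uname -s' prints 'Darwin' on macOS.
        *)      echo "other" ;;   # For example 'FreeBSD'.
    esac
}

# Hypothetical helper: print the extra WCSLIB configure option needed
# when the given Fortran compiler cannot be found in PATH.
wcslib_fortran_option() {
    if command -v "$1" >/dev/null 2>&1; then
        echo ""                    # Compiler found: no extra option.
    else
        echo "--disable-fortran"   # Option named in the commit message.
    fi
}

printf 'kernel class: %s\n' "$(kernel_class "$(uname -s)")"
# 'gfortran' is an assumed compiler name for this demonstration.
printf 'WCSLIB extra option: %s\n' "$(wcslib_fortran_option gfortran)"
```

Passing the kernel and compiler names as arguments keeps both checks easy to test on a single machine.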
2020-06-01	Edits by David	David Valls-Gabaud	-100/+110
	These are some corrections that David sent to me by email; I am committing them here.
2020-06-01	Implemented Antonio's suggestion and thanked him	Mohammad Akhlaghi	-1/+2
	Antonio Diaz Diaz (author of the Lzip program/library) has had a very supportive role in what became Maneage over the last 4 years. For example, I really started to appreciate the value of simplicity and archivability while reading Lzip's documentation. Fortunately he also read a recent version of the paper and was again very supportive. Some of the minor points he raised had already been fixed, but using 'supplier' instead of 'server' (in the Free Software criterion) was new, so I implemented it with this commit. With this, I am also thanking him for all his wonderful support and encouragement in the last 4 years.
2020-06-01	Minor edits to clarify some of the previous corrections	Mohammad Akhlaghi	-3/+3
	Boud's point about a "random reader" not being a good example case was correct. But "user" also gives it a software perspective that is of course not wrong; it can just be confusing. So I thought of changing it to "interested reader". In the part about the C-library dependency of high-level software, from Boud's correction, I found out that it is very hard to convey what I wanted to say (that separating errors due to C-library implementation and measurement errors will be easy, because they should be on very different scales). But I then corrected it to give it a slightly better tone while mentioning the same thing: that with Maneage we can now accurately measure the effect of the C library.
2020-05-31	Mostly minor edits of nearly final version	Boud Roukema	-17/+18
	Changes with this commit are mostly minor and obvious. Some worth commenting on include:
* `technologies develop very fast` - As a general statement, this is too jargony, since technology is much wider than just `software`; `some technologies` makes it clear that we're referring to the specific case of the previous sentence
* `in a functional-like paradigm, enabling exact provenance` - While `make` is not an imperative programming language, I don't see how `make` is `like` a functional programming language. Classifying it as a declarative and a dataflow programming language and as a metaprogramming language would seem to go in the right direction [1-3]. I also couldn't see how the language type relates to tracking exact provenance. But since we don't want to lengthen the text, my proposal is to put `and efficient in managing exact provenance` without trying to explain this in terms of a taxonomy of programming languages.
  [1] https://en.wikipedia.org/wiki/Functional_programming
  [2] https://en.wikipedia.org/wiki/Comparison_of_multi-paradigm_programming_languages
  [3] https://en.wikipedia.org/wiki/Dataflow_programming
* `A random reader` - In the scientific programming context, `random` has quite specific meanings which we are not using here; a `reader` has not necessarily tried to reproduce the project. So I've proposed `A user` here - with the idea that a `user` is more likely to be someone who has done `./project configure && ./project make`.
* `studying this is another research project` - the present tense `is` doesn't sound so good; I've put what seems to be about the shortest natural equivalent.
Pdf word count: 5856
2020-05-30	Corrected a few words for more clarity	Mohammad Akhlaghi	-2/+2
	An "internally" was added to the part about core GNU tools accounting for the differences between POSIX-compatible systems. One extra word was also removed in the next sentence.
2020-05-30	Corrected a few words to make POSIX-fuzzyness paragraph more clear	Mohammad Akhlaghi	-2/+2
	Hopefully, it is more to the point with these few word-corrections.
2020-05-30	Discussion on issues with POSIX and minor edits to shorten paper	Mohammad Akhlaghi	-39/+47
	Konrad raised some very interesting points, in particular about the limitations of POSIX as a fuzzy standard that does not guarantee reproducibility. A relatively long paragraph was thus added in the discussion to address this important point. In order to fit it in, the paragraph on "unwanted competition" was removed since the POSIX issue was much more relevant for a curious reader. Throughout the text, some other parts were edited to decrease the length of the paper while making it easier to read.
2020-05-30	Minor edits removing redundant sentences	Mohammad Akhlaghi	-4/+3
	Some of the redundant sentences have been removed and some minor edits made.
2020-05-29	Minor tidying of about half a dozen words	Boud Roukema	-11/+11
	The changes in this commit are best shown with `git diff --word-diff` or `git show --word-diff`. There are about half a dozen changes of 1-2 words or a comma; the reasons should be obvious. The sentence with "can not just" seems to be correct formally, but "can not only" seems to me better to warn the reader that this is a phrase of the form "can not only do X but can also do Y"; "can not just" sounds a bit like "You cannot just enter the room without knocking" - it doesn't require a second part.