paper-concept.git - Paper (Towards Long-term and Archivable Reproducibility)

Age	Commit message	Author	Lines
2020-08-20	Imported recent updates in Maneage, minor conflicts fixed	Mohammad Akhlaghi	-0/+2
	Some very minor conflicts came up and were easily corrected. They were mostly in parts that are also shared with the demonstration in the core Maneage branch.
2020-07-21	Printing location when downloaded input data checksum is different	Boud Roukema	-0/+1
	There are many different directory trees involved in the Maneage system: the top directory, the 'reproduce/' directory and its sub-directories, '.build/' (that points to a user-defined build area), and a possibly user-defined input directory. Until now, in the case of a download checksum failure, it was not immediately obvious [1] to the user where the file with a failed checksum is. To clarify to the user where the suspicious file is now located, this commit adds a line to 'reproduce/analysis/make/download.mk' to print out this full path location: '$$unchecked' along with the expected and calculated checksums. [1] Euphemism for me spending lots of time debugging and being confused.
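For illustration only, the check described above can be sketched in Python. This is not the Make recipe in 'reproduce/analysis/make/download.mk'. The function names and the choice of SHA-256 are assumptions made for the example; the only parts taken from the commit are that the full path of the unchecked file is reported along with the expected and calculated checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file (hash choice is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read in chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(unchecked, expected):
    """Hypothetical helper: fail with the file's full path on a mismatch."""
    calculated = sha256_of(unchecked)
    if calculated != expected:
        raise ValueError(
            f"checksum mismatch for {Path(unchecked).resolve()}: "
            f"expected {expected}, calculated {calculated}"
        )
    return calculated
```

Reporting the resolved absolute path is what saves the user from guessing which of the several directory trees holds the suspicious file.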
2020-07-04	Better names and comments in INPUTS.conf	Mohammad Akhlaghi	-2/+3
	Until now, the dataset's configuration names had a 'WFPC2' prefix. But this is very alien to anyone who is not familiar with the history of the Hubble Space Telescope (the camera is no longer used! It's just used here since it's one of the standard FITS files from the FITS standard webpage). With this commit the variable names have been modified to be more readable and clear (having a 'DEMO-' prefix). Also the comments of 'INPUTS.conf' (describing the purpose of each variable) were edited and made clearer.
2020-06-10	Corrected bug in using local copy of input dataset	Mohammad Akhlaghi	-11/+15
	As described in Maneage's commit 2bd2e2f18 (which I found while testing this project), the existing download recipe had problems when using a local copy of the input dataset. It was first fixed here, then implemented there. Also, to clarify things for a new user, some long comments were added at the top of 'INPUTS.conf' to describe each of the variables; those comments have also been put here (and are also in commit 2bd2e2f18 of Maneage).
2020-06-10	IMPORTANT: bug fix in default data download script of download.mk	Mohammad Akhlaghi	-10/+14
	Summary of possible semantic conflicts: 1. The recipe to download input datasets has been modified. You have to re-set the old 'origname' variable to 'localname' (to avoid confusion) and the default dataset URL should now be complete (including the actual filename). See the newly added descriptions in 'INPUTS.conf' for more on this. Until now, when the dataset was already present on the host system, a link couldn't be made to it, causing the project to crash in the checksum phase. This has been fixed by properly naming the main variable as 'localname' to avoid the confusion that caused it. Some other problems have been fixed in this recipe in the meantime: - When the checksum is different, the expected and calculated checksums are printed. - In the default paper, we now print the full URL of the dataset, not just the server, so the checksum of the 'download.tex' step has been updated.
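As a rough sketch of the behavior this commit describes (link to a local copy when it exists, otherwise download from the complete URL), here is a Python illustration. It is not the recipe in 'download.mk'. The function 'stage_input', its parameters and the injected 'download' callable are hypothetical names chosen for the example; only 'localname' and the idea of a complete URL come from the commit message.

```python
import os
from pathlib import Path


def stage_input(localname, url, indir, inputs_dir, download):
    """Link to a local copy of the dataset if present, else download it.

    'download' is a caller-supplied callable taking (url, target); it is
    injected so that the sketch does not depend on any network library.
    """
    target = Path(inputs_dir) / localname
    local = Path(indir) / localname if indir else None
    if local is not None and local.is_file():
        # Replace any stale link or file left by a previous run.
        if target.is_symlink() or target.exists():
            target.unlink()
        # Use an absolute source so the link works from any directory.
        os.symlink(os.path.abspath(local), target)
        return "linked"
    # The URL is complete (it already includes the file name).
    download(url, target)
    return "downloaded"
```

Either way, the checksum step can then run on the same target path, whether it is a link or a freshly downloaded file.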
2020-06-03	Imported recent updates in Maneage, minor conflict fixed	Mohammad Akhlaghi	-8/+11
	The minor conflict was with 'reproduce/software/make/high-level.mk', and in particular because we implemented the fix to Maneage's Task #15664 in this project first. After it was moved to the main Maneage branch some minor stylistic corrections were made to it, thus causing the conflict. To resolve the conflict, I simply imported the full Maneage version of the file with this command: 'git checkout maneage -- reproduce/software/make/high-level.mk'. The other conflicts were due to the deleted files (that were resolved as described in 'README-hacking.md') and the LaTeX files that I had told '.gitattributes' to ignore from the Maneage branch.
2020-05-22	Corrected copyright notices to fit GPL suggested format	Mohammad Akhlaghi	-8/+11
	In time, some of the copyright license descriptions had been mistakenly shortened to two paragraphs instead of the original three that are recommended in the GPL. With this commit, they are corrected to be exactly in the same three-paragraph format suggested by the GPL. The following files also didn't have a copyright notice, so one was added for them: 'reproduce/software/make/README.md', 'reproduce/software/bibtex/healpix.tex', 'reproduce/analysis/config/delete-me-num.conf', 'reproduce/analysis/config/verify-outputs.conf'.
2020-05-01	Imported recent changes in Maneage, minor conflicts fixed	Mohammad Akhlaghi	-8/+8
	A few small conflicts showed up here and there. They are fixed with this merge.
2020-04-23	Further edits to summarize the parts corrected by Boud	Mohammad Akhlaghi	-1/+4
	[Compared to the first submission to DSJ last week with 11436 words in raw PDF, we have decreased the paper by ~1000 words to 10493 :-)] As with the previous commits, the moment Boud changed the structure of sentences, I was able to find the redundancies and remove them! This is a fascinating feature of collaboration I had never felt before: it is so hard to find redundancies in my own raw text, but even a minor correction by someone else suddenly breaks my mental memories/barrier on the sentence, allowing me to be more critical of it! Anyway, besides such corrections, I fixed a few other things: 1) In the DSJ's recently published papers, there is no `~' between "Figure" and its number. 2) I noticed that in `tex/src/figure-src-inputconf.tex' I was actually using manually input strings for the filename, checksum and size! This was contrary to the whole philosophy of Maneage(!), I must have rushed and forgotten! So LaTeX variables are now defined and used.
2020-04-20	Maneage instead of Template in README-hacking.md and copyright notices	Mohammad Akhlaghi	-8/+8
	Until now, throughout Maneage we were using the old name of "Reproducible Paper Template". But we have finally decided to use Maneage, so to avoid confusion, the name has been corrected in `README-hacking.md' and also in the copyright notices. Note also that in `README-hacking.md', the main Maneage branch is now called `maneage', and the main Git remote has been changed to `https://gitlab.com/maneage/project' (this is a new GitLab Group that I have set up for all Maneage-related projects). In this repository there is only one `maneage' branch to avoid complications with the `master' branch of the projects using Maneage later.
2020-04-02	Imported recent work on Maneage, minor conflicts fixed	Mohammad Akhlaghi	-3/+4
	A few minor conflicts occurred and were fixed.
2020-03-02	Described the first analysis phase with a demo subMakefile	Mohammad Akhlaghi	-4/+4
	Until now, there was no explanation of an actual analysis phase; therefore, with this commit an example scenario with a readable Makefile is included. The data lineage graph was also simplified, both to be more readable and to correspond to this new explanation and subMakefile. Some random edits/typos were also corrected and some references added for discussion.
2020-02-16	Menke+2020 data is now imported and ready for later steps in plain text	Mohammad Akhlaghi	-6/+6
	The main problem with this dataset was the names of the journals (which sometimes have single quotes or apostrophes in them, which is really annoying for SED)! But ultimately, for the simple study we want to do here, the journal names are irrelevant, so in the end I just ignored the names. Later we can set an identifier for the journals if necessary. But now we have the basic information in a way that is usable in a plot to show in this paper.
2020-01-20	IMPORTANT!!! Configuration Makefiles now have a .conf suffix	Mohammad Akhlaghi	-4/+5
	Until now, the configuration Makefiles (in `reproduce/software/config/installation' and `reproduce/analysis/config') had a `.mk' suffix, similar to the workhorse Makefiles. Although they are indeed Makefiles, given their nature (to only keep configuration parameters), it is confusing (especially to early users) for them to also have a `.mk' suffix (similar to the analysis or software building Makefiles). To address this issue, with this commit, all the configuration Makefiles (in those directories) are now given a `.conf' suffix. This is also assumed for all the files that are loaded. The configuration (software building) and running of the template have been checked with this change from scratch, but please report any error that may not have been noticed. THIS IS AN IMPORTANT CHANGE AND WILL CAUSE CRASHES OR UNEXPECTED BEHAVIORS FOR PROJECTS THAT HAVE BRANCHED FROM THIS TEMPLATE. PLEASE CORRECT THE SUFFIX OF ALL YOUR PROJECT'S CONFIGURATION MAKEFILES (IN THE DIRECTORIES ABOVE), OTHERWISE THEY AREN'T AUTOMATICALLY LOADED ANYMORE.
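To show why a leftover `.mk' configuration file is silently skipped after this change, here is a minimal Python sketch of suffix-based loading. The template itself loads these files as Makefiles, so the function 'load_config' and its simple 'name = value' parsing are assumptions made only for this example.

```python
from pathlib import Path


def load_config(config_dir):
    """Hypothetical loader: read 'name = value' lines from *.conf files only."""
    params = {}
    # Only files ending in '.conf' match; an old '*.mk' file is never read.
    for conf in sorted(Path(config_dir).glob("*.conf")):
        for line in conf.read_text().splitlines():
            # Simplification: everything after '#' is treated as a comment.
            line = line.split("#", 1)[0].strip()
            if "=" not in line:
                continue
            name, value = line.split("=", 1)
            params[name.strip()] = value.strip()
    return params
```

In this sketch a parameter kept in a file that still has the old suffix simply never appears in the result, which is the kind of unexpected behavior the warning above is about.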
2020-01-01	Copyright statements updated to include 2020	Mohammad Akhlaghi	-1/+1
	Now that it's 2020, it's necessary to include this year in the copyright statements.
2019-11-29	Download links directly to actual file if it exists in INDIR	Mohammad Akhlaghi	-2/+8
	Until now, when an input dataset already exists in `INDIR', the template would just make a symbolic link to it in the build directory. However, in many cases, the files in INDIR will actually be links to other locations on the filesystem and some programs have problems following too many links. With this commit, the template is now using the `readlink' program (part of GNU Coreutils) to follow a possible link and point the link in the build directory directly to an actual non-link file.
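The idea of pointing the build-directory link straight at the final, non-link file can be sketched in Python as follows. The template uses the `readlink' program for this; the sketch instead uses 'os.path.realpath' as a stand-in that resolves the whole chain of links, and the function name 'link_to_real_file' is hypothetical.

```python
import os


def link_to_real_file(existing, link_path):
    """Create link_path as a symlink to the real file behind 'existing'."""
    # Follow every intermediate symbolic link to the actual file.
    real = os.path.realpath(existing)
    # Remove a previous link (even a dangling one) before re-creating it.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(real, link_path)
    return real
```

With this, a program opening the link in the build directory follows a single link, no matter how many links the file in INDIR passes through.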
2019-04-15	New architecture to separate software-building and analysis steps	Mohammad Akhlaghi	-0/+91
	Until now, the software building and analysis steps of the pipeline were intertwined. However, these steps (of how to build software, and how to use it) are logically completely independent. Therefore with this commit, the pipeline now has a new architecture (particularly in the `reproduce' directory) to emphasize this distinction: The `reproduce' directory now has the two `software' and `analysis' subdirectories and the respective parts of the previous architecture have been broken up between these two based on their function. There is also no more `src' directory. The `config' directory for software and analysis is now mixed with the language-specific directories. Some of the software versions were also updated after some checks with their webpages. This new architecture will allow much more focused work on each part of the pipeline (to install the software and to run it for an analysis).