paper-concept.git - Paper (Towards Long-term and Archivable Reproducibility)

Age	Commit message	Author	Lines
2021-04-09	Minor corrections on previous copyedit	Mohammad Akhlaghi	-1/+1
	Being immutable doesn't necessarily mean that something is always present, so an "always present" was also added to the reason we recommend a Git hash. The end of the sentence was also slightly shortened to make room for the extra few words. The rewording of the conclusion on Active Papers was great! I just changed the "likely" to "possible" because, as Konrad mentioned in Commit a63900bc5a8, he is now using Guix.
2021-04-09	Minor copyedits	Boud Roukema	-1/+1
	These are minor last minute copyedits for recently added text, e.g. a git hash is not literally a timestamp.
2021-04-09	Comments by IAA's AMIGA team implemented	Mohammad Akhlaghi	-6/+10
	The AMIGA team at the Instituto de Astrofísica de Andalucía (IAA) are very active proponents of reproducibility. They had already provided very constructive comments after my visit there and many subsequent interactions. So until now, the whole team's contributions were acknowledged. Since the last submission, several of the team members kindly invested the time to read the paper and provide very useful comments, which are now being implemented. As a result, I was able to specifically thank them in the paper's acknowledgments (Thanks a lot AMIGA!). Below, I am listing the points in the order that is shown in 'git log -p -1' for this commit.
	- Javier Moldón: "PM is not defined. First appearance in the first page". Thanks for noticing this Javier, it has been corrected.
	- Javier Moldón: "In Section III. PROPOSED CRITERIA FOR LONGEVITY and Appendix B, you mention the FAIR principles as desirable properties of research projects and solutions, respectively which is good, but may bring confusion. Although they are general enough, FAIR principles are specifically for scientific data, not scientific software. Currently, there is an initiative promoted by the Research Data Alliance (RDA), among others, to create FAIR principles adapted to research software, and it is called FAIR4RS (FAIR for Research Software). More information here: https://www.rd-alliance.org/groups/fair-4-research-software-fair4rs-wg. In 2020 there was a kick-off meeting to divide the work in 4 WG. There is some more information in this talk: https://sorse.github.io/programme/workshops/event-016/. I have been following the work of WG1, and they are about to finish the first document describing how to adapt the FAIR principles to software. Even if all this is still work in progress, I think the paper would benefit from mentioning the existence of this effort and noticing the differences between Data and Software FAIR definitions." Thanks for highlighting this Javier, a footnote has been added for this (hopefully faithfully summarizing it into one sentence due to space limitations).
	- Sebastian Luna Valero: "Would it be a good idea to define long-term as a period of time; for example, 5 years is a lot in the field of computer science (i.e. in terms of hardware and software aging), but maybe that is not the case in other domains (e.g. Astronomy)." Thanks Sebastian, in section 2, we do give the longevity of the various "tools" in rough units of years (this was also a suggestion by a referee). But of course the discussion there is very generic, so going into finer detail would probably be too subjective and bore the reader.
	- Sebastian Luna Valero: "Why do you use git commit eeff5de instead of git tags or releases for Maneage? Shown for example in the abstract of the paper: 'This paper is itself written with Maneage (project commit eeff5de).'" Thanks for raising this important point, a sentence has been added to explain why hashes are objective and immutable for a given history, while tags can easily be removed or changed, or not cloned/pushed at all.
	- Susana Sanchez Exposito: "We think interoperability with other research projects would be important, do you have any plans to make maneage interoperable with, for example, the Common Workflow Language (CWL)?". Thanks a lot for raising this point Susana. Indeed, in the future I really do hope we can invest enough resources in this. In the discussion, I had already touched upon research objects as one method for interoperability; there was also a discussion on such generic standards in Appendix A.D.10. But to further clarify this point (given its importance), I mentioned CWL (and also the even more generic CWFR) in the discussion.
	- Sebastian Luna Valero: "Regarding Apache Taverna, please see:" https://github.com/apache/incubator-taverna-engine/blob/master/README.md Thanks a lot for this note Sebastian! I didn't know this! I wrote this section (and visited their webpage) before their "vote"! It was a surprise to see that their page had changed. I have modified the explanation of Taverna to mention that it has been "retired" and to use the GitHub link instead.
	- Sebastian Luna Valero: "Page 21: 'logevity' should be 'longevity'." Thanks a lot for noticing this! It has been corrected :-).
	- Javier Moldón: "There is a nice diagram in Johannes Köster's article on data processing with snakemake that I find very interesting to show some key aspects of data workflows: see Fig 1 in https://www.authorea.com/users/165354/articles/441233-sustainable-data-analysis-with-snakemake " This is indeed a nice diagram! I tried to cite it, but as of today, this link is not a complete paper (with no abstract and many empty section titles). If it was complete, I would certainly have cited it in Snakemake's discussion.
	- Javier Moldón: "Regarding the problem mentioned in the introduction about PM not precisely identified all software versions, I would like to mention that with Snakemake, even if the analysis are usually constructed using other package managers such as conda, or containers, you don't need to depend on online servers or poorly-documented software versions, as you can now encapsulate an analysis in a tarball containing all the software needed. You still have long-term dependency problems (as you will need to install snakemake itself, and a particular OS), but at least you can keep the exact software versions for a particular platform." Thanks for highlighting this Javier. This is indeed better than nothing, but we have already discussed the dangers of this "black box" approach of archiving binaries in many contexts, and many package managers have it. So while I really appreciate the point (I didn't know this), to avoid lengthening the paper, I think it's fine to not mention it in the paper.
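The point above about hashes versus tags can be illustrated with a minimal Python sketch. It assumes Git's default SHA-1 object format, in which an object ID is the SHA-1 of a short header followed by the content. The file contents and the `tags` dictionary are hypothetical stand-ins for illustration and are not part of Maneage.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the ID that Git (SHA-1 object format) gives to an object.

    Git hashes the header "<type> <size>", a NUL byte, and then the raw
    content, so the ID is fully determined by the content itself.
    """
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# Two hypothetical versions of a file: any change gives a different ID.
old_id = git_object_id("blob", b"first version of the analysis\n")
new_id = git_object_id("blob", b"second version of the analysis\n")

# Toy model of tags (an assumption for illustration): a movable name -> ID map.
tags = {"v1": old_id}
tags["v1"] = new_id   # the same tag name now points somewhere else
del tags["v1"]        # ... or it disappears altogether

print(old_id != new_id)   # the IDs differ because the contents differ
print("v1" in tags)       # the tag is gone, while both IDs are unchanged
```

A commit ID is computed in the same way over a commit object that names its tree and parent commits, which is why a hash pins down a whole history while a tag is only a label that can be moved, deleted, or never pushed.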
2021-04-09	Comments by Konrad Hinsen implemented	Mohammad Akhlaghi	-25/+44
	Konrad had kindly gone through the paper and the appendices with very good feedback that is now being addressed in the paper (thanks a lot Konrad!):
	- IPOL recently also allows Python code. So the respective parts of the description of IPOL have been updated. To address the dependency issue, I also added a sentence that only certain dependencies (with certain versions) are acceptable.
	- On Active Papers (AP: which is written by Konrad) corrections were made based on the following parts of his comments:
	  - "The fundamental issue with ActivePapers is its platform dependence on either Java or Python, neither of which is attractive."
	  - "The one point which is overemphasized, in my opinion, is the necessity to download large data files if some analysis script refers to it. That is true in the current implementation (which I consider a research prototype), but not a fundamental feature of the approach. Implementing an on-demand download strategy is not particularly complicated, it just needs to be done, and it wasn't a priority for my own use cases."
	  - "A historical anecdote: you mention that HDF View requires registering for download. This is true today, but wasn't when I started ActivePapers. Otherwise I'd never have built on HDF5. What happened is that the HDF Group, formerly part of NCSA and thus a public research infrastructure, was turned into a semi-commercial entity. They have committed to keeping the core HDF5 library Open Source, but not any of the tooling around it. Many users have moved away from HDF5 as a consequence. The larger lesson is that Richard Stallman was right: if software isn't GPLed, then you never know what will happen to it in the future."
	- On Guix, some further clarification was added to address Konrad's quote below (with a link to the blog-post mentioned there). In short, I clarified that what I mean by the extra step is storing the Guix commit hash with any respective high-level analysis change.
	  - "I also looked at the discussion of Nix and Guix, which is what I am mainly using today. It is mostly correct as well, the one exception being the claim that 'it is up to the user to ensure that their created environment is recorded properly for reproducibility in the future'. The environment is recorded in all detail, automatically. What requires some effort is extracting a human-readable description of that environment. For Guix, I have described how to do this in a blog post (https://guix.gnu.org/en/blog/2020/reproducible-computations-with-guix/), and in less detail in a recent CiSE paper (https://hal.archives-ouvertes.fr/hal-02877319). There should definitely be a better user interface for this, but it's no more than a user interface issue. What is pretty nice in Guix by now is the user interface for re-creating an environment, using the 'guix time-machine' subcommand."
	- The sentence on Software Heritage being based on Git was reworded to fit this comment of Konrad: "The plural sounds quite optimistic. As far as I know, SWH is the only archive of its kind, and in view of the enormous resources and long-time commitments it requires, I don't expect to see a second one."
	- When introducing hashes, Konrad suggested the following useful paper that shows how they are used in content-based storage: DOI:10.1109/MCSE.2019.2949441
	- On Snakemake, Konrad had the following comment: "[A system call in Python is] No slower than from bash, or even from any C code. Meaning no slower than Make. It's the creation of a new process that takes most of the time." So the point was just shifted to the many quotations necessary for calling external programs and how it is best suited for a Python-based project.
	In addition, some minor typos that I found during the process are also fixed.
2021-01-07	Removed all \new highlights after submission of review	Mohammad Akhlaghi	-2/+2
	With the submission of the revision (which highlighted all the relevant parts to the points the referees raised in the submitted PDF) it is no longer necessary to highlight these parts. If we get another revision request, we can add new '\new' parts for highlighting.
2021-01-07	Minor copyedits in appendices, e.g. parentheses	Boud Roukema	-5/+7
	This commit makes some minor fixes following the hardwired non-numerical solution to the cross-referencing issue between the main article and the supplement, such as fixing "lineage like lineage" and missing closing parentheses. From Mohammad: while re-basing the commit over the 'master' branch, I also added Boud's name at the top of the copyright holders of the appendices.
2021-01-05	appendix.bbl is now included in make dist tarball	Mohammad Akhlaghi	-1/+1
	Since the addition of the appendix bibliography, we hadn't checked the 'make dist' command and, as a result, the PDF couldn't be built from the tarball. With this commit, the 'dist' rule also copies 'appendix.bbl', so the created tarball can build the PDF properly. The 'peer-review' directory is now also included in the tarball created by './project make dist'. I also found a small typo in the description of Occam (an 'a' was missing) and fixed it.
2021-01-05	Polished main paper and appendices after a full re-read	Mohammad Akhlaghi	-151/+193
	In preparation for the submission of the revised manuscript, I went through the full paper and appendices one last time. The second appendix (reviewing existing reproducible solutions) in particular needed some attention because some of the tools weren't properly compared with the criteria. In the paper, I was also able to remove about 30 words, and bring our own count (which is an over-estimation already) to below 6250.
2021-01-04	Edits on points raised by Raul	Mohammad Akhlaghi	-10/+12
	After his previous two commits, we discussed some of the points and I am making these edits following those. In particular the last statement about Madagascar "could have been more useful..." was changed to simply mention that mixing workflow with analysis is against the modularity principle. We should not judge its usefulness to the community (which is beyond our scope and would need an official survey). A few other minor edits were done here and there to clarify some of the points.
2021-01-04	Very minor corrections to the necessity appendix	Raul Infante-Sainz	-13/+18
	With this commit, I have corrected some minor typos of this appendix. They are very minor corrections.
2021-01-04	Minor corrections to the existing solutions appendix	Raul Infante-Sainz	-87/+104
	With this commit, I have corrected some minor typos in this appendix. In addition to that, I also put empty lines to separate subsections and subsubsections appropriately.
2021-01-03	Spell check on main body and appendices	Mohammad Akhlaghi	-29/+29
	I ran a simple Emacs spell check over the main body and the two appendices. All discovered typos have been fixed.
2021-01-03	Minor corrections to the existing tools appendix	Raul Infante-Sainz	-136/+137
	With this commit, I have corrected some minor typos in this appendix. In addition to that, I also put empty lines to separate subsections and subsubsections appropriately (5 lines and 1 line, respectively).
2021-01-03	Updated copyrights of project-specific copyrights	Mohammad Akhlaghi	-6/+6
	Having entered 2021, it was necessary to update the years of all the copyright statements.
2021-01-03	Imported recent updates in Maneage, minor conflicts fixed	Mohammad Akhlaghi	-5/+5
	There were only three very small conflicts that have been fixed.
2021-01-03	No links to main body in the appendices in --supplement mode	Mohammad Akhlaghi	-9/+50
	Until now, in the appendices we were simply using '\ref' to refer to different parts of the published paper. However, when built in '--supplement' mode, the main body of the paper is a separate PDF, and having links to a separate PDF is not impossible, but far too complicated. Still, having the links adds to the richness of the text and helps point readers to specific parts of the paper. With this commit, there is a LaTeX conditional anywhere in the appendices that we want to refer the reader to sections/figures in the main body. When building a separate PDF, the respective section/figure is cited in a descriptive mode (like "Section discussing longevity of tools"). However, when the appendices go into the same PDF as the main body, the '\ref's remain.
2021-01-03	Added Boud as copyright holder of supplement.tex	Mohammad Akhlaghi	-1/+2
	Having added/modified text in the supplements, Boud is now a copyright holder of this file too. I also added 2021 to the copyright years of paper.tex and supplement.tex.
2021-01-03	Minor copyediting	Boud Roukema	-9/+9
	This commit does some minor copyediting, especially of the introduction to the supplement. There's no point complaining to the reader about the word limit of the journal: s/he is not interested in that. This is not the right place for discussing journal policy. The need for summarising content and focussing on key elements of a cohesive argument is fundamental in a world of information overload. A&A/MNRAS/ApJ/PRD letters are generally much worse than normal articles in terms of reproducibility because they have to omit so many details that the reader has to read the full articles to really know what is done. But the reality is that letters get read a lot, because they're short and snappy.
2021-01-03	Cleaned abstract and Section II to fit word limit	Mohammad Akhlaghi	-1/+2
	In the abstract, the repeated benefits of Maneage (which are also mentioned in the criteria) were removed to fit CiSE's online submission guidelines. In Section II (Longevity of existing tools), the paragraph that itemized the following paragraphs as a numbered list has been removed, along with the sentence that repeatedly states the importance of reproducibility in the sciences and some branches of industry. With these changes, our approximate automatic count is 6277 words. This is still very slightly larger than the 6250 word limit of the journal. However, this count is a definite over-estimation (it includes many things like page titles and page numbers from the raw PDF-to-text conversion), so the actual count for the journal publication should be less than this.
	A few other tiny corrections were made:
	- The year of the paper and copyright in 'README.md' was set to 2021. The copyright of the rest of the files will be set to 2021 after the next merge with Maneage soon (the years of the core infrastructure copyrights have already been corrected there).
	- Mohammadreza's name was added in 'README.md'.
	- The line to import the "necessity" appendix has been commented out in the version that has the full paper in one PDF (to be uploaded to arXiv or Zenodo).
	- The supplement PDF now starts with '\appendices' so the sections have the same labels as the single-PDF version.
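The arithmetic behind the word-limit check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counting rule (whitespace-separated tokens) and the function names are hypothetical, and the project's actual counting script is not shown in this log. Only the numbers 6277 and 6250 come from the commit message.

```python
WORD_LIMIT = 6250  # journal limit quoted in the commit message


def approximate_word_count(text: str) -> int:
    """Count whitespace-separated tokens in text converted from a PDF.

    Assumed counting rule: page headers and page numbers that survive the
    conversion are counted too, which is why such a count over-estimates.
    """
    return len(text.split())


def words_over_limit(count: int, limit: int = WORD_LIMIT) -> int:
    """Return by how many words the count exceeds the limit (0 if within)."""
    return max(0, count - limit)


# Hypothetical converted text: a page header followed by the paper title.
sample = "Page 1\nTowards Long-term and Archivable Reproducibility\n"
print(approximate_word_count(sample))  # header tokens are included in the count
print(words_over_limit(6277))          # excess of the reported count over the limit
```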
2021-01-03	Added abstract for supplement	Mohammad Akhlaghi	-1/+13
	Until now the supplement had no introduction for a random reader to see the purpose of this "Web extra" supplement. With this commit, an abstract has been added.
2021-01-02	Supplement (containing appendices) optionally built separately	Mohammad Akhlaghi	-3/+143
	Until now, the build strategy of the paper was to have a single output PDF that contains either (1) the full paper with the appendices, or (2) only the main body of the paper with no appendices. But the editor in chief of CiSE recently recommended publishing the appendices as a supplement, that is, a separate PDF (on its webpage). So with this commit, the project can make either (1) a single PDF (containing both the main body and the appendices) that will be published on arXiv and will be the default output (this is the same as before), or (2) two PDFs: one that is only the main body of the paper and another that is only the appendices. Since the appendices will be printed as a PDF in any case now, the old '--no-appendix' option has been replaced by '--supplement'. Also, the internal shell/TeX variable 'noappendix' has been renamed to 'separatesupplement'.
2021-01-02	Copyright year updated in all source files	Mohammad Akhlaghi	-9/+9
	Having entered 2021, it was necessary to update the copyright years at the top of the source files. We recommend that you do this for all your project-specific source files also.
2020-12-30	Each appendix moved to a separate .tex file	Mohammad Akhlaghi	-0/+1059
	As recommended by Lorena Barba (editor in chief of CiSE), we should prepare the appendices as a separate "Supplement" for the journal. But we also want them to be appendices within the paper when built for arXiv. As a first step, with this commit, each appendix has been put in a separate 'tex/src/appendix-*.tex' file and '\input' into the paper. We will then be able to conditionally include them in the PDF or not. Also, as recommended by Lorena, the general "necessity for reproducible research" appendix isn't included (possibly going into the webpage later).
2020-12-29	Copyedit on Appendix A	Boud Roukema	-34/+34
	This commit makes many small wording fixes, mainly to Appendix A. It also inserts "quotes" around some of the title fields in 'tex/src/references.tex', since otherwise capitalisation is lost (DNA becomes Dna; 'of Reinhart and Rogoff' becomes 'of reinhart and rogoff'; and so on). I didn't do this for all titles, because some Have All Words Capitalised, which blocks the .bib file from choosing a consistent style.
2020-12-28	Minor edits, updated citation to published Menke+20 paper	Mohammad Akhlaghi	-4/+5
	Some minor edits were made to the paper to shorten it. In particular, the example of IPOL was removed from the main body of the paper, and we'll just rely on the more extensive review of IPOL in the appendix. I also updated the referee report to account for the new Appendix A, which is just an extended introduction. Also, I noticed that the Menke+20 paper that we replicate here has recently been published in the iScience journal, so its bibliography entry was updated from the bioRxiv information to the journal information. Finally, the number of words (after removing abstract and captions and accounting for figures) is now only printed when the project is built with '--no-appendix'. This was done because this information is extra/annoying/unnecessary for the case where there is an appendix.
2020-12-28	The old/long introduction is now an appendix on necessity	Mohammad Akhlaghi	-6/+6
	In the first/long draft of this work, we had a good introduction on the necessity of reproducibility. But we were forced to remove it because of word-count limits. Having moved a major portion of the previous work into the appendices, I thought it would be good to put that introduction as a first appendix also, focused on the necessity for reproducible research.
2020-12-01	Imported recent work in Maneage, minor conflicts fixed	Mohammad Akhlaghi	-8/+19
	Some minor conflicts that came up during the merge were fixed.
2020-12-01	IMPORTANT: organizational improvements in Maneage TeX sources	Mohammad Akhlaghi	-159/+173
	This only concerns the TeX sources in the default branch. In case you don't use them, there should only be a clean conflict in 'paper.tex' (that is obvious and easy to fix). Conflicts may only happen in some of the 'tex/src/preamble-*.tex' files if you have actually changed them for your project. But generally any conflict that does arise from this commit with your project branch should be very clear and easy to fix and test. In short, from now on things will be even easier: any LaTeX configuration that you want to do for your project can be done in 'tex/src/preamble-project.tex', so you don't have to worry about any other LaTeX preamble file. They are either templates (like the ones for PGFPlots and BibLaTeX) or low-level things directly related to Maneage. Until now, this distinction wasn't too clear. Here is a summary of the improvements:
	- Two new options to './project make': with '--highlight-new' and '--highlight-notes' it is now possible to activate highlighting on the command-line. Until now, there was a LaTeX macro for this at the start of 'paper.tex' (\highlightchanges). But changing that line would change the Git commit hash, making it hard for the readers to trust that this is the same PDF. With these two new run-time options, the printed commit hash will not change.
	- paper.tex: the sentences are formatted as one sentence per line (and one line per sentence). This helps in version controlling narrative and following the changes per sentence. A description of this format (and its advantages) is also included in the default text.
	- The internal Maneage preambles have been modified:
	  - 'tex/src/preamble-header.tex' and 'tex/src/preamble-style.tex' have been merged into one preamble file called 'tex/src/preamble-maneage-default-style.tex'. This helps a lot in simply removing it when you use a journal style file for example.
	  - Things like the options to highlight parts of the text are now put in a special 'tex/src/preamble-maneage.tex'. This helps highlight that these are Maneage-specific features that are independent of the style used in the paper.
	  - There is a new 'tex/src/preamble-project.tex' that is the place you can add your project-specific customizations.
2020-11-30	New tex/src/preamble-maneage.tex for Maneage-only TeX customization	Mohammad Akhlaghi	-22/+53
	Until now, the Maneage-only features of LaTeX were mixed into 'tex/src/preamble-project.tex' (which is reserved for project-specific things). But we want to move the highlighting features (that have started here) into the core Maneage branch, so it's best for these Maneage-specific features to be in a Maneage-specific preamble file. With this commit, a new 'tex/src/preamble-maneage.tex' has been created for this purpose and the highlighting modes have been put in there. In the process, I noticed that 'tex/src/preamble-project.tex' doesn't have a copyright! This has been corrected.
2020-11-27	Clickable links in appendices	Boud Roukema	-6/+7
	This commit makes the numbered links to references such as [13] [14] [15] in the appendices clickable in the PDF. The solution was to call the "\newcites" command from the "multibib" package after loading "hyperref". First do "rm -fv .build/tex/build/.bbl .build/tex/build/.aux" and then "./project make" a few times.
2020-11-23	First draft of all the points addressed by the referees	Mohammad Akhlaghi	-4/+217
	A new directory has been added at the top of the project's source called 'peer-review'. The raw reviews of the paper by the editors and referees has been added there as '1-review.txt'. All the main points raised by the referees have been listed in a numbered list and addressed (mostly) in '1-answers.txt'. The text of the paper now also includes all the implemented answers to the various points.
2020-11-20	Highlighting changes can now be toggled at run-time	Mohammad Akhlaghi	-5/+9
	Until now, the core Maneage 'paper.tex' had a '\highlightchanges' macro that defines two LaTeX macros: '\new' and '\tonote'. When '\highlightchanges' was defined, anything that was written within '\new' became dark green (highlighting new things that have been added). Also, anything that was written in '\tonote' was put within a '[]' and became dark red (to show that there is a note here that should be addressed later). When '\highlightchanges' wasn't defined, anything within the '\new' element would be black (like the rest of the text), and the things in '\tonote' would not be shown at all.
	Commenting the '\newcommand{\highlightchanges}{}' line within 'paper.tex' (to toggle the modes above) would create a different Git hash and has to be committed. But this different commit hash could create a false sense in the reader that other things have also been changed, and the only way they could confirm was to actually go and look into the project history (which they will not usually have time to do, and thus won't be able to trust the two modes of the text). Also, the added highlights and the note highlights were bundled together into one macro, so you couldn't have only one of them.
	With this commit, the choice of highlighting either one of the two is now done as two new run-time options to the './project' script (which are passed to the Makefiles, and written into the 'project.tex' file which is loaded into 'paper.tex'). In this way, we can generate two PDFs with the same Git commit (project's state): one with the selected highlights and another one without them.
	This issue actually came up for me while implementing the changes here: we need to submit one PDF to the journal/referees with highlights on the added features. But we also need to submit another PDF to arXiv and Zenodo without any highlights. If the PDFs have different commit hashes, the referees may associate the difference with other changes in any part of the work. For example, https://oadoi.org/10.22541/au.159724632.29528907 mentions "Another version of the manuscript was published on arXiv: 2006.03018", while the only difference was a few words in the abstract, changed after the journal complained about the abstract word-count of our first submission (where the commit hashes matched with arXiv/Zenodo).
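The idea of moving the highlighting switch out of the committed source and into a generated file can be sketched as follows. This is a hypothetical Python illustration, not the project's implementation (which the message describes as a shell script passing options to Makefiles). The function name and the macro names `\highlightnew` and `\highlightnotes` are assumptions made for this example.

```python
def project_tex_macros(highlight_new: bool, highlight_notes: bool) -> str:
    """Build the LaTeX lines a build script could write into a generated file.

    The macro names are hypothetical. The point is that the flags come from
    the command line at run time, so no version-controlled file is edited
    and the Git commit hash of the source stays the same.
    """
    lines = []
    if highlight_new:
        lines.append(r"\newcommand{\highlightnew}{}")
    if highlight_notes:
        lines.append(r"\newcommand{\highlightnotes}{}")
    return "".join(line + "\n" for line in lines)


# Same source state, two different generated files (and hence two PDFs).
plain = project_tex_macros(highlight_new=False, highlight_notes=False)
review = project_tex_macros(highlight_new=True, highlight_notes=False)

print(repr(plain))
print(repr(review))
```

Keeping the two flags separate also reflects the second point of the message: added-text highlights and note highlights can be enabled independently.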
2020-11-15	First edits on the newly added appendices in new form	Mohammad Akhlaghi	-12/+11
	With the optional appendices added recently to the paper, it was important to go through them and make them fit better into the paper.
2020-11-04	Appendix of long paper added, optionally we can disable it	Mohammad Akhlaghi	-306/+24
	Given the referee reports, after discussing with the editors of CiSE, we decided that it is important to include the complete appendix we had before that included a thorough review of existing tools and methods. However, the appendix will not be published in the paper (due to the strict word-count limit). It will only be used in the arXiv/Zenodo versions of the paper. This actually created a technical problem: we want the commit hash of the project source to remain the same when the paper is built with an appendix or without it. To fix this problem the choice of including an appendix has gone into the 'project' script as a run-time option called '--no-appendix'. So by default (when someone just runs './project make'), the PDF will have an appendix, but when we want to submit to the journal, or when the appendix isn't needed for a certain reason, we can use this new option. The appendix also has its own separate bibliography. Some other corrections made in this commit: 1. Some new references were added that had an '_' in their source, they were corrected in 'references.tex'. 2. I noticed that 'preamble-style.tex' is not actually used in this paper, so it has been deleted.
2020-09-24	Gnuastro's analysis configuration files removed	Mohammad Akhlaghi	-2/+2
	Until now, the core Maneage branch included some configuration files for Gnuastro's programs. This was actually a remnant of the distant past when Maneage didn't actually build its own software and we had to rely on the host's software versions. This file contained the configuration files specific to Gnuastro for this project and also had a feature to avoid checking the host's own configuration files. However, we now build all our software ourselves with fixed configuration files (for the version that is being installed and whose version is stored). So those extra configuration files were just extra and caused confusion and problems in some scenarios. With this commit, those extra files are now removed. Two small issues are also addressed in parallel with this commit:
	- When running './project make clean', the 'hardware-parameters.tex' macro file (which is created by './project configure') is not deleted.
	- The project title is now written into the default output PDF's properties (through 'hypersetup' in 'tex/src/preamble-header.tex') through the LaTeX macro.
	All these issues were found and fixed with the help of Samane Raji.
2020-08-20	Data lineage and replicated plot in one row	Mohammad Akhlaghi	-31/+261
	Until now, the replicated plot had the width of the full page and the data lineage graph was under it. Together they were covering more than half of the height of the page! But the plot showing the number of papers with tools really doesn't have too much detail, and all the space was being wasted. With this commit, the plot is now much much thinner and the data lineage graph has been fitted to the right of it.
2020-08-20	Imported recent updates in Maneage, minor conflicts fixed	Mohammad Akhlaghi	-0/+12
	Some very minor conflicts came up and were easily corrected. They were mostly in parts that are also shared with the demonstration in the core Maneage branch.
2020-07-04	Better names and comments in INPUTS.conf	Mohammad Akhlaghi	-1/+1
	Until now, the dataset's configuration names had a 'WFPC2' prefix. But this is very alien to anyone who is not familiar with the history of the Hubble Space Telescope (the camera is no longer used! It's just used here since it's one of the standard FITS files from the FITS standard webpage). With this commit, the variable names have been modified to be more readable and clear (having a 'DEMO-' prefix). Also, the comments of 'INPUTS.conf' (describing the purpose of each variable) were edited and made clearer.
2020-07-04	Citing Maneage paper in acknowledgments	Mohammad Akhlaghi	-1/+1
	In the previous commit, the modified abstract of the acknowledgments only included the URL of Maneage. But it's more formal to cite the Maneage paper; the URL is already present in the paper.
2020-06-18	Better step-by-step implementation of the data-lineage figure	Mohammad Akhlaghi	-104/+87
	Until now, the data-lineage figure's step-by-step feature (using the macros defined at the top) didn't correspond to the new format! It was still based on the purely-hypothetical format, while the boxes and their contents had changed. With this commit, the macros and the corresponding parts that they create have been updated to represent the step-by-step data lineage of this paper. Also, in the "tools-per-year" plot, the green line was brought on top of the histogram to be clearer (especially when transparency isn't implemented properly in the conversion).
2020-06-17	Better color for project branch in branching figure	Mohammad Akhlaghi	-1/+1
	Until now, the color for the branching figure's "project" branch was too close to the derived project branch. With this commit, I am using a slightly darker shade of brown that is sufficiently different from the core Maneage branch and the derived project branch.
2020-06-16	Acknowledged contributions of Marios Karouzos	Mohammad Akhlaghi	-28/+29
	Marios had read the first draft of the paper (Commit f990bba) and provided valuable feedback (shown below) that ultimately helped in the current version. But because of all the work that was necessary in those days, I forgot to actually thank him in the acknowledgment, while I had implemented most of his thoughts. Following Marios' thoughts on the Git branching figure, with this commit, I am also adding a few sentences at the end of the caption with a very rough summary of Git. I also changed the branch commit-colors to shades of brown (incrementally becoming lighter as higher-level branches are shown) to avoid the confusion with the blue and green signs within the schematic papers shown in the figure. Marios' comments (April 28th, 2020, on Commit f990bba) ------------------------------------------------------ I think the structure of the paper is more or less fine. There are two places that I thought could be improved: 1) Section 3 (Principles) was somewhat confusing to me in the way that it was structured. I think the main source of confusion is the mixing of what Maneage is about and what other programs have done. I would suggest to separate the two. I would have short intro for the section, similar to what you have now. However, I would suggest to highlight the underlying goals motivating the principles that follow: reproducibility, open science, something else? Then I would go into the details of the seven principles. Some of the principles are less clear to me than others. For example, why is simplicity a guiding principle? Then some other principles appear to be related, for example modularity, minimal complexity and scalability to my eyes are not necessarily separate. Finally, I would separate the comparison with other software and either dedicate a section to that somewhere toward the end of the paper (perhaps a subsection for section 5) or at least condense it and put it as a closing paragraph for Section 3. 
As it is now I think it draws focus from Maneage and also includes some repetitions. 2) Section 4 (Maneage) was at times confusing because it is written, I think in part as a demonstration of Maneage (i.e., including examples that showed how Maneage was used to write this or other papers) and a manual/description of the software. I wonder whether these two aspects can be more cleanly separated. Perhaps it would be possible to first have a section 4 where each of the modules/units of Maneage are listed and explained and then have the following section discuss a working example of Maneage using this or another paper. 3) I found Figure 7 [the git branching figure] and its explanation not very intuitive. This probably has to do with my zero knowledge of github and how versioning there works, but perhaps the description can be a bit more "user friendly" even for those who are not familiar with the tool. 4) I find Section 6 to be rather inconsequential. It does not add anything and it more or less is just a summary of what was discussed. I would personally remove it and include a very short summary of the ideals/principles/goals of Maneage at the beginning of Section 5, before the discussion.
2020-06-14	Better comments for the top macros of paper.tex	Mohammad Akhlaghi	-20/+0
	The default 'paper.tex' starts by defining some macros and comments describing them. Until now, the text was not too clear and could be confusing for someone that is not at all familiar with Maneage. With this commit, the comments have been edited to be more clear for a first-time reader. For example they all start with FULL CAPS summaries. Two other small things were corrected in 'tex/src/preamble-necessary.tex': - Until now 'project.tex' was included in this preamble. However, because of its importance in Maneage, and prominent place in the demonstration plot of the paper introducing Maneage, it is now included directly in 'paper.tex'. This also allows users to safely ignore/delete this preamble file if their LaTeX style is different. - I noticed that some macros for some astronomical software names from the very first commits in Maneage were still present here! They are no longer used, so they have been removed.
2020-06-13	Custom-built EPS icons in branching figure	Marjan Akbari	-432/+2270
	Until now, we were using three EPS icons (created from SVG) that were downloaded from https://www.flaticon.com. Therefore it was necessary to acknowledge the creators and put a link to the webpage. This consumed space in the caption and decreased the originality of the plot. Another problem was that the "collaboration" icon (with three people in it) had arrows, and some of those arrows pointed downwards, creating ambiguity in relation to the upward arrows under the commits. With this commit, three alternative icons are added that I made from scratch, using Inkscape. The collaboration icon is now two figures and two speech-bubbles, without any arrows.
2020-06-10	Updated text of default paper.tex, putting more recent examples	Mohammad Akhlaghi	-22/+78
	The text of the default paper hadn't been changed for a very long time! In this time, three papers using Maneage have been published (which can serve as very good examples), and Maneage also now has a webpage! With this commit, these examples and the webpage have been added, and the text was also polished a little to hopefully be more useful.
2020-06-07	Added SoftwareHeritage link, minor typo corrections and clarifications	Mohammad Akhlaghi	-0/+0
	The git history of the project is now archived on SoftwareHeritage and a link to it was added in the "Reproducible supplement" tag just under the abstract. Some corrections were also made in the text. In particular, the part explaining the separation of software and data reproducibility was slightly clarified.
2020-06-06	IMPORTANT: Added publication checklist, improved relevant infrastructure	Mohammad Akhlaghi	-10/+21
	Possible semantic conflicts (that may not show up as Git conflicts but may cause a crash in your project after the merge): 1) The project title (and other basic metadata) should be set in 'reproduce/analysis/conf/metadata.conf'. Please include this file in your merge (if it is ignored because of '.gitattributes'!). 2) Consider importing the changes in 'initialize.mk' and 'verify.mk' (if you have added all analysis Makefiles to the '.gitattributes' file, thus not merging any change in them with your branch). For example with this command: git diff master...maneage -- reproduce/analysis/make/initialize.mk 3) The old 'verify-txt-no-comments-leading-space' function has been replaced by 'verify-txt-no-comments-no-space'. The new function will also remove all white-space characters between the columns (not just white space characters at the start of the line). Thus the resulting check won't involve spacing between columns. A common set of steps is always necessary to prepare a project for publication. Until now, we would simply look at previous submissions and try to follow them, but that was prone to errors and could cause confusion. The internal infrastructure also didn't have some useful features to make good publication possible. Now that the submission of a paper fully devoted to the founding criteria of Maneage is complete (arXiv:2006.03018), it was time to formalize the necessary steps for easier submission of a project using Maneage and implement some low-level features that can make things easier. With this commit a first draft of the publication checklist has been added to 'README-hacking.md'; it was tested in the submission of arXiv:2006.03018 and zenodo.3872248. To help guide users on implementing the good practices for output datasets, the outputs of the default project shown in the paper now use the new features. After reading the checklist, please inspect these. 
Some other relevant changes in this commit: - The publication involves a copy of the necessary software tarballs. Hence a new target ('dist-software') was also added to package all the project's software tarballs in one tarball for easy distribution. - A new 'dist-lzip' target has been defined for those who want to distribute an Lzip-compressed tarball. - The '\includetikz' LaTeX macro now has a second argument to allow configuring the '\includegraphics' call when the plot should not be built, but just imported.
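The whitespace-insensitive check described in point 3 above can be illustrated with a rough shell sketch. This is not Maneage's actual 'verify-txt-no-comments-no-space' code: the function name 'checksum_no_comments_no_space', the choice of 'sha256sum' (GNU Coreutils) and the assumption that comments start with '#' are all hypothetical and only serve to show the idea.

```shell
# Hypothetical helper, NOT the Maneage implementation: print a checksum of
# a plain-text table that ignores comments and every white-space character.
checksum_no_comments_no_space() {
    # 1) remove everything from a '#' to the end of the line (assumed
    #    comment syntax), 2) delete all spaces and tabs inside each line,
    # 3) drop lines that became empty, 4) hash what is left.
    sed -e 's/#.*$//' -e 's/[[:space:]]//g' -e '/^$/d' "$1" \
        | sha256sum | awk '{print $1}'
}
```

Under these assumptions, two tables that differ only in column spacing or comments give the same checksum, while a change in any value gives a different one. The cost is that adjacent columns are concatenated before hashing, so the check cannot notice a column boundary that moved.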
2020-06-04	Scale element in includegraphics for roughly similar-sized figures	Mohammad Akhlaghi	-6/+8
	Until now, when the figures were built directly from EPS ('\newcommand{\makepdf}{}' was commented), they would take the full line-width becoming a little too large! I noticed this after letting arXiv build the PDF. With this commit, the 'includetikz' tool takes a second argument to be a parameter given to 'includegraphics' (which is scale in this case).
2020-06-04	Verification activated, README added, Proper metadata in plot data	Mohammad Akhlaghi	-3/+3
	All the steps following the to-be-added (in 'README-hacking.md') publication checklist prior to the final check from new clone have been added: - 'README.md' file has been set. - "Reproducible supplement" was added just above the keywords, pointing to Zenodo. - A link to the to-be-uploaded data underlying the plot was added in the caption of the tools-per-year plot. - A new meta-data configuration file was added to store basic project metadata to be used throughout the project. This will later be taken into Maneage. For example, the project title is now stored here and written into the paper's LaTeX source and output datasets automatically. - Verification was activated and the plot's data and LaTeX macro files are now automatically verified. - Complete metadata was added for the data underlying the plot. - A generic function was added in 'initialize.mk' that will automatically write project info and copyright in all plain-text outputs.
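The last item above (a generic function that writes project info and copyright into plain-text outputs) could look roughly like the following shell sketch. The function name 'print_metadata_header', its arguments and the exact header layout are assumptions made for illustration; the real function lives in 'initialize.mk' and may differ in every detail.

```shell
# Hypothetical sketch: start a plain-text output with commented metadata.
# Arguments (all illustrative): project title, copyright holder, output file.
print_metadata_header() {
    title=$1
    copyright=$2
    outfile=$3
    # Write the header as '#' comment lines so that tools reading the
    # table can skip it; this overwrites any existing file.
    {
        printf '# Project title: %s\n' "$title"
        printf '# Copyright: %s\n' "$copyright"
        printf '#\n'
    } > "$outfile"
}
```

A rule that builds a table would call this first and then append its data rows with '>>', so every plain-text output carries the same project metadata without each rule repeating it.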
2020-06-03	Imported recent updates in Maneage, minor conflict fixed	Mohammad Akhlaghi	-8/+12
	The minor conflict was with 'reproduce/software/make/high-level.mk', and in particular because we implemented the fix to Maneage's Task #15664 in this project first. After it was moved to the main Maneage branch some minor stylistic corrections were done to it, thus causing the conflict. To resolve the conflict, I simply imported the full Maneage version of the file with this command: git checkout maneage -- reproduce/software/make/high-level.mk The other conflicts were due to the deleted files (that were resolved as described in 'README-hacking.md') and the LaTeX files that I had told '.gitattributes' to ignore from the Maneage branch.