path: root/paper.tex
Age | Commit message | Author | Lines
2021-01-02 | Supplement (containing appendices) optionally built separately | Mohammad Akhlaghi | -14/+8
Until now, the build strategy of the paper was to have a single output PDF that contains either (1) the full paper with the appendices, or (2) only the main body of the paper with no appendices. But the editor in chief of CiSE recently recommended publishing the appendices as a supplement, that is, a separate PDF (on its webpage). So with this commit, the project can make either (1) a single PDF (containing both the main body and the appendices) that will be published on arXiv and will be the default output (this is the same as before), or (2) two PDFs: one that is only the main body of the paper and another that is only the appendices. Since the appendices will be printed as a PDF in any case now, the old '--no-appendix' option has been replaced by '--supplement'. Also, the internal shell/TeX variable 'noappendix' has been renamed to 'separatesupplement'.
2021-01-01 | Minor edits in the acknowledgement and biographies | Mohammad Akhlaghi | -63/+29
Since we have a long list of copyright statements at the top, I thought it's easier to just move the copyright notice to the top of 'paper.tex' also. In the acknowledgments, the paragraph on Maneage was slightly summarized to save a few words and still be clear. Also, the long name of the Japanese Ministry of Education, Culture, Sports, Science, and Technology was summarized to Japanese MEXT. In the biographies, the '-at' (replacing '@' in the emails) was changed to '-AT' to make it clearer to the eye that it's just a placeholder.
2020-12-30 | Each appendix moved to a separate .tex file | Mohammad Akhlaghi | -1087/+2
As recommended by Lorena Barba (editor in chief of CiSE), we should prepare the appendices as a separate "Supplement" for the journal. But we also want them to be appendices within the paper when built for arXiv. As a first step, with this commit, each appendix has been put in a separate 'tex/src/appendix-*.tex' file and '\input' into the paper. We will then be able to conditionally include them in the PDF or not. Also, as recommended by Lorena, the general "necessity for reproducible research" appendix isn't included (possibly going into the webpage later).
2020-12-29 | Added Mohammadreza's copyright notice on paper.tex | Mohammad Akhlaghi | -0/+1
After adding Mohammadreza as an author of the paper, we forgot to add him as a copyright holder at the start of the paper.
2020-12-29 | Copyedit on Appendix A | Boud Roukema | -45/+45
This commit makes many small wording fixes, mainly to Appendix A. It also inserts "quotes" around some of the title fields in 'tex/src/references.tex', since otherwise capitalisation is lost (DNA becomes Dna; 'of Reinhart and Rogoff' becomes 'of reinhart and rogoff'; and so on). I didn't do this for all titles, because some Have All Words Capitalised, which blocks the .bib file from choosing a consistent style.
2020-12-29 | Mohammadreza Khellat added as an author | Mohammadreza Khellat | -4/+11
Mohammadreza has made significant contributions to the text of the paper and also the source. However, his contributions to the text came after the initial submission, so until now, he was not added as an author. The reason we waited for this was that the CiSE editors had not yet responded to our inquiry on the possibility of adding a new author at this phase. With this commit, following approval from the editors, Mohammadreza's information has been added to the manuscript as an author, to avoid delays in submitting the manuscript revision. While merging with the 'master' branch, Mohammad also made some minor edits to the other biographies to follow a similar format.
2020-12-28 | Minor edits, updated citation to published Menke+20 paper | Mohammad Akhlaghi | -32/+36
Some minor edits were made to the paper to shorten it. In particular, the example of IPOL was removed from the main body of the paper, and we'll just rely on the more extensive review of IPOL in the appendix. I also updated the referee report to account for the new Appendix A that is just an extended introduction. Also, I noticed that the Menke+20 paper that we replicate here has recently been published in the iScience journal, so its bibliography entry was updated from the bioRxiv information to the journal information. Also, the number of words (after removing the abstract and captions and accounting for figures) is now only printed when the project is built with '--no-appendix'. This was done because this information is extra/annoying/unnecessary for the case where there is an appendix.
2020-12-28 | The old/long introduction is now an appendix on necessity | Mohammad Akhlaghi | -0/+87
In the first/long draft of this work, we had a good introduction on the necessity of reproducibility. But we were forced to remove it because of word-count limits. Having moved a major portion of the previous work into the appendices, I thought it would be good to put that introduction as a first appendix also, focused on the necessity for reproducible research.
2020-12-27 | Edits to snapshot size argument, minor edits here and there | Mohammad Akhlaghi | -22/+25
Following Boud's point in the previous commit, I tried to clarify the point in the text that we are only talking about hand-written source files: in short, in this part of the paper, we are not talking about the version/snapshot for arXiv, which needs figures and many extra automatically built files. We are just talking about the raw, hand-written files, trying to convince people how good it is to keep the raw files separate from automatically generated files ;-). Also, while looking around in other parts of the main body of the paper, I tried to edit/clarify a few points and summarize/shorten others.
2020-12-27 | Fix typos; snapshot size | Boud Roukema | -4/+4
This commit fixes 'automaticly', 'mega byte', 'terra byte'. It also changes 'will be far less than a mega byte' to 'should be less than a megabyte'. The reason for 'should' is that in some cases, providing a small data set in the package is useful, as in [1]. Of course, [1] would be only 0.9 Mb in size, including the data sets, instead of 1.3 Mb, if the author, whoever that may happen to be, had excluded the useless (produced) file 'paper-tmp.eps'. :P Case [2] is 0.4 Mb. These two tar archives are for ArXiv, so they also contain produced .eps files. So maybe in principle 'far less than' is right.
However, on neither [3] nor [4], trying to follow the recommendations :), are any of the "useful" versions of single file archives smaller than the ArXiv version. The git bundles are bigger because of the git history, and the 'software' archives are 0.5 to 0.6 Gb because they include almost everything. However, stating something that is possible in principle but not done in practice would be misleading. So I would not include 'far less'.
[1] https://zenodo.org/record/3951152/files/subpoisson-252cf1c-arXiv.tar.gz
[2] https://zenodo.org/record/4062461/files/elaphrocentre-724a7c8-arXiv.tar.gz
[3] https://zenodo.org/record/3951152
[4] https://zenodo.org/record/4062461
2020-12-27 | Fix multiply defined labels | Boud Roukema | -5/+5
This commit fixes the labels alliez19, claerbout1992, schwab2000, which were multiply defined. The problem was using \citeappendix instead of \cite for these in the appendix, even though they are first used in the official part of the article. You must do './project make clean' before recreating the pdf in order for this to compile correctly. (Otherwise you'll waste time re-using old files; this means that one of our 'make' dependencies could in principle be improved.) With this change, these references in the pdf are (for me) correct clickable links back to [5], [1], [11], respectively. [If you use xpdf (poppler library), remember the 'b' key to quickly navigate back from a clicked internal link.] This way you can quickly navigate between the appendix text and the references used, and you avoid the LaTeX warning about 'multiply defined labels'.
2020-12-27 | Copyediting, based on the not contraction | Boud Roukema | -23/+23
This commit provides a little bit of minor copyediting, mainly in the appendices, based on and around changing the casual 'isn't', 'don't' and other contractions with 'not' to a less casual style of language. A few of the changes aim to improve the meaning in tiny ways.
2020-12-27 | Minor: add missing word | Boud Roukema | -1/+1
The sentence sounds better with 'the'.
2020-12-26 | Added example of recent CentOS termination | Mohammad Akhlaghi | -6/+7
It was recently announced by both RedHat[1] and CentOS[2] that CentOS 8 (which was meant to end LTS at 2030) will be terminated 8 years early (by the end of 2021). This is a perfect example of the longevity issues when relying on third-party providers. With this commit, I added this as a parenthesis after mentioning Ubuntu's LTS web address. Some minor edits were also done in other parts of this paragraph. [1] https://www.redhat.com/en/blog/centos-stream-building-innovative-future-enterprise-linux [2] https://blog.centos.org/2020/12/future-is-centos-stream
2020-12-07 | Proprietary obsolescence added in free software criteria | Mohammad Akhlaghi | -1/+1
Today, Richard Stallman sent a mail to 'info-gnu@gnu.org' (GNU's public announcements mailing list) about proprietary obsolescence (or planned obsolescence) [1]. After looking into it, I saw there is actually a Wikipedia page for this concept [2]. Since it directly relates to our Free software criteria, I thought it's good to use this technical term there.
[1] https://www.gnu.org/proprietary/proprietary-obsolescence.html
[2] https://en.wikipedia.org/wiki/Planned_obsolescence
2020-12-04 | Comparison with Jupyter: added that different editors can be used | Mohammad Akhlaghi | -3/+3
I just remembered that in the paragraph where we compare with Jupyter, another important point is that, based on the modularity principle, people can choose their favorite text editor and aren't limited to one. I also tried to remove redundant parts to avoid adding too many extra words.
2020-12-02 | Minor edits in newly added parts on statistical verification | Mohammad Akhlaghi | -2/+3
Thanks a lot Boud for adding that script in your own project and linking it here. Since the raw file (without the context of the whole project) is very hard for users to understand, I switched the raw URL to the navigable URL; the link is placed on the filename. It will always show the most recent version of this script, not the particular snapshot of now. But in fact that is better, since we can improve it over time. Maybe by the end of this paper's referee review we will even be able to include it in Maneage's core branch. I also removed the link to this discussion in the first paragraph of Section IV (proof of concept), since that is just the introduction, and going into this level of detail there could be confusing for the readers. Having the name of the script in the proper place is more direct and understandable for the readers. Thanks again Boud for the nice work on this ;-).
2020-12-02 | URL of statistical verification | Boud Roukema | -2/+2
This commit adds the SWH URL of the statistical verification script to the paper and tidies up the corresponding answer in '1-answer.txt'. The script file includes more extensive documentation than the earlier 'make' version of the method.
2020-12-02 | Modularity in file structure discussed with other minor edits | Mohammad Akhlaghi | -26/+50
While going through Mohammad-reza's recent two commits, I noticed that we had missed an important discussion on modularity in this version of the paper (discussing how file management should also be modular, resulting in cheaper archival and thus better longevity), so a few sentences were added under criterion 2 (Modularity). Mohammad-reza's edits were also generally very good and helped clarify many points. I only reset the part where we discuss the problems with POSIX, and not being able to produce bitwise reproducible software (which systems like Guix work very hard at, and thus need root permissions). I felt the edit missed the main point here (that while bitwise reproducibility of the software is good, it is not always necessary).
2020-12-02 | Modified POSIX related discussions | Mohammadreza Khellat | -15/+14
Before this commit, there were discussions in different sections related to POSIX compliance and features. Since the relevant Completeness criterion has been changed to execution within a Unix-like OS, such discussions had to be modified as well. With this commit, the parts that were related to condition (1) of the Completeness criterion have been modified to be relevant to the new Unix-like OS requirement. Also, a few spelling problems were fixed.
2020-12-02 | Minor modification of Completeness criterion conditions | Mohammadreza Khellat | -6/+6
Before this commit, condition (1) for the Completeness criterion was referring to POSIX compliance. POSIX is a very detailed, dynamic standard that is revised continuously, and not many operating systems (GNU/Linux included) are completely/officially POSIX-compliant. Furthermore, not all sections of the huge 4000-page standard are really important to the current Maneage functionality. With this commit, condition (1) has been replaced by a looser condition of execution within a Unix-like OS. Also, since the term "environment" might have been mistaken for the term "Operating Environment", it was replaced by the unmistakable term "environment variables" in conditions (3) and (5). Last but not least, condition (2) was made more restrictive by adding ASCII encoding as the condition for storing the plain text files. TO-DO: POSIX could contain valuable ideas regarding portability of programming practices. These can be taken advantage of later in providing necessary and sufficient conditions for project completeness. Another idea could be to use the LFS construct or something else as a sharp definition of what we mean by a minimal Unix-like OS.
2020-12-01 | Imported recent work in Maneage, minor conflicts fixed | Mohammad Akhlaghi | -3/+4
Some minor conflicts that came up during the merge were fixed.
2020-12-01 | Default paper: macros available for date of commits cited | Mohammad Akhlaghi | -4/+7
Until now, Maneage only provided the commit hashes (of the project and Maneage) as LaTeX macros to use in your paper. However, they are too cryptic and not really human friendly (unless you have access to the Git history on a computer). With this commit, to make things easier for the readers, the dates of both commits are also available as LaTeX macros for use in the paper. The date of the Maneage commit is also included in the acknowledgements. Also, the paragraph above the acknowledgements has been updated with a better explanation of why adding this acknowledgement in science papers is good/necessary.
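The idea of exposing a commit date as a LaTeX macro can be sketched in shell as follows. This is only a minimal illustration, not the code used in Maneage: the helper name 'latex_macro' and the macro name '\projectdate' are assumptions made for this example, since the log above does not give the real names.

```shell
#!/bin/sh
# latex_macro NAME VALUE: print a LaTeX macro definition.
# (Hypothetical helper; the real Maneage macro names are not shown in this log.)
latex_macro() {
    printf '\\newcommand{\\%s}{%s}\n' "$1" "$2"
}

# Only query Git when we are actually inside a work tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    # Committer date of HEAD in YYYY-MM-DD form.
    latex_macro projectdate "$(git log -1 --format=%cd --date=short HEAD)"
fi
```

The printed line could then be redirected into a TeX file that the paper loads, so the human-readable date sits next to the commit hash.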
2020-12-01 | IMPORTANT: organizational improvements in Maneage TeX sources | Mohammad Akhlaghi | -170/+96
This only concerns the TeX sources in the default branch. In case you don't use them, there should only be a clean conflict in 'paper.tex' (that is obvious and easy to fix). Conflicts may only happen in some of the 'tex/src/preamble-*.tex' files if you have actually changed them for your project. But generally any conflict that does arise by this commit with your project branch should be very clear and easy to fix and test.
In short, from now on things will be even easier: any LaTeX configuration that you want to do for your project can be done in 'tex/src/preamble-project.tex', so you don't have to worry about any other LaTeX preamble file. They are either templates (like the ones for PGFPlots and BibLaTeX) or low-level things directly related to Maneage. Until now, this distinction wasn't too clear.
Here is a summary of the improvements:
- Two new options to './project make': with '--highlight-new' and '--highlight-notes' it is now possible to activate highlighting on the command-line. Until now, there was a LaTeX macro for this at the start of 'paper.tex' (\highlightchanges). But changing that line would change the Git commit hash, making it hard for the readers to trust that this is the same PDF. With these two new run-time options, the printed commit hash will not change.
- paper.tex: the sentences are formatted as one sentence per line (and one line per sentence). This helps in version controlling narrative and following the changes per sentence. A description of this format (and its advantages) is also included in the default text.
- The internal Maneage preambles have been modified:
  - 'tex/src/preamble-header.tex' and 'tex/src/preamble-style.tex' have been merged into one preamble file called 'tex/src/preamble-maneage-default-style.tex'. This helps a lot in simply removing it when you use a journal style file for example.
  - Things like the options to highlight parts of the text are now put in a special 'tex/src/preamble-maneage.tex'. This helps highlight that these are Maneage-specific features that are independent of the style used in the paper.
  - There is a new 'tex/src/preamble-project.tex' that is the place you can add your project-specific customizations.
2020-11-30 | Comments to help clarify the roles of input files in paper.tex | Mohammad Akhlaghi | -1/+4
These can help a first-time reader of 'paper.tex'.
2020-11-30 | New tex/src/preamble-maneage.tex for Maneage-only TeX customization | Mohammad Akhlaghi | -1/+2
Until now, the Maneage-only features of LaTeX were mixed with 'tex/src/preamble-project.tex' (which is reserved for project-specific things). But we want to move the highlighting features (that have started here) into the core Maneage branch, so it's best for these Maneage-specific features to be in a Maneage-specific preamble file. With this commit, a new 'tex/src/preamble-maneage.tex' has been created for this purpose and the highlighting modes have been put in there. In the process, I noticed that 'tex/src/preamble-project.tex' doesn't have a copyright! This has been corrected.
2020-11-30 | Summarized Roberto's CV, further summarized Raul's and Mohammad's | Mohammad Akhlaghi | -9/+6
Roberto sent me his summarized CV which is now being included and I also removed the extra statements about non-degree things from Raul and my own biography (like mentioning Gnuastro, and scientific interests). To be short, we are only mentioning degrees and positions. For Raul, I added his M.Sc institute.
2020-11-30 | Imported improved definition that is made better after discussion | Mohammad Akhlaghi | -5/+5
After Mohammad-reza sent me his commit on an improved definition for longevity, we had an in-depth discussion (through a video-conference) to avoid complexities in the terminology, while staying on point and within the word-count. In this commit/merge, I am including the improved version of the definition of longevity, and the newly added term "functionality" (instead of "usability", which Mohammad-reza was originally objecting to).
2020-11-30 | Minor edit in paragraph on execution time | Mohammad Akhlaghi | -6/+6
The paragraph was slightly shortened, while keeping the main points.
2020-11-30 | Rephrased longevity definition | Mohammadreza Khellat | -3/+4
Before this commit, longevity was defined on the basis of the term usability. Although the scope and context of the term were mentioned right after its use, this could have caused confusion with the keyword "usability" in the field of software engineering. With this commit, the longevity definition has been rephrased in a way that does not require "usability". Furthermore, since longevity would logically require the availability of the machines and platforms during the time of re-use, this has been explicitly mentioned in the definition.
2020-11-28 | Shorter biography for Raul | Raul Infante-Sainz | -3/+2
Following Boud's great suggestion, I also summarized my CV to be less than 40 words.
2020-11-27 | Shorter biography for Mohammad | Mohammad Akhlaghi | -8/+7
Following Boud's great suggestion, I also summarized my CV to be less than 40 words.
2020-11-27 | Shorter CVs for boud+david | Boud Roukema | -10/+4
This commit provides shorter CVs for me (Boud) + David in order to get closer to the 6500 word limit. Our CVs are the least significant part of the paper.
2020-11-27 | Merged with Boud's corrected answers (generally very similar) | Mohammad Akhlaghi | -7/+8
The only issue that still remains is how to address statistical reproducibility, and I am in touch with Boud to do this in the best way possible (it has been highlighted with '#####'s in the answers).
2020-11-26 | All the referee points have been answered | Mohammad Akhlaghi | -15/+17
There is an answer for all the referee points now. I also did some minor edits in the paper. But we are still over the limit by around 250 words. The only remaining point that is not yet addressed (and has '####' around it) is the discussion on parallelization and its effect on reproducibility.
2020-11-26 | All questions have now been responded to | Boud Roukema | -8/+9
This commit is intended to be of submittable quality. Point 56 was removed, and the later points renumbered, because it was a point where Reviewer 5 described what we have done; it was not a criticism to respond to. :) The current word count (without abstract and references) is 6091.
2020-11-25 | Reviewer points 16 to 32 | Boud Roukema | -6/+7
Copyediting of points 16 to 32 (paper.tex + peer-review/1-answer.txt) is done in this commit.
TODO list:
2. paper lacking focus
9. tidy up README-hacking.md for appearance on website
App B.G. similar to Figure ?? - ref missing
29. website: README-hacking.md and tutorial "on same page"
2020-11-25 | Reviewer points 1-15; appendix clickable links | Boud Roukema | -50/+57
This commit updates "paper.tex" and "peer-review/1-answer.txt" for the first 15 (out of 59!) reviewer points, excluding points 2 (not yet done) and 9 (README-hacking.md needs tidying). A fix to "reproduce/analysis/make/paper.mk" for the links in the appendices is also done in this commit (the same algorithm as for paper.tex is added). The links in the appendices are not (yet) clickable.
2020-11-25 | Copyedit; no-abstract word count 6084 | Boud Roukema | -31/+37
This commit tidies up minor aspects of the language in the text marked by "\new", e.g. a "wokflow" would be fine for Chinese cooking, but is a little off-topic for Maneage. :) The word count is reduced by about 7 words. I haven't yet got to the serious part: checking that we've responded to the referees' points, and completing the responses which we haven't yet done.
2020-11-23 | Minor edits and corrections | Mohammad Akhlaghi | -6/+6
Raul's added point on the answer to the referee was very good, so I edited it a little to be more clear (and removed his name). Also, after looking in a few parts of the text, I fixed a few typos.
2020-11-23 | Minor corrections to the final paper document | Raul Infante-Sainz | -20/+17
With this commit, I make several minor changes to the text of the final paper. They are not important, but minor modifications like avoiding contractions (don't -> do not, and so on).
2020-11-23 | First draft of all the points addressed by the referees | Mohammad Akhlaghi | -83/+249
A new directory has been added at the top of the project's source called 'peer-review'. The raw reviews of the paper by the editors and referees have been added there as '1-review.txt'. All the main points raised by the referees have been listed in a numbered list and addressed (mostly) in '1-answers.txt'. The text of the paper now also includes all the implemented answers to the various points.
2020-11-20 | Highlighting changes can now be toggled at run-time | Mohammad Akhlaghi | -7/+0
Until now, the core Maneage 'paper.tex' had a '\highlightchanges' macro that defines two LaTeX macros: '\new' and '\tonote'. When '\highlightchanges' was defined, anything that was written within '\new' became dark green (highlighting new things that have been added). Also, anything that was written in '\tonote' was put within a '[]' and became dark red (to show that there is a note here that should be addressed later). When '\highlightchanges' wasn't defined, anything within the '\new' element would be black (like the rest of the text), and the things in '\tonote' would not be shown at all. Commenting the '\newcommand{\highlightchanges}{}' line within 'paper.tex' (to toggle the modes above) would create a different Git hash and has to be committed. But this different commit hash could create a false sense in the reader that other things have also been changed, and the only way they could confirm was to actually go and look into the project history (which they will not usually have time to do, and thus won't be able to trust the two modes of the text). Also, the added highlights and the note highlights were bundled together into one macro, so you couldn't have only one of them. With this commit, the choice of highlighting either one of the two is now done as two new run-time options to the './project' script (which are passed to the Makefiles, and written into the 'project.tex' file which is loaded into 'paper.tex'). In this way, we can generate two PDFs with the same Git commit (project's state): one with the selected highlights and another one without them. This issue actually came up for me while implementing the changes here: we need to submit one PDF to the journal/referees with highlights on the added features. But we also need to submit another PDF to arXiv and Zenodo without any highlights. If the PDFs have different commit hashes, the referees may associate it with other changes in any part of the work. 
For example https://oadoi.org/10.22541/au.159724632.29528907 that mentions "Another version of the manuscript was published on arXiv: 2006.03018", while the only difference was a few words in the abstract after the journal complained on the abstract word-count of our first submission (where the commit hashes matched with arXiv/Zenodo).
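The run-time toggle described in this commit can be sketched in shell as below. This is only an illustration under stated assumptions: the option names '--highlight-new' and '--highlight-notes' appear in a later commit message in this log, but the function name 'highlight_macros' and the macro names '\highlightnew' and '\highlightnotes' are hypothetical, and the real './project' script may work differently.

```shell
#!/bin/sh
# highlight_macros ARGS...: print one LaTeX definition per requested
# highlighting mode. (Hypothetical helper and macro names.)
highlight_macros() {
    new=0
    notes=0
    for arg in "$@"; do
        case "$arg" in
            --highlight-new)   new=1 ;;
            --highlight-notes) notes=1 ;;
        esac
    done
    if [ "$new" -eq 1 ]; then
        printf '\\newcommand{\\highlightnew}{}\n'
    fi
    if [ "$notes" -eq 1 ]; then
        printf '\\newcommand{\\highlightnotes}{}\n'
    fi
    return 0
}

# Example: only the "new text" highlighting is requested.
highlight_macros --highlight-new
```

Because the choice is passed on the command line and written into a generated file, the tracked sources (and therefore the commit hash) stay the same for both PDFs.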
2020-11-15 | First edits on the newly added appendices in new form | Mohammad Akhlaghi | -306/+336
With the optional appendices added recently to the paper, it was important to go through them and make them more fitting into the paper.
2020-11-04 | Appendix of long paper added, optionally we can disable it | Mohammad Akhlaghi | -3/+997
Given the referee reports, after discussing with the editors of CiSE, we decided that it is important to include the complete appendix we had before that included a thorough review of existing tools and methods. However, the appendix will not be published in the paper (due to the strict word-count limit). It will only be used in the arXiv/Zenodo versions of the paper. This actually created a technical problem: we want the commit hash of the project source to remain the same when the paper is built with an appendix or without it. To fix this problem the choice of including an appendix has gone into the 'project' script as a run-time option called '--no-appendix'. So by default (when someone just runs './project make'), the PDF will have an appendix, but when we want to submit to the journal, or when the appendix isn't needed for a certain reason, we can use this new option. The appendix also has its own separate bibliography. Some other corrections made in this commit: 1. Some new references were added that had an '_' in their source, they were corrected in 'references.tex'. 2. I noticed that 'preamble-style.tex' is not actually used in this paper, so it has been deleted.
2020-09-14 | Add machine class related argument and fix small typos | Mohammadreza Khellat | -14/+12
Before this commit, there were no arguments regarding machine related specifications in the manuscript. This was needed as Mohammad Akhlaghi came across a review of the article by Dylan Aïssi in which Dylan mentioned the need for discussing CPU architecture dependence in pursuing a long-term archivable workflow. With this commit, the required argument has been added in Sec.IV POC: Maneage in the paragraph in which it is explained how 'macro files build the core skeleton of Maneage'. Furthermore, a few typos in different places have been fixed and the 'pre-make-build.sh' has been updated with the latest fix in the Maneage core project.
2020-09-04 | Minor correction in first sentence of abstract | Mohammad Akhlaghi | -2/+1
This paper is generally about data analysis pipelines, so the abstract now starts with "Analysis pipelines" instead of "Reproducible workflows". I also noticed that the sentence was mistakenly broken into multiple lines.
2020-09-03 | Imported recent work in Maneage, minor conflicts fixed | Mohammad Akhlaghi | -1/+0
Only two small conflicts came up:
* The addition of the hardware architecture macro in 'paper.tex' (which was removed for now, but will be added as the referee has requested within the text).
* The usage of "" around directory variables in 'paper.mk'.
2020-09-03 | Added example of DockerHub deleting unused Docker images | Mohammad Akhlaghi | -1/+3
I saw this link today in the news (to be implemented from November 1st, 2020), and because it is directly related to this work, I added it. Many people assume that simply pushing a Docker image to DockerHub is enough to preserve it, but ignore how much it costs to maintain the storage and network capacity.
2020-08-27 | Machine architecture and byte-order available as LaTeX macro | Mohammadreza Khellat | -13/+14
Until now, no machine-related specifications were being documented in the workflow. This information can become helpful when observing differences in the outcome of both software and analysis segments of the workflow by others (some software may behave differently based on host machine). With this commit, the host machine's 'hardware class' and 'byte-order' are collected and now available as LaTeX macros for the authors to use in the paper. Currently it is placed in the acknowledgments, right after mentioning the Maneage commit. Furthermore, the project and configuration scripts are now capable of dealing with input directory names that have SPACE (and other special characters) by putting them inside double-quotes. However, having spaces and metacharacters in the address of the build directory could cause build/install failure for some software source files which are beyond the control of Maneage. So we now check the user's given build directory string, and if the string has any '@', '#', '$', '%', '^', '&', '*', '(', ')', '+', ';', and ' ' (SPACE), it will ask the user to provide a different directory.
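The build-directory check described above can be sketched in shell as follows. This is a minimal sketch and not the code in Maneage's configuration script: the function name 'check_build_dir' and the variable 'bad_chars' are assumptions made for this example; only the list of rejected characters is taken from the commit message.

```shell
#!/bin/sh
# Characters that the commit message lists as problematic in a build directory.
bad_chars='@#$%^&*()+; '

# check_build_dir DIR: succeed (0) if DIR contains none of the characters
# above, fail (1) otherwise. (Hypothetical helper name.)
check_build_dir() {
    case "$1" in
        *[$bad_chars]*) return 1 ;;
        *)              return 0 ;;
    esac
}

# Example: report on two candidate directories.
for dir in "/home/user/build" "/home/user/my build"; do
    if check_build_dir "$dir"; then
        echo "accepted: $dir"
    else
        echo "rejected: $dir (please provide a different directory)"
    fi
done
```

Using a bracket expression inside 'case' keeps the check to shell built-ins, so it does not depend on any external program being installed at configuration time.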