2 files changed, 1961 insertions, 0 deletions
diff --git a/peer-review/1-answer.txt b/peer-review/1-answer.txt
new file mode 100644
index 0000000..55be70a
--- /dev/null
+++ b/peer-review/1-answer.txt
@@ -0,0 +1,1173 @@
+1.  [EiC] Some reviewers request additions, and overview of other
+    tools.
+
+ANSWER: Indeed, there is already a large body of work on various issues that
+have been touched upon in this paper. Before submitting the paper, we had
+already done a very comprehensive review of the tools (as you may notice
+from the Git repository[1]). However, the CiSE Author Information
+explicitly states: "The introduction should provide a modicum of background
+in one or two paragraphs, but should not attempt to give a literature
+review". This is the usual practice in previously published papers at CiSE
+and is in line with the maximum 6250 word-count and maximum of 12
+references to be used in bibliography.
+
+We agree with the need for this extensive review to be on the public record
+(creating the review took a lot of time and effort; most of the tools were
+run and tested). We discussed this with the editors and the following
+solution was agreed upon: the extended reviews will be published as a set
+of appendices in the arXiv[2] and Zenodo[3] pre-prints of this paper. These
+publicly available appendices are also mentioned in the submitted paper so
+that any interested reader of the final paper published by CiSE can easily
+access them.
+
+[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/tex/src/paper-long.tex#L1579
+[2] https://arxiv.org/abs/2006.03018
+[3] https://doi.org/10.5281/zenodo.3872247
+
+------------------------------
+
+
+
+
+
+2.  [Associate Editor] There are general concerns about the paper
+    lacking focus
+
+ANSWER: With all the corrections/clarifications that have been made in this
+review, the focus of the paper should now be clear. We are very grateful for
+the thorough listing of points by the referees.
+
+------------------------------
+
+
+
+
+
+3.  [Associate Editor] Some terminology is not well-defined
+    (e.g. longevity).
+
+ANSWER: Reproducibility, Longevity and Usage have now been explicitly
+defined in the first paragraph of Section II. With this definition, the
+main argument of the paper is clearer, thank you (and thank you to the
+referees for highlighting this).
+
+------------------------------
+
+
+
+
+
+4.  [Associate Editor] The discussion of tools could benefit from some
+    categorization to characterize their longevity.
+
+ANSWER: The longevity of the general tools reviewed in Section II is now
+mentioned immediately after each (VMs, SHARE: discontinued in 2019;
+Docker: 6 months; Python-dependent package managers: a few years;
+Jupyter notebooks: limited by the shortest-lived non-core Python dependency).
+
+------------------------------
+
+
+
+
+
+5.  [Associate Editor] Background and related efforts need significant
+    improvement. (See below.)
+
+ANSWER: This has been done, as mentioned in (1.) above.
+
+------------------------------
+
+
+
+
+
+6.  [Associate Editor] There is consistency among the reviews that
+    related work is particularly lacking.
+
+ANSWER: This has been done, as mentioned in (1.) above.
+
+------------------------------
+
+
+
+
+
+7.  [Associate Editor] The current work needs to do a better job of
+    explaining how it deals with the nagging problem of running on CPU
+    vs. different architectures.
+
+ANSWER: The CPU architecture of the running system is now reported in
+the "Acknowledgments" section and a description of the problem and its
+solution in Maneage is also added and illustrated in the "Proof of
+concept: Maneage" Section.
+
+------------------------------
+
+
+
+
+
+8.  [Associate Editor] At least one review commented on the need to
+    include a discussion of continuous integration (CI) and its
+    potential to help identify problems running on different
+    architectures. Is CI employed in any way in the work presented in
+    this article?
+
+ANSWER: CI has been added in the discussion section (V) as one
+solution to find breaking points in operating system updates and
+new/different architectures. For the core Maneage branch, we have
+defined task #15741 [1] to add CI on many architectures in the near
+future.
+
+[1] http://savannah.nongnu.org/task/?15741
+
+------------------------------
+
+
+
+
+
+9.  [Associate Editor] The presentation of the Maneage tool is both
+    lacking in clarity and consistency with the public
+    information/documentation about the tool. While our review focus
+    is on the article, it is important that readers not be confused
+    when they visit your site to use your tools.
+
+ANSWER: Thank you for raising this important point. We have broken down the
+very long "About" page into multiple pages to help in readability:
+
+https://maneage.org/about.html
+
+Generally, the webpage will soon undergo major improvements to be even more
+clear. The website is developed on a public git repository
+(https://git.maneage.org/webpage.git), so any specific proposals for
+improvements can be handled efficiently and transparently and we welcome
+any feedback in this aspect.
+
+------------------------------
+
+
+
+
+
+10. [Associate Editor] A significant question raised by one review is
+    how this work compares to "executable" papers and Jupyter
+    notebooks.  Does this work embody similar/same design principles
+    or expand upon the established alternatives? In any event, a
+    discussion of this should be included in background/motivation and
+    related work to help readers understand the clear need for a new
+    approach, if this is being presented as new/novel.
+
+ANSWER: Thank you for highlighting this important point. We saw that
+it is necessary to contrast our Maneage proof-of-concept demonstration
+more directly against the Jupyter notebook type of approach. Two
+paragraphs have been added in Sections II and IV to clarify this (our
+criteria require and build in more modularity and longevity than
+Jupyter).
+
+
+------------------------------
+
+
+
+
+
+11. [Reviewer 1] Adding an explicit list of contributions would make
+    it easier to the reader to appreciate these. These are not
+    mentioned/cited and are highly relevant to this paper (in no
+    particular order):
+     1.  Git flows, both in general and in particular for research.
+     2.  Provenance work, in general and with git in particular
+     3.  Reprozip: https://www.reprozip.org/
+     4.  OCCAM: https://occam.cs.pitt.edu/
+     5.  Popper: http://getpopper.io/
+     6.  Whole Tale: https://wholetale.org/
+     7.  Snakemake: https://github.com/snakemake/snakemake
+     8.  CWL https://www.commonwl.org/ and WDL https://openwdl.org/
+     9.  Nextflow: https://www.nextflow.io/
+     10. Sumatra: https://pythonhosted.org/Sumatra/
+     11. Podman: https://podman.io
+     12. AppImage (https://appimage.org/)
+     13. Flatpack (https://flatpak.org/)
+     14. Snap (https://snapcraft.io/)
+     15. nbdev https://github.com/fastai/nbdev and jupytext
+     16. Bazel: https://bazel.build/
+     17. Debian reproducible builds: https://wiki.debian.org/ReproducibleBuilds
+
+ANSWER:
+
+1.  In Section IV, we have added that "Generally, any git flow (branching
+    strategies) can be used by the high-level project authors or future
+    readers."
+2.  We have mentioned research objects as one mode of provenance tracking.
+    The related provenance work that has already been done, and that can be
+    exploited using these criteria and our proof of concept, is indeed very
+    large. However, the 6250 word-count limit is very tight, and if we added
+    more on it within this length, we would have to remove points of higher
+    priority. Hopefully this can be the subject of a follow-up paper.
+3.  A review of ReproZip is in Appendix C.
+4.  A review of Occam is in Appendix C.
+5.  A review of Popper is in Appendix C.
+6.  A review of Whole Tale is in Appendix C.
+7.  A review of Snakemake is in Appendix B.
+8.  CWL and WDL are described in Appendix B (Job management).
+9.  Nextflow is described in Appendix B (Job management).
+10. Sumatra is described in Appendix C.
+11. Podman is mentioned in Appendix B (Containers).
+12. AppImage is mentioned in Appendix B (Package management).
+13. Flatpak is mentioned in Appendix B (Package management).
+14. Snap is mentioned in Appendix B (Package management).
+15. nbdev and jupytext are high-level tools to generate documentation and
+    package custom code in Conda or PyPI. High-level package managers
+    like Conda and PyPI have already been thoroughly reviewed in Appendix A
+    for their longevity issues, so we feel that there is no need to
+    include these.
+16. Bazel is mentioned in Appendix B (job management).
+17. Debian's reproducible builds are only designed for ensuring that software
+    packaged for Debian is bitwise reproducible. As mentioned in the
+    discussion section of this paper, the bitwise reproducibility of software is
+    not an issue in the context discussed here; the reproducibility of the
+    relevant output data of the software is the main issue.
+
+------------------------------
+
+
+
+
+
+12. [Reviewer 1] Existing guidelines similar to the proposed "Criteria
+    for longevity". Many articles of these in the form "10 simple
+    rules for X", for example (not exhaustive list):
+     * https://doi.org/10.1371/journal.pcbi.1003285
+     * https://arxiv.org/abs/1810.08055
+     * https://osf.io/fsd7t/
+     * A model project for reproducible papers: https://arxiv.org/abs/1401.2000
+     * Executable/reproducible paper articles and original concepts
+
+ANSWER: Thank you for highlighting these points. Appendix C starts with a
+subsection titled "suggested rules, checklists or criteria" with a review of
+existing sets of criteria. This subsection includes the sources proposed
+by the reviewer [Sandve et al; Rule et al; Nust et al] (and others).
+
+ArXiv:1401.2000 has been added in Appendix B as an example paper using
+virtual machines. We thank the referee for bringing up this paper, because
+the link to the VM provided in the paper no longer works (the URL
+http://archive.comp-phys.org/provenance_challenge/provenance_machine.ova
+redirects to
+https://share.phys.ethz.ch//~alpsprovenance_challenge/provenance_machine.ova
+which gives a 'Not Found' html response). Together with SHARE, this very nicely
+highlights our main issue with binary containers or VMs: their lack of
+longevity.
+
+------------------------------
+
+
+
+
+
+13. [Reviewer 1] Several claims in the manuscript are not properly
+    justified, neither in the text nor via citation. Examples (not
+    exhaustive list):
+     1. "it is possible to precisely identify the Docker “images” that
+        are imported with their checksums, but that is rarely practiced
+        in most solutions that we have surveyed [which ones?]"
+     2. "Other OSes [which ones?] have similar issues because pre-built
+        binary files are large and expensive to maintain and archive."
+     3. "Researchers using free software tools have also already had
+        some exposure to it"
+     4. "A popular framework typically falls out of fashion and
+        requires significant resources to translate or rewrite every
+        few years."
+
+ANSWER: These points have been clarified in the highlighted parts of the text:
+
+1. Many examples have been given throughout the newly added
+   appendices. To avoid confusion in the main body of the paper, we
+   have removed the "we have surveyed" part. It is already mentioned
+   above this point in the text that a large survey of existing
+   methods/solutions is given in the appendices.
+
+2. Due to the thorough discussion of this issue in the appendices with
+   precise examples, this line has been removed to allow space for the
+   other points raised by the referees. The main point (high cost of
+   keeping binaries) is already abundantly clear.
+
+   On a similar topic, Dockerhub's recent announcement that inactive images
+   (for over 6 months) will be deleted has also been added. The announcement
+   URL is here (it was too long to include in the paper; if IEEE has a
+   special short-url format, we can add it):
+   https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates
+
+3. A small statement has been added, reminding the readers that almost all
+   free software projects are built with Make (CMake is popular, but it is just a
+   high-level wrapper over Make: it finally produces a 'Makefile'; practical
+   usage of CMake generally obliges the user to understand Make).
+
+4. The example of Python 2 has been added.
+
+
+------------------------------
+
+
+
+
+
+14. [Reviewer 1] As mentioned in the discussion by the authors, not
+    even Bash, Git or Make is reproducible, thus not even Maneage can
+    address the longevity requirements. One possible alternative is
+    the use of CI to ensure that papers are re-executable (several
+    papers have been written on this topic). Note that CI is
+    well-established technology (e.g. Jenkins is almost 10 years old).
+
+ANSWER: Thank you for raising these issues. We had initially planned to
+discuss CIs, but like many discussion points, we were forced to remove
+it before the first submission due to the very tight word-count limit. We
+have now added a sentence on CI in the discussion.
+
+On the issue of Bash/Git/Make, indeed, the _executable_ Bash, Git and
+Make binaries are not bitwise reproducible/identical on different
+systems. However, as mentioned in the discussion, we are concerned
+with the _output_ of the software's executable file, _after_ the
+execution of its job. We (or any user of Bash) are not interested in
+the executable file itself. The reproducibility of the binary file
+only becomes important if a significant bug is found (very rare for
+ordinary usage of such core software of the OS).  Hence, even though
+the compiled binary files of specific versions of Git, Bash or Make
+will not be bitwise reproducible/identical on different systems, their
+scientific outputs are exactly reproducible: 'git describe' or Bash's
+'for' loop will have the same output on GNU/Linux, macOS/Darwin or
+FreeBSD (despite having bit-wise different executables).
+
+------------------------------
+
+
+
+
+
+15. [Reviewer 1] Criterion has been proposed previously. Maneage itself
+    provides little novelty (see comments below).
+
+ANSWER: The previously suggested sets of criteria that were listed by
+Reviewer 1 are reviewed by us in the newly added Appendix C, and the
+novelty and advantages of our proposed criteria are contrasted there
+with the earlier sets of criteria.
+
+------------------------------
+
+
+
+
+
+16. [Reviewer 2] Authors should add indication that using good practices it
+    is possible to use Docker or VM to obtain identical OS usable for
+    reproducible research.
+
+ANSWER: In the submitted version we had stated that "Ideally, it is
+possible to precisely identify the Docker “images” that are imported with
+their checksums ...". But to be more clear and go directly to the point, it
+has been edited to explicitly say "... to recreate an identical OS image
+later".
+
+------------------------------
+
+
+
+
+
+17. [Reviewer 2] The CPU architecture of the platform used to run the
+    workflow is not discussed in the manuscript. Authors should probably
+    take into account the architecture used in their workflow or at least
+    report it.
+
+ANSWER: Thank you very much for raising this important point. We hadn't
+seen other reproducibility papers mention this important point and missed
+it. In the acknowledgments (where we also mention the commit hashes) we now
+explicitly mention the exact CPU architecture used to build this paper:
+"This project was built on an x86_64 machine with Little Endian byte-order
+and address sizes 39 bits physical, 48 bits virtual.". This is because we
+have already seen cases where the architecture is the same, but programs
+fail because of the byte order.
+
+Generally, Maneage will now extract this information from the running
+system during its configuration phase and provide the users with three
+different LaTeX macros that they can use anywhere in their paper.
+
+------------------------------
+
+
+
+
+
+18. [Reviewer 2] I don’t understand the "no dependency beyond
+    POSIX". Authors should more explained what they mean by this sentence.
+
+ANSWER: This has been clarified with the short extra statement "a minimal
+Unix-like standard that is shared between many operating systems". We would
+have liked to explain this more, but the word limit is very constraining.
+
+------------------------------
+
+
+
+
+
+19. [Reviewer 2] Unfortunately, sometime we need proprietary or specialized
+    software to read raw data... For example in genetics, micro-array raw
+    data are stored in binary proprietary formats. To convert this data
+    into a plain text format, we need the proprietary software provided
+    with the measurement tool.
+
+ANSWER: Thank you very much for this good point. A description of a
+possible solution to this has been added after criterion 8.
+
+------------------------------
+
+
+
+
+
+20. [Reviewer 2] I was not able to properly set up a project with
+    Maneage. The configuration step failed during the download of tools
+    used in the workflow. This is probably due to a firewall/antivirus
+    restriction out of my control. How frequent this failure happen to
+    users?
+
+ANSWER: Thank you for mentioning this. This has been fixed by archiving all
+Maneage'd software on Zenodo (https://doi.org/10.5281/zenodo.3883409) and
+also downloading from there.
+
+Until recently we would directly access each software's own webpage to
+download the source files, and this caused frequent problems of this sort. In other
+cases, we were very frustrated when a software's webpage would temporarily
+be unavailable (for maintenance reasons); this would be a hindrance in
+trying to build new projects.
+
+Since all the software is free-licensed, we are legally allowed to
+re-distribute it (within the conditions, such as not removing copyright
+notices) and Zenodo is designed for long-term archival of
+academic digital objects, so we decided that a software source code
+repository on Zenodo would be the most reliable solution. At configure
+time, Maneage now accesses Zenodo's DOI and resolves the most recent
+URL to automatically download any necessary software source code that
+the project needs from there.
+
+Generally, we also keep all software in a Git repository on our own
+webpage: http://git.maneage.org/tarballs-software.git/tree. Maneage
+users can also specify their own custom URLs for downloading software,
+which will be given higher priority than Zenodo (useful for situations when
+custom software is downloaded and built in a project branch, not the core
+'maneage' branch).
+
+------------------------------
+
+
+
+
+
+21. [Reviewer 2] The time to configure a new project is quite long because
+    everything needs to be compiled. Authors should compare the time
+    required to set up a project Maneage versus time used by other
+    workflows to give an indication to the readers.
+
+ANSWER: Thank you for raising this point. It takes about 1.5 hours to
+configure the default Maneage branch on an 8-core CPU (more than half of
+this time is devoted to GCC on GNU/Linux operating systems, and the
+building of GCC can optionally be disabled with the '--host-cc' option to
+significantly speed up the build when the host's GCC is
+similar). Furthermore, Maneage can be built within a Docker container.
+
+A paragraph has been added in Section IV on this issue (the
+build time and building within a Docker container). We have also defined
+task #15818 [1] to have our own core Docker image that is ready to build a
+Maneaged project and will be adding it shortly.
+
+[1] https://savannah.nongnu.org/task/index.php?15818
+
+------------------------------
+
+
+
+
+
+22. [Reviewer 3] Authors should define their use of the term [Replicability
+    or Reproducibility] briefly for their readers.
+
+ANSWER: "Reproducibility" has been defined along with "Longevity" and
+"usage" at the start of Section II.
+
+------------------------------
+
+
+
+
+
+23. [Reviewer 3] The introduction is consistent with the proposal of the
+    article, but deals with the tools separately, many of which can be used
+    together to minimize some of the problems presented. The use of
+    Ansible, Helm, among others, also helps in minimizing problems.
+
+ANSWER: Ansible and Helm are primarily designed for distributed
+computing. For example Helm is just a high-level package manager for a
+Kubernetes cluster that is based on containers. A review of them could be
+added to the Appendices, but we feel that this would distract somewhat
+from the main points of our current paper.
+
+------------------------------
+
+
+
+
+
+24. [Reviewer 3] When the authors use the Python example, I believe it is
+    interesting to point out that today version 2 has been discontinued by
+    the maintaining community, which creates another problem within the
+    perspective of the article.
+
+ANSWER: Thank you very much for highlighting this point. We had excluded
+this point for the sake of article length, but we have restored it in
+the introduction of the revised version.
+
+------------------------------
+
+
+
+
+
+25. [Reviewer 3] Regarding the use of VM's and containers, I believe that
+    the discussion presented by THAIN et al., 2015 is interesting to
+    increase essential points of the current work.
+
+ANSWER: Thank you very much for pointing out the works by Thain. We
+couldn't find any first-author papers in 2015, but found Meng & Thain
+(https://doi.org/10.1016/j.procs.2017.05.116) which had a related
+discussion of why they didn't use Docker containers in their work. That
+paper is now cited in the discussion of Containers in Appendix B.
+
+------------------------------
+
+
+
+
+
+26. [Reviewer 3] About the Singularity, the description article was missing
+    (Kurtzer GM, Sochat V, Bauer MW, 2017).
+
+ANSWER: Thank you for the reference. We are restricted in the main
+body of the paper due to the strict bibliography limit of 12
+references; we have included Kurtzer et al 2017 in Appendix B (where
+we discuss Singularity).
+
+------------------------------
+
+
+
+
+
+27. [Reviewer 3] I also believe that a reference to FAIR is interesting
+    (WILKINSON et al., 2016).
+
+ANSWER: The FAIR principles have been mentioned in the main body of the
+paper, but unfortunately we had to remove its citation in the main paper (like
+many others) to keep to the maximum of 12 references. We have cited it in
+Appendix C.
+
+------------------------------
+
+
+
+
+
+28. [Reviewer 3] In my opinion, the paragraph on IPOL seems to be out of
+    context with the previous ones. This issue of end-to-end
+    reproducibility of a publication could be better explored, which would
+    further enrich the tool presented.
+
+
+ANSWER: We agree and have removed the IPOL example from that section.
+We have included an in-depth discussion of IPOL in Appendix C and we
+comment on the readiness of Maneage'd projects for a similar level of
+peer-review control.
+
+------------------------------
+
+
+
+
+
+29. [Reviewer 3] On the project website, I suggest that the information
+    contained in README-hacking be presented on the same page as the
+    Tutorial. A topic breakdown is interesting, as the markdown reading may
+    be too long to find information.
+
+ANSWER: Thank you very much for this good suggestion, it has been
+implemented: https://maneage.org/about.html . The webpage will continuously
+be improved and such feedback is always very welcome.
+
+------------------------------
+
+
+
+
+
+31. [Reviewer 3] The tool is suitable for Unix users, keeping users away
+    from Microsoft environments.
+
+ANSWER: The issue of building on Windows has been discussed in Section IV,
+either using Docker (or VMs) or using the Windows Subsystem for Linux.
+
+------------------------------
+
+
+
+
+32. [Reviewer 3] Important references are missing; more references are
+    needed
+
+ANSWER: Two comprehensive Appendices have been added to address this issue.
+
+------------------------------
+
+
+
+
+
+33. [Reviewer 4] Revisit the criteria, show how you have come to decide on
+    them, give some examples of why they are important, and address
+    potential missing criteria.
+
+ANSWER: Our selection of the criteria and their importance are
+questions of the philosophy of science: "what is good science? what
+should reproducibility aim for?" We feel that completeness;
+modularity; minimal complexity; scalability; verifiability of inputs
+and outputs; recording of the project history; linking of narrative
+to analysis; and the right to use, modify, and redistribute
+scientific software in original or modified form; constitute a set
+of criteria that should uncontroversially be seen as "important"
+from a wide range of ethical, social, political, and economic
+perspectives.  An exception is probably the issue of proprietary
+versus free software (criterion 8), on which debate is far from
+closed.
+
+Within the constraints of space (the limit is 6250 words), we don't
+see how we could add more discussion of the history of our choice of
+criteria or more anecdotal examples of their relevance.
+
+We do discuss some alternative lists of criteria in Appendix C.A,
+without debating the wider perspective of which criteria are the
+most desirable.
+
+------------------------------
+
+
+
+
+
+34. [Reviewer 4] Clarify the discussion of challenges to adoption and make
+    it clearer which tradeoffs are important to practitioners.
+
+ANSWER: We discuss many of these challenges and caveats in the Discussion
+Section (V), within the existing word limit.
+
+------------------------------
+
+
+
+
+
+35. [Reviewer 4] Be clearer about which sorts of research workflow are best
+    suited to this approach.
+
+ANSWER: Maneage is flexible enough to enable a wide range of
+workflows to be implemented. This is done by leveraging the
+highly modular and flexible nature of Makefiles run via 'Make'.
+
+------------------------------
+
+
+
+
+
+36. [Reviewer 4] There is also the challenge of mathematical
+    reproducibility, particularly of the handling of floating point number,
+    which might occur because of the way the code is written, and the
+    hardware architecture (including if code is optimised / parallelised).
+
+ANSWER: Floating point errors and optimizations have been mentioned in the
+discussion (Section V). The issue with parallelization has also been
+discussed in Section IV, in the part on verification ("Where exact
+reproducibility is not possible (for example due to parallelization),
+values can be verified by a statistical method specified by the project
+authors."). We have linked keywords in the latter sentence to a Software
+Heritage URI [1] with the specific file in a Maneage'd paper that
+illustrates an example of how statistical verification of parallelised code
+can work in practice (Peper & Roukema 2020; zenodo.4062460).
+
+We would be interested to hear if any other papers already exist that use
+automatic statistical verification of parallelised code as has been done in
+this Maneage'd paper.
+
+[1] https://archive.softwareheritage.org/browse/origin/content/?branch=refs/heads/postreferee_corrections&origin_url=https://codeberg.org/boud/elaphrocentre.git&path=reproduce/analysis/bash/verify-parameter-statistically.sh
+
+------------------------------
+
+
+
+
+
+37. [Reviewer 4] ... the handling of floating point number
+    [reproducibility] ...  will come with a tradeoff against
+    performance, which is never mentioned.
+
+ANSWER: The criteria we propose and the proof-of-concept with Maneage do
+not force the choice of a tradeoff between exact bitwise floating point
+reproducibility versus performance (e.g. speed). The specific concepts of
+"verification" and "reproducibility" will vary between domains of
+scientific computation, but we expect that the criteria allow this wide
+range.
+
+Performance is indeed an important issue for _immediate_ reproducibility
+and we would have liked to discuss it. But due to the strict word-count, we
+feel that adding it to the discussion points, without having adequate space
+to elaborate, can distract the readers from the focus of this paper
+(which is long-term usability). It has therefore not been added.
+
+------------------------------
+
+
+
+
+
+38. [Reviewer 4] Tradeoff, which might affect Criterion 3 is time to result,
+    people use popular frameworks because it is easier to use them.
+
+ANSWER: That is true. In section IV, we have given the time it takes to
+build Maneage (only once on each computer) to be around 1.5 hours on an
+8-core CPU (a typical machine that may be used for data analysis). We
+therefore conclude that when the analysis is complex (and thus taking many
+hours or days to complete), this time is negligible.
+
+But if the project's full analysis takes 10 minutes or less (like the
+extremely simple analysis done in this paper, which takes a fraction of a
+second), the 1.5 hour building time is indeed significant. In those cases,
+as discussed in the main body, the project can be built once in a Docker
+image and easily moved to other computers.
+
+Generally, it is true that the initial configuration time (only once on
+each computer) of a Maneage install may discourage some scientists; but a
+serious scientific research project is never started and completed on a
+time scale of a few hours.
+
+------------------------------
+
+
+
+
+
+39. [Reviewer 4] I would liked to have seen explanation of how these
+    challenges to adoption were identified: was this anecdotal, through
+    surveys? participant observation?
+
+ANSWER: The results mentioned here are anecdotal: based on private
+discussions after holding multiple seminars and Webinars with RDA's
+support, and also a workshop that was planned for
+non-astronomers. We invited (funded) early career researchers to
+come to the workshop with the RDA funding.  However, that workshop
+was cancelled due to the pandemic and we had private communications
+instead.
+
+We would very much like to elaborate on this experience of training new
+researchers with these tools. However, as with many of the cases above, the
+very strict word-limit doesn't allow us to elaborate beyond what we have
+already written.
+
+------------------------------
+
+
+
+
+
+40. [Reviewer 4] Potentially an interesting sidebar to investigate how
+    LaTeX/TeX has ensured its longevity!
+
+ANSWER: That is indeed a very interesting subject to study (an obvious link
+is that LaTeX/TeX is very strongly based on plain text files). We have been
+in touch with Karl Berry (one of the core people behind TeX Live, who also
+plays a prominent role in GNU) and have witnessed the TeX Live community's
+efforts to become more and more portable and longer-lived.
+
+However, as the reviewer states, this would be a sidebar, and we are
+constrained for space, so we couldn't find a place to highlight this. But
+it is indeed a subject worthy of a full paper (that can be very useful for
+many software projects).
+
+------------------------------
+
+
+
+
+
+41. [Reviewer 4] The title is not specific enough - it should refer to the
+    reproducibility of workflows/projects.
+
+ANSWER: A problem here is that "workflow" and "project" taken in isolation
+risk being vague for wider audiences. Also, we aim at covering a wider
+range of aspects of a project than just the workflow alone; in the
+other direction, the word "project" could be seen as too broad, including
+the funding, principal investigator, and team coordination.
+
+A specific title that might be appropriate could be, for example, "Towards
+long-term and archivable reproducibility of scientific computational
+research projects". Using a term proposed by one of our reviewers, "Towards
+long-term and archivable end-to-end reproducibility of scientific
+computational research projects" might also be appropriate.
+
+Nevertheless, we feel that in the context of an article published in CiSE,
+our current short title is sufficient.
+
+------------------------------
+
+
+
+
+
+42. [Reviewer 4] Whilst the thesis stated is valid, it may not be useful to
+    practitioners of computation science and engineering as it stands.
+
+ANSWER: This point appears to refer to floating point bitwise reproducibility
+and possibly to the conciseness of our paper. The former is fully allowed
+for, as stated above, though not obligatory, using the "verify.mk" rule
+file to (typically, but not obligatorily) force bitwise reproducibility.
+The latter is constrained by the 6250-word limit. The addition of appendices
+in the extended version may help respond to the latter point.
+
+The small number of existing research projects using Maneage,
+as indicated in the revised version of our paper, includes
+papers outside of observational astronomy (which is the first
+author's main background). The fact that the approach is precisely
+defined for computational science and engineering problems where
+_publication_ of the human-readable workflow source is also
+important may partly respond to this issue.
+
+------------------------------
+
+
+
+
+
+43. [Reviewer 4] Longevity is not defined.
+
+ANSWER: This has been defined now at the start of Section II.
+
+------------------------------
+
+
+
+
+
+44. [Reviewer 4] Whilst various tools are discussed and discarded, no
+    attempt is made to categorise the magnitude of longevity for which they
+    are relevant. For instance, environment isolators are regarded by the
+    software preservation community as adequate for timescale of the order
+    of years, but may not be suitable for the timescale of decades where
+    porting and emulation are used.
+
+ANSWER: Statements on quantifying the longevity of specific tools
+have been added in Section II. For example in the case of Docker
+images: "their longevity is determined by the host kernel, usually a
+decade", for Python packages: "Python installation with a usual
+longevity of a few years", for Nix/Guix: "with considerably better
+longevity; same as supported CPU architectures."
+
+------------------------------
+
+
+
+
+
+45. [Reviewer 4] The title of this section "Commonly used tools and their
+    longevity" is confusing - do you mean the longevity of the tools or the
+    longevity of the workflows that can be produced using these tools?
+    What happens if you use a combination of all four categories of tools?
+
+ANSWER: We have changed the section title to "Longevity of existing tools"
+to clarify that we refer to longevity of the tools.
+
+If the four categories of tools were combined, then the overall
+longevity would be the intersection of the time spans over which
+the tools remained viable: it is limited by the shortest-lived tool.
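+
+As an illustrative sketch (the category names and year ranges below are
+hypothetical placeholders, not values from the paper), the combined span
+starts at the latest start and ends at the earliest end of the individual
+spans:

```python
def combined_viability(spans):
    """Return the (start, end) years over which every tool is viable.

    `spans` maps a tool category to its (first_year, last_year) of viability.
    Returns None when the spans do not overlap at all.
    """
    start = max(first for first, _ in spans.values())
    end = min(last for _, last in spans.values())
    return (start, end) if start <= end else None


# Hypothetical viability spans, for illustration only.
spans = {
    "environment": (2014, 2024),
    "package_manager": (2016, 2020),
    "job_manager": (2010, 2040),
    "notebook": (2015, 2021),
}

print(combined_viability(spans))
```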
+
+------------------------------
+
+
+
+
+
+46. [Reviewer 4] It wasn't clear to me if code was being run to generate
+    the results and figures in a LaTeX paper that is part of a project in
+    Maneage. It appears to be suggested this is the case, but Figure 1
+    doesn't show how this works - it just has the LaTeX files, the data
+    files and the Makefiles. Is it being suggested that LaTeX itself is the
+    programming language, using its macro functionality?
+
+ANSWER: Thank you for highlighting this point of confusion. The caption of
+Figure 1 has been edited to hopefully clarify the point. In short, the
+arrows represent the operation of software on their inputs (the file they
+originate from) to generate their outputs (the file they point to). In the
+case of generating 'paper.pdf' from its three dependencies
+('references.tex', 'paper.tex' and 'project.tex'), yes, LaTeX is used. But
+in other steps, other tools are used. For example as you see in [1] the
+main step of the arrow connecting 'table-3.txt' to 'tools-per-year.txt' is
+an AWK command (there are also a few 'echo' commands for meta data and
+copyright in the output plain-text file [2]).
+
+[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/reproduce/analysis/make/demo-plot.mk#L51
+[2] https://zenodo.org/record/3911395/files/tools-per-year.txt
+
+------------------------------
+
+
+
+
+
+47. [Reviewer 4] I was a bit confused on how collaboration is handled as
+    well - this appears to be using the Git branching model, and the
+    suggestion that Maneage is keeping track of all components from all
+    projects - but what happens if you are working with collaborators that
+    are using their own Maneage instance?
+
+ANSWER: Indeed, Maneage operates based on the Git branching model. As
+mentioned in the text, Maneage is itself a Git branch. People create their
+own branch from the 'maneage' branch and start customizing it for their
+particular project in their own particular repository. They can also use
+all types of Git-based collaborating models to work together on a project
+that is not yet finished.
+
+Figure 2 in fact explicitly shows such a case: the main project leader is
+committing on the "project" branch. But a collaborator creates a separate
+branch over commit '01dd812' and makes a couple of commits ('f69e1f4' and
+'716b56b'), and finally asks the project leader to merge them into the
+project. This can be generalized to any Git based collaboration model.
+
+Recent experience by one of us [Roukema] found that a merge of a
+Maneage-based cosmology simulation project (now zenodo.4062460),
+after separate evolution of about 30-40 commits on maneage and
+possibly 100 on the project, needed about one day of straightforward
+effort, without any major difficulties.
+
+------------------------------
+
+
+
+
+
+48. [Reviewer 4] I would also [have] liked to have seen a comparison
+    between this approach and other "executable" paper approaches
+    e.g. Jupyter notebooks, compared on completeness, time taken to
+    write a "paper", ease of depositing in a repository, and ease of
+    use by another researcher.
+
+ANSWER: This type of sociological survey will make sense once the number of
+projects run with Maneage is sufficiently high. The time taken to write a
+paper should be measurable automatically: from the git history. The other
+parameters suggested would require cooperation from the scientists in
+responding to the survey, or will have to be collected anecdotally in the
+short term.
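+
+As a minimal sketch of such a measurement (an assumption on our part, not
+a feature of Maneage; the function names are hypothetical), the elapsed
+time between the first and last commits can be read from 'git log'. This
+gives wall-clock time between commits, not the hours actually worked:

```python
import subprocess
from datetime import datetime


def commit_times(repo="."):
    """Return the author dates of all commits reachable from HEAD in `repo`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%aI"],  # %aI: strict ISO 8601 author date
        check=True, capture_output=True, text=True,
    ).stdout
    return [datetime.fromisoformat(line) for line in out.splitlines() if line]


def elapsed_days(times):
    """Days between the earliest and the latest timestamp in `times`."""
    return (max(times) - min(times)).total_seconds() / 86400.0


if __name__ == "__main__":
    # Run inside a project's Git repository.
    print(f"{elapsed_days(commit_times()):.1f} days from first to last commit")
```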
+
+------------------------------
+
+
+
+
+
+49. [Reviewer 4] The weakest aspect is the assumption that research can be
+    easily compartmentalized into simple and complete packages. Given that
+    so much of research involves collaboration and interaction, this is not
+    sufficiently addressed. In particular, the challenge of
+    interdisciplinary work, where there may not be common languages to
+    describe concepts and there may be different common workflow practices
+    will be a barrier to wider adoption of the primary thesis and criteria.
+
+ANSWER: Maneage was precisely defined to address the problem of
+publishing/collaborating on complete workflows. Hopefully with the
+clarification to point 47 above, this should also become clear.
+
+------------------------------
+
+
+
+
+
+50. [Reviewer 5] Major figures currently working in this exact field do not
+    have their work acknowledged in this work.
+
+ANSWER: This was due to the strict word limit and the CiSE
+publication policy (to not include a literature review because there
+is a limit of only 12 citations). But we had indeed already done a
+comprehensive literature review and the editors kindly agreed that
+we publish that review as appendices to the main paper on arXiv and
+Zenodo.
+
+------------------------------
+
+
+
+
+
+51. [Reviewer 5] Jimenez I et al ... 2017 "The popper convention: Making
+    reproducible systems evaluation practical ..." and the later
+    revision that uses GitHub Actions, is largely the same as this
+    work.
+
+ANSWER: This work and the proposed criteria are very different from
+Popper. We agree that VMs and containers are an important component
+of this field, and the appendices add depth to our discussion of this.
+However, these do not appear to satisfy all our proposed criteria.
+A detailed review of Popper, in particular, is given in Appendix C.
+
+------------------------------
+
+
+
+
+
+52. [Reviewer 5] The lack of attention to virtual machines and containers
+    is highly problematic. While a reader cannot rely on DockerHub or a
+    generic OS version label for a VM or container, these are some of the
+    most promising tools for offering true reproducibility.
+
+ANSWER: Containers and VMs have been more thoroughly discussed in
+the main body and also extensively discussed in Appendix B (which is
+now available in the arXiv and Zenodo versions of this paper). As
+discussed (with many cited examples), containers and VMs are only
+appropriate when they are themselves reproducible (for example, if
+running the Dockerfile this year and next year gives the same
+internal environment). However, we show that this is not the case in
+most solutions (a more comprehensive review would require its own
+paper).
+
+Moreover, with complete, robust environment builders like Maneage, Nix or GNU
+Guix, the analysis environment within a container can be exactly reproduced
+later. But even so, due to their binary nature and large storage volume,
+they are not trustworthy sources for the long term (it is expensive to archive
+them). We show several examples in the paper of how projects that relied on
+VMs in 2011 and 2014 are no longer active, and how even Dockerhub will be
+deleting containers that are not used for more than 6 months in free
+accounts (due to the high storage costs).
+
+Furthermore, as a unique new feature, Maneage has the criterion of
+"Minimal complexity". This means that even if for any reason the
+project is not able to be run in the future, the content, analysis
+scripts, etc. are accessible for the interested reader since they
+are stored as plain text (only the development history - the git
+history - is stored in git's binary format). Unlike Nix or Guix,
+our approach doesn't need a third-party package manager: the
+instructions for building all the software of a project are directly
+in the same project as the high-level analysis software. The full
+end-to-end process is transparent in our case, and the interested
+scientist can follow the analysis and study the different decisions
+of each step (why and how the analysis was done).
+
+------------------------------
+
+
+
+
+
+53. [Reviewer 5] On the data side, containers have the promise to manage
+    data sets and workflows completely [Lofstead J, Baker J, Younge A. Data
+    pallets: containerizing storage for reproducibility and
+    traceability. In International Conference on High Performance Computing
+    2019 Jun 16 (pp. 36-45). Springer, Cham.] Taufer has picked up this
+    work and has graduated a MS student working on this topic with a
+    published thesis. See also Jimenez's P-RECS workshop at HPDC for
+    additional work highly relevant to this paper.
+
+ANSWER: Thank you for the interesting paper by Lofstead+2019 on Data
+pallets. We have cited it in Appendix B as an example of how generic the
+concept of containers is.
+
+The topic of linking data to analysis is also a core result of the criteria
+presented here, and is also discussed briefly in our paper.  There are
+indeed many very interesting works on this topic. But the format of CiSE is
+very short (a maximum of 6250 words with 12 references), so we don't have
+the space to go into this any further. But this is indeed a very
+interesting aspect for follow-up studies, especially as usage of
+Maneage grows, and we have more example workflows by users to study the
+linkage of data analysis.
+
+------------------------------
+
+
+
+
+
+54. [Reviewer 5] Some other systems that do similar things include:
+    reprozip, occam, whole tale, snakemake.
+
+ANSWER: All these tools have been reviewed in the newly added appendices.
+
+------------------------------
+
+
+
+
+
+55. [Reviewer 5] the paper needs to include the context of the current
+    community development level to be a complete research paper. A revision
+    that includes evaluation of (using the criteria) and comparison with
+    the suggested systems and a related work section that seriously
+    evaluates the work of the recommended authors, among others, would make
+    this paper worthy for publication.
+
+ANSWER: A thorough review of current low-level tools and high-level
+reproducible workflow management systems has been added in the extended
+Appendices.
+
+------------------------------
+
+
+
+
+
+
+56. [Reviewer 5] Yet another example of a reproducible workflows project.
+
+ANSWER: As the newly added thorough comparisons with existing systems
+show, this set of criteria and the proof-of-concept offer uniquely new
+features. As another referee summarized: "This manuscript describes a new
+reproducible workflow which doesn't require another new trendy high-level
+software. The proposed workflow is only based on low-level tools already
+widely known."
+
+The fact that we don't define yet another workflow language and framework
+and base the whole workflow on time-tested solutions in a framework that
+costs only ~100 kB to archive (in contrast to multi-GB containers or VMs)
+is new.
+
+------------------------------
+
+
+
+
+
+57. [Reviewer 5] There are numerous examples, mostly domain specific, and
+    this one is not the most advanced general solution.
+
+ANSWER: As the comparisons in the appendices and clarifications above show,
+there are many features in the proposed criteria and proof of concept that
+are new and not satisfied by the domain-specific solutions known to us.
+
+------------------------------
+
+
+
+
+
+58. [Reviewer 5] Lack of context in the field missing very relevant work
+    that eliminates much, if not all, of the novelty of this work.
+
+ANSWER: The newly added appendices thoroughly describe the context and
+previous work that has been done in this field.
+
+------------------------------
diff --git a/peer-review/1-review.txt b/peer-review/1-review.txt
new file mode 100644
index 0000000..16e227b
--- /dev/null
+++ b/peer-review/1-review.txt
@@ -0,0 +1,788 @@
+From: cise computer org
+To: mohammad akhlaghi org,
+    infantesainz gmail com,
+    boud astro uni torun pl,
+    david valls-gabaud observatoiredeparis psl eu,
+    rbaena iac es
+Received: Tue, 22 Sep 2020 15:28:21 -0400
+Subject: Computing in Science and Engineering, CiSESI-2020-06-0048
+         major revision required
+
+--------------------------------------------------
+
+Computing in Science and Engineering,CiSESI-2020-06-0048
+"Towards Long-term and Archivable Reproducibility"
+manuscript type: Reproducible Research
+
+Dear Dr. Mohammad Akhlaghi,
+
+The manuscript that you submitted to Computing in Science and Engineering
+has completed the review process. After carefully examining the manuscript
+and reviews, we have decided that the manuscript needs major revisions
+before it can be considered for a second review.
+
+Your revision is due before 22-Oct-2020. Please note that if your paper was
+submitted to a special issue, this due date may be different. Contact the
+peer review administrator, Ms. Jessica Ingle, at cise computer.org if you
+have questions.
+
+The reviewer and editor comments are attached below for your
+reference. Please maintain our 6,250–word limit as you make your revisions.
+
+To upload your revision and summary of changes, log on to
+https://mc.manuscriptcentral.com/cise-cs, click on your Author Center, then
+"Manuscripts with Decisions."  Under "Actions," choose "Create a Revision"
+next to the manuscript number.
+
+Highlight the changes to your manuscript by using the track changes mode in
+MS Word, the latexdiff package if using LaTeX, or by using bold or colored
+text.
+
+When submitting your revised manuscript, you will need to respond to the
+reviewer comments in the space provided.
+
+If you have questions regarding our policies or procedures, please refer to
+the magazines' Author Information page linked from the Instructions and
+Forms (top right corner of the ScholarOne Manuscripts screen) or you can
+contact me.
+
+We look forward to receiving your revised manuscript.
+
+Sincerely,
+Dr. Lorena A. Barba
+George Washington University
+Mechanical and Aerospace Engineering
+Editor-in-Chief, Computing in Science and Engineering
+
+--------------------------------------------------
+
+
+
+
+
+EiC comments:
+Some reviewers request additions, and overview of other tools, etc. In
+doing your revision, please remember space limitations: 6,250 words
+maximum, including all main body, abstract, keyword, bibliography (12
+references or less), and biography text. See "Write For Us" section of the
+website: https://www.computer.org/csdl/magazine/cs
+
+Comments of the Associate Editor: Associate Editor
+Comments to the Author: Thanks to the authors for your submission to the
+Reproducible Research department.
+
+Thanks to the reviewers for your careful and thoughtful reviews. We would
+appreciate it if you can make your reports available and share the DOI as
+soon as possible, per our original invitation e-mail. We will follow up our
+original invitation to obtain your review DOI, if you have not already
+included it in your review comments.
+
+Based on the review feedback, there are a number of major issues that
+require attention and many minor ones as well. Please take these into
+account as you prepare your major revision for another round of
+review. (See the actual review reports for details.)
+
+1. In general, there are a number of presentation issues needing
+attention. There are general concerns about the paper lacking focus. Some
+terminology is not well-defined (e.g. longevity). In addition, the
+discussion of tools could benefit from some categorization to characterize
+their longevity. Background and related efforts need significant
+improvement. (See below.)
+
+2. There is consistency among the reviews that related work is particularly
+lacking and not taking into account major works that have been written on
+this topic. See the reviews for details about work that could potentially
+be included in the discussion and how the current work is positioned with
+respect to this work.
+
+3. The current work needs to do a better job of explaining how it deals
+with the nagging problem of running on CPU vs. different architectures.  At
+least one review commented on the need to include a discussion of
+continuous integration (CI) and its potential to help identify problems
+running on different architectures. Is CI employed in any way in the work
+presented in this article?
+
+4. The presentation of the Maneage tool is both lacking in clarity and
+consistency with the public information/documentation about the tool. While
+our review focus is on the article, it is important that readers not be
+confused when they visit your site to use your tools.
+
+5. A significant question raised by one review is how this work compares to
+"executable" papers and Jupyter notebooks.  Does this work embody
+similar/same design principles or expand upon the established alternatives?
+In any event, a discussion of this should be included in
+background/motivation and related work to help readers understand the clear
+need for a new approach, if this is being presented as new/novel.
+
+Reviews:
+
+Please note that some reviewers may have included additional comments in a
+separate file. If a review contains the note "see the attached file" under
+Section III A - Public Comments, you will need to log on to ScholarOne
+Manuscripts to view the file. After logging in, select the Author Center,
+click on the "Manuscripts with Decisions" queue and then click on the "view
+decision letter" link for this manuscript. You must scroll down to the very
+bottom of the letter to see the file(s), if any. This will open the file
+that the reviewer(s) or the Associate Editor included for you along with
+their review.
+
+--------------------------------------------------
+
+
+
+
+
+Reviewer: 1
+Recommendation: Author Should Prepare A Major Revision For A Second Review
+
+Comments:
+
+ * Adding an explicit list of contributions would make it easier for the
+   reader to appreciate these.
+
+ * These are not mentioned/cited and are highly relevant to this paper (in
+   no particular order):
+
+     * Git flows, both in general and in particular for research.
+     * Provenance work, in general and with git in particular
+     * Reprozip: https://www.reprozip.org/
+     * OCCAM: https://occam.cs.pitt.edu/
+     * Popper: http://getpopper.io/
+     * Whole Tale: https://wholetale.org/
+     * Snakemake: https://github.com/snakemake/snakemake
+     * CWL https://www.commonwl.org/ and WDL https://openwdl.org/
+     * Nextflow: https://www.nextflow.io/
+     * Sumatra: https://pythonhosted.org/Sumatra/
+     * Podman: https://podman.io
+     * AppImage (https://appimage.org/), Flatpak
+       (https://flatpak.org/), Snap (https://snapcraft.io/)
+     * nbdev https://github.com/fastai/nbdev  and jupytext
+     * Bazel: https://bazel.build/
+     * Debian reproducible builds: https://wiki.debian.org/ReproducibleBuilds
+
+     * Existing guidelines similar to the proposed "Criteria for
+       longevity". Many articles of these in the form "10 simple rules for
+       X", for example (not exhaustive list):
+          * https://doi.org/10.1371/journal.pcbi.1003285
+          * https://arxiv.org/abs/1810.08055
+          * https://osf.io/fsd7t/
+
+     * A model project for reproducible papers: https://arxiv.org/abs/1401.2000
+
+     * Executable/reproducible paper articles and original concepts
+
+ * Several claims in the manuscript are not properly justified, neither in
+   the text nor via citation. Examples (not exhaustive list):
+
+     * "it is possible to precisely identify the Docker “images” that are
+       imported with their checksums, but that is rarely practiced in most
+       solutions that we have surveyed [which ones?]"
+
+     * "Other OSes [which ones?] have similar issues because pre-built
+       binary files are large and expensive to maintain and archive."
+
+     * "Researchers using free software tools have also already had some
+       exposure to it"
+
+     * "A popular framework typically falls out of fashion and requires
+       significant resources to translate or rewrite every few years."
+
+ * As mentioned in the discussion by the authors, not even Bash, Git or
+   Make is reproducible, thus not even Maneage can address the longevity
+   requirements. One possible alternative is the use of CI to ensure that
+   papers are re-executable (several papers have been written on this
+   topic). Note that CI is well-established technology (e.g. Jenkins is
+   almost 10 years old).
+
+Additional Questions:
+
+1. How relevant is this manuscript to the readers of this periodical?
+   Please explain your rating in the Detailed Comments section.: Very
+   Relevant
+
+2. To what extent is this manuscript relevant to readers around the world?:
+   The manuscript is of interest to readers throughout the world
+
+1. Please summarize what you view as the key point(s) of the manuscript and
+   the importance of the content to the readers of this periodical.: This
+   article introduces desiderata for long-term archivable reproducibility
+   and presents Maneage, a system whose goal is to achieve these outlined
+   properties.
+
+2. Is the manuscript technically sound? Please explain your answer in the
+   Detailed Comments section.: Partially
+
+3. What do you see as this manuscript's contribution to the literature in
+   this field?: Presentation of Maneage
+
+4. What do you see as the strongest aspect of this manuscript?: A great
+   summary of Maneage, as well as its implementation.
+
+5. What do you see as the weakest aspect of this manuscript?: Criterion has
+   been proposed previously. Maneage itself provides little novelty (see
+   comments below).
+
+1. Does the manuscript contain title, abstract, and/or keywords?: Yes
+
+2. Are the title, abstract, and keywords appropriate? Please elaborate in
+   the Detailed Comments section.: Yes
+
+3. Does the manuscript contain sufficient and appropriate references
+   (maximum 12-unless the article is a survey or tutorial in scope)? Please
+   elaborate in the Detailed Comments section.: Important references are
+   missing; more references are needed
+
+4. Does the introduction clearly state a valid thesis? Please explain your
+   answer in the Detailed Comments section.: Could be improved
+
+5. How would you rate the organization of the manuscript? Please elaborate
+   in the Detailed Comments section.: Satisfactory
+
+6. Is the manuscript focused? Please elaborate in the Detailed Comments
+   section.: Satisfactory
+
+7. Is the length of the manuscript appropriate for the topic? Please
+   elaborate in the Detailed Comments section.: Satisfactory
+
+8. Please rate and comment on the readability of this manuscript in the
+   Detailed Comments section.: Easy to read
+
+9. Please rate and comment on the timeliness and long term interest of this
+   manuscript to CiSE readers in the Detailed Comments section. Select all
+   that apply.: Topic and content are of limited interest to CiSE readers.
+
+Please rate the manuscript. Explain your choice in the Detailed Comments
+section.: Good
+
+--------------------------------------------------
+
+
+
+
+
+Reviewer: 2
+Recommendation: Accept If Certain Minor Revisions Are Made
+
+Comments: https://doi.org/10.22541/au.159724632.29528907
+
+Operating System: Authors mention that Docker is usually used with an image
+of Ubuntu without precision about the version used. And even if users take
+care about the version, the image is updated monthly thus the image used
+will have different OS components based on the generation time. This
+difference in OS components will interfere on the reproducibility. I agree
+on that, but I would like to add that it is a wrong habit of users. It is
+possible to generate reproducible Docker images by generating it from an
+ISO image of the OS. These ISO images are archived, at least for Ubuntu
+(http://old-releases.ubuntu.com/releases) and for Debian
+(https://cdimage.debian.org/mirror/cdimage/archive) thus allow users to
+generate an OS with identical components. Combined with the
+snapshot.debian.org service, it is even possible to update a Debian release
+to a specific time point up to 2005 and with a precision of six hours. With
+combination of both ISO image and snapshot.debian.org service it is
+possible to obtain an OS for Docker or for a VM with identical components
+even if users have to use the PM of the OS. Authors should add indication
+that using good practices it is possible to use Docker or VM to obtain
+identical OS usable for reproducible research.
+
+CPU architecture: The CPU architecture of the platform used to run the
+workflow is not discussed in the manuscript. During software integration in
+Debian, I have seen several software failing their unit tests due to
+different behavior from itself or from a library dependency. This
+unexpected behavior was only present on non-x86 architectures, mainly because
+developers use a x86 machine for their developments and tests. Bug or
+feature? I don’t know, but nowadays, it is quite frequent to see computers
+with a non-x86 CPU. It would be annoying to fail the reproducibility step
+because of a difference in CPU architecture. Authors should probably take
+into account the architecture used in their workflow or at least report it.
+
+POSIX dependency: I don’t understand the "no dependency beyond
+POSIX". Authors should better explain what they mean by this sentence. I
+completely agree that the dependency hell must be avoided and dependencies
+should be used with parsimony. Unfortunately, sometimes we need proprietary
+or specialized software to read raw data.  For example in genetics,
+micro-array raw data are stored in binary proprietary formats. To convert
+this data into a plain text format, we need the proprietary software
+provided with the measurement tool.
+
+Maneage: I was not able to properly set up a project with Maneage. The
+configuration step failed during the download of tools used in the
+workflow. This is probably due to a firewall/antivirus restriction out of
+my control. How frequently does this failure happen to users? Moreover, the time
+to configure a new project is quite long because everything needs to be
+compiled. Authors should compare the time required to set up a project
+Maneage versus time used by other workflows to give an indication to the
+readers.
+
+Disclaimer: For the sake of transparency, it should be noted that I am
+involved in the development of Debian, thus my comments are probably
+oriented.
+
+Additional Questions:
+
+1. How relevant is this manuscript to the readers of this periodical?
+   Please explain your rating in the Detailed Comments section.: Relevant
+
+2. To what extent is this manuscript relevant to readers around the world?:
+   The manuscript is of interest to readers throughout the world
+
+1. Please summarize what you view as the key point(s) of the manuscript and
+   the importance of the content to the readers of this periodical.: The
+   authors describe briefly the history of solutions proposed by
+   researchers to generate reproducible workflows. Then, they report the
+   problems with the current tools used to tackle the reproducible
+   problem. They propose a set of criteria to develop new reproducible
+   workflows and finally they describe their proof of concept workflow
+   called "Maneage". This manuscript could help researchers to improve
+   their workflow to obtain reproducible results.
+
+2. Is the manuscript technically sound? Please explain your answer in the
+   Detailed Comments section.: Yes
+
+3. What do you see as this manuscript's contribution to the literature in
+   this field?: The authors try to propose a simple answer to the
+   reproducibility problem by defining new criteria. They also propose a
+   proof of concept workflow which can be directly used by researchers for
+   their projects.
+
+4. What do you see as the strongest aspect of this manuscript?: This
+   manuscript describes a new reproducible workflow which doesn't require
+   another new trendy high-level software. The proposed workflow is only
+   based on low-level tools already widely known. Moreover, the workflow
+   takes into account the version of all software used in the chain of
+   dependencies.
+
+5. What do you see as the weakest aspect of this manuscript?: Authors don't
+   discuss the problem of results reproducibility when analyses are
+   performed using CPUs with different architectures. Some libraries have
+   different behaviors when they run on different architectures and it
+   could influence final results. Authors are probably talking about x86,
+   but there is no reference at all in the manuscript.
+
+1. Does the manuscript contain title, abstract, and/or keywords?: Yes
+
+2. Are the title, abstract, and keywords appropriate? Please elaborate in
+   the Detailed Comments section.: Yes
+
+3. Does the manuscript contain sufficient and appropriate references
+   (maximum 12-unless the article is a survey or tutorial in scope)? Please
+   elaborate in the Detailed Comments section.: References are sufficient
+   and appropriate
+
+4. Does the introduction clearly state a valid thesis? Please explain your
+   answer in the Detailed Comments section.: Yes
+
+5. How would you rate the organization of the manuscript? Please elaborate
+   in the Detailed Comments section.: Satisfactory
+
+6. Is the manuscript focused? Please elaborate in the Detailed Comments
+   section.: Satisfactory
+
+7. Is the length of the manuscript appropriate for the topic? Please
+   elaborate in the Detailed Comments section.: Satisfactory
+
+8. Please rate and comment on the readability of this manuscript in the
+   Detailed Comments section.: Easy to read
+
+9. Please rate and comment on the timeliness and long term interest of this
+   manuscript to CiSE readers in the Detailed Comments section. Select all
+   that apply.: Topic and content are of immediate and continuing interest
+   to CiSE readers
+
+Please rate the manuscript. Explain your choice in the Detailed Comments
+section.: Good
+
+--------------------------------------------------
+
+
+
+
+
+Reviewer: 3
+Recommendation: Accept If Certain Minor Revisions Are Made
+
+Comments: Longevity of workflows in a project is one of the problems for
+reproducibility in different fields of computational research. Therefore, a
+proposal that seeks to guarantee this longevity becomes relevant for the
+entire community, especially when it is based on free software and is easy
+to access and implement.
+
+GOODMAN et al., 2016, BARBA, 2018 and PLESSER, 2018 observed in their
+research that the terms reproducibility and replicability are frequently
+found in the scientific literature, and that their interchangeable use ends
+up generating confusion due to authors' lack of clarity. Thus, the authors
+should briefly define their use of the terms for their readers.
+
+The introduction is consistent with the proposal of the article, but deals
+with the tools separately, many of which can be used together to minimize
+some of the problems presented. The use of Ansible and Helm, among others,
+also helps in minimizing problems. Where the authors use the Python example,
+I believe it is worth pointing out that version 2 has now been
+discontinued by the maintaining community, which creates another problem
+within the perspective of the article. Regarding the use of VMs and
+containers, I believe that the discussion presented by THAIN et al., 2015
+would help to strengthen essential points of the current work. For
+Singularity, the article describing it is missing (Kurtzer GM, Sochat V,
+Bauer MW, 2017). I also believe that a reference to FAIR would be useful
+(WILKINSON et al., 2016).
+
+In my opinion, the paragraph on IPOL seems to be out of context with the
+previous ones. This issue of end-to-end reproducibility of a publication
+could be better explored, which would further enrich the tool presented.
+
+The presentation of the longevity criteria was adequate in the context of
+the article and explored the points that were dealt with later.
+
+The presentation of the tool was consistent. On the project website, I
+suggest that the information contained in README-hacking be presented on
+the same page as the Tutorial. A breakdown by topic would be useful, as the
+markdown file may be too long to find information in easily.
+
+Additional Questions:
+
+1. How relevant is this manuscript to the readers of this periodical?
+   Please explain your rating in the Detailed Comments section.: Relevant
+
+2. To what extent is this manuscript relevant to readers around the world?:
+   The manuscript is of interest to readers throughout the world
+
+1. Please summarize what you view as the key point(s) of the manuscript and
+   the importance of the content to the readers of this periodical.: In
+   this article, the authors discuss the problem of the longevity of
+   computational workflows, presenting what they consider to be criteria
+   for longevity and an implementation based on these criteria, called
+   Maneage, seeking to ensure a long lifespan for analysis projects.
+
+2. Is the manuscript technically sound? Please explain your answer in the
+   Detailed Comments section.: Yes
+
+3. What do you see as this manuscript's contribution to the literature in
+   this field?: In this article, the authors discuss the problem of the
+   longevity of computational workflows, presenting what they consider to
+   be criteria for longevity and an implementation based on these criteria,
+   called Maneage, seeking to ensure a long lifespan for analysis projects.
+
+   As a key point, the authors enumerate quite clear criteria that can
+   guarantee the longevity of projects and present a free software-based
+   way of achieving this objective. The method presented by the authors is
+   not easy to implement for many end users, with low computer knowledge,
+   but it can be easily implemented by users with average knowledge in the
+   area.
+
+4. What do you see as the strongest aspect of this manuscript?: One of the
+   strengths of the manuscript is the implementation of Maneage entirely in
+   free software and the search for completeness presented in the
+   manuscript. The use of GNU software adds the guarantee of long
+   maintenance by one of the largest existing software communities. In
+   addition, the tool developed has already been tested in different
+   publications, showing itself consistent in different scenarios.
+
+5. What do you see as the weakest aspect of this manuscript?: For the
+   proper functioning of the proposed tool, the user needs prior knowledge
+   of LaTeX, Git and the command line, which can keep inexperienced users
+   away. Likewise, the tool is suitable for Unix users, keeping users away
+   from Microsoft environments.
+
+   Even though Unix-like environments are the majority in the areas of
+   scientific computing, many users still perform their analysis in
+   different areas on Windows computers or servers, with the assistance of
+   package managers.
+
+1. Does the manuscript contain title, abstract, and/or keywords?: Yes
+
+2. Are the title, abstract, and keywords appropriate? Please elaborate in
+   the Detailed Comments section.: Yes
+
+3. Does the manuscript contain sufficient and appropriate references
+   (maximum 12-unless the article is a survey or tutorial in scope)? Please
+   elaborate in the Detailed Comments section.: Important references are
+   missing; more references are needed
+
+4. Does the introduction clearly state a valid thesis? Please explain your
+   answer in the Detailed Comments section.: Could be improved
+
+5. How would you rate the organization of the manuscript? Please elaborate
+   in the Detailed Comments section.: Satisfactory
+
+6. Is the manuscript focused? Please elaborate in the Detailed Comments
+   section.: Could be improved
+
+7. Is the length of the manuscript appropriate for the topic? Please
+   elaborate in the Detailed Comments section.: Satisfactory
+
+8. Please rate and comment on the readability of this manuscript in the
+   Detailed Comments section.: Easy to read
+
+9. Please rate and comment on the timeliness and long term interest of this
+   manuscript to CiSE readers in the Detailed Comments section. Select all
+   that apply.: Topic and content are of immediate and continuing interest
+   to CiSE readers
+
+Please rate the manuscript. Explain your choice in the Detailed Comments
+section.: Excellent
+
+--------------------------------------------------
+
+
+
+
+
+Reviewer: 4
+Recommendation: Author Should Prepare A Major Revision For A Second Review
+
+Comments: Overall evaluation - Good.
+
+This paper is in scope, and the topic is of interest to the readers of
+CiSE. However in its present form, I have concerns about whether the paper
+presents enough new contributions to the area in a way that can then be
+understood and reused by others. The main things I believe need addressing
+are: 1) Revisit the criteria, show how you have come to decide on them,
+give some examples of why they are important, and address potential missing
+criteria. 2) Clarify the discussion of challenges to adoption and make it
+clearer which tradeoffs are important to practitioners. 3) Be clearer about
+which sorts of research workflow are best suited to this approach.
+
+B2.Technical soundness: here I am discussing the soundness of the paper,
+rather than the soundness of the Maneage tool. There are some fundamental
+additional challenges to reproducibility that are not addressed. Although
+software library versions are addressed, there is also the challenge of
+mathematical reproducibility, particularly in the handling of floating
+point numbers, which can vary because of the way the code is written and
+the hardware architecture (including whether code is optimised /
+parallelised). This could obviously be addressed through a criterion around
+how code is written, but this will also come with a tradeoff against
+performance, which is never mentioned. Another tradeoff, which might affect
+Criterion 3, is time to result - people use popular frameworks because it is
+easier to use them.  Regarding the discussion, I would have liked to see an
+explanation of how these challenges to adoption were identified: was this
+anecdotal, through surveys, participant observation?  As a side note around
+the technical aspects of Maneage - it is using LaTeX which in turn is built
+on TeX which in turn has had many portability problems in the past due to
+being written using WEB / Tangle, though with web2c this is largely now
+resolved - potentially an interesting sidebar to investigate how LaTeX/TeX
+has ensured its longevity!
+
+C2. The title is not specific enough - it should refer to the
+reproducibility of workflows/projects.
+
+C4. As noted above, whilst the thesis stated is valid, it may not be useful
+to practitioners of computation science and engineering as it stands.
+
+C6. Manuscript focus. I would have liked a more focussed approach to the
+presentation of information in II. Longevity is not defined, and whilst
+various tools are discussed and discarded, no attempt is made to categorise
+the magnitude of longevity for which they are relevant. For instance,
+environment isolators are regarded by the software preservation community
+as adequate for timescales of the order of years, but may not be suitable
+for timescales of decades, where porting and emulation are used. The
+title of this section "Commonly used tools and their longevity" is also
+confusing - do you mean the longevity of the tools or the longevity of the
+workflows that can be produced using these tools? What happens if you use a
+combination of all four categories of tools?
+
+C8. Readability. I found it difficult to follow the description of how
+Maneage works. It wasn't clear to me if code was being run to generate the
+results and figures in a LaTeX paper that is part of a project in
+Maneage. It appears to be suggested this is the case, but Figure 1 doesn't
+show how this works - it just has the LaTeX files, the data files and the
+Makefiles. Is it being suggested that LaTeX itself is the programming
+language, using its macro functionality? I was a bit confused on how
+collaboration is handled as well - this appears to be using the Git
+branching model, and the suggestion that Maneage is keeping track of all
+components from all projects - but what happens if you are working with
+collaborators that are using their own Maneage instance?
+
+I would also have liked to see a comparison between this approach and
+other "executable" paper approaches e.g. Jupyter notebooks, compared on
+completeness, time taken to write a "paper", ease of depositing in a
+repository, and ease of use by another researcher.
+
+Additional Questions:
+
+1. How relevant is this manuscript to the readers of this periodical?
+   Please explain your rating in the Detailed Comments section.: Relevant
+
+2. To what extent is this manuscript relevant to readers around the world?:
+   The manuscript is of interest to readers throughout the world
+
+1. Please summarize what you view as the key point(s) of the manuscript and
+   the importance of the content to the readers of this periodical.: This
+   manuscript discusses the challenges of reproducibility of computational
+   research workflows, suggests criteria for improving the "longevity" of
+   workflows, describes the proof-of-concept tool, Maneage, that has been
+   built to implement these criteria, and discusses the challenges to
+   adoption.
+
+   Of primary importance is the discussion of the challenges to adoption,
+   as CiSE is about computational science which does not take place in a
+   theoretical vacuum. Many of the identified challenges relate to the
+   practice of computational science and the implementation of systems in
+   the real world.
+
+2. Is the manuscript technically sound? Please explain your answer in the
+   Detailed Comments section.: Partially
+
+3. What do you see as this manuscript's contribution to the literature in
+   this field?: The manuscript makes a modest contribution to the
+   literature through the description of the proof-of-concept, in
+   particular its approach to integrating asset management, version control
+   and build and the discussion of challenges to adoption.
+
+   The proposed criteria have mostly been discussed at length in many other
+   works looking at computational reproducibility and executable papers.
+
+4. What do you see as the strongest aspect of this manuscript?: The
+   strongest aspect is the discussion of difficulties for widespread
+   adoption of this sort of approach. Because the proof-of-concept tool
+   received support through the RDA, it was possible to get feedback from
+   researchers who were likely to use it. This has highlighted and
+   reinforced a number of challenges and caveats.
+
+5. What do you see as the weakest aspect of this manuscript?: The weakest
+   aspect is the assumption that research can be easily compartmentalized
+   into simple and complete packages. Given that so much of research
+   involves collaboration and interaction, this is not sufficiently
+   addressed. In particular, the challenge of interdisciplinary work, where
+   there may not be common languages to describe concepts and there may be
+   different common workflow practices will be a barrier to wider adoption
+   of the primary thesis and criteria.
+
+1. Does the manuscript contain title, abstract, and/or keywords?: Yes
+
+2. Are the title, abstract, and keywords appropriate? Please elaborate in
+   the Detailed Comments section.: No
+
+3. Does the manuscript contain sufficient and appropriate references
+   (maximum 12-unless the article is a survey or tutorial in scope)? Please
+   elaborate in the Detailed Comments section.: References are sufficient
+   and appropriate
+
+4. Does the introduction clearly state a valid thesis? Please explain your
+   answer in the Detailed Comments section.: Could be improved
+
+5. How would you rate the organization of the manuscript? Please elaborate
+   in the Detailed Comments section.: Satisfactory
+
+6. Is the manuscript focused? Please elaborate in the Detailed Comments
+   section.: Could be improved
+
+7. Is the length of the manuscript appropriate for the topic? Please
+   elaborate in the Detailed Comments section.: Satisfactory
+
+8. Please rate and comment on the readability of this manuscript in the
+   Detailed Comments section.: Readable - but requires some effort to
+   understand
+
+9. Please rate and comment on the timeliness and long term interest of this
+   manuscript to CiSE readers in the Detailed Comments section. Select all
+   that apply.: Topic and content are of immediate and continuing interest
+   to CiSE readers
+
+Please rate the manuscript. Explain your choice in the Detailed Comments
+section.: Good
+
+--------------------------------------------------
+
+
+
+
+
+Reviewer: 5
+Recommendation: Author Should Prepare A Major Revision For A Second Review
+
+Comments:
+
+Major figures currently working in this exact field do not have their work
+acknowledged in this work. In no particular order: Victoria Stodden,
+Michael Heroux, Michela Taufer, and Ivo Jimenez. All of these authors have
+multiple publications that are highly relevant to this paper. In the case
+of Ivo Jimenez, his Popper work [Jimenez I, Sevilla M, Watkins N, Maltzahn
+C, Lofstead J, Mohror K, Arpaci-Dusseau A, Arpaci-Dusseau R. The popper
+convention: Making reproducible systems evaluation practical. In 2017 IEEE
+International Parallel and Distributed Processing Symposium Workshops
+(IPDPSW) 2017 May 29 (pp. 1561-1570). IEEE.] and the later revision that
+uses GitHub Actions, is largely the same as this work. The lack of
+attention to virtual machines and containers is highly problematic. While a
+reader cannot rely on DockerHub or a generic OS version label for a VM or
+container, these are some of the most promising tools for offering true
+reproducibility. On the data side, containers have the promise to manage
+data sets and workflows completely [Lofstead J, Baker J, Younge A. Data
+pallets: containerizing storage for reproducibility and
+traceability. In International Conference on High Performance Computing 2019
+Jun 16 (pp. 36-45). Springer, Cham.] Taufer has picked up this work and has
+graduated an MS student working on this topic with a published thesis. See
+also Jimenez's P-RECS workshop at HPDC for additional work highly relevant
+to this paper.
+
+Some other systems that do similar things include: reprozip, occam, whole
+tale, snakemake.
+
+While the work here is a good start, the paper needs to include the context
+of the current community development level to be a complete research
+paper. A revision that includes evaluation of (using the criteria) and
+comparison with the suggested systems and a related work section that
+seriously evaluates the work of the recommended authors, among others,
+would make this paper worthy for publication.
+
+Additional Questions:
+
+1. How relevant is this manuscript to the readers of this periodical?
+   Please explain your rating in the Detailed Comments section.: Very
+   Relevant
+
+2. To what extent is this manuscript relevant to readers around the world?:
+   The manuscript is of interest to readers throughout the world
+
+1. Please summarize what you view as the key point(s) of the manuscript and
+   the importance of the content to the readers of this periodical.: This
+   paper describes the Maneage system for reproducible workflows. It lays
+   out a bit of the need, has very limited related work, and offers
+   criteria any system that offers reproducibility should have, and finally
+   describes how Maneage achieves these goals.
+
+2. Is the manuscript technically sound? Please explain your answer in the
+   Detailed Comments section.: Partially
+
+3. What do you see as this manuscript's contribution to the literature in
+   this field?: Yet another example of a reproducible workflows
+   project. There are numerous examples, mostly domain specific, and this
+   one is not the most advanced general solution.
+
+4. What do you see as the strongest aspect of this manuscript?: Working
+   code and published artifacts
+
+5. What do you see as the weakest aspect of this manuscript?: Lack of
+   context in the field: it misses very relevant work that eliminates much,
+   if not all, of the novelty of this work.
+
+1. Does the manuscript contain title, abstract, and/or keywords?: Yes
+
+2. Are the title, abstract, and keywords appropriate? Please elaborate in
+   the Detailed Comments section.: Yes
+
+3. Does the manuscript contain sufficient and appropriate references
+   (maximum 12-unless the article is a survey or tutorial in scope)? Please
+   elaborate in the Detailed Comments section.: Important references are
+   missing; more references are needed
+
+4. Does the introduction clearly state a valid thesis? Please explain your
+   answer in the Detailed Comments section.: Could be improved
+
+5. How would you rate the organization of the manuscript? Please elaborate
+   in the Detailed Comments section.: Satisfactory
+
+6. Is the manuscript focused? Please elaborate in the Detailed Comments
+   section.: Could be improved
+
+7. Is the length of the manuscript appropriate for the topic? Please
+   elaborate in the Detailed Comments section.: Could be improved
+
+8. Please rate and comment on the readability of this manuscript in the
+   Detailed Comments section.: Easy to read
+
+9. Please rate and comment on the timeliness and long term interest of this
+   manuscript to CiSE readers in the Detailed Comments section. Select all
+   that apply.: Topic and content are likely to be of growing interest to
+   CiSE readers over the next 12 months
+
+Please rate the manuscript. Explain your choice in the Detailed Comments
+section.: Fair