diff --git a/peer-review/1-answer.txt b/peer-review/1-answer.txt
new file mode 100644
index 0000000..dd7f272
--- /dev/null
+++ b/peer-review/1-answer.txt
@@ -0,0 +1,1216 @@
+Dear CiSE editors,
+
+Thank you very much for the very complete and useful referee reports. The
+points they raise have all been addressed in this submission and have
+significantly improved the quality and clarity of the paper.
+
+Below, all the points raised by the Editor in Chief (EiC), the Associate
+Editor, and the five referees (in the same order as in the review process
+report) are addressed individually as a numbered list.
+
+Sincerely yours,
+Dr. Mohammad Akhlaghi [on behalf of the co-authors]
+Instituto de Astrofísica de Canarias, Tenerife, Spain.
+
+------------------------------
+
+
+
+
+
+1. [EiC] Some reviewers request additions and an overview of other
+   tools.
+
+ANSWER: Indeed, there is already a large body of previous work in this
+field, and we had learnt a lot from them during the creation of the
+criteria and the proof of concept tool (Maneage). Before submitting the
+paper, we had already done a very comprehensive review of the tools (as you
+may notice from the Git repository[1], where most of the tools were run and
+practically tested). However, the CiSE Author Information explicitly
+states: "The introduction should provide a modicum of background in one or
+two paragraphs, but should not attempt to give a literature review". This
+is the usual practice in previously published papers at CiSE, and is in
+line with the 6250-word limit and the maximum of 12 references in the
+bibliography.
+
+We already discussed this point privately with you, and we agreed upon the
+following solution: the extended reviews will be submitted as supplementary
+material, to accompany the paper as "Web extras" [2, 3]. These appendices
+are also mentioned in the submitted paper, so that any interested CiSE
+reader can easily learn of their existence from the paper and access them.
+
+Appendix A is focused on the low-level "tools" that are commonly used in
+the reproducible workflow solutions (including Maneage). In Appendix B, we
+discuss more than 25 reproducible solutions and compare them directly with our
+criteria. In particular, we also review tools that have been abandoned or
+discontinued and use the criteria to justify why this happened.
+
+[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/tex/src/paper-long.tex#L1579
+[2] https://arxiv.org/abs/2006.03018
+[3] https://doi.org/10.5281/zenodo.3872247
+
+------------------------------
+
+
+
+
+
+2. [Associate Editor] There are general concerns about the paper
+ lacking focus
+
+ANSWER: Thanks to all the corrections and clarifications made during this
+review, the paper is now much more focused and to the point. We are very
+grateful for the referees' thorough listing of points, which helped us
+identify the parts that needed improvement.
+
+------------------------------
+
+
+
+
+
+3. [Associate Editor] Some terminology is not well-defined
+ (e.g. longevity).
+
+ANSWER: In this revision, "Reproducibility", "Longevity" and "Usage" have
+been explicitly defined in the first paragraph of Section II. With this
+definition, the main argument of the paper has become much clearer.
+Thank you (and the referees) for highlighting this.
+
+------------------------------
+
+
+
+
+
+4. [Associate Editor] The discussion of tools could benefit from some
+ categorization to characterize their longevity.
+
+ANSWER: The approximate longevity of the various tools reviewed in Section
+II is now mentioned immediately after each tool and highlighted in green.
+For example, after containers we have added: "(their longevity is
+determined by the host kernel, typically a decade)".
+
+------------------------------
+
+
+
+
+
+5. [Associate Editor] Background and related efforts need significant
+ improvement. (See below.)
+
+ANSWER: This has been done, as mentioned in (1.) above.
+
+------------------------------
+
+
+
+
+
+6. [Associate Editor] There is consistency among the reviews that
+ related work is particularly lacking.
+
+ANSWER: This has been done, as mentioned in (1.) above.
+
+------------------------------
+
+
+
+
+
+7. [Associate Editor] The current work needs to do a better job of
+ explaining how it deals with the nagging problem of running on CPU
+ vs. different architectures.
+
+ANSWER: The CPU architecture of the running system is now precisely
+reported in the "Acknowledgments" section (highlighted in green). A
+description of the dependency on hardware architecture, and of how Maneage
+reports it, has also been added in the "Proof of concept: Maneage" Section.
+
+------------------------------
+
+
+
+
+
+8. [Associate Editor] At least one review commented on the need to
+ include a discussion of continuous integration (CI) and its
+ potential to help identify problems running on different
+ architectures. Is CI employed in any way in the work presented in
+ this article?
+
+ANSWER: CI has been added in the "Discussion" section as one solution for
+finding breaking points caused by operating system updates and by
+new/different architectures. For the core Maneage branch, we have defined
+task #15741 [1] to add CI on many architectures in the near future.
+
+[1] http://savannah.nongnu.org/task/?15741
+
+------------------------------
+
+
+
+
+
+9. [Associate Editor] The presentation of the Maneage tool is both
+ lacking in clarity and consistency with the public
+ information/documentation about the tool. While our review focus
+ is on the article, it is important that readers not be confused
+ when they visit your site to use your tools.
+
+ANSWER: Thank you for raising this important point. We have broken down the
+very long "About" page into multiple pages to improve readability:
+
+https://maneage.org/about.html
+
+Generally, the webpage will soon undergo major improvements to become even
+clearer (as part of our RDA grant for Maneage, we have promised a clear and
+friendly webpage once the paper is published). The website is developed in
+a public Git repository (https://git.maneage.org/webpage.git), so any
+specific proposal for improvement can be handled efficiently and
+transparently; we welcome all feedback in this regard.
+
+------------------------------
+
+
+
+
+
+10. [Associate Editor] A significant question raised by one review is
+ how this work compares to "executable" papers and Jupyter
+ notebooks. Does this work embody similar/same design principles
+ or expand upon the established alternatives? In any event, a
+ discussion of this should be included in background/motivation and
+ related work to help readers understand the clear need for a new
+ approach, if this is being presented as new/novel.
+
+ANSWER: Thank you for highlighting this important point. We agree that it
+was necessary to compare and contrast our Maneage proof-of-concept
+demonstration more directly against the Jupyter-notebook type of
+approach. Two paragraphs have been added in Sections II and IV to clarify
+this (our criteria require and build in more modularity and longevity than
+Jupyter). A much more extensive comparison and review is now also available
+in Appendix A.
+
+
+------------------------------
+
+
+
+
+
+11. [Reviewer 1] Adding an explicit list of contributions would make
+    it easier for the reader to appreciate them. The following are not
+    mentioned/cited and are highly relevant to this paper (in no
+    particular order):
+ 1. Git flows, both in general and in particular for research.
+ 2. Provenance work, in general and with git in particular
+ 3. Reprozip: https://www.reprozip.org/
+ 4. OCCAM: https://occam.cs.pitt.edu/
+ 5. Popper: http://getpopper.io/
+ 6. Whole Tale: https://wholetale.org/
+ 7. Snakemake: https://github.com/snakemake/snakemake
+ 8. CWL https://www.commonwl.org/ and WDL https://openwdl.org/
+ 9. Nextflow: https://www.nextflow.io/
+ 10. Sumatra: https://pythonhosted.org/Sumatra/
+ 11. Podman: https://podman.io
+ 12. AppImage (https://appimage.org/)
+    13. Flatpak (https://flatpak.org/)
+ 14. Snap (https://snapcraft.io/)
+ 15. nbdev https://github.com/fastai/nbdev and jupytext
+ 16. Bazel: https://bazel.build/
+ 17. Debian reproducible builds: https://wiki.debian.org/ReproducibleBuilds
+
+ANSWER:
+
+1. In Section IV, we have added that "Generally, any git flow (branching
+ strategies) can be used by the high-level project authors or future
+ readers."
+2. We have mentioned research objects as one mode of provenance tracking.
+   The body of related provenance work that has already been done, and
+   that can be exploited using these criteria and our proof of concept, is
+   indeed very large. However, the 6250 word-count limit is very tight,
+   and if we added more on it here, we would have to remove points of
+   higher priority. Hopefully this can be the subject of a follow-up
+   paper.
+3. A review of ReproZip is in Appendix B.
+4. A review of Occam is in Appendix B.
+5. A review of Popper is in Appendix B.
+6. A review of Whole Tale is in Appendix B.
+7. A review of Snakemake is in Appendix A.
+8. CWL and WDL are described in Appendix A (Job management).
+9. Nextflow is described in Appendix A (Job management).
+10. Sumatra is described in Appendix B.
+11. Podman is mentioned in Appendix A (Containers).
+12. AppImage is mentioned in Appendix A (Package management).
+13. Flatpak is mentioned in Appendix A (Package management).
+14. Snap is mentioned in Appendix A (Package management).
+15. nbdev and jupytext are high-level tools to generate documentation and
+    to package custom code in Conda or PyPI. High-level package managers
+    like Conda and PyPI have already been thoroughly reviewed in Appendix
+    A for their longevity issues, so we feel that there is no need to
+    include these.
+16. Bazel is mentioned in Appendix A (job management).
+17. Debian's reproducible builds are only designed for ensuring that software
+ packaged for Debian is bitwise reproducible. As mentioned in the
+ discussion section of this paper, the bitwise reproducibility of software is
+ not an issue in the context discussed here; the reproducibility of the
+ relevant output data of the software is the main issue.
+
+------------------------------
+
+
+
+
+
+12. [Reviewer 1] Existing guidelines are similar to the proposed
+    "Criteria for longevity". Many articles exist in the form "10 simple
+    rules for X", for example (not an exhaustive list):
+ * https://doi.org/10.1371/journal.pcbi.1003285
+ * https://arxiv.org/abs/1810.08055
+ * https://osf.io/fsd7t/
+ * A model project for reproducible papers: https://arxiv.org/abs/1401.2000
+ * Executable/reproducible paper articles and original concepts
+
+ANSWER: Thank you for highlighting these points. Appendix B starts with a
+subsection titled "Suggested rules, checklists or criteria", in which we
+review the existing sets of criteria. This subsection includes the sources
+proposed by the reviewer [Sandve et al.; Rule et al.; Nust et al.], among
+others.
+
+ArXiv:1401.2000 has been added in Appendix A as an example of a paper
+using virtual machines. We thank the referee for bringing up this paper,
+because the link to the VM provided in it no longer works (the URL
+http://archive.comp-phys.org/provenance_challenge/provenance_machine.ova
+redirects to
+https://share.phys.ethz.ch//~alpsprovenance_challenge/provenance_machine.ova
+which returns a 'Not Found' HTML response). Together with SHARE, this very
+nicely highlights our main issue with binary containers or VMs: their lack
+of longevity, due to the high cost of long-term storage of large files.
+
+------------------------------
+
+
+
+
+
+13. [Reviewer 1] Several claims in the manuscript are not properly
+ justified, neither in the text nor via citation. Examples (not
+ exhaustive list):
+ 1. "it is possible to precisely identify the Docker “images” that
+ are imported with their checksums, but that is rarely practiced
+ in most solutions that we have surveyed [which ones?]"
+ 2. "Other OSes [which ones?] have similar issues because pre-built
+ binary files are large and expensive to maintain and archive."
+ 3. "Researchers using free software tools have also already had
+ some exposure to it"
+ 4. "A popular framework typically falls out of fashion and
+ requires significant resources to translate or rewrite every
+ few years."
+
+ANSWER: These points have been clarified in the highlighted parts of the text:
+
+1. Many examples have been given throughout the newly added
+ appendices. To avoid confusion in the main body of the paper, we
+ have removed the "we have surveyed" part. It is already mentioned
+ above this point in the text that a large survey of existing
+ methods/solutions is given in the appendices.
+
+2. Due to the thorough discussion of this issue in the appendices with
+ precise examples, this line has been removed to allow space for the
+ other points raised by the referees. The main point (high cost of
+ keeping binaries) is already abundantly clear.
+
+   On a similar topic, Dockerhub's recent announcement that inactive
+   images (untouched for over 6 months) will be deleted has also been
+   added. The announcement URL is given as a hyperlink in the text (it was
+   too long to print directly; if IEEE has a special short-URL format, we
+   can add it).
+
+   Another interesting news item in relation to longevity has also been
+   added here: the decision by CentOS to abandon CentOS 8 next year.
+   Again, the URL is within a hyperlink in the text. Many scientific and
+   industrial projects have relied on CentOS for longevity over the last
+   two decades, but that didn't stop its creators from abandoning it 8
+   years early and completely switching its release paradigm.
+
+3. A small statement has been added, reminding the readers that almost all
+ free software projects are built with Make (CMake is also used
+ sometimes, but CMake is just a high-level wrapper over Make: it finally
+ produces a 'Makefile'; practical usage of CMake generally obliges the
+ user to understand Make).
+
+4. The example of Python 2 has been added to clarify this point.
+
+
+------------------------------
+
+
+
+
+
+14. [Reviewer 1] As mentioned in the discussion by the authors, not
+ even Bash, Git or Make is reproducible, thus not even Maneage can
+ address the longevity requirements. One possible alternative is
+ the use of CI to ensure that papers are re-executable (several
+ papers have been written on this topic). Note that CI is
+ well-established technology (e.g. Jenkins is almost 10 years old).
+
+ANSWER: Thank you for raising these issues. We had initially planned to
+discuss CI, but like many discussion points, we were forced to remove it
+before the first submission due to the very tight word-count limit. We
+have now added a sentence on CI in the discussion.
+
+On the issue of Bash/Git/Make, indeed, the built executables of Bash,
+Git and Make are not bitwise reproducible/identical on different
+systems. However, as mentioned in the discussion, we are concerned with the
+_output_ of the software's executable file. We are not interested in the
+executable file itself (which should be different for different OSs or CPU
+architectures).
+
+The reproducibility of a binary file only becomes important for security
+purposes where binaries are downloaded. In Maneage, we download the
+software source code tarball, confirm the tarball's SHA512 checksum
+against the checksum that is recorded in Maneage [1], and build the
+software with a precisely defined build environment and dependencies.
+
+In summary, even though the compiled binary files of specific versions of
+Git, Bash or Make will not be bitwise reproducible/identical on different
+systems, their scientific outputs are exactly reproducible: 'git describe'
+or Bash's 'for' loop will have the same output on GNU/Linux, macOS/Darwin
+or FreeBSD (despite having bitwise different executables).
+
+[1] http://git.maneage.org/project.git/tree/reproduce/software/config/checksums.conf
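+
+To make this concrete, here is a minimal sketch (not Maneage's actual
+implementation; the tarball name and the layout of 'checksums.conf'
+are illustrative) of the kind of check that is run before building:
+
+    # Verify a downloaded tarball against its recorded SHA512 checksum.
+    expected=$(awk '/^bash-5\.0\.tar\.gz/ {print $2}' checksums.conf)
+    actual=$(sha512sum bash-5.0.tar.gz | awk '{print $1}')
+    if [ "$actual" != "$expected" ]; then
+        echo "bash-5.0.tar.gz: checksum mismatch, not building" >&2
+        exit 1
+    fi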
+
+------------------------------
+
+
+
+
+
+15. [Reviewer 1] These criteria have been proposed previously. Maneage
+    itself provides little novelty (see comments below).
+
+ANSWER: The previously suggested sets of criteria that were listed by
+Reviewer 1 are reviewed by us in the newly added Appendix B, and the
+novelty and advantages of our proposed criteria are contrasted there
+with the earlier sets of criteria.
+
+------------------------------
+
+
+
+
+
+16. [Reviewer 2] Authors should add indication that using good practices it
+ is possible to use Docker or VM to obtain identical OS usable for
+ reproducible research.
+
+ANSWER: In the submitted version we had stated that "Ideally, it is
+possible to precisely identify the Docker images that are imported with
+their checksums ...". But to be clearer and more direct, this has been
+edited to explicitly say "... to recreate an identical OS image later".
+
+------------------------------
+
+
+
+
+
+17. [Reviewer 2] The CPU architecture of the platform used to run the
+ workflow is not discussed in the manuscript. Authors should probably
+ take into account the architecture used in their workflow or at least
+ report it.
+
+ANSWER: Thank you very much for raising this important point. We hadn't
+seen other reproducibility papers mention this important point and thus
+missed it. In the acknowledgments (where we also mention the commit hashes)
+we now explicitly mention the exact CPU architecture used to build this
+paper: "This project was built on an x86_64 machine with Little Endian
+byte-order and address sizes 39 bits physical, 48 bits virtual.". We
+mention the byte order because we have already seen cases where the
+architecture is the same, but programs fail because of the byte order.
+
+Generally, Maneage now extracts this information from the running system
+during its configuration phase, and provides the user with three LaTeX
+macros containing this information, which can be used anywhere in the
+paper.
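+
+As a minimal sketch of the idea (this is not Maneage's actual code;
+the macro names are hypothetical and 'lscpu' is assumed to exist on
+the host), the configuration phase can record this information as:
+
+    arch=$(uname -m)
+    byteorder=$(lscpu | awk -F': *' '/Byte Order/ {print $2}')
+    printf '\\newcommand{\\machinearch}{%s}\n' "$arch" >> macros.tex
+    printf '\\newcommand{\\machinebyteorder}{%s}\n' "$byteorder" >> macros.tex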
+
+------------------------------
+
+
+
+
+
+18. [Reviewer 2] I don’t understand the "no dependency beyond
+    POSIX". Authors should explain more what they mean by this sentence.
+
+ANSWER: This has been clarified with the short extra statement "a minimal
+Unix-like standard that is shared between many operating systems". Also in
+the appendix we now say "no execution requirement beyond a minimal
+Unix-like operating system".
+
+We would have liked to explain this more, but the word limit is very
+constraining. It is clearer in the appendices, and we will put clearer
+explanations on the web page.
+
+------------------------------
+
+
+
+
+
+19. [Reviewer 2] Unfortunately, sometime we need proprietary or specialized
+ software to read raw data... For example in genetics, micro-array raw
+ data are stored in binary proprietary formats. To convert this data
+ into a plain text format, we need the proprietary software provided
+ with the measurement tool.
+
+ANSWER: Thank you very much for this good point. A description of a
+possible solution to this has been added after criterion 8.
+
+------------------------------
+
+
+
+
+
+20. [Reviewer 2] I was not able to properly set up a project with
+ Maneage. The configuration step failed during the download of tools
+ used in the workflow. This is probably due to a firewall/antivirus
+    restriction out of my control. How frequently does this failure
+    happen to users?
+
+ANSWER: Thank you for mentioning this. This has been fixed by archiving
+all Maneage'd software on Zenodo (https://doi.org/10.5281/zenodo.3883409)
+and giving that archive the highest download precedence.
+
+Until recently, we would directly access each software's own webpage to
+download the source files, and this caused frequent problems of the type
+you mention (different servers in different ISPs/states/countries can
+behave differently). In other cases, we were very frustrated when a
+software's webpage would temporarily be unavailable (e.g., for maintenance
+reasons); this was a major hindrance in building new projects.
+
+Since all the software is free-licensed, we are legally allowed to
+re-distribute it (within the conditions, such as not removing copyright
+notices), and Zenodo is designed for the long-term archival of academic
+digital objects, so we decided that a software source code repository on
+Zenodo would be the most reliable solution. At configure time, Maneage now
+accesses Zenodo's DOI, resolves the most recent URL, and automatically
+downloads any necessary software source code that the project needs from
+there.
+
+Generally, we also keep all software in a Git repository on our own
+webpage: http://git.maneage.org/tarballs-software.git/tree. Also, Maneage
+users can specify their own custom URLs for downloading software, which
+will be given higher priority than Zenodo (useful when custom software is
+downloaded and built in a project branch, not the core 'maneage' branch).
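+
+Schematically, the precedence works as in the following sketch (this
+is illustrative shell, not Maneage's actual download script; the
+variable names are hypothetical):
+
+    # Try each source in order of priority until one succeeds.
+    for mirror in "$custom_url" \
+                  "https://zenodo.org/record/3883409/files" \
+                  "$upstream_url"; do
+        wget -O "$tarball" "$mirror/$tarball" && break
+    done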
+
+------------------------------
+
+
+
+
+
+21. [Reviewer 2] The time to configure a new project is quite long because
+ everything needs to be compiled. Authors should compare the time
+ required to set up a project Maneage versus time used by other
+ workflows to give an indication to the readers.
+
+ANSWER: Thank you for raising this point. It takes about 1.5 hours to
+configure the default Maneage branch on an 8-core CPU (more than half of
+this time is devoted to GCC on GNU/Linux operating systems; the building
+of GCC can optionally be disabled with the '--host-cc' option to
+significantly speed up the build when the host's GCC is
+similar). Furthermore, Maneage can be built within a Docker container.
+
+A paragraph has been added in Section IV on this issue (the
+build time and building within a Docker container). We have also defined
+task #15818 [1] to have our own core Docker image that is ready to build a
+Maneaged project and will be adding it shortly.
+
+[1] https://savannah.nongnu.org/task/index.php?15818
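+
+For reference, a typical build of a Maneage'd project looks like the
+following (with the '--host-cc' option mentioned above skipping the
+lengthy GCC build):
+
+    ./project configure --host-cc
+    ./project make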
+
+------------------------------
+
+
+
+
+
+22. [Reviewer 3] Authors should define their use of the term [Replicability
+ or Reproducibility] briefly for their readers.
+
+ANSWER: "Reproducibility" has been defined along with "Longevity" and
+"usage" at the start of Section II.
+
+------------------------------
+
+
+
+
+
+23. [Reviewer 3] The introduction is consistent with the proposal of the
+ article, but deals with the tools separately, many of which can be used
+ together to minimize some of the problems presented. The use of
+ Ansible, Helm, among others, also helps in minimizing problems.
+
+ANSWER: That is correct. In the new appendices we have touched upon this,
+especially in Appendix B where we discuss the technologies used by various
+reproducible workflow solutions.
+
+About Ansible and Helm: they are primarily designed for distributed
+computing. For example, Helm is just a high-level package manager for a
+Kubernetes cluster that is based on containers. A review of them could be
+added to the Appendices, but we feel this would distract somewhat from the
+main points of our current paper.
+
+------------------------------
+
+
+
+
+
+24. [Reviewer 3] When the authors use the Python example, I believe it is
+ interesting to point out that today version 2 has been discontinued by
+ the maintaining community, which creates another problem within the
+ perspective of the article.
+
+ANSWER: Thank you very much for highlighting this point. We had excluded
+this point for the sake of article length, but we have restored it in
+the introduction of the revised version.
+
+------------------------------
+
+
+
+
+
+25. [Reviewer 3] Regarding the use of VM's and containers, I believe that
+ the discussion presented by THAIN et al., 2015 is interesting to
+ increase essential points of the current work.
+
+ANSWER: Thank you very much for pointing out the works by Thain. We
+couldn't find any first-author papers in 2015, but found Meng & Thain
+(https://doi.org/10.1016/j.procs.2017.05.116) which had a relevant
+discussion of why they didn't use Docker containers in their work. That
+paper is now cited in the discussion of Containers in Appendix A.
+
+------------------------------
+
+
+
+
+
+26. [Reviewer 3] About the Singularity, the description article was missing
+ (Kurtzer GM, Sochat V, Bauer MW, 2017).
+
+ANSWER: Thank you for the reference. We are restricted in the main
+body of the paper due to the strict bibliography limit of 12
+references; we have included Kurtzer et al 2017 in Appendix A (where
+we discuss Singularity).
+
+------------------------------
+
+
+
+
+
+27. [Reviewer 3] I also believe that a reference to FAIR is interesting
+ (WILKINSON et al., 2016).
+
+ANSWER: The FAIR principles have been mentioned in the main body of the
+paper, but unfortunately we had to remove its citation in the main paper (like
+many others) to keep to the maximum of 12 references. We have cited it in
+Appendix B.
+
+------------------------------
+
+
+
+
+
+28. [Reviewer 3] In my opinion, the paragraph on IPOL seems to be out of
+ context with the previous ones. This issue of end-to-end
+ reproducibility of a publication could be better explored, which would
+ further enrich the tool presented.
+
+
+ANSWER: We agree and have removed the IPOL example from that section. We
+have included an in-depth discussion of IPOL in Appendix B and we comment
+on how Maneage'd projects offer a similar level of peer-review control.
+
+------------------------------
+
+
+
+
+
+29. [Reviewer 3] On the project website, I suggest that the information
+ contained in README-hacking be presented on the same page as the
+ Tutorial. A topic breakdown is interesting, as the markdown reading may
+ be too long to find information.
+
+ANSWER: Thank you very much for this good suggestion, it has been
+implemented: https://maneage.org/about.html . The webpage will continuously
+be improved and such feedback is always very welcome.
+
+------------------------------
+
+
+
+
+
+31. [Reviewer 3] The tool is suitable for Unix users, keeping users away
+ from Microsoft environments.
+
+ANSWER: The issue of building on Windows has been discussed in Section IV,
+either using Docker (or VMs) or using the Windows Subsystem for Linux.
+
+------------------------------
+
+
+
+
+32. [Reviewer 3] Important references are missing; more references are
+ needed
+
+ANSWER: Two comprehensive Appendices have been added to address this issue.
+
+------------------------------
+
+
+
+
+
+33. [Reviewer 4] Revisit the criteria, show how you have come to decide on
+ them, give some examples of why they are important, and address
+ potential missing criteria.
+
+ANSWER: In the new Appendix B, we have added a section reviewing some
+existing criteria. We would be very interested to discuss them further in
+the main body, but within the constraints of space (the limit is 6250
+words), it is almost impossible to discuss the history of each in detail
+or to add more anecdotal examples of their relevance.
+
+------------------------------
+
+
+
+
+
+34. [Reviewer 4] Clarify the discussion of challenges to adoption and make
+ it clearer which tradeoffs are important to practitioners.
+
+ANSWER: We discuss many of these challenges and caveats in the Discussion
+Section (V), within the existing word limit.
+
+------------------------------
+
+
+
+
+
+35. [Reviewer 4] Be clearer about which sorts of research workflow are best
+ suited to this approach.
+
+ANSWER: Maneage is flexible enough to enable a wide range of workflows to
+be implemented. This is done by leveraging the highly modular and flexible
+nature of Makefiles run via 'Make'.
+
+GUI-based operations (that involve human interaction and cannot be run in
+batch-mode) are one type of workflow that our proof-of-concept will not
+support. But as discussed in the completeness criteria, human interaction
+is an incompleteness, dramatically reducing the reproducibility of a
+result.
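+
+As a minimal illustration of this modularity (the file names are
+hypothetical, not taken from Maneage's own Makefiles), each analysis
+step is a Make rule whose target depends only on its inputs, so Make
+re-executes exactly the steps whose inputs have changed:
+
+    # Recipe lines must be indented with a tab in a real Makefile.
+    out/clean.csv: data/raw.csv src/clean.sh
+            sh src/clean.sh data/raw.csv > $@
+
+    out/stats.txt: out/clean.csv src/stats.awk
+            awk -f src/stats.awk out/clean.csv > $@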
+
+------------------------------
+
+
+
+
+
+36. [Reviewer 4] There is also the challenge of mathematical
+ reproducibility, particularly of the handling of floating point number,
+ which might occur because of the way the code is written, and the
+ hardware architecture (including if code is optimised / parallelised).
+
+ANSWER: Floating point errors and optimizations have been mentioned in the
+discussion (Section V). The issue with parallelization has also been
+discussed in Section IV, in the part on verification ("Where exact
+reproducibility is not possible (for example due to parallelization),
+values can be verified by a statistical method specified by the project
+authors."). We have linked keywords in the latter sentence to a Software
+Heritage URI [1] with the specific file in a Maneage'd paper that
+illustrates an example of how statistical verification of parallelised code
+can work in practice (Peper & Roukema 2020; zenodo.4062460).
+
+We would be interested to hear if any other papers already exist that use
+automatic statistical verification of parallelised code as has been done in
+this Maneage'd paper.
+
+[1] https://archive.softwareheritage.org/browse/origin/content/?branch=refs/heads/postreferee_corrections&origin_url=https://codeberg.org/boud/elaphrocentre.git&path=reproduce/analysis/bash/verify-parameter-statistically.sh
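+
+For illustration only (this is not the cited project's actual script;
+the reference value and tolerance are invented), such a statistical
+verification step can be as simple as accepting a value within a
+declared tolerance instead of requiring bitwise equality:
+
+    # Fail if the first value in result.txt deviates from the
+    # reference by more than the tolerance.
+    awk -v ref=1.2345 -v tol=0.001 \
+        'NR==1 {d = $1 - ref; if (d < 0) d = -d; exit !(d <= tol)}' \
+        result.txt || { echo "verification failed" >&2; exit 1; }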
+
+------------------------------
+
+
+
+
+
+37. [Reviewer 4] ... the handling of floating point number
+[reproducibility] ... will come with a tradeoff against performance, which
+is never mentioned.
+
+ANSWER: The criteria we propose and the proof-of-concept with Maneage do
+not force the choice of a tradeoff between exact bitwise floating point
+reproducibility versus performance (e.g. speed). The specific concepts of
+"verification" and "reproducibility" will vary between domains of
+scientific computation, but we expect that the criteria allow this wide
+range.
+
+Performance is indeed an important issue for _immediate_ reproducibility
+and we would have liked to discuss it. But due to the strict word-count, we
+feel that adding it to the discussion points, without having adequate space
+to elaborate, can confuse the readers away from the focus of this paper (on
+long term usability). It has therefore not been added.
+
+------------------------------
+
+
+
+
+
+38. [Reviewer 4] A tradeoff which might affect Criterion 3 is time to
+    result: people use popular frameworks because it is easier to use
+    them.
+
+ANSWER: That is true. In Section IV, we state that the time it takes to
+build Maneage (only once on each computer) is around 1.5 hours on an
+8-core CPU (a typical machine that may be used for data analysis). We
+therefore conclude that when the analysis is complex (and thus takes many
+hours, or even days, to complete), this time is negligible.
+
+But if the project's full analysis takes 10 minutes or less (like the
+extremely simple analysis done in this paper), the 1.5-hour build time is
+indeed significant. In those cases, as discussed in the main body, the
+project can be built once in a Docker image and easily moved to other
+computers.
+
+Generally, it is true that the initial configuration time (only once on
+each computer) of a Maneage install may discourage some scientists; but a
+serious scientific research project is never started and completed on a
+time scale of a few hours.
+
+------------------------------
+
+
+
+
+
+39. [Reviewer 4] I would [have] liked to have seen an explanation of how
+    these challenges to adoption were identified: was this anecdotal,
+    through surveys? participant observation?
+
+ANSWER: The results mentioned here are anecdotal: they are based on
+private discussions after holding multiple seminars and webinars with
+RDA's support, and on a workshop that was planned for non-astronomers, to
+which we had invited (with RDA funding) early-career researchers. However,
+that workshop was cancelled due to the COVID-19 pandemic and we had
+private communications instead.
+
+We would very much like to elaborate on this experience of training new
+researchers with these tools. However, as with many of the cases above, the
+very strict word-limit doesn't allow us to elaborate beyond what we have
+already written. Hopefully in a couple of years and with the wider usage of
+Maneage or these criteria in research papers, we will be able to write a
+paper that is directly focused on this.
+
+------------------------------
+
+
+
+
+
+40. [Reviewer 4] Potentially an interesting sidebar to investigate how
+ LaTeX/TeX has ensured its longevity!
+
+ANSWER: That is indeed a very interesting subject to study (an obvious
+link is that LaTeX/TeX is very strongly based on plain text files). We
+have been in touch with Karl Berry (one of the core people behind TeX
+Live, who also plays a prominent role in GNU) and have witnessed the TeX
+Live community's efforts to become ever more portable and longer-lived.
+
+However, as the reviewer states, this would be a sidebar, and we are
+constrained for space, so we couldn't find a place to highlight this. But
+it is indeed a subject worthy of a full paper (that can be very useful for
+many software projects).
+
+------------------------------
+
+
+
+
+
+41. [Reviewer 4] The title is not specific enough - it should refer to the
+ reproducibility of workflows/projects.
+
+ANSWER: A problem here is that "workflow" and "project", taken in
+isolation, risk being vague for wider audiences. Also, we aim at covering
+a wider range of aspects of a project than the workflow alone; in the
+other direction, the word "project" could be seen as too broad, including
+the funding, principal investigator, and team coordination.
+
+A more specific title could be, for example, "Towards long-term and
+archivable reproducibility of scientific computational research
+projects". Using a term proposed by one of our reviewers, "Towards
+long-term and archivable end-to-end reproducibility of scientific
+computational research projects" might also be appropriate.
+
+Nevertheless, we feel that in the context of an article published in CiSE,
+our current short title is sufficient.
+
+------------------------------
+
+
+
+
+
+42. [Reviewer 4] Whilst the thesis stated is valid, it may not be useful to
+ practitioners of computation science and engineering as it stands.
+
+ANSWER: This point appears to refer to floating point bitwise
+reproducibility and possibly to the conciseness of our paper. The former
+is fully allowed for, as stated above, though not obligatory: the
+"verify.mk" rule file can be used to enforce bitwise reproducibility. The
+latter is constrained by the 6250-word limit of CiSE; the supplementary
+appendices in the extended version help respond to this point.
+
+------------------------------
+
+
+
+
+
+43. [Reviewer 4] Longevity is not defined.
+
+ANSWER: This has been defined now at the start of Section II.
+
+------------------------------
+
+
+
+
+
+44. [Reviewer 4] Whilst various tools are discussed and discarded, no
+ attempt is made to categorise the magnitude of longevity for which they
+ are relevant. For instance, environment isolators are regarded by the
+ software preservation community as adequate for timescale of the order
+ of years, but may not be suitable for the timescale of decades where
+ porting and emulation are used.
+
+ANSWER: Statements quantifying the longevity of specific tools have been
+added in Section II and are highlighted in green. For example, in the case
+of Docker images: "their longevity is determined by the host kernel,
+usually a decade"; for Python packages: "a Python installation with a
+usual longevity of a few years"; and for Nix/Guix: "with considerably
+better longevity; the same as the supported CPU architectures".
+
+------------------------------
+
+
+
+
+
+45. [Reviewer 4] The title of this section "Commonly used tools and their
+ longevity" is confusing - do you mean the longevity of the tools or the
+ longevity of the workflows that can be produced using these tools?
+ What happens if you use a combination of all four categories of tools?
+
+ANSWER: We have changed the section title to "Longevity of existing tools"
+to clarify that we refer to longevity of the tools.
+
+If the four categories of tools were combined, then the overall longevity
+would be the intersection of the time spans over which the individual
+tools remain viable, i.e., it would be limited by the shortest-lived tool.
+
+------------------------------
+
+
+
+
+
+46. [Reviewer 4] It wasn't clear to me if code was being run to generate
+ the results and figures in a LaTeX paper that is part of a project in
+ Maneage. It appears to be suggested this is the case, but Figure 1
+ doesn't show how this works - it just has the LaTeX files, the data
+ files and the Makefiles. Is it being suggested that LaTeX itself is the
+ programming language, using its macro functionality?
+
+ANSWER: Thank you for highlighting this point of confusion. The caption of
+Figure 1 has been edited to hopefully clarify the point. In short, the
+arrows represent the operation of software and boxes represent files. In
+the case of generating 'paper.pdf' from its three dependencies
+('references.tex', 'paper.tex' and 'project.tex'), yes, LaTeX is used. But
+in other steps, other tools are used (depending on the analysis). For
+example, as you can see in [1], the main step of the arrow connecting
+'table-3.txt' to 'tools-per-year.txt' is an AWK command (there are also a
+few 'echo' commands for metadata and copyright in the output plain-text
+file [2]).
+
+[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/reproduce/analysis/make/demo-plot.mk#L51
+[2] https://zenodo.org/record/3911395/files/tools-per-year.txt
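+
+A simplified sketch of the kind of rule described above (the real
+rule is in 'demo-plot.mk' [1]; the column layout and counting logic
+here are hypothetical):
+
+    # Recipe lines must be tab-indented in a real Makefile.
+    tools-per-year.txt: table-3.txt
+            echo "# Tools per year (metadata and copyright)." > $@
+            awk '!/^#/ {n[$1]++} END {for (y in n) print y, n[y]}' \
+                table-3.txt >> $@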
+
+------------------------------
+
+
+
+
+
+47. [Reviewer 4] I was a bit confused on how collaboration is handled as
+ well - this appears to be using the Git branching model, and the
+ suggestion that Maneage is keeping track of all components from all
+ projects - but what happens if you are working with collaborators that
+ are using their own Maneage instance?
+
+ANSWER: Indeed, Maneage operates based on the Git branching model. As
+mentioned in the text, Maneage is itself a Git branch. Researchers spin
+off their own branch from the 'maneage' branch and start customizing it
+for their particular project in their own repository. They can also use
+any Git-based collaboration model to work together on their branch.
+
+Figure 2 in fact explicitly shows such a case: the main project leader is
+committing on the "project" branch. But a collaborator creates a separate
+branch over commit '01dd812' and makes a couple of commits ('f69e1f4' and
+'716b56b'), and finally asks the project leader to merge them into the
+project. This can be generalized to any Git-based collaboration model.
+
+Recent experience by one of us [Roukema] found that merging a
+Maneage-based cosmology simulation project (now zenodo.4062460) with the
+core branch, after separate evolution of about 30-40 commits on 'maneage'
+and possibly 100 on the project, needed about one day of straightforward
+effort, without any major difficulties; so updating the low-level
+infrastructure is easy.
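+
+In terms of concrete Git commands, a typical collaboration is
+sketched below (the repository URL and the 'maneage' branch name
+follow the descriptions above; the other branch names are
+illustrative):
+
+    git clone http://git.maneage.org/project.git my-paper
+    cd my-paper
+    git checkout -b project maneage     # spin off the project branch
+    # A collaborator branches off, commits, and the leader merges:
+    git checkout -b fix-figure project
+    git checkout project && git merge fix-figure
+    # Low-level Maneage improvements are merged in the same way:
+    git merge maneage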
+
+------------------------------
+
+
+
+
+
+48. [Reviewer 4] I would also [have] liked to have seen a comparison
+ between this approach and other "executable" paper approaches
+ e.g. Jupyter notebooks, compared on completeness, time taken to
+ write a "paper", ease of depositing in a repository, and ease of
+ use by another researcher.
+
+ANSWER: This type of sociological survey will make sense once the number
+of projects run with Maneage is sufficiently high to be comparable to
+Jupyter, for example. The time taken to write a paper would be measurable
+automatically, from the Git history. The other parameters suggested would
+require cooperation from the scientists in responding to a survey, or
+would have to be collected anecdotally in the short term. This is a good
+subject for a follow-up paper in a few years.
+
+------------------------------
+
+
+
+
+
+49. [Reviewer 4] The weakest aspect is the assumption that research can be
+ easily compartmentalized into simple and complete packages. Given that
+ so much of research involves collaboration and interaction, this is not
+ sufficiently addressed. In particular, the challenge of
+ interdisciplinary work, where there may not be common languages to
+ describe concepts and there may be different common workflow practices
+ will be a barrier to wider adoption of the primary thesis and criteria.
+
+ANSWER: Maneage was defined precisely to address the problem of
+publishing and collaborating on complete workflows by many people (six of
+us have already collaborated to complete this paper itself, as can be seen
+in its Git history). Git has been exceptionally powerful in enabling
+collaboration on huge projects with thousands of contributors, like the
+Linux kernel. Exactly the same collaborating style can be implemented with
+Maneage for large scientific projects.
+
+Hopefully with the clarification to point 47 above, this should also become
+clear.
+
+------------------------------
+
+
+
+
+
+50. [Reviewer 5] Major figures currently working in this exact field do not
+ have their work acknowledged in this work.
+
+ANSWER: This was due to the strict word limit and the CiSE publication
+policy (not to include a literature review, with a limit of only 12
+citations). But we had indeed already done a comprehensive literature
+review, and the editors kindly agreed that we submit that review as
+supplementary appendices.
+
+------------------------------
+
+
+
+
+
+51. [Reviewer 5] Jimenez I et al ... 2017 "The popper convention: Making
+ reproducible systems evaluation practical ..." and the later
+ revision that uses GitHub Actions, is largely the same as this
+ work.
+
+ANSWER: This work and the proposed criteria are very different from
+Popper. A detailed review of Popper, in particular, is given in Appendix B.
+
+------------------------------
+
+
+
+
+
+52. [Reviewer 5] The lack of attention to virtual machines and containers
+ is highly problematic. While a reader cannot rely on DockerHub or a
+ generic OS version label for a VM or container, these are some of the
+ most promising tools for offering true reproducibility.
+
+ANSWER: Containers and VMs are now more thoroughly discussed in the main
+body and extensively discussed in Appendix A. As discussed there (with
+many cited examples), containers and VMs are only appropriate when they
+are themselves reproducible (for example, if running the Dockerfile this
+year and next year gives the same internal environment). However, we show
+that this is not the case in most solutions (a more comprehensive review
+would require its own paper).
+
+Moreover, with complete, robust environment builders like Maneage, Nix or
+GNU Guix, the analysis environment within a container can be exactly
+reproduced later. But even so, due to their binary nature and large
+storage volume, containers and VMs are not trustable sources for the long
+term (it is expensive to archive them). We show several examples in the
+paper and appendices of how projects that relied on VMs in 2011 and 2014
+are no longer active, and how even Dockerhub will be deleting containers
+in free accounts that are not used for more than 6 months (due to the high
+storage costs).
+
+Furthermore, as a unique new feature, Maneage has the criterion of
+"Minimal complexity". This means that even if, for any reason, the project
+cannot be run in the future, the content, analysis scripts, etc. remain
+accessible to the interested reader as plain text (only the development
+history - the Git history - is stored in Git's binary format). Unlike Nix
+or Guix, our approach doesn't need a third-party package manager: the
+instructions for building all of a project's software sit directly in the
+same project as the high-level analysis software. The full end-to-end
+process is transparent and archived in Maneage, and the interested
+scientist can follow the analysis and study the different decisions of
+each step (why and how the analysis was done). They can also modify it to
+work on future hardware that we don't know about today (this is not
+possible with a binary file like a VM or container).
+
+------------------------------
+
+
+
+
+
+53. [Reviewer 5] On the data side, containers have the promise to manage
+ data sets and workflows completely [Lofstead J, Baker J, Younge A. Data
+ pallets: containerizing storage for reproducibility and
+    traceability. In: International Conference on High Performance Computing
+ 2019 Jun 16 (pp. 36-45). Springer, Cham.] Taufer has picked up this
+ work and has graduated a MS student working on this topic with a
+ published thesis. See also Jimenez's P-RECS workshop at HPDC for
+ additional work highly relevant to this paper.
+
+ANSWER: Thank you for the interesting paper by Lofstead+2019 on Data
+pallets. We have cited it in Appendix A as an example of how generic the
+concept of containers is.
+
+The topic of linking data to analysis is also a core result of the
+criteria presented here, and is discussed briefly in our paper. There are
+indeed many very interesting works on this topic, but the format of CiSE
+is very short (a maximum of 6250 words with 12 references), so we don't
+have the space to go into this any further. This is indeed a very
+interesting aspect for follow-up studies, especially as usage of Maneage
+grows and we have more example workflows by users with which to study the
+linking of data to analysis.
+
+------------------------------
+
+
+
+
+
+54. [Reviewer 5] Some other systems that do similar things include:
+ reprozip, occam, whole tale, snakemake.
+
+ANSWER: All these tools have been reviewed in the newly added appendices.
+
+------------------------------
+
+
+
+
+
+55. [Reviewer 5] the paper needs to include the context of the current
+ community development level to be a complete research paper. A revision
+ that includes evaluation of (using the criteria) and comparison with
+ the suggested systems and a related work section that seriously
+ evaluates the work of the recommended authors, among others, would make
+ this paper worthy for publication.
+
+ANSWER: A thorough review of current low-level tools and of high-level
+reproducible workflow management systems has been added in the extended
+appendices.
+
+------------------------------
+
+
+
+
+
+
+56. [Reviewer 5] Yet another example of a reproducible workflows project.
+
+ANSWER: As the newly added thorough comparisons with existing systems
+show, this set of criteria and the proof-of-concept offer uniquely new
+features. As another referee summarized: "This manuscript describes a new
+reproducible workflow _which doesn't require another new trendy high-level
+software_. The proposed workflow is only based on low-level tools already
+widely known."
+
+Interestingly, the fact that we don't define yet another workflow
+language and framework is itself what makes our proof-of-concept unique.
+Another unique feature of Maneage is that it is based on time-tested
+solutions (the youngest tool we use is Git, which is already 15 years old)
+in a framework that costs only ~100 kB to archive (in contrast to multi-GB
+containers or VMs).
+
+------------------------------
+
+
+
+
+
+57. [Reviewer 5] There are numerous examples, mostly domain specific, and
+ this one is not the most advanced general solution.
+
+ANSWER: As the comparisons in the appendices and clarifications above show,
+there are many features in the proposed criteria and proof of concept that
+are new and not satisfied by the domain-specific solutions known to us.
+
+------------------------------
+
+
+
+
+
+58. [Reviewer 5] Lack of context in the field missing very relevant work
+ that eliminates much, if not all, of the novelty of this work.
+
+ANSWER: The newly added appendices thoroughly describe the context and
+previous work that has been done in this field.
+
+------------------------------