1.  [EiC] Some reviewers request additions, and an overview of other
    tools.

ANSWER: Indeed, there is already a large body of work on the various
issues that have been touched upon in this paper. Before submitting the
paper, we had already done a very comprehensive review of the tools (as
you may notice from the Git repository[1]). However, the CiSE Author
Information explicitly states: "The introduction should provide a modicum
of background in one or two paragraphs, but should not attempt to give a
literature review". This is also practiced in previously published papers
at CiSE and is in line with the very limited word count and the maximum of
12 references to be used in the bibliography.

We were also eager to get that extensive review out (it took a lot of
time, and most of the tools were actually run and tested). Hence we
discussed this privately with the editors and this solution was agreed
upon: we include that extended review as appendices on the arXiv[2] and
Zenodo[3] pre-prints of this paper and mention those publicly available
appendices in the submitted paper for interested readers to follow up.

[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/tex/src/paper-long.tex#L1579
[2] https://arxiv.org/abs/2006.03018
[3] https://doi.org/10.5281/zenodo.3872247

------------------------------

2.  [Associate Editor] There are general concerns about the paper
    lacking focus.

ANSWER:

------------------------------

3.  [Associate Editor] Some terminology is not well-defined
    (e.g. longevity).

ANSWER: It has now been clearly defined in the first paragraph of Section
II. With this definition, the main argument of the paper is much clearer;
thank you (and the referees) for highlighting this.

------------------------------

4.  [Associate Editor] The discussion of tools could benefit from some
    categorization to characterize their longevity.

ANSWER: The longevity of each of the general tools reviewed in Section II
is now mentioned immediately after it (highlighted in green).

------------------------------

5.  [Associate Editor] Background and related efforts need significant
    improvement. (See below.)

ANSWER: This has been done, as mentioned in (1).

------------------------------

6.  [Associate Editor] There is consistency among the reviews that
    related work is particularly lacking.

ANSWER: This has been done, as mentioned in (1).

------------------------------

7.  [Associate Editor] The current work needs to do a better job of
    explaining how it deals with the nagging problem of running on CPU
    vs. different architectures.

ANSWER: The CPU architecture of the running system is now reported in the
"Acknowledgments" section, and a description of the problem and its
solution in Maneage has also been added in the "Proof of concept: Maneage"
section.

------------------------------

8.  [Associate Editor] At least one review commented on the need to
    include a discussion of continuous integration (CI) and its
    potential to help identify problems running on different
    architectures. Is CI employed in any way in the work presented in
    this article?

ANSWER: CI has been added in the discussion as one solution for finding
breaking points in operating-system updates and new/different
architectures. For the core Maneage branch, we have defined task #15741
[1] to add CI on many architectures in the near future.

[1] http://savannah.nongnu.org/task/?15741

------------------------------

9.  [Associate Editor] The presentation of the Maneage tool is both
    lacking in clarity and consistency with the public
    information/documentation about the tool. While our review focus
    is on the article, it is important that readers not be confused
    when they visit your site to use your tools.

###########################
ANSWER [NOT COMPLETE]: We should separate the various sections of the
README-hacking.md webpage into smaller pages that can be accessed
individually.
###########################

------------------------------

10. [Associate Editor] A significant question raised by one review is
    how this work compares to "executable" papers and Jupyter
    notebooks.  Does this work embody similar/same design principles
    or expand upon the established alternatives? In any event, a
    discussion of this should be included in background/motivation and
    related work to help readers understand the clear need for a new
    approach, if this is being presented as new/novel.

ANSWER: Thank you for highlighting this important point. We saw that it is
necessary to contrast these alternatives more directly with our proof of
concept. Two paragraphs have been added in Sections II and IV for this.

------------------------------

11. [Reviewer 1] Adding an explicit list of contributions would make
    it easier for the reader to appreciate these. These are not
    mentioned/cited and are highly relevant to this paper (in no
    particular order):
     1.  Git flows, both in general and in particular for research.
     2.  Provenance work, in general and with Git in particular.
     3.  Reprozip: https://www.reprozip.org/
     4.  OCCAM: https://occam.cs.pitt.edu/
     5.  Popper: http://getpopper.io/
     6.  Whole Tale: https://wholetale.org/
     7.  Snakemake: https://github.com/snakemake/snakemake
     8.  CWL https://www.commonwl.org/ and WDL https://openwdl.org/
     9.  Nextflow: https://www.nextflow.io/
     10. Sumatra: https://pythonhosted.org/Sumatra/
     11. Podman: https://podman.io
     12. AppImage (https://appimage.org/)
     13. Flatpak (https://flatpak.org/)
     14. Snap (https://snapcraft.io/)
     15. nbdev https://github.com/fastai/nbdev and jupytext
     16. Bazel: https://bazel.build/
     17. Debian reproducible builds: https://wiki.debian.org/ReproducibleBuilds

ANSWER:

1.  In Section IV, we have added that "Generally, any git flow (branching
    strategies) can be used by the high-level project authors or future
    readers."
2.  We have mentioned research objects as one mode of provenance
    tracking. The body of related provenance work that has already been
    done, and that can be exploited using these criteria and our proof of
    concept, is indeed very large. However, the 6250-word limit is very
    tight, and if we added more on it at this length, we would have to
    remove more directly relevant points. Hopefully this can be the
    subject of a follow-up paper.
3.  A review of ReproZip is in Appendix B.
4.  A review of Occam is in Appendix B.
5.  A review of Popper is in Appendix B.
6.  A review of Whole Tale is in Appendix B.
7.  A review of Snakemake is in Appendix A.
8.  CWL and WDL are described in Appendix A (job management).
9.  Nextflow is described in Appendix A (job management).
10. Sumatra is described in Appendix B.
11. Podman is mentioned in Appendix A (containers).
12. AppImage is mentioned in Appendix A (package management).
13. Flatpak is mentioned in Appendix A (package management).
14. nbdev and jupytext are high-level tools to generate documentation and
    package custom code in Conda or PyPI. High-level package managers
    like Conda and PyPI have already been thoroughly reviewed in Appendix
    A for their longevity issues, so we feel there is no need to include
    these.
15. Bazel has been mentioned in Appendix A (job management).
16. Debian's reproducible builds effort is only for ensuring that
    software packaged for Debian is bitwise reproducible. As mentioned in
    the discussion of this paper, the bitwise reproducibility of software
    is not an issue in the context discussed here; the reproducibility of
    the relevant output data of the software is the main issue.

------------------------------

12. [Reviewer 1] Existing guidelines are similar to the proposed
    "Criteria for longevity". Many articles of this form exist, e.g.
    "10 simple rules for X", for example (not an exhaustive list):
     * https://doi.org/10.1371/journal.pcbi.1003285
     * https://arxiv.org/abs/1810.08055
     * https://osf.io/fsd7t/
     * A model project for reproducible papers: https://arxiv.org/abs/1401.2000
     * Executable/reproducible paper articles and original concepts

ANSWER: Thank you for highlighting these points. Appendix B starts with a
subsection titled "Suggested rules, checklists or criteria" that reviews
existing criteria, including the sources proposed here (and others).

arXiv:1401.2000 has been added in Appendix A as an example paper using
virtual machines. We thank the referee for bringing up this paper, because
the link to the VM provided in the paper no longer works (the file has
been removed from the server). It has therefore been added alongside
SHARE; it very nicely highlights our main issue with binary containers or
VMs and their lack of longevity.

------------------------------

13. [Reviewer 1] Several claims in the manuscript are not properly
    justified, neither in the text nor via citation. Examples (not an
    exhaustive list):
     1. "it is possible to precisely identify the Docker “images” that
        are imported with their checksums, but that is rarely practiced
        in most solutions that we have surveyed [which ones?]"
     2. "Other OSes [which ones?]
        have similar issues because pre-built binary files are large and
        expensive to maintain and archive."
     3. "Researchers using free software tools have also already had
        some exposure to it"
     4. "A popular framework typically falls out of fashion and
        requires significant resources to translate or rewrite every
        few years."

ANSWER: They have been clarified in the highlighted parts of the text:

1. Many examples have been given throughout the newly added appendices. To
   avoid confusion in the main body of the paper, we have removed the "we
   have surveyed" part. It is already mentioned above it that a large
   survey of existing methods/solutions is given in the appendices.

2. Due to the thorough discussion of this issue in the appendices with
   precise examples, this line has been removed to allow space for the
   other points raised by the referees. The main point (the high cost of
   keeping binaries) is already abundantly clear.

   On a similar topic, DockerHub's recent announcement that inactive
   images (for over 6 months) will be deleted has also been added. The
   announcement URL is here (it was too long to include in the paper; if
   IEEE has a special short-URL format, we can add it):
   https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates

3. A short statement has been added, reminding the readers that almost
   all free software projects are built with Make (note that CMake is
   just a high-level wrapper over Make: it ultimately produces a
   'Makefile').

4. The example of Python 2 has been added.

------------------------------

14. [Reviewer 1] As mentioned in the discussion by the authors, not
    even Bash, Git or Make is reproducible, thus not even Maneage can
    address the longevity requirements. One possible alternative is
    the use of CI to ensure that papers are re-executable (several
    papers have been written on this topic). Note that CI is a
    well-established technology (e.g. Jenkins is almost 10 years old).

ANSWER: Thank you for raising this issue. We had initially planned to
address this issue also, but like many discussion points, we were forced
to remove it before the first submission due to the very tight word-count
limit. We have now added a sentence on CI in the discussion.

On the initial note, indeed, the "executable" files of Bash, Git or Make
are not bitwise reproducible/identical on different systems. However, as
mentioned in the discussion, we are concerned with the _output_ of the
software's executable file, _after_ the execution of its job. We (or any
user of Bash) are not interested in the executable file itself. The
reproducibility of the binary file only becomes important if a bug is
found (very rare for common usage in such core software of the OS). Hence
even though the compiled binary files of specific versions of Git, Bash or
Make will not be bitwise reproducible/identical on different systems,
their outputs are exactly reproducible: 'git describe' or Bash's 'for'
loop will have the same output on GNU/Linux, macOS or FreeBSD (which
produce bitwise different executables).

------------------------------

15. [Reviewer 1] The criteria have been proposed previously. Maneage
    itself provides little novelty (see comments below).

ANSWER: The previously suggested criteria that were mentioned are reviewed
in the newly added Appendix B, and the novelty/necessity of the proposed
criteria is shown by comparison there.

------------------------------

16. [Reviewer 2] Authors should add an indication that, using good
    practices, it is possible to use Docker or a VM to obtain an
    identical OS usable for reproducible research.

ANSWER: In the submitted version we had stated that "Ideally, it is
possible to precisely identify the Docker “images” that are imported with
their checksums ...". But to be clearer and more direct, it has been
edited to explicitly say "... to recreate an identical OS image later".

------------------------------

17. [Reviewer 2] The CPU architecture of the platform used to run the
    workflow is not discussed in the manuscript. Authors should probably
    take into account the architecture used in their workflow or at least
    report it.

ANSWER: Thank you very much for raising this important point. We hadn't
seen other reproducibility papers mention this important point and missed
it. In the acknowledgments (where we also mention the commit hashes) we
now explicitly mention the exact CPU architecture used to build this
paper: "This project was built on an x86_64 machine with Little Endian
byte-order and address sizes 39 bits physical, 48 bits virtual.". This is
because we have already seen cases where the architecture is the same,
but programs fail because of the byte order.

Generally, Maneage will now extract this information from the running
system during its configuration phase and provide the users with three
different LaTeX macros that they can use anywhere in their paper.

------------------------------

18. [Reviewer 2] I don’t understand the "no dependency beyond
    POSIX". Authors should explain more what they mean by this sentence.

ANSWER: This has been clarified with the short extra statement "a minimal
Unix-like standard that is shared between many operating systems". We
would have liked to explain this more, but the word limit is very
constraining.

------------------------------

19. [Reviewer 2] Unfortunately, sometimes we need proprietary or
    specialized software to read raw data... For example in genetics,
    micro-array raw data are stored in binary proprietary formats. To
    convert these data into a plain-text format, we need the proprietary
    software provided with the measurement tool.

ANSWER: Thank you very much for this good point. A description of a
possible solution to this has been added after criterion 8.

------------------------------

20. [Reviewer 2] I was not able to properly set up a project with
    Maneage. The configuration step failed during the download of tools
    used in the workflow. This is probably due to a firewall/antivirus
    restriction out of my control. How frequently does this failure
    happen to users?

ANSWER: Thank you for mentioning this. This has been fixed by archiving
all Maneage'd software on Zenodo (https://doi.org/10.5281/zenodo.3883409)
and also downloading from there.

Until recently we would directly access each software package's own
webpage to download the files, and this caused many problems like
this. In other cases, we were very frustrated when a software webpage was
temporarily unavailable (for maintenance reasons); this wouldn't allow us
to build new projects.

Since all the software is free, we are allowed to redistribute it, and
Zenodo is designed for long-term archival of academic artifacts, so we
decided that a software source-code repository on Zenodo would be the
most reliable solution. At configure time, Maneage now accesses Zenodo's
DOI and resolves the most recent URL to automatically download any
necessary software source code that the project needs from there.

Generally, we also keep all software in a Git repository on our own
webpage: http://git.maneage.org/tarballs-software.git/tree.
Maneage users can also specify their own custom URLs for downloading
software, which are given higher priority than Zenodo (useful for
situations when a custom software package is downloaded and built in a
project branch, not the core 'maneage' branch).

------------------------------

21. [Reviewer 2] The time to configure a new project is quite long
    because everything needs to be compiled. Authors should compare the
    time required to set up a project in Maneage versus the time used by
    other workflows, to give an indication to the readers.

ANSWER: Thank you for raising this point. It takes about 1.5 hours to
configure the default Maneage branch on an 8-core CPU (more than half of
this time is devoted to GCC on GNU/Linux operating systems, and the
building of GCC can optionally be disabled with the '--host-cc' option to
significantly speed up the build when the host's GCC is
similar). Furthermore, Maneage can be built within a Docker container.

Generally, a paragraph has been added in Section IV on this issue (the
build time and building within a Docker container). We have also defined
task #15818 [1] to have our own core Docker image that is ready to build
a Maneaged project, and will be adding it shortly.

[1] https://savannah.nongnu.org/task/index.php?15818

------------------------------

22. [Reviewer 3] Authors should define their use of the term
    [Replicability or Reproducibility] briefly for their readers.

ANSWER: "Reproducibility" has been defined along with "Longevity" and
"usage" at the start of Section II.

------------------------------

23. [Reviewer 3] The introduction is consistent with the proposal of the
    article, but deals with the tools separately, many of which can be
    used together to minimize some of the problems presented. The use of
    Ansible, Helm, among others, also helps in minimizing problems.

ANSWER: Ansible and Helm are primarily designed for distributed
computing. For example, Helm is just a high-level package manager for a
Kubernetes cluster that is based on containers. A review of them could be
added in the Appendix, but we feel they may not be very relevant for this
paper.

------------------------------

24. [Reviewer 3] When the authors use the Python example, I believe it
    is interesting to point out that today version 2 has been
    discontinued by the maintaining community, which creates another
    problem within the perspective of the article.

ANSWER: Thank you very much for highlighting this point; it was not
included for the sake of length. It has been fitted into the introduction
now.

------------------------------

25. [Reviewer 3] Regarding the use of VMs and containers, I believe that
    the discussion presented by THAIN et al., 2015 is interesting to
    increase essential points of the current work.

ANSWER: Thank you very much for pointing out the works by Thain. We
couldn't find any first-author papers from 2015, but found Meng & Thain
(https://doi.org/10.1016/j.procs.2017.05.116), which had a related
discussion of why they didn't use Docker containers in their work. That
paper is now cited in the discussion of containers in Appendix A.

------------------------------

26. [Reviewer 3] About Singularity, the description article was missing
    (Kurtzer GM, Sochat V, Bauer MW, 2017).

ANSWER: Thank you for the reference. We could not put it in the main body
of the paper (like many others) due to the strict bibliography limit of
12, but it has been cited in Appendix A (where we discuss Singularity).

------------------------------

27. [Reviewer 3] I also believe that a reference to FAIR is interesting
    (WILKINSON et al., 2016).

ANSWER: The FAIR principles have been mentioned in the main body of the
paper, but unfortunately we had to remove their citation from the main
paper (like MANY others) to stay within the maximum limit of 12
references. We have cited it in Appendix B.

------------------------------

28. [Reviewer 3] In my opinion, the paragraph on IPOL seems to be out of
    context with the previous ones. This issue of end-to-end
    reproducibility of a publication could be better explored, which
    would further enrich the tool presented.

#####################################
ANSWER:
#####################################

------------------------------

29. [Reviewer 3] On the project website, I suggest that the information
    contained in README-hacking be presented on the same page as the
    Tutorial. A topic breakdown is interesting, as the markdown reading
    may be too long to find information.

#####################################
ANSWER:
#####################################

------------------------------

31. [Reviewer 3] The tool is suitable for Unix users, keeping users away
    from Microsoft environments.

ANSWER: The issue of building on Windows has been discussed in Section
IV: either using Docker (or VMs) or using the Windows Subsystem for
Linux.

------------------------------

32. [Reviewer 3] Important references are missing; more references are
    needed.

ANSWER: Two comprehensive appendices have been added to address this
issue.

------------------------------

33. [Reviewer 4] Revisit the criteria, show how you have come to decide
    on them, give some examples of why they are important, and address
    potential missing criteria.

For example, the referee already points to "how code is written" as a
criterion (for example for threading or floating-point errors), or
"performance".

#################################
ANSWER:
#################################

------------------------------

34. [Reviewer 4] Clarify the discussion of challenges to adoption and
    make it clearer which tradeoffs are important to practitioners.

##########################
ANSWER:
##########################

------------------------------

35. [Reviewer 4] Be clearer about which sorts of research workflow are
    best suited to this approach.

################################
ANSWER:
################################

------------------------------

36. [Reviewer 4] There is also the challenge of mathematical
    reproducibility, particularly of the handling of floating-point
    numbers, which might occur because of the way the code is written,
    and the hardware architecture (including if code is optimised /
    parallelised).

################################
ANSWER:
################################

------------------------------

37. [Reviewer 4] Performance ... is never mentioned.

################################
ANSWER:
################################

------------------------------

38. [Reviewer 4] A tradeoff which might affect Criterion 3 is time to
    result; people use popular frameworks because it is easier to use
    them.

################################
ANSWER:
################################

------------------------------

39. [Reviewer 4] I would have liked to see an explanation of how these
    challenges to adoption were identified: was this anecdotal, through
    surveys? participant observation?

ANSWER: The results mentioned here are based on private discussions after
holding multiple seminars and webinars with RDA's support, and also a
workshop that was planned for non-astronomers. We even invited (funded)
early-career researchers to come to the workshop with the RDA funding;
however, that workshop was cancelled due to the pandemic and we had
private communications afterwards.

We would very much like to elaborate on this experience of training new
researchers with these tools. However, as with many of the cases above,
the very strict word limit doesn't allow us to elaborate beyond what is
already there.

------------------------------

40. [Reviewer 4] Potentially an interesting sidebar to investigate how
    LaTeX/TeX has ensured its longevity!

##############################
ANSWER:
##############################

------------------------------

41. [Reviewer 4] The title is not specific enough - it should refer to
    the reproducibility of workflows/projects.

##############################
ANSWER:
##############################

------------------------------

42. [Reviewer 4] Whilst the thesis stated is valid, it may not be useful
    to practitioners of computational science and engineering as it
    stands.

ANSWER: We would appreciate it if you could clarify this point a little
more. We have shown how it has already been used in many research
projects (also outside of observational astronomy, which is the first
author's main background). It is precisely designed for computational
science and engineering problems where _publication_ of the
human-readable workflow source is also important.

------------------------------

43. [Reviewer 4] Longevity is not defined.

ANSWER: It has been defined now at the start of Section II.

------------------------------

44. [Reviewer 4] Whilst various tools are discussed and discarded, no
    attempt is made to categorise the magnitude of longevity for which
    they are relevant.
    For instance, environment isolators are regarded by the software
    preservation community as adequate for timescales of the order of
    years, but may not be suitable for the timescale of decades, where
    porting and emulation are used.

ANSWER: Statements quantifying their longevity have been added in Section
II. For example, in the case of Docker images: "their longevity is
determined by the host kernel, usually a decade"; for Python packages:
"Python installation with a usual longevity of a few years"; for
Nix/Guix: "with considerably better longevity; the same as the supported
CPU architectures."

------------------------------

45. [Reviewer 4] The title of this section "Commonly used tools and
    their longevity" is confusing - do you mean the longevity of the
    tools or the longevity of the workflows that can be produced using
    these tools?  What happens if you use a combination of all four
    categories of tools?

##########################
ANSWER:
##########################

------------------------------

46. [Reviewer 4] It wasn't clear to me if code was being run to generate
    the results and figures in a LaTeX paper that is part of a project
    in Maneage. It appears to be suggested this is the case, but Figure
    1 doesn't show how this works - it just has the LaTeX files, the
    data files and the Makefiles. Is it being suggested that LaTeX
    itself is the programming language, using its macro functionality?

ANSWER: Thank you for highlighting this point of confusion. The caption
of Figure 1 has been edited to hopefully clarify the point. In short, the
arrows represent the operation of software on their inputs (the files
they originate from) to generate their outputs (the files they point
to). In the case of generating 'paper.pdf' from its three dependencies
('references.tex', 'paper.tex' and 'project.tex'), yes, LaTeX is
used. But in other steps, other tools are used. For example, as you can
see in [1], the main step of the arrow connecting 'table-3.txt' to
'tools-per-year.txt' is an AWK command (there are also a few 'echo'
commands for metadata and copyright in the output plain-text file [2]).

[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/reproduce/analysis/make/demo-plot.mk#L51
[2] https://zenodo.org/record/3911395/files/tools-per-year.txt

------------------------------

47. [Reviewer 4] I was a bit confused on how collaboration is handled as
    well - this appears to be using the Git branching model, and the
    suggestion that Maneage is keeping track of all components from all
    projects - but what happens if you are working with collaborators
    that are using their own Maneage instance?

ANSWER: Indeed, Maneage operates based on the Git branching model. As
mentioned in the text, Maneage is itself a Git branch. People create
their own branch from the 'maneage' branch and start customizing it for
their particular project in their own particular repository. They can
also use all types of Git-based collaborating models to work together on
a project that is not yet finished.

Figure 2 in fact explicitly shows such a case: the main project leader is
committing on the "project" branch, but a collaborator creates a separate
branch over commit '01dd812' and makes a couple of commits ('f69e1f4' and
'716b56b'), and finally asks the project leader to merge them into the
project. This can be generalized to any Git-based collaboration model.

------------------------------

48. [Reviewer 4] I would also have liked to see a comparison between
    this approach and other "executable" paper approaches e.g. Jupyter
    notebooks, compared on completeness, time taken to write a "paper",
    ease of depositing in a repository, and ease of use by another
    researcher.
+ +####################### +ANSWER: +####################### + +------------------------------ + + + + + +49. [Reviewer 4] The weakest aspect is the assumption that research can be +    easily compartmentalized into simple and complete packages. Given that +    so much of research involves collaboration and interaction, this is not +    sufficiently addressed. In particular, the challenge of +    interdisciplinary work, where there may not be common languages to +    describe concepts and there may be different common workflow practices +    will be a barrier to wider adoption of the primary thesis and criteria. + +ANSWER: Maneage was precisely defined to address the problem of +publishing/collaborating on complete workflows. Hopefully with the +clarification to point 47 above, this should also become clear. + +------------------------------ + + + + + +50. [Reviewer 5] Major figures currently working in this exact field do not +    have their work acknowledged in this work. + +ANSWER: This was due to the strict word limit and the CiSE publication +policy (to not include a literature review because there is a limit of only +12 citations). But we had indeed done a comprehensive literature review and +the editors kindly agreed that we publish that review as appendices to the +main paper on arXiv and Zenodo. + +------------------------------ + + + + + +51. [Reviewer 5] The popper convention: Making reproducible systems +    evaluation practical ... and the later revision that uses GitHub +    Actions, is largely the same as this work. + +ANSWER: This work and the proposed criteria are very different from +Popper. A review of Popper has been given in Appendix B. + +------------------------------ + + + + + +52. [Reviewer 5] The lack of attention to virtual machines and containers +    is highly problematic. 
While a reader cannot rely on DockerHub or a
    generic OS version label for a VM or container, these are some of the
    most promising tools for offering true reproducibility.

ANSWER: Containers and VMs are now more thoroughly discussed in the main
body and extensively reviewed in Appendix A (available in the arXiv and
Zenodo versions of this paper). As discussed there (with many cited
examples), containers and VMs are only useful when they are themselves
reproducible (for example, when running the Dockerfile this year and next
year gives the same internal environment). However, we show that this is
not the case in most solutions (a more comprehensive review would require
its own paper).

However, with complete/robust environment builders like Maneage, Nix or
GNU Guix, the analysis environment within a container can be exactly
reproduced later. Even so, due to their binary nature and large storage
volume, containers and VMs are not reliable sources for the long term (it
is expensive to archive them). We give several examples in the paper of
projects that relied on VMs in 2011 and 2014 and are no longer active, and
note that DockerHub will be deleting containers of free accounts that have
not been used for more than 6 months (due to the large storage costs).

------------------------------





53. [Reviewer 5] On the data side, containers have the promise to manage
    data sets and workflows completely [Lofstead J, Baker J, Younge A. Data
    pallets: containerizing storage for reproducibility and
    traceability. In International Conference on High Performance Computing
    2019 Jun 16 (pp. 36-45). Springer, Cham.] Taufer has picked up this
    work and has graduated a MS student working on this topic with a
    published thesis. See also Jimenez's P-RECS workshop at HPDC for
    additional work highly relevant to this paper.

ANSWER: Thank you for the interesting paper by Lofstead+2019 on Data
pallets.
We have cited it in Appendix A as an example of how generic the
concept of containers is.

The topic of linking data to analysis is also a core result of the criteria
presented here, and is briefly discussed in the paper. There are indeed
many very interesting works on this topic, but the format of CiSE is very
short (a maximum of ~6000 words with 12 references), so we don't have the
space to go into this any further. This is indeed a very interesting aspect
for follow-up studies, especially as the usage of Maneage increases and we
have more example workflows by users to study the linkage of data and
analysis.

------------------------------





54. [Reviewer 5] Some other systems that do similar things include:
    reprozip, occam, whole tale, snakemake.

ANSWER: All these tools have been reviewed in the newly added appendices.

------------------------------





55. [Reviewer 5] the paper needs to include the context of the current
    community development level to be a complete research paper. A revision
    that includes evaluation of (using the criteria) and comparison with
    the suggested systems and a related work section that seriously
    evaluates the work of the recommended authors, among others, would make
    this paper worthy for publication.

ANSWER: A thorough review of current low-level tools and high-level
reproducible workflow management systems has been added in the extended
appendices.

------------------------------





56. [Reviewer 5] Offers criteria any system that offers reproducibility
   should have.

ANSWER:

------------------------------





57. [Reviewer 5] Yet another example of a reproducible workflows project.

ANSWER: As the newly added thorough comparisons with existing systems
show, this set of criteria and the proof-of-concept offer uniquely new
features.
As another referee summarized: "This manuscript describes a new
reproducible workflow which doesn't require another new trendy high-level
software. The proposed workflow is only based on low-level tools already
widely known."

The fact that we don't define yet another workflow language or framework,
but base the whole workflow on time-tested solutions in a framework that
costs only ~100 kB to archive (in contrast to multi-GB containers or VMs),
is new.

------------------------------





58. [Reviewer 5] There are numerous examples, mostly domain specific, and
    this one is not the most advanced general solution.

ANSWER: As the comparisons in the appendices and the clarifications above
show, many features of the proposed criteria and proof of concept are new.

------------------------------





59. [Reviewer 5] Lack of context in the field missing very relevant work
    that eliminates much, if not all, of the novelty of this work.

ANSWER: The newly added appendices thoroughly describe the context and the
previous work that has been done in this field.

------------------------------
