1. [EiC] Some reviewers request additions, and overview of other tools. ANSWER: Indeed, there is already a large body of work on the various issues that are touched upon in this paper. Before submitting the paper, we had already done a very comprehensive review of the tools (as you may notice from the Git repository[1]). However, the CiSE Author Information explicitly states: "The introduction should provide a modicum of background in one or two paragraphs, but should not attempt to give a literature review". This is the usual practice in previously published papers at CiSE and is in line with the very limited word count and the maximum of 12 references to be used in the bibliography. We agree with the need for this extensive review to be on the public record (creating the review took a lot of time and effort; most of the tools were run and tested). We discussed this with the editors and the following solution was agreed upon: we include the extended review as a set of appendices in the arXiv[2] and Zenodo[3] pre-prints of this paper and mention these publicly available appendices in the submitted paper so that any interested reader can easily access them. [1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/tex/src/paper-long.tex#L1579 [2] https://arxiv.org/abs/2006.03018 [3] https://doi.org/10.5281/zenodo.3872247 ------------------------------ 2. [Associate Editor] There are general concerns about the paper lacking focus ANSWER: With all the corrections/clarifications that have been made in this review, the focus of the paper should be clear now. We are very grateful for the thorough listing of points by the referees. ------------------------------ 3. [Associate Editor] Some terminology is not well-defined (e.g. longevity). ANSWER: Reproducibility, Longevity and Usage have now been explicitly defined in the first paragraph of Section II. With these definitions, the main argument of the paper is clearer, thank you (and thank you to the referees for highlighting this). ------------------------------ 4. [Associate Editor] The discussion of tools could benefit from some categorization to characterize their longevity. ANSWER: The longevity of the general tools reviewed in Section II is now mentioned immediately after each (VMs, SHARE: discontinued in 2019; Docker: 6 months; Python-dependent package managers: a few years; Jupyter notebooks: the longevity of their shortest-lived non-core Python dependency). ------------------------------ 5. [Associate Editor] Background and related efforts need significant improvement. (See below.) ANSWER: This has been done, as mentioned in (1.) above. ------------------------------ 6. [Associate Editor] There is consistency among the reviews that related work is particularly lacking. ANSWER: This has been done, as mentioned in (1.) above. ------------------------------ 7. [Associate Editor] The current work needs to do a better job of explaining how it deals with the nagging problem of running on CPU vs. different architectures. ANSWER: The CPU architecture of the running system is now reported in the "Acknowledgments" section, and a description of the problem and its solution in Maneage has also been added and illustrated in the "Proof of concept: Maneage" Section. ------------------------------ 8. [Associate Editor] At least one review commented on the need to include a discussion of continuous integration (CI) and its potential to help identify problems running on different architectures. Is CI employed in any way in the work presented in this article?
ANSWER: CI has been added to the discussion section (V) as one solution to find breaking points in operating system updates and new/different architectures. For the core Maneage branch, we have defined task #15741 [1] to add CI on many architectures in the near future. [1] http://savannah.nongnu.org/task/?15741 ------------------------------ 9. [Associate Editor] The presentation of the Maneage tool is both lacking in clarity and consistency with the public information/documentation about the tool. While our review focus is on the article, it is important that readers not be confused when they visit your site to use your tools. ANSWER: Thank you for raising this important point. We have broken down the very long "About" page into multiple pages to improve readability: https://maneage.org/about.html Generally, the webpage will soon undergo major improvements to be even clearer. ------------------------------ 10. [Associate Editor] A significant question raised by one review is how this work compares to "executable" papers and Jupyter notebooks. Does this work embody similar/same design principles or expand upon the established alternatives? In any event, a discussion of this should be included in background/motivation and related work to help readers understand the clear need for a new approach, if this is being presented as new/novel. ANSWER: Thank you for highlighting this important point. We saw that it is necessary to contrast our Maneage proof-of-concept demonstration more directly against the Jupyter notebook type of approach. Two paragraphs have been added in Sections II and IV to clarify this (our criteria require and build in more modularity and longevity than Jupyter). ------------------------------ 11. [Reviewer 1] Adding an explicit list of contributions would make it easier to the reader to appreciate these. These are not mentioned/cited and are highly relevant to this paper (in no particular order): 1. Git flows, both in general and in particular for research. 2. Provenance work, in general and with git in particular 3. Reprozip: https://www.reprozip.org/ 4. OCCAM: https://occam.cs.pitt.edu/ 5. Popper: http://getpopper.io/ 6. Whole Tale: https://wholetale.org/ 7. Snakemake: https://github.com/snakemake/snakemake 8. CWL https://www.commonwl.org/ and WDL https://openwdl.org/ 9. Nextflow: https://www.nextflow.io/ 10. Sumatra: https://pythonhosted.org/Sumatra/ 11. Podman: https://podman.io 12. AppImage (https://appimage.org/) 13. Flatpack (https://flatpak.org/) 14. Snap (https://snapcraft.io/) 15. nbdev https://github.com/fastai/nbdev and jupytext 16. Bazel: https://bazel.build/ 17. Debian reproducible builds: https://wiki.debian.org/ReproducibleBuilds ANSWER: 1. In Section IV, we have added that "Generally, any git flow (branching strategies) can be used by the high-level project authors or future readers." 2. We have mentioned research objects as one mode of provenance tracking. The body of related provenance work that has already been done, and that can be exploited using these criteria and our proof of concept, is indeed very large. However, the 6250-word limit is very tight, and if we added more on this topic, we would have to remove points of higher priority. Hopefully this can be the subject of a follow-up paper. 3. A review of ReproZip is in Appendix B. 4. A review of Occam is in Appendix B. 5. A review of Popper is in Appendix B. 6. A review of Whole Tale is in Appendix B. 7. A review of Snakemake is in Appendix A. 8.
CWL and WDL are described in Appendix A (Job management). 9. Nextflow is described in Appendix A (Job management). 10. Sumatra is described in Appendix B. 11. Podman is mentioned in Appendix A (Containers). 12. AppImage is mentioned in Appendix A (Package management). 13. Flatpak is mentioned in Appendix A (Package management). 14. Snap is mentioned in Appendix A (Package management). 15. nbdev and jupytext are high-level tools to generate documentation and package custom code in Conda or PyPI. High-level package managers like Conda and PyPI have already been thoroughly reviewed in Appendix A for their longevity issues, so we feel that there is no need to include these. 16. Bazel is mentioned in Appendix A (Job management). 17. Debian's reproducible builds are only designed for ensuring that software packaged for Debian is bitwise reproducible. As mentioned in the discussion section of this paper, the bitwise reproducibility of software is not an issue in the context discussed here; the reproducibility of the relevant output data of the software is the main issue. ------------------------------ 12. [Reviewer 1] Existing guidelines similar to the proposed "Criteria for longevity". Many articles of these in the form "10 simple rules for X", for example (not exhaustive list): * https://doi.org/10.1371/journal.pcbi.1003285 * https://arxiv.org/abs/1810.08055 * https://osf.io/fsd7t/ * A model project for reproducible papers: https://arxiv.org/abs/1401.2000 * Executable/reproducible paper articles and original concepts ANSWER: Thank you for highlighting these points. Appendix B starts with a subsection titled "suggested rules, checklists or criteria" with a review of existing sets of criteria. This subsection includes the sources proposed by the reviewer [Sandve et al; Rule et al; Nust et al] (and others). arXiv:1401.2000 has been added in Appendix A as an example paper using virtual machines. We thank the referee for bringing up this paper, because the link to the VM provided in the paper no longer works (the URL http://archive.comp-phys.org/provenance_challenge/provenance_machine.ova redirects to https://share.phys.ethz.ch//~alpsprovenance_challenge/provenance_machine.ova which gives a 'Not Found' HTML response). Together with SHARE, this very nicely highlights our main issue with binary containers or VMs: their lack of longevity. ------------------------------ 13. [Reviewer 1] Several claims in the manuscript are not properly justified, neither in the text nor via citation. Examples (not exhaustive list): 1. "it is possible to precisely identify the Docker "images" that are imported with their checksums, but that is rarely practiced in most solutions that we have surveyed [which ones?]" 2. "Other OSes [which ones?] have similar issues because pre-built binary files are large and expensive to maintain and archive." 3. "Researchers using free software tools have also already had some exposure to it" 4. "A popular framework typically falls out of fashion and requires significant resources to translate or rewrite every few years." ANSWER: These points have been clarified in the highlighted parts of the text: 1. Many examples have been given throughout the newly added appendices. To avoid confusion in the main body of the paper, we have removed the "we have surveyed" part. It is already mentioned above this point in the text that a large survey of existing methods/solutions is given in the appendices. 2.
Due to the thorough discussion of this issue in the appendices with precise examples, this line has been removed to allow space for the other points raised by the referees. The main point (the high cost of keeping binaries) is already abundantly clear. On a similar topic, Docker Hub's recent announcement that inactive images (inactive for over 6 months) will be deleted has also been added. The announcement URL is here (it was too long to include in the paper; if IEEE has a special short-URL format, we can add it): https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates 3. A small statement has been added, reminding the readers that almost all free software projects are built with Make (CMake is popular, but it is just a high-level wrapper over Make: it finally produces a 'Makefile'; practical usage of CMake generally obliges the user to understand Make). 4. The example of Python 2 has been added. ------------------------------ 14. [Reviewer 1] As mentioned in the discussion by the authors, not even Bash, Git or Make is reproducible, thus not even Maneage can address the longevity requirements. One possible alternative is the use of CI to ensure that papers are re-executable (several papers have been written on this topic). Note that CI is well-established technology (e.g. Jenkins is almost 10 years old). ANSWER: Thank you for raising these issues. We had initially planned to discuss CI, but like many discussion points, we were forced to remove it before the first submission due to the very tight word-count limit. We have now added a sentence on CI in the discussion. On the issue of Bash/Git/Make, indeed, the _executable_ Bash, Git and Make binaries are not bitwise reproducible/identical on different systems. However, as mentioned in the discussion, we are concerned with the _output_ of the software's executable file, _after_ the execution of its job. We (like any user of Bash) are not interested in the executable file itself. The reproducibility of the binary file only becomes important if a significant bug is found (very rare for ordinary usage of such core software of the OS). Hence, even though the compiled binary files of specific versions of Git, Bash or Make will not be bitwise reproducible/identical on different systems, their scientific outputs are exactly reproducible: 'git describe' or Bash's 'for' loop will have the same output on GNU/Linux, macOS/Darwin or FreeBSD (despite having bit-wise different executables). ------------------------------ 15. [Reviewer 1] Criterion has been proposed previously. Maneage itself provides little novelty (see comments below). ANSWER: The previously suggested sets of criteria that were listed by Reviewer 1 are reviewed by us in the newly added Appendix B, and the novelty and advantages of our proposed criteria are contrasted there with the earlier sets of criteria. ------------------------------ 16. [Reviewer 2] Authors should add indication that using good practices it is possible to use Docker or VM to obtain identical OS usable for reproducible research. ANSWER: In the submitted version we had stated that "Ideally, it is possible to precisely identify the Docker "images" that are imported with their checksums ...". But to be clearer and go directly to the point, it has been edited to explicitly say "... to recreate an identical OS image later".
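As a concrete illustration of the good practice referred to in the answer above (a generic sketch only, not text from the paper; the digest below is a placeholder, not a real value), an OS image can be referred to by its content-addressed checksum rather than by a mutable tag:

  # List locally available images together with their immutable digests:
  docker images --digests
  # Pull/refer to the image by that digest instead of a tag like 'ubuntu:20.04',
  # which can silently point to different content over time ('<digest>' is a
  # placeholder here):
  docker pull ubuntu@sha256:<digest>

Referring to images by digest is what makes it possible to recreate an identical OS image later, provided the image itself remains archived somewhere (which, as noted elsewhere in this response, is the harder longevity problem).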
------------------------------ 17. [Reviewer 2] The CPU architecture of the platform used to run the workflow is not discussed in the manuscript. Authors should probably take into account the architecture used in their workflow or at least report it. ANSWER: Thank you very much for raising this important point. We had not seen other reproducibility papers mention this point and had missed it ourselves. In the acknowledgments (where we also mention the commit hashes) we now explicitly mention the exact CPU architecture used to build this paper: "This project was built on an x86_64 machine with Little Endian byte-order and address sizes 39 bits physical, 48 bits virtual.". This is because we have already seen cases where the architecture is the same, but programs fail because of the byte order. Generally, Maneage will now extract this information from the running system during its configuration phase and provide the users with three different LaTeX macros that they can use anywhere in their paper. ------------------------------ 18. [Reviewer 2] I don't understand the "no dependency beyond POSIX". Authors should more explained what they mean by this sentence. ANSWER: This has been clarified with the short extra statement "a minimal Unix-like standard that is shared between many operating systems". We would have liked to explain this more, but the word limit is very constraining. ------------------------------ 19. [Reviewer 2] Unfortunately, sometime we need proprietary or specialized software to read raw data... For example in genetics, micro-array raw data are stored in binary proprietary formats. To convert this data into a plain text format, we need the proprietary software provided with the measurement tool. ANSWER: Thank you very much for this good point. A description of a possible solution to this has been added after criterion 8. ------------------------------ 20. [Reviewer 2] I was not able to properly set up a project with Maneage. The configuration step failed during the download of tools used in the workflow. This is probably due to a firewall/antivirus restriction out of my control. How frequent this failure happen to users? ANSWER: Thank you for mentioning this. This has been fixed by archiving all Maneage'd software on Zenodo (https://doi.org/10.5281/zenodo.3883409) and also downloading from there. Until recently we would directly access each software's own webpage to download the source files, and this caused frequent problems of this sort. In other cases, we were very frustrated when a software's webpage would temporarily be unavailable (for maintenance reasons); this would be a hindrance in trying to build new projects. Since all the software is free-licensed, we are legally allowed to re-distribute it (within the conditions, such as not removing copyright notices) and Zenodo is designed for long-term archival of academic digital objects, so we decided that a software source code repository on Zenodo would be the most reliable solution. At configure time, Maneage now accesses Zenodo's DOI and resolves the most recent URL to automatically download any necessary software source code that the project needs from there. Generally, we also keep all software in a Git repository on our own webpage: http://git.maneage.org/tarballs-software.git/tree. Moreover, Maneage users can identify their own custom URLs for downloading software, which will be given higher priority than Zenodo (useful for situations where custom software is downloaded and built in a project branch, not the core 'maneage' branch).
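For readers of this response, the download priority described above can be sketched schematically as follows (an illustrative shell sketch, not Maneage's actual implementation; the variable names are hypothetical):

  # Try the project's custom URL first, then fall back to the Zenodo archive:
  for url in "$custom_url/$tarball" "$zenodo_url/$tarball"; do
      wget -O "$tarball" "$url" && break
  done
  # Verify the expected checksum before the tarball is built:
  echo "$expected_sha256  $tarball" | sha256sum --check

In practice this means that a temporary outage (or deletion) of any single upstream webpage no longer blocks the configuration step.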
------------------------------ 21. [Reviewer 2] The time to configure a new project is quite long because everything needs to be compiled. Authors should compare the time required to set up a project Maneage versus time used by other workflows to give an indication to the readers. ANSWER: Thank you for raising this point. It takes about 1.5 hours to configure the default Maneage branch on an 8-core CPU (more than half of this time is devoted to GCC on GNU/Linux operating systems, and the building of GCC can optionally be disabled with the '--host-cc' option to significantly speed up the build when the host's GCC is similar). Furthermore, Maneage can be built within a Docker container. A paragraph has been added in Section IV on this issue (the build time and building within a Docker container). We have also defined task #15818 [1] to have our own core Docker image that is ready to build a Maneaged project, and we will be adding it shortly. [1] https://savannah.nongnu.org/task/index.php?15818
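For reference, a typical build of a Maneage'd project looks like the following sketch (assuming Maneage's './project' entry script; see the project README for the authoritative interface and options):

  # Configure: build the fixed software environment. '--host-cc' re-uses the
  # host's C compiler instead of building GCC, removing more than half of the
  # roughly 1.5 hour configuration time on an 8-core machine.
  ./project configure --host-cc
  # Run the analysis and typeset the final paper.
  ./project make

The configuration step is only needed once per project and computer; subsequent runs of the analysis only re-execute the parts whose prerequisites have changed.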
------------------------------ 22. [Reviewer 3] Authors should define their use of the term [Replicability or Reproducibility] briefly for their readers. ANSWER: "Reproducibility" has been defined along with "Longevity" and "Usage" at the start of Section II. ------------------------------ 23. [Reviewer 3] The introduction is consistent with the proposal of the article, but deals with the tools separately, many of which can be used together to minimize some of the problems presented. The use of Ansible, Helm, among others, also helps in minimizing problems. ANSWER: Ansible and Helm are primarily designed for distributed computing. For example, Helm is just a high-level package manager for a Kubernetes cluster that is based on containers. A review of them could be added to the Appendices, but we feel this would distract somewhat from the main points of our current paper. ------------------------------ 24. [Reviewer 3] When the authors use the Python example, I believe it is interesting to point out that today version 2 has been discontinued by the maintaining community, which creates another problem within the perspective of the article. ANSWER: Thank you very much for highlighting this point. We had excluded this point for the sake of article length, but we have restored it in the introduction of the revised version. ------------------------------ 25. [Reviewer 3] Regarding the use of VM's and containers, I believe that the discussion presented by THAIN et al., 2015 is interesting to increase essential points of the current work. ANSWER: Thank you very much for pointing out the works by Thain. We couldn't find any first-author papers in 2015, but found Meng & Thain (https://doi.org/10.1016/j.procs.2017.05.116), which had a related discussion of why they didn't use Docker containers in their work. That paper is now cited in the discussion of Containers in Appendix A. ------------------------------ 26. [Reviewer 3] About the Singularity, the description article was missing (Kurtzer GM, Sochat V, Bauer MW, 2017). ANSWER: Thank you for the reference. We are restricted in the main body of the paper due to the strict bibliography limit of 12 references; we have included Kurtzer et al. 2017 in Appendix A (where we discuss Singularity). ------------------------------ 27. [Reviewer 3] I also believe that a reference to FAIR is interesting (WILKINSON et al., 2016). ANSWER: The FAIR principles have been mentioned in the main body of the paper, but unfortunately we had to remove their citation in the main paper (like many others) to keep to the maximum of 12 references. We have cited them in Appendix B. ------------------------------ 28. [Reviewer 3] In my opinion, the paragraph on IPOL seems to be out of context with the previous ones. This issue of end-to-end reproducibility of a publication could be better explored, which would further enrich the tool presented. ANSWER: Our Section II discussing existing tools seems to be the most appropriate place to mention IPOL, so we have retained its position at the end of this section. We have indeed included an in-depth discussion of IPOL in Appendix B. We recommend it to the reader for any project written uniquely in C, and we comment on the readiness of Maneage'd projects for a similar level of peer-review control. ------------------------------ 29. [Reviewer 3] On the project website, I suggest that the information contained in README-hacking be presented on the same page as the Tutorial. A topic breakdown is interesting, as the markdown reading may be too long to find information. ANSWER: Thank you very much for this good suggestion; it has been implemented: https://maneage.org/about.html . The webpage will continuously be improved and such feedback is always very welcome. ------------------------------ 31. [Reviewer 3] The tool is suitable for Unix users, keeping users away from Microsoft environments. ANSWER: The issue of building on Windows has been discussed in Section IV, either using Docker (or VMs) or using the Windows Subsystem for Linux. ------------------------------ 32. [Reviewer 3] Important references are missing; more references are needed ANSWER: Two comprehensive Appendices have been added to address this issue. ------------------------------ 33. [Reviewer 4] Revisit the criteria, show how you have come to decide on them, give some examples of why they are important, and address potential missing criteria. ANSWER: Our selection of the criteria and their importance are questions of the philosophy of science: "what is good science? what should reproducibility aim for?" We feel that completeness; modularity; minimal complexity; scalability; verifiability of inputs and outputs; recording of the project history; linking of narrative to analysis; and the right to use, modify, and redistribute scientific software in original or modified form, constitute a set of criteria that should uncontroversially be seen as "important" from a wide range of ethical, social, political, and economic perspectives. An exception is probably the issue of proprietary versus free software (criterion 8), on which debate is far from closed. Within the constraints of space (the limit is 6500 words), we don't see how we could add more discussion of the history of our choice of criteria or more anecdotal examples of their relevance. We do discuss some alternative lists of criteria in Appendix B.A, without debating the wider perspective of which criteria are the most desirable. ------------------------------ 34. [Reviewer 4] Clarify the discussion of challenges to adoption and make it clearer which tradeoffs are important to practitioners. ANSWER: We discuss many of these challenges and caveats in the Discussion Section (V), within the existing word limit. ------------------------------ 35. [Reviewer 4] Be clearer about which sorts of research workflow are best suited to this approach.
ANSWER: Maneage is flexible enough to enable a wide range of workflows to be implemented. This is done by leveraging the highly modular and flexible nature of Makefiles run via 'Make'. ------------------------------ 36. [Reviewer 4] There is also the challenge of mathematical reproducibility, particularly of the handling of floating point number, which might occur because of the way the code is written, and the hardware architecture (including if code is optimised / parallelised). ANSWER: Floating point errors and optimizations have been mentioned in the discussion (Section V). The issue with parallelization has also been discussed in Section IV, in the part on verification ("Where exact reproducibility is not possible (for example due to parallelization), values can be verified by any statistical means, specified by the project authors."). ------------------------------ 37. [Reviewer 4] Performance ... is never mentioned ANSWER: Performance is indeed an important issue for _immediate_ reproducibility and we would have liked to discuss it. But due to the strict word count, we feel that adding it to the discussion points, without having adequate space to elaborate, could confuse the readers of this paper (which is focused on long-term usability). ------------------------------ 38. [Reviewer 4] Tradeoff, which might affect Criterion 3 is time to result, people use popular frameworks because it is easier to use them. ANSWER: That is true. In Section IV, we have given the time it takes to build Maneage (only once for a project on each computer) to be around 1.5 hours on an 8-core CPU (a typical machine that may be used for data analysis). We therefore conclude that when the analysis is complex (and thus takes many hours or days to complete), this time is negligible. But if the project's full analysis takes 10 minutes or less (like the extremely simple analysis done in this paper, which takes a fraction of a second), then the 1.5 hour build time is indeed significant. In those cases, as discussed in the main body, the project can be built once in a Docker image and easily moved to other computers. ------------------------------ 39. [Reviewer 4] I would liked to have seen explanation of how these challenges to adoption were identified: was this anecdotal, through surveys? participant observation? ANSWER: The results mentioned here are based on private discussions after holding multiple seminars and webinars with RDA's support, and also a workshop that was planned for non-astronomers. We even invited (funded) early-career researchers to come to the workshop with the RDA funding; however, that workshop was cancelled due to the pandemic and we had private communications afterwards. We would very much like to elaborate on this experience of training new researchers with these tools. However, as with many of the cases above, the very strict word limit doesn't allow us to elaborate beyond what is already there. ------------------------------ 40. [Reviewer 4] Potentially an interesting sidebar to investigate how LaTeX/TeX has ensured its longevity! ANSWER: That is indeed a very interesting subject to study. We have been in touch with Karl Berry (one of the core people behind TeX Live, who also plays a prominent role in GNU) and have witnessed the TeX Live community's efforts to become more and more portable and longer-lived. But after looking at the strict word limit, we couldn't find a place to highlight this.
But it is indeed a subject worthy of a full paper (and one that could be very useful for many software projects). ------------------------------ 41. [Reviewer 4] The title is not specific enough - it should refer to the reproducibility of workflows/projects. ANSWER: Since this journal is focused on "Computing in Science and Engineering", the fact that it relates to computational workflows will be clear to any reader. Since the other referees didn't complain about this, we will keep it as it was, but of course, we are open to the suggestions of the editors in the final title. ------------------------------ 42. [Reviewer 4] Whilst the thesis stated is valid, it may not be useful to practitioners of computation science and engineering as it stands. ANSWER: We would appreciate it if you could clarify this point a little more. We have shown how it has already been used in many research projects (also outside of observational astronomy, which is the first author's main background). It is precisely defined for computational science and engineering problems where _publication_ of the human-readable workflow source is also important. ------------------------------ 43. [Reviewer 4] Longevity is not defined. ANSWER: It has now been defined at the start of Section II. ------------------------------ 44. [Reviewer 4] Whilst various tools are discussed and discarded, no attempt is made to categorise the magnitude of longevity for which they are relevant. For instance, environment isolators are regarded by the software preservation community as adequate for timescale of the order of years, but may not be suitable for the timescale of decades where porting and emulation are used. ANSWER: Statements quantifying their longevity have been added in Section II. For example, in the case of Docker images: "their longevity is determined by the host kernel, usually a decade"; for Python packages: "Python installation with a usual longevity of a few years"; for Nix/Guix: "with considerably better longevity; same as supported CPU architectures." ------------------------------ 45. [Reviewer 4] The title of this section "Commonly used tools and their longevity" is confusing - do you mean the longevity of the tools or the longevity of the workflows that can be produced using these tools? What happens if you use a combination of all four categories of tools? ANSWER: Thank you for highlighting this. The title has been shortened and the section immediately starts with definitions. The aspects of the tools discussed in this section are orthogonal to each other: for example, a VM/container, a package manager and a notebook can be combined in different ways by different projects. In some respects using them together can improve operations, but, for example, building a VM/container with or without a package manager makes no difference to the main issue we raise about containers (that they are large binary blobs that don't necessarily contain how the environment within them was built). ------------------------------ 46. [Reviewer 4] It wasn't clear to me if code was being run to generate the results and figures in a LaTeX paper that is part of a project in Maneage. It appears to be suggested this is the case, but Figure 1 doesn't show how this works - it just has the LaTeX files, the data files and the Makefiles. Is it being suggested that LaTeX itself is the programming language, using its macro functionality? ANSWER: Thank you for highlighting this point of confusion. The caption of Figure 1 has been edited to hopefully clarify the point. In short, the arrows represent the operation of software on their inputs (the file they originate from) to generate their outputs (the file they point to). In the case of generating 'paper.pdf' from its three dependencies ('references.tex', 'paper.tex' and 'project.tex'), yes, LaTeX is used. But in other steps, other tools are used. For example, as you can see in [1], the main step of the arrow connecting 'table-3.txt' to 'tools-per-year.txt' is an AWK command (there are also a few 'echo' commands for metadata and copyright in the output plain-text file [2]). [1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/reproduce/analysis/make/demo-plot.mk#L51 [2] https://zenodo.org/record/3911395/files/tools-per-year.txt
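To make the role of such an arrow more concrete, the step behind it boils down to a shell command of roughly the following form (a hypothetical stand-in only: the real recipe is the AWK command in 'demo-plot.mk' linked above, and the column layout assumed here is purely illustrative):

  # Count the tools used per year in 'table-3.txt' (assuming, for illustration,
  # that the year is in the second column) and write 'tools-per-year.txt':
  awk '!/^#/ {n[$2]++} END {for (y in n) print y, n[y]}' table-3.txt > tools-per-year.txt

Make only re-runs such a command when its prerequisites (here 'table-3.txt') are newer than its target, which is how the dependency graph of Figure 1 stays up to date.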
------------------------------ 47. [Reviewer 4] I was a bit confused on how collaboration is handled as well - this appears to be using the Git branching model, and the suggestion that Maneage is keeping track of all components from all projects - but what happens if you are working with collaborators that are using their own Maneage instance? ANSWER: Indeed, Maneage operates based on the Git branching model. As mentioned in the text, Maneage is itself a Git branch. People create their own branch from the 'maneage' branch and start customizing it for their particular project in their own particular repository. They can also use all types of Git-based collaboration models to work together on a project that is not yet finished. Figure 2 in fact explicitly shows such a case: the main project leader is committing on the "project" branch, while a collaborator creates a separate branch over commit '01dd812', makes a couple of commits ('f69e1f4' and '716b56b'), and finally asks the project leader to merge them into the project. This can be generalized to any Git-based collaboration model. ------------------------------ 48. [Reviewer 4] I would also liked to have seen a comparison between this approach and other "executable" paper approaches e.g. Jupyter notebooks, compared on completeness, time taken to write a "paper", ease of depositing in a repository, and ease of use by another researcher. ANSWER: These have been highlighted in various parts of the text (also reviewed in previous points). ------------------------------ 49. [Reviewer 4] The weakest aspect is the assumption that research can be easily compartmentalized into simple and complete packages. Given that so much of research involves collaboration and interaction, this is not sufficiently addressed. In particular, the challenge of interdisciplinary work, where there may not be common languages to describe concepts and there may be different common workflow practices will be a barrier to wider adoption of the primary thesis and criteria. ANSWER: Maneage was precisely defined to address the problem of publishing/collaborating on complete workflows. Hopefully, with the clarification to point 47 above, this should also be clear now. ------------------------------ 50. [Reviewer 5] Major figures currently working in this exact field do not have their work acknowledged in this work. ANSWER: This was due to the strict word limit and the CiSE publication policy (to not include a literature review because there is a limit of only 12 citations). But we had indeed done a comprehensive literature review and the editors kindly agreed that we publish that review as appendices to the main paper on arXiv and Zenodo. ------------------------------ 51.
[Reviewer 5] The popper convention: Making reproducible systems evaluation practical ... and the later revision that uses GitHub Actions, is largely the same as this work. ANSWER: This work and the proposed criteria are very different from Popper. A review of Popper has been given in Appendix B. ------------------------------ 52. [Reviewer 5] The lack of attention to virtual machines and containers is highly problematic. While a reader cannot rely on DockerHub or a generic OS version label for a VM or container, these are some of the most promising tools for offering true reproducibility. ANSWER: Containers and VMs have been more thoroughly discussed in the main body and are also extensively discussed in Appendix A (now available in the arXiv and Zenodo versions of this paper). As discussed there (with many cited examples), containers and VMs are only useful when they are themselves reproducible (for example, running the same Dockerfile this year and next year gives the same internal environment). However, we show that this is not the case in most solutions (a more comprehensive review would require its own paper). With complete/robust environment builders like Maneage, Nix or GNU Guix, the analysis environment within a container can be exactly reproduced later. But even so, due to their binary nature and large storage volume, containers and VMs are not trustworthy sources for the long term (it is expensive to archive them). We give several examples in the paper of how projects that relied on VMs in 2011 and 2014 are no longer active, and how even Docker Hub will be deleting containers that have not been used for more than 6 months in free accounts (due to the large storage costs). Furthermore, as a unique new feature, Maneage has the criterion of "Minimal complexity". This means that even if, for any reason, the project cannot be run in the future, its content, analysis scripts, etc. remain accessible to the interested reader (because everything is in plain text). Unlike Nix or Guix, it also doesn't rely on a third-party package manager: the instructions for building all the software of a project are kept directly in the same project as the high-level analysis software. So it is transparent in any case, and the interested reader can follow the analysis and study the different decisions of each step (why and how the analysis was done). ------------------------------ 53. [Reviewer 5] On the data side, containers have the promise to manage data sets and workflows completely [Lofstead J, Baker J, Younge A. Data pallets: containerizing storage for reproducibility and traceability. In International Conference on High Performance Computing 2019 Jun 16 (pp. 36-45). Springer, Cham.] Taufer has picked up this work and has graduated a MS student working on this topic with a published thesis. See also Jimenez's P-RECS workshop at HPDC for additional work highly relevant to this paper. ANSWER: Thank you for the interesting paper by Lofstead+2019 on Data pallets. We have cited it in Appendix A as an example of how generic the concept of containers is. The topic of linking data to analysis is also a core result of the criteria presented here, and is also discussed briefly in the paper. There are indeed many very interesting works on this topic. But the format of CiSE is very short (a maximum of ~6000 words with 12 references), so we don't have the space to go into this any further.
But this is indeed a very interesting aspect for follow-up studies, especially as the usage of Maneage increases and we have more example workflows by users with which to study the linking of data to analysis. ------------------------------ 54. [Reviewer 5] Some other systems that do similar things include: reprozip, occam, whole tale, snakemake. ANSWER: All these tools have been reviewed in the newly added appendices. ------------------------------ 55. [Reviewer 5] the paper needs to include the context of the current community development level to be a complete research paper. A revision that includes evaluation of (using the criteria) and comparison with the suggested systems and a related work section that seriously evaluates the work of the recommended authors, among others, would make this paper worthy for publication. ANSWER: A thorough review of current low-level tools and high-level reproducible workflow management systems has been added in the extended appendices. ------------------------------ 56. [Reviewer 5] Offers criteria any system that offers reproducibility should have. ANSWER: ------------------------------ 57. [Reviewer 5] Yet another example of a reproducible workflows project. ANSWER: As the newly added thorough comparisons with existing systems show, this set of criteria and the proof of concept offer uniquely new features. As another referee summarized: "This manuscript describes a new reproducible workflow which doesn't require another new trendy high-level software. The proposed workflow is only based on low-level tools already widely known." The fact that we do not define yet another workflow language or framework, but instead base the whole workflow on time-tested solutions, in a framework that costs only ~100 kB to archive (in contrast to multi-GB containers or VMs), is new. ------------------------------ 58. [Reviewer 5] There are numerous examples, mostly domain specific, and this one is not the most advanced general solution. ANSWER: As the comparisons in the appendices and the clarifications above show, there are many features in the proposed criteria and proof of concept that are new. ------------------------------ 59. [Reviewer 5] Lack of context in the field missing very relevant work that eliminates much, if not all, of the novelty of this work. ANSWER: The newly added appendices thoroughly describe the context and previous work that has been done in this field. ------------------------------