-rw-r--r-- | peer-review/1-answer.txt | 567 |
1 files changed, 305 insertions, 262 deletions
diff --git a/peer-review/1-answer.txt b/peer-review/1-answer.txt index 55be70a..8511715 100644 --- a/peer-review/1-answer.txt +++ b/peer-review/1-answer.txt @@ -1,24 +1,49 @@ +Dear CiSE editors, + +Thank you very much for the very complete and useful referee reports. They +have been fully implemented in this submission and have significantly +improved the quality and clarity of the paper. + +Below, all the points raised by the Editor in Chief (EiC), Associate editor, +and the 5 referees (in the same order as the review process report) are +addressed individually as a numbered list. + +Sincerely yours, +Dr. Mohammad Akhlaghi [on behalf of the co-authors] +Instituto de Astrofísica de Canarias, Tenerife, Spain. + +------------------------------ + + + + + 1. [EiC] Some reviewers request additions, and overview of other tools. -ANSWER: Indeed, there is already a large body work in various issues that -have been touched upon in this paper. Before submitting the paper, we had -already done a very comprehensive review of the tools (as you may notice -from the Git repository[1]). However, the CiSE Author Information -explicitly states: "The introduction should provide a modicum of background -in one or two paragraphs, but should not attempt to give a literature -review". This is the usual practice in previously published papers at CiSE -and is in line with the maximum 6250 word-count and maximum of 12 -references to be used in bibliography. - -We agree with the need for this extensive review to be on the public record -(creating the review took a lot of time and effort; most of the tools were -run and tested). We discussed this with the editors and the following -solution was agreed upon: the extended reviews will be published as a set -of appendices in the arXiv[2] and Zenodo[3] pre-prints of this paper. These -publicly available appendices are also mentioned in the submitted paper so -that any interested reader of the final paper published by CiSE can easily -access them.
+ANSWER: Indeed, there is already a large body of previous work in this +field, and we had learnt a lot from them during the creation of the +criteria and the proof of concept tool (Maneage). Before submitting the +paper, we had already done a very comprehensive review of the tools (as you +may notice from the Git repository[1], where most of the tools were run and +practically tested). However, the CiSE Author Information explicitly +states: "The introduction should provide a modicum of background in one or +two paragraphs, but should not attempt to give a literature review". This +is the usual practice in previously published papers at CiSE and is in line +with the maximum 6250 word-count and maximum of 12 references to be used in +bibliography. + +We already discussed this point privately with you and we agreed upon the +following solution: the extended reviews will be submitted as supplementary +material, to accompany the paper as "Web extras". These appendices are also +mentioned in the submitted paper so that any interested CiSE reader can +easily learn of their existence from the paper and access them. + +Appendix A is focused on the low-level "tools" that are commonly used in +the reproducible workflow solutions (including Maneage). In Appendix B, we +touch upon +25 reproducible solutions and compare them directly with our +criteria. In particular, we also review tools that have been abandoned or +discontinued and use the criteria to justify why this happened. [1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/tex/src/paper-long.tex#L1579 [2] https://arxiv.org/abs/2006.03018 @@ -33,9 +58,10 @@ access them. 2. [Associate Editor] There are general concerns about the paper lacking focus -ANSWER: With all the corrections/clarifications that have been done in this -review the focus of the paper should be clear now. We are very grateful to -the thorough listing of points by the referees.
+ANSWER: Thanks to all the corrections/clarifications that have been done in +this review, the paper is now much more focused and to the point. We are +very grateful for the thorough listing of points by the referees, which +helped us clarify the points that needed improvement. ------------------------------ 3. [Associate Editor] Some terminology is not well-defined (e.g. longevity). -ANSWER: Reproducibility, Longevity and Usage have now been explicitly -defined in the first paragraph of Section II. With this definition, the -main argument of the paper is clearer, thank you (and thank you to the -referees for highlighting this). +ANSWER: In this revision, "Reproducibility", "Longevity" and "Usage" have +been explicitly defined in the first paragraph of Section II. With this +definition, the main argument of the paper has become much clearer. +Thank you (and the referees) for highlighting this. ------------------------------ 4. [Associate Editor] The discussion of tools could benefit from some categorization to characterize their longevity. -ANSWER: The longevity of the general tools reviewed in Section II is now -mentioned immediately after each (VMs, SHARE: discontinued in 2019; -Docker: 6 months; python-dependent package managers: a few years; -Jupyter notebooks: shortest longevity non-core python dependency). +ANSWER: The approximate longevity of the various tools reviewed in Section +II is now mentioned immediately after each one and highlighted in green. +For example, we have added this after containers: "(their longevity is +determined by the host kernel, typically a decade)". ------------------------------ @@ -97,10 +123,10 @@ ANSWER: This has been done, as mentioned in (1.) above. explaining how it deals with the nagging problem of running on CPU vs. different architectures.
-ANSWER: The CPU architecture of the running system is now reported in -the "Acknowledgments" section and a description of the problem and its -solution in Maneage is also added and illustrated in the "Proof of -concept: Maneage" Section. +ANSWER: The CPU architecture of the running system is now precisely +reported in the "Acknowledgments" section (highlighted in green). A +description of the dependency on hardware architecture, and of how Maneage +reports it, has also been added in the "Proof of concept: Maneage" Section. ------------------------------ @@ -114,11 +140,10 @@ concept: Maneage" Section. architectures. Is CI employed in any way in the work presented in this article? -ANSWER: CI has been added in the discussion section (V) as one -solution to find breaking points in operating system updates and -new/different architectures. For the core Maneage branch, we have -defined task #15741 [1] to add CI on many architectures in the near -future. +ANSWER: CI has been added in the "Discussion" section as one solution to +find breaking points in operating system updates and new/different +architectures. For the core Maneage branch, we have defined task #15741 [1] +to add CI on many architectures in the near future. [1] http://savannah.nongnu.org/task/?15741 @@ -140,10 +165,11 @@ very long "About" page into multiple pages to help in readability: https://maneage.org/about.html Generally, the webpage will soon undergo major improvements to be even more -clear. The website is developed on a public git repository -(https://git.maneage.org/webpage.git), so any specific proposals for -improvements can be handled efficiently and transparently and we welcome -any feedback in this aspect. +clear (as part of our RDA grant for Maneage, we have promised a clear and +friendly webpage after the paper is published).
The website is developed on a +public git repository (https://git.maneage.org/webpage.git), so any +specific proposals for improvements can be handled efficiently and +transparently and we welcome any feedback in this aspect. ------------------------------ @@ -159,12 +185,13 @@ any feedback in this aspect. related work to help readers understand the clear need for a new approach, if this is being presented as new/novel. -ANSWER: Thank you for highlighting this important point. We saw that -it is necessary to contrast our Maneage proof-of-concept demonstration -more directly against the Jupyter notebook type of approach. Two -paragraphs have been added in Sections II and IV to clarify this (our -criteria require and build in more modularity and longevity than -Jupyter). +ANSWER: Thank you for highlighting this important point. We saw that it is +necessary to compare and contrast our Maneage proof-of-concept +demonstration more directly against the Jupyter notebook type of +approach. Two paragraphs have been added in Sections II and IV to clarify +this (our criteria require and build in more modularity and longevity than +Jupyter). A much more extensive comparison and review is now also available +in Appendix A. ------------------------------ @@ -206,24 +233,24 @@ ANSWER: large. However, the 6250 word-count limit is very tight and if we add more on it in this length, we would have to remove points of higher priority. Hopefully this can be the subject of a follow-up paper. -3. A review of ReproZip is in Appendix C. -4. A review of Occam is in Appendix C. -5. A review of Popper is in Appendix C. -6. A review of Whole Tale is in Appendix C. -7. A review of Snakemake is in Appendix B. -8. CWL and WDL are described in Appendix B (Job management). -9. Nextflow is described in Appendix B (Job management). -10. Sumatra is described in Appendix C. -11. Podman is mentioned in Appendix B (Containers). -12. AppImage is mentioned in Appendix B (Package management). -13. 
Flatpak is mentioned in Appendix B (Package management). -14. Snap is mentioned in Appendix B (Package management). +3. A review of ReproZip is in Appendix B. +4. A review of Occam is in Appendix B. +5. A review of Popper is in Appendix B. +6. A review of Whole Tale is in Appendix B. +7. A review of Snakemake is in Appendix A. +8. CWL and WDL are described in Appendix A (Job management). +9. Nextflow is described in Appendix A (Job management). +10. Sumatra is described in Appendix B. +11. Podman is mentioned in Appendix A (Containers). +12. AppImage is mentioned in Appendix A (Package management). +13. Flatpak is mentioned in Appendix A (Package management). +14. Snap is mentioned in Appendix A (Package management). 15. nbdev and jupytext are high-level tools to generate documentation and package custom code in Conda or pypi. High-level package managers like Conda and Pypi have already been thoroughly reviewed in Appendix A for their longevity issues, so we feel that there is no need to include these. -16. Bazel is mentioned in Appendix B (job management). +16. Bazel is mentioned in Appendix A (job management). 17. Debian's reproducible builds are only designed for ensuring that software packaged for Debian is bitwise reproducible. As mentioned in the discussion section of this paper, the bitwise reproducibility of software is @@ -245,20 +272,21 @@ ANSWER: * A model project for reproducible papers: https://arxiv.org/abs/1401.2000 * Executable/reproducible paper articles and original concepts -ANSWER: Thank you for highlighting these points. Appendix C starts with a -subsection titled "suggested rules, checklists or criteria" with a review of -existing sets of criteria. This subsection includes the sources proposed -by the reviewer [Sandve et al; Rule et al; Nust et al] (and others). +ANSWER: Thank you for highlighting these points. Appendix B starts with a +subsection titled "suggested rules, checklists or criteria".
In this +section, we review the existing sets of criteria. This subsection includes +the sources proposed by the reviewer [Sandve et al; Rule et al; Nust et al] +(and others). -ArXiv:1401.2000 has been added in Appendix B as an example paper using +ArXiv:1401.2000 has been added in Appendix A as an example paper using virtual machines. We thank the referee for bringing up this paper, because the link to the VM provided in the paper no longer works (the URL http://archive.comp-phys.org/provenance_challenge/provenance_machine.ova redirects to https://share.phys.ethz.ch//~alpsprovenance_challenge/provenance_machine.ova -which gives a 'Not Found' html response). Together with SHARE, this very nicely -highlights our main issue with binary containers or VMs: their lack of -longevity. +which gives a 'Not Found' html response). Together with SHARE, this very +nicely highlights our main issue with binary containers or VMs: their lack +of longevity due to the high cost of long term storage of large files. ------------------------------ @@ -295,16 +323,23 @@ ANSWER: These points have been clarified in the highlighted parts of the text: On a similar topic, Dockerhub's recent announcement that inactive images (for over 6 months) will be deleted has also been added. The announcement - URL is here (it was too long to include in the paper, if IEEE has a - special short-url format, we can add it): - https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates + URL is a hyperlink in the text (it was too long to print directly; if + IEEE has a special short-url format, we can add it). + + Another interesting piece of news in relation to longevity has also been + added here: the decision by CentOS to abandon CentOS 8 next year. Again, + the URL is within a hyperlink in the text.
Many scientific and industrial + projects have relied on CentOS for longevity over the last two decades, + but that didn't stop its creators from abandoning it 8 years early and + completely switching its release paradigm. 3. A small statement has been added, reminding the readers that almost all - free software projects are built with Make (CMake is popular, but it is just a - high-level wrapper over Make: it finally produces a 'Makefile'; practical - usage of CMake generally obliges the user to understand Make). + free software projects are built with Make (CMake is also used + sometimes, but CMake is just a high-level wrapper over Make: it finally + produces a 'Makefile'; practical usage of CMake generally obliges the + user to understand Make). -4. The example of Python 2 has been added. +4. The example of Python 2 has been added to clarify this point. ------------------------------ @@ -321,23 +356,30 @@ ANSWER: These points have been clarified in the highlighted parts of the text: well-established technology (e.g. Jenkins is almost 10 years old). ANSWER: Thank you for raising these issues. We had initially planned to -discuss CIs, but like many discussion points, we were forced to remove -it before the first submission due to the very tight word-count limit. We -have now added a sentence on CI in the discussion. - -On the issue of Bash/Git/Make, indeed, the _executable_ Bash, Git and -Make binaries are not bitwise reproducible/identical on different -systems. However, as mentioned in the discussion, we are concerned -with the _output_ of the software's executable file, _after_ the -execution of its job. We (or any user of Bash) is not interested in -the executable file itself. The reproducibility of the binary file -only becomes important if a significant bug is found (very rare for -ordinary usage of such core software of the OS). 
Hence, even though -the compiled binary files of specific versions of Git, Bash or Make -will not be bitwise reproducible/identical on different systems, their -scientific outputs are exactly reproducible: 'git describe' or Bash's -'for' loop will have the same output on GNU/Linux, macOS/Darwin or -FreeBSD (despite having bit-wise different executables). +discuss CIs, but like many discussion points, we were forced to remove it +before the first submission due to the very tight word-count limit. We have +now added a sentence on CI in the discussion. + +On the issue of Bash/Git/Make, indeed, the executable files of Bash, Git +and Make are not bitwise reproducible/identical on different systems. +However, as mentioned in the discussion, we are concerned with the +_output_ of the software's executable file. We are not interested in the +executable file itself (which will naturally differ between OSs and CPU +architectures). + +The reproducibility of a binary file only becomes important for security +purposes where binaries are downloaded. In Maneage, we download the +software source code tarball, confirm the tarball's SHA512 checksum with +the checksum that is recorded in Maneage [1], and build the software with +a precisely defined build environment and dependencies. + +In summary, even though the compiled binary files of specific versions of +Git, Bash or Make will not be bitwise reproducible/identical on different +systems, their scientific outputs are exactly reproducible: 'git describe' +or Bash's 'for' loop will have the same output on GNU/Linux, macOS/Darwin +or FreeBSD (despite having bit-wise different executables). + +[1] http://git.maneage.org/project.git/tree/reproduce/software/config/checksums.conf ------------------------------ @@ -349,7 +391,7 @@ FreeBSD (despite having bit-wise different executables). provides little novelty (see comments below).
ANSWER: The previously suggested sets of criteria that were listed by -Reviewer 1 are reviewed by us in the newly added Appendix C, and the +Reviewer 1 are reviewed by us in the newly added Appendix B, and the novelty and advantages of our proposed criteria are contrasted there with the earlier sets of criteria. @@ -364,7 +406,7 @@ with the earlier sets of criteria. reproducible research. ANSWER: In the submitted version we had stated that "Ideally, it is -possible to precisely identify the Docker “images” that are imported with +possible to precisely identify the Docker images that are imported with their checksums ...". But to be clearer and go directly to the point, it has been edited to explicitly say "... to recreate an identical OS image later". @@ -381,17 +423,18 @@ later". report it. ANSWER: Thank you very much for raising this important point. We hadn't -seen other reproducibility papers mention this important point and missed -it. In the acknowledgments (where we also mention the commit hashes) we now -explicitly mention the exact CPU architecture used to build this paper: -"This project was built on an x86_64 machine with Little Endian byte-order -and address sizes 39 bits physical, 48 bits virtual.". This is because we -have already seen cases where the architecture is the same, but programs -fail because of the byte order. +seen other reproducibility papers mention this important point and thus +missed it. In the acknowledgments (where we also mention the commit hashes) +we now explicitly mention the exact CPU architecture used to build this +paper: "This project was built on an x86_64 machine with Little Endian +byte-order and address sizes 39 bits physical, 48 bits virtual.". This is +because we have already seen cases where the architecture is the same, but +programs fail because of the byte order.
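For readers who wish to check this kind of information on their own system, it can be queried with a short portable snippet. This is only an illustrative sketch: Maneage itself extracts the equivalent values with its own configuration scripts, not with Python.

```python
import platform
import sys

# Query the same kind of hardware information that is now quoted in the
# paper's acknowledgments: the CPU architecture and the byte order.
arch = platform.machine()  # e.g. 'x86_64' or 'aarch64'
order = "Little Endian" if sys.byteorder == "little" else "Big Endian"
print(f"This project was built on an {arch} machine "
      f"with {order} byte-order.")
```

Running this on the machine that built the paper would print the architecture and byte-order statement quoted above (minus the address sizes, which come from the kernel's CPU report).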
Generally, Maneage will now extract this information from the running -system during its configuration phase and provide the users with three -different LaTeX macros that they can use anywhere in their paper. +system during its configuration phase, and provide the users with three +different LaTeX macros that contain this information. Users can use these +LaTeX macros anywhere in their paper. ------------------------------ @@ -403,8 +446,13 @@ different LaTeX macros that they can use anywhere in their paper. POSIX". Authors should more explained what they mean by this sentence. ANSWER: This has been clarified with the short extra statement "a minimal -Unix-like standard that is shared between many operating systems". We would -have liked to explain this more, but the word limit is very constraining. +Unix-like standard that is shared between many operating systems". Also in +the appendix we now say "no execution requirement beyond a minimal +Unix-like operating system". + +We would have liked to explain this more, but the word limit is very +constraining. It is explained more clearly in the appendices, and we will +put clearer explanations on the web page. ------------------------------ @@ -435,27 +483,28 @@ possible solution to this has been added after criterion 8. ANSWER: Thank you for mentioning this. This has been fixed by archiving all Maneage'd software on Zenodo (https://doi.org/10.5281/zenodo.3883409) and -also downloading from there. +also downloading them from there as the highest-precedence source. Until recently we would directly access each software's own webpage to -download the source files, and this caused frequent problems of this sort. In other -cases, we were very frustrated when a software's webpage would temporarily -be unavailable (for maintenance reasons); this would be a hindrance in -trying to build new projects.
+download the source files, and this caused frequent problems of the type +you mentioned (different servers in different ISPs/states/countries can +behave differently). In other cases, we were very frustrated when a +software's webpage would temporarily be unavailable (e.g., for maintenance +reasons); this was a major hindrance in building new projects. Since all the software is free-licensed, we are legally allowed to -re-distribute it (within the conditions, such as not removing copyright -notices) and Zenodo is defined for long-term archival of -academic digital objects, so we decided that a software source code -repository on Zenodo would be the most reliable solution. At configure -time, Maneage now accesses Zenodo's DOI and resolves the most recent -URL to automatically download any necessary software source code that -the project needs from there. +re-distribute them (within the conditions, such as not removing copyright +notices) and Zenodo is designed for long-term archival of academic digital +objects, so we decided that a software source code repository on Zenodo +would be the most reliable solution. At configure time, Maneage now +accesses Zenodo's DOI and resolves the most recent URL to automatically +download any necessary software source code that the project needs from +there. Generally, we also keep all software in a Git repository on our own webpage: http://git.maneage.org/tarballs-software.git/tree. Also, Maneage -users can also identify their own custom URLs for downloading software, -which will be given higher priority than Zenodo (useful for situations when +users can identify their own custom URLs for downloading software, which +will be given higher priority than Zenodo (useful for situations when custom software is downloaded and built in a project branch, not the core 'maneage' branch). @@ -507,11 +556,15 @@ ANSWER: "Reproducibility" has been defined along with "Longevity" and together to minimize some of the problems presented.
The use of Ansible, Helm, among others, also helps in minimizing problems. -ANSWER: Ansible and Helm are primarily designed for distributed +ANSWER: That is correct. In the new appendices we have touched upon this, +especially in Appendix B where we discuss the technologies used by various +reproducible workflow solutions. + +About Ansible and Helm: they are primarily designed for distributed computing. For example, Helm is just a high-level package manager for a Kubernetes cluster that is based on containers. A review of them could be -added to the Appendices, but we feel they this would distract somewhat -from the main points of our current paper. +added to the Appendices, but we feel this would distract somewhat from +the main points of our current paper. ------------------------------ @@ -540,9 +593,9 @@ the introduction of the revised version. ANSWER: Thank you very much for pointing out the works by Thain. We couldn't find any first-author papers in 2015, but found Meng & Thain -(https://doi.org/10.1016/j.procs.2017.05.116) which had a related +(https://doi.org/10.1016/j.procs.2017.05.116) which had a relevant discussion of why they didn't use Docker containers in their work. That -paper is now cited in the discussion of Containers in Appendix B. +paper is now cited in the discussion of Containers in Appendix A. ------------------------------ @@ -555,7 +608,7 @@ paper is now cited in the discussion of Containers in Appendix A. ANSWER: Thank you for the reference. We are restricted in the main body of the paper due to the strict bibliography limit of 12 -references; we have included Kurtzer et al 2017 in Appendix B (where +references; we have included Kurtzer et al 2017 in Appendix A (where we discuss Singularity).
ANSWER: The FAIR principles have been mentioned in the main body of the paper, but unfortunately we had to remove its citation in the main paper (like many others) to keep to the maximum of 12 references. We have cited it in -Appendix C. +Appendix B. ------------------------------ @@ -584,10 +637,9 @@ Appendix C. further enrich the tool presented. -ANSWER: We agree and have removed the IPOL example from that section. -We have included an in-depth discussion of IPOL in Appendix C and we -comment on the readiness of Maneage'd projects for a similar level of -peer-review control. +ANSWER: We agree and have removed the IPOL example from that section. We +have included an in-depth discussion of IPOL in Appendix B and we comment +on how Maneage'd projects offer a similar level of peer-review control. ------------------------------ @@ -636,26 +688,11 @@ ANSWER: Two comprehensive Appendices have been added to address this issue. them, give some examples of why they are important, and address potential missing criteria. -ANSWER: Our selection of the criteria and their importance are -questions of the philosophy of science: "what is good science? what -should reproducibility aim for?" We feel that completeness; -modularity; minimal complexity; scalability; verifiability of inputs -and outputs; recording of the project history; linking of narrative -to analysis; and the right to use, modify, and redistribute -scientific software in original or modified form; constitute a set -of criteria that should uncontroversially be seen as "important" -from a wide range of ethical, social, political, and economic -perspectives. An exception is probably the issue of proprietary -versus free software (criterion 8), on which debate is far from -closed. - -Within the constraints of space (the limit is 6500 words), we don't -see how we could add more discussion of the history of our choice of -criteria or more anecdotal examples of their relevance. 
- -We do discuss some alternatives lists of criteria in Appendix C.A, -without debating the wider perspective of which criteria are the -most desirable. +ANSWER: In the new Appendix B, we have added a new section reviewing some +existing criteria. We would be very interested to discuss them even further +in the main body, but within the constraints of space (the limit is 6250 +words), it is almost impossible to discuss the history of each in detail or +add more anecdotal examples of their relevance. ------------------------------ @@ -678,9 +715,15 @@ Section (V), within the existing word limit. 35. [Reviewer 4] Be clearer about which sorts of research workflow are best suited to this approach. -ANSWER: Maneage is flexible enough to enable a wide range of -workflows to be implemented. This is done by leveraging the -highly modular and flexible nature of Makefiles run via 'Make'. +ANSWER: Maneage is flexible enough to enable a wide range of workflows to +be implemented. This is done by leveraging the highly modular and flexible +nature of Makefiles run via 'Make'. + +GUI-based operations (that involve human interaction and cannot be run in +batch-mode) are one type of workflow that our proof-of-concept will not +support. But as discussed in the completeness criteria, human interaction +is an incompleteness, dramatically reducing the reproducibility of a +result. ------------------------------ @@ -716,8 +759,8 @@ this Maneage'd paper. 37. [Reviewer 4] ... the handling of floating point number -[reproducibility] ... will come with a tradeoff agianst -performance, which is never mentioned. +[reproducibility] ... will come with a tradeoff against performance, which +is never mentioned. ANSWER: The criteria we propose and the proof-of-concept with Maneage do not force the choice of a tradeoff between exact bitwise floating point @@ -729,8 +772,8 @@ range. Performance is indeed an important issue for _immediate_ reproducibility and we would have liked to discuss it.
But due to the strict word-count, we feel that adding it to the discussion points, without having adequate space -to elaborate, can confuse the readers away from the focus of this paper -(which is focused on long term usability). It has therefore not been added. +to elaborate, can confuse the readers away from the focus of this paper (on +long term usability). It has therefore not been added. ------------------------------ @@ -745,13 +788,13 @@ ANSWER: That is true. In section IV, we have given the time it takes to build Maneage (only once on each computer) to be around 1.5 hours on an 8-core CPU (a typical machine that may be used for data analysis). We therefore conclude that when the analysis is complex (and thus taking many -hours or days to complete), this time is negligible. +hours, or even days to complete), this time is negligible. But if the project's full analysis takes 10 minutes or less (like the -extremely simple analysis done in this paper which takes a fraction of a -second). Indeed, the 1.5 hour building time is significant. In those cases, -as discussed in the main body, the project can be built once in a Docker -image and easily moved to other computers. +extremely simple analysis done in this paper), the 1.5 hour building time +is indeed significant. In those cases, as discussed in the main body, the +project can be built once in a Docker image and easily moved to other +computers. Generally, it is true that the initial configuration time (only once on each computer) of a Maneage install may discourage some scientists; but a
However, that workshop -was cancelled due to the pandemic and we had private communications -instead. +support, and also a workshop that was planned for non-astronomers. We +invited (funded) early career researchers to come to the workshop with the +RDA funding. However, that workshop was cancelled due to the COVID-19 +pandemic and we had private communications instead. We would very much like to elaborate on this experience of training new researchers with these tools. However, as with many of the cases above, the very strict word-limit doesn't allow us to elaborate beyond what we have -already written. +already written. Hopefully in a couple of years and with the wider usage of +Maneage or these criteria in research papers, we will be able to write a +paper that is directly focused on this. ------------------------------ @@ -799,7 +843,7 @@ efforts to become more and more portable and longer-lived. However, as the reviewer states, this would be a sidebar, and we are constrained for space, so we couldn't find a place to highlight this. But it is indeed a subject worthy of a full paper (that can be very useful for -many software projects0.. +many software projects). ------------------------------ @@ -834,20 +878,13 @@ our current short title is sufficient. 42. [Reviewer 4] Whilst the thesis stated is valid, it may not be useful to practitioners of computation science and engineering as it stands. -ANSWER: This point appears to refer to floating point bitwise reproducibility -and possibly to the conciseness of our paper. The former is fully allowed -for, as stated above, though not obligatory, using the "verify.mk" rule -file to (typically, but not obligatorily) force bitwise reproducibility. -The latter is constrained by the 6500-word limit. The addition of appendices -in the extended version may help respond to the latter point. 
-
-The current small number of existing research projects using
-Maneage, as indicated in the revised version of our paper includes
-papers outside of observational astronomy (which is the first
-author's main background). The fact that the approach is precisely
-defined for computational science and engineering problems where
-_publication_ of the human-readable workflow source is also
-important may partly respond to this issue.
+ANSWER: This point appears to refer to floating point bitwise
+reproducibility and possibly to the conciseness of our paper. The former is
+fully allowed for, as stated above, using the "verify.mk" rule file to
+(typically, but not obligatorily) force bitwise reproducibility. The latter
+is constrained by the 6250-word limit of CiSE. The addition of
+supplementary appendices in the extended version helps respond to the
+latter point.

------------------------------

@@ -872,10 +909,10 @@ ANSWER: This has been defined now at the start of Section II.

of years, but may not be suitable for the timescale of decades where
porting and emulation are used.

-ANSWER: Statements on quantifying the longevity of specific tools
-have been added in Section II. For example in the case of Docker
-images: "their longevity is determined by the host kernel, usually a
-decade", for Python packages: "Python installation with a usual
+ANSWER: Statements on quantifying the longevity of specific tools have been
+added in Section II and are highlighted in green. For example, in the case
+of Docker images: "their longevity is determined by the host kernel,
+usually a decade", for Python packages: "Python installation with a usual
longevity of a few years", for Nix/Guix: "with considerably better
longevity; same as supported CPU architectures."

@@ -893,9 +930,9 @@ longevity; same as supported CPU architectures."

ANSWER: We have changed the section title to "Longevity of existing tools"
to clarify that we refer to longevity of the tools.
-If the four categories of tools were combined, then the overall
-longevity would be that of the shortest intersection of the time
-spans over which the tools remained viable.
+If the four categories of tools were combined, then the overall longevity
+would be the intersection of the time spans over which the tools remained
+viable.

------------------------------

@@ -912,14 +949,14 @@

ANSWER: Thank you for highlighting this point of confusion. The caption of
Figure 1 has been edited to hopefully clarify the point. In short, the
-arrows represent the operation of software on their inputs (the file they
-originate from) to generate their outputs (the file they point to). In the
-case of generating 'paper.pdf' from its three dependencies
+arrows represent the operation of software and boxes represent files. In
+the case of generating 'paper.pdf' from its three dependencies
('references.tex', 'paper.tex' and 'project.tex'), yes, LaTeX is used. But
-in other steps, other tools are used. For example as you see in [1] the
-main step of the arrow connecting 'table-3.txt' to 'tools-per-year.txt' is
-an AWK command (there are also a few 'echo' commands for meta data and
-copyright in the output plain-text file [2]).
+in other steps, other tools are used (depending on the analysis). For
+example, as you see in [1], the main step of the arrow connecting
+'table-3.txt' to 'tools-per-year.txt' is an AWK command (there are also a
+few 'echo' commands for metadata and copyright in the output plain-text
+file [2]).

[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/reproduce/analysis/make/demo-plot.mk#L51
[2] https://zenodo.org/record/3911395/files/tools-per-year.txt

------------------------------

@@ -937,11 +974,11 @@ copyright in the output plain-text file [2]).

are using their own Maneage instance?

ANSWER: Indeed, Maneage operates based on the Git branching model. As
-mentioned in the text, Maneage is itself a Git branch.
People create their
-own branch from the 'maneage' branch and start customizing it for their
-particular project in their own particular repository. They can also use
-all types of Git-based collaborating models to work together on a project
-that is not yet finished.
+mentioned in the text, Maneage is itself a Git branch. Researchers spin off
+their own branch for a specific project from the 'maneage' branch and start
+customizing it for their particular project in their own particular
+repository. They can also use all types of Git-based collaboration models
+to work together on their branch.

Figure 2 in fact explicitly shows such a case: the main project leader is
committing on the "project" branch. But a collaborator creates a separate
@@ -950,10 +987,10 @@ branch over commit '01dd812' and makes a couple of commits ('f69e1f4' and
project. This can be generalized to any Git based collaboration model.

Recent experience by one of us [Roukema] found that a merge of a
-Maneage-based cosmology simulation project (now zenodo.4062460),
-after separate evolution of about 30-40 commits on maneage and
-possibly 100 on the project, needed about one day of straightforward
-effort, without any major difficulties.
+Maneage-based cosmology simulation project (now zenodo.4062460), after
+separate evolution of about 30-40 commits on maneage and possibly 100 on
+the project, needed about one day of straightforward effort, without any
+major difficulties.

So it is easy to update low-level infrastructure.

------------------------------

@@ -968,11 +1005,12 @@ effort, without any major difficulties.

use by another researcher.

ANSWER: This type of sociological survey will make sense once the number of
-projects run with Maneage is sufficiently high. The time taken to write a
-paper should be measurable automatically: from the git history.
The other
-parameters suggested would require cooperation from the scientists in
-responding to the survey, or will have to be collected anecdotally in the
-short term.
+projects run with Maneage is sufficiently high (comparable, for example,
+to Jupyter). The time taken to write a paper is measurable automatically:
+from the git history. The other parameters suggested would require
+cooperation from the scientists in responding to the survey, or will have
+to be collected anecdotally in the short term. This is a good subject for
+a follow-up paper in a few years.

------------------------------

@@ -989,8 +1027,16 @@

will be a barrier to wider adoption of the primary thesis and criteria.

ANSWER: Maneage was precisely defined to address the problem of
-publishing/collaborating on complete workflows. Hopefully with the
-clarification to point 47 above, this should also become clear.
+publishing/collaborating on complete workflows by many people (in this
+paper itself, six of us have been collaborating to complete it, as can be
+seen in the Git history). Git has been exceptionally powerful in enabling
+collaboration on huge projects with thousands of contributors, like the
+Linux kernel. Exactly the same collaboration style can be implemented in
+Maneage for large scientific projects.
+
+Hopefully with the clarification to point 47 above, this should also become
+clear.

------------------------------

@@ -1001,12 +1047,11 @@

50. [Reviewer 5] Major figures currently working in this exact field do
not have their work acknowledged in this work.

-ANSWER: This was due to the strict word limit and the CiSE
-publication policy (to not include a literature review because there
-is a limit of only 12 citations).
But we had indeed already done a -comprehensive literature review and the editors kindly agreed that -we publish that review as appendices to the main paper on arXiv and -Zenodo. +ANSWER: This was due to the strict word limit and the CiSE publication +policy (to not include a literature review because there is a limit of only +12 citations). But we had indeed already done a comprehensive literature +review and the editors kindly agreed that we submit that review as +supplementary appendices. ------------------------------ @@ -1020,10 +1065,7 @@ Zenodo. work. ANSWER: This work and the proposed criteria are very different from -Popper. We agree that VMs and containers are an important component -of this field, and the appendices add depth to our discussion of this. -However, these do not appear to satisfy all our proposed criteria. -A detailed review of Popper, in particular, is given in Appendix C. +Popper. A detailed review of Popper, in particular, is given in Appendix B. ------------------------------ @@ -1036,37 +1078,36 @@ A detailed review of Popper, in particular, is given in Appendix C. generic OS version label for a VM or container, these are some of the most promising tools for offering true reproducibility. -ANSWER: Containers and VMs have been more thoroughly discussed in -the main body and also extensively discussed in appendix B (that are -now available in the arXiv and Zenodo versions of this paper). As -discussed (with many cited examples), Containers and VMs are only -appropriate when they are themselves reproducible (for example, if -running the Dockerfile this year and next year gives the same -internal environment). However, we show that this is not the case in -most solutions (a more comprehensive review would require its own -paper). - -Moreover, with complete, robust environment builders like Maneage, Nix or GNU -Guix, the analysis environment within a container can be exactly reproduced -later. 
But even so, due to their binary nature and large storage volume,
-they are not trusable sources for the long term (it is expensive to archive
-them). We show several example in the paper of how projects that relied on
-VMs in 2011 and 2014 are no longer active, and how even Dockerhub will be
-deleting containers that are not used for more than 6 months in free
-accounts (due to the high storage costs).
-
-Furthermore, as a unique new feature, Maneage has the criterion of
-"Minimal complexity". This means that even if for any reason the
-project is not able to be run in the future, the content, analysis
-scripts, etc. are accessible for the interested reader since they
-are stored as plain text (only the development history - the git
-history - is storied in git's binary format). Unlike Nix or Guix,
-our approach doesn't need a third-party package package manager: the
-instructions for building all the software of a project are directly
-in the same project as the high-level analysis software. The full
-end-to-end process is transparent in our case, and the interested
-scientist can follow the analysis and study the different decisions
-of each step (why and how the analysis was done).
+ANSWER: Containers and VMs have been more thoroughly discussed in the main
+body and also extensively discussed in Appendix A. As discussed (with many
+cited examples), containers and VMs are only appropriate when they are
+themselves reproducible (for example, if running the Dockerfile this year
+and next year gives the same internal environment). However, we show that
+this is not the case in most solutions (a more comprehensive review would
+require its own paper).
+
+Moreover, with complete, robust environment builders like Maneage, Nix or
+GNU Guix, the analysis environment within a container can be exactly
+reproduced later. But even so, due to their binary nature and large storage
+volume, they are not trustworthy sources for the long term (it is expensive
+to archive them).
We show several examples in the paper and appendices of how
+projects that relied on VMs in 2011 and 2014 are no longer active, and how
+even Docker Hub will be deleting containers that are not used for more than
+6 months in free accounts (due to the high storage costs).
+
+Furthermore, as a unique new feature, Maneage has the criterion of "Minimal
+complexity". This means that even if, for any reason, the project cannot be
+run in the future, the content, analysis scripts, etc. are accessible for
+the interested reader as plain text (only the development history - the git
+history - is stored in git's binary format). Unlike Nix or Guix, our
+approach doesn't need a third-party package manager: the instructions for
+building all the software of a project are directly in the same project as
+the high-level analysis software. The full end-to-end process is
+transparent and archived in Maneage, and the interested scientist can
+follow the analysis and study the different decisions of each step (why and
+how the analysis was done). They can also modify it to work on future
+hardware that we don't know about today (this is not possible with a binary
+file like a VM or container image).

------------------------------

@@ -1084,7 +1125,7 @@

additional work highly relevant to this paper.

ANSWER: Thank you for the interesting paper by Lofstead+2019 on Data
-pallets. We have cited it in Appendix A as an example of how generic the
+pallets. We have cited it in Appendix A as an example of how generic the
concept of containers is.

The topic of linking data to analysis is also a core result of the criteria

@@ -1136,14 +1177,16 @@ Appendices.

ANSWER: As the newly added thorough comparisons with existing systems
show, this set of criteria and the proof-of-concept offer uniquely new
features. As another referee summarized: "This manuscript describes a new
As another referee summarized: "This manuscript describes a new -reproducible workflow which doesn't require another new trendy high-level -software. The proposed workflow is only based on low-level tools already +reproducible workflow _which doesn't require another new trendy high-level +software_. The proposed workflow is only based on low-level tools already widely known." -The fact that we don't define yet another workflow language and framework -and base the whole workflow on time-tested solutions in a framwork that -costs only ~100 kB to archive (in contrast to multi-GB containers or VMs) -is new. +Interestingly, the fact that we don't define yet another workflow language +and framework is itself what makes our proof-of-concept unique. Other +unique features of Maneage is that it is based on time-tested solutions +(the youngest tool we use it Git which is already 15 years old) in a +framwork that costs only ~100 kB to archive (in contrast to multi-GB +containers or VMs). ------------------------------ |