Diffstat (limited to 'peer-review')
-rw-r--r-- | peer-review/1-answer.txt | 1173 | ||||
-rw-r--r-- | peer-review/1-review.txt | 788 |
2 files changed, 1961 insertions, 0 deletions
diff --git a/peer-review/1-answer.txt b/peer-review/1-answer.txt new file mode 100644 index 0000000..55be70a --- /dev/null +++ b/peer-review/1-answer.txt @@ -0,0 +1,1173 @@ +1. [EiC] Some reviewers request additions, and overview of other + tools. + +ANSWER: Indeed, there is already a large body of work on various issues that +have been touched upon in this paper. Before submitting the paper, we had +already done a very comprehensive review of the tools (as you may notice +from the Git repository[1]). However, the CiSE Author Information +explicitly states: "The introduction should provide a modicum of background +in one or two paragraphs, but should not attempt to give a literature +review". This is the usual practice in previously published papers at CiSE +and is in line with the maximum 6250 word-count and maximum of 12 +references to be used in the bibliography. + +We agree with the need for this extensive review to be on the public record +(creating the review took a lot of time and effort; most of the tools were +run and tested). We discussed this with the editors and the following +solution was agreed upon: the extended reviews will be published as a set +of appendices in the arXiv[2] and Zenodo[3] pre-prints of this paper. These +publicly available appendices are also mentioned in the submitted paper so +that any interested reader of the final paper published by CiSE can easily +access them. + +[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/tex/src/paper-long.tex#L1579 +[2] https://arxiv.org/abs/2006.03018 +[3] https://doi.org/10.5281/zenodo.3872247 + +------------------------------ + + + + + +2. [Associate Editor] There are general concerns about the paper + lacking focus + +ANSWER: With all the corrections/clarifications that have been done in this +review the focus of the paper should be clear now. We are very grateful for +the thorough listing of points by the referees. + +------------------------------ + + + + + +3. 
[Associate Editor] Some terminology is not well-defined + (e.g. longevity). + +ANSWER: Reproducibility, Longevity and Usage have now been explicitly +defined in the first paragraph of Section II. With this definition, the +main argument of the paper is clearer, thank you (and thank you to the +referees for highlighting this). + +------------------------------ + + + + + +4. [Associate Editor] The discussion of tools could benefit from some + categorization to characterize their longevity. + +ANSWER: The longevity of the general tools reviewed in Section II is now +mentioned immediately after each (VMs, SHARE: discontinued in 2019; +Docker: 6 months; python-dependent package managers: a few years; +Jupyter notebooks: shortest longevity non-core python dependency). + +------------------------------ + + + + + +5. [Associate Editor] Background and related efforts need significant + improvement. (See below.) + +ANSWER: This has been done, as mentioned in (1.) above. + +------------------------------ + + + + + +6. [Associate Editor] There is consistency among the reviews that + related work is particularly lacking. + +ANSWER: This has been done, as mentioned in (1.) above. + +------------------------------ + + + + + +7. [Associate Editor] The current work needs to do a better job of + explaining how it deals with the nagging problem of running on CPU + vs. different architectures. + +ANSWER: The CPU architecture of the running system is now reported in +the "Acknowledgments" section and a description of the problem and its +solution in Maneage is also added and illustrated in the "Proof of +concept: Maneage" Section. + +------------------------------ + + + + + +8. [Associate Editor] At least one review commented on the need to + include a discussion of continuous integration (CI) and its + potential to help identify problems running on different + architectures. Is CI employed in any way in the work presented in + this article? 
+ +ANSWER: CI has been added in the discussion section (V) as one +solution to find breaking points in operating system updates and +new/different architectures. For the core Maneage branch, we have +defined task #15741 [1] to add CI on many architectures in the near +future. + +[1] http://savannah.nongnu.org/task/?15741 + +------------------------------ + + + + + +9. [Associate Editor] The presentation of the Maneage tool is both + lacking in clarity and consistency with the public + information/documentation about the tool. While our review focus + is on the article, it is important that readers not be confused + when they visit your site to use your tools. + +ANSWER: Thank you for raising this important point. We have broken down the +very long "About" page into multiple pages to help in readability: + +https://maneage.org/about.html + +Generally, the webpage will soon undergo major improvements to be even more +clear. The website is developed on a public git repository +(https://git.maneage.org/webpage.git), so any specific proposals for +improvements can be handled efficiently and transparently and we welcome +any feedback in this aspect. + +------------------------------ + + + + + +10. [Associate Editor] A significant question raised by one review is + how this work compares to "executable" papers and Jupyter + notebooks. Does this work embody similar/same design principles + or expand upon the established alternatives? In any event, a + discussion of this should be included in background/motivation and + related work to help readers understand the clear need for a new + approach, if this is being presented as new/novel. + +ANSWER: Thank you for highlighting this important point. We saw that +it is necessary to contrast our Maneage proof-of-concept demonstration +more directly against the Jupyter notebook type of approach. 
Two +paragraphs have been added in Sections II and IV to clarify this (our +criteria require and build in more modularity and longevity than +Jupyter). + + +------------------------------ + + + + + +11. [Reviewer 1] Adding an explicit list of contributions would make + it easier to the reader to appreciate these. These are not + mentioned/cited and are highly relevant to this paper (in no + particular order): + 1. Git flows, both in general and in particular for research. + 2. Provenance work, in general and with git in particular + 3. Reprozip: https://www.reprozip.org/ + 4. OCCAM: https://occam.cs.pitt.edu/ + 5. Popper: http://getpopper.io/ + 6. Whole Tale: https://wholetale.org/ + 7. Snakemake: https://github.com/snakemake/snakemake + 8. CWL https://www.commonwl.org/ and WDL https://openwdl.org/ + 9. Nextflow: https://www.nextflow.io/ + 10. Sumatra: https://pythonhosted.org/Sumatra/ + 11. Podman: https://podman.io + 12. AppImage (https://appimage.org/) + 13. Flatpack (https://flatpak.org/) + 14. Snap (https://snapcraft.io/) + 15. nbdev https://github.com/fastai/nbdev and jupytext + 16. Bazel: https://bazel.build/ + 17. Debian reproducible builds: https://wiki.debian.org/ReproducibleBuilds + +ANSWER: + +1. In Section IV, we have added that "Generally, any git flow (branching + strategies) can be used by the high-level project authors or future + readers." +2. We have mentioned research objects as one mode of provenance tracking + and the related provenance work that has already been done and can be + exploited using these criteria and our proof of concept is indeed very + large. However, the 6250 word-count limit is very tight and if we add + more on it in this length, we would have to remove points of higher priority. + Hopefully this can be the subject of a follow-up paper. +3. A review of ReproZip is in Appendix C. +4. A review of Occam is in Appendix C. +5. A review of Popper is in Appendix C. +6. A review of Whole Tale is in Appendix C. +7. 
A review of Snakemake is in Appendix B. +8. CWL and WDL are described in Appendix B (Job management). +9. Nextflow is described in Appendix B (Job management). +10. Sumatra is described in Appendix C. +11. Podman is mentioned in Appendix B (Containers). +12. AppImage is mentioned in Appendix B (Package management). +13. Flatpak is mentioned in Appendix B (Package management). +14. Snap is mentioned in Appendix B (Package management). +15. nbdev and jupytext are high-level tools to generate documentation and + packaging custom code in Conda or pypi. High-level package managers + like Conda and Pypi have already been thoroughly reviewed in Appendix A + for their longevity issues, so we feel that there is no need to + include these. +16. Bazel is mentioned in Appendix B (job management). +17. Debian's reproducible builds are only designed for ensuring that software + packaged for Debian is bitwise reproducible. As mentioned in the + discussion section of this paper, the bitwise reproducibility of software is + not an issue in the context discussed here; the reproducibility of the + relevant output data of the software is the main issue. + +------------------------------ + + + + + +12. [Reviewer 1] Existing guidelines similar to the proposed "Criteria + for longevity". Many articles of these in the form "10 simple + rules for X", for example (not exhaustive list): + * https://doi.org/10.1371/journal.pcbi.1003285 + * https://arxiv.org/abs/1810.08055 + * https://osf.io/fsd7t/ + * A model project for reproducible papers: https://arxiv.org/abs/1401.2000 + * Executable/reproducible paper articles and original concepts + +ANSWER: Thank you for highlighting these points. Appendix C starts with a +subsection titled "suggested rules, checklists or criteria" with a review of +existing sets of criteria. This subsection includes the sources proposed +by the reviewer [Sandve et al; Rule et al; Nust et al] (and others). 
+ +ArXiv:1401.2000 has been added in Appendix B as an example paper using +virtual machines. We thank the referee for bringing up this paper, because +the link to the VM provided in the paper no longer works (the URL +http://archive.comp-phys.org/provenance_challenge/provenance_machine.ova +redirects to +https://share.phys.ethz.ch//~alpsprovenance_challenge/provenance_machine.ova +which gives a 'Not Found' html response). Together with SHARE, this very nicely +highlights our main issue with binary containers or VMs: their lack of +longevity. + +------------------------------ + + + + + +13. [Reviewer 1] Several claims in the manuscript are not properly + justified, neither in the text nor via citation. Examples (not + exhaustive list): + 1. "it is possible to precisely identify the Docker “images” that + are imported with their checksums, but that is rarely practiced + in most solutions that we have surveyed [which ones?]" + 2. "Other OSes [which ones?] have similar issues because pre-built + binary files are large and expensive to maintain and archive." + 3. "Researchers using free software tools have also already had + some exposure to it" + 4. "A popular framework typically falls out of fashion and + requires significant resources to translate or rewrite every + few years." + +ANSWER: These points have been clarified in the highlighted parts of the text: + +1. Many examples have been given throughout the newly added + appendices. To avoid confusion in the main body of the paper, we + have removed the "we have surveyed" part. It is already mentioned + above this point in the text that a large survey of existing + methods/solutions is given in the appendices. + +2. Due to the thorough discussion of this issue in the appendices with + precise examples, this line has been removed to allow space for the + other points raised by the referees. The main point (high cost of + keeping binaries) is already abundantly clear. 
+ + On a similar topic, Dockerhub's recent announcement that inactive images + (for over 6 months) will be deleted has also been added. The announcement + URL is here (it was too long to include in the paper; if IEEE has a + special short-url format, we can add it): + https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates + +3. A small statement has been added, reminding the readers that almost all + free software projects are built with Make (CMake is popular, but it is just a + high-level wrapper over Make: it finally produces a 'Makefile'; practical + usage of CMake generally obliges the user to understand Make). + +4. The example of Python 2 has been added. + + +------------------------------ + + + + + +14. [Reviewer 1] As mentioned in the discussion by the authors, not + even Bash, Git or Make is reproducible, thus not even Maneage can + address the longevity requirements. One possible alternative is + the use of CI to ensure that papers are re-executable (several + papers have been written on this topic). Note that CI is + well-established technology (e.g. Jenkins is almost 10 years old). + +ANSWER: Thank you for raising these issues. We had initially planned to +discuss CI, but like many discussion points, we were forced to remove +it before the first submission due to the very tight word-count limit. We +have now added a sentence on CI in the discussion. + +On the issue of Bash/Git/Make, indeed, the _executable_ Bash, Git and +Make binaries are not bitwise reproducible/identical on different +systems. However, as mentioned in the discussion, we are concerned +with the _output_ of the software's executable file, _after_ the +execution of its job. We (or any user of Bash) are not interested in +the executable file itself. The reproducibility of the binary file +only becomes important if a significant bug is found (very rare for +ordinary usage of such core software of the OS). 
Hence, even though +the compiled binary files of specific versions of Git, Bash or Make +will not be bitwise reproducible/identical on different systems, their +scientific outputs are exactly reproducible: 'git describe' or Bash's +'for' loop will have the same output on GNU/Linux, macOS/Darwin or +FreeBSD (despite having bitwise different executables). + +------------------------------ + + + + + +15. [Reviewer 1] Criterion has been proposed previously. Maneage itself + provides little novelty (see comments below). + +ANSWER: The previously suggested sets of criteria that were listed by +Reviewer 1 are reviewed by us in the newly added Appendix C, and the +novelty and advantages of our proposed criteria are contrasted there +with the earlier sets of criteria. + +------------------------------ + + + + + +16. [Reviewer 2] Authors should add indication that using good practices it + is possible to use Docker or VM to obtain identical OS usable for + reproducible research. + +ANSWER: In the submitted version we had stated that "Ideally, it is +possible to precisely identify the Docker “images” that are imported with +their checksums ...". But to be clearer and go directly to the point, it +has been edited to explicitly say "... to recreate an identical OS image +later". + +------------------------------ + + + + + +17. [Reviewer 2] The CPU architecture of the platform used to run the + workflow is not discussed in the manuscript. Authors should probably + take into account the architecture used in their workflow or at least + report it. + +ANSWER: Thank you very much for raising this important point. We hadn't +seen other reproducibility papers mention this important point and missed +it. In the acknowledgments (where we also mention the commit hashes) we now +explicitly mention the exact CPU architecture used to build this paper: +"This project was built on an x86_64 machine with Little Endian byte-order +and address sizes 39 bits physical, 48 bits virtual." 
This is because we +have already seen cases where the architecture is the same, but programs +fail because of the byte order. + +Generally, Maneage will now extract this information from the running +system during its configuration phase and provide the users with three +different LaTeX macros that they can use anywhere in their paper. + +------------------------------ + + + + + +18. [Reviewer 2] I don’t understand the "no dependency beyond + POSIX". Authors should more explained what they mean by this sentence. + +ANSWER: This has been clarified with the short extra statement "a minimal +Unix-like standard that is shared between many operating systems". We would +have liked to explain this more, but the word limit is very constraining. + +------------------------------ + + + + + +19. [Reviewer 2] Unfortunately, sometime we need proprietary or specialized + software to read raw data... For example in genetics, micro-array raw + data are stored in binary proprietary formats. To convert this data + into a plain text format, we need the proprietary software provided + with the measurement tool. + +ANSWER: Thank you very much for this good point. A description of a +possible solution to this has been added after criterion 8. + +------------------------------ + + + + + +20. [Reviewer 2] I was not able to properly set up a project with + Maneage. The configuration step failed during the download of tools + used in the workflow. This is probably due to a firewall/antivirus + restriction out of my control. How frequent this failure happen to + users? + +ANSWER: Thank you for mentioning this. This has been fixed by archiving all +Maneage'd software on Zenodo (https://doi.org/10.5281/zenodo.3883409) and +also downloading from there. + +Until recently we would directly access each software's own webpage to +download the source files, and this caused frequent problems of this sort. 
In other +cases, we were very frustrated when a software's webpage would temporarily +be unavailable (for maintenance reasons); this would be a hindrance in +trying to build new projects. + +Since all the software is free-licensed, we are legally allowed to +re-distribute it (within the conditions, such as not removing copyright +notices) and Zenodo is designed for long-term archival of +academic digital objects, so we decided that a software source code +repository on Zenodo would be the most reliable solution. At configure +time, Maneage now accesses Zenodo's DOI and resolves the most recent +URL to automatically download any necessary software source code that +the project needs from there. + +Generally, we also keep all software in a Git repository on our own +webpage: http://git.maneage.org/tarballs-software.git/tree. Maneage +users can also specify their own custom URLs for downloading software, +which will be given higher priority than Zenodo (useful for situations when +custom software is downloaded and built in a project branch, not the core +'maneage' branch). + +------------------------------ + + + + + +21. [Reviewer 2] The time to configure a new project is quite long because + everything needs to be compiled. Authors should compare the time + required to set up a project Maneage versus time used by other + workflows to give an indication to the readers. + +ANSWER: Thank you for raising this point. It takes about 1.5 hours to +configure the default Maneage branch on an 8-core CPU (more than half of +this time is devoted to GCC on GNU/Linux operating systems, and the +building of GCC can optionally be disabled with the '--host-cc' option to +significantly speed up the build when the host's GCC is +similar). Furthermore, Maneage can be built within a Docker container. + +A paragraph has been added in Section IV on this issue (the +build time and building within a Docker container). 
We have also defined +task #15818 [1] to have our own core Docker image that is ready to build a +Maneage'd project and will be adding it shortly. + +[1] https://savannah.nongnu.org/task/index.php?15818 + +------------------------------ + + + + + +22. [Reviewer 3] Authors should define their use of the term [Replicability + or Reproducibility] briefly for their readers. + +ANSWER: "Reproducibility" has been defined along with "Longevity" and +"Usage" at the start of Section II. + +------------------------------ + + + + + +23. [Reviewer 3] The introduction is consistent with the proposal of the + article, but deals with the tools separately, many of which can be used + together to minimize some of the problems presented. The use of + Ansible, Helm, among others, also helps in minimizing problems. + +ANSWER: Ansible and Helm are primarily designed for distributed +computing. For example Helm is just a high-level package manager for a +Kubernetes cluster that is based on containers. A review of them could be +added to the Appendices, but we feel that this would distract somewhat +from the main points of our current paper. + +------------------------------ + + + + + +24. [Reviewer 3] When the authors use the Python example, I believe it is + interesting to point out that today version 2 has been discontinued by + the maintaining community, which creates another problem within the + perspective of the article. + +ANSWER: Thank you very much for highlighting this point. We had excluded +this point for the sake of article length, but we have restored it in +the introduction of the revised version. + +------------------------------ + + + + + +25. [Reviewer 3] Regarding the use of VM's and containers, I believe that + the discussion presented by THAIN et al., 2015 is interesting to + increase essential points of the current work. + +ANSWER: Thank you very much for pointing out the works by Thain. 
We +couldn't find any first-author papers in 2015, but found Meng & Thain +(https://doi.org/10.1016/j.procs.2017.05.116) which had a related +discussion of why they didn't use Docker containers in their work. That +paper is now cited in the discussion of Containers in Appendix B. + +------------------------------ + + + + + +26. [Reviewer 3] About the Singularity, the description article was missing + (Kurtzer GM, Sochat V, Bauer MW, 2017). + +ANSWER: Thank you for the reference. We are restricted in the main +body of the paper due to the strict bibliography limit of 12 +references; we have included Kurtzer et al 2017 in Appendix B (where +we discuss Singularity). + +------------------------------ + + + + + +27. [Reviewer 3] I also believe that a reference to FAIR is interesting + (WILKINSON et al., 2016). + +ANSWER: The FAIR principles have been mentioned in the main body of the +paper, but unfortunately we had to remove its citation in the main paper (like +many others) to keep to the maximum of 12 references. We have cited it in +Appendix C. + +------------------------------ + + + + + +28. [Reviewer 3] In my opinion, the paragraph on IPOL seems to be out of + context with the previous ones. This issue of end-to-end + reproducibility of a publication could be better explored, which would + further enrich the tool presented. + + +ANSWER: We agree and have removed the IPOL example from that section. +We have included an in-depth discussion of IPOL in Appendix C and we +comment on the readiness of Maneage'd projects for a similar level of +peer-review control. + +------------------------------ + + + + + +29. [Reviewer 3] On the project website, I suggest that the information + contained in README-hacking be presented on the same page as the + Tutorial. A topic breakdown is interesting, as the markdown reading may + be too long to find information. + +ANSWER: Thank you very much for this good suggestion, it has been +implemented: https://maneage.org/about.html . 
The webpage will continuously +be improved and such feedback is always very welcome. + +------------------------------ + + + + + +31. [Reviewer 3] The tool is suitable for Unix users, keeping users away + from Microsoft environments. + +ANSWER: The issue of building on Windows has been discussed in Section IV, +either using Docker (or VMs) or using the Windows Subsystem for Linux. + +------------------------------ + + + + +32. [Reviewer 3] Important references are missing; more references are + needed + +ANSWER: Two comprehensive Appendices have been added to address this issue. + +------------------------------ + + + + + +33. [Reviewer 4] Revisit the criteria, show how you have come to decide on + them, give some examples of why they are important, and address + potential missing criteria. + +ANSWER: Our selection of the criteria and their importance are +questions of the philosophy of science: "what is good science? what +should reproducibility aim for?" We feel that completeness; +modularity; minimal complexity; scalability; verifiability of inputs +and outputs; recording of the project history; linking of narrative +to analysis; and the right to use, modify, and redistribute +scientific software in original or modified form; constitute a set +of criteria that should uncontroversially be seen as "important" +from a wide range of ethical, social, political, and economic +perspectives. An exception is probably the issue of proprietary +versus free software (criterion 8), on which debate is far from +closed. + +Within the constraints of space (the limit is 6500 words), we don't +see how we could add more discussion of the history of our choice of +criteria or more anecdotal examples of their relevance. + +We do discuss some alternative lists of criteria in Appendix C.A, +without debating the wider perspective of which criteria are the +most desirable. + +------------------------------ + + + + + +34. 
[Reviewer 4] Clarify the discussion of challenges to adoption and make + it clearer which tradeoffs are important to practitioners. + +ANSWER: We discuss many of these challenges and caveats in the Discussion +Section (V), within the existing word limit. + +------------------------------ + + + + + +35. [Reviewer 4] Be clearer about which sorts of research workflow are best + suited to this approach. + +ANSWER: Maneage is flexible enough to enable a wide range of +workflows to be implemented. This is done by leveraging the +highly modular and flexible nature of Makefiles run via 'Make'. + +------------------------------ + + + + + +36. [Reviewer 4] There is also the challenge of mathematical + reproducibility, particularly of the handling of floating point number, + which might occur because of the way the code is written, and the + hardware architecture (including if code is optimised / parallelised). + +ANSWER: Floating point errors and optimizations have been mentioned in the +discussion (Section V). The issue with parallelization has also been +discussed in Section IV, in the part on verification ("Where exact +reproducibility is not possible (for example due to parallelization), +values can be verified by a statistical method specified by the project +authors."). We have linked keywords in the latter sentence to a Software +Heritage URI [1] with the specific file in a Maneage'd paper that +illustrates an example of how statistical verification of parallelised code +can work in practice (Peper & Roukema 2020; zenodo.4062460). + +We would be interested to hear if any other papers already exist that use +automatic statistical verification of parallelised code as has been done in +this Maneage'd paper. 
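The idea of statistical verification quoted in answer 36 can be illustrated with a minimal sketch. This is not the 'verify-parameter-statistically.sh' script referenced in that answer; the function name, the 3-sigma default and the sample numbers are all hypothetical, chosen only to show the principle: a value from a non-deterministic (for example parallelised) run is accepted if it lies within a chosen number of standard deviations of the values from earlier reference runs.

```python
import statistics


def verify_statistically(new_value, reference_runs, n_sigma=3.0):
    """Accept new_value if it lies within n_sigma sample standard
    deviations of the mean of reference_runs (hypothetical criterion;
    project authors would choose their own test and tolerance)."""
    if len(reference_runs) < 2:
        raise ValueError("need at least two reference runs to estimate scatter")
    mean = statistics.mean(reference_runs)
    sigma = statistics.stdev(reference_runs)
    return abs(new_value - mean) <= n_sigma * sigma


# Hypothetical outputs of the same parameter from five earlier runs.
reference = [1.02, 0.98, 1.01, 0.99, 1.00]

print(verify_statistically(1.015, reference))  # inside the tolerance
print(verify_statistically(1.50, reference))   # far outside the tolerance
```

In a real project the tolerance and the statistical test would be specified by the project authors, as the quoted sentence from Section IV states.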
+ +[1] https://archive.softwareheritage.org/browse/origin/content/?branch=refs/heads/postreferee_corrections&origin_url=https://codeberg.org/boud/elaphrocentre.git&path=reproduce/analysis/bash/verify-parameter-statistically.sh + +------------------------------ + + + + + +37. [Reviewer 4] ... the handling of floating point number +[reproducibility] ... will come with a tradeoff against +performance, which is never mentioned. + +ANSWER: The criteria we propose and the proof-of-concept with Maneage do +not force the choice of a tradeoff between exact bitwise floating point +reproducibility and performance (e.g. speed). The specific concepts of +"verification" and "reproducibility" will vary between domains of +scientific computation, but we expect that the criteria allow this wide +range. + +Performance is indeed an important issue for _immediate_ reproducibility +and we would have liked to discuss it. But due to the strict word-count, we +feel that adding it to the discussion points, without having adequate space +to elaborate, can distract the readers from the focus of this paper +(which is long-term usability). It has therefore not been added. + +------------------------------ + + + + + +38. [Reviewer 4] Tradeoff, which might affect Criterion 3 is time to result, + people use popular frameworks because it is easier to use them. + +ANSWER: That is true. In Section IV, we have given the time it takes to +build Maneage (only once on each computer) to be around 1.5 hours on an +8-core CPU (a typical machine that may be used for data analysis). We +therefore conclude that when the analysis is complex (and thus taking many +hours or days to complete), this time is negligible. + +But if the project's full analysis takes 10 minutes or less (like the +extremely simple analysis done in this paper which takes a fraction of a +second), the 1.5 hour building time is indeed significant. 
In those cases, +as discussed in the main body, the project can be built once in a Docker +image and easily moved to other computers. + +Generally, it is true that the initial configuration time (only once on +each computer) of a Maneage install may discourage some scientists; but a +serious scientific research project is never started and completed on a +time scale of a few hours. + +------------------------------ + + + + + +39. [Reviewer 4] I would liked to have seen explanation of how these + challenges to adoption were identified: was this anecdotal, through + surveys? participant observation? + +ANSWER: The results mentioned here are anecdotal: based on private +discussions after holding multiple seminars and webinars with RDA's +support, and also a workshop that was planned for +non-astronomers. We invited (funded) early career researchers to +come to the workshop with the RDA funding. However, that workshop +was cancelled due to the pandemic and we had private communications +instead. + +We would very much like to elaborate on this experience of training new +researchers with these tools. However, as with many of the cases above, the +very strict word-limit doesn't allow us to elaborate beyond what we have +already written. + +------------------------------ + + + + + +40. [Reviewer 4] Potentially an interesting sidebar to investigate how + LaTeX/TeX has ensured its longevity! + +ANSWER: That is indeed a very interesting subject to study (an obvious link +is that LaTeX/TeX is very strongly based on plain text files). We have been +in touch with Karl Berry (one of the core people behind TeX Live, who also +plays a prominent role in GNU) and have witnessed the TeX Live community's +efforts to become more and more portable and longer-lived. + +However, as the reviewer states, this would be a sidebar, and we are +constrained for space, so we couldn't find a place to highlight this. 
But +it is indeed a subject worthy of a full paper (that can be very useful for +many software projects). + +------------------------------ + + + + + +41. [Reviewer 4] The title is not specific enough - it should refer to the + reproducibility of workflows/projects. + +ANSWER: A problem here is that "workflow" and "project" taken in isolation +risk being vague for wider audiences. Also, we aim at covering a wider +range of aspects of a project than just the workflow alone; in the +other direction, the word "project" could be seen as too broad, including +the funding, principal investigator, and team coordination. + +A specific title that might be appropriate could be, for example, "Towards +long-term and archivable reproducibility of scientific computational +research projects". Using a term proposed by one of our reviewers, "Towards +long-term and archivable end-to-end reproducibility of scientific +computational research projects" might also be appropriate. + +Nevertheless, we feel that in the context of an article published in CiSE, +our current short title is sufficient. + +------------------------------ + + + + + +42. [Reviewer 4] Whilst the thesis stated is valid, it may not be useful to + practitioners of computation science and engineering as it stands. + +ANSWER: This point appears to refer to floating point bitwise reproducibility +and possibly to the conciseness of our paper. The former is fully allowed +for, as stated above, though not obligatory, using the "verify.mk" rule +file to (typically, but not obligatorily) force bitwise reproducibility. +The latter is constrained by the 6500-word limit. The addition of appendices +in the extended version may help respond to the latter point. + +The current small number of existing research projects using +Maneage, as indicated in the revised version of our paper, includes +papers outside of observational astronomy (which is the first +author's main background). 
The fact that the approach is precisely +defined for computational science and engineering problems where +_publication_ of the human-readable workflow source is also +important may partly respond to this issue. + +------------------------------ + + + + + +43. [Reviewer 4] Longevity is not defined. + +ANSWER: This has been defined now at the start of Section II. + +------------------------------ + + + + + +44. [Reviewer 4] Whilst various tools are discussed and discarded, no + attempt is made to categorise the magnitude of longevity for which they + are relevant. For instance, environment isolators are regarded by the + software preservation community as adequate for timescale of the order + of years, but may not be suitable for the timescale of decades where + porting and emulation are used. + +ANSWER: Statements on quantifying the longevity of specific tools +have been added in Section II. For example in the case of Docker +images: "their longevity is determined by the host kernel, usually a +decade", for Python packages: "Python installation with a usual +longevity of a few years", for Nix/Guix: "with considerably better +longevity; same as supported CPU architectures." + +------------------------------ + + + + + +45. [Reviewer 4] The title of this section "Commonly used tools and their + longevity" is confusing - do you mean the longevity of the tools or the + longevity of the workflows that can be produced using these tools? + What happens if you use a combination of all four categories of tools? + +ANSWER: We have changed the section title to "Longevity of existing tools" +to clarify that we refer to longevity of the tools. + +If the four categories of tools were combined, then the overall +longevity would be that of the shortest intersection of the time +spans over which the tools remained viable. + +------------------------------ + + + + + +46. 
[Reviewer 4] It wasn't clear to me if code was being run to generate + the results and figures in a LaTeX paper that is part of a project in + Maneage. It appears to be suggested this is the case, but Figure 1 + doesn't show how this works - it just has the LaTeX files, the data + files and the Makefiles. Is it being suggested that LaTeX itself is the + programming language, using its macro functionality? + +ANSWER: Thank you for highlighting this point of confusion. The caption of +Figure 1 has been edited to hopefully clarify the point. In short, the +arrows represent the operation of software on their inputs (the file they +originate from) to generate their outputs (the file they point to). In the +case of generating 'paper.pdf' from its three dependencies +('references.tex', 'paper.tex' and 'project.tex'), yes, LaTeX is used. But +in other steps, other tools are used. For example as you see in [1] the +main step of the arrow connecting 'table-3.txt' to 'tools-per-year.txt' is +an AWK command (there are also a few 'echo' commands for meta data and +copyright in the output plain-text file [2]). + +[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/reproduce/analysis/make/demo-plot.mk#L51 +[2] https://zenodo.org/record/3911395/files/tools-per-year.txt + +------------------------------ + + + + + +47. [Reviewer 4] I was a bit confused on how collaboration is handled as + well - this appears to be using the Git branching model, and the + suggestion that Maneage is keeping track of all components from all + projects - but what happens if you are working with collaborators that + are using their own Maneage instance? + +ANSWER: Indeed, Maneage operates based on the Git branching model. As +mentioned in the text, Maneage is itself a Git branch. People create their +own branch from the 'maneage' branch and start customizing it for their +particular project in their own particular repository. 
They can also use +all types of Git-based collaborating models to work together on a project +that is not yet finished. + +Figure 2 in fact explicitly shows such a case: the main project leader is +committing on the "project" branch. But a collaborator creates a separate +branch over commit '01dd812' and makes a couple of commits ('f69e1f4' and +'716b56b'), and finally asks the project leader to merge them into the +project. This can be generalized to any Git based collaboration model. + +Recent experience by one of us [Roukema] found that a merge of a +Maneage-based cosmology simulation project (now zenodo.4062460), +after separate evolution of about 30-40 commits on maneage and +possibly 100 on the project, needed about one day of straightforward +effort, without any major difficulties. + +------------------------------ + + + + + +48. [Reviewer 4] I would also [have] liked to have seen a comparison + between this approach and other "executable" paper approaches + e.g. Jupyter notebooks, compared on completeness, time taken to + write a "paper", ease of depositing in a repository, and ease of + use by another researcher. + +ANSWER: This type of sociological survey will make sense once the number of +projects run with Maneage is sufficiently high. The time taken to write a +paper should be measurable automatically: from the git history. The other +parameters suggested would require cooperation from the scientists in +responding to the survey, or will have to be collected anecdotally in the +short term. + +------------------------------ + + + + + +49. [Reviewer 4] The weakest aspect is the assumption that research can be + easily compartmentalized into simple and complete packages. Given that + so much of research involves collaboration and interaction, this is not + sufficiently addressed. 
In particular, the challenge of + interdisciplinary work, where there may not be common languages to + describe concepts and there may be different common workflow practices + will be a barrier to wider adoption of the primary thesis and criteria. + +ANSWER: Maneage was precisely defined to address the problem of +publishing/collaborating on complete workflows. Hopefully with the +clarification to point 47 above, this should also become clear. + +------------------------------ + + + + + +50. [Reviewer 5] Major figures currently working in this exact field do not + have their work acknowledged in this work. + +ANSWER: This was due to the strict word limit and the CiSE +publication policy (to not include a literature review because there +is a limit of only 12 citations). But we had indeed already done a +comprehensive literature review and the editors kindly agreed that +we publish that review as appendices to the main paper on arXiv and +Zenodo. + +------------------------------ + + + + + +51. [Reviewer 5] Jimenez I et al ... 2017 "The popper convention: Making + reproducible systems evaluation practical ..." and the later + revision that uses GitHub Actions, is largely the same as this + work. + +ANSWER: This work and the proposed criteria are very different from +Popper. We agree that VMs and containers are an important component +of this field, and the appendices add depth to our discussion of this. +However, these do not appear to satisfy all our proposed criteria. +A detailed review of Popper, in particular, is given in Appendix C. + +------------------------------ + + + + + +52. [Reviewer 5] The lack of attention to virtual machines and containers + is highly problematic. While a reader cannot rely on DockerHub or a + generic OS version label for a VM or container, these are some of the + most promising tools for offering true reproducibility. 
+ +ANSWER: Containers and VMs have been more thoroughly discussed in +the main body and also extensively discussed in Appendix B (which is +now available in the arXiv and Zenodo versions of this paper). As +discussed (with many cited examples), containers and VMs are only +appropriate when they are themselves reproducible (for example, if +running the Dockerfile this year and next year gives the same +internal environment). However, we show that this is not the case in +most solutions (a more comprehensive review would require its own +paper). + +Moreover, with complete, robust environment builders like Maneage, Nix or GNU +Guix, the analysis environment within a container can be exactly reproduced +later. But even so, due to their binary nature and large storage volume, +they are not trustworthy sources for the long term (it is expensive to archive +them). We show several examples in the paper of how projects that relied on +VMs in 2011 and 2014 are no longer active, and how even Dockerhub will be +deleting containers that are not used for more than 6 months in free +accounts (due to the high storage costs). + +Furthermore, as a unique new feature, Maneage has the criterion of +"Minimal complexity". This means that even if for any reason the +project is not able to be run in the future, the content, analysis +scripts, etc. are accessible for the interested reader since they +are stored as plain text (only the development history - the git +history - is stored in git's binary format). Unlike Nix or Guix, +our approach doesn't need a third-party package manager: the +instructions for building all the software of a project are directly +in the same project as the high-level analysis software. The full +end-to-end process is transparent in our case, and the interested +scientist can follow the analysis and study the different decisions +of each step (why and how the analysis was done). + +------------------------------ + + + + + +53.
[Reviewer 5] On the data side, containers have the promise to manage + data sets and workflows completely [Lofstead J, Baker J, Younge A. Data + pallets: containerizing storage for reproducibility and + traceability. In International Conference on High Performance Computing + 2019 Jun 16 (pp. 36-45). Springer, Cham.] Taufer has picked up this + work and has graduated a MS student working on this topic with a + published thesis. See also Jimenez's P-RECS workshop at HPDC for + additional work highly relevant to this paper. + +ANSWER: Thank you for the interesting paper by Lofstead+2019 on Data +pallets. We have cited it in Appendix B as an example of how generic the +concept of containers is. + +The topic of linking data to analysis is also a core result of the criteria +presented here, and is also discussed briefly in our paper. There are +indeed many very interesting works on this topic. But the format of CiSE is +very short (a maximum of 6250 words with 12 references), so we don't have +the space to go into this any further. But this is indeed a very +interesting aspect for follow-up studies, especially as usage of +Maneage grows, and we have more example workflows by users to study the +linkage of data analysis. + +------------------------------ + + + + + +54. [Reviewer 5] Some other systems that do similar things include: + reprozip, occam, whole tale, snakemake. + +ANSWER: All these tools have been reviewed in the newly added appendices. + +------------------------------ + + + + + +55. [Reviewer 5] the paper needs to include the context of the current + community development level to be a complete research paper. A revision + that includes evaluation of (using the criteria) and comparison with + the suggested systems and a related work section that seriously + evaluates the work of the recommended authors, among others, would make + this paper worthy for publication.
+ +ANSWER: A thorough review of current low-level tools and high-level +reproducible workflow management systems has been added in the extended +Appendices. + +------------------------------ + + + + + + +56. [Reviewer 5] Yet another example of a reproducible workflows project. + +ANSWER: As the newly added thorough comparisons with existing systems +show, this set of criteria and the proof-of-concept offer uniquely new +features. As another referee summarized: "This manuscript describes a new +reproducible workflow which doesn't require another new trendy high-level +software. The proposed workflow is only based on low-level tools already +widely known." + +The fact that we don't define yet another workflow language and framework +and instead base the whole workflow on time-tested solutions in a framework that +costs only ~100 kB to archive (in contrast to multi-GB containers or VMs) +is new. + +------------------------------ + + + + + +57. [Reviewer 5] There are numerous examples, mostly domain specific, and + this one is not the most advanced general solution. + +ANSWER: As the comparisons in the appendices and clarifications above show, +there are many features in the proposed criteria and proof of concept that +are new and not satisfied by the domain-specific solutions known to us. + +------------------------------ + + + + + +58. [Reviewer 5] Lack of context in the field missing very relevant work + that eliminates much, if not all, of the novelty of this work. + +ANSWER: The newly added appendices thoroughly describe the context and +previous work that has been done in this field.
+ +------------------------------ diff --git a/peer-review/1-review.txt b/peer-review/1-review.txt new file mode 100644 index 0000000..16e227b --- /dev/null +++ b/peer-review/1-review.txt @@ -0,0 +1,788 @@ +From: cise computer org +To: mohammad akhlaghi org, + infantesainz gmail com, + boud astro uni torun pl, + david valls-gabaud observatoiredeparis psl eu, + rbaena iac es +Received: Tue, 22 Sep 2020 15:28:21 -0400 +Subject: Computing in Science and Engineering, CiSESI-2020-06-0048 + major revision required + +-------------------------------------------------- + +Computing in Science and Engineering,CiSESI-2020-06-0048 +"Towards Long-term and Archivable Reproducibility" +manuscript type: Reproducible Research + +Dear Dr. Mohammad Akhlaghi, + +The manuscript that you submitted to Computing in Science and Engineering +has completed the review process. After carefully examining the manuscript +and reviews, we have decided that the manuscript needs major revisions +before it can be considered for a second review. + +Your revision is due before 22-Oct-2020. Please note that if your paper was +submitted to a special issue, this due date may be different. Contact the +peer review administrator, Ms. Jessica Ingle, at cise computer.org if you +have questions. + +The reviewer and editor comments are attached below for your +reference. Please maintain our 6,250–word limit as you make your revisions. + +To upload your revision and summary of changes, log on to +https://mc.manuscriptcentral.com/cise-cs, click on your Author Center, then +"Manuscripts with Decisions." Under "Actions," choose "Create a Revision" +next to the manuscript number. + +Highlight the changes to your manuscript by using the track changes mode in +MS Word, the latexdiff package if using LaTex, or by using bold or colored +text. + +When submitting your revised manuscript, you will need to respond to the +reviewer comments in the space provided. 
+ +If you have questions regarding our policies or procedures, please refer to +the magazines' Author Information page linked from the Instructions and +Forms (top right corner of the ScholarOne Manuscripts screen) or you can +contact me. + +We look forward to receiving your revised manuscript. + +Sincerely, +Dr. Lorena A. Barba +George Washington University +Mechanical and Aerospace Engineering +Editor-in-Chief, Computing in Science and Engineering + +-------------------------------------------------- + + + + + +EiC comments: +Some reviewers request additions, and overview of other tools, etc. In +doing your revision, please remember space limitations: 6,250 words +maximum, including all main body, abstract, keyword, bibliography (12 +references or less), and biography text. See "Write For Us" section of the +website: https://www.computer.org/csdl/magazine/cs + +Comments of the Associate Editor: Associate Editor +Comments to the Author: Thank to the authors for your submission to the +Reproducible Research department. + +Thanks to the reviewers for your careful and thoughtful reviews. We would +appreciate it if you can make your reports available and share the DOI as +soon as possible, per our original invitation e-mail. We will follow up our +original invitation to obtain your review DOI, if you have not already +included it in your review comments. + +Based on the review feedback, there are a number of major issues that +require attention and many minor ones as well. Please take these into +account as you prepare your major revision for another round of +review. (See the actual review reports for details.) + +1. In general, there are a number of presentation issues needing +attention. There are general concerns about the paper lacking focus. Some +terminology is not well-defined (e.g. longevity). In addition, the +discussion of tools could benefit from some categorization to characterize +their longevity. 
Background and related efforts need significant +improvement. (See below.) + +2. There is consistency among the reviews that related work is particularly +lacking and not taking into account major works that have been written on +this topic. See the reviews for details about work that could potentially +be included in the discussion and how the current work is positioned with +respect to this work. + +3. The current work needs to do a better job of explaining how it deals +with the nagging problem of running on CPU vs. different architectures. At +least one review commented on the need to include a discussion of +continuous integration (CI) and its potential to help identify problems +running on different architectures. Is CI employed in any way in the work +presented in this article? + +4. The presentation of the Maneage tool is both lacking in clarity and +consistency with the public information/documentation about the tool. While +our review focus is on the article, it is important that readers not be +confused when they visit your site to use your tools. + +5. A significant question raised by one review is how this work compares to +"executable" papers and Jupyter notebooks. Does this work embody +similar/same design principles or expand upon the established alternatives? +In any event, a discussion of this should be included in +background/motivation and related work to help readers understand the clear +need for a new approach, if this is being presented as new/novel. + +Reviews: + +Please note that some reviewers may have included additional comments in a +separate file. If a review contains the note "see the attached file" under +Section III A - Public Comments, you will need to log on to ScholarOne +Manuscripts to view the file. After logging in, select the Author Center, +click on the "Manuscripts with Decisions" queue and then click on the "view +decision letter" link for this manuscript. 
You must scroll down to the very +bottom of the letter to see the file(s), if any. This will open the file +that the reviewer(s) or the Associate Editor included for you along with +their review. + +-------------------------------------------------- + + + + + +Reviewer: 1 +Recommendation: Author Should Prepare A Major Revision For A Second Review + +Comments: + + * Adding an explicit list of contributions would make it easier to the + reader to appreciate these. + + * These are not mentioned/cited and are highly relevant to this paper (in + no particular order): + + * Git flows, both in general and in particular for research. + * Provenance work, in general and with git in particular + * Reprozip: https://www.reprozip.org/ + * OCCAM: https://occam.cs.pitt.edu/ + * Popper: http://getpopper.io/ + * Whole Tale: https://wholetale.org/ + * Snakemake: https://github.com/snakemake/snakemake + * CWL https://www.commonwl.org/ and WDL https://openwdl.org/ + * Nextflow: https://www.nextflow.io/ + * Sumatra: https://pythonhosted.org/Sumatra/ + * Podman: https://podman.io + * AppImage (https://appimage.org/), Flatpack + (https://flatpak.org/), Snap (https://snapcraft.io/) + * nbdev https://github.com/fastai/nbdev and jupytext + * Bazel: https://bazel.build/ + * Debian reproducible builds: https://wiki.debian.org/ReproducibleBuilds + + * Existing guidelines similar to the proposed "Criteria for + longevity". Many articles of these in the form "10 simple rules for + X", for example (not exhaustive list): + * https://doi.org/10.1371/journal.pcbi.1003285 + * https://arxiv.org/abs/1810.08055 + * https://osf.io/fsd7t/ + + * A model project for reproducible papers: https://arxiv.org/abs/1401.2000 + + * Executable/reproducible paper articles and original concepts + + * Several claims in the manuscript are not properly justified, neither in + the text nor via citation. 
Examples (not exhaustive list): + + * "it is possible to precisely identify the Docker “images” that are + imported with their checksums, but that is rarely practiced in most + solutions that we have surveyed [which ones?]" + + * "Other OSes [which ones?] have similar issues because pre-built + binary files are large and expensive to maintain and archive." + + * "Researchers using free software tools have also already had some + exposure to it" + + * "A popular framework typically falls out of fashion and requires + significant resources to translate or rewrite every few years." + + * As mentioned in the discussion by the authors, not even Bash, Git or + Make is reproducible, thus not even Maneage can address the longevity + requirements. One possible alternative is the use of CI to ensure that + papers are re-executable (several papers have been written on this + topic). Note that CI is well-established technology (e.g. Jenkins is + almost 10 years old). + +Additional Questions: + +1. How relevant is this manuscript to the readers of this periodical? + Please explain your rating in the Detailed Comments section.: Very + Relevant + +2. To what extent is this manuscript relevant to readers around the world?: + The manuscript is of interest to readers throughout the world + +1. Please summarize what you view as the key point(s) of the manuscript and + the importance of the content to the readers of this periodical.: This + article introduces desiderata for long-term archivable reproduciblity + and presents Maneage, a system whose goal is to achieve these outlined + properties. + +2. Is the manuscript technically sound? Please explain your answer in the + Detailed Comments section.: Partially + +3. What do you see as this manuscript's contribution to the literature in + this field?: Presentation of Maneage + +4. What do you see as the strongest aspect of this manuscript?: A great + summary of Maneage, as well as its implementaiton. + +5. 
What do you see as the weakest aspect of this manuscript?: Criterion has + been proposed previously. Maneage itself provides little novelty (see + comments below). + +1. Does the manuscript contain title, abstract, and/or keywords?: Yes + +2. Are the title, abstract, and keywords appropriate? Please elaborate in + the Detailed Comments section.: Yes + +3. Does the manuscript contain sufficient and appropriate references + (maximum 12-unless the article is a survey or tutorial in scope)? Please + elaborate in the Detailed Comments section.: Important references are + missing; more references are needed + +4. Does the introduction clearly state a valid thesis? Please explain your + answer in the Detailed Comments section.: Could be improved + +5. How would you rate the organization of the manuscript? Please elaborate + in the Detailed Comments section.: Satisfactory + +6. Is the manuscript focused? Please elaborate in the Detailed Comments + section.: Satisfactory + +7. Is the length of the manuscript appropriate for the topic? Please + elaborate in the Detailed Comments section.: Satisfactory + +8. Please rate and comment on the readability of this manuscript in the + Detailed Comments section.: Easy to read + +9. Please rate and comment on the timeliness and long term interest of this + manuscript to CiSE readers in the Detailed Comments section. Select all + that apply.: Topic and content are of limited interest to CiSE readers. + +Please rate the manuscript. Explain your choice in the Detailed Comments +section.: Good + +-------------------------------------------------- + + + + + +Reviewer: 2 +Recommendation: Accept If Certain Minor Revisions Are Made + +Comments: https://doi.org/10.22541/au.159724632.29528907 + +Operating System: Authors mention that Docker is usually used with an image +of Ubuntu without precision about the version used. 
And Even if users take +care about the version, the image is updated monthly thus the image used +will have different OS components based on the generation time. This +difference in OS components will interfere on the reproducibility. I agree +on that, but I would like to add that it is a wrong habit of users. It is +possible to generate reproducible Docker images by generating it from an +ISO image of the OS. These ISO images are archived, at least for Ubuntu +(http://old-releases.ubuntu.com/releases) and for Debian +(https://cdimage.debian.org/mirror/cdimage/archive) thus allow users to +generate an OS with identical components. Combined with the +snapshot.debian.org service, it is even possible to update a Debian release +to a specific time point up to 2005 and with a precision of six hours. With +combination of both ISO image and snapshot.debian.org service it is +possible to obtain an OS for Docker or for a VM with identical components +even if users have to use the PM of the OS. Authors should add indication +that using good practices it is possible to use Docker or VM to obtain +identical OS usable for reproducible research. + +CPU architecture: The CPU architecture of the platform used to run the +workflow is not discussed in the manuscript. During software integration in +Debian, I have seen several software failing their unit tests due to +different behavior from itself or from a library dependency. This not +expected behavior was only present on non-x86 architectures, mainly because +developers use a x86 machine for their developments and tests. Bug or +feature? I don’t know, but nowadays, it is quite frequent to see computers +with a non-x86 CPU. It would be annoying to fail the reproducibility step +because of a different in CPU architecture. Authors should probably take +into account the architecture used in their workflow or at least report it. + +POSIX dependency: I don’t understand the "no dependency beyond +POSIX". 
Authors should more explained what they mean by this sentence. I +completely agree that the dependency hell must be avoided and dependencies +should be used with parsimony. Unfortunately, sometime we need proprietary +or specialized software to read raw data. For example in genetics, +micro-array raw data are stored in binary proprietary formats. To convert +this data into a plain text format, we need the proprietary software +provided with the measurement tool. + +Maneage: I was not able to properly set up a project with Maneage. The +configuration step failed during the download of tools used in the +workflow. This is probably due to a firewall/antivirus restriction out of +my control. How frequent this failure happen to users? Moreover, the time +to configure a new project is quite long because everything needs to be +compiled. Authors should compare the time required to set up a project +Maneage versus time used by other workflows to give an indication to the +readers. + +Disclaimer: For the sake of transparency, it should be noted that I am +involved in the development of Debian, thus my comments are probably +oriented. + +Additional Questions: + +1. How relevant is this manuscript to the readers of this periodical? + Please explain your rating in the Detailed Comments section.: Relevant + +2. To what extent is this manuscript relevant to readers around the world?: + The manuscript is of interest to readers throughout the world + +1. Please summarize what you view as the key point(s) of the manuscript and + the importance of the content to the readers of this periodical.: The + authors describe briefly the history of solutions proposed by + researchers to generate reproducible workflows. Then, they report the + problems with the current tools used to tackle the reproducible + problem. They propose a set of criteria to develop new reproducible + workflows and finally they describe their proof of concept workflow + called "Maneage". 
This manuscript could help researchers to improve + their workflow to obtain reproducible results. + +2. Is the manuscript technically sound? Please explain your answer in the + Detailed Comments section.: Yes + +3. What do you see as this manuscript's contribution to the literature in + this field?: The authors try to propose a simple answer to the + reproducibility problem by defining new criteria. They also propose a + proof of concept workflow which can be directly used by researchers for + their projects. + +4. What do you see as the strongest aspect of this manuscript?: This + manuscript describes a new reproducible workflow which doesn't require + another new trendy high-level software. The proposed workflow is only + based on low-level tools already widely known. Moreover, the workflow + takes into account the version of all software used in the chain of + dependencies. + +5. What do you see as the weakest aspect of this manuscript?: Authors don't + discuss the problem of results reproducibility when analysis are + performed using CPU with different architectures. Some libraries have + different behaviors when they ran on different architectures and it + could influence final results. Authors are probably talking about x86, + but there is no reference at all in the manuscript. + +1. Does the manuscript contain title, abstract, and/or keywords?: Yes + +2. Are the title, abstract, and keywords appropriate? Please elaborate in + the Detailed Comments section.: Yes + +3. Does the manuscript contain sufficient and appropriate references + (maximum 12-unless the article is a survey or tutorial in scope)? Please + elaborate in the Detailed Comments section.: References are sufficient + and appropriate + +4. Does the introduction clearly state a valid thesis? Please explain your + answer in the Detailed Comments section.: Yes + +5. How would you rate the organization of the manuscript? Please elaborate + in the Detailed Comments section.: Satisfactory + +6. 
Is the manuscript focused? Please elaborate in the Detailed Comments + section.: Satisfactory + +7. Is the length of the manuscript appropriate for the topic? Please + elaborate in the Detailed Comments section.: Satisfactory + +8. Please rate and comment on the readability of this manuscript in the + Detailed Comments section.: Easy to read + +9. Please rate and comment on the timeliness and long term interest of this + manuscript to CiSE readers in the Detailed Comments section. Select all + that apply.: Topic and content are of immediate and continuing interest + to CiSE readers + +Please rate the manuscript. Explain your choice in the Detailed Comments +section.: Good + +-------------------------------------------------- + + + + + +Reviewer: 3 +Recommendation: Accept If Certain Minor Revisions Are Made + +Comments: Longevity of workflows in a project is one of the problems for +reproducibility in different fields of computational research. Therefore, a +proposal that seeks to guarantee this longevity becomes relevant for the +entire community, especially when it is based on free software and is easy +to access and implement. + +GOODMAN et al., 2016, BARBA, 2018 and PLESSER, 2018 observed in their +research that the terms reproducibility and replicability are frequently +found in the scientific literature and their use interchangeably ends up +generating confusion due to the authors' lack of clarity. Thus, authors +should define their use of the term briefly for their readers. + +The introduction is consistent with the proposal of the article, but deals +with the tools separately, many of which can be used together to minimize +some of the problems presented. The use of Ansible, Helm, among others, +also helps in minimizing problems. When the authors use the Python example, +I believe it is interesting to point out that today version 2 has been +discontinued by the maintaining community, which creates another problem +within the perspective of the article. 
Regarding the use of VM's and +containers, I believe that the discussion presented by THAIN et al., 2015 +is interesting to increase essential points of the current work. About the +Singularity, the description article was missing (Kurtzer GM, Sochat V, +Bauer MW, 2017). I also believe that a reference to FAIR is interesting +(WILKINSON et al., 2016). + +In my opinion, the paragraph on IPOL seems to be out of context with the +previous ones. This issue of end-to-end reproducibility of a publication +could be better explored, which would further enrich the tool presented. + +The presentation of the longevity criteria was adequate in the context of +the article and explored the points that were dealt with later. + +The presentation of the tool was consistent. On the project website, I +suggest that the information contained in README-hacking be presented on +the same page as the Tutorial. A topic breakdown is interesting, as the +markdown reading may be too long to find information. + +Additional Questions: + +1. How relevant is this manuscript to the readers of this periodical? + Please explain your rating in the Detailed Comments section.: Relevant + +2. To what extent is this manuscript relevant to readers around the world?: + The manuscript is of interest to readers throughout the world + +1. Please summarize what you view as the key point(s) of the manuscript and + the importance of the content to the readers of this periodical.: In + this article, the authors discuss the problem of the longevity of + computational workflows, presenting what they consider to be criteria + for longevity and an implementation based on these criteria, called + Maneage, seeking to ensure a long lifespan for analysis projects. + +2. Is the manuscript technically sound? Please explain your answer in the + Detailed Comments section.: Yes + +3. 
What do you see as this manuscript's contribution to the literature in + this field?: In this article, the authors discuss the problem of the + longevity of computational workflows, presenting what they consider to + be criteria for longevity and an implementation based on these criteria, + called Maneage, seeking to ensure a long lifespan for analysis projects. + + As a key point, the authors enumerate quite clear criteria that can + guarantee the longevity of projects and present a free software-based + way of achieving this objective. The method presented by the authors is + not easy to implement for many end users, with low computer knowledge, + but it can be easily implemented by users with average knowledge in the + area. + +4. What do you see as the strongest aspect of this manuscript?: One of the + strengths of the manuscript is the implementation of Maneage entirely in + free software and the search for completeness presented in the + manuscript. The use of GNU software adds the guarantee of long + maintenance by one of the largest existing software communities. In + addition, the tool developed has already been tested in different + publications, showing itself consistent in different scenarios. + +5. What do you see as the weakest aspect of this manuscript?: For the + proper functioning of the proposed tool, the user needs prior knowledge + of LaTeX, GIT and the command line, which can keep inexperienced users + away. Likewise, the tool is suitable for Unix users, keeping users away + from Microsoft environments. + + Even though Unix-like environments are the majority in the areas of + scientific computing, many users still perform their analysis in + different areas on Windows computers or servers, with the assistance of + package managers. + +1. Does the manuscript contain title, abstract, and/or keywords?: Yes + +2. Are the title, abstract, and keywords appropriate? Please elaborate in + the Detailed Comments section.: Yes + +3. 
Does the manuscript contain sufficient and appropriate references + (maximum 12-unless the article is a survey or tutorial in scope)? Please + elaborate in the Detailed Comments section.: Important references are + missing; more references are needed + +4. Does the introduction clearly state a valid thesis? Please explain your + answer in the Detailed Comments section.: Could be improved + +5. How would you rate the organization of the manuscript? Please elaborate + in the Detailed Comments section.: Satisfactory + +6. Is the manuscript focused? Please elaborate in the Detailed Comments + section.: Could be improved + +7. Is the length of the manuscript appropriate for the topic? Please + elaborate in the Detailed Comments section.: Satisfactory + +8. Please rate and comment on the readability of this manuscript in the + Detailed Comments section.: Easy to read + +9. Please rate and comment on the timeliness and long term interest of this + manuscript to CiSE readers in the Detailed Comments section. Select all + that apply.: Topic and content are of immediate and continuing interest + to CiSE readers + +Please rate the manuscript. Explain your choice in the Detailed Comments +section.: Excellent + +-------------------------------------------------- + + + + + +Reviewer: 4 +Recommendation: Author Should Prepare A Major Revision For A Second Review + +Comments: Overall evaluation - Good. + +This paper is in scope, and the topic is of interest to the readers of +CiSE. However in its present form, I have concerns about whether the paper +presents enough new contributions to the area in a way that can then be +understood and reused by others. The main things I believe need addressing +are: 1) Revisit the criteria, show how you have come to decide on them, +give some examples of why they are important, and address potential missing +criteria. 2) Clarify the discussion of challenges to adoption and make it +clearer which tradeoffs are important to practitioners. 
3) Be clearer about +which sorts of research workflow are best suited to this approach. + +B2. Technical soundness: here I am discussing the soundness of the paper, +rather than the soundness of the Maneage tool. There are some fundamental +additional challenges to reproducibility that are not addressed. Although +software library versions are addressed, there is also the challenge of +mathematical reproducibility, particularly of the handling of floating +point numbers, which might occur because of the way the code is written, and +the hardware architecture (including if code is optimised / +parallelised). This could obviously be addressed through a criterion around +how code is written, but this will also come with a tradeoff against +performance, which is never mentioned. Another tradeoff, which might affect +Criterion 3 is time to result - people use popular frameworks because it is +easier to use them. Regarding the discussion, I would have liked to see an +explanation of how these challenges to adoption were identified: was this +anecdotal, through surveys, participant observation? As a side note around +the technical aspects of Maneage - it is using LaTeX which in turn is built +on TeX which in turn has had many portability problems in the past due to +being written using WEB / Tangle, though with web2c this is largely now +resolved - potentially an interesting sidebar to investigate how LaTeX/TeX +has ensured its longevity! + +C2. The title is not specific enough - it should refer to the +reproducibility of workflows/projects. + +C4. As noted above, whilst the thesis stated is valid, it may not be useful +to practitioners of computational science and engineering as it stands. + +C6. Manuscript focus. I would have liked a more focussed approach to the +presentation of information in II. Longevity is not defined, and whilst +various tools are discussed and discarded, no attempt is made to categorise +the magnitude of longevity for which they are relevant. 
For instance, +environment isolators are regarded by the software preservation community +as adequate for timescales of the order of years, but may not be suitable +for the timescale of decades where porting and emulation are used. The +title of this section "Commonly used tools and their longevity" is also +confusing - do you mean the longevity of the tools or the longevity of the +workflows that can be produced using these tools? What happens if you use a +combination of all four categories of tools? + +C8. Readability. I found it difficult to follow the description of how +Maneage works. It wasn't clear to me if code was being run to generate the +results and figures in a LaTeX paper that is part of a project in +Maneage. It appears to be suggested this is the case, but Figure 1 doesn't +show how this works - it just has the LaTeX files, the data files and the +Makefiles. Is it being suggested that LaTeX itself is the programming +language, using its macro functionality? I was a bit confused on how +collaboration is handled as well - this appears to be using the Git +branching model, and the suggestion that Maneage is keeping track of all +components from all projects - but what happens if you are working with +collaborators that are using their own Maneage instance? + +I would also have liked to see a comparison between this approach and +other "executable" paper approaches e.g. Jupyter notebooks, compared on +completeness, time taken to write a "paper", ease of depositing in a +repository, and ease of use by another researcher. + +Additional Questions: + +1. How relevant is this manuscript to the readers of this periodical? + Please explain your rating in the Detailed Comments section.: Relevant + +2. To what extent is this manuscript relevant to readers around the world?: + The manuscript is of interest to readers throughout the world + +1. 
Please summarize what you view as the key point(s) of the manuscript and + the importance of the content to the readers of this periodical.: This + manuscript discusses the challenges of reproducibility of computational + research workflows, suggests criteria for improving the "longevity" of + workflows, describes the proof-of-concept tool, Maneage, that has been + built to implement these criteria, and discusses the challenges to + adoption. + + Of primary importance is the discussion of the challenges to adoption, + as CiSE is about computational science which does not take place in a + theoretical vacuum. Many of the identified challenges relate to the + practice of computational science and the implementation of systems in + the real world. + +2. Is the manuscript technically sound? Please explain your answer in the + Detailed Comments section.: Partially + +3. What do you see as this manuscript's contribution to the literature in + this field?: The manuscript makes a modest contribution to the + literature through the description of the proof-of-concept, in + particular its approach to integrating asset management, version control + and build and the discussion of challenges to adoption. + + The proposed criteria have mostly been discussed at length in many other + works looking at computational reproducibility and executable papers. + +4. What do you see as the strongest aspect of this manuscript?: The + strongest aspect is the discussion of difficulties for widespread + adoption of this sort of approach. Because the proof-of-concept tool + received support through the RDA, it was possible to get feedback from + researchers who were likely to use it. This has highlighted and + reinforced a number of challenges and caveats. + +5. What do you see as the weakest aspect of this manuscript?: The weakest + aspect is the assumption that research can be easily compartmentalized + into simple and complete packages. 
Given that so much of research + involves collaboration and interaction, this is not sufficiently + addressed. In particular, the challenge of interdisciplinary work, where + there may not be common languages to describe concepts and there may be + different common workflow practices will be a barrier to wider adoption + of the primary thesis and criteria. + +1. Does the manuscript contain title, abstract, and/or keywords?: Yes + +2. Are the title, abstract, and keywords appropriate? Please elaborate in + the Detailed Comments section.: No + +3. Does the manuscript contain sufficient and appropriate references + (maximum 12-unless the article is a survey or tutorial in scope)? Please + elaborate in the Detailed Comments section.: References are sufficient + and appropriate + +4. Does the introduction clearly state a valid thesis? Please explain your + answer in the Detailed Comments section.: Could be improved + +5. How would you rate the organization of the manuscript? Please elaborate + in the Detailed Comments section.: Satisfactory + +6. Is the manuscript focused? Please elaborate in the Detailed Comments + section.: Could be improved + +7. Is the length of the manuscript appropriate for the topic? Please + elaborate in the Detailed Comments section.: Satisfactory + +8. Please rate and comment on the readability of this manuscript in the + Detailed Comments section.: Readable - but requires some effort to + understand + +9. Please rate and comment on the timeliness and long term interest of this + manuscript to CiSE readers in the Detailed Comments section. Select all + that apply.: Topic and content are of immediate and continuing interest + to CiSE readers + +Please rate the manuscript. 
Explain your choice in the Detailed Comments +section.: Good + +-------------------------------------------------- + + + + + +Reviewer: 5 +Recommendation: Author Should Prepare A Major Revision For A Second Review + +Comments: + +Major figures currently working in this exact field do not have their work +acknowledged in this work. In no particular order: Victoria Stodden, +Michael Heroux, Michela Taufer, and Ivo Jimenez. All of these authors have +multiple publications that are highly relevant to this paper. In the case +of Ivo Jimenez, his Popper work [Jimenez I, Sevilla M, Watkins N, Maltzahn +C, Lofstead J, Mohror K, Arpaci-Dusseau A, Arpaci-Dusseau R. The popper +convention: Making reproducible systems evaluation practical. In 2017 IEEE +International Parallel and Distributed Processing Symposium Workshops +(IPDPSW) 2017 May 29 (pp. 1561-1570). IEEE.] and the later revision that +uses GitHub Actions, is largely the same as this work. The lack of +attention to virtual machines and containers is highly problematic. While a +reader cannot rely on DockerHub or a generic OS version label for a VM or +container, these are some of the most promising tools for offering true +reproducibility. On the data side, containers have the promise to manage +data sets and workflows completely [Lofstead J, Baker J, Younge A. Data +pallets: containerizing storage for reproducibility and +traceability. In International Conference on High Performance Computing 2019 +Jun 16 (pp. 36-45). Springer, Cham.] Taufer has picked up this work and has +graduated an MS student working on this topic with a published thesis. See +also Jimenez's P-RECS workshop at HPDC for additional work highly relevant +to this paper. + +Some other systems that do similar things include: reprozip, occam, whole +tale, snakemake. + +While the work here is a good start, the paper needs to include the context +of the current community development level to be a complete research +paper. 
A revision that includes evaluation of (using the criteria) and +comparison with the suggested systems and a related work section that +seriously evaluates the work of the recommended authors, among others, +would make this paper worthy for publication. + +Additional Questions: + +1. How relevant is this manuscript to the readers of this periodical? + Please explain your rating in the Detailed Comments section.: Very + Relevant + +2. To what extent is this manuscript relevant to readers around the world?: + The manuscript is of interest to readers throughout the world + +1. Please summarize what you view as the key point(s) of the manuscript and + the importance of the content to the readers of this periodical.: This + paper describes the Maneage system for reproducible workflows. It lays + out a bit of the need, has very limited related work, and offers + criteria any system that offers reproducibility should have, and finally + describes how Maneage achieves these goals. + +2. Is the manuscript technically sound? Please explain your answer in the + Detailed Comments section.: Partially + +3. What do you see as this manuscript's contribution to the literature in + this field?: Yet another example of a reproducible workflows + project. There are numerous examples, mostly domain specific, and this + one is not the most advanced general solution. + +4. What do you see as the strongest aspect of this manuscript?: Working + code and published artifacts + +5. What do you see as the weakest aspect of this manuscript?: Lack of + context in the field missing very relevant work that eliminates much, if + not all, of the novelty of this work. + +1. Does the manuscript contain title, abstract, and/or keywords?: Yes + +2. Are the title, abstract, and keywords appropriate? Please elaborate in + the Detailed Comments section.: Yes + +3. Does the manuscript contain sufficient and appropriate references + (maximum 12-unless the article is a survey or tutorial in scope)? 
Please + elaborate in the Detailed Comments section.: Important references are + missing; more references are needed + +4. Does the introduction clearly state a valid thesis? Please explain your + answer in the Detailed Comments section.: Could be improved + +5. How would you rate the organization of the manuscript? Please elaborate + in the Detailed Comments section.: Satisfactory + +6. Is the manuscript focused? Please elaborate in the Detailed Comments + section.: Could be improved + +7. Is the length of the manuscript appropriate for the topic? Please + elaborate in the Detailed Comments section.: Could be improved + +8. Please rate and comment on the readability of this manuscript in the + Detailed Comments section.: Easy to read + +9. Please rate and comment on the timeliness and long term interest of this + manuscript to CiSE readers in the Detailed Comments section. Select all + that apply.: Topic and content are likely to be of growing interest to + CiSE readers over the next 12 months + +Please rate the manuscript. Explain your choice in the Detailed Comments +section.: Fair