-rw-r--r-- | paper.tex | 32
-rw-r--r-- | peer-review/1-answer.txt | 102
-rwxr-xr-x | project | 4
-rw-r--r-- | reproduce/analysis/config/metadata.conf | 2
4 files changed, 88 insertions, 52 deletions
@@ -79,7 +79,7 @@ at the end (Appendices \ref{appendix:existingtools} and \ref{appendix:existingso
 \emph{Reproducible supplement} --- All products in \href{https://doi.org/10.5281/zenodo.\projectzenodoid}{\texttt{zenodo.\projectzenodoid}},
 Git history of source at \href{https://gitlab.com/makhlaghi/maneage-paper}{\texttt{gitlab.com/makhlaghi/maneage-paper}},
-which is also archived on \href{https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://gitlab.com/makhlaghi/maneage-paper.git}{SoftwareHeritage}.
+which is also archived in \href{https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://gitlab.com/makhlaghi/maneage-paper.git}{SoftwareHeritage}.
 \end{abstract}

 % Note that keywords are not normally used for peer-review papers.
@@ -126,9 +126,10 @@ Decades later, scientists are still held accountable for their results and there
 \section{Longevity of existing tools}
 \label{sec:longevityofexisting}
 \new{Reproducibility is defined as ``obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis'' \cite{fineberg19}.
-Longevity is defined as the length of time during which a project remains usable.
-Usability is defined by context: for machines (machine-actionable, or executable files) \emph{and} humans (readability of the source).
-Many usage contexts do not involve execution: for example, checking the configuration parameter of a single step of the analysis to re-\emph{use} in another project, or checking the version of used software, or the source of the input data (extracting these from the outputs of execution is not always possible).}
+Longevity is defined as the length of time that a project remains \emph{usable}.
+Usability is defined by context: for machines (machine-actionable, or executable files) \emph{and/or} humans (readability of the source).
+Many usage contexts do not involve execution: for example, checking the configuration parameter of a single step of the analysis to re-\emph{use} in another project, or checking the version of used software, or the source of the input data.
+Extracting these from the outputs of execution is not always possible.}
 Longevity is as important in science as in some fields of industry, but not all; e.g., fast-evolving tools can be appropriate in short-term commercial projects.
 To highlight the necessity, a short review of commonly-used tools is provided below:

@@ -390,7 +391,7 @@ Other built files (intermediate analysis steps) cascade down in the lineage to o
 Just before reaching the ultimate target (\inlinecode{paper.pdf}), the lineage reaches a bottleneck in \inlinecode{verify.mk} to satisfy the verification criteria (this step was not yet available in \cite{akhlaghi19, infante20}).
 All project deliverables (macro files, plot or table data and other datasets) are verified at this stage, with their checksums, to automatically ensure exact reproducibility.
-Where exact reproducibility is not possible, values can be verified by any statistical means, specified by the project authors.
+Where exact reproducibility is not possible \new{(for example due to parallelization)}, values can be verified by any statistical means, specified by the project authors.

 \begin{figure*}[t]
 \begin{center} \includetikz{figure-branching}{scale=1}\end{center}
@@ -498,13 +499,14 @@ However, because the PM and analysis components share the same job manager (Make
 They later share their low-level commits on the core branch, thus propagating it to all derived projects.

 A related caveat is that, POSIX is a fuzzy standard, not guaranteeing the bit-wise reproducibility of programs.
-It has been chosen here, however, as the underlying platform because our focus is on reproducing the results (data), which does not necessarily need bit-wise reproducible software.
-POSIX is ubiquitous and low-level software (e.g., core GNU tools) are install-able on most; each internally corrects for differences affecting its functionality (partly as part of the GNU portability library).
+It has been chosen here, however, as the underlying platform \new{because our focus is on reproducing the results (output of software), not the software itself.}
+POSIX is ubiquitous and low-level software (e.g., core GNU tools) are install-able on most.
+Well written software internally corrects for differences in OS or hardware that may affect its functionality (through tools like the GNU portability library).
 On GNU/Linux hosts, Maneage builds precise versions of the compilation tool chain.
-However, glibc is not install-able on some POSIX OSs (e.g., macOS).
-All programs link with the C library, and this may hypothetically hinder the exact reproducibility \emph{of results} on non-GNU/Linux systems, but we have not encountered this in our research so far.
-With everything else under precise control, the effect of differing Kernel and C libraries on high-level science can now be systematically studied with Maneage in follow-up research.
-\new{Using continuous integration (CI) is one way to precisely identify breaking points with updated technologies on available systems.}
+However, glibc is not install-able on some POSIX OSs (e.g., macOS) and all programs link with the C library.
+This may hypothetically hinder the exact reproducibility \emph{of results} on non-GNU/Linux systems, but we have not encountered this in our research so far.
+With everything else under precise control in Maneage, the effect of differing Kernel and C libraries on high-level science can now be systematically studied in follow-up research \new{(including floating-point arithmetic or optimization differences).
+Using continuous integration (CI) is one way to precisely identify breaking points on multiple systems.}

 % DVG: It is a pity that the following paragraph cannot be included, as it is really important but perhaps goes beyond the intended goal.
 %Thirdly, publishing a project's reproducible data lineage immediately after publication enables others to continue with follow-up papers, which may provide unwanted competition against the original authors.
@@ -528,7 +530,7 @@ From the data repository perspective, these criteria can also be useful, e.g., t
 (2) Automated and persistent bidirectional linking of data and publication can be established through the published \emph{and complete} data lineage that is under version control.
 (3) Software management: with these criteria, each project comes with its unique and complete software management.
 It does not use a third-party PM that needs to be maintained by the data center (and the many versions of the PM), hence enabling robust software management, preservation, publishing, and citation.
-For example, see \href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}, \href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}, \href{https://doi.org/10.5281/zenodo.1163746}{zenodo.1163746}, where we have exploited the free-software criterion to distribute the tarballs of all the software used with each project's source as deliverables.
+For example, see \href{https://doi.org/10.5281/zenodo.1163746}{zenodo.1163746}, \href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}, \href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}, \href{https://doi.org/10.5281/zenodo.3951151}{zenodo.3951151} or \href{https://doi.org/10.5281/zenodo.4062460}{zenodo.4062460} where we have exploited the free-software criterion to distribute the source code of all software used in each project as deliverables.
 (4) ``Linkages between documentation, code, data, and journal articles in an integrated environment'', which effectively summarizes the whole purpose of these criteria.

@@ -1098,7 +1100,7 @@ In summary IDEs are generally very specialized tools, for special projects and a
 \label{appendix:jupyter}
 Jupyter (initially IPython) \citeappendix{kluyver16} is an implementation of Literate Programming \citeappendix{knuth84}.
 The main user interface is a web-based ``notebook'' that contains blobs of executable code and narrative.
-Jupyter uses the custom built \inlinecode{.ipynb} format\footnote{\url{https://nbformat.readthedocs.io/en/latest}}.
+Jupyter uses the custom built \inlinecode{.ipynb} format\footnote{\inlinecode{\url{https://nbformat.readthedocs.io/en/latest}}}.
 Jupyter's name is a combination of the three main languages it was designed for: Julia, Python and R.
 The \inlinecode{.ipynb} format, is a simple, human-readable (can be opened in a plain-text editor) file, formatted in JavaScript Object Notation (JSON).
 It contains various kinds of ``cells'', or blobs, that can contain narrative description, code, or multi-media visualizations (for example images/plots), that are all stored in one file.
@@ -1110,7 +1112,7 @@ Defining dependencies between the cells can allow non-linear execution which is
 It allows automation, run-time optimization (deciding not to run a cell if its not necessary) and parallelization.
 However, Jupyter currently only supports a linear run of the cells: always from the start to the end.
 It is possible to manually execute only one cell, but the previous/next cells that may depend on it, also have to be manually run (a common source of human error, and frustration for complex operations).
-Integration of directional graph features (dependencies between the cells) into Jupyter has been discussed, but as of this publication, there is no plan to implement it (see Jupyter's GitHub issue 1175\footnote{\url{https://github.com/jupyter/notebook/issues/1175}}).
+Integration of directional graph features (dependencies between the cells) into Jupyter has been discussed, but as of this publication, there is no plan to implement it (see Jupyter's GitHub issue 1175\footnote{\inlinecode{\url{https://github.com/jupyter/notebook/issues/1175}}}).
 The fact that the \inlinecode{.ipynb} format stores narrative text, code and multi-media visualization of the outputs in one file, is another major hurdle: The files can easy become very large (in volume/bytes) and hard to read.
@@ -1302,7 +1304,7 @@ Taverna is only a workflow manager and isn't integrated with a package manager,
 \label{appendix:madagascar}
 Madagascar\footnote{\inlinecode{\url{http://ahay.org}}} \citeappendix{fomel13} is a set of extensions to the SCons job management tool (reviewed in \ref{appendix:scons}).
 Madagascar is a continuation of the Reproducible Electronic Documents (RED) project that was discussed in Appendix \ref{appendix:red}.
-Madagascar has been used in the production of hundreds of research papers or book chapters\footnote{\url{http://www.ahay.org/wiki/Reproducible_Documents}}, 120 prior to \citeappendix{fomel13}.
+Madagascar has been used in the production of hundreds of research papers or book chapters\footnote{\inlinecode{\url{http://www.ahay.org/wiki/Reproducible_Documents}}}, 120 prior to \citeappendix{fomel13}.
 Madagascar does include project management tools in the form of SCons extensions.
 However, it isn't just a reproducible project management tool.
diff --git a/peer-review/1-answer.txt b/peer-review/1-answer.txt
index 5e612f8..e0b0da1 100644
--- a/peer-review/1-answer.txt
+++ b/peer-review/1-answer.txt
@@ -32,9 +32,10 @@ reader can easily access them.

 2.
[Associate Editor] There are general concerns about the paper lacking focus

-###########################
-ANSWER:
-###########################
+ANSWER: With all the corrections/clarifications that have been done in this
+review the focus of the paper should be clear now. We are very grateful to
+the thorough listing of points by the referees.
+

 ------------------------------

@@ -45,9 +46,10 @@ ANSWER:

 3. [Associate Editor] Some terminology is not well-defined (e.g. longevity).

-ANSWER: Longevity has now been defined in the first paragraph of Section
-II. With this definition, the main argument of the paper is clearer,
-thank you (and thank you to the referees for highlighting this).
+ANSWER: Reproducibility, Longevity and Usage have now been explicitly
+defined in the first paragraph of Section II. With this definition, the
+main argument of the paper is clearer, thank you (and thank you to the
+referees for highlighting this).

 ------------------------------

@@ -132,10 +134,13 @@ future.
 is on the article, it is important that readers not be confused when they
 visit your site to use your tools.

-###########################
-ANSWER [NOT COMPLETE]: We should separate the various sections of the
-README-hacking.md webpage into smaller pages that can be entered.
-###########################
+ANSWER: Thank you for raising this important point. We have broken down the
+very long "About" page into multiple pages to help in readability:
+
+https://maneage.org/about.html
+
+Generally, the webpage will soon undergo major improvements to be even more
+clear.

 ------------------------------

@@ -597,9 +602,9 @@ level of peer-review control.
 Tutorial. A topic breakdown is interesting, as the markdown reading
 may be too long to find information.

-#####################################
-ANSWER:
-#####################################
+ANSWER: Thank you very much for this good suggestion, it has been
+implemented: https://maneage.org/about.html .
The webpage will continuously
+be improved and such feedback is always very welcome.

 ------------------------------

@@ -691,9 +696,12 @@ highly modular and flexible nature of Makefiles run via 'Make'.
 which might occur because of the way the code is written, and the
 hardware architecture (including if code is optimised / parallelised).

-################################
-ANSWER:
-################################
+ANSWER: Floating point errors and optimizations have been mentioned in the
+discussion (Section V). The issue with parallelization has also been
+discussed in Section IV, in the part on verification ("Where exact
+reproducibility is not possible (for example due to paralleliza- tion),
+values can be verified by any statistical means, specified by the project
+authors.").

 ------------------------------

@@ -703,18 +711,32 @@ ANSWER:

 37. [Reviewer 4] Performance ... is never mentioned

-################################
-ANSWER:
-################################
+ANSWER: Performance is indeed an important issue for _immediate_
+reproducibility and we would have liked to discuss it. But due to the
+strict word-count, we feel that adding it to the discussion points, without
+having adequate space to elaborate, can confuse the readers of this paper
+(which is focused on long term usability).

 ------------------------------

+
+
+
+
 38. [Reviewer 4] Tradeoff, which might affect Criterion 3 is time to result,
 people use popular frameworks because it is easier to use them.

-################################
-ANSWER:
-################################
+ANSWER: That is true. In section IV, we have given the time it takes to
+build Maneage (only once for a project on each computer) to be around 1.5
+hours on an 8-core CPU (a typical machine that may be used for data
+analysis). We therefore conclude that when the analysis is complex (and
+thus taking many hours or days to complete), this time is negligible.
+
+But if the project's full analysis takes 10 minutes or less (like the
+extremely simple analysis done in this paper which takes a fraction of a
+second). Indeed, the 1.5 hour building time is significant. In those cases,
+as discussed in the main body, the project can be built once in a Docker
+image and easily moved to other computers.

 ------------------------------

@@ -747,9 +769,13 @@ there.
 40. [Reviewer 4] Potentially an interesting sidebar to investigate how
 LaTeX/TeX has ensured its longevity!

-##############################
-ANSWER:
-##############################
+ANSWER: That is indeed a very interesting subject to study. We have been in
+touch with Karl Berry (one of the core people behind TeX Live, who also
+plays a prominent role in GNU) and have whitnessed the TeX Live community's
+efforts to become more and more portable and longer-lived. But after
+looking at the strict word limit, we couldn't find a place to highlight
+this. But it is indeed a subject worthy of a full paper (that can be very
+useful for many software projects0..

 ------------------------------

@@ -760,9 +786,11 @@ ANSWER:
 41. [Reviewer 4] The title is not specific enough - it should refer to the
 reproducibility of workflows/projects.

-##############################
-ANSWER:
-##############################
+ANSWER: Since this journal is focused on "Computing in Science and
+Engineering", the fact that it relates to computational workflows will be
+clear to any reader. Since the other referees didn't complain about this,
+we will keep it as it was, but of course, we are open to the suggestions of
+the editors in the final title.

 ------------------------------

@@ -820,9 +848,16 @@ determined by the host kernel, usually a decade", for Python packages:
 longevity of the workflows that can be produced using these tools? What
 happens if you use a combination of all four categories of tools?
-##########################
-ANSWER:
-##########################
+ANSWER: Thank you for highlighting this. The title has been shortend and
+the section immediately starts with definitions.
+
+The aspects of the tools discussed in this section are orthogonal to each
+other. For example a VM/container, package manager, notebook: some projects
+may have any different combinations of the three. In some aspects using
+them together can improve the operations, but for example building a
+VM/container with or without a package manager makes no difference on the
+main issue we raise about containers (that they are large binary blobs that
+don't necessarily contain how the environment within them was built).

 ------------------------------

@@ -888,9 +923,8 @@ project. This can be generalized to any Git based collaboration model.
 ease of depositing in a repository, and ease of use by another
 researcher.

-#######################
-ANSWER:
-#######################
+ANSWER: These have been highlighted in various parts of the text (also
+reviewed in previous points).

 ------------------------------

@@ -505,9 +505,9 @@ case $operation in
         if [ -f paper.pdf ]; then
             if type pdftotext > /dev/null 2>/dev/null; then
                 numwords=$(pdftotext paper.pdf && cat paper.txt | wc -w)
-                numeff=$(echo $numwords | awk '{print $1-850}')
+                numeff=$(echo $numwords | awk '{print $1-850+500}')
                 echo; echo "Number of words in full PDF: $numwords"
-                echo "No abstract, and figure captions: $numeff"
+                echo "No abstract, and captions (250 for each figure): $numeff"
                 rm paper.txt
             fi
         fi
diff --git a/reproduce/analysis/config/metadata.conf b/reproduce/analysis/config/metadata.conf
index 07a1145..cdf0e5a 100644
--- a/reproduce/analysis/config/metadata.conf
+++ b/reproduce/analysis/config/metadata.conf
@@ -14,7 +14,7 @@ metadata-title = Towards Long-term and Archivable Reproducibility

 # DOIs and identifiers.
 metadata-arxiv = 2006.03018
-metadata-doi-zenodo = https://doi.org/10.5281/zenodo.3911395
+metadata-doi-zenodo = https://doi.org/10.5281/zenodo.4291207
 metadata-doi-journal =
 metadata-doi = $(metadata-doi-zenodo)
 metadata-git-repository = https://gitlab.com/makhlaghi/maneage-paper
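
The word-count change in the `project` hunk (from `$1-850` to `$1-850+500`) can be checked in isolation with a small sketch. The function name `effective_words` is hypothetical and not part of the `project` script; the constants 850 and 500 are copied from the hunk, and reading 500 as two figures at 250 words each is only an inference from the new `echo` message.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the 'project' script: it only reproduces
# the arithmetic of the changed 'numeff' line for a given raw word count.
effective_words() {
    # $1: total number of words extracted from the PDF.
    # 850 and 500 are the constants that appear in the hunk; interpreting
    # 500 as 2 figures x 250 words is an assumption based on the message.
    echo "$1" | awk '{print $1-850+500}'
}

# Example: a PDF with 6000 words in total.
effective_words 6000   # prints 5650
```

Keeping the arithmetic in a function like this would make the constants easy to adjust if the number of figures changes, but the commit itself edits the inline `awk` expression directly.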