-rw-r--r--   paper.tex                                  32
-rw-r--r--   peer-review/1-answer.txt                  154
-rwxr-xr-x   project                                     4
-rw-r--r--   reproduce/analysis/config/metadata.conf     2
4 files changed, 105 insertions, 87 deletions
diff --git a/paper.tex b/paper.tex
index e19a7df..b7d4d25 100644
--- a/paper.tex
+++ b/paper.tex
@@ -79,7 +79,7 @@ at the end (Appendices \ref{appendix:existingtools} and \ref{appendix:existingso
\emph{Reproducible supplement} ---
All products in \href{https://doi.org/10.5281/zenodo.\projectzenodoid}{\texttt{zenodo.\projectzenodoid}},
Git history of source at \href{https://gitlab.com/makhlaghi/maneage-paper}{\texttt{gitlab.com/makhlaghi/maneage-paper}},
- which is also archived at \href{https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://gitlab.com/makhlaghi/maneage-paper.git}{SoftwareHeritage}.
+ which is also archived in \href{https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://gitlab.com/makhlaghi/maneage-paper.git}{SoftwareHeritage}.
\end{abstract}
% Note that keywords are not normally used for peer-review papers.
@@ -126,9 +126,10 @@ Decades later, scientists are still held accountable for their results and there
\section{Longevity of existing tools}
\label{sec:longevityofexisting}
\new{Reproducibility is defined as ``obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis'' \cite{fineberg19}.
-Longevity is defined as the length of time during which a project remains usable.
-Usability is defined by context: for machines (machine-actionable, or executable files) \emph{and} humans (readability of the source).
-Many usage contexts do not involve execution: for example, checking the configuration parameter of a single step of the analysis to re-\emph{use} in another project, or checking the version of used software, or the source of the input data (extracting these from the outputs of execution is not always possible).}
+Longevity is defined as the length of time that a project remains \emph{usable}.
+Usability is defined by context: for machines (machine-actionable, or executable files) \emph{and/or} humans (readability of the source).
+Many usage contexts do not involve execution: for example, checking the configuration parameter of a single step of the analysis to re-\emph{use} in another project, or checking the version of used software, or the source of the input data.
+Extracting these from the outputs of execution is not always possible.}
Longevity is as important in science as in some fields of industry, but not all; e.g., fast-evolving tools can be appropriate in short-term commercial projects.
To highlight the necessity, a short review of commonly-used tools is provided below:
@@ -390,7 +391,7 @@ Other built files (intermediate analysis steps) cascade down in the lineage to o
Just before reaching the ultimate target (\inlinecode{paper.pdf}), the lineage reaches a bottleneck in \inlinecode{verify.mk} to satisfy the verification criteria (this step was not yet available in \cite{akhlaghi19, infante20}).
All project deliverables (macro files, plot or table data and other datasets) are verified at this stage, with their checksums, to automatically ensure exact reproducibility.
-Where exact reproducibility is not possible, values can be verified by any statistical means, specified by the project authors.
+Where exact reproducibility is not possible \new{(for example due to parallelization)}, values can be verified by any statistical means, specified by the project authors.
\begin{figure*}[t]
\begin{center} \includetikz{figure-branching}{scale=1}\end{center}
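
The verification described above amounts to comparing each deliverable
against a recorded checksum. A minimal, hypothetical shell sketch of the
idea (not the actual verify.mk rule; the file names are made up):

    # Hypothetical checksum verification of one deliverable;
    # "out/table3.txt" and "table3.sha256" are made-up names.
    actual=$(sha256sum out/table3.txt | awk '{print $1}')
    expected=$(cat table3.sha256)
    if [ "$actual" = "$expected" ]; then
        echo "table3.txt: verified"
    else
        echo "table3.txt: checksum mismatch" >&2
        exit 1
    fi
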
@@ -498,13 +499,14 @@ However, because the PM and analysis components share the same job manager (Make
They later share their low-level commits on the core branch, thus propagating them to all derived projects.
A related caveat is that POSIX is a fuzzy standard, not guaranteeing the bit-wise reproducibility of programs.
-It has been chosen here, however, as the underlying platform because our focus is on reproducing the results (data), which does not necessarily need bit-wise reproducible software.
-POSIX is ubiquitous and low-level software (e.g., core GNU tools) are install-able on most; each internally corrects for differences affecting its functionality (partly as part of the GNU portability library).
+It has been chosen here, however, as the underlying platform \new{because our focus is on reproducing the results (output of software), not the software itself.}
+POSIX is ubiquitous, and low-level software (e.g., core GNU tools) is installable on most POSIX systems.
+Well-written software internally corrects for differences in OS or hardware that may affect its functionality (through tools like the GNU Portability Library).
On GNU/Linux hosts, Maneage builds precise versions of the compilation tool chain.
-However, glibc is not install-able on some POSIX OSs (e.g., macOS).
-All programs link with the C library, and this may hypothetically hinder the exact reproducibility \emph{of results} on non-GNU/Linux systems, but we have not encountered this in our research so far.
-With everything else under precise control, the effect of differing Kernel and C libraries on high-level science can now be systematically studied with Maneage in follow-up research.
-\new{Using continuous integration (CI) is one way to precisely identify breaking points with updated technologies on available systems.}
+However, glibc is not installable on some POSIX OSs (e.g., macOS), and all programs link with the C library.
+This may hypothetically hinder the exact reproducibility \emph{of results} on non-GNU/Linux systems, but we have not encountered this in our research so far.
+With everything else under precise control in Maneage, the effect of differing kernels and C libraries on high-level science can now be systematically studied in follow-up research \new{(including floating-point arithmetic or optimization differences).
+Using continuous integration (CI) is one way to precisely identify breaking points on multiple systems.}
% DVG: It is a pity that the following paragraph cannot be included, as it is really important but perhaps goes beyond the intended goal.
%Thirdly, publishing a project's reproducible data lineage immediately after publication enables others to continue with follow-up papers, which may provide unwanted competition against the original authors.
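
One inexpensive way to prepare for the kernel/C-library comparisons
mentioned above is to record the host's kernel and libc versions next to
the outputs. A hedged shell sketch (the output file name is hypothetical):

    # Record the host kernel and C library versions alongside the results;
    # "host-env.txt" is a made-up file name.
    uname -srm > host-env.txt
    getconf GNU_LIBC_VERSION >> host-env.txt 2>/dev/null \
        || echo "C library: non-glibc" >> host-env.txt
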
@@ -528,7 +530,7 @@ From the data repository perspective, these criteria can also be useful, e.g., t
(2) Automated and persistent bidirectional linking of data and publication can be established through the published \emph{and complete} data lineage that is under version control.
(3) Software management: with these criteria, each project comes with its unique and complete software management.
It does not use a third-party PM that needs to be maintained by the data center (and the many versions of the PM), hence enabling robust software management, preservation, publishing, and citation.
-For example, see \href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}, \href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}, \href{https://doi.org/10.5281/zenodo.1163746}{zenodo.1163746}, where we have exploited the free-software criterion to distribute the tarballs of all the software used with each project's source as deliverables.
+For example, see \href{https://doi.org/10.5281/zenodo.1163746}{zenodo.1163746}, \href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}, \href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}, \href{https://doi.org/10.5281/zenodo.3951151}{zenodo.3951151} or \href{https://doi.org/10.5281/zenodo.4062460}{zenodo.4062460} where we have exploited the free-software criterion to distribute the source code of all software used in each project as deliverables.
(4) ``Linkages between documentation, code, data, and journal articles in an integrated environment'', which effectively summarizes the whole purpose of these criteria.
@@ -1098,7 +1100,7 @@ In summary IDEs are generally very specialized tools, for special projects and a
\label{appendix:jupyter}
Jupyter (initially IPython) \citeappendix{kluyver16} is an implementation of Literate Programming \citeappendix{knuth84}.
The main user interface is a web-based ``notebook'' that contains blobs of executable code and narrative.
-Jupyter uses the custom built \inlinecode{.ipynb} format\footnote{\url{https://nbformat.readthedocs.io/en/latest}}.
+Jupyter uses the custom built \inlinecode{.ipynb} format\footnote{\inlinecode{\url{https://nbformat.readthedocs.io/en/latest}}}.
Jupyter's name is a combination of the three main languages it was designed for: Julia, Python and R.
The \inlinecode{.ipynb} format is a simple, human-readable (it can be opened in a plain-text editor) file, formatted in JavaScript Object Notation (JSON).
It contains various kinds of ``cells'', or blobs, that can contain narrative description, code, or multi-media visualizations (for example images/plots), all stored in one file.
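
Because the format is plain JSON, the cell structure can be inspected with
ordinary text tools. A rough sketch (the notebook name is made up, and the
pattern assumes the usual JSON spacing):

    # Count the cell types in a notebook; "example.ipynb" is a made-up name.
    grep -o '"cell_type": *"[a-z]*"' example.ipynb | sort | uniq -c
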
@@ -1110,7 +1112,7 @@ Defining dependencies between the cells can allow non-linear execution which is
It allows automation, run-time optimization (deciding not to run a cell if it is not necessary) and parallelization.
However, Jupyter currently only supports a linear run of the cells: always from the start to the end.
It is possible to manually execute only one cell, but the previous/next cells that may depend on it also have to be manually run (a common source of human error, and frustration for complex operations).
-Integration of directional graph features (dependencies between the cells) into Jupyter has been discussed, but as of this publication, there is no plan to implement it (see Jupyter's GitHub issue 1175\footnote{\url{https://github.com/jupyter/notebook/issues/1175}}).
+Integration of directional graph features (dependencies between the cells) into Jupyter has been discussed, but as of this publication, there is no plan to implement it (see Jupyter's GitHub issue 1175\footnote{\inlinecode{\url{https://github.com/jupyter/notebook/issues/1175}}}).
The fact that the \inlinecode{.ipynb} format stores narrative text, code and multi-media visualization of the outputs in one file is another major hurdle:
The files can easily become very large (in volume/bytes) and hard to read.
@@ -1302,7 +1304,7 @@ Taverna is only a workflow manager and isn't integrated with a package manager,
\label{appendix:madagascar}
Madagascar\footnote{\inlinecode{\url{http://ahay.org}}} \citeappendix{fomel13} is a set of extensions to the SCons job management tool (reviewed in \ref{appendix:scons}).
Madagascar is a continuation of the Reproducible Electronic Documents (RED) project that was discussed in Appendix \ref{appendix:red}.
-Madagascar has been used in the production of hundreds of research papers or book chapters\footnote{\url{http://www.ahay.org/wiki/Reproducible_Documents}}, 120 prior to \citeappendix{fomel13}.
+Madagascar has been used in the production of hundreds of research papers or book chapters\footnote{\inlinecode{\url{http://www.ahay.org/wiki/Reproducible_Documents}}}, 120 prior to \citeappendix{fomel13}.
Madagascar does include project management tools in the form of SCons extensions.
However, it isn't just a reproducible project management tool.
diff --git a/peer-review/1-answer.txt b/peer-review/1-answer.txt
index 5c27866..b837ce4 100644
--- a/peer-review/1-answer.txt
+++ b/peer-review/1-answer.txt
@@ -32,10 +32,9 @@ reader can easily access them.
2. [Associate Editor] There are general concerns about the paper
lacking focus
-ANSWER: We believe that by responding to the specific concerns raised
-by the reviewers, as detailed below, we have tightened the focus of
-the paper.
-
+ANSWER: With all the corrections and clarifications made during this
+review, the focus of the paper should now be clear. We are very grateful
+for the referees' thorough listing of points.
------------------------------
@@ -46,9 +45,10 @@ the paper.
3. [Associate Editor] Some terminology is not well-defined
(e.g. longevity).
-ANSWER: Longevity has now been defined in the first paragraph of Section
-II. With this definition, the main argument of the paper is clearer,
-thank you (and thank you to the referees for highlighting this).
+ANSWER: Reproducibility, longevity and usability have now been explicitly
+defined in the first paragraph of Section II. With these definitions, the
+main argument of the paper is clearer, thank you (and thank you to the
+referees for highlighting this).
------------------------------
@@ -133,14 +133,16 @@ future.
is on the article, it is important that readers not be confused
when they visit your site to use your tools.
-ANSWER: Improving the consistency between this research paper and
-the Maneage website is a useful recommendation. We have listed
-this together with point 29 below at
-https://savannah.nongnu.org/task/index.php?15823
-on the Maneage development task list. As indicated there, the
-website is developed on a public git repository, so any specific
-proposals for improvements can be handled efficiently and
-transparently.
+ANSWER: Thank you for raising this important point. We have broken the
+very long "About" page down into multiple pages to improve readability:
+
+https://maneage.org/about.html
+
+Generally, the webpage will soon undergo major improvements to become even
+clearer. The website is developed on a public git repository
+(https://git.maneage.org/webpage.git), so any specific proposals for
+improvements can be handled efficiently and transparently, and we welcome
+any feedback in this regard.
------------------------------
@@ -602,9 +604,9 @@ level of peer-review control.
Tutorial. A topic breakdown is interesting, as the markdown reading may
be too long to find information.
-ANSWER: Thank you for the very useful suggestion. We have listed this as
-a task at https://savannah.nongnu.org/task/index.php?15823 .
-
+ANSWER: Thank you very much for this good suggestion; it has been
+implemented: https://maneage.org/about.html . The webpage will continuously
+be improved, and such feedback is always very welcome.
------------------------------
@@ -696,18 +698,17 @@ highly modular and flexible nature of Makefiles run via 'Make'.
which might occur because of the way the code is written, and the
hardware architecture (including if code is optimised / parallelised).
+ANSWER: Floating point errors and optimizations have been mentioned in the
+discussion (Section V). The issue with parallelization has also been
+discussed in Section IV, in the part on verification ("Where exact
+reproducibility is not possible (for example due to parallelization),
+values can be verified by any statistical means, specified by the project
+authors.").
-ANSWER: The authors of particular projects have to choose the level
-floating point reproducibility that they judge viable. In section IV,
-within the 6500-word limit, this is briefly described in the discussion
-of the "verify.mk" rule file. The main paragraph is "Just before reaching ...
-All project deliverables ... are verified ... with their checksums, to
-automatically ensure exact reproducibility. .... [or] by any statistical
-means, specified by the project authors."
-
-We have added a brief reference to zenodo.3951151, pointing out that
-it illustrates an approach for statistical verifiability of
-parallelised code using Maneage.
+#####################
+Find a good way to link to (Peper and Roukema:
+https://doi.org/10.5281/zenodo.4062460)
+#####################
------------------------------
@@ -719,26 +720,44 @@ parallelised code using Maneage.
[reproducibility] ... will come with a tradeoff agianst
performance, which is never mentioned.
-ANSWER: The criteria we propose and the proof-of-concept with
-Maneage do not force the choice of a tradeoff between exact bitwise
-floating point reproducibility versus performance (e.g. speed). The
-specific concepts of "verification" and "reproducibility" will vary
-between domains of scientific computation, but we expect that the
-criteria allow this wide range. We did not add text on this point.
+ANSWER: The criteria we propose and the proof-of-concept with Maneage do
+not force the choice of a tradeoff between exact bitwise floating point
+reproducibility versus performance (e.g. speed). The specific concepts of
+"verification" and "reproducibility" will vary between domains of
+scientific computation, but we expect that the criteria allow this wide
+range.
+Performance is indeed an important issue for _immediate_ reproducibility,
+and we would have liked to discuss it. However, due to the strict word
+count, we feel that adding it to the discussion points without adequate
+space to elaborate could distract readers from the focus of this paper
+(long-term usability). It has therefore not been added.
------------------------------
+
+
+
+
38. [Reviewer 4] Tradeoff, which might affect Criterion 3 is time to result,
people use popular frameworks because it is easier to use them.
-ANSWER: Section IV includes some quantified examples of timing
-involved in the Maneage implementation of the criteria of our
-paper. It is true that the initial build time of a Maneage install
-may discourage some scientists; but a serious scientific research
-project is never started and completed on a time scale of a few
-hours.
+ANSWER: That is true. In Section IV, we report the time it takes to build
+Maneage (only once on each computer) to be around 1.5 hours on an 8-core
+CPU (a typical machine that may be used for data analysis). We therefore
+conclude that when the analysis is complex (and thus takes many hours or
+days to complete), this time is negligible.
+But if the project's full analysis takes 10 minutes or less (like the
+extremely simple analysis done in this paper, which takes a fraction of a
+second), the 1.5-hour build time is indeed significant. In those cases, as
+discussed in the main body, the project can be built once in a Docker image
+and easily moved to other computers.
+
+Generally, it is true that the initial configuration time (only once on
+each computer) of a Maneage install may discourage some scientists; but a
+serious scientific research project is never started and completed on a
+time scale of a few hours.
------------------------------
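
The "build once in a Docker image and move it" option mentioned in answer
38 could look roughly like the following shell sketch (the image and
tarball names are hypothetical, and a suitable Dockerfile that copies the
project source is assumed):

    # Build the project once inside an image, then ship the image.
    docker build -t myproject:built .
    docker save myproject:built | gzip > myproject-built.tar.gz
    # On another machine:
    gunzip -c myproject-built.tar.gz | docker load
    docker run -it myproject:built ./project make
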
@@ -772,14 +791,16 @@ already written.
40. [Reviewer 4] Potentially an interesting sidebar to investigate how
LaTeX/TeX has ensured its longevity!
+ANSWER: That is indeed a very interesting subject to study (an obvious link
+is that LaTeX/TeX is very strongly based on plain text files). We have been
+in touch with Karl Berry (one of the core people behind TeX Live, who also
+plays a prominent role in GNU) and have witnessed the TeX Live community's
+efforts to become ever more portable and long-lived.
-ANSWER: We agree that this would be interesting; an obvious link is
-that LaTeX/TeX is very strongly based on plain text files, making user
-hacking easy, provided that the user is willing to experiment and
-search and read through the source files. However, as the reviewer states,
-this would be a sidebar, and we are constrained for space.
-
-
+However, as the reviewer states, this would be a sidebar, and we are
+constrained for space, so we couldn't find a place to highlight this. But
+it is indeed a subject worthy of a full paper (one that could be very
+useful for many software projects).
------------------------------
@@ -790,19 +811,17 @@ this would be a sidebar, and we are constrained for space.
41. [Reviewer 4] The title is not specific enough - it should refer to the
reproducibility of workflows/projects.
-ANSWER: A problem here is that "workflow" and "project" taken in
-isolation risk being vague for wider audiences. Also, we aim at
-covering a wider range of aspects of a project than just than the
-workflow alone; in the other direction, the word "project" could be
-seen as too broad, including the funding, principal investigator,
-and team coordination.
+ANSWER: A problem here is that "workflow" and "project" taken in isolation
+risk being vague for wider audiences. Also, we aim to cover a wider range
+of aspects of a project than the workflow alone; in the other direction,
+the word "project" could be seen as too broad, including the funding,
+principal investigator, and team coordination.
-A specific title that might be appropriate could be, for example,
-"Towards long-term and archivable reproducibility of scientific
-computational research projects". Using a term proposed by one of
-our reviewers, "Towards long-term and archivable end-to-end
-reproducibility of scientific computational research projects"
-might also be appropriate.
+A specific title that might be appropriate could be, for example, "Towards
+long-term and archivable reproducibility of scientific computational
+research projects". Using a term proposed by one of our reviewers, "Towards
+long-term and archivable end-to-end reproducibility of scientific
+computational research projects" might also be appropriate.
Nevertheless, we feel that in the context of an article published in CiSE,
our current short title is sufficient.
@@ -872,7 +891,6 @@ longevity; same as supported CPU architectures."
longevity of the workflows that can be produced using these tools?
What happens if you use a combination of all four categories of tools?
-
ANSWER: We have changed the section title to "Longevity of existing tools"
to clarify that we refer to longevity of the tools.
@@ -950,14 +968,12 @@ effort, without any major difficulties.
write a "paper", ease of depositing in a repository, and ease of
use by another researcher.
-
-ANSWER: This type of sociological survey will make sense once the
- number of projects run with Maneage is sufficiently high. The
- time taken to write a paper should be measurably automatically,
- from the git history. The other parameters suggested would
- require cooperation from the scientists in responding to
- the survey, or will have to be collected anecdotally in the
- short term.
+ANSWER: This type of sociological survey will make sense once the number of
+projects run with Maneage is sufficiently high. The time taken to write a
+paper should be measurable automatically from the git history. The other
+parameters suggested would require cooperation from the scientists in
+responding to the survey, or will have to be collected anecdotally in the
+short term.
------------------------------
diff --git a/project b/project
index 93c55e7..cdace62 100755
--- a/project
+++ b/project
@@ -505,9 +505,9 @@ case $operation in
if [ -f paper.pdf ]; then
if type pdftotext > /dev/null 2>/dev/null; then
numwords=$(pdftotext paper.pdf && cat paper.txt | wc -w)
- numeff=$(echo $numwords | awk '{print $1-850}')
+ numeff=$(echo $numwords | awk '{print $1-850+500}')
echo; echo "Number of words in full PDF: $numwords"
- echo "No abstract, and figure captions: $numeff"
+ echo "No abstract, and captions (250 for each figure): $numeff"
rm paper.txt
fi
fi
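
The word-count logic changed above can also be run standalone; a minimal
sketch (pdftotext writes paper.txt next to paper.pdf by default, and the
850/500 offsets are simply the values used in the hunk above):

    # Standalone version of the word-count estimate from the hunk above.
    pdftotext paper.pdf                 # writes paper.txt
    numwords=$(wc -w < paper.txt)
    numeff=$(echo "$numwords" | awk '{print $1 - 850 + 500}')
    echo "Number of words in full PDF: $numwords"
    echo "No abstract, and captions (250 for each figure): $numeff"
    rm paper.txt
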
diff --git a/reproduce/analysis/config/metadata.conf b/reproduce/analysis/config/metadata.conf
index 07a1145..cdf0e5a 100644
--- a/reproduce/analysis/config/metadata.conf
+++ b/reproduce/analysis/config/metadata.conf
@@ -14,7 +14,7 @@ metadata-title = Towards Long-term and Archivable Reproducibility
# DOIs and identifiers.
metadata-arxiv = 2006.03018
-metadata-doi-zenodo = https://doi.org/10.5281/zenodo.3911395
+metadata-doi-zenodo = https://doi.org/10.5281/zenodo.4291207
metadata-doi-journal =
metadata-doi = $(metadata-doi-zenodo)
metadata-git-repository = https://gitlab.com/makhlaghi/maneage-paper