-rw-r--r--  paper.tex                                 |  332
-rw-r--r--  peer-review/1-answer.txt                  | 1040
-rw-r--r--  peer-review/1-review.txt                  |  788
-rwxr-xr-x  project                                   |   12
-rw-r--r--  reproduce/analysis/config/metadata.conf   |    2
-rw-r--r--  reproduce/analysis/make/initialize.mk     |    6
-rw-r--r--  reproduce/analysis/make/paper.mk          |    2
-rw-r--r--  tex/src/preamble-project.tex              |    2
-rw-r--r--  tex/src/references.tex                    |  219
9 files changed, 2309 insertions, 94 deletions
diff --git a/paper.tex b/paper.tex
index 08b431b..b28e2af 100644
--- a/paper.tex
+++ b/paper.tex
@@ -32,7 +32,7 @@
%% The paper headers
\markboth{Computing in Science and Engineering, Vol. X, No. X, MM YYYY}%
-{Akhlaghi \MakeLowercase{\textit{et al.}}: Towards Long-term and Archivable Reproducibility}
+{Akhlaghi \MakeLowercase{\textit{et al.}}: \projecttitle}
@@ -53,27 +53,34 @@
% in the abstract or keywords.
\begin{abstract}
%% CONTEXT
- Analysis pipelines commonly use high-level technologies that are popular when created, but are unlikely to be sustainable in the long term.
+ Analysis pipelines commonly use high-level technologies that are popular when created, but are unlikely to be readable, executable, or sustainable in the long term.
%% AIM
- We therefore aim to introduce a set of criteria to address this problem.
+ A set of criteria is introduced to address this problem.
%% METHOD
- These criteria have been tested in several research publications and have the following features: completeness (no dependency beyond POSIX, no administrator privileges, no network connection, and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; version control; linking analysis with narrative; and free software.
+ The criteria are: completeness (no dependency beyond POSIX, no administrator privileges, no network connection, and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; version control; linking analysis with narrative; and free software.
+ They have been tested in several research publications in various fields.
%% RESULTS
As a proof of concept, ``Maneage'' is introduced for storing projects in machine-actionable and human-readable
plain text, enabling cheap archiving, provenance extraction, and peer verification.
%% CONCLUSION
We show that longevity is a realistic requirement that does not sacrifice immediate or short-term reproducibility.
- We then discuss the caveats (with proposed solutions) and conclude with the benefits for the various stakeholders.
+ The caveats (with proposed solutions) are then discussed and we conclude with the benefits for the various stakeholders.
This paper is itself written with Maneage (project commit \projectversion).
- \vspace{3mm}
+ \vspace{2.5mm}
+ \emph{Appendix} ---
+ Two comprehensive appendices that review existing solutions are available
+\ifdefined\noappendix
+in \href{https://arxiv.org/abs/\projectarxivid}{\texttt{arXiv:\projectarxivid}} or \href{https://doi.org/10.5281/zenodo.\projectzenodoid}{\texttt{zenodo.\projectzenodoid}}.
+\else
+at the end (Appendices \ref{appendix:existingtools} and \ref{appendix:existingsolutions}).
+\fi
+
+ \vspace{2.5mm}
\emph{Reproducible supplement} ---
All products in \href{https://doi.org/10.5281/zenodo.\projectzenodoid}{\texttt{zenodo.\projectzenodoid}},
Git history of source at \href{https://gitlab.com/makhlaghi/maneage-paper}{\texttt{gitlab.com/makhlaghi/maneage-paper}},
which is also archived on \href{https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://gitlab.com/makhlaghi/maneage-paper.git}{SoftwareHeritage}.
-\ifdefined\noappendix
- Appendices reviewing existing reproducible solutions available in \href{https://arxiv.org/abs/\projectarxivid}{\texttt{arXiv:\projectarxivid}} or \href{https://doi.org/10.5281/zenodo.\projectzenodoid}{\texttt{zenodo.\projectzenodoid}}.
-\fi
\end{abstract}
% Note that keywords are not normally used for peer-review papers.
@@ -108,10 +115,9 @@ Reproducible research has been discussed in the sciences for at least 30 years \
Many reproducible workflow solutions (hereafter, ``solutions'') have been proposed that mostly rely on the common technology of the day,
starting with Make and Matlab libraries in the 1990s, Java in the 2000s, and mostly shifting to Python during the last decade.
-However, these technologies develop fast, e.g., Python 2 code often cannot run with Python 3.
+However, these technologies develop fast, e.g., code written in Python 2 \new{(which is no longer officially maintained)} often cannot run with Python 3.
The cost of staying up to date within this rapidly-evolving landscape is high.
-Scientific projects, in particular, suffer the most: scientists have to focus on their own research domain, but to some degree
-they need to understand the technology of their tools because it determines their results and interpretations.
+Scientific projects, in particular, suffer the most: scientists have to focus on their own research domain, but to some degree they need to understand the technology of their tools because it determines their results and interpretations.
Decades later, scientists are still held accountable for their results and therefore the evolving technology landscape
creates generational gaps in the scientific community, preventing previous generations from sharing valuable experience.
@@ -119,46 +125,64 @@ creates generational gaps in the scientific community, preventing previous gener
-\section{Commonly used tools and their longevity}
-Longevity is as important in science as in some fields of industry, but this ideal is not always necessary; e.g., fast-evolving tools can be appropriate in short-term commercial projects.
-To highlight the necessity, a sample set of commonly-used tools is reviewed here in the following order:
-(1) environment isolators -- virtual machines (VMs) or containers;
-(2) package managers (PMs) -- Conda, Nix, or Spack;
-(3) job management -- shell scripts, Make, SCons, or CGAT-core;
-(4) notebooks -- Jupyter.
+\section{Longevity of existing tools}
+\label{sec:longevityofexisting}
+\new{Reproducibility is defined as ``obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis'' \cite{fineberg19}.
+ Longevity is defined as the length of time that a project remains usable.
+ Usability is defined by context: for machines (machine-actionable, or executable files) \emph{and} for humans (readability of the source).
+ The latter matters because many usage contexts don't involve execution; for example, checking a configuration parameter of a single analysis step to re-\emph{use} it in another project, checking the version of used software, or checking the source of the input data (extracting these from the outputs of an execution is not always possible).}
+
+Longevity is as important in science as in some fields of industry, but not all; e.g., fast-evolving tools can be appropriate in short-term commercial projects.
+To highlight the necessity, a short review of commonly-used tools is provided below:
+(1) environment isolators (virtual machines, VMs, or containers);
+(2) package managers (PMs, like Conda, Nix, or Spack);
+(3) job management (like shell scripts or Make);
+(4) notebooks (like Jupyter).
+\new{A much more comprehensive review of existing tools and solutions is available in the appendices.}
To isolate the environment, VMs have sometimes been used, e.g., in \href{https://is.ieis.tue.nl/staff/pvgorp/share}{SHARE} (which was awarded second prize in the Elsevier Executable Paper Grand Challenge of 2011 but was discontinued in 2019).
However, containers (in particular, Docker, and to a lesser degree, Singularity) are currently the most widely-used solution.
We will thus focus on Docker here.
-Ideally, it is possible to precisely identify the Docker ``images'' that are imported with their checksums, but that is rarely practiced in most solutions that we have surveyed.
-Usually, images are imported with generic operating system (OS) names; e.g., \cite{mesnard20} uses `\inlinecode{FROM ubuntu:16.04}'.
-The extracted tarball (from \url{https://partner-images.canonical.com/core/xenial}) is updated almost monthly and only the most recent five are archived.
-Hence, if the Dockerfile is run in different months, its output image will contain different OS components.
+\new{It is theoretically possible to precisely identify the Docker ``images'' that are used, via their checksums (or ``digests''), to re-create an identical OS image later.
+ However, that is rarely practiced.}
+Usually images are imported with generic operating system (OS) names; e.g., \cite{mesnard20} uses `\inlinecode{FROM ubuntu:16.04}' \new{(more examples in the appendices)}.
+The extracted tarball (from \url{https://partner-images.canonical.com/core/xenial}) is updated almost monthly, and only the most recent five are archived there.
+Hence, if the Dockerfile is built in different months, the resulting image will contain different OS components.
In the year 2024, when long-term support for this version of Ubuntu expires, the image will be unavailable at the expected URL.
Generally, pre-built binary files (like Docker images) are large and expensive to maintain and archive.
-%% This URL: https://www.docker.com/blog/scaling-dockers-business-to-serve-millions-more-developers-storage/}
-This prompted DockerHub (an online service to host Docker images, including many reproducible workflows) to delete images that have not been used for over 6 months.
-Furthermore, Docker requires root permissions, and only supports recent (``long-term-support'') versions of the host kernel, so older Docker images may not be executable.
+%% This URL: https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates}
+\new{Because of this, DockerHub (where many reproducible workflows are archived) announced that images in free accounts that are inactive for over 6 months will be deleted from mid-2021.}
+Furthermore, Docker requires root permissions, and only supports recent (``long-term-support'') versions of the host kernel, so older Docker images may not be executable \new{(their longevity is determined by the host kernel, usually a decade)}.
Once the host OS is ready, PMs are used to install the software or environment.
Usually the OS's PM, such as `\inlinecode{apt}' or `\inlinecode{yum}', is used first and higher-level software are built with generic PMs.
-The former suffers from the same longevity problem as the OS, while some of the latter (such as Conda and Spack) are written in high-level languages like Python, so the PM itself depends on the host's Python installation.
-Nix and GNU Guix produce bit-wise identical programs, but they need root permissions and are primarily targeted at the Linux kernel.
-Generally, the exact version of each software's dependencies is not precisely identified in the PM build instructions (although this could be implemented).
-Therefore, unless precise version identifiers of \emph{every software package} are stored by project authors, a PM will use the most recent version.
+The former has \new{the same longevity} as the OS, while some of the latter (such as Conda and Spack) are written in high-level languages like Python, so the PM itself depends on the host's Python installation \new{with a usual longevity of a few years}.
+Nix and GNU Guix produce bit-wise identical programs \new{with considerably better longevity; the same as that of the supported CPU architectures}.
+However, they need root permissions and are primarily targeted at the Linux kernel.
+Generally, in all the package managers, the exact version of each software package (and its dependencies) is not precisely identified by default, although an advanced user can indeed pin them.
+Unless precise version identifiers of \emph{every software package} are stored by project authors, a PM will use the most recent version.
Furthermore, because third-party PMs introduce their own language, framework, and version history (the PM itself may evolve) and are maintained by an external team, they increase a project's complexity.
With the software environment built, job management is the next component of a workflow.
-Visual workflow tools like Apache Taverna, GenePattern, Kepler or VisTrails (mostly introduced in the 2000s and using Java) encourage modularity and robust job management, but the more recent tools (mostly in Python) leave this to the authors of the project.
-Designing a modular project needs to be encouraged and facilitated because scientists (who are not usually trained in project or data management) will rarely apply best practices.
+Visual/GUI workflow tools like Apache Taverna, GenePattern (deprecated), Kepler or VisTrails (deprecated), which were mostly introduced in the 2000s and used Java or Python 2, encourage modularity and robust job management.
+\new{However, a GUI environment is tailored to specific applications and is hard to generalize, while also being hard to reproduce once the required Java Virtual Machine (JVM) is deprecated.
+Their data formats are also complex (designed for computers to read) and hard for humans to read without the GUI.}
+The more recent tools (mostly non-GUI, written in Python) leave this to the authors of the project.
+Designing a robust project needs to be encouraged and facilitated because scientists (who are not usually trained in project or data management) will rarely apply best practices.
This includes automatic verification, which is possible in many solutions, but is rarely practiced.
-Weak project management leads to many inefficiencies in project cost and/or scientific accuracy (reusing, expanding, or validating will be expensive).
+Besides non-reproducibility, weak project management leads to many inefficiencies in project cost and/or scientific accuracy (reusing, expanding, or validating will be expensive).
-Finally, to add narrative, computational notebooks \cite{rule18}, such as Jupyter, are currently gaining popularity.
-However, because of their complex dependency trees, they are vulnerable to the passage of time; e.g., see Figure 1 of \cite{alliez19} for the dependencies of Matplotlib, one of the simpler Jupyter dependencies.
+Finally, to blend narrative into the workflow, computational notebooks \cite{rule18}, such as Jupyter, are currently gaining popularity.
+However, because of their complex dependency trees, their build is vulnerable to the passage of time; e.g., see Figure 1 of \cite{alliez19} for the dependencies of Matplotlib, one of the simpler Jupyter dependencies.
It is important to remember that the longevity of a project is determined by its shortest-lived dependency.
-Further, as with job management, computational notebooks do not actively encourage good practices in programming or project management, hence they can rarely deliver their promised potential \cite{rule18} and may even hamper reproducibility \cite{pimentel19}.
+Furthermore, as with job management, computational notebooks do not actively encourage good practices in programming or project management.
+\new{The ``cells'' in a Jupyter notebook can either be run sequentially (from top to bottom, one after the other) or by manually selecting which cell to run.
+The default cells don't support dependencies (so some cells only run correctly after certain others are re-run), parallel execution, or the use of more than one language.
+There are third-party add-ons like \inlinecode{sos} or \inlinecode{nbextensions} (both written in Python) for some of these.
+However, since they aren't part of the core and have their own dependencies, their longevity can be assumed to be shorter.
+Therefore, the core Jupyter framework leaves very few options for project management, especially as the project grows beyond a small test or tutorial.}
+In summary, notebooks can rarely deliver their promised potential \cite{rule18} and may even hamper reproducibility \cite{pimentel19}.
An exceptional solution we encountered was the Image Processing Online Journal (IPOL, \href{https://www.ipol.im}{ipol.im}).
Submitted papers must be accompanied by an ISO C implementation of their algorithm (which is buildable on any widely used OS) with example images/data that can also be executed on their webpage.
@@ -178,7 +202,7 @@ We argue and propose that workflows satisfying the following criteria can not on
\textbf{Criterion 1: Completeness.}
A project that is complete (self-contained) has the following properties.
-(1) No dependency beyond the Portable Operating System Interface: POSIX (a minimal Unix-like environment).
+(1) No dependency beyond the Portable Operating System Interface (POSIX, \new{a minimal Unix-like standard that is shared between many operating systems}).
POSIX has been developed by the Austin Group (which includes IEEE) since 1988 and many OSes have complied.
(2) Primarily stored as plain text, not needing specialized software to open, parse, or execute.
(3) No impact on the host OS libraries, programs or environment.
@@ -200,8 +224,8 @@ Explicit communication between various modules enables optimizations on many lev
\textbf{Criterion 3: Minimal complexity.}
Minimal complexity can be interpreted as:
(1) Avoiding the language or framework that is currently in vogue (for the workflow, not necessarily the high-level analysis).
-A popular framework typically falls out of fashion and requires significant resources to translate or rewrite every few years.
-More stable/basic tools can be used with less long-term maintenance.
+A popular framework typically falls out of fashion and requires significant resources to translate or rewrite every few years \new{(for example Python 2, which is now a dead language and no longer supported)}.
+More stable/basic tools can be used with less long-term maintenance costs.
(2) Avoiding too many different languages and frameworks; e.g., when the workflow's PM and analysis are orchestrated in the same framework, it becomes easier to adopt and encourages good practices.
\textbf{Criterion 4: Scalability.}
@@ -225,11 +249,13 @@ A narrative description is also a deliverable (defined as ``data article'' in \c
This is related to longevity, because if a workflow contains only the steps to do the analysis or generate the plots, in time it may get separated from its accompanying published paper.
\textbf{Criterion 8: Free and open source software:}
-Reproducibility (defined in \cite{fineberg19}) is not possible with a black box (non-free or non-open-source software); this criterion is therefore necessary because nature is already a black box.
-A project that is free software (as formally defined), allows others to learn from, modify, and build upon it.
+Reproducibility is not possible with a black box (non-free or non-open-source software); this criterion is therefore necessary because nature is already a black box: we don't need an artificial source of ambiguity wrapped over it.
+A project that is \href{https://www.gnu.org/philosophy/free-sw.en.html}{free software} (as formally defined), allows others to learn from, modify, and build upon it.
When the software used by the project is itself also free, the lineage can be traced to the core algorithms, possibly enabling optimizations on that level and it can be modified for future hardware.
In contrast, non-free tools typically cannot be distributed or modified by others, making it reliant on a single supplier (even without payments).
+\new{It may happen that proprietary software is necessary to convert proprietary data formats produced by special hardware (for example micro-arrays in genetics) into free data formats.
+ In such cases, it is best to convert the data immediately upon collection and archive them in free formats (for example on Zenodo).}
@@ -242,7 +268,8 @@ In contrast, non-free tools typically cannot be distributed or modified by other
\section{Proof of concept: Maneage}
With the longevity problems of existing tools outlined above, a proof-of-concept tool is presented here via an implementation that has been tested in published papers \cite{akhlaghi19, infante20}.
-It was in fact awarded a Research Data Alliance (RDA) adoption grant for implementing the recommendations of the joint RDA and World Data System (WDS) working group on Publishing Data Workflows \cite{austin17}, from the researchers' perspective.
+\new{Since the initial submission of this paper, it has also been used in \href{https://doi.org/10.5281/zenodo.3951151}{zenodo.3951151} (on the COVID-19 pandemic) and \href{https://doi.org/10.5281/zenodo.4062460}{zenodo.4062460}.}
+It was also awarded a Research Data Alliance (RDA) adoption grant for implementing the recommendations of the joint RDA and World Data System (WDS) working group on Publishing Data Workflows \cite{austin17}, from the researchers' perspective.
The tool is called Maneage, for \emph{Man}aging data Lin\emph{eage} (the ending is pronounced as in ``lineage''), hosted at \url{https://maneage.org}.
It was developed as a parallel research project over five years of publishing reproducible workflows of our research.
@@ -256,8 +283,7 @@ Inspired by GWL+Guix, a single job management tool was implemented for both inst
Make is not an analysis language, it is a job manager, deciding when and how to call analysis programs (in any language like Python, R, Julia, Shell, or C).
Make is standardized in POSIX and is used in almost all core OS components.
It is thus mature, actively maintained, highly optimized, efficient in managing exact provenance, and even recommended by the pioneers of reproducible research \cite{claerbout1992,schwab2000}.
-Researchers using free software tools have also already had some exposure to it.
-%However, because they didn't attempt to build the software environment, in 2006 they moved to SCons (Make-simulator in Python which also attempts to manage software dependencies) in a project called Madagascar (\url{http://ahay.org}), which is highly tailored to Geophysics.
+Researchers using free software tools have also already had some exposure to it \new{(almost all free software projects are built with Make)}.
Linking the analysis and narrative (criterion 7) was historically our first design element.
To avoid the problems with computational notebooks mentioned above, our implementation follows a more abstract linkage, providing a more direct and precise, yet modular, connection.
@@ -268,23 +294,34 @@ The macro `\inlinecode{\small\textbackslash{}demosfoptimizedsn}' is generated du
Since values like this depend on the analysis, they should \emph{also} be reproducible, along with figures and tables.
These macros act as a quantifiable link between the narrative and analysis, with the granularity of a word in a sentence and a particular analysis command.
-This allows accurate post-publication provenance \emph{and} automatic updates to the embedded numbers during a project.
-Through the latter, manual updates by authors are by-passed, which are prone to errors, thus discouraging improvements after writing the first draft.
+This allows automatic updates to the embedded numbers during the experimentation phase of a project \emph{and} accurate post-publication provenance.
+Through the former, manual updates by authors (which are prone to errors and discourage improvements or experimentation after writing the first draft) are by-passed.
Acting as a link, the macro files build the core skeleton of Maneage.
-For example, during the software building phase, each software package is identified by a \LaTeX{} file, containing its official name, version and possible citation..
+For example, during the software building phase, each software package is identified by a \LaTeX{} file, containing its official name, version and possible citation.
These are combined at the end to generate precise software acknowledgment and citation (see \cite{akhlaghi19, infante20}), which are excluded here because of the strict word limit.
-Furthermore, machine related specifications including hardware name and byte-order are also collected and cited, as a reference point if they were needed for \emph{root cause analysis} of observed differences/issues in the execution of the wokflow on different machines.
+\new{Furthermore, the machine-related specifications of the running system (including hardware name and byte-order) are also collected and cited.
+ These can help in \emph{root cause analysis} of observed differences/issues in the execution of the workflow on different machines.}
The macro files also act as Make \emph{targets} and \emph{prerequisites} to allow accurate dependency tracking and optimized execution (in parallel, no redundancies), for any level of complexity (e.g., Maneage builds Matplotlib if requested; see Figure~1 of \cite{alliez19}).
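As a hypothetical sketch of this mechanism (the file names, the \demomeanvalue macro and the $(mmdir)/$(adir) directory variables below are illustrative assumptions, not Maneage's actual code), a subMakefile rule can compute a value from an analysis output and write it into a macro file:

  # Hypothetical sketch: $(mmdir) and $(adir) are assumed directory
  # variables; 'demo-stats.tex' and 'demo-catalog.txt' are illustrative.
  # The macro file is a Make *target* here and a *prerequisite* of the
  # final paper, so it is only rebuilt when the analysis output changes.
  # Recipe lines must start with a TAB.
  $(mmdir)/demo-stats.tex: $(adir)/demo-catalog.txt
  	mean=$$(awk '{s+=$$1} END {printf "%.2f", s/NR}' $<); \
  	printf '\\newcommand{\\demomeanvalue}{%s}\n' "$$mean" > $@

In the real system such macro files are collected into the automatically generated \inlinecode{project.tex} (see the figure caption below), so a macro like this could be used directly inside a narrative sentence.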
All software dependencies are built down to precise versions of every tool, including the shell, POSIX tools (e.g., GNU Coreutils) and of course, the high-level science software.
+\new{The source code of all the free software used in Maneage is archived in and downloaded from \href{https://doi.org/10.5281/zenodo.3883409}{zenodo.3883409}.
+Zenodo promises long-term archival and also provides a persistent identifier for the files, which is rarely available on each software's webpage.}
+
On GNU/Linux distributions, even the GNU Compiler Collection (GCC) and GNU Binutils are built from source and the GNU C library (glibc) is being added (task \href{http://savannah.nongnu.org/task/?15390}{15390}).
Currently, {\TeX}Live is also being added (task \href{http://savannah.nongnu.org/task/?15267}{15267}), but that is only for building the final PDF, not affecting the analysis or verification.
-Temporary relocation of a built project, without building from source, can be done by building the project in a container or VM (\inlinecode{README.md} has recommendations on building a \inlinecode{Dockerfile}).
+\new{Finally, some software cannot be built on some CPU architectures; hence, by default, the CPU architecture is automatically included in the final built paper (see below).}
+
+\new{Because everything is built from source, building the core Maneage environment on an 8-core CPU takes about 1.5 hours (GCC consumes more than half of that time).
+When the analysis involves complex computations, this is negligible compared to the actual analysis.
+Also, due to the Git features blended into Maneage, it is best (from the perspective of provenance) to adopt Maneage at the start of a project and keep the history of changes as the project matures.
+To avoid repeating the build on different systems, Maneage'd projects can be built in a container or VM.
+In fact the \inlinecode{README.md} \href{https://archive.softwareheritage.org/browse/origin/directory/?origin_url=http://git.maneage.org/project.git}{has instructions} on building a Maneage'd project in Docker.
+Through Docker (or VMs) users on Microsoft Windows can benefit from Maneage, and for Windows-native software that can be run in batch-mode, technologies like Windows Subsystem for Linux can be used.}
The analysis phase of the project however is naturally different from one project to another at a low-level.
It was thus necessary to design a generic framework to comfortably host any project, while still satisfying the criteria of modularity, scalability, and minimal complexity.
-We demonstrate this design by replicating Figure 1C of \cite{menke20} in Figure \ref{fig:datalineage} (top).
-Figure \ref{fig:datalineage} (bottom) is the data lineage graph that produced it (including this complete paper).
+We demonstrate this design by replicating Figure 1C of \cite{menke20} in Figure \ref{fig:datalineage} (left).
+Figure \ref{fig:datalineage} (right) is the data lineage graph that produced it (including this complete paper).
\begin{figure*}[t]
\begin{center}
@@ -295,11 +332,12 @@ Figure \ref{fig:datalineage} (bottom) is the data lineage graph that produced it
\caption{\label{fig:datalineage}
Left: an enhanced replica of Figure 1C in \cite{menke20}, shown here for demonstrating Maneage.
It shows the ratio of the number of papers mentioning software tools (green line, left vertical axis) to the total number of papers studied in that year (light red bars, right vertical axis on a log scale).
- Right: Schematic representation of the data lineage, or workflow, to generate the plot above.
- Each colored box is a file in the project and the arrows show the dependencies between them.
+ Right: Schematic representation of the data lineage, or workflow, to generate the plot on the left.
+ Each colored box is a file in the project and \new{arrows show the operation of various software, i.e., the inputs they take and the outputs they produce}.
Green files/boxes are plain-text files that are under version control and in the project source directory.
Blue files/boxes are output files in the build directory, shown within the Makefile (\inlinecode{*.mk}) where they are defined as a \emph{target}.
- For example, \inlinecode{paper.pdf} depends on \inlinecode{project.tex} (in the build directory; generated automatically) and \inlinecode{paper.tex} (in the source directory; written manually).
+ For example, \inlinecode{paper.pdf} \new{is created by running \LaTeX{} on} \inlinecode{project.tex} (in the build directory; generated automatically) and \inlinecode{paper.tex} (in the source directory; written manually).
+ \new{Other software is used in the other steps.}
The solid arrows and full-opacity built boxes correspond to this paper.
The dotted arrows and built boxes show the scalability by adding hypothetical steps to the project.
The underlying data of the left plot is available at
@@ -309,7 +347,7 @@ Figure \ref{fig:datalineage} (bottom) is the data lineage graph that produced it
The analysis is orchestrated through a single point of entry (\inlinecode{top-make.mk}, which is a Makefile; see Listing \ref{code:topmake}).
It is only responsible for \inlinecode{include}-ing the modular \emph{subMakefiles} of the analysis, in the desired order, without doing any analysis itself.
-This is visualized in Figure \ref{fig:datalineage} (bottom) where no built (blue) file is placed directly over \inlinecode{top-make.mk} (they are produced by the subMakefiles under them).
+This is visualized in Figure \ref{fig:datalineage} (right) where no built (blue) file is placed directly over \inlinecode{top-make.mk} (they are produced by the subMakefiles under them).
A visual inspection of this file is sufficient for a non-expert to understand the high-level steps of the project (irrespective of the low-level implementation details), provided that the subMakefile names are descriptive (thus encouraging good practice).
A human-friendly design that is also optimized for execution is a critical component for the FAIRness of reproducible research.
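As a rough sketch (the subMakefile names below are only illustrative; Listing 1 of the paper, not shown in this diff, contains the real file), such an include-only entry point might look like:

  # Sketch of an include-only top-level Makefile; no analysis is done
  # here, the order of inclusion defines the high-level steps.
  include reproduce/analysis/make/initialize.mk
  include reproduce/analysis/make/download.mk
  include reproduce/analysis/make/format.mk
  include reproduce/analysis/make/demo-plot.mk
  include reproduce/analysis/make/verify.mk
  include reproduce/analysis/make/paper.mk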
@@ -372,12 +410,22 @@ Furthermore, the configuration files are a prerequisite of the targets that use
If changed, Make will \emph{only} re-execute the dependent recipe and all its descendants, with no modification to the project's source or other built products.
This fast and cheap testing encourages experimentation (without necessarily knowing the implementation details; e.g., by co-authors or future readers), and ensures self-consistency.
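For illustration (a minimal sketch with assumed file and variable names, not taken from Maneage itself), a configuration file listed as a prerequisite behaves like this:

  # Hypothetical sketch: 'demo-year.conf' holds a single analysis
  # parameter; editing it re-executes only this recipe (and the
  # recipes of targets that depend on 'selected.txt').
  $(adir)/selected.txt: demo-year.conf $(adir)/catalog.txt
  	year=$$(cat demo-year.conf); \
  	awk -v y=$$year '$$1 >= y' $(adir)/catalog.txt > $@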
-Finally, to satisfy the recorded history criterion, version control (currently implemented in Git) is another component of Maneage (see Figure \ref{fig:branching}).
+\new{To summarize, in contrast to notebooks like Jupyter, in a Maneage'd project the analysis scripts and configuration parameters are not blended into the running code, nor are they all stored in one file.
+ Based on the modularity criterion, the analysis steps are run in their own files (each in its own language, thus maximally benefiting from that language's unique features) and the narrative has its own file(s).
+ The analysis communicates with the narrative through intermediate files (the \LaTeX{} macros), enabling much better blending of analysis outputs into the narrative sentences than is possible with high-level notebooks, and enabling direct provenance tracking.}
+
+To satisfy the recorded history criterion, version control (currently implemented in Git) is another component of Maneage (see Figure \ref{fig:branching}).
Maneage is a Git branch that contains the shared components (infrastructure) of all projects (e.g., software tarball URLs, build recipes, common subMakefiles, and interface script).
-Derived projects start by branching off and customizing it (e.g., adding a title, data links, narrative, and subMakefiles for its particular analysis, see Listing \ref{code:branching}, there is customization checklist in \inlinecode{README-hacking.md}).
+\new{The core Maneage git repository is hosted at \href{http://git.maneage.org/project.git}{git.maneage.org/project.git} (also archived on \href{https://archive.softwareheritage.org/browse/origin/directory/?origin_url=http://git.maneage.org/project.git}{Software Heritage}).}
+Derived projects start by creating a branch and customizing it (e.g., adding a title, data links, narrative, and subMakefiles for its particular analysis, see Listing \ref{code:branching}).
+There is a \new{thoroughly elaborated} customization checklist in \inlinecode{README-hacking.md}.
+
+The current project's Git hash is provided to the authors as a \LaTeX{} macro (shown here at the end of the abstract), as well as the Git hash of the last commit in the Maneage branch (shown here in the acknowledgments).
+These macros are created in \inlinecode{initialize.mk}, with \new{other basic information from the running system like the CPU architecture, byte order or address sizes (shown here in the acknowledgements)}.
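A simplified sketch of how such macros can be generated (not the actual \inlinecode{initialize.mk} recipe; the rule below only assumes that Git and POSIX \inlinecode{uname} are available) is:

  # Simplified sketch, not Maneage's real rule: write the project's
  # Git commit and the machine architecture into LaTeX macros.
  # (In practice this rule must be forced to re-run on every build.)
  $(mmdir)/initialize.tex:
  	commit=$$(git describe --dirty --always --long); \
  	arch=$$(uname -m); \
  	printf '\\newcommand{\\projectversion}{%s}\n\\newcommand{\\machinearchitecture}{%s}\n' \
  	       "$$commit" "$$arch" > $@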
The branch-based design of Figure \ref{fig:branching} allows projects to re-import Maneage at a later time (technically: \emph{merge}), thus improving its low-level infrastructure: in (a) authors do the merge during an ongoing project;
in (b) readers do it after publication; e.g., the project remains reproducible but the infrastructure is outdated, or a bug is fixed in Maneage.
+\new{Generally, any git flow (branching strategy) can be used by the high-level project authors or future readers.}
Low-level improvements in Maneage can thus propagate to all projects, greatly reducing the cost of curation and maintenance of each individual project, before \emph{and} after publication.
Finally, the complete project source is usually $\sim100$ kilo-bytes.
@@ -431,7 +479,8 @@ Scientists are rarely trained sufficiently in data management or software develo
Indeed the fast-evolving tools are primarily targeted at software developers, who are paid to learn and use them effectively for short-term projects before moving on to the next technology.
Scientists, on the other hand, need to focus on their own research fields, and need to consider longevity.
-Hence, arguably the most important feature of these criteria (as implemented in Maneage) is that they provide a fully working template, using mature and time-tested tools, for blending version control, the research paper's narrative, the software management \emph{and} a robust data carpentry.
+Hence, arguably the most important feature of these criteria (as implemented in Maneage) is that they provide a fully working template or bundle that works immediately out of the box, producing a paper with an example calculation that authors just need to start customizing.
+It uses mature and time-tested tools to blend version control, the research paper's narrative, software management \emph{and} a robust data management strategy.
We have noticed that providing a complete \emph{and} customizable template with a clear checklist of the initial steps is much more effective in encouraging mastery of these modern scientific tools than having abstract, isolated tutorials on each tool individually.
Secondly, to satisfy the completeness criterion, all the required software of the project must be built on various POSIX-compatible systems (Maneage is actively tested on different GNU/Linux distributions, macOS, and is being ported to FreeBSD also).
@@ -446,6 +495,7 @@ On GNU/Linux hosts, Maneage builds precise versions of the compilation tool chai
However, glibc is not install-able on some POSIX OSs (e.g., macOS).
All programs link with the C library, and this may hypothetically hinder the exact reproducibility \emph{of results} on non-GNU/Linux systems, but we have not encountered this in our research so far.
With everything else under precise control, the effect of differing Kernel and C libraries on high-level science can now be systematically studied with Maneage in follow-up research.
+\new{Using continuous integration (CI) is one way to precisely identify breaking points with updated technologies on available systems.}
% DVG: It is a pity that the following paragraph cannot be included, as it is really important but perhaps goes beyond the intended goal.
%Thirdly, publishing a project's reproducible data lineage immediately after publication enables others to continue with follow-up papers, which may provide unwanted competition against the original authors.
@@ -481,6 +531,7 @@ For example, see \href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937},
The authors wish to thank (sorted alphabetically)
Julia Aguilar-Cabello,
+Dylan A\"issi,
Marjan Akbari,
Alice Allen,
Pedram Ashofteh Ardakani,
@@ -506,6 +557,12 @@ Nadia Tonello,
Ignacio Trujillo and
the AMIGA team at the Instituto de Astrof\'isica de Andaluc\'ia
for their useful help, suggestions, and feedback on Maneage and this paper.
+\new{The five referees and the editors of CiSE (Lorena Barba and George Thiruvathukal) also provided many very helpful suggestions to clarify the points made in this paper.}
+
+This project was developed in the reproducible framework of Maneage (\emph{Man}aging data lin\emph{eage})
+\new{on Commit \inlinecode{\projectversion} (in the project branch).
+The latest merged Maneage commit was \inlinecode{\maneageversion} (\maneagedate).
+This project was built on an \inlinecode{\machinearchitecture} machine with {\machinebyteorder} byte-order and address sizes {\machineaddresssizes}}.
Work on Maneage, and this paper, has been partially funded/supported by the following institutions:
The Japanese Ministry of Education, Culture, Sports, Science, and Technology (MEXT) PhD scholarship to
@@ -597,7 +654,7 @@ The Pozna\'n Supercomputing and Networking Center (PSNC) computational grant 314
%% the appendix is built.
\ifdefined\noappendix
\else
-\newpage
+\clearpage
\appendices
\section{Survey of existing tools for various phases}
\label{appendix:existingtools}
@@ -626,6 +683,10 @@ Therefore, a process that is run inside a virtual machine can be much slower tha
An advantage of VMs is that they are a single file which can be copied from one computer to another, keeping the full environment within them if the format is recognized.
VMs are used by cloud service providers, enabling fully independent operating systems on their large servers (where the customer can have root access).
+VMs were used in solutions like SHARE \citeappendix{vangorp11} (which was awarded second prize in the Elsevier Executable Paper Grand Challenge of 2011 \citeappendix{gabriel11}), or in suggested reproducible papers like \citeappendix{dolfi14}.
+However, due to their very large size, they are expensive to maintain, thus leading SHARE to discontinue its services in 2019.
+Also, the URL to the VM that is mentioned in \citeappendix{dolfi14} is no longer accessible (probably due to the same reason of size and archival costs).
+
\subsubsection{Containers}
\label{appendix:containers}
Containers also host a binary copy of a running environment, but don't have their own kernel.
@@ -646,19 +707,27 @@ Below we'll review some of the most common container solutions: Docker and Singu
An important drawback of Docker for high performance scientific needs is that it runs as a daemon (a program that is always running in the background) with root permissions.
This is a major security flaw that discourages many high performance computing (HPC) facilities from providing it.
-\item {\bf\small Singularity:} Singularity is a single-image container (unlike Docker which is composed of modular/independent images).
+\item {\bf\small Singularity:} Singularity \citeappendix{kurtzer17} is a single-image container (unlike Docker which is composed of modular/independent images).
Although it needs root permissions to be installed on the system (once), it doesn't require root permissions every time it is run.
Its main program is also not a daemon, but a normal program that can be stopped.
These features make it much easier for HPC administrators to install compared to Docker.
However, the fact that it requires root access for the initial install is still a hindrance for a random project: if it's not already present on the HPC, the project can't be run as a normal user.
+
+\item {\bf\small Podman:} Podman uses the Linux kernel containerization features to enable containers without a daemon, and without root permissions.
+ It has a command-line interface very similar to Docker, but only works on GNU/Linux operating systems.
\end{itemize}
-Generally, VMs or containers are good solutions to reproducibly run/repeating an analysis.
+Generally, VMs or containers are good solutions for reproducibly running/repeating an analysis in the short term (a couple of years).
However, their focus is to store the already-built (binary, non-human readable) software environment.
-Storing \emph{how} the core environment was built is up to the user, in a third repository (not necessarily inside container or VM file).
-This is a major problem when considering reproducibility.
-The example of \cite{mesnard20} was previously mentioned in Section \ref{criteria}.
+Because of this, they will be large (many gigabytes) and expensive to archive, download, or access.
+Recall the two examples above for VMs in Section \ref{appendix:virtualmachines}; this is also valid for Docker images, as is clear from DockerHub's recent decision to delete images of free accounts that haven't been used for more than 6 months.
+Meng \& Thain \citeappendix{meng17} also give similar reasons on why Docker images were not suitable in their trials.
+
+On a more fundamental level, VMs or containers don't store \emph{how} the core environment was built.
+This information is usually in a third-party repository, and not necessarily inside container or VM file, making it hard (if not impossible) to track for future users.
+This is a major problem for reproducibility, and is also highlighted as a major issue for long-term reproducibility in \citeappendix{oliveira18}.
+The example of \cite{mesnard20} was previously mentioned in Section \ref{criteria}.
Another useful example is the \href{https://github.com/benmarwick/1989-excavation-report-Madjedbebe/blob/master/Dockerfile}{\inlinecode{Dockerfile}} of \citeappendix{clarkso15} (published in June 2015) which starts with \inlinecode{FROM rocker/verse:3.3.2}.
When we tried to build it (November 2020), the core downloaded image (\inlinecode{rocker/verse:3.3.2}, with image ``digest'' \inlinecode{sha256:c136fb0dbab...}) was created in October 2018 (long after the publication of that paper).
Theoretically it is possible to investigate the difference between this new image and the old one that the authors used, but that will require a lot of effort and may not be possible where the changes are not in a third public repository or not under version control.
@@ -669,6 +738,14 @@ A more generic/longterm approach to ensure identical core OS componets at a late
ISO files are pre-built binary files with volumes of hundreds of megabytes and not containing their build instructions).
For example the archives of Debian\footnote{\inlinecode{\url{https://cdimage.debian.org/mirror/cdimage/archive/}}} or Ubuntu\footnote{\inlinecode{\url{http://old-releases.ubuntu.com/releases}}} provide older ISO files.
+The concept of containers (and the independent images that build them) can also be extended beyond just the software environment.
+For example, \citeappendix{lofstead19} propose a ``data pallet'' concept to containerize access to data and thus allow tracing data backwards to the application that produced them.
+
+In summary, containers or VMs are just a built product themselves.
+If they are built properly (for example, building a Maneage'd project inside a Docker container), they can be useful for immediate usage and for quickly moving the project from one system to another.
+With robust building, the container or VM can also be exactly reproduced later.
+However, attempting to archive the actual binary container or VM files as a black box (not knowing the precise versions of the software in them) is expensive, and will not be able to answer the most fundamental question: how the core environment was built.
+
\subsubsection{Independent build in host's file system}
\label{appendix:independentbuild}
The virtual machine and container solutions mentioned above have their own independent file system.
@@ -722,6 +799,10 @@ Hence it is indeed theoretically possible to reproduce the software environment
In summary, the host OS package managers are primarily meant for the operating system components or very low-level components.
Hence, many robust reproducible analysis solutions (reviewed in Appendix \ref{appendix:existingsolutions}) don't use the host's package manager, but an independent package manager, like the ones discussed below.
+\subsubsection{Packaging with Linux containerization}
+Once a software package is bundled as an AppImage\footnote{\inlinecode{\url{https://appimage.org}}}, Flatpak\footnote{\inlinecode{\url{https://flatpak.org}}} or Snap\footnote{\inlinecode{\url{https://snapcraft.io}}}, the software's binary product and all its dependencies (not including the core C library) are packaged into one file.
+This makes it very easy to move that single built product to newer systems, although, because the C library is not included, it can fail on older systems.
+However, these are designed for the Linux kernel (using its containerization features) and can thus only be run on GNU/Linux operating systems.
\subsubsection{Nix or GNU Guix}
\label{appendix:nixguix}
@@ -916,6 +997,17 @@ When all the prerequisites are older than the target, that target doesn't need t
The recipe can contain any number of commands; they just all need to start with a \inlinecode{TAB}.
Going deeper into the syntax of Make is beyond the scope of this paper, but we recommend interested readers to consult the GNU Make manual for a nice introduction\footnote{\inlinecode{\url{http://www.gnu.org/software/make/manual/make.pdf}}}.
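For instance, a minimal rule (a generic illustration of the syntax, not a file from Maneage) reads as follows: \inlinecode{paper.pdf} is the target, \inlinecode{paper.tex} is its only prerequisite, and the TAB-indented line is the recipe.

  # Generic illustration of Make syntax (target: prerequisites + recipe).
  paper.pdf: paper.tex
  	pdflatex paper.tex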
+\subsubsection{Snakemake}
+is a Python-based workflow management system, inspired by GNU Make (which is the job organizer in Maneage), that is aimed at reproducible and scalable data analysis \citeappendix{koster12}\footnote{\inlinecode{\url{https://snakemake.readthedocs.io/en/stable}}}.
+It defines its own language to implement the ``rule'' concept in Make within Python.
+Currently it requires Python 3.5 (released in September 2015) and above, while Snakemake was originally introduced in 2012.
+Hence it is not clear if older Snakemake source files can be executed today.
+As reviewed for many tools here, this is a major longevity problem when using high-level tools as the skeleton of the workflow.
+Technically, calling command-line programs within Python is very slow, and using complex shell scripts in each step will involve a lot of quoting that makes the code hard to read.
+
+\subsubsection{Bazel}
+Bazel\footnote{\inlinecode{\url{https://bazel.build}}} is a high-level job organizer that depends on Java and Python and is primarily tailored to software developers (with features like facilitating linking of libraries through its high level constructs).
+
\subsubsection{SCons}
\label{appendix:scons}
Scons is a Python package for managing operations outside of Python (in contrast to CGAT-core, discussed below, which only organizes Python functions).
@@ -960,8 +1052,12 @@ Furthermore, high-level and specific solutions will evolve very fast causing dis
A good example is Popper \citeappendix{jimenez17} which initially organized its workflow through the HashiCorp configuration language (HCL) because it was the default in GitHub.
However, in September 2019, GitHub dropped HCL as its default configuration language, so Popper is now using its own custom YAML-based workflow language, see Appendix \ref{appendix:popper} for more on Popper.
+\subsubsection{Nextflow (2013)}
+Nextflow\footnote{\inlinecode{\url{https://www.nextflow.io}}} \citeappendix{tommaso17} is a workflow language with a command-line interface that is written in Java.
-
+\subsubsection{Generic workflow specifications (CWL and WDL)}
+Due to the variety of custom workflows used in existing reproducibility solutions (like those of Appendix \ref{appendix:existingsolutions}), some attempts have been made to define common workflow standards like the Common Workflow Language (CWL\footnote{\inlinecode{\url{https://www.commonwl.org}}}, with roots in Make, formatted in YAML or JSON) and the Workflow Description Language (WDL\footnote{\inlinecode{\url{https://openwdl.org}}}, formatted in JSON).
+These are primarily specifications/standards rather than software, so ideally translators can be written between the various workflow systems to make them more interoperable.
\subsection{Editing steps and viewing results}
@@ -990,6 +1086,7 @@ Furthermore, they usually require a graphic user interface to run.
In summary, IDEs are generally very specialized tools for special projects, and are not a good solution when portability (the ability to run on different systems and at different times) is required.
\subsubsection{Jupyter}
+\label{appendix:jupyter}
Jupyter (initially IPython) \citeappendix{kluyver16} is an implementation of Literate Programming \citeappendix{knuth84}.
The main user interface is a web-based ``notebook'' that contains blobs of executable code and narrative.
Jupyter uses the custom built \inlinecode{.ipynb} format\footnote{\url{https://nbformat.readthedocs.io/en/latest}}.
@@ -1119,6 +1216,7 @@ This failure to communicate in the details is a very serious problem, leading to
\label{appendix:existingsolutions}
As reviewed in the introduction, the problem of reproducibility has received a lot of attention over the last three decades and various solutions have already been proposed.
+The core principles that many of the existing solutions (including Maneage) aim to achieve are nicely summarized by the FAIR principles \citeappendix{wilkinson16}.
In this appendix, some of the solutions are reviewed.
The solutions are based on an evolving software landscape, therefore they are ordered by date: when the project has a webpage, the year of its first release is used for the sorting, otherwise their paper's publication year is used.
@@ -1127,9 +1225,27 @@ Freedom of the software/method is a core concept behind scientific reproducibili
Therefore proprietary solutions like Code Ocean\footnote{\inlinecode{\url{https://codeocean.com}}} or Nextjournal\footnote{\inlinecode{\url{https://nextjournal.com}}} will not be reviewed here.
Other studies have also attempted to review existing reproducible solutions, for example \citeappendix{konkol20}.
+\subsection{Suggested rules, checklists, or criteria}
+Before going into the various implementations, it is also useful to review existing suggested rules, checklists or criteria for computationally reproducible research.
+
+All the cases below are primarily targeted at immediate reproducibility and don't consider longevity explicitly.
+Therefore, they lack a strong/clear completeness criterion: they mainly suggest recording versions, and their ultimate suggestion of storing the full binary OS in a VM or container is problematic (as mentioned in \ref{appendix:independentenvironment} and \citeappendix{oliveira18}).
+Sandve et al. \citeappendix{sandve13} propose ``ten simple rules for reproducible computational research'' that can be applied in any project.
+Generally, they are very similar to the criteria proposed here and follow a similar spirit, but they don't provide any actual research papers following all those points, or a proof of concept.
+The Popper convention \citeappendix{jimenez17} also provides a set of principles that are indeed generally useful and some are shared with the criteria here (for example automatic validation and, like Maneage, having a template for new users),
+but they don't include completeness or attention to longevity as mentioned above (Popper itself is written in Python with many dependencies, and its core operating language has already changed once).
+For more on Popper, please see Section \ref{appendix:popper}.
+For Jupyter notebook users, \citeappendix{rule19} propose ten rules to improve reproducibility and also provide links to example implementations.
+These rules can be very useful for users of Jupyter, but are not generic to any computational project.
+Some criteria (which are indeed very good in a more general context) don't directly relate to reproducibility, for example their Rule 1: ``Tell a Story for an Audience''.
+Generally, as reviewed in Sections \ref{sec:longevityofexisting} and \ref{appendix:jupyter}, Jupyter itself has many issues regarding reproducibility.
+To create Docker images, N\"ust et al. propose ``ten simple rules'' in \citeappendix{nust20}.
+They do recommend some practices that can indeed help increase the quality of Docker images and their production/usage, for example their rule 7 to ``mount datasets at run time'' to separate the computational environment from the data.
+However, as before, the long-term reproducibility of the images is not a concern; for example, they recommend identifying base operating systems only with a version tag like \inlinecode{ubuntu:18.04}, which was clearly shown to have longevity issues in Section \ref{sec:longevityofexisting}.
+Furthermore, in their proof of concept Dockerfile (listing 1), \inlinecode{rocker} is used with a tag (not a digest), which can be problematic (as shown in Section \ref{appendix:containers}).
\subsection{Reproducible Electronic Documents, RED (1992)}
\label{appendix:red}
@@ -1242,8 +1358,7 @@ Since XML is a plane text format, as the user inspects the data and makes change
.
However, even though XML is in plain text, it is very hard to edit manually.
VisTrails therefore provides a graphic user interface with a visual representation of the project's inter-dependent steps (similar to Figure \ref{fig:analysisworkflow}).
-Besides the fact that it is no longer maintained, the conceptual differences with the proposed template are substantial.
-The most important is that VisTrails doesn't control the software that is run, it only controls the sequence of steps that they are run in.
+Besides the fact that it is no longer maintained, VisTrails didn't control the software that is run; it only controlled the sequence of steps that they are run in.
@@ -1383,8 +1498,6 @@ In Maneage, instead of artificial/commented tags directly link the analysis inpu
-
-
\subsection{Sumatra (2012)}
Sumatra\footnote{\inlinecode{\url{http://neuralensemble.org/sumatra}}} \citeappendix{davison12} attempts to capture the environment information of a running project.
It is written in Python and is a command-line wrapper over the analysis script.
@@ -1420,14 +1533,39 @@ The important thing is that the research object concept is not specific to any s
Sciunit\footnote{\inlinecode{\url{https://sciunit.run}}} \citeappendix{meng15} defines ``sciunit''s that keep the executed commands for an analysis and all the necessary programs and libraries that are used in those commands.
It automatically parses all the executables in the script, and copies them, and their dependency libraries (down to the C library), into the sciunit.
Because the sciunit contains all the programs and necessary libraries, it's possible to run it readily on other systems that have a similar CPU architecture.
+Sciunit was originally written in Python 2 (which reached its end-of-life on January 1st, 2020).
+Therefore Sciunit2 is a new implementation in Python 3.
-In our tests, Sciunit installed successfully, however we couldn't run it because of a dependency problem with the \inlinecode{tempfile} package (in the standard Python library).
-Sciunit is written in Python 2 (which reached its end-of-life in January 1st, 2020) and its last Git commit in its main branch is from June 2018 (+1.5 years ago).
-Recent activity in a \inlinecode{python3} branch shows that others are attempting to translate the code into Python 3 (the main author has graduated and apparently not working on Sciunit anymore).
-
-Because we weren't able to run it, the following discussion will just be theoretical.
The main issue with Sciunit's approach is that the copied binaries are just black boxes: it is not possible to see how the used binaries from the initial system were built.
-This is a major problem for scientific projects, in principle (not knowing how they programs were built) and practice (archiving a large volume sciunit for every step of an analysis requires a lot of space).
+This is a major problem for scientific projects, both in principle (not knowing how the programs were built) and in practice (archiving a large-volume sciunit for every step of an analysis requires a lot of storage space).
+
+
+
+
+
+\subsection{Umbrella (2015)}
+Umbrella \citeappendix{meng15b} is a high-level wrapper script for isolating the environment of an analysis.
+The user specifies the necessary operating system, packages and analysis steps in various JSON files.
+Umbrella will then study the host operating system and the various necessary inputs (including data and software), through a process similar to Sciunit mentioned above, to find the best environment isolator (for example Linux containerization or VMs).
+We couldn't find a URL to the source software of Umbrella (no source code repository is mentioned in the papers we reviewed), but from the descriptions in \citeappendix{meng17}, it is written in Python 2.6 (which is now deprecated).
+
+
+
+
+
+\subsection{ReproZip (2016)}
+ReproZip\footnote{\inlinecode{\url{https://www.reprozip.org}}} \citeappendix{chirigati16} is a Python package that is designed to automatically track all the necessary data files, libraries and environment variables into a single bundle.
+The tracking is done at the kernel system-call level, so any file that is accessed during the running of the project is identified.
+The tracked files can be packaged into a \inlinecode{.rpz} bundle that can then be unpacked into another system.
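+For example, a typical usage is sketched below (the command names follow ReproZip's documentation; the script and bundle names are hypothetical):
+\begin{verbatim}
+# Trace the analysis at the system-call level, recording every
+# file, library and environment variable that is accessed.
+reprozip trace ./analysis.sh
+
+# Package everything that was traced into a single bundle.
+reprozip pack analysis.rpz
+
+# On another machine (with a similar CPU architecture), unpack
+# and re-run it, for example through Docker.
+reprounzip docker setup analysis.rpz ./analysis-unpacked
+reprounzip docker run ./analysis-unpacked
+\end{verbatim}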
+
+ReproZip is therefore very good for taking a ``snapshot'' of the running environment in a single file.
+However, the bundle can become very large when large or many datasets are used, or when the software environment is complex (many dependencies).
+Since it copies the binary software libraries, it can only be run on systems with a similar CPU architecture to the original.
+Furthermore, ReproZip just copies the binary/compiled files used in a project; it has no way to know how that software was built.
+As mentioned in this paper, and also in \citeappendix{oliveira18}, the question of ``how'' the environment was built is critical to understanding the results; simply having the binaries is not necessarily useful.
+
+For the data, it is similarly not possible to extract which data server it came from.
+Hence two projects that each use the same 1-terabyte dataset will each need a full copy of that 1-terabyte file in their bundle, making long-term preservation extremely expensive.
@@ -1459,7 +1597,6 @@ However, there is one directory which can be used to store files that must not b
-
\subsection{Popper (2017)}
\label{appendix:popper}
Popper\footnote{\inlinecode{\url{https://falsifiable.us}}} is a software implementation of the Popper Convention \citeappendix{jimenez17}.
@@ -1471,23 +1608,52 @@ This is an important issue when low-level choices are based on service providers
To start a project, the \inlinecode{popper} command-line program builds a template, or ``scaffold'', which is a minimal set of files that can be run.
However, as of this writing, the scaffold isn't complete: it lacks a manuscript and validation of outputs (as mentioned in the convention).
-By default Popper runs in a Docker image (so root permissions are necessary), but Singularity is also supported.
+By default Popper runs in a Docker image (so root permissions are necessary, and the reproducibility issues with Docker images discussed above also apply), but Singularity is also supported.
See Appendix \ref{appendix:independentenvironment} for more on containers, and Appendix \ref{appendix:highlevelinworkflow} for using high-level languages in the workflow.
+Ignoring the failure to comply with the criteria of completeness, minimal complexity and inclusion of the narrative, the scaffold provided by Popper is an output of the program that is not directly under version control.
+Hence tracking future changes in Popper and how they relate to the high-level projects that depend on it will be very hard.
+In Maneage, the same \inlinecode{maneage} Git branch is shared by the developers and users; any new feature or change in Maneage can thus be directly tracked with Git when the high-level project merges its branch with Maneage.
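+For example, assuming the project's own work is on a branch called \inlinecode{project}, the following minimal sketch is all that is needed to import the latest Maneage infrastructure while keeping the full history of both branches:
+\begin{verbatim}
+git checkout maneage
+git pull                 # bring in the new Maneage commits
+git checkout project
+git merge maneage        # import them into the project's history
+\end{verbatim}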
-
-
-\subsection{Whole Tale (2019)}
+\subsection{Whole Tale (2017)}
\label{appendix:wholetale}
-Whole Tale (\url{https://wholetale.org}) is a web-based platform for managing a project and organizing data provenance, see \citeappendix{brinckman19}
+Whole Tale\footnote{\inlinecode{\url{https://wholetale.org}}} is a web-based platform for managing a project and organizing data provenance, see \citeappendix{brinckman17}.
It uses online editors like Jupyter or RStudio (see Appendix \ref{appendix:editors}) that are encapsulated in a Docker container (see Appendix \ref{appendix:independentenvironment}).
The web-based nature of Whole Tale's approach, and its dependency on many tools (which have many dependencies themselves) is a major limitation for future reproducibility.
For example, when following their own tutorial on ``Creating a new tale'', the provided Jupyter notebook could not be executed because of a dependency problem.
-This has been reported to the authors as issue 113\footnote{\url{https://github.com/whole-tale/wt-design-docs/issues/113}}, but as all the second-order dependencies evolve, its not hard to envisage such dependency incompatibilities being the primary issue for older projects on Whole Tale.
-Furthermore, the fact that a Tale is stored as a binary Docker container causes two important problems: 1) it requires a very large storage capacity for every project that is hosted there, making it very expensive to scale if demand expands. 2) It is not possible to see how the environment was built accurately (when the Dockerfile uses \inlinecode{apt}), for more on this, please see Appendix \ref{appendix:packagemanagement}.
+This was reported to the authors as issue 113\footnote{\inlinecode{\url{https://github.com/whole-tale/wt-design-docs/issues/113}}}, but as all the second-order dependencies evolve, it is not hard to envisage such dependency incompatibilities being the primary issue for older projects on Whole Tale.
+Furthermore, the fact that a Tale is stored as a binary Docker container causes two important problems:
+1) It requires a very large storage capacity for every project that is hosted there, making it very expensive to scale if demand expands.
+2) It is not possible to see how the environment was built accurately (when the Dockerfile uses \inlinecode{apt}).
+This issue with Whole Tale (and generally all other solutions that only rely on preserving a container/VM) was also mentioned in \citeappendix{oliveira18}; for more on this, please see Appendix \ref{appendix:packagemanagement}.
+
+
+
+
+
+\subsection{Occam (2018)}
+Occam\footnote{\inlinecode{\url{https://occam.cs.pitt.edu}}} \citeappendix{oliveira18} is a web-based application to preserve software and its execution.
+To achieve long-term reproducibility, Occam includes its own package manager (instructions to build software and their dependencies) to be in full control of the software build instructions, similar to Maneage.
+Besides Nix or Guix (which are primarily package managers that can also do job management), Occam has been the only solution in our survey here that attempts to be complete in this aspect.
+
+However, it is incomplete from the perspective of our requirements: it works within a Docker image (that requires root permissions) and currently only runs on Debian-based, Red Hat-based and Arch-based GNU/Linux operating systems, which respectively use the \inlinecode{apt}, \inlinecode{yum} or \inlinecode{pacman} package managers.
+It is also itself written in Python (version 3.4 or above), hence it is not clear how long it will remain usable, given the longevity issues of such high-level languages discussed in Section \ref{sec:longevityofexisting}.
+
+Furthermore, it also violates our complexity criterion because the instructions to build the software, their versions, etc., are not immediately viewable or modifiable by the user.
+Occam contains its own JSON database for this information, which has to be parsed with its own custom program.
+The analysis phase of Occam is also done through a web-based drag-and-drop graphic user interface (similar to Taverna, Appendix \ref{appendix:taverna}).
+All the connections between various phases of an analysis need to be pre-defined in a JSON file and manually linked in the GUI.
+Hence for complex data analysis operations which involve thousands of steps, it is not scalable.
+
+
+
+
+
+
+
diff --git a/peer-review/1-answer.txt b/peer-review/1-answer.txt
new file mode 100644
index 0000000..76244bc
--- /dev/null
+++ b/peer-review/1-answer.txt
@@ -0,0 +1,1040 @@
+1. [EiC] Some reviewers request additions, and overview of other
+ tools.
+
+ANSWER: Indeed, there is already a large body of work on various issues that
+have been touched upon in this paper. Before submitting the paper, we had
+already done a very comprehensive review of the tools (as you may notice
+from the Git repository[1]). However, the CiSE Author Information
+explicitly states: "The introduction should provide a modicum of background
+in one or two paragraphs, but should not attempt to give a literature
+review". This is also practiced in previously published papers at CiSE and
+is in line with the very limited word-count and maximum of 12 references to
+be used in bibliography.
+
+We were also eager to get that extensive review out (which took a lot of
+time, and most of the tools were actually run and tested). Hence we
+discussed this privately with the editors and this solution was agreed
+upon: we include that extended review as appendices on the arXiv[2] and
+Zenodo[3] pre-prints of this paper and mention those publicly available
+appendices in the submitted paper for an interested reader to follow up.
+
+[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/tex/src/paper-long.tex#L1579
+[2] https://arxiv.org/abs/2006.03018
+[3] https://doi.org/10.5281/zenodo.3872247
+
+------------------------------
+
+
+
+
+
+2. [Associate Editor] There are general concerns about the paper
+ lacking focus
+
+ANSWER:
+
+------------------------------
+
+
+
+
+
+3. [Associate Editor] Some terminology is not well-defined
+ (e.g. longevity).
+
+ANSWER: It has now been clearly defined in the first paragraph of Section
+II. With this definition, the main argument of the paper is much clearer;
+thank you (and the referees) for highlighting this.
+
+------------------------------
+
+
+
+
+
+4. [Associate Editor] The discussion of tools could benefit from some
+ categorization to characterize their longevity.
+
+ANSWER: The longevity of the general tools reviewed in Section II is now
+mentioned immediately after each (highlighted in green).
+
+------------------------------
+
+
+
+
+
+5. [Associate Editor] Background and related efforts need significant
+ improvement. (See below.)
+
+ANSWER: This has been done, as mentioned in (1).
+
+------------------------------
+
+
+
+
+
+6. [Associate Editor] There is consistency among the reviews that
+ related work is particularly lacking.
+
+ANSWER: This has been done, as mentioned in (1).
+
+------------------------------
+
+
+
+
+
+7. [Associate Editor] The current work needs to do a better job of
+ explaining how it deals with the nagging problem of running on CPU
+ vs. different architectures.
+
+ANSWER: The CPU architecture of the running system is now reported in the
+"Acknowledgments" section and a description of the problem and its solution
+in Maneage is also added in the "Proof of concept: Maneage" Section.
+
+------------------------------
+
+
+
+
+
+8. [Associate Editor] At least one review commented on the need to
+ include a discussion of continuous integration (CI) and its
+ potential to help identify problems running on different
+ architectures. Is CI employed in any way in the work presented in
+ this article?
+
+ANSWER: CI has been added in the discussion as one solution to find
+breaking points in operating system updates and new/different
+architectures. For the core Maneage branch, we have defined task #15741 [1]
+to add CI on many architectures in the near future.
+
+[1] http://savannah.nongnu.org/task/?15741
+
+------------------------------
+
+
+
+
+
+9. [Associate Editor] The presentation of the Maneage tool is both
+ lacking in clarity and consistency with the public
+ information/documentation about the tool. While our review focus
+ is on the article, it is important that readers not be confused
+ when they visit your site to use your tools.
+
+###########################
+ANSWER [NOT COMPLETE]: We should separate the various sections of the
+README-hacking.md webpage into smaller pages that can be entered.
+###########################
+
+------------------------------
+
+
+
+
+
+10. [Associate Editor] A significant question raised by one review is
+ how this work compares to "executable" papers and Jupyter
+ notebooks. Does this work embody similar/same design principles
+ or expand upon the established alternatives? In any event, a
+ discussion of this should be included in background/motivation and
+ related work to help readers understand the clear need for a new
+ approach, if this is being presented as new/novel.
+
+ANSWER: Thank you for highlighting this important point. We saw that it is
+necessary to contrast our proof-of-concept demonstration (Maneage) more
+directly with such approaches. Two paragraphs have been added in Sections
+II and IV for this.
+
+------------------------------
+
+
+
+
+
+11. [Reviewer 1] Adding an explicit list of contributions would make
+ it easier to the reader to appreciate these. These are not
+ mentioned/cited and are highly relevant to this paper (in no
+ particular order):
+ 1. Git flows, both in general and in particular for research.
+ 2. Provenance work, in general and with git in particular
+ 3. Reprozip: https://www.reprozip.org/
+ 4. OCCAM: https://occam.cs.pitt.edu/
+ 5. Popper: http://getpopper.io/
+ 6. Whole Tale: https://wholetale.org/
+ 7. Snakemake: https://github.com/snakemake/snakemake
+ 8. CWL https://www.commonwl.org/ and WDL https://openwdl.org/
+ 9. Nextflow: https://www.nextflow.io/
+ 10. Sumatra: https://pythonhosted.org/Sumatra/
+ 11. Podman: https://podman.io
+ 12. AppImage (https://appimage.org/)
+ 13. Flatpack (https://flatpak.org/)
+ 14. Snap (https://snapcraft.io/)
+ 15. nbdev https://github.com/fastai/nbdev and jupytext
+ 16. Bazel: https://bazel.build/
+ 17. Debian reproducible builds: https://wiki.debian.org/ReproducibleBuilds
+
+ANSWER:
+
+1. In Section IV, we have added that "Generally, any git flow (branching
+ strategies) can be used by the high-level project authors or future
+ readers."
+2. We have mentioned research objects as one mode of provenance tracking.
+   The related provenance work that has already been done (and that can be
+   exploited using these criteria and our proof of concept) is indeed very
+   large. However, the 6250 word-count limit is very tight and if we added
+   more on it at this length, we would have to remove more directly
+   relevant points. Hopefully this can be the subject of a follow-up
+ paper.
+3. A review of ReproZip is in Appendix B.
+4. A review of Occam is in Appendix B.
+5. A review of Popper is in Appendix B.
+6. A review of Whole tale is in Appendix B.
+7. A review of Snakemake is in Appendix A.
+8. CWL and WDL are described in Appendix A (job management).
+9. Nextflow is described in Appendix A (job management).
+10. Sumatra is described in Appendix B.
+11. Podman is mentioned in Appendix A (containers).
+12. AppImage is mentioned in Appendix A (package management).
+13. Flatpak is mentioned in Appendix A (package management).
+14. nbdev and jupytext are high-level tools to generate documentation and
+    package custom code in Conda or PyPI. High-level package managers
+    like Conda and PyPI have already been thoroughly reviewed in Appendix A
+ for their longevity issues, so we feel there is no need to include
+ these.
+15. Bazel has been mentioned in Appendix A (job management).
+16. Debian's reproducible builds effort is only for ensuring that software
+    packaged for Debian is bitwise reproducible. As mentioned in the
+    discussion of this paper, the bitwise reproducibility of software is
+    not an issue in the context discussed here; the reproducibility of the
+ relevant output data of the software is the main issue.
+
+
+------------------------------
+
+
+
+
+
+12. [Reviewer 1] Existing guidelines similar to the proposed "Criteria
+ for longevity". Many articles of these in the form "10 simple
+ rules for X", for example (not exhaustive list):
+ * https://doi.org/10.1371/journal.pcbi.1003285
+ * https://arxiv.org/abs/1810.08055
+ * https://osf.io/fsd7t/
+ * A model project for reproducible papers: https://arxiv.org/abs/1401.2000
+ * Executable/reproducible paper articles and original concepts
+
+ANSWER: Thank you for highlighting these points. Appendix B starts with a
+subsection titled "suggested rules, checklists or criteria" that reviews
+existing criteria, including the sources proposed here (and others).
+
+arXiv:1401.2000 has been added in Appendix A as an example paper using
+virtual machines. We thank the referee for bringing up this paper, because
+the link to the VM provided in the paper no longer works (the file has been
+removed on the server). Therefore, added alongside SHARE, it very nicely
+highlights our main issue with binary containers or VMs and their lack of
+longevity.
+
+------------------------------
+
+
+
+
+
+13. [Reviewer 1] Several claims in the manuscript are not properly
+ justified, neither in the text nor via citation. Examples (not
+ exhaustive list):
+ 1. "it is possible to precisely identify the Docker “images” that
+ are imported with their checksums, but that is rarely practiced
+ in most solutions that we have surveyed [which ones?]"
+ 2. "Other OSes [which ones?] have similar issues because pre-built
+ binary files are large and expensive to maintain and archive."
+ 3. "Researchers using free software tools have also already had
+ some exposure to it"
+ 4. "A popular framework typically falls out of fashion and
+ requires significant resources to translate or rewrite every
+ few years."
+
+ANSWER: They have been clarified in the highlighted parts of the text:
+
+1. Many examples have been given throughout the newly added appendices. To
+ avoid confusion in the main body of the paper, we have removed the "we
+ have surveyed" part. It is already mentioned above it that a large
+ survey of existing methods/solutions is given in the appendices.
+
+2. Due to the thorough discussion of this issue in the appendices with
+ precise examples, this line has been removed to allow space for the
+ other points raised by the referees. The main point (high cost of
+   keeping binaries) is already abundantly clear.
+
+ On a similar topic, Dockerhub's recent announcement that inactive images
+ (for over 6 months) will be deleted has also been added. The announcemnt
+ URL is here (it was too long to include in the paper, if IEEE has a
+ special short-url format, we can add it):
+ https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates
+
+3. A small statement has been added, reminding the readers that almost all
+ free software projects are built with Make (note that CMake is just a
+ high-level wrapper over Make: it finally produces a 'Makefile').
+
+4. The example of Python 2 has been added.
+
+
+------------------------------
+
+
+
+
+
+14. [Reviewer 1] As mentioned in the discussion by the authors, not
+ even Bash, Git or Make is reproducible, thus not even Maneage can
+ address the longevity requirements. One possible alternative is
+ the use of CI to ensure that papers are re-executable (several
+ papers have been written on this topic). Note that CI is
+ well-established technology (e.g. Jenkins is almost 10 years old).
+
+ANSWER: Thank you for raising this issue. We had initially planned to add
+this issue also, but like many discussion points, we were forced to remove
+it before the first submission due to the very tight word-count limit. We
+have now added a sentence on CI in the discussion.
+
+On the initial note, indeed, the "executable" files of Bash, Git or Make
+are not bitwise reproducible/identical on different systems. However, as
+mentioned in the discussion, we are concerned with the _output_ of the
+software's executable file, _after_ the execution of its job. We (or any
+user of Bash) are not interested in the executable file itself. The
+reproducibility of the binary file only becomes important if a bug is found
+(very rare for common usage in such core software of the OS). Hence even
+though the compiled binary files of specific versions of Git, Bash or Make
+will not be bitwise reproducible/identical on different systems, their
+outputs are exactly reproducible: 'git describe' or Bash's 'for' loop will
+have the same output on GNU/Linux, macOS or FreeBSD (that produce bit-wise
+different executables).
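+For example (a trivial illustration of this point):
+
+  # The Bash binary differs bit-wise between GNU/Linux, macOS and
+  # FreeBSD, but the output of this loop is byte-for-byte identical
+  # on all of them:
+  for i in 1 2 3; do echo "line $i"; done
+
+  # Likewise, 'git describe' prints the same string for the same
+  # commit, independent of how the Git binary itself was compiled:
+  git describe --always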
+
+------------------------------
+
+
+
+
+
+15. [Reviewer 1] Criterion has been proposed previously. Maneage itself
+ provides little novelty (see comments below).
+
+ANSWER: The previously suggested criteria that were mentioned are reviewed
+in the newly added Appendix B, and the novelty/necessity of the proposed
+criteria is shown by comparison there.
+
+------------------------------
+
+
+
+
+
+16. [Reviewer 2] Authors should add indication that using good practices it
+ is possible to use Docker or VM to obtain identical OS usable for
+ reproducible research.
+
+ANSWER: In the submitted version we had stated that "Ideally, it is
+possible to precisely identify the Docker “images” that are imported with
+their checksums ...". But to be more clear and directly to the point, it
+has been edited to explicitly say "... to recreate an identical OS image
+later".
+
+------------------------------
+
+
+
+
+
+17. [Reviewer 2] The CPU architecture of the platform used to run the
+ workflow is not discussed in the manuscript. Authors should probably
+ take into account the architecture used in their workflow or at least
+ report it.
+
+ANSWER: Thank you very much for raising this important point. We hadn't
+seen other reproducibility papers mention this important point and missed
+it. In the acknowledgments (where we also mention the commit hashes) we now
+explicitly mention the exact CPU architecture used to build this paper:
+"This project was built on an x86_64 machine with Little Endian byte-order
+and address sizes 39 bits physical, 48 bits virtual.". This is because we
+have already seen cases where the architecture is the same, but programs
+fail because of the byte-order.
+
+Generally, Maneage will now extract this information from the running
+system during its configuration phase and provide the users with three
+different LaTeX macros that they can use anywhere in their paper.
+
+------------------------------
+
+
+
+
+
+18. [Reviewer 2] I don’t understand the "no dependency beyond
+ POSIX". Authors should more explained what they mean by this sentence.
+
+ANSWER: This has been clarified with the short extra statement "a minimal
+Unix-like standard that is shared between many operating systems". We would
+have liked to explain this more, but the word-limit is very constraining.
+
+------------------------------
+
+
+
+
+
+19. [Reviewer 2] Unfortunately, sometime we need proprietary or specialized
+ software to read raw data... For example in genetics, micro-array raw
+ data are stored in binary proprietary formats. To convert this data
+ into a plain text format, we need the proprietary software provided
+ with the measurement tool.
+
+ANSWER: Thank you very much for this good point. A description of a
+possible solution to this has been added after criterion 8.
+
+------------------------------
+
+
+
+
+
+20. [Reviewer 2] I was not able to properly set up a project with
+ Maneage. The configuration step failed during the download of tools
+ used in the workflow. This is probably due to a firewall/antivirus
+ restriction out of my control. How frequent this failure happen to
+ users?
+
+ANSWER: Thank you for mentioning this. This has been fixed by archiving all
+Maneage'd software on Zenodo (https://doi.org/10.5281/zenodo.3883409) and
+also downloading from there.
+
+Until recently we would directly access each software's own webpage to
+download the files, and this caused many problems like this. In other
+cases, we were very frustrated when a software's webpage would temporarily
+be unavailable (for maintenance reasons), which wouldn't allow us to build
+new projects.
+
+Since all the software are free, we are allowed to re-distribute them and
+Zenodo is designed for long-term archival of academic artifacts, so we
+figured that a software source code repository on Zenodo would be the most
+reliable solution. At configure time, Maneage now accesses Zenodo's DOI and
+resolves the most recent URL to automatically download any necessary
+software source code that the project needs from there.
+
+Generally, we also keep all software in a Git repository on our own
+webpage: http://git.maneage.org/tarballs-software.git/tree. Maneage users
+can also specify their own custom URLs for downloading software, which
+will be given higher priority than Zenodo (useful for situations when a
+custom software is downloaded and built in a project branch, not the core
+'maneage' branch).
+
+------------------------------
+
+
+
+
+
+21. [Reviewer 2] The time to configure a new project is quite long because
+ everything needs to be compiled. Authors should compare the time
+ required to set up a project Maneage versus time used by other
+ workflows to give an indication to the readers.
+
+ANSWER: Thank you for raising this point. It takes about 1.5 hours to
+configure the default Maneage branch on an 8-core CPU (more than half of
+this time is devoted to GCC on GNU/Linux operating systems, and the
+building of GCC can optionally be disabled with the '--host-cc' option to
+significantly speed up the build when the host's GCC is
+similar). Furthermore, Maneage can be built within a Docker container.
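+For reference, the two commands a user runs are sketched below (the
+'--host-cc' option is the one mentioned above):
+
+  # Configure: build the project's full software environment from
+  # source; '--host-cc' uses the host's C compiler instead of
+  # building GCC, significantly reducing the configuration time.
+  ./project configure --host-cc
+
+  # Make: run the analysis and build the final paper (PDF).
+  ./project make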
+
+Generally, a paragraph has been added in Section IV on this issue (the
+build time and building within a Docker container). We have also defined
+task #15818 [1] to have our own core Docker image that is ready to build a
+Maneaged project and will be adding it shortly.
+
+[1] https://savannah.nongnu.org/task/index.php?15818
+
+------------------------------
+
+
+
+
+
+22. [Reviewer 3] Authors should define their use of the term [Replicability
+ or Reproducibility] briefly for their readers.
+
+ANSWER: "Reproducibility" has been defined along with "Longevity" and
+"usage" at the start of Section II.
+
+------------------------------
+
+
+
+
+
+23. [Reviewer 3] The introduction is consistent with the proposal of the
+ article, but deals with the tools separately, many of which can be used
+ together to minimize some of the problems presented. The use of
+ Ansible, Helm, among others, also helps in minimizing problems.
+
+ANSWER: Ansible and Helm are primarily designed for distributed
+computing. For example Helm is just a high-level package manager for a
+Kubernetes cluster that is based on containers. A review of them can be
+added in the Appendix, but we feel they may not be too relevant for this
+paper.
+
+------------------------------
+
+
+
+
+
+24. [Reviewer 3] When the authors use the Python example, I believe it is
+ interesting to point out that today version 2 has been discontinued by
+ the maintaining community, which creates another problem within the
+ perspective of the article.
+
+ANSWER: Thank you very much for highlighting this point. It was not
+included for the sake of length, but it has now been fitted into the
+introduction.
+
+------------------------------
+
+
+
+
+
+25. [Reviewer 3] Regarding the use of VM's and containers, I believe that
+ the discussion presented by THAIN et al., 2015 is interesting to
+ increase essential points of the current work.
+
+ANSWER: Thank you very much for pointing out the works by Thain. We
+couldn't find any first-author papers in 2015, but found Meng & Thain
+(https://doi.org/10.1016/j.procs.2017.05.116) which had a related
+discussion of why they didn't use Docker containers in their work. That
+paper is now cited in the discussion of Containers in Appendix A.
+
+------------------------------
+
+
+
+
+
+26. [Reviewer 3] About the Singularity, the description article was missing
+ (Kurtzer GM, Sochat V, Bauer MW, 2017).
+
+ANSWER: Thank you for the reference, we could not put it in the main body
+of the paper (like many others) due to the strict bibliography limit of 12,
+but it has been cited in Appendix A (where we discuss Singularity).
+
+------------------------------
+
+
+
+
+
+27. [Reviewer 3] I also believe that a reference to FAIR is interesting
+ (WILKINSON et al., 2016).
+
+ANSWER: The FAIR principles have been mentioned in the main body of the
+paper, but unfortunately we had to remove its citation from the main paper
+(like many others) to stay within the maximum limit of 12 references. We
+have cited it in Appendix B.
+
+------------------------------
+
+
+
+
+
+28. [Reviewer 3] In my opinion, the paragraph on IPOL seems to be out of
+ context with the previous ones. This issue of end-to-end
+ reproducibility of a publication could be better explored, which would
+ further enrich the tool presented.
+
+#####################################
+ANSWER:
+#####################################
+
+------------------------------
+
+
+
+
+
+29. [Reviewer 3] On the project website, I suggest that the information
+ contained in README-hacking be presented on the same page as the
+ Tutorial. A topic breakdown is interesting, as the markdown reading may
+ be too long to find information.
+
+#####################################
+ANSWER:
+#####################################
+
+------------------------------
+
+
+
+
+
+31. [Reviewer 3] The tool is suitable for Unix users, keeping users away
+ from Microsoft environments.
+
+ANSWER: The issue of building on Windows has been discussed in Section IV,
+either using Docker (or VMs) or using the Windows Subsystem for Linux.
+
+------------------------------
+
+
+
+
+32. [Reviewer 3] Important references are missing; more references are
+ needed
+
+ANSWER: Two comprehensive Appendices have been added to address this issue.
+
+------------------------------
+
+
+
+
+
+33. [Reviewer 4] Revisit the criteria, show how you have come to decide on
+ them, give some examples of why they are important, and address
+ potential missing criteria.
+
+for example the referee already points to "how code is written" as a
+criteria (for example for threading or floating point errors), or
+"performance".
+
+#################################
+ANSWER:
+#################################
+
+------------------------------
+
+
+
+
+
+34. [Reviewer 4] Clarify the discussion of challenges to adoption and make
+ it clearer which tradeoffs are important to practitioners.
+
+##########################
+ANSWER:
+##########################
+
+------------------------------
+
+
+
+
+
+35. [Reviewer 4] Be clearer about which sorts of research workflow are best
+ suited to this approach.
+
+################################
+ANSWER:
+################################
+
+------------------------------
+
+
+
+
+
+36. [Reviewer 4] There is also the challenge of mathematical
+ reproducibility, particularly of the handling of floating point number,
+ which might occur because of the way the code is written, and the
+ hardware architecture (including if code is optimised / parallelised).
+
+################################
+ANSWER:
+################################
+
+------------------------------
+
+
+
+
+
+37. [Reviewer 4] Performance ... is never mentioned
+
+################################
+ANSWER:
+################################
+
+------------------------------
+
+38. [Reviewer 4] Tradeoff, which might affect Criterion 3 is time to result,
+ people use popular frameworks because it is easier to use them.
+
+################################
+ANSWER:
+################################
+
+------------------------------
+
+
+
+
+
+39. [Reviewer 4] I would liked to have seen explanation of how these
+ challenges to adoption were identified: was this anecdotal, through
+ surveys? participant observation?
+
+ANSWER: The results mentioned here are based on private discussions after
+holding multiple seminars and Webinars with RDA's support, and also a
+workshop that was planned for non-astronomers. We even invited (funded)
+early-career researchers to come to the workshop with the RDA funding;
+however, that workshop was cancelled due to the pandemic and we had
+private communications afterwards.
+
+We would very much like to elaborate on this experience of training new
+researchers with these tools. However, as with many of the cases above, the
+very strict word-limit doesn't allow us to elaborate beyond what is already
+there.
+
+------------------------------
+
+
+
+
+
+40. [Reviewer 4] Potentially an interesting sidebar to investigate how
+ LaTeX/TeX has ensured its longevity!
+
+##############################
+ANSWER:
+##############################
+
+------------------------------
+
+
+
+
+
+41. [Reviewer 4] The title is not specific enough - it should refer to the
+ reproducibility of workflows/projects.
+
+##############################
+ANSWER:
+##############################
+
+------------------------------
+
+
+
+
+
+42. [Reviewer 4] Whilst the thesis stated is valid, it may not be useful to
+ practitioners of computation science and engineering as it stands.
+
+ANSWER: We would appreciate it if you could clarify this point a little
+more. We have shown how it has already been used in many research projects
+(also outside of observational astronomy which is the first author's main
+background). It is precisely defined for computational science and
+engineering problems where _publication_ of the human-readable workflow
+source is also important.
+
+------------------------------
+
+
+
+
+
+43. [Reviewer 4] Longevity is not defined.
+
+ANSWER: It has been defined now at the start of Section II.
+
+------------------------------
+
+
+
+
+
+44. [Reviewer 4] Whilst various tools are discussed and discarded, no
+ attempt is made to categorise the magnitude of longevity for which they
+ are relevant. For instance, environment isolators are regarded by the
+ software preservation community as adequate for timescale of the order
+ of years, but may not be suitable for the timescale of decades where
+ porting and emulation are used.
+
+ANSWER: Statements on quantifying their longevity have been added in
+Section II. For example in the case of Docker images: "their longevity is
+determined by the host kernel, usually a decade", for Python packages:
+"Python installation with a usual longevity of a few years", for Nix/Guix:
+"with considerably better longevity; same as supported CPU architectures."
+
+------------------------------
+
+
+
+
+
+45. [Reviewer 4] The title of this section "Commonly used tools and their
+ longevity" is confusing - do you mean the longevity of the tools or the
+ longevity of the workflows that can be produced using these tools?
+ What happens if you use a combination of all four categories of tools?
+
+##########################
+ANSWER:
+##########################
+
+------------------------------
+
+
+
+
+
+46. [Reviewer 4] It wasn't clear to me if code was being run to generate
+ the results and figures in a LaTeX paper that is part of a project in
+ Maneage. It appears to be suggested this is the case, but Figure 1
+ doesn't show how this works - it just has the LaTeX files, the data
+ files and the Makefiles. Is it being suggested that LaTeX itself is the
+ programming language, using its macro functionality?
+
+ANSWER: Thank you for highlighting this point of confusion. The caption of
+Figure 1 has been edited to hopefully clarify the point. In short, the
+arrows represent the operation of software on their inputs (the file they
+originate from) to generate their outputs (the file they point to). In the
+case of generating 'paper.pdf' from its three dependencies
+('references.tex', 'paper.tex' and 'project.tex'), yes, LaTeX is used. But
+in other steps, other tools are used. For example as you see in [1] the
+main step of the arrow connecting 'table-3.txt' to 'tools-per-year.txt' is
+an AWK command (there are also a few 'echo' commands for metadata and
+copyright in the output plain-text file [2]).
+
+[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/reproduce/analysis/make/demo-plot.mk#L51
+[2] https://zenodo.org/record/3911395/files/tools-per-year.txt
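+To illustrate, the kind of commands in that Make rule is roughly the
+following (only a hypothetical sketch, not the actual recipe in [1]):
+
+  # 'echo' writes the metadata/comment lines of the output and AWK
+  # extracts the relevant columns from the input table.
+  echo "# Column 1: Year"            >  tools-per-year.txt
+  echo "# Column 2: Number of tools" >> tools-per-year.txt
+  awk '!/^#/{print $1, $2}' table-3.txt >> tools-per-year.txt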
+
+------------------------------
+
+
+
+
+
+47. [Reviewer 4] I was a bit confused on how collaboration is handled as
+ well - this appears to be using the Git branching model, and the
+ suggestion that Maneage is keeping track of all components from all
+ projects - but what happens if you are working with collaborators that
+ are using their own Maneage instance?
+
+ANSWER: Indeed, Maneage operates based on the Git branching model. As
+mentioned in the text, Maneage is itself a Git branch. People create their
+own branch from the 'maneage' branch and start customizing it for their
+particular project in their own particular repository. They can also use
+all types of Git-based collaborating models to work together on a project
+that is not yet finished.
+
+Figure 2 in fact explicitly shows such a case: the main project leader is
+committing on the "project" branch. But a collaborator creates a separate
+branch over commit '01dd812' and makes a couple of commits ('f69e1f4' and
+'716b56b'), and finally asks the project leader to merge them into the
+project. This can be generalized to any Git-based collaboration model.
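+In terms of Git commands, the scenario of Figure 2 roughly corresponds to
+the following sketch (the remote name 'my-fork' is hypothetical):
+
+  # Collaborator: branch off of commit '01dd812', add commits
+  # (e.g. 'f69e1f4' and '716b56b') and push the branch for review.
+  git checkout -b my-addition 01dd812
+  git commit -a -m "First change"
+  git commit -a -m "Second change"
+  git push my-fork my-addition
+
+  # Project leader: fetch that branch and merge it into 'project'.
+  git checkout project
+  git fetch my-fork my-addition
+  git merge FETCH_HEAD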
+
+------------------------------
+
+
+
+
+
+48. [Reviewer 4] I would also liked to have seen a comparison between this
+ approach and other "executable" paper approaches e.g. Jupyter
+ notebooks, compared on completeness, time taken to write a "paper",
+ ease of depositing in a repository, and ease of use by another
+ researcher.
+
+#######################
+ANSWER:
+#######################
+
+------------------------------
+
+
+
+
+
+49. [Reviewer 4] The weakest aspect is the assumption that research can be
+ easily compartmentalized into simple and complete packages. Given that
+ so much of research involves collaboration and interaction, this is not
+ sufficiently addressed. In particular, the challenge of
+ interdisciplinary work, where there may not be common languages to
+ describe concepts and there may be different common workflow practices
+ will be a barrier to wider adoption of the primary thesis and criteria.
+
+ANSWER: Maneage was precisely defined to address the problem of
+publishing/collaborating on complete workflows. Hopefully with the
+clarification to point 47 above, this should also become clear.
+
+------------------------------
+
+
+
+
+
+50. [Reviewer 5] Major figures currently working in this exact field do not
+ have their work acknowledged in this work.
+
+ANSWER: This was due to the strict word limit and the CiSE publication
+policy (to not include a literature review because there is a limit of only
+12 citations). But we had indeed done a comprehensive literature review and
+the editors kindly agreed that we publish that review as appendices to the
+main paper on arXiv and Zenodo.
+
+------------------------------
+
+
+
+
+
+51. [Reviewer 5] The popper convention: Making reproducible systems
+ evaluation practical ... and the later revision that uses GitHub
+ Actions, is largely the same as this work.
+
+ANSWER: This work and the proposed criteria are very different from
+Popper. A review of Popper has been given in Appendix B.
+
+------------------------------
+
+
+
+
+
+52. [Reviewer 5] The lack of attention to virtual machines and containers
+ is highly problematic. While a reader cannot rely on DockerHub or a
+ generic OS version label for a VM or container, these are some of the
+ most promising tools for offering true reproducibility.
+
+ANSWER: Containers and VMs have been more thoroughly discussed in the main
+body and also extensively discussed in Appendix A (which is now available
+in the arXiv and Zenodo versions of this paper). As discussed (with many
+cited examples), containers and VMs are only good when they are themselves
+reproducible (for example running the Dockerfile this year and next year
+gives the same internal environment). However we show that this is not the
+case in most solutions (a more comprehensive review would require its own
+paper).
+
+However with complete/robust environment builders like Maneage, Nix or GNU
+Guix, the analysis environment within a container can be exactly reproduced
+later. But even so, due to their binary nature and large storage volume,
+they are not trustworthy sources for the long term (it is expensive to archive
+them). We show several examples in the paper of how projects that relied on
+VMs in 2011 and 2014 are no longer active, and how even Dockerhub will be
+deleting containers that are not used for more than 6 months in free
+accounts (due to the large storage costs).
+
+------------------------------
+
+
+
+
+
+53. [Reviewer 5] On the data side, containers have the promise to manage
+ data sets and workflows completely [Lofstead J, Baker J, Younge A. Data
+ pallets: containerizing storage for reproducibility and
+ traceability. InInternational Conference on High Performance Computing
+ 2019 Jun 16 (pp. 36-45). Springer, Cham.] Taufer has picked up this
+ work and has graduated a MS student working on this topic with a
+ published thesis. See also Jimenez's P-RECS workshop at HPDC for
+ additional work highly relevant to this paper.
+
+ANSWER: Thank you for the interesting paper by Lofstead+2019 on Data
+pallets. We have cited it in Appendix A as examples of how generic the
+concept of containers is.
+
+The topic of linking data to analysis is also a core result of the criteria
+presented here, and is also discussed shortly in the paper. There are
+indeed many very interesting works on this topic. But the format of CiSE is
+very short (a maximum of ~6000 words with 12 references), so we don't have
+the space to go into this any further. But this is indeed a very
+interesting aspect for follow up studies, especially as the usage of
+Maneage increases, and we have more example workflows by users to study the
+linkage of data analysis.
+
+------------------------------
+
+
+
+
+
+54. [Reviewer 5] Some other systems that do similar things include:
+ reprozip, occam, whole tale, snakemake.
+
+ANSWER: All these tools have been reviewed in the newly added appendices.
+
+------------------------------
+
+
+
+
+
+55. [Reviewer 5] the paper needs to include the context of the current
+ community development level to be a complete research paper. A revision
+ that includes evaluation of (using the criteria) and comparison with
+ the suggested systems and a related work section that seriously
+ evaluates the work of the recommended authors, among others, would make
+ this paper worthy for publication.
+
+ANSWER: A thorough review of current low-level tools and high-level
+reproducible workflow management systems has been added in the extended
+Appendix.
+
+------------------------------
+
+
+
+
+
+56. [Reviewer 5] Offers criteria any system that offers reproducibility
+ should have.
+
+ANSWER:
+
+------------------------------
+
+
+
+
+
+57. [Reviewer 5] Yet another example of a reproducible workflows project.
+
+ANSWER: As the newly added thorough comparisons with existing systems
+show, this set of criteria and the proof of concept offer uniquely new
+features. As another referee summarized: "This manuscript describes a new
+reproducible workflow which doesn't require another new trendy high-level
+software. The proposed workflow is only based on low-level tools already
+widely known."
+
+The fact that we don't define yet another workflow language and framework
+and base the whole workflow on time-tested solutions in a framework that
+costs only ~100 kB to archive (in contrast to multi-GB containers or VMs)
+is new.
+
+------------------------------
+
+
+
+
+
+58. [Reviewer 5] There are numerous examples, mostly domain specific, and
+ this one is not the most advanced general solution.
+
+ANSWER: As the comparisons in the appendices and clarifications above show,
+there are many features in the proposed criteria and proof of concept that
+are new.
+
+------------------------------
+
+
+
+
+
+59. [Reviewer 5] Lack of context in the field missing very relevant work
+ that eliminates much, if not all, of the novelty of this work.
+
+ANSWER: The newly added appendices thoroughly describe the context and
+previous work that has been done in this field.
+
+------------------------------
diff --git a/peer-review/1-review.txt b/peer-review/1-review.txt
new file mode 100644
index 0000000..6c72f29
--- /dev/null
+++ b/peer-review/1-review.txt
@@ -0,0 +1,788 @@
+From: cise@computer.org
+To: mohammad@akhlaghi.org,
+ infantesainz@gmail.com,
+ boud@astro.uni.torun.pl,
+ david.valls-gabaud@observatoiredeparis.psl.eu,
+ rbaena@iac.es
+Received: Tue, 22 Sep 2020 15:28:21 -0400
+Subject: Computing in Science and Engineering, CiSESI-2020-06-0048
+ major revision required
+
+--------------------------------------------------
+
+Computing in Science and Engineering,CiSESI-2020-06-0048
+"Towards Long-term and Archivable Reproducibility"
+manuscript type: Reproducible Research
+
+Dear Dr. Mohammad Akhlaghi,
+
+The manuscript that you submitted to Computing in Science and Engineering
+has completed the review process. After carefully examining the manuscript
+and reviews, we have decided that the manuscript needs major revisions
+before it can be considered for a second review.
+
+Your revision is due before 22-Oct-2020. Please note that if your paper was
+submitted to a special issue, this due date may be different. Contact the
+peer review administrator, Ms. Jessica Ingle, at cise@computer.org if you
+have questions.
+
+The reviewer and editor comments are attached below for your
+reference. Please maintain our 6,250–word limit as you make your revisions.
+
+To upload your revision and summary of changes, log on to
+https://mc.manuscriptcentral.com/cise-cs, click on your Author Center, then
+"Manuscripts with Decisions." Under "Actions," choose "Create a Revision"
+next to the manuscript number.
+
+Highlight the changes to your manuscript by using the track changes mode in
+MS Word, the latexdiff package if using LaTex, or by using bold or colored
+text.
+
+When submitting your revised manuscript, you will need to respond to the
+reviewer comments in the space provided.
+
+If you have questions regarding our policies or procedures, please refer to
+the magazines' Author Information page linked from the Instructions and
+Forms (top right corner of the ScholarOne Manuscripts screen) or you can
+contact me.
+
+We look forward to receiving your revised manuscript.
+
+Sincerely,
+Dr. Lorena A. Barba
+George Washington University
+Mechanical and Aerospace Engineering
+Editor-in-Chief, Computing in Science and Engineering
+
+--------------------------------------------------
+
+
+
+
+
+EiC comments:
+Some reviewers request additions, and overview of other tools, etc. In
+doing your revision, please remember space limitations: 6,250 words
+maximum, including all main body, abstract, keyword, bibliography (12
+references or less), and biography text. See "Write For Us" section of the
+website: https://www.computer.org/csdl/magazine/cs
+
+Comments of the Associate Editor: Associate Editor
+Comments to the Author: Thank to the authors for your submission to the
+Reproducible Research department.
+
+Thanks to the reviewers for your careful and thoughtful reviews. We would
+appreciate it if you can make your reports available and share the DOI as
+soon as possible, per our original invitation e-mail. We will follow up our
+original invitation to obtain your review DOI, if you have not already
+included it in your review comments.
+
+Based on the review feedback, there are a number of major issues that
+require attention and many minor ones as well. Please take these into
+account as you prepare your major revision for another round of
+review. (See the actual review reports for details.)
+
+1. In general, there are a number of presentation issues needing
+attention. There are general concerns about the paper lacking focus. Some
+terminology is not well-defined (e.g. longevity). In addition, the
+discussion of tools could benefit from some categorization to characterize
+their longevity. Background and related efforts need significant
+improvement. (See below.)
+
+2. There is consistency among the reviews that related work is particularly
+lacking and not taking into account major works that have been written on
+this topic. See the reviews for details about work that could potentially
+be included in the discussion and how the current work is positioned with
+respect to this work.
+
+3. The current work needs to do a better job of explaining how it deals
+with the nagging problem of running on CPU vs. different architectures. At
+least one review commented on the need to include a discussion of
+continuous integration (CI) and its potential to help identify problems
+running on different architectures. Is CI employed in any way in the work
+presented in this article?
+
+4. The presentation of the Maneage tool is both lacking in clarity and
+consistency with the public information/documentation about the tool. While
+our review focus is on the article, it is important that readers not be
+confused when they visit your site to use your tools.
+
+5. A significant question raised by one review is how this work compares to
+"executable" papers and Jupyter notebooks. Does this work embody
+similar/same design principles or expand upon the established alternatives?
+In any event, a discussion of this should be included in
+background/motivation and related work to help readers understand the clear
+need for a new approach, if this is being presented as new/novel.
+
+Reviews:
+
+Please note that some reviewers may have included additional comments in a
+separate file. If a review contains the note "see the attached file" under
+Section III A - Public Comments, you will need to log on to ScholarOne
+Manuscripts to view the file. After logging in, select the Author Center,
+click on the "Manuscripts with Decisions" queue and then click on the "view
+decision letter" link for this manuscript. You must scroll down to the very
+bottom of the letter to see the file(s), if any. This will open the file
+that the reviewer(s) or the Associate Editor included for you along with
+their review.
+
+--------------------------------------------------
+
+
+
+
+
+Reviewer: 1
+Recommendation: Author Should Prepare A Major Revision For A Second Review
+
+Comments:
+
+ * Adding an explicit list of contributions would make it easier to the
+ reader to appreciate these.
+
+ * These are not mentioned/cited and are highly relevant to this paper (in
+ no particular order):
+
+ * Git flows, both in general and in particular for research.
+ * Provenance work, in general and with git in particular
+ * Reprozip: https://www.reprozip.org/
+ * OCCAM: https://occam.cs.pitt.edu/
+ * Popper: http://getpopper.io/
+ * Whole Tale: https://wholetale.org/
+ * Snakemake: https://github.com/snakemake/snakemake
+ * CWL https://www.commonwl.org/ and WDL https://openwdl.org/
+ * Nextflow: https://www.nextflow.io/
+ * Sumatra: https://pythonhosted.org/Sumatra/
+ * Podman: https://podman.io
+ * AppImage (https://appimage.org/), Flatpack
+ (https://flatpak.org/), Snap (https://snapcraft.io/)
+ * nbdev https://github.com/fastai/nbdev and jupytext
+ * Bazel: https://bazel.build/
+ * Debian reproducible builds: https://wiki.debian.org/ReproducibleBuilds
+
+ * Existing guidelines similar to the proposed "Criteria for
+ longevity". Many articles of these in the form "10 simple rules for
+ X", for example (not exhaustive list):
+ * https://doi.org/10.1371/journal.pcbi.1003285
+ * https://arxiv.org/abs/1810.08055
+ * https://osf.io/fsd7t/
+
+ * A model project for reproducible papers: https://arxiv.org/abs/1401.2000
+
+ * Executable/reproducible paper articles and original concepts
+
+ * Several claims in the manuscript are not properly justified, neither in
+ the text nor via citation. Examples (not exhaustive list):
+
+ * "it is possible to precisely identify the Docker “images” that are
+ imported with their checksums, but that is rarely practiced in most
+ solutions that we have surveyed [which ones?]"
+
+ * "Other OSes [which ones?] have similar issues because pre-built
+ binary files are large and expensive to maintain and archive."
+
+ * "Researchers using free software tools have also already had some
+ exposure to it"
+
+ * "A popular framework typically falls out of fashion and requires
+ significant resources to translate or rewrite every few years."
+
+ * As mentioned in the discussion by the authors, not even Bash, Git or
+ Make is reproducible, thus not even Maneage can address the longevity
+ requirements. One possible alternative is the use of CI to ensure that
+ papers are re-executable (several papers have been written on this
+ topic). Note that CI is well-established technology (e.g. Jenkins is
+ almost 10 years old).
+
+Additional Questions:
+
+1. How relevant is this manuscript to the readers of this periodical?
+ Please explain your rating in the Detailed Comments section.: Very
+ Relevant
+
+2. To what extent is this manuscript relevant to readers around the world?:
+ The manuscript is of interest to readers throughout the world
+
+1. Please summarize what you view as the key point(s) of the manuscript and
+ the importance of the content to the readers of this periodical.: This
+ article introduces desiderata for long-term archivable reproduciblity
+ and presents Maneage, a system whose goal is to achieve these outlined
+ properties.
+
+2. Is the manuscript technically sound? Please explain your answer in the
+ Detailed Comments section.: Partially
+
+3. What do you see as this manuscript's contribution to the literature in
+ this field?: Presentation of Maneage
+
+4. What do you see as the strongest aspect of this manuscript?: A great
+ summary of Maneage, as well as its implementaiton.
+
+5. What do you see as the weakest aspect of this manuscript?: Criterion has
+ been proposed previously. Maneage itself provides little novelty (see
+ comments below).
+
+1. Does the manuscript contain title, abstract, and/or keywords?: Yes
+
+2. Are the title, abstract, and keywords appropriate? Please elaborate in
+ the Detailed Comments section.: Yes
+
+3. Does the manuscript contain sufficient and appropriate references
+ (maximum 12-unless the article is a survey or tutorial in scope)? Please
+ elaborate in the Detailed Comments section.: Important references are
+ missing; more references are needed
+
+4. Does the introduction clearly state a valid thesis? Please explain your
+ answer in the Detailed Comments section.: Could be improved
+
+5. How would you rate the organization of the manuscript? Please elaborate
+ in the Detailed Comments section.: Satisfactory
+
+6. Is the manuscript focused? Please elaborate in the Detailed Comments
+ section.: Satisfactory
+
+7. Is the length of the manuscript appropriate for the topic? Please
+ elaborate in the Detailed Comments section.: Satisfactory
+
+8. Please rate and comment on the readability of this manuscript in the
+ Detailed Comments section.: Easy to read
+
+9. Please rate and comment on the timeliness and long term interest of this
+ manuscript to CiSE readers in the Detailed Comments section. Select all
+ that apply.: Topic and content are of limited interest to CiSE readers.
+
+Please rate the manuscript. Explain your choice in the Detailed Comments
+section.: Good
+
+--------------------------------------------------
+
+
+
+
+
+Reviewer: 2
+Recommendation: Accept If Certain Minor Revisions Are Made
+
+Comments: https://doi.org/10.22541/au.159724632.29528907
+
+Operating System: The authors mention that Docker is usually used with an
+Ubuntu image, without specifying the version used. Even if users take care
+to pin the version, the image is updated monthly, so the image used will
+have different OS components depending on when it was generated. This
+difference in OS components will interfere with reproducibility. I agree
+with that, but I would like to add that this is a bad habit of users. It is
+possible to generate reproducible Docker images by building them from an
+ISO image of the OS. These ISO images are archived, at least for Ubuntu
+(http://old-releases.ubuntu.com/releases) and for Debian
+(https://cdimage.debian.org/mirror/cdimage/archive), thus allowing users to
+generate an OS with identical components. Combined with the
+snapshot.debian.org service, it is even possible to update a Debian release
+to a specific point in time, back to 2005 and with a precision of six
+hours. By combining an ISO image with the snapshot.debian.org service, it
+is possible to obtain an OS for Docker or for a VM with identical
+components, even if users have to use the package manager of the OS. The
+authors should indicate that, with good practices, it is possible to use
+Docker or a VM to obtain an identical OS usable for reproducible research.
+
+CPU architecture: The CPU architecture of the platform used to run the
+workflow is not discussed in the manuscript. During software integration
+in Debian, I have seen several pieces of software fail their unit tests
+due to different behavior of the software itself or of a library
+dependency. This unexpected behavior was only present on non-x86
+architectures, mainly because developers use an x86 machine for their
+development and tests. Bug or feature? I don’t know, but nowadays it is
+quite common to see computers with a non-x86 CPU. It would be annoying for
+the reproducibility step to fail because of a difference in CPU
+architecture. The authors should probably take the CPU architecture into
+account in their workflow, or at least report it.
+
+POSIX dependency: I don’t understand the "no dependency beyond
+POSIX". The authors should explain more clearly what they mean by this
+phrase. I completely agree that dependency hell must be avoided and that
+dependencies should be used sparingly. Unfortunately, we sometimes need
+proprietary or specialized software to read raw data. For example, in
+genetics, micro-array raw data are stored in binary proprietary formats.
+To convert these data into a plain-text format, we need the proprietary
+software provided with the measurement tool.
+
+Maneage: I was not able to properly set up a project with Maneage. The
+configuration step failed while downloading the tools used in the
+workflow. This is probably due to a firewall/antivirus restriction outside
+my control. How frequently does this failure happen to users? Moreover,
+the time to configure a new project is quite long because everything needs
+to be compiled. The authors should compare the time required to set up a
+project with Maneage against the time required by other workflows, to give
+readers an indication.
+
+Disclaimer: For the sake of transparency, it should be noted that I am
+involved in the development of Debian, so my comments are probably biased
+accordingly.
+
+Additional Questions:
+
+1. How relevant is this manuscript to the readers of this periodical?
+ Please explain your rating in the Detailed Comments section.: Relevant
+
+2. To what extent is this manuscript relevant to readers around the world?:
+ The manuscript is of interest to readers throughout the world
+
+1. Please summarize what you view as the key point(s) of the manuscript and
+ the importance of the content to the readers of this periodical.: The
+ authors describe briefly the history of solutions proposed by
+ researchers to generate reproducible workflows. Then, they report the
+   problems with the current tools used to tackle the reproducibility
+ problem. They propose a set of criteria to develop new reproducible
+ workflows and finally they describe their proof of concept workflow
+ called "Maneage". This manuscript could help researchers to improve
+ their workflow to obtain reproducible results.
+
+2. Is the manuscript technically sound? Please explain your answer in the
+ Detailed Comments section.: Yes
+
+3. What do you see as this manuscript's contribution to the literature in
+ this field?: The authors try to propose a simple answer to the
+ reproducibility problem by defining new criteria. They also propose a
+ proof of concept workflow which can be directly used by researchers for
+ their projects.
+
+4. What do you see as the strongest aspect of this manuscript?: This
+   manuscript describes a new reproducible workflow that doesn't require
+   yet another trendy high-level software package. The proposed workflow
+   is based only on widely known low-level tools. Moreover, the workflow
+   takes into account the versions of all software used in the chain of
+   dependencies.
+
+5. What do you see as the weakest aspect of this manuscript?: The authors
+   don't discuss the problem of reproducibility of results when analyses
+   are performed on CPUs with different architectures. Some libraries
+   behave differently when they run on different architectures, and this
+   could influence the final results. The authors are probably assuming
+   x86, but there is no reference to this at all in the manuscript.
+
+1. Does the manuscript contain title, abstract, and/or keywords?: Yes
+
+2. Are the title, abstract, and keywords appropriate? Please elaborate in
+ the Detailed Comments section.: Yes
+
+3. Does the manuscript contain sufficient and appropriate references
+ (maximum 12-unless the article is a survey or tutorial in scope)? Please
+ elaborate in the Detailed Comments section.: References are sufficient
+ and appropriate
+
+4. Does the introduction clearly state a valid thesis? Please explain your
+ answer in the Detailed Comments section.: Yes
+
+5. How would you rate the organization of the manuscript? Please elaborate
+ in the Detailed Comments section.: Satisfactory
+
+6. Is the manuscript focused? Please elaborate in the Detailed Comments
+ section.: Satisfactory
+
+7. Is the length of the manuscript appropriate for the topic? Please
+ elaborate in the Detailed Comments section.: Satisfactory
+
+8. Please rate and comment on the readability of this manuscript in the
+ Detailed Comments section.: Easy to read
+
+9. Please rate and comment on the timeliness and long term interest of this
+ manuscript to CiSE readers in the Detailed Comments section. Select all
+ that apply.: Topic and content are of immediate and continuing interest
+ to CiSE readers
+
+Please rate the manuscript. Explain your choice in the Detailed Comments
+section.: Good
+
+--------------------------------------------------
+
+
+
+
+
+Reviewer: 3
+Recommendation: Accept If Certain Minor Revisions Are Made
+
+Comments: Longevity of workflows in a project is one of the problems for
+reproducibility in different fields of computational research. Therefore, a
+proposal that seeks to guarantee this longevity becomes relevant for the
+entire community, especially when it is based on free software and is easy
+to access and implement.
+
+GOODMAN et al., 2016, BARBA, 2018, and PLESSER, 2018 observed in their
+research that the terms reproducibility and replicability are frequently
+found in the scientific literature, and that their interchangeable use
+ends up generating confusion when authors are not clear about which they
+mean. Thus, the authors should briefly define their use of the term for
+their readers.
+
+The introduction is consistent with the proposal of the article, but it
+deals with the tools separately, even though many of them can be used
+together to minimize some of the problems presented. The use of Ansible,
+Helm, and others also helps to minimize problems. Regarding the Python
+example, I believe it is worth pointing out that version 2 has now been
+discontinued by the maintaining community, which creates another problem
+from the perspective of the article. Regarding the use of VMs and
+containers, I believe the discussion presented by THAIN et al., 2015 would
+reinforce essential points of the current work. For Singularity, the
+description article is missing (Kurtzer GM, Sochat V, Bauer MW, 2017). I
+also believe that a reference to FAIR would be of interest (WILKINSON et
+al., 2016).
+
+In my opinion, the paragraph on IPOL seems to be out of context with the
+previous ones. This issue of end-to-end reproducibility of a publication
+could be better explored, which would further enrich the tool presented.
+
+The presentation of the longevity criteria was adequate in the context of
+the article and explored the points that were dealt with later.
+
+The presentation of the tool was consistent. On the project website, I
+suggest that the information contained in README-hacking be presented on
+the same page as the Tutorial. A breakdown by topic would be helpful, as
+the markdown file may be too long for readers to find information easily.
+
+Additional Questions:
+
+1. How relevant is this manuscript to the readers of this periodical?
+ Please explain your rating in the Detailed Comments section.: Relevant
+
+2. To what extent is this manuscript relevant to readers around the world?:
+ The manuscript is of interest to readers throughout the world
+
+1. Please summarize what you view as the key point(s) of the manuscript and
+ the importance of the content to the readers of this periodical.: In
+ this article, the authors discuss the problem of the longevity of
+ computational workflows, presenting what they consider to be criteria
+ for longevity and an implementation based on these criteria, called
+ Maneage, seeking to ensure a long lifespan for analysis projects.
+
+2. Is the manuscript technically sound? Please explain your answer in the
+ Detailed Comments section.: Yes
+
+3. What do you see as this manuscript's contribution to the literature in
+ this field?: In this article, the authors discuss the problem of the
+ longevity of computational workflows, presenting what they consider to
+ be criteria for longevity and an implementation based on these criteria,
+ called Maneage, seeking to ensure a long lifespan for analysis projects.
+
+ As a key point, the authors enumerate quite clear criteria that can
+ guarantee the longevity of projects and present a free software-based
+   way of achieving this objective. The method presented by the authors is
+   not easy to implement for end users with limited computing knowledge,
+   but it can easily be implemented by users with average knowledge in the
+   area.
+
+4. What do you see as the strongest aspect of this manuscript?: One of the
+ strengths of the manuscript is the implementation of Maneage entirely in
+ free software and the search for completeness presented in the
+ manuscript. The use of GNU software adds the guarantee of long
+ maintenance by one of the largest existing software communities. In
+ addition, the tool developed has already been tested in different
+ publications, showing itself consistent in different scenarios.
+
+5. What do you see as the weakest aspect of this manuscript?: For the
+ proper functioning of the proposed tool, the user needs prior knowledge
+   of LaTeX, Git, and the command line, which can keep inexperienced users
+   away. Likewise, the tool is aimed at Unix users, leaving aside users of
+   Microsoft environments.
+
+   Even though Unix-like environments are the majority in scientific
+   computing, many users in different areas still perform their analyses
+   on Windows computers or servers, with the assistance of package
+   managers.
+
+1. Does the manuscript contain title, abstract, and/or keywords?: Yes
+
+2. Are the title, abstract, and keywords appropriate? Please elaborate in
+ the Detailed Comments section.: Yes
+
+3. Does the manuscript contain sufficient and appropriate references
+ (maximum 12-unless the article is a survey or tutorial in scope)? Please
+ elaborate in the Detailed Comments section.: Important references are
+ missing; more references are needed
+
+4. Does the introduction clearly state a valid thesis? Please explain your
+ answer in the Detailed Comments section.: Could be improved
+
+5. How would you rate the organization of the manuscript? Please elaborate
+ in the Detailed Comments section.: Satisfactory
+
+6. Is the manuscript focused? Please elaborate in the Detailed Comments
+ section.: Could be improved
+
+7. Is the length of the manuscript appropriate for the topic? Please
+ elaborate in the Detailed Comments section.: Satisfactory
+
+8. Please rate and comment on the readability of this manuscript in the
+ Detailed Comments section.: Easy to read
+
+9. Please rate and comment on the timeliness and long term interest of this
+ manuscript to CiSE readers in the Detailed Comments section. Select all
+ that apply.: Topic and content are of immediate and continuing interest
+ to CiSE readers
+
+Please rate the manuscript. Explain your choice in the Detailed Comments
+section.: Excellent
+
+--------------------------------------------------
+
+
+
+
+
+Reviewer: 4
+Recommendation: Author Should Prepare A Major Revision For A Second Review
+
+Comments: Overall evaluation - Good.
+
+This paper is in scope, and the topic is of interest to the readers of
+CiSE. However, in its present form, I have concerns about whether the paper
+presents enough new contributions to the area in a way that can then be
+understood and reused by others. The main things I believe need addressing
+are: 1) Revisit the criteria, show how you have come to decide on them,
+give some examples of why they are important, and address potential missing
+criteria. 2) Clarify the discussion of challenges to adoption and make it
+clearer which tradeoffs are important to practitioners. 3) Be clearer about
+which sorts of research workflow are best suited to this approach.
+
+B2. Technical soundness: here I am discussing the soundness of the paper,
+rather than the soundness of the Maneage tool. There are some fundamental
+additional challenges to reproducibility that are not addressed. Although
+software library versions are addressed, there is also the challenge of
+mathematical reproducibility, particularly the handling of floating-point
+numbers, where differences can arise from the way the code is written and
+from the hardware architecture (including whether the code is optimised or
+parallelised). This could obviously be addressed through a criterion on
+how code is written, but that will also come with a tradeoff against
+performance, which is never mentioned. Another tradeoff, which might
+affect Criterion 3, is time to result: people use popular frameworks
+because it is easier to use them. Regarding the discussion, I would have
+liked to see an explanation of how these challenges to adoption were
+identified: was this anecdotal, through surveys, or through participant
+observation? As a side note on the technical aspects of Maneage: it uses
+LaTeX, which in turn is built on TeX, which has had many portability
+problems in the past due to being written using WEB / Tangle, though with
+web2c this is now largely resolved; a potentially interesting sidebar
+would be to investigate how LaTeX/TeX has ensured its longevity!
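+
+As a minimal illustration of the floating-point issue (summation order
+alone, with no change in software versions, can change the result; awk is
+used here only because it is ubiquitous):
+
+  # The two sums are mathematically equal but can differ in IEEE-754
+  # double precision because addition is not associative.
+  awk 'BEGIN { printf "%.17g\n%.17g\n", (0.1+0.2)+0.3, 0.1+(0.2+0.3) }'
+  # On a typical system this prints two slightly different values.
+
+Optimisation and parallelisation reorder exactly this kind of operation,
+which is why a criterion on how code is written would carry the
+performance tradeoff mentioned above.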
+
+C2. The title is not specific enough - it should refer to the
+reproducibility of workflows/projects.
+
+C4. As noted above, whilst the thesis stated is valid, it may not be useful
+to practitioners of computation science and engineering as it stands.
+
+C6. Manuscript focus. I would have liked a more focussed approach to the
+presentation of information in II. Longevity is not defined, and whilst
+various tools are discussed and discarded, no attempt is made to categorise
+the magnitude of longevity for which they are relevant. For instance,
+environment isolators are regarded by the software preservation community
+as adequate for timescales of the order of years, but may not be suitable
+for timescales of decades, for which porting and emulation are used. The
+title of this section "Commonly used tools and their longevity" is also
+confusing - do you mean the longevity of the tools or the longevity of the
+workflows that can be produced using these tools? What happens if you use a
+combination of all four categories of tools?
+
+C8. Readability. I found it difficult to follow the description of how
+Maneage works. It wasn't clear to me whether code is run to generate the
+results and figures in a LaTeX paper that is part of a Maneage project.
+That appears to be the suggestion, but Figure 1 doesn't show how this
+works: it just has the LaTeX files, the data files and the Makefiles. Is
+it being suggested that LaTeX itself is the programming language, using
+its macro functionality? I was also a bit confused about how collaboration
+is handled: this appears to use the Git branching model, with the
+suggestion that Maneage keeps track of all components from all projects,
+but what happens if you are working with collaborators that are using
+their own Maneage instance?
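+
+My working guess, from the Makefile fragments in the supplement, is that
+each analysis step ends by writing its outputs as LaTeX macros that
+paper.tex then uses; something along the lines of the following sketch
+(file and macro names here are only illustrative):
+
+  # Shell commands inside a Make recipe: compute a value, then export it
+  # to the paper as a LaTeX macro instead of hard-coding the number.
+  result=$(awk '{ s += $1 } END { print s }' input.txt)
+  printf '\\newcommand{\\myresult}{%s}\n' "$result" >> project.tex
+
+If that is indeed the mechanism, stating it explicitly and showing it in
+Figure 1 would make the description much easier to follow.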
+
+I would also have liked to see a comparison between this approach and
+other "executable" paper approaches, e.g. Jupyter notebooks, compared on
+completeness, time taken to write a "paper", ease of depositing in a
+repository, and ease of use by another researcher.
+
+Additional Questions:
+
+1. How relevant is this manuscript to the readers of this periodical?
+ Please explain your rating in the Detailed Comments section.: Relevant
+
+2. To what extent is this manuscript relevant to readers around the world?:
+ The manuscript is of interest to readers throughout the world
+
+1. Please summarize what you view as the key point(s) of the manuscript and
+ the importance of the content to the readers of this periodical.: This
+ manuscript discusses the challenges of reproducibility of computational
+ research workflows, suggests criteria for improving the "longevity" of
+ workflows, describes the proof-of-concept tool, Maneage, that has been
+ built to implement these criteria, and discusses the challenges to
+ adoption.
+
+ Of primary importance is the discussion of the challenges to adoption,
+ as CiSE is about computational science which does not take place in a
+ theoretical vacuum. Many of the identified challenges relate to the
+ practice of computational science and the implementation of systems in
+ the real world.
+
+2. Is the manuscript technically sound? Please explain your answer in the
+ Detailed Comments section.: Partially
+
+3. What do you see as this manuscript's contribution to the literature in
+ this field?: The manuscript makes a modest contribution to the
+ literature through the description of the proof-of-concept, in
+ particular its approach to integrating asset management, version control
+ and build and the discussion of challenges to adoption.
+
+ The proposed criteria have mostly been discussed at length in many other
+ works looking at computational reproducibility and executable papers.
+
+4. What do you see as the strongest aspect of this manuscript?: The
+ strongest aspect is the discussion of difficulties for widespread
+ adoption of this sort of approach. Because the proof-of-concept tool
+ received support through the RDA, it was possible to get feedback from
+ researchers who were likely to use it. This has highlighted and
+ reinforced a number of challenges and caveats.
+
+5. What do you see as the weakest aspect of this manuscript?: The weakest
+ aspect is the assumption that research can be easily compartmentalized
+ into simple and complete packages. Given that so much of research
+ involves collaboration and interaction, this is not sufficiently
+ addressed. In particular, the challenge of interdisciplinary work, where
+ there may not be common languages to describe concepts and there may be
+   different common workflow practices, will be a barrier to wider adoption
+ of the primary thesis and criteria.
+
+1. Does the manuscript contain title, abstract, and/or keywords?: Yes
+
+2. Are the title, abstract, and keywords appropriate? Please elaborate in
+ the Detailed Comments section.: No
+
+3. Does the manuscript contain sufficient and appropriate references
+ (maximum 12-unless the article is a survey or tutorial in scope)? Please
+ elaborate in the Detailed Comments section.: References are sufficient
+ and appropriate
+
+4. Does the introduction clearly state a valid thesis? Please explain your
+ answer in the Detailed Comments section.: Could be improved
+
+5. How would you rate the organization of the manuscript? Please elaborate
+ in the Detailed Comments section.: Satisfactory
+
+6. Is the manuscript focused? Please elaborate in the Detailed Comments
+ section.: Could be improved
+
+7. Is the length of the manuscript appropriate for the topic? Please
+ elaborate in the Detailed Comments section.: Satisfactory
+
+8. Please rate and comment on the readability of this manuscript in the
+ Detailed Comments section.: Readable - but requires some effort to
+ understand
+
+9. Please rate and comment on the timeliness and long term interest of this
+ manuscript to CiSE readers in the Detailed Comments section. Select all
+ that apply.: Topic and content are of immediate and continuing interest
+ to CiSE readers
+
+Please rate the manuscript. Explain your choice in the Detailed Comments
+section.: Good
+
+--------------------------------------------------
+
+
+
+
+
+Reviewer: 5
+Recommendation: Author Should Prepare A Major Revision For A Second Review
+
+Comments:
+
+Major figures currently working in this exact field do not have their work
+acknowledged in this work. In no particular order: Victoria Stodden,
+Michael Heroux, Michela Taufer, and Ivo Jimenez. All of these authors have
+multiple publications that are highly relevant to this paper. In the case
+of Ivo Jimenez, his Popper work [Jimenez I, Sevilla M, Watkins N, Maltzahn
+C, Lofstead J, Mohror K, Arpaci-Dusseau A, Arpaci-Dusseau R. The popper
+convention: Making reproducible systems evaluation practical. In2017 IEEE
+International Parallel and Distributed Processing Symposium Workshops
+(IPDPSW) 2017 May 29 (pp. 1561-1570). IEEE.] and the later revision that
+uses GitHub Actions, are largely the same as this work. The lack of
+attention to virtual machines and containers is highly problematic. While a
+reader cannot rely on DockerHub or a generic OS version label for a VM or
+container, these are some of the most promising tools for offering true
+reproducibility. On the data side, containers have the promise to manage
+data sets and workflows completely [Lofstead J, Baker J, Younge A. Data
+pallets: containerizing storage for reproducibility and
+traceability. In International Conference on High Performance Computing 2019
+Jun 16 (pp. 36-45). Springer, Cham.] Taufer has picked up this work and has
+graduated an MS student working on this topic with a published thesis. See
+also Jimenez's P-RECS workshop at HPDC for additional work highly relevant
+to this paper.
+
+Some other systems that do similar things include: reprozip, occam, whole
+tale, snakemake.
+
+While the work here is a good start, the paper needs to include the context
+of the current community development level to be a complete research
+paper. A revision that includes an evaluation of, and comparison with, the
+suggested systems (using the criteria), and a related-work section that
+seriously evaluates the work of the recommended authors, among others,
+would make this paper worthy of publication.
+
+Additional Questions:
+
+1. How relevant is this manuscript to the readers of this periodical?
+ Please explain your rating in the Detailed Comments section.: Very
+ Relevant
+
+2. To what extent is this manuscript relevant to readers around the world?:
+ The manuscript is of interest to readers throughout the world
+
+1. Please summarize what you view as the key point(s) of the manuscript and
+ the importance of the content to the readers of this periodical.: This
+   paper describes the Maneage system for reproducible workflows. It lays
+   out a bit of the need, has very limited related work, offers criteria
+   that any system aiming at reproducibility should have, and finally
+   describes how Maneage achieves these goals.
+
+2. Is the manuscript technically sound? Please explain your answer in the
+ Detailed Comments section.: Partially
+
+3. What do you see as this manuscript's contribution to the literature in
+ this field?: Yet another example of a reproducible workflows
+ project. There are numerous examples, mostly domain specific, and this
+ one is not the most advanced general solution.
+
+4. What do you see as the strongest aspect of this manuscript?: Working
+ code and published artifacts
+
+5. What do you see as the weakest aspect of this manuscript?: Lack of
+   context in the field, missing very relevant work that eliminates much, if
+ not all, of the novelty of this work.
+
+1. Does the manuscript contain title, abstract, and/or keywords?: Yes
+
+2. Are the title, abstract, and keywords appropriate? Please elaborate in
+ the Detailed Comments section.: Yes
+
+3. Does the manuscript contain sufficient and appropriate references
+ (maximum 12-unless the article is a survey or tutorial in scope)? Please
+ elaborate in the Detailed Comments section.: Important references are
+ missing; more references are needed
+
+4. Does the introduction clearly state a valid thesis? Please explain your
+ answer in the Detailed Comments section.: Could be improved
+
+5. How would you rate the organization of the manuscript? Please elaborate
+ in the Detailed Comments section.: Satisfactory
+
+6. Is the manuscript focused? Please elaborate in the Detailed Comments
+ section.: Could be improved
+
+7. Is the length of the manuscript appropriate for the topic? Please
+ elaborate in the Detailed Comments section.: Could be improved
+
+8. Please rate and comment on the readability of this manuscript in the
+   Detailed Comments section.: Easy to read
+
+9. Please rate and comment on the timeliness and long term interest of this
+ manuscript to CiSE readers in the Detailed Comments section. Select all
+ that apply.: Topic and content are likely to be of growing interest to
+ CiSE readers over the next 12 months
+
+Please rate the manuscript. Explain your choice in the Detailed Comments
+section.: Fair
diff --git a/project b/project
index 311c051..93c55e7 100755
--- a/project
+++ b/project
@@ -503,11 +503,13 @@ case $operation in
# example when running './project make clean' there isn't any
# 'paper.pdf').
if [ -f paper.pdf ]; then
- if type pdftotext > /dev/null 2>/dev/null; then
- numwords=$(pdftotext paper.pdf && cat paper.txt | wc -w)
- echo; echo "Number of words in full PDF: $numwords"
- rm paper.txt
- fi
+ if type pdftotext > /dev/null 2>/dev/null; then
+ numwords=$(pdftotext paper.pdf && cat paper.txt | wc -w)
+ numeff=$(echo $numwords | awk '{print $1-850}')
+ echo; echo "Number of words in full PDF: $numwords"
+ echo "No abstract, and figure captions: $numeff"
+ rm paper.txt
+ fi
fi
;;
diff --git a/reproduce/analysis/config/metadata.conf b/reproduce/analysis/config/metadata.conf
index 07a1145..a06b43c 100644
--- a/reproduce/analysis/config/metadata.conf
+++ b/reproduce/analysis/config/metadata.conf
@@ -10,7 +10,7 @@
# warranty.
# Project information
-metadata-title = Towards Long-term and Archivable Reproducibility
+metadata-title = Long-term and Archivable Reproducibility
# DOIs and identifiers.
metadata-arxiv = 2006.03018
diff --git a/reproduce/analysis/make/initialize.mk b/reproduce/analysis/make/initialize.mk
index 9ce157d..6431863 100644
--- a/reproduce/analysis/make/initialize.mk
+++ b/reproduce/analysis/make/initialize.mk
@@ -506,3 +506,9 @@ $(mtexdir)/initialize.tex: | $(mtexdir)
# case, we'll just return the string a clear string.
v=$$(git describe --always --long maneage) || v=maneage-ref-missing
echo "\newcommand{\maneageversion}{$$v}" >> $@
+
+ # Get the date of the most recent commit from the Maneage
+ # branch. Note that with '%aD', the format looks like this:
+ # Wed, 9 Sep 2020 10:08:20 +0200
+ v=$$(git show -s --format=%aD $$v | awk '{print $$2, $$3, $$4}')
+ echo "\newcommand{\maneagedate}{$$v}" >> $@
diff --git a/reproduce/analysis/make/paper.mk b/reproduce/analysis/make/paper.mk
index de7b87f..3bb6a8c 100644
--- a/reproduce/analysis/make/paper.mk
+++ b/reproduce/analysis/make/paper.mk
@@ -202,7 +202,7 @@ paper.pdf: $(mtexdir)/project.tex paper.tex $(texbdir)/paper.bbl
# conversion from PostScript to PDF, see
# https://www.ghostscript.com/doc/current/Language.htm#Transparency
dvips paper.dvi
- ps2pdf -dNOSAFER paper.ps
+ ps2pdf paper.ps
# Come back to the top project directory and copy the built PDF
# file here.
diff --git a/tex/src/preamble-project.tex b/tex/src/preamble-project.tex
index 6efdfd7..3d0c0ae 100644
--- a/tex/src/preamble-project.tex
+++ b/tex/src/preamble-project.tex
@@ -61,7 +61,7 @@
%% directly on the command-line with the '--highlight-new' or
%% '--highlight-notes' options
\ifdefined\highlightnew
-\newcommand{\new}[1]{\textcolor{green!60!black}{#1}}
+\newcommand{\new}[1]{\textcolor{green!50!black}{#1}}
\else
\newcommand{\new}[1]{\textcolor{black}{#1}}
\fi
diff --git a/tex/src/references.tex b/tex/src/references.tex
index 1ef3d7d..dc3b816 100644
--- a/tex/src/references.tex
+++ b/tex/src/references.tex
@@ -10,6 +10,50 @@
%% notice and this notice are preserved. This file is offered as-is,
%% without any warranty.
+
+
+
+
+@ARTICLE{peper20,
+ author = {{Peper}, Marius and {Roukema}, Boudewijn F.},
+ title = "{The role of the elaphrocentre in low surface brightness galaxy formation}",
+ journal = {arXiv e-prints},
+ keywords = {Astrophysics - Cosmology and Nongalactic Astrophysics, Astrophysics - Astrophysics of Galaxies},
+ year = 2020,
+ month = oct,
+ eid = {arXiv:2010.03742},
+ pages = {arXiv:2010.03742},
+archivePrefix = {arXiv},
+ eprint = {2010.03742},
+ primaryClass = {astro-ph.CO},
+ adsurl = {https://ui.adsabs.harvard.edu/abs/2020arXiv201003742P},
+ adsnote = {Provided by the SAO/NASA Astrophysics Data System}
+}
+
+
+
+
+
+@ARTICLE{roukema20,
+ author = {{Roukema}, Boudewijn F.},
+ title = "{Anti-clustering in the national SARS-CoV-2 daily infection counts}",
+ journal = {arXiv e-prints},
+ keywords = {Quantitative Biology - Populations and Evolution, Physics - Physics and Society, Statistics - Methodology},
+ year = 2020,
+ month = jul,
+ eid = {arXiv:2007.11779},
+ pages = {arXiv:2007.11779},
+archivePrefix = {arXiv},
+ eprint = {2007.11779},
+ primaryClass = {q-bio.PE},
+ adsurl = {https://ui.adsabs.harvard.edu/abs/2020arXiv200711779R},
+ adsnote = {Provided by the SAO/NASA Astrophysics Data System}
+}
+
+
+
+
+
@ARTICLE{aissi20,
author = {Dylan A\"issi},
title = {Review for Towards Long-term and Archivable Reproducibility},
@@ -69,6 +113,20 @@ archivePrefix = {arXiv},
+@ARTICLE{nust20,
+ author = {Daniel N\"ust and Vanessa Sochat and Ben Marwick and Stephen J. Eglen and Tim Head and Tony Hirst and Benjamin D. Evans},
+ title = {Ten simple rules for writing Dockerfiles for reproducible data science},
+ year = {2020},
+ journal = {PLOS Computational Biology},
+ volume = {16},
+ pages = {e1008316},
+ doi = {10.1371/journal.pcbi.1008316},
+}
+
+
+
+
+
@ARTICLE{konkol20,
author = {{Konkol}, Markus and {N{\"u}st}, Daniel and {Goulier}, Laura},
title = "{Publishing computational research -- A review of infrastructures for reproducible and transparent scholarly communication}",
@@ -124,6 +182,20 @@ archivePrefix = {arXiv},
+@ARTICLE{lofstead19,
+ author = {Jay Lofstead and Joshua Baker and Andrew Younge},
+ title = {Data Pallets: Containerizing Storage for Reproducibility and Traceability},
+ year = {2019},
+ journal = {High Performance Computing. ISC High Performance 2019},
+ volume = {11887},
+ pages = {1},
+ doi = {10.1007/978-3-030-34356-9\_4},
+}
+
+
+
+
+
@ARTICLE{clement19,
author = {Cl\'ement-Fontaine, M\'elanie and Di Cosmo, Roberto and Guerry, Bastien and MOREAU, Patrick and Pellegrini, Fran\c cois},
title = {Encouraging a wider usage of software derived from research},
@@ -314,11 +386,53 @@ archivePrefix = {arXiv},
-@ARTICLE{brinckman19,
-author = "Adam Brinckman and Kyle Chard and Niall Gaffney and Mihael Hategan and Matthew B. Jones and Kacper Kowalik and Sivakumar Kulasekaran and Bertram Ludäscher and Bryce D. Mecum and Jarek Nabrzyski and Victoria Stodden and Ian J. Taylor and Matthew J. Turk and Kandace Turner",
+@ARTICLE{kurtzer17,
+ author = {Gregory M. Kurtzer and Vanessa Sochat and Michael W. Bauer},
+ title = {Singularity: Scientific containers for mobility of compute},
+ journal = {PLoS ONE},
+ year = {2017},
+ volume = {12},
+ pages = {e0177459},
+ doi = {10.1371/journal.pone.0177459},
+}
+
+
+
+
+
+@ARTICLE{meng17,
+ author = {Haiyan Meng and Douglas Thain},
+ title = {Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications},
+ journal = {Procedia Computer Science},
+ year = {2017},
+ volume = {108},
+ pages = {705},
+ doi = {10.1016/j.procs.2017.05.116},
+}
+
+
+
+
+
+@ARTICLE{tommaso17,
+ author = {Paolo Di Tommaso and Maria Chatzou and Evan W Floden and Pablo Prieto Barja and Emilio Palumbo and Cedric Notredame},
+ title = {Nextflow enables reproducible computational workflows},
+ journal = {Nature Biotechnology},
+ year = 2017,
+ volume = 35,
+ pages = 316,
+ doi = {10.1038/nbt.3820},
+}
+
+
+
+
+
+@ARTICLE{brinckman17,
+ author = {Adam Brinckman and Kyle Chard and Niall Gaffney and Mihael Hategan and Matthew B. Jones and Kacper Kowalik and Sivakumar Kulasekaran and Bertram Ludäscher and Bryce D. Mecum and Jarek Nabrzyski and Victoria Stodden and Ian J. Taylor and Matthew J. Turk and Kandace Turner},
title = {Computing environments for reproducibility: Capturing the ``Whole Tale''},
journal = {Future Generation Computer Systems},
- year = 2019,
+ year = 2017,
volume = 94,
pages = 854,
doi = {10.1016/j.future.2017.12.029},
@@ -380,6 +494,40 @@ archivePrefix = {arXiv},
+@ARTICLE{oliveira18,
+ author = {Lu\'is Oliveira and David Wilkinson and Daniel Moss\'e and Bruce Robert Childers},
+ title = {Supporting Long-term Reproducible Software Execution},
+ journal = {Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS'18)},
+ volume = {1},
+ year = {2018},
+ pages = {6},
+ doi = {10.1145/3214239.3214245},
+}
+
+
+
+
+
+@ARTICLE{rule19,
+ author = {{Rule}, Adam and {Birmingham}, Amanda and {Zuniga}, Cristal and
+ {Altintas}, Ilkay and {Huang}, Shih-Cheng and {Knight}, Rob and
+ {Moshiri}, Niema and {Nguyen}, Mai H. and {Brin Rosenthal},
+ Sara and {P{\'e}rez}, Fernando and {Rose}, Peter W.},
+ title = {Ten Simple Rules for Reproducible Research in Jupyter Notebooks},
+ journal = {PLOS Computational Biology},
+ volume = {15},
+ year = 2019,
+ month = jul,
+ pages = {e1007007},
+archivePrefix = {arXiv},
+ eprint = {1810.08055},
+ doi = {10.1371/journal.pcbi.1007007},
+}
+
+
+
+
+
@article{tange18,
author = {Tange, Ole},
title = {GNU Parallel 2018},
@@ -787,6 +935,20 @@ archivePrefix = {arXiv},
+@ARTICLE{chirigati16,
+ author = {Chirigati, Fernando and Rampin, R{\'e}mi and Shasha, Dennis and Freire, Juliana},
+ title = {ReproZip: Computational Reproducibility With Ease},
+ journal = {Proceedings of the 2016 International Conference on Management of Data (SIGMOD 16)},
+ volume = {2},
+ year = {2016},
+ pages = {2085},
+ doi = {10.1145/2882903.2899401},
+}
+
+
+
+
+
@ARTICLE{smith16,
author = {Arfon M. Smith and Daniel S. Katz and Kyle E. Niemeyer},
title = {Software citation principles},
@@ -980,6 +1142,20 @@ archivePrefix = {arXiv},
+@ARTICLE{meng15b,
+ author = {Haiyan Meng and Douglas Thain},
+ title = {Umbrella: A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids},
+ journal = {Proceedings of the 8th International Workshop on Virtualization Technologies in Distributed Computing (VTDC 15)},
+ year = 2015,
+ volume = 1,
+ pages = 23,
+ doi = {10.1145/2755979.2755982},
+}
+
+
+
+
+
@ARTICLE{meng15,
author = {Haiyan Meng and Rupa Kommineni and Quan Pham and Robert Gardner and Tanu Malik and Douglas Thain},
title = {An invariant framework for conducting reproducible computational science},
@@ -1108,6 +1284,28 @@ archivePrefix = {arXiv},
+@ARTICLE{dolfi14,
+ author = {{Dolfi}, M. and {Gukelberger}, J. and {Hehn}, A. and
+ {Imri{\v{s}}ka}, J. and {Pakrouski}, K. and {R{\o}nnow},
+ T.~F. and {Troyer}, M. and {Zintchenko}, I. and {Chirigati},
+ F. and {Freire}, J. and {Shasha}, D.},
+ title = "{A model project for reproducible papers: critical temperature for the Ising model on a square lattice}",
+ journal = {arXiv},
+ year = 2014,
+ month = jan,
+ eid = {arXiv:1401.2000},
+ pages = {arXiv:1401.2000},
+archivePrefix = {arXiv},
+ eprint = {1401.2000},
+ primaryClass = {cs.CE},
+ adsurl = {https://ui.adsabs.harvard.edu/abs/2014arXiv1401.2000D},
+ adsnote = {Provided by the SAO/NASA Astrophysics Data System}
+}
+
+
+
+
+
@ARTICLE{katz14,
author = {Daniel S. Katz},
title = {Transitive Credit as a Means to Address Social and Technological Concerns Stemming from Citation and Attribution of Digital Products},
@@ -1194,6 +1392,21 @@ archivePrefix = {arXiv},
+@ARTICLE{koster12,
+ author = {Johannes K\"oster and Sven Rahmann},
+ title = {Snakemake—a scalable bioinformatics workflow engine},
+ journal = {Bioinformatics},
+ volume = {28},
+ issue = {19},
+ year = {2012},
+ pages = {2520},
+ doi = {10.1093/bioinformatics/bts480},
+}
+
+
+
+
+
@ARTICLE{gronenschild12,
author = {Ed H. B. M. Gronenschild and Petra Habets and Heidi I. L. Jacobs and Ron Mengelers and Nico Rozendaal and Jim van Os and Machteld Marcelis},
title = {The Effects of FreeSurfer Version, Workstation Type, and Macintosh Operating System Version on Anatomical Volume and Cortical Thickness Measurements},