-rw-r--r--  paper.tex                          107
-rw-r--r--  peer-review/1-answer.txt           141
-rw-r--r--  reproduce/analysis/make/paper.mk     7
3 files changed, 142 insertions, 113 deletions
diff --git a/paper.tex b/paper.tex
index ad4aa2b..0e3cf7a 100644
--- a/paper.tex
+++ b/paper.tex
@@ -126,7 +126,7 @@ Decades later, scientists are still held accountable for their results and there
\section{Longevity of existing tools}
\label{sec:longevityofexisting}
\new{Reproducibility is defined as ``obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis'' \cite{fineberg19}.
-Longevity is defined as the time during which a project remains usable.
+Longevity is defined as the length of time during which a project remains usable.
Usability is defined by context: for machines (machine-actionable, or executable files) \emph{and} humans (readability of the source).
Many usage contexts do not involve execution: for example, checking the configuration parameter of a single step of the analysis to re-\emph{use} in another project, or checking the version of used software, or the source of the input data (extracting these from the outputs of execution is not always possible).}
@@ -150,7 +150,12 @@ We will thus focus on Docker here.
\new{It is hypothetically possible to precisely identify the used Docker ``images'' with their checksums (or ``digest'') to re-create an identical OS image later.
However, that is rarely done.}
-Usually images are imported with generic operating system (OS) names; e.g., \cite{mesnard20} uses `\inlinecode{FROM ubuntu:16.04}' \new{(more examples in the appendices)}.
+Usually images are imported with generic operating system (OS) names; e.g., \cite{mesnard20} uses `\inlinecode{FROM ubuntu:16.04}'
+ \ifdefined\noappendix
+ \new{(more examples in the \href{https://doi.org/10.5281/zenodo.\projectzenodoid}{appendices})}.%
+ \else%
+ \new{(more examples: see the appendices (\ref{appendix:existingtools})).}%
+ \fi%
The extracted tarball (from \url{https://partner-images.canonical.com/core/xenial}) is updated almost monthly, and only the most recent five are archived there.
Hence, if the image is built in different months, its output image will contain different OS components.
In the year 2024, when long-term support for this version of Ubuntu expires, the image will be unavailable at the expected URL.
@@ -687,12 +692,12 @@ An advantages of VMs is that they are a single file which can be copied from one
VMs are used by cloud service providers, enabling fully independent operating systems on their large servers (where the customer can have root access).
VMs were used in solutions like SHARE \citeappendix{vangorp11} (which was awarded second prize in the Elsevier Executable Paper Grand Challenge of 2011 \citeappendix{gabriel11}), or in suggested reproducible papers like \citeappendix{dolfi14}.
-However, due to their very large size, they are expensive to maintain, thus leading SHARE to discontinue its services in 2019.
-Also, the URL to the VM that is mentioned in \citeappendix{dolfi14} is no longer accessible (probably due to the same reason of size and archival costs).
+However, due to their very large size, these are expensive to maintain, thus leading SHARE to discontinue its services in 2019.
+The URL to the VM file \texttt{provenance\_machine.ova} that is mentioned in \citeappendix{dolfi14} is not currently accessible (we suspect that this is due to size and archival costs).
\subsubsection{Containers}
\label{appendix:containers}
-Containers also host a binary copy of a running environment, but don't have their own kernel.
+Containers also host a binary copy of a running environment, but do not have their own kernel.
Through a thin layer of low-level system libraries, programs running within a container talk directly with the host operating system kernel.
Otherwise, containers have their own independent software for everything else.
Therefore, they have much less overhead in hardware/CPU access.
@@ -711,10 +716,10 @@ Below we'll review some of the most common container solutions: Docker and Singu
This is a major security flaw that discourages many high performance computing (HPC) facilities from providing it.
\item {\bf\small Singularity:} Singularity \citeappendix{kurtzer17} is a single-image container (unlike Docker which is composed of modular/independent images).
- Although it needs root permissions to be installed on the system (once), it doesn't require root permissions every time it is run.
+ Although it needs root permissions to be installed on the system (once), it does not require root permissions every time it is run.
Its main program is also not a daemon, but a normal program that can be stopped.
- These features make it much easier for HPC administrators to install compared to Docker.
- However, the fact that it requires root access for initial install is still a hindrance for a random project: if its not already present on the HPC, the project can't be run as a normal user.
+ These features make it much safer for HPC administrators to install compared to Docker.
+ However, the fact that it requires root access for the initial install is still a hindrance for a typical project: if Singularity is not already present on the HPC, the user's science project cannot be run by a non-root user.
\item {\bf\small Podman:} Podman uses the Linux kernel containerization features to enable containers without a daemon, and without root permissions.
It has a command-line interface very similar to Docker, but only works on GNU/Linux operating systems.
@@ -726,18 +731,18 @@ Because of this they will be large (many Gigabytes) and expensive to archive, do
Recall the two examples above for VMs in Section \ref{appendix:virtualmachines}. But this is also valid for Docker images, as is clear from Dockerhub's recent decision to delete images of free accounts that haven't been used for more than 6 months.
Meng \& Thain \citeappendix{meng17} also give similar reasons on why Docker images were not suitable in their trials.
-On a more fundamental level, VMs or contains don't store \emph{how} the core environment was built.
+On a more fundamental level, VMs or containers do not store \emph{how} the core environment was built.
This information is usually in a third-party repository, and not necessarily inside the container or VM file, making it hard (if not impossible) to track for future users.
This is a major problem when considering reproducibility which is also highlighted as a major issue in terms of long term reproducibility in \citeappendix{oliveira18}.
The example of \cite{mesnard20} was previously mentioned in Section \ref{criteria}.
Another useful example is the \href{https://github.com/benmarwick/1989-excavation-report-Madjedbebe/blob/master/Dockerfile}{\inlinecode{Dockerfile}} of \citeappendix{clarkso15} (published in June 2015) which starts with \inlinecode{FROM rocker/verse:3.3.2}.
When we tried to build it (November 2020), the core downloaded image (\inlinecode{rocker/verse:3.3.2}, with image ``digest'' \inlinecode{sha256:c136fb0dbab...}) was created in October 2018 (long after the publication of that paper).
-Theoretically it is possible to investigate the difference between this new image and the old one that the authors used, but that will require a lot of effort and may not be possible where the changes are not in a third public repository or not under version control.
-In Docker, it is possible to retrieve the precise Docker image with its digest for example \inlinecode{FROM ubuntu:16.04@sha256:XXXXXXX} (where \inlinecode{XXXXXXX} is the digest, uniquely identifying the core image to be used), but we haven't seen it practiced often in ``reproducible'' \inlinecode{Dockerfiles}.
+In principle, it is possible to investigate the difference between this new image and the old one that the authors used, but that would require a lot of effort and may not be possible where the changes are not available in a third public repository or not under version control.
+In Docker, it is possible to retrieve the precise Docker image with its digest, for example \inlinecode{FROM ubuntu:16.04@sha256:XXXXXXX} (where \inlinecode{XXXXXXX} is the digest, uniquely identifying the core image to be used), but we have not seen this done often in existing examples of ``reproducible'' \inlinecode{Dockerfiles}.
The ``digest'' is specific to Docker repositories.
-A more generic/longterm approach to ensure identical core OS componets at a later time is to construct the containers or VMs with fixed/archived versions of the operating system ISO files.
+A more generic/long-term approach to ensure identical core OS components at a later time is to construct the containers or VMs with fixed/archived versions of the operating system ISO files.
ISO files are pre-built binary files with volumes of hundreds of megabytes that do not contain their build instructions.
For example the archives of Debian\footnote{\inlinecode{\url{https://cdimage.debian.org/mirror/cdimage/archive/}}} or Ubuntu\footnote{\inlinecode{\url{http://old-releases.ubuntu.com/releases}}} provide older ISO files.
@@ -752,8 +757,8 @@ However, attempting to archive the actual binary container or VM files as a blac
\subsubsection{Independent build in host's file system}
\label{appendix:independentbuild}
The virtual machine and container solutions mentioned above have their own independent file system.
-Another approach to having an isolated analysis environment is to use the same filesystem as the host, but installing the project's software in a non-standrard, project-specific directory that doesn't interfere with the host.
-Because the environment in this approach can be built in any custom location on the host, this solution generally doesn't require root permissions or extra low-level layers like containers or VMs.
+Another approach to having an isolated analysis environment is to use the same filesystem as the host, but installing the project's software in a non-standard, project-specific directory that does not interfere with the host.
+Because the environment in this approach can be built in any custom location on the host, this solution generally does not require root permissions or extra low-level layers like containers or VMs.
However, ``moving'' the built product of such solutions from one computer to another is not generally as trivial as containers or VMs.
Examples of such third-party package managers (that are detached from the host OS's package manager) include Nix, GNU Guix, Python's Virtualenv package and Conda, among others.
Because it is highly intertwined with the way software are built and installed, third party package managers are described in more detail as part of Section \ref{appendix:packagemanagement}.
@@ -782,7 +787,7 @@ Note that we are not including package managers that are specific to one languag
\subsubsection{Operating system's package manager}
-The most commonly used package managers are those of the host operating system, for example \inlinecode{apt} or \inlinecode{yum} respectively on Debian-based, or RedHat-based GNU/Linux operating systems, \inlinecode{pkg} in FreeBSD, among many others in other OSs.
+The most commonly used package managers are those of the host operating system, for example \inlinecode{apt} or \inlinecode{yum} respectively on Debian-based, or RedHat-based GNU/Linux operating systems, \inlinecode{pkg} in FreeBSD, among many others in other OSes.
These package managers are tightly intertwined with the operating system: they also include the building and updating of the core kernel and the C library.
Because they are part of the OS, they also commonly require root permissions.
@@ -792,7 +797,7 @@ Hence if two projects need different versions of a software, it is not possible
When a container or virtual machine (see Appendix \ref{appendix:independentenvironment}) is used for each project, it is common for projects to use the containerized operating system's package manager.
However, it is important to remember that operating system package managers are not static: software are updated on their servers.
Hence, simply running \inlinecode{apt install gcc} will install different versions of the GNU Compiler Collection (GCC) based on the version of the OS and when it has been run.
-Requesting a special version of that special software doesn't fully address the problem because the package managers also download and install its dependencies.
+Requesting a special version of that special software does not fully address the problem because the package managers also download and install its dependencies.
Hence a fixed version of the dependencies must also be specified.
In robust package managers like Debian's \inlinecode{apt} it is possible to fully control (and later reproduce) the build environment of a high-level software.
@@ -800,12 +805,13 @@ Debian also archives all packaged high-level software in its Snapshot\footnote{\
Hence it is indeed theoretically possible to reproduce the software environment only using archived operating systems and their own package managers, but unfortunately we have not seen it practiced in scientific papers/projects.
In summary, the host OS package managers are primarily meant for the operating system components or very low-level components.
-Hence, many robust reproducible analysis solutions (reviewed in Appendix \ref{appendix:existingsolutions}) don't use the host's package manager, but an independent package manager, like the ones below discussed below.
+Hence, many robust reproducible analysis solutions (reviewed in Appendix \ref{appendix:existingsolutions}) do not use the host's package manager, but an independent package manager, like the ones discussed below.
\subsubsection{Packaging with Linux containerization}
Once a software is packaged as an AppImage\footnote{\inlinecode{\url{https://appimage.org}}}, Flatpak\footnote{\inlinecode{\url{https://flatpak.org}}} or Snap\footnote{\inlinecode{\url{https://snapcraft.io}}} the software's binary product and all its dependencies (not including the core C library) are packaged into one file.
-This makes it very easy to move that single software's built product to newer systems: because the C library is not included, it can fail on older systems.
-However, these are designed for the Linux kernel (using its containerization features) and can thus only be run on GNU/Linux operating systems.
+This makes it very easy to move that single software's built product to newer systems.
+However, because the C library is not included, it can fail on older systems.
+Moreover, these are designed for the Linux kernel (using its containerization features) and can thus only be run on GNU/Linux operating systems.
\subsubsection{Nix or GNU Guix}
\label{appendix:nixguix}
@@ -839,7 +845,7 @@ However, it is not possible to fix the versions of the dependencies through the
This is thoroughly discussed under issue 787 (in May 2019) of \inlinecode{conda-forge}\footnote{\url{https://github.com/conda-forge/conda-forge.github.io/issues/787}}.
In that discussion, the authors of \citeappendix{uhse19} report that the half-life of their environment (defined in a YAML file) is 3 months, and that at least one of their dependencies breaks shortly after this period.
The main reply they got in the discussion is to build the Conda environment in a container, which is also the suggested solution by \citeappendix{gruning18}.
-However, as described in Appendix \ref{appendix:independentenvironment} containers just hide the reproducibility problem, they don't fix it: containers aren't static and need to evolve (i.e., re-built) with the project.
+However, as described in Appendix \ref{appendix:independentenvironment}, containers just hide the reproducibility problem, they do not fix it: containers are not static and need to evolve (i.e., be re-built) with the project.
Given these limitations, \citeappendix{uhse19} are forced to host their conda-packaged software as tarballs on a separate repository.
Conda installs with a shell script that contains a binary blob (over 500 megabytes, embedded in the shell script).
@@ -852,14 +858,14 @@ However, the resulting environment is not fully independent of the host operatin
However, the host operating system's directories are also appended afterwards.
Therefore, a user, or script may not notice that a software that is being used is actually coming from the operating system, not the controlled Conda installation.
-\item Generally, by default Conda relies heavily on the operating system and doesn't include core analysis components like \inlinecode{mkdir}, \inlinecode{ls} or \inlinecode{cp}.
+\item Generally, by default Conda relies heavily on the operating system and does not include core analysis components like \inlinecode{mkdir}, \inlinecode{ls} or \inlinecode{cp}.
Although they are generally the same between different Unix-like operating systems, they have their differences.
For example \inlinecode{mkdir -p} is a common way to build directories, but this option is only available with GNU Coreutils (default on GNU/Linux systems).
Running the same command within a Conda environment on a macOS for example, will crash.
Important packages like GNU Coreutils are available in channels like conda-forge, but they are not the default.
Therefore, many users may not recognize this, and failing to account for it will cause unexpected crashes.
-\item Many major Conda packaging ``channels'' (for example the core Anaconda channel, or very popular conda-forge channel) don't include the C library, that a package was built with, as a dependency.
+\item Many major Conda packaging ``channels'' (for example the core Anaconda channel, or the very popular conda-forge channel) do not include the C library that a package was built with as a dependency.
They rely on the host operating system's C library.
C is the core language of modern operating systems and even higher-level languages like Python or R are written in it, and need it to run.
Therefore if the host operating system's C library is different from the C library that a package was built with, a Conda-packaged program will crash and the project will not be executable.
@@ -869,7 +875,7 @@ However, the resulting environment is not fully independent of the host operatin
However, this is rarely practiced in the main Git repositories of channels like Anaconda and conda-forge: only the name of the high-level prerequisite packages is listed in a package's \inlinecode{meta.yaml} file, which is version-controlled.
Therefore two builds of the package from the same Git repository will result in different tarballs (depending on what prerequisites were present at build time).
In the Conda tarball (that contains the binaries and is not under version control) \inlinecode{meta.yaml} does include the exact versions of most build-time dependencies.
- However, because the different software of one project may have been built at different times, if they depend on different versions of a single software there will be a conflict and the tarball can't be rebuilt, or the project can't be run.
+ However, because the different software of one project may have been built at different times, if they depend on different versions of a single software there will be a conflict and the tarball cannot be rebuilt, or the project cannot be run.
\end{itemize}
As reviewed above, the low-level dependence of Conda on the host operating system's components and build-time conditions, is the primary reason that it is very fast to install (thus making it an attractive tool to software developers who just need to reproduce a bug in a few minutes).
@@ -878,7 +884,7 @@ However, these same factors are major caveats in a scientific scenario, where lo
\subsubsection{Spack}
Spack is a package manager that is also influenced by Nix (similar to GNU Guix), see \citeappendix{gamblin15}.
- But unlike Nix or GNU Guix, it doesn't aim for full, bit-wise reproducibility and can be built without root access in any generic location.
+ But unlike Nix or GNU Guix, it does not aim for full, bit-wise reproducibility and can be built without root access in any generic location.
It relies on the host operating system for the C library.
Spack is fully written in Python, where each software package is an instance of a class, which defines how it should be downloaded, configured, built and installed.
@@ -957,7 +963,7 @@ There are many tools for managing the sequence of jobs, below we'll review the m
The most commonly used workflow system for many researchers is to run the commands, experiment on them and keep the output when they are happy with it.
As an improvement, some also keep a narrative description of what they ran.
At least in our personal experience with colleagues, this method is still being heavily practiced by many researchers.
-Given that many researchers don't get trained well in computational methods, this is not surprizing and as discussed in Section \ref{discussion}, we believe that improved literacy in computational methods is the single most important factor for the integrity/reproducibility of modern science.
+Given that many researchers do not get trained well in computational methods, this is not surprising and, as discussed in Section \ref{discussion}, we believe that improved literacy in computational methods is the single most important factor for the integrity/reproducibility of modern science.
\subsubsection{Scripts}
\label{appendix:scripts}
@@ -967,7 +973,7 @@ However, as the series of operations become complex and large, managing the work
For example if 90\% of a long project is already done and a researcher wants to add a followup step, a script will go through all the previous steps (which can take significant time).
In other scenarios, when a small step in the middle of an analysis has to be changed, the full analysis needs to be re-run from the start.
-Scripts have no concept of dependencies, forcing authors to ``temporarily'' comment parts of that they don't want to be re-run (forgetting to un-comment such parts are the most common cause of frustration for the authors and others attempting to reproduce the result).
+Scripts have no concept of dependencies, forcing authors to ``temporarily'' comment out the parts that they do not want to be re-run (forgetting to un-comment such parts is the most common cause of frustration for the authors and others attempting to reproduce the result).
Such factors discourage experimentation, which is a critical component of the scientific method.
It is possible to manually add conditionals all over the script to add dependencies or only run certain steps at certain times, but they just make it harder to read, and introduce many bugs themselves.
@@ -996,7 +1002,7 @@ Therefore all three components in a rule must be files on the running filesystem
To decide which operation should be re-done when executed, Make compares the time stamp of the targets and prerequisites.
When any of the prerequisite(s) is newer than a target, the recipe is re-run to re-build the target.
-When all the prerequisites are older than the target, that target doesn't need to be rebuilt.
+When all the prerequisites are older than the target, that target does not need to be rebuilt.
The recipe can contain any number of commands; they should just all start with a \inlinecode{TAB}.
Going deeper into the syntax of Make is beyond the scope of this paper, but we recommend interested readers to consult the GNU Make manual for a nice introduction\footnote{\inlinecode{\url{http://www.gnu.org/software/make/manual/make.pdf}}}.
@@ -1016,7 +1022,7 @@ Bazel\footnote{\inlinecode{\url{https://bazel.build}}} is a high-level job organ
SCons is a Python package for managing operations outside of Python (in contrast to CGAT-core, discussed below, which only organizes Python functions).
In many aspects it is similar to Make; for example, it is managed through a `SConstruct' file.
Like a Makefile, SConstruct is also declarative: the running order is not necessarily the top-to-bottom order of the written operations within the file (unlike the imperative paradigm which is common in languages like C, Python, or FORTRAN).
-However, unlike Make, SCons doesn't use the file modification date to decide if it should be remade.
+However, unlike Make, SCons does not use the file modification date to decide if it should be remade.
SCons keeps the MD5 hash of all the files (in a hidden binary file) to check if the contents have changed.
SCons thus attempts to work on a declarative file with an imperative language (Python).
@@ -1041,7 +1047,7 @@ Another drawback with this workflow manager is that Python is a very high-level
\subsubsection{Guix Workflow Language (GWL)}
GWL is based on the declarative language that GNU Guix uses for package management (see Appendix \ref{appendix:packagemanagement}), which is itself based on the general purpose Scheme language.
It is closely linked with GNU Guix and can even install the necessary software needed for each individual process.
-Hence in the GWL paradigm, software installation and usage doesn't have to be separated.
+Hence in the GWL paradigm, software installation and usage does not have to be separated.
GWL has two high-level concepts called ``processes'' and ``workflows'' where the latter defines how multiple processes should be executed together.
In conclusion, shell scripts and Make are very common and extensively used by users of Unix-based OSs (which are most commonly used for computations).
@@ -1079,8 +1085,8 @@ However, editors that can execute or debug the source (like GNU Emacs), just run
With text editors, the final edited file is independent of the actual editor and can be further edited with another editor, or executed without it.
This is a very important feature that is not commonly present for other solutions mentioned below.
Another very important advantage of advanced text editors like GNU Emacs or Vi(m) is that they can also be run without a graphic user interface, directly on the command-line.
-This feature is critical when working on remote systems, in particular high performance computing (HPC) facilities that don't provide a graphic user interface.
-Also, the commonly used minimalistic containers don't include a graphic user interface.
+This feature is critical when working on remote systems, in particular high performance computing (HPC) facilities that do not provide a graphic user interface.
+Also, the commonly used minimalistic containers do not include a graphic user interface.
\subsubsection{Integrated Development Environments (IDEs)}
To facilitate the development of source files, IDEs add software building and running environments as well as debugging tools to a plain text editor.
@@ -1169,7 +1175,7 @@ On the other hand, the dependency graph of tools written in high-level languages
For example, see Figure 1 of \citeappendix{alliez19}, which shows the dependencies and their inter-dependencies for Matplotlib (a popular plotting module in Python).
Acceptable version intervals between the dependencies will cause incompatibilities in a year or two, when a robust package manager is not used (see Appendix \ref{appendix:packagemanagement}).
-Since a domain scientist doesn't always have the resources/knowledge to modify the conflicting part(s), many are forced to create complex environments with different versions of Python and pass the data between them (for example just to use the work of a previous PhD student in the team).
+Since a domain scientist does not always have the resources/knowledge to modify the conflicting part(s), many are forced to create complex environments with different versions of Python and pass the data between them (for example just to use the work of a previous PhD student in the team).
This greatly increases the complexity of the project, even for the principal author.
A good reproducible workflow can account for these different versions.
However, when the actual workflow system (not the analysis software) is written in a high-level language this will cause a major problem.
@@ -1192,7 +1198,7 @@ Once they have mastered one version of a language (mostly in the early stages of
The inertia of programming languages is very strong.
This is natural, because they have their own science field to focus on, and re-writing their high-level analysis toolkits (which they have curated over their career and is often only readable/usable by themselves) in newer languages every few years requires too much investment and time.
-When this investment is not possible, either the mentee has to use the mentor's old method (and miss out on all the new tools, which they need for the future job prospects), or the mentor has to avoid implementation details in discussions with the mentee, because they don't share a common language.
+When this investment is not possible, either the mentee has to use the mentor's old method (and miss out on all the new tools, which they need for the future job prospects), or the mentor has to avoid implementation details in discussions with the mentee, because they do not share a common language.
The authors of this paper have personal experiences in both mentor/mentee relational scenarios.
This failure to communicate in the details is a very serious problem, leading to the loss of valuable inter-generational experience.
@@ -1231,24 +1237,25 @@ Other studies have also attempted to review existing reproducible solutions, for
\subsection{Suggested rules, checklists, or criteria}
Before going into the various implementations, it is also useful to review existing suggested rules, checklists or criteria for computationally reproducible research.
-All the cases below are primarily targetted to immediate reproducibility and don't consider longevity explicitly.
-Therefore, they lack a strong/clear completeness criteria (mainly merely suggesting to record versions, or the ultimate suggestion to store the full binary OS in a binary VM or container is problematic (as mentioned in \ref{appendix:independentenvironment} and \citeappendix{oliveira18}).
+All the cases below are primarily targeted at immediate reproducibility and do not consider longevity explicitly.
+Therefore, they lack a strong/clear completeness criterion: they mainly only suggest, rather than require, the recording of versions, and their ultimate suggestion of storing the full binary OS in a binary VM or container is problematic (as mentioned in \ref{appendix:independentenvironment} and \citeappendix{oliveira18}).
-Sandve et al. \citeappendix{sandve13} propose ``ten simple rule for reproducible computational research'' that can be applied in any project.
-Generally, the are very similar to the criteria proposed here and follow a similar spirit but they don't provide any actual research papers following all those points, or a proof of concept.
-The Popper convention \citeappendix{jimenez17} also provides a set of principles that are indeed generally useful and some are shared with the criteria here (for example automatic validation, and like Maneage, they suggest having a template for new users).
-but they don't include completness or attention to longevity as mentioned above (Popper itself is written in Python with many dependencies, and its core operating language has already changed once).
+Sandve et al. \citeappendix{sandve13} propose ``ten simple rules for reproducible computational research'' that can be applied in any project.
+Generally, these are very similar to the criteria proposed here and follow a similar spirit, but they do not provide any actual research papers following all those points, nor do they provide a proof of concept.
+The Popper convention \citeappendix{jimenez17} also provides a set of principles that are indeed generally useful, among which some are common to the criteria here (for example, automatic validation, and, as in Maneage, the authors suggest providing a template for new users),
+but the authors do not include completeness as a criterion nor pay attention to longevity (Popper itself is written in Python with many dependencies, and its core operating language has already changed once).
For more on Popper, please see Section \ref{appendix:popper}.
To improve reproducibility among Jupyter notebook users, \citeappendix{rule19} propose ten rules and also provide links to example implementations.
-They can be very useful for users of Jupyter and not generic to any computational project.
-Some criteria (which are indeed very good in a more general context) don't directly relate to reproducibility, for example their Rule 1: ``Tell a Story for an Audience''.
+These can be very useful for users of Jupyter, but are not generic for non-Jupyter-based computational projects.
+Some criteria (which are indeed very good in a more general context) do not directly relate to reproducibility, for example their Rule 1: ``Tell a Story for an Audience''.
Generally, as reviewed in Sections \ref{sec:longevityofexisting} and \ref{appendix:jupyter}, Jupyter itself has many issues regarding reproducibility.
To create Docker images, N\"ust et al. propose ``ten simple rules'' in \citeappendix{nust20}.
-They do recommend some issues that can indeed help increase the quality of Docker images and their production/usage, for example their rule 7 to ``mount datasets at run time'' to separate the computational evironment from the data.
-However, like before, the long term reproducibility of the images is not a concern, for example in they recommend using base operating systems only with a version like \inlinecode{ubuntu:18.04}, which was clearly shown to have longevity issues in Section \ref{sec:longevityofexisting}.
-Furthermore, in their proof of concept Dockerfile (listing 1), \inlinecode{rocker} is used with a tag (not a digest), which can be problematic (as shown in Section \ref{appendix:containers}).
+They recommend some practices that can indeed help increase the quality of Docker images and their production/usage, such as their rule 7 to ``mount datasets [only] at run time'' to separate the computational environment from the data.
+However, long-term reproducibility of the images is not included as a criterion by these authors.
+For example, they recommend using base operating systems with version identification limited to a single brief identifier such as \inlinecode{ubuntu:18.04}, which has serious longevity problems (Section \ref{sec:longevityofexisting}).
+Furthermore, in their proof-of-concept Dockerfile (listing 1), \inlinecode{rocker} is used with a tag (not a digest), which can be problematic due to the high risk of ambiguity (as discussed in Section \ref{appendix:containers}).
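The tag-versus-digest distinction criticized above can be illustrated with a toy content-addressed store. This is a minimal sketch, not the Docker API: the names `registry` and `push` are hypothetical, and it only shows why a mutable tag cannot guarantee reproducibility while a content-derived digest can.

```python
import hashlib

# Toy illustration (NOT the Docker API): a registry maps mutable tags
# to content, while digests are derived from the content itself.
registry = {}

def push(tag: str, content: bytes) -> str:
    """Store content under a mutable tag and an immutable digest."""
    digest = "sha256:" + hashlib.sha256(content).hexdigest()
    registry[tag] = content      # the tag silently moves to the new content
    registry[digest] = content   # the digest always names this exact content
    return digest

digest_v1 = push("ubuntu:18.04", b"image built in 2018")
push("ubuntu:18.04", b"image rebuilt in 2020")  # same tag, new content!

assert registry["ubuntu:18.04"] == b"image rebuilt in 2020"  # tag drifted
assert registry[digest_v1] == b"image built in 2018"         # digest stable
```

Pulling by tag therefore re-creates whatever the tag points at today, whereas pulling by digest either reproduces the original bits or fails outright.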
\subsection{Reproducible Electronic Documents, RED (1992)}
\label{appendix:red}
@@ -1323,7 +1330,7 @@ It is an extension of the Jupyter notebook (see Appendix \ref{appendix:editors})
However, the wrapper modules just call an existing tool on the host system.
Given that each server may have its own set of installed software, the analysis may differ (or crash) when run on different GenePattern servers, hampering reproducibility.
-%% GenePattern shutdown announcement (although as of November 2020, it doesn't open any more!): https://www.genepattern.org/blog/2019/10/01/the-genomespace-project-is-ending-on-november-15-2019
+%% GenePattern shutdown announcement (although as of November 2020, it does not open any more!): https://www.genepattern.org/blog/2019/10/01/the-genomespace-project-is-ending-on-november-15-2019
The primary GenePattern server was active since 2008 and had 40,000 registered users with 2000 to 5000 jobs running every week \citeappendix{reich17}.
However, it was shut down on November 15th 2019 due to end of funding.
All processing with this server has stopped, and any archived data on it has been deleted.
@@ -1371,7 +1378,7 @@ Besides the fact that it is no longer maintained, VisTrails didn't control the s
\label{appendix:galaxy}
Galaxy\footnote{\inlinecode{\url{https://galaxyproject.org}}} is a web-based Genomics workbench \citeappendix{goecks10}.
-The main user interface are ``Galaxy Pages'', which doesn't require any programming: users simply use abstract ``tools'' which are a wrappers over command-line programs.
+The main user interface is ``Galaxy Pages'', which does not require any programming: users simply use abstract ``tools'', which are wrappers over command-line programs.
Therefore the actual running version of the program can be hard to control across different Galaxy servers.
Besides the automatically generated metadata of a project (which include version control, or its history), users can also tag/annotate each analysis step, describing its intent/purpose.
Besides some small differences, Galaxy seems very similar to GenePattern (Appendix \ref{appendix:genepattern}), so most of the same points there apply here too (including the very large cost of maintaining such a system).
@@ -1509,11 +1516,11 @@ The captured environment can be viewed in plain text or a web interface.
Sumatra also provides \LaTeX/Sphinx features, which will link the paper with the project's Sumatra database.
This enables researchers to use a fixed version of a project's figures in the paper, even at later times (while the project is being developed).
-The actual code that Sumatra wraps around, must itself be under version control, and it doesn't run if there is non-committed changes (although its not clear what happens if a commit is amended).
+The actual code that Sumatra wraps around must itself be under version control, and it does not run if there are non-committed changes (although it is not clear what happens if a commit is amended).
Since information on the environment has been captured, Sumatra is able to identify if it has changed since a previous run of the project.
Therefore Sumatra makes no attempt at storing the environment of the analysis as in Sciunit (see Appendix \ref{appendix:sciunit}), but only information about it.
Sumatra thus needs to know the language of the running program and is not generic.
-It just captures the environment, it doesn't store \emph{how} that environment was built.
+It just captures the environment; it does not store \emph{how} that environment was built.
@@ -1685,7 +1692,7 @@ Hence for complex data analysis operations with involve thousands of steps, it i
%%\begin{itemize}
%%\item \url{https://sites.nationalacademies.org/cs/groups/pgasite/documents/webpage/pga_180684.pdf}, does the following classification of tools:
%% \begin{itemize}
-%% \item Research environments: \href{http://vcr.stanford.edu}{Verifiable computational research} (discussed above), \href{http://www.sciencedirect.com/science/article/pii/S1877050911001207}{SHARE} (a Virtual Machine), \href{http://www.codeocean.com}{Code Ocean} (discussed above), \href{http://jupyter.org}{Jupyter} (discussed above), \href{https://yihui.name/knitr}{knitR} (based on Sweave, dynamic report generation with R), \href{https://cran.r-project.org}{Sweave} (Function in R, for putting R code within \LaTeX), \href{http://www.cyverse.org}{Cyverse} (proprietary web tool with servers for bioinformatics), \href{https://nanohub.org}{NanoHUB} (collection of Simulation Programs for nanoscale phenomena that run in the cloud), \href{https://www.elsevier.com/about/press-releases/research-and-journals/special-issue-computers-and-graphics-incorporates-executable-paper-grand-challenge-winner-collage-authoring-environment}{Collage Authoring Environment} (discussed above), \href{https://osf.io/ns2m3}{SOLE} (discussed above), \href{https://osf.io}{Open Science framework} (a hosting webpage), \href{https://www.vistrails.org}{VisTrails} (discussed above), \href{https://pypi.python.org/pypi/Sumatra}{Sumatra} (discussed above), \href{http://software.broadinstitute.org/cancer/software/genepattern}{GenePattern} (reviewed above), Image Processing On Line (\href{http://www.ipol.im}{IPOL}) journal (publishes full analysis scripts, but doesn't deal with dependencies), \href{https://github.com/systemslab/popper}{Popper} (reviewed above), \href{https://galaxyproject.org}{Galaxy} (reviewed above), \href{http://torch.ch}{Torch.ch} (finished project for neural networks on images), \href{http://wholetale.org/}{Whole Tale} (discussed above).
+%% \item Research environments: \href{http://vcr.stanford.edu}{Verifiable computational research} (discussed above), \href{http://www.sciencedirect.com/science/article/pii/S1877050911001207}{SHARE} (a Virtual Machine), \href{http://www.codeocean.com}{Code Ocean} (discussed above), \href{http://jupyter.org}{Jupyter} (discussed above), \href{https://yihui.name/knitr}{knitR} (based on Sweave, dynamic report generation with R), \href{https://cran.r-project.org}{Sweave} (Function in R, for putting R code within \LaTeX), \href{http://www.cyverse.org}{Cyverse} (proprietary web tool with servers for bioinformatics), \href{https://nanohub.org}{NanoHUB} (collection of Simulation Programs for nanoscale phenomena that run in the cloud), \href{https://www.elsevier.com/about/press-releases/research-and-journals/special-issue-computers-and-graphics-incorporates-executable-paper-grand-challenge-winner-collage-authoring-environment}{Collage Authoring Environment} (discussed above), \href{https://osf.io/ns2m3}{SOLE} (discussed above), \href{https://osf.io}{Open Science framework} (a hosting webpage), \href{https://www.vistrails.org}{VisTrails} (discussed above), \href{https://pypi.python.org/pypi/Sumatra}{Sumatra} (discussed above), \href{http://software.broadinstitute.org/cancer/software/genepattern}{GenePattern} (reviewed above), Image Processing On Line (\href{http://www.ipol.im}{IPOL}) journal (publishes full analysis scripts, but does not deal with dependencies), \href{https://github.com/systemslab/popper}{Popper} (reviewed above), \href{https://galaxyproject.org}{Galaxy} (reviewed above), \href{http://torch.ch}{Torch.ch} (finished project for neural networks on images), \href{http://wholetale.org/}{Whole Tale} (discussed above).
%% \item Workflow systems: \href{http://www.taverna.org.uk}{Taverna}, \href{http://www.wings-workflows.org}{Wings}, \href{https://pegasus.isi.edu}{Pegasus}, \href{http://www.pgbovine.net/cde.html}{CDE}, \href{http://binder.org}{Binder}, \href{http://wiki.datakurator.org/wiki}{Kurator}, \href{https://kepler-project.org}{Kepler}, \href{https://github.com/everware}{Everware}, \href{http://cds.nyu.edu/projects/reprozip}{Reprozip}.
%% \item Dissemination platforms: \href{http://researchcompendia.org}{ResearchCompendia}, \href{https://datacenterhub.org/about}{DataCenterHub}, \href{http://runmycode.org}, \href{https://www.chameleoncloud.org}{ChameleonCloud}, \href{https://occam.cs.pitt.edu}{Occam}, \href{http://rcloud.social/index.html}{RCloud}, \href{http://thedatahub.org}{TheDataHub}, \href{http://www.ahay.org/wiki/Package_overview}{Madagascar}.
%% \end{itemize}
@@ -1711,7 +1718,7 @@ Hence for complex data analysis operations with involve thousands of steps, it i
%% \item \citeappendix{stodden18}: Effectiveness of journal policy on computational reproducibility.
%% \item \citeappendix{fanelli18} is critical of the narrative that there is a ``reproducibility crisis'', and that its important to empower scientists.
%% \item \citeappendix{burrell18} open software (in particular Python) in heliophysics.
-%% \item \citeappendix{allen18} show that many papers don't cite software.
+%% \item \citeappendix{allen18} show that many papers do not cite software.
%% \item \citeappendix{zhang18} explicitly say that they won't release their code: ``We opt not to make the code used for the chemical evolution modeling publicly available because it is an important asset of the researchers’ toolkits''
%% \item \citeappendix{jones19} make genuine effort at reproducing every number in the paper (using Docker, Conda, and CGAT-core, and Binder), but they can ultimately only release scripts. They claim it is not possible to achieve that level of reproducibility, but here we show it is.
%% \item LSST uses Kubernetes and docker for reproducibility \citeappendix{banek19}.
diff --git a/peer-review/1-answer.txt b/peer-review/1-answer.txt
index 9c6bbd9..91cf3d8 100644
--- a/peer-review/1-answer.txt
+++ b/peer-review/1-answer.txt
@@ -45,8 +45,8 @@ ANSWER:
3. [Associate Editor] Some terminology is not well-defined
(e.g. longevity).
-ANSWER: It has now been clearly defined in the first paragraph of Section
-II. With this definition, the main argument of the paper is much clearer,
+ANSWER: Longevity has now been defined in the first paragraph of Section
+II. With this definition, the main argument of the paper is clearer,
thank you (and thank you to the referees for highlighting this).
------------------------------
@@ -59,7 +59,9 @@ thank you (and thank you to the referees for highlighting this).
categorization to characterize their longevity.
ANSWER: The longevity of the general tools reviewed in Section II is now
-mentioned immediately after each (highlighted in green).
+mentioned immediately after each (VMs, SHARE: discontinued in 2019;
+Docker: images deleted after 6 months of inactivity; Python-dependent
+package managers: a few years; Jupyter notebooks: limited by the
+shortest-lived non-core Python dependency).
------------------------------
@@ -70,7 +72,7 @@ mentioned immediately after each (highlighted in green).
5. [Associate Editor] Background and related efforts need significant
improvement. (See below.)
-ANSWER: This has been done, as mentioned in (1).
+ANSWER: This has been done, as mentioned in (1.) above.
------------------------------
@@ -81,7 +83,7 @@ ANSWER: This has been done, as mentioned in (1).
6. [Associate Editor] There is consistency among the reviews that
related work is particularly lacking.
-ANSWER: This has been done, as mentioned in (1).
+ANSWER: This has been done, as mentioned in (1.) above.
------------------------------
@@ -93,9 +95,10 @@ ANSWER: This has been done, as mentioned in (1).
explaining how it deals with the nagging problem of running on CPU
vs. different architectures.
-ANSWER: The CPU architecture of the running system is now reported in the
-"Acknowledgments" section and a description of the problem and its solution
-in Maneage is also added in the "Proof of concept: Maneage" Section.
+ANSWER: The CPU architecture of the running system is now reported in
+the "Acknowledgments" section and a description of the problem and its
+solution in Maneage is also added and illustrated in the "Proof of
+concept: Maneage" Section.
------------------------------
@@ -109,10 +112,11 @@ in Maneage is also added in the "Proof of concept: Maneage" Section.
architectures. Is CI employed in any way in the work presented in
this article?
-ANSWER: CI has been added in the discussion as one solution to find
-breaking points in operating system updates and new/different
-architectures. For the core Maneage branch, we have defined task #15741 [1]
-to add CI on many architectures in the near future.
+ANSWER: CI has been added in the discussion section (V) as one
+solution to find breaking points in operating system updates and
+new/different architectures. For the core Maneage branch, we have
+defined task #15741 [1] to add CI on many architectures in the near
+future.
[1] http://savannah.nongnu.org/task/?15741
@@ -147,9 +151,13 @@ README-hacking.md webpage into smaller pages that can be entered.
related work to help readers understand the clear need for a new
approach, if this is being presented as new/novel.
-ANSWER: Thank you for highlighting this important point. We saw that its
-necessary to contrast our proof of concept demonstration more directly with
-Maneage. Two paragraphs have been added in Sections II and IV for this.
+ANSWER: Thank you for highlighting this important point. We saw that
+it is necessary to contrast our Maneage proof-of-concept demonstration
+more directly against the Jupyter notebook type of approach. Two
+paragraphs have been added in Sections II and IV to clarify this (our
+criteria require and build in more modularity and longevity than
+Jupyter).
+
------------------------------
@@ -188,33 +196,32 @@ ANSWER:
and the related provenance work that has already been done and can be
exploited using these criteria and our proof of concept is indeed very
large. However, the 6250 word-count limit is very tight and if we add
- more on it in this length, we would have to remove more directly
- relevant points. Hopefully this can be the subject of a follow up
- paper.
+ more on it in this length, we would have to remove points of higher priority.
+ Hopefully this can be the subject of a follow-up paper.
3. A review of ReproZip is in Appendix B.
4. A review of Occam is in Appendix B.
5. A review of Popper is in Appendix B.
-6. A review of Whole tale is in Appendix B.
+6. A review of Whole Tale is in Appendix B.
7. A review of Snakemake is in Appendix A.
-8. CWL and WDL are described in Appendix A (job management).
-9. Nextflow is described in Appendix A (job management).
+8. CWL and WDL are described in Appendix A (Job management).
+9. Nextflow is described in Appendix A (Job management).
10. Sumatra is described in Appendix B.
-11. Podman is mentioned in Appendix A (containers).
-12. AppImage is mentioned in Appendix A (package management).
-13. Flatpak is mentioned in Appendix A (package management).
-14. nbdev and jupytext are high-level tools to generate documentation and
+11. Podman is mentioned in Appendix A (Containers).
+12. AppImage is mentioned in Appendix A (Package management).
+13. Flatpak is mentioned in Appendix A (Package management).
+14. Snap is mentioned in Appendix A (Package management).
+15. nbdev and jupytext are high-level tools to generate documentation and
packaging custom code in Conda or pypi. High-level package managers
like Conda and Pypi have already been thoroughly reviewed in Appendix A
- for their longevity issues, so we feel there is no need to include
- these.
-15. Bazel has been mentioned in Appendix A (job management).
-16. Debian's reproducible builds is only for ensuring that software
- packaged for Debian are bitwise reproducible. As mentioned in the
- discussion of this paper, the bitwise reproducibility of software is
- not an issue in the context discussed here, the reproducibility of the
+ for their longevity issues, so we feel that there is no need to
+ include these.
+16. Bazel is mentioned in Appendix A (Job management).
+17. Debian's reproducible builds are only designed for ensuring that software
+ packaged for Debian is bitwise reproducible. As mentioned in the
+ discussion section of this paper, the bitwise reproducibility of software is
+ not an issue in the context discussed here; the reproducibility of the
relevant output data of the software is the main issue.
-
------------------------------
@@ -231,14 +238,18 @@ ANSWER:
* Executable/reproducible paper articles and original concepts
ANSWER: Thank you for highlighting these points. Appendix B starts with a
-subsection titled "suggested rules, checklists or criteria" that review of
-existing criteria. That include the proposed sources here (and others).
+subsection titled "suggested rules, checklists or criteria" with a review of
+existing sets of criteria. This subsection includes the sources proposed
+by the reviewer [Sandve et al; Rule et al; Nust et al] (and others).
-arXiv:1401.2000 has been added in Appendix A as an example paper using
+arXiv:1401.2000 has been added in Appendix A as an example paper using
virtual machines. We thank the referee for bringing up this paper, because
-the link to the VM provided in the paper no longer works (the file has been
-removed on the server). Therefore added with SHARE, it very nicely
-highlighting our main issue with binary containers or VMs and their lack of
+the link to the VM provided in the paper no longer works (the URL
+http://archive.comp-phys.org/provenance_challenge/provenance_machine.ova
+redirects to
+https://share.phys.ethz.ch//~alpsprovenance_challenge/provenance_machine.ova
+which gives a 'Not Found' HTML response). Together with SHARE, this very nicely
+highlights our main issue with binary containers or VMs: their lack of
longevity.
------------------------------
@@ -261,17 +272,18 @@ longevity.
requires significant resources to translate or rewrite every
few years."
-ANSWER: They have been clarified in the highlighted parts of the text:
+ANSWER: These points have been clarified in the highlighted parts of the text:
-1. Many examples have been given throughout the newly added appendices. To
- avoid confusion in the main body of the paper, we have removed the "we
- have surveyed" part. It is already mentioned above it that a large
- survey of existing methods/solutions is given in the appendices.
+1. Many examples have been given throughout the newly added
+ appendices. To avoid confusion in the main body of the paper, we
+ have removed the "we have surveyed" part. It is already mentioned
+ above this point in the text that a large survey of existing
+ methods/solutions is given in the appendices.
2. Due to the thorough discussion of this issue in the appendices with
precise examples, this line has been removed to allow space for the
other points raised by the referees. The main point (high cost of
- keeping binaries) is aldreay abundantly clear.
+ keeping binaries) is already abundantly clear.
On a similar topic, Dockerhub's recent announcement that inactive images
(for over 6 months) will be deleted has also been added. The announcement
@@ -280,8 +292,9 @@ ANSWER: They have been clarified in the highlighted parts of the text:
https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates
3. A small statement has been added, reminding the readers that almost all
- free software projects are built with Make (note that CMake is just a
- high-level wrapper over Make: it finally produces a 'Makefile').
+ free software projects are built with Make (CMake is popular, but it is just a
+ high-level wrapper over Make: it finally produces a 'Makefile'; practical
+ usage of CMake generally obliges the user to understand Make).
4. The example of Python 2 has been added.
@@ -299,23 +312,24 @@ ANSWER: They have been clarified in the highlighted parts of the text:
papers have been written on this topic). Note that CI is
well-established technology (e.g. Jenkins is almost 10 years old).
-ANSWER: Thank you for raising this issue. We had initially planned to add
-this issue also, but like many discussion points, we were forced to remove
+ANSWER: Thank you for raising these issues. We had initially planned to
+discuss CI, but like many discussion points, we were forced to remove
it before the first submission due to the very tight word-count limit. We
have now added a sentence on CI in the discussion.
-On the initial note, indeed, the "executable" files of Bash, Git or Make
-are not bitwise reproducible/identical on different systems. However, as
-mentioned in the discussion, we are concerned with the _output_ of the
-software's executable file, _after_ the execution of its job. We (or any
-user of Bash) is not interested in the executable file itself. The
-reproducibility of the binary file only becomes important if a bug is found
-(very rare for common usage in such core software of the OS). Hence even
-though the compiled binary files of specific versions of Git, Bash or Make
+On the issue of Bash/Git/Make, indeed, the _executable_ Bash, Git and
+Make binaries are not bitwise reproducible/identical on different
+systems. However, as mentioned in the discussion, we are concerned
+with the _output_ of the software's executable file, _after_ the
+execution of its job. We (or any user of Bash) are not interested in
+the executable file itself. The reproducibility of the binary file
+only becomes important if a significant bug is found (very rare for
+ordinary usage of such core software of the OS). Hence, even though
+the compiled binary files of specific versions of Git, Bash or Make
will not be bitwise reproducible/identical on different systems, their
-outputs are exactly reproducible: 'git describe' or Bash's 'for' loop will
-have the same output on GNU/Linux, macOS or FreeBSD (that produce bit-wise
-different executables).
+scientific outputs are exactly reproducible: 'git describe' or Bash's
+'for' loop will have the same output on GNU/Linux, macOS/Darwin or
+FreeBSD (despite having bit-wise different executables).
------------------------------
@@ -326,9 +340,10 @@ different executables).
15. [Reviewer 1] Criterion has been proposed previously. Maneage itself
provides little novelty (see comments below).
-ANSWER: The previously suggested criteria that were mentioned are reviewed
-in the newly added Appendix B, and the novelty/necessity of the proposed
-criteria is shown by comparison there.
+ANSWER: The previously suggested sets of criteria that were listed by
+Reviewer 1 are reviewed by us in the newly added Appendix B, and the
+novelty and advantages of our proposed criteria are contrasted there
+with the earlier sets of criteria.
------------------------------
diff --git a/reproduce/analysis/make/paper.mk b/reproduce/analysis/make/paper.mk
index 3bb6a8c..e4eeb59 100644
--- a/reproduce/analysis/make/paper.mk
+++ b/reproduce/analysis/make/paper.mk
@@ -165,6 +165,13 @@ $(texbdir)/paper.bbl: tex/src/references.tex $(mtexdir)/dependencies-bib.tex \
| sed -e 's/\([^,]\) *\( \|EOLINE\) *\\eprint/\1, \\eprint/g' \
| sed -e 's/\([^,]\) *\( \|EOLINE\) *\\doi/\1, \\doi/g' \
| sed -e 's/EOLINE/\n/g' > paper.bbl
+ cp appendix.bbl appendix-tmp.bbl \
+ && sed -e "s/\'/EOLINE/g" appendix-tmp.bbl \
+ | tr -d '\n' \
+ | sed -e 's/\([0-9]\)\( \|EOLINE\)}/\1}/g' \
+ | sed -e 's/\([^,]\) *\( \|EOLINE\) *\\eprint/\1, \\eprint/g' \
+ | sed -e 's/\([^,]\) *\( \|EOLINE\) *\\doi/\1, \\doi/g' \
+ | sed -e 's/EOLINE/\n/g' > appendix.bbl
# The pre-final run of LaTeX after 'paper.bbl' was created.
latex -shell-escape -halt-on-error "$$p"/paper.tex
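The EOLINE trick in the recipe above can be hard to read. The following is a rough Python sketch of the same idea, assuming `EOLINE` acts as a sentinel for the original line breaks; `fix_bbl` is a hypothetical name, and the real recipe does this with sed on the generated `.bbl` files.

```python
import re

def fix_bbl(text: str) -> str:
    """Sketch of the recipe's post-processing of a BibTeX .bbl file."""
    # Protect the original line breaks with a sentinel, so the whole
    # file can be treated as a single line (the recipe uses tr -d '\n').
    flat = text.replace("\n", "EOLINE")
    # Drop the separator when a number is immediately followed by '}'.
    flat = re.sub(r"(\d)(?: |EOLINE)\}", r"\1}", flat)
    # Pull \eprint and \doi onto the previous line, inserting a comma
    # when the preceding text did not already end with one.
    flat = re.sub(r"([^,]) *(?: |EOLINE) *\\eprint", r"\1, \\eprint", flat)
    flat = re.sub(r"([^,]) *(?: |EOLINE) *\\doi", r"\1, \\doi", flat)
    # Restore the remaining line breaks.
    return flat.replace("EOLINE", "\n")

print(fix_bbl("J. Doe, Title, 2020\n}\nsomething\n\\eprint{123}"))
```

Entries already ending in a comma are left untouched, which is what the `[^,]` guard in the sed expressions achieves.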