Diffstat (limited to 'tex/src/appendix-existing-solutions.tex')
-rw-r--r--  tex/src/appendix-existing-solutions.tex  142
1 file changed, 82 insertions(+), 60 deletions(-)
diff --git a/tex/src/appendix-existing-solutions.tex b/tex/src/appendix-existing-solutions.tex
index e802644..d149c77 100644
--- a/tex/src/appendix-existing-solutions.tex
+++ b/tex/src/appendix-existing-solutions.tex
@@ -21,53 +21,55 @@
\section{Survey of common existing reproducible workflows}
\label{appendix:existingsolutions}
-As reviewed in the introduction, the problem of reproducibility has received considerable attention over the last three decades and various solutions have already been proposed.
+The problem of reproducibility has received considerable attention over the last three decades and various solutions have already been proposed.
The core principles that many of the existing solutions (including Maneage) aim to achieve are nicely summarized by the FAIR principles \citeappendix{wilkinson16}.
-In this appendix, some of the solutions are reviewed.
+In this appendix, \emph{some} of the solutions are reviewed.
+We are not just reviewing solutions that can be used today.
+The main focus of this paper is longevity; therefore, we also spent considerable time finding and inspecting solutions that have been aborted, discontinued, or abandoned.
+
The solutions are based on an evolving software landscape; therefore, they are ordered by date: when a project has a web page, the year of its first release is used for sorting.
Otherwise, the paper's publication year is used.
-
For each solution, we summarize its methodology and discuss how it relates to the criteria proposed in this paper.
Freedom of the software/method is a core concept behind scientific reproducibility, as opposed to industrial reproducibility where a black box is acceptable/desirable.
Therefore proprietary solutions like Code Ocean\footnote{\inlinecode{\url{https://codeocean.com}}} or Nextjournal\footnote{\inlinecode{\url{https://nextjournal.com}}} will not be reviewed here.
Other studies have also attempted to review existing reproducible solutions, for example, see \citeappendix{konkol20}.
-
-
+We have tried our best to test almost all of the reviewed solutions and to read their documentation to a sufficient level.
+However, due to time constraints, we may have missed some aspects of the solutions or incorrectly interpreted their behavior and outputs.
+In such cases, please let us know: we will correct the text in the paper's Git repository and publish the updated PDF on \href{https://doi.org/10.5281/zenodo.3872247}{zenodo.3872247} (this is the version-independent DOI that always points to the most recent Zenodo upload).
\subsection{Suggested rules, checklists, or criteria}
-Before going into the various implementations, it is also useful to review existing suggested rules, checklists, or criteria for computationally reproducible research.
-
-All the cases below are primarily targeted to immediate reproducibility and do not consider longevity explicitly.
-Therefore, they lack a strong/clear completeness criterion (they mainly only suggest, rather than require, the recording of versions, and their ultimate suggestion of storing the full binary OS in a binary VM or container is problematic (as mentioned in \ref{appendix:independentenvironment} and \citeappendix{oliveira18}).
+Before going into the various implementations, it is useful to review some existing suggested rules, checklists, or criteria for computationally reproducible research.
Sandve et al. \citeappendix{sandve13} propose ``ten simple rules for reproducible computational research'' that can be applied in any project.
Generally, these are very similar to the criteria proposed here and follow a similar spirit, but they do not provide any actual research paper that applies all those points, nor do they provide a proof of concept.
-The Popper convention \citeappendix{jimenez17} also provides a set of principles that are indeed generally useful, among which some are common to the criteria here (for example, automatic validation, and, as in Maneage, the authors suggest providing a template for new users), but the authors do not include completeness as a criterion nor pay attention to longevity (Popper itself is written in Python with many dependencies, and its core operating language has already changed once).
+The Popper convention \citeappendix{jimenez17} also provides a set of principles that are indeed generally useful, among which some are common to the criteria here (for example, automatic validation, and, as in Maneage, the authors suggest providing a template for new users), but the authors do not include completeness as a criterion nor pay attention to longevity: Popper has already changed its core workflow language once, and it is written in Python with many fast-evolving dependencies (see Appendix \ref{appendix:highlevelinworkflow}).
For more on Popper, please see Section \ref{appendix:popper}.
-For improved reproducibility in Jupyter notebook users, \citeappendix{rule19} propose ten rules to improve reproducibility and also provide links to example implementations.
+For improved reproducibility in Jupyter notebooks, \citeappendix{rule19} propose ten rules and also provide links to example implementations.
These can be very useful for users of Jupyter but are not generic for non-Jupyter-based computational projects.
Some criteria (which are indeed very good in a more general context) do not directly relate to reproducibility, for example their Rule 1: ``Tell a Story for an Audience''.
Generally, as reviewed in
-\ifdefined\separatesupplement
-the main body of this paper (section on the longevity of existing tools)
-\else
-Section \ref{sec:longevityofexisting}
+\ifdefined\separatesupplement%
+the main body of this paper (section on the longevity of existing tools)%
+\else%
+Section \ref{sec:longevityofexisting}%
\fi
and Section \ref{appendix:jupyter} (below), Jupyter itself has many issues regarding reproducibility.
To create Docker images, N\"ust et al. propose ``ten simple rules'' in \citeappendix{nust20}.
They recommend some practices that can indeed help increase the quality of Docker images and their production/usage, such as their rule 7 to ``mount datasets [only] at run time'' to separate the computational environment from the data.
However, the long-term reproducibility of the images is not included as a criterion by these authors.
For example, they recommend using base operating systems, with version identification limited to a single brief identifier such as \inlinecode{ubuntu:18.04}, which raises serious longevity issues
-\ifdefined\separatesupplement
-(as discussed in the longevity of existing tools section of the main paper).
-\else
-(Section \ref{sec:longevityofexisting}).
-\fi
+\ifdefined\separatesupplement%
+(as discussed in the longevity of existing tools section of the main paper)%
+\else%
+(Section \ref{sec:longevityofexisting})%
+\fi.
Furthermore, in their proof-of-concept Dockerfile (listing 1), \inlinecode{rocker} is used with a tag (not a digest), which can be problematic due to the high risk of ambiguity (as discussed in Section \ref{appendix:containers}).
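+As a hypothetical sketch (the image name below is a real one from the rocker project, but the digest is a placeholder, not an actual image digest), the difference between pulling by tag and pulling by digest is:
+\begin{verbatim}
+# A tag is mutable: it may point to different image
+# contents at different times.
+docker pull rocker/r-ver:4.0.0
+
+# A digest is an immutable content-address of one
+# specific image (placeholder shown here).
+docker pull rocker/r-ver@sha256:<digest-of-desired-image>
+\end{verbatim}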
+The criteria reviewed above are thus primarily targeted at immediate reproducibility and do not consider longevity.
+Therefore, they lack a strong/clear completeness criterion: they mainly only suggest, rather than require, the recording of versions, and their ultimate suggestion of storing the full binary OS in a binary VM or container is problematic (as mentioned in Appendix \ref{appendix:independentenvironment} and \citeappendix{oliveira18}).
@@ -81,7 +83,7 @@ In particular, the heavy investment one has to make in order to re-do another sc
RED also influenced other early reproducible works, for example \citeappendix{buckheit1995}.
To orchestrate the various figures/results of a project, from 1990, they used ``Cake'' \citeappendix{somogyi87}, a dialect of Make (for more on Make, see Appendix \ref{appendix:jobmanagement}).
-As described in \cite{schwab2000}, in the latter half of that decade, they moved to GNU Make, which was much more commonly used, developed and came with a complete and up-to-date manual.
+As described in \cite{schwab2000}, in the latter half of that decade, they moved to GNU Make, which was much more commonly used, better maintained, and came with a complete and up-to-date manual.
The basic idea behind RED's solution was to organize the analysis as independent steps, including the generation of plots, and organizing the steps through a Makefile.
This enabled all the results to be re-executed with a single command.
Several basic low-level Makefiles were included in the high-level/central Makefile.
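+A minimal sketch of this structure (with hypothetical file and target names; the actual RED Makefiles were of course project-specific) could look like the following:
+\begin{verbatim}
+# Central Makefile: the first target is the default.
+all: paper.ps
+
+# Low-level Makefiles, each defining the rules
+# to build one figure of the paper.
+include figure1.mk figure2.mk
+
+# The paper depends on all figures, so a single 'make'
+# re-executes every step that is out of date.
+paper.ps: figure1.eps figure2.eps paper.tex
+        latex paper.tex && dvips -o paper.ps paper.dvi
+\end{verbatim}
+Note that in a real Makefile, each recipe line must start with a tab character.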
@@ -100,13 +102,13 @@ Hence in 2006 SEP moved to a new Python-based framework called Madagascar, see A
\subsection{Apache Taverna (2003)}
\label{appendix:taverna}
-Apache Taverna\footnote{\inlinecode{\url{https://taverna.incubator.apache.org}}} \citeappendix{oinn04} is a workflow management system written in Java with a graphical user interface which is still being developed.
+Apache Taverna\footnote{\inlinecode{\url{https://taverna.incubator.apache.org}}} \citeappendix{oinn04} is a workflow management system written in Java with a graphical user interface which is still being used and developed.
A workflow is defined as a directed graph, where nodes are called ``processors''.
Each ``processor'' transforms a set of inputs into a set of outputs; processors are defined in the Scufl language (an XML-based language, where each step is an atomic task).
Other components of the workflow are ``Data links'' and ``Coordination constraints''.
The main user interface is graphical, where users move processors in the given space and define links between their inputs and outputs (manually constructing a lineage like
\ifdefined\separatesupplement
-lineage figure of the main paper.
+the lineage figure of the main paper).
\else
Figure \ref{fig:datalineage}).
\fi
@@ -145,12 +147,12 @@ Furthermore, the blending of the workflow component with the low-level analysis
\subsection{GenePattern (2004)}
\label{appendix:genepattern}
GenePattern\footnote{\inlinecode{\url{https://www.genepattern.org}}} \citeappendix{reich06} (first released in 2004) is a client-server software system containing many common analysis functions/modules, primarily focused on gene studies.
-Although it is highly focused to a special research field, it is reviewed here because its concepts/methods are generic, and in the context of this paper.
+Although it is highly focused on a specific research field, it is reviewed here because its concepts/methods are generic.
Its server-side software is installed with fixed software packages that are wrapped into GenePattern modules.
The modules are used through a web interface; the modern implementation is GenePattern Notebook \citeappendix{reich17}.
It is an extension of the Jupyter notebook (see Appendix \ref{appendix:editors}), with a special ``GenePattern'' cell that connects to GenePattern servers to run the analysis.
-However, the wrapper modules just call an existing tool on the host system.
+However, the wrapper modules just call an existing tool on the running system.
Given that each server may have its own set of installed software, the analysis may differ (or crash) when run on different GenePattern servers, hampering reproducibility.
%% GenePattern shutdown announcement (although as of November 2020, it does not open any more): https://www.genepattern.org/blog/2019/10/01/the-genomespace-project-is-ending-on-november-15-2019
@@ -159,7 +161,9 @@ However, it was shut down on November 15th 2019 due to the end of funding.
All processing with this server has stopped, and any archived data on it has been deleted.
Since GenePattern is free software, there are alternative public servers to use, so hopefully work on it will continue.
However, funding is limited and those servers may face similar funding problems.
+
This is a very nice example of the fragility of solutions that depend on archiving and running the research codes with high-level research products (including data and binary/compiled codes that are expensive to keep in one place).
+The data and software may have backups in other places, but the high-level project-specific workflows, on which researchers spent most of their time, have been lost due to the deletion (unless the authors backed them up privately!).
@@ -169,11 +173,11 @@ This is a very nice example of the fragility of solutions that depend on archivi
Kepler\footnote{\inlinecode{\url{https://kepler-project.org}}} \citeappendix{ludascher05} is a Java-based workflow management tool with a graphical user interface.
Users drag-and-drop analysis components, called ``actors'', into a visual, directional graph, which is the workflow (similar to
\ifdefined\separatesupplement
-the lineage figure shown in the main paper.
+the lineage figure shown in the main paper).
\else
Figure \ref{fig:datalineage}).
\fi
-Each actor is connected to others through the Ptolemy II\footnote{\inlinecode{\url{https://ptolemy.berkeley.edu}}} \citeappendix{eker03}.
+Each actor is connected to others through Ptolemy II\footnote{\inlinecode{\url{https://ptolemy.berkeley.edu}}} \citeappendix{eker03}.
In many aspects, the usage of Kepler and its issues for long-term reproducibility are similar to those of Apache Taverna (see Section \ref{appendix:taverna}).
@@ -193,14 +197,14 @@ Since the main goal was visualization (as images), apparently its primary output
Its design is based on a change-based provenance model using a custom VisTrails provenance query language (vtPQL); for more, see \citeappendix{scheidegger08}.
Since XML is a plain text format, as the user inspects the data and makes changes to the analysis, the changes are recorded as ``trails'' in the project's VisTrails repository, which operates very much like common version control systems (see Appendix \ref{appendix:versioncontrol}).
-However, even though XML is in plain text, it is very hard to edit manually.
+However, even though XML is in plain text, it is very hard to read/edit without the VisTrails software (which is no longer maintained).
VisTrails, therefore, provides a graphic user interface with a visual representation of the project's inter-dependent steps (similar to
\ifdefined\separatesupplement
the data lineage figure of the main paper).
\else
Figure \ref{fig:datalineage}).
\fi
-Besides the fact that it is no longer maintained, VisTrails does not control the software that is run, it only controls the sequence of steps that they are run in.
+Besides the fact that it is no longer maintained, VisTrails did not control the software that was run; it only controlled the sequence of steps that they were run in.
@@ -213,7 +217,7 @@ The main user interface is the ``Galaxy Pages'', which does not require any prog
Therefore the actual running version of the program can be hard to control across different Galaxy servers.
Besides the automatically generated metadata of a project (which include version control, or its history), users can also tag/annotate each analysis step, describing its intent/purpose.
Besides some small differences, Galaxy seems very similar to GenePattern (Appendix \ref{appendix:genepattern}), so most of the same points there apply here too.
-For example the very large cost of maintaining such a system and being based on a graphic environment.
+For example, the very large cost of maintaining such a system, the dependence on a graphic environment, and the blending of hand-written code with automatically generated (large) files.
@@ -225,17 +229,16 @@ The IPOL journal\footnote{\inlinecode{\url{https://www.ipol.im}}} \citeappendix{
An IPOL paper is a traditional research paper, but with a focus on implementation.
The published narrative description of the algorithm must be detailed to a level that any specialist can implement it in their own programming language.
The author's own implementation of the algorithm is also published with the paper (in C, C++, or MATLAB); the code must be well commented, linking each part of it with the relevant part of the paper.
-The authors must also submit several examples of datasets/scenarios.
+The authors must also submit several example datasets that show the applicability of their proposed algorithm.
The referee is expected to inspect the code and narrative, confirming that they match with each other, and with the stated conclusions of the published paper.
After publication, each paper also has a ``demo'' button on its web page, allowing readers to try the algorithm on a web-interface and even provide their own input.
-IPOL has grown steadily over the last 10 years, publishing 23 research articles in 2019 alone.
+IPOL has grown steadily over the last 10 years, publishing 23 research articles in 2019.
We encourage the reader to visit its web page and see some of its recent papers and their demos.
The reason it can be so thorough and complete is its very narrow scope (low-level image processing algorithms), where the published algorithms are highly atomic, not needing significant dependencies (beyond input/output of well-known formats), allowing the referees and readers to go deeply into each implemented algorithm.
In fact, high-level languages like Perl, Python, or Java are not acceptable in IPOL precisely because of the additional complexities, such as the dependencies that they require.
-However, many data-intensive projects commonly involve dozens of high-level dependencies, with large and complex data formats and analysis, so this solution is not scalable.
+However, many data-intensive projects commonly involve dozens of high-level dependencies, with large and complex data formats and analysis, so while IPOL is modular (each publication does one very specific thing), this solution is not scalable.
-IPOL thus fails on our Scalability criteria.
Furthermore, by not publishing/archiving each paper's version control history or directly linking the analysis and produced paper, it fails criteria 6 and 7.
Note that on the web page, it is possible to change parameters, but that will not affect the produced PDF.
A paper written in Maneage (the proof-of-concept solution presented in this paper) could be scrutinized at a similar detailed level to IPOL, but for much more complex research scenarios, involving hundreds of dependencies and complex processing of the data.
@@ -267,13 +270,15 @@ In the Python version, all processing steps and input data (or references to the
When the Python module contains a component written in other languages (mostly C or C++), it needs to be an external dependency to the Active Paper.
As mentioned in \citeappendix{hinsen15}, the fact that it relies on HDF5 is a caveat of Active Papers, because many tools are necessary to merely open it.
-Downloading the pre-built HDF View binaries (provided by the HDF group) is not possible anonymously/automatically (login is required).
+Downloading the pre-built ``HDF View'' binaries (a GUI browser of HDF5 files that is provided by the HDF group) is not possible anonymously/automatically (login is required).
Installing it using the Debian or Arch Linux package managers also failed due to dependencies in our trials.
Furthermore, as a high-level data format, HDF5 evolves very fast; for example, HDF5 1.12.0 (February 29th, 2020) is not usable with older libraries provided by the HDF5 team. % maybe replace with: February 29\textsuperscript{th}, 2020?
-While data and code are indeed fundamentally similar concepts technically\citeappendix{hinsen16}, they are used by humans differently due to their volume: the code of a large project involving Terabytes of data can be less than a megabyte.
-Hence, storing code and data together becomes a burden when large datasets are used, this was also acknowledged in \citeappendix{hinsen15}.
-Also, if the data are proprietary (for example medical patient data), the data must not be released, but the methods that were applied to them can be published.
+While data and code are indeed fundamentally similar concepts technically \citeappendix{hinsen16}, they are used by humans differently.
+The hand-written code of a large project involving terabytes of data can be as small as 100 kilobytes.
+When the two are bundled together, merely inspecting one line of the code requires downloading terabytes of data that are not needed; this was also acknowledged in \citeappendix{hinsen15}.
+It may also happen that the data are proprietary (for example, medical patient data).
+In such cases, the data must not be publicly released, but the methods that were applied to them can be.
Furthermore, since all reading and writing is done in the HDF5 file, the file can easily bloat to very large sizes due to temporary files.
These files can later be removed as part of the analysis, but this makes the code more complicated and hard to read/maintain.
For example, the Active Papers HDF5 file of \citeappendix[in \href{https://doi.org/10.5281/zenodo.2549987}{zenodo.2549987}]{kneller19} is 1.8 gigabytes.
@@ -290,11 +295,14 @@ Hence the extra volume for data and obscure HDF5 format that needs special tools
\label{appendix:collage}
The Collage Authoring Environment \citeappendix{nowakowski11} was the winner of Elsevier Executable Paper Grand Challenge \citeappendix{gabriel11}.
It is based on the GridSpace2\footnote{\inlinecode{\url{http://dice.cyfronet.pl}}} distributed computing environment, which has a web-based graphic user interface.
-Through its web-based interface, viewers of a paper can actively experiment with the parameters of a published paper's displayed outputs (for example figures).
-%\tonote{See how it containerizes the software environment}
-
-
+Through its web-based interface, viewers of a paper can actively experiment with the parameters of its displayed outputs (for example, figures).
+In their Figure 3, they nicely visualize how the ``Executable Paper'' of Collage operates through two servers and a computing backend.
+Unfortunately, no web page is provided in the paper to follow up on the work and find its current status.
+A web search also only pointed us to its main paper (\citeappendix{nowakowski11}).
+The paper does not discuss the major issue of software versioning and its verification, to ensure that future updates to the backend do not affect the result; apparently the system simply assumes that the software exists on the ``Computing backend''.
+Since we could not access or test Collage, we can only rely on the descriptions in its paper: it seems very similar to the modern-day Jupyter notebook concept (see Appendix \ref{appendix:jupyter}), which had not yet been created in its current form in 2011.
+We therefore expect similar longevity issues with Collage.
\subsection{SHARE (2011)}
@@ -302,7 +310,7 @@ Through its web-based interface, viewers of a paper can actively experiment with
SHARE\footnote{\inlinecode{\url{https://is.ieis.tue.nl/staff/pvgorp/share}}} \citeappendix{vangorp11} is a web portal that hosts virtual machines (VMs) for storing the environment of a research project.
SHARE was recognized as the second position in the Elsevier Executable Paper Grand Challenge \citeappendix{gabriel11}.
Simply put, SHARE was just a VM library that users could download or connect to, and run.
-The limitations of VMs for reproducibility were discussed in Appendix \ref{appendix:virtualmachines}, and the SHARE system does not specify any requirements on making the VM itself reproducible.
+The limitations of VMs for reproducibility were discussed in Appendix \ref{appendix:virtualmachines}, and the SHARE system does not specify any requirements or standards on making the VM itself reproducible, or enforcing common internals for its supported projects.
As of January 2021, the top SHARE web page still works.
However, upon selecting any operation, a notice is printed that ``SHARE is offline'' since 2019; the reason is not mentioned.
@@ -315,15 +323,23 @@ However, upon selecting any operation, a notice is printed that ``SHARE is offli
A ``verifiable computational result''\footnote{\inlinecode{\url{http://vcr.stanford.edu}}} is an output (table, figure, etc.) that is associated with a ``verifiable result identifier'' (VRI); see \citeappendix{gavish11}.
It was awarded the third prize in the Elsevier Executable Paper Grand Challenge \citeappendix{gabriel11}.
-A VRI is created using tags within the programming source that produced that output, also recording its version control or history.
+A VRI is a hash that is created using tags within the programming source that produced that output, also recording its version control history.
This enables the exact identification and citation of results.
The VRIs are automatically generated web-URLs that link to public VCR repositories containing the data, inputs, and scripts, that may be re-executed.
-According to \citeappendix{gavish11}, the VRI generation routine has been implemented in MATLAB, R, and Python, although only the MATLAB version was available during the writing of this paper.
+According to \citeappendix{gavish11}, the VRI generation routine has been implemented in MATLAB, R, and Python, although only the MATLAB version was available on the webpage in January 2021.
VCR also has special \LaTeX{} macros for loading the respective VRI into the generated PDF.
+In effect, this is very similar to what we have done at the end of the caption of
+\ifdefined\separatesupplement
+the first figure in the main body of the paper,
+\else
+Figure \ref{fig:datalineage},
+\fi
+where you can click on the given Zenodo link and be taken to the raw data that created the plot.
+However, instead of a long and hard-to-read hash, we simply point to the plotted file's source as a Zenodo DOI (which has long-term funding for longevity).
-Unfortunately, most parts of the web page are not complete at the time of this writing.
-The VCR web page contains an example PDF\footnote{\inlinecode{\url{http://vcr.stanford.edu/paper.pdf}}} that is generated with this system, but the linked VCR repository\footnote{\inlinecode{\url{http://vcr-stat.stanford.edu}}} does not exist at the time of this writing.
-Finally, the date of the files in the MATLAB extension tarball is set to 2011, hinting that probably VCR has been abandoned soon after the publication of \citeappendix{gavish11}.
+Unfortunately, most parts of the web page are not complete as of January 2021.
+The VCR web page contains an example PDF\footnote{\inlinecode{\url{http://vcr.stanford.edu/paper.pdf}}} that is generated with this system, but the linked VCR repository\footnote{\inlinecode{\url{http://vcr-stat.stanford.edu}}} did not exist (again, as of January 2021).
+Finally, the date of the files in the MATLAB extension tarball is set to May 2011, hinting that VCR was probably abandoned soon after the publication of \citeappendix{gavish11}.
@@ -340,8 +356,9 @@ SOLE also supports workflows as Galaxy tools \citeappendix{goecks10}.
For reproducibility, \citeappendix{pham12} suggest building a SOLE-based project in a virtual machine, using any custom package manager that is hosted on a private server to obtain a usable URI.
However, as described in Appendices \ref{appendix:independentenvironment} and \ref{appendix:packagemanagement}, unless virtual machines are built with robust package managers, this is not a sustainable solution (the virtual machine itself is not reproducible).
Also, hosting a large virtual machine server with a fixed IP address on a hosting service like Amazon (as suggested there) for every project in perpetuity will be very expensive.
+
The manual/artificial definition of tags to connect parts of the paper with the analysis scripts is also a caveat due to human error and incompleteness (the authors may not consider tags as important things, but they may be useful later).
-In Maneage, instead of using artificial/commented tags, the analysis inputs and outputs are automatically linked into the paper's text through \LaTeX{} macros that are the backbone of the whole system (aren't artifical/extra features).
+In Maneage, instead of using artificial/commented tags, the analysis inputs and outputs are automatically linked into the paper's text through \LaTeX{} macros that are the backbone of the whole system (they are not artificial/extra features).
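+As a hypothetical sketch (the macro name and value below are illustrative, not from a real project), the analysis automatically writes a macro definition like the first line into a file that the paper loads, and the narrative then uses the macro directly:
+\begin{verbatim}
+% Written automatically by the analysis, not by hand:
+\newcommand{\projectnumgalaxies}{1204}
+
+% Used inside the paper's narrative text:
+The final sample contains \projectnumgalaxies{} galaxies.
+\end{verbatim}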
@@ -372,6 +389,7 @@ It thus provides resources to link various workflow/analysis components (see App
\citeappendix{bechhofer13} describes how a workflow in Apache Taverna (Appendix \ref{appendix:taverna}) can be translated into research objects.
The important thing is that the research object concept is not specific to any special workflow; it is just a metadata bundle/standard which is only as robust in reproducing the result as the running workflow.
+Therefore, if implemented over a complete workflow like Maneage, it can be very useful for analyzing/optimizing the workflow, finding common components between many Maneage'd workflows, or translating to other complete workflows.
@@ -386,7 +404,7 @@ Sciunit was originally written in Python 2 (which reached its end-of-life on Jan
Therefore Sciunit2 is a new implementation in Python 3.
The main issue with Sciunit's approach is that the copied binaries are just black boxes: it is not possible to see how the used binaries from the initial system were built.
-This is a major problem for scientific projects: in principle (not knowing how the programs were built) and in practice (archiving a large volume sciunit for every step of the analysis requires a lot of storage space).
+This is a major problem for scientific projects: in principle (not knowing how the programs were built) and in practice (archiving a large-volume sciunit for every step of the analysis requires a lot of storage space and has a high archival cost).
@@ -423,9 +441,9 @@ Hence two projects that each use a 1-terabyte dataset will need a full copy of t
\subsection{Binder (2017)}
Binder\footnote{\inlinecode{\url{https://mybinder.org}}} is used to containerize already existing Jupyter based processing steps.
Users simply add a set of Binder-recognized configuration files to their repository and Binder will build a Docker image and install all the dependencies inside of it with Conda (the list of necessary packages comes from Conda).
-One good feature of Binder is that the imported Docker image must be tagged (something like a checksum).
-This will ensure that future/latest updates of the imported Docker image are not mistakenly used.
+One good feature of Binder is that the imported Docker image must be tagged, although as mentioned in Appendix \ref{appendix:containers}, tags do not ensure reproducibility.
However, it does not ensure that the Dockerfile used by the imported Docker image follows a similar convention.
+Users can thus still use generic operating system names.
Binder is used by \citeappendix{jones19}.
@@ -436,6 +454,8 @@ Binder is used by \citeappendix{jones19}.
%% I took the date from their PiPy page, where the first version 0.1 was published in November 2016.
Gigantum\footnote{\inlinecode{\url{https://gigantum.com}}} is a client/server system, in which the client is a web-based (graphical) interface that is installed as ``Gigantum Desktop'' within a Docker image.
Gigantum uses Docker containers for an independent environment, Conda (or Pip) to install packages, Jupyter notebooks to edit and run code, and Git to store its history.
+The reproducibility issues with these tools have been thoroughly discussed in Appendix \ref{appendix:existingtools}.
+
Simply put, it is a high-level wrapper for combining these components.
Internally, a Gigantum project is organized as files in a directory that can be opened without Gigantum's own client.
The file structure (which is under version control) includes codes, input data, and output data.
@@ -452,19 +472,19 @@ However, there is one directory that can be used to store files that must not be
Popper\footnote{\inlinecode{\url{https://falsifiable.us}}} is a software implementation of the Popper Convention \citeappendix{jimenez17}.
The Popper team's own solution is through a command-line program called \inlinecode{popper}.
The \inlinecode{popper} program itself is written in Python.
-However, job management was initially based on the HashiCorp configuration language (HCL) because HCL was used by ``GitHub Actions'' to manage workflows.
-Moreover, from October 2019 GitHub changed to a custom YAML-based language, so Popper also deprecated HCL.
-This is an important issue when low-level choices are based on service providers.
+However, job management was initially based on the HashiCorp configuration language (HCL) because HCL was used by ``GitHub Actions'' to manage workflows at that time.
+But in October 2019, GitHub changed to a custom YAML-based language, so Popper also deprecated HCL.
+This is an important issue when low-level choices are based on service providers (see Appendix \ref{appendix:highlevelinworkflow}).
To start a project, the \inlinecode{popper} command-line program builds a template, or ``scaffold'', which is a minimal set of files that can be run.
-However, as of this writing, the scaffold is not complete: it lacks a manuscript and validation of outputs (as mentioned in the convention).
By default, Popper runs in a Docker image (so root permissions are necessary, and the reproducibility issues with Docker images discussed above also apply), but Singularity is also supported.
See Appendix \ref{appendix:independentenvironment} for more on containers, and Appendix \ref{appendix:highlevelinworkflow} for using high-level languages in the workflow.
Popper does not comply with the completeness, minimal complexity, and including the narrative criteria.
Moreover, the scaffold that is provided by Popper is an output of the program that is not directly under version control.
-Hence, tracking future changes in Popper and how they relate to the high-level projects that depend on it will be very hard.
-In Maneage, the same \inlinecode{maneage} git branch is shared by the developers and users; any new feature or change in Maneage can thus be directly tracked with Git when the high-level project merges their branch with Maneage.
+Hence, tracking future low-level changes in Popper and how they relate to the high-level projects that depend on it through the scaffold will be very hard.
+In Maneage, users start their projects by branching off of the core \inlinecode{maneage} Git branch.
+Hence, any future change in the low-level features will directly propagate to all derived projects (and will be clear as Git conflicts if the user has customized them).
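+A hypothetical sketch of this branching model (the clone URL is the one advertised for Maneage; the branch and directory names are placeholders) is:
+\begin{verbatim}
+# Start a new project on top of the core branch.
+git clone https://git.maneage.org/project.git my-project
+cd my-project
+git checkout -b my-paper    # branch off of 'maneage'
+
+#  ... customize, commit, run the analysis ...
+
+# Later, import low-level improvements from the core:
+git merge maneage
+\end{verbatim}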
@@ -477,10 +497,11 @@ It uses online editors like Jupyter or RStudio (see Appendix \ref{appendix:edito
The web-based nature of Whole Tale's approach and its dependency on many tools (which have many dependencies themselves) is a major limitation for future reproducibility.
For example, when following their own tutorial on ``Creating a new tale'', the provided Jupyter notebook could not be executed because of a dependency problem.
-This was reported to the authors as issue 113\footnote{\inlinecode{\url{https://github.com/whole-tale/wt-design-docs/issues/113}}}, but as all the second-order dependencies evolve, it is not hard to envisage such dependency incompatibilities being the primary issue for older projects on Whole Tale.
+This was reported to the authors as issue 113\footnote{\inlinecode{\url{https://github.com/whole-tale/wt-design-docs/issues/113}}} and fixed.
+But as all the second-order dependencies evolve, it is not hard to envisage such dependency incompatibilities being the primary issue for older projects on Whole Tale.
Furthermore, the fact that a Tale is stored as a binary Docker container causes two important problems:
1) it requires a very large storage capacity for every project that is hosted there, making it very expensive to scale if demand expands.
-2) It is not possible to see how the environment was built accurately (when the Dockerfile uses \inlinecode{apt}).
+2) It is not possible to accurately see how the environment was built (when the Dockerfile uses operating system package managers like \inlinecode{apt}).
This issue with Whole Tale (and generally all other solutions that only rely on preserving a container/VM) was also mentioned in \citeappendix{oliveira18}, for more on this, please see Appendix \ref{appendix:packagemanagement}.
@@ -488,6 +509,7 @@ This issue with Whole Tale (and generally all other solutions that only rely on
\subsection{Occam (2018)}
+\label{appendix:occam}
Occam\footnote{\inlinecode{\url{https://occam.cs.pitt.edu}}} \citeappendix{oliveira18} is a web-based application to preserve software and its execution.
To achieve long-term reproducibility, Occam includes its own package manager (instructions to build software and their dependencies) to be in full control of the software build instructions, similar to Maneage.
Besides Nix or Guix (which are primarily package managers that can also do job management), Occam has been the only solution in our survey here that attempts to be complete in this aspect.