Diffstat (limited to 'tex/src/appendix-existing-solutions.tex')
-rw-r--r--  tex/src/appendix-existing-solutions.tex  502
1 files changed, 502 insertions, 0 deletions
diff --git a/tex/src/appendix-existing-solutions.tex b/tex/src/appendix-existing-solutions.tex
new file mode 100644
index 0000000..e802644
--- /dev/null
+++ b/tex/src/appendix-existing-solutions.tex
@@ -0,0 +1,502 @@
+%% Appendix on reviewing existing reproducible workflow solutions. This
+%% file is loaded by the project's 'paper.tex' or 'tex/src/supplement.tex',
+%% it should not be run independently.
+%
+%% Copyright (C) 2020-2021 Mohammad Akhlaghi <mohammad@akhlaghi.org>
+%% Copyright (C) 2021 Raúl Infante-Sainz <infantesainz@gmail.com>
+%
+%% This file is free software: you can redistribute it and/or modify it
+%% under the terms of the GNU General Public License as published by the
+%% Free Software Foundation, either version 3 of the License, or (at your
+%% option) any later version.
+%
+%% This file is distributed in the hope that it will be useful, but WITHOUT
+%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+%% FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+%% for more details. See <http://www.gnu.org/licenses/>.
+
+
+
+
+
+\section{Survey of common existing reproducible workflows}
+\label{appendix:existingsolutions}
+As reviewed in the introduction, the problem of reproducibility has received considerable attention over the last three decades and various solutions have already been proposed.
+The core principles that many of the existing solutions (including Maneage) aim to achieve are nicely summarized by the FAIR principles \citeappendix{wilkinson16}.
+In this appendix, some of the solutions are reviewed.
+Because these solutions are built on an evolving software landscape, they are ordered by date: when a project has a web page, the year of its first release is used for sorting; otherwise, the publication year of its paper is used.
+
+For each solution, we summarize its methodology and discuss how it relates to the criteria proposed in this paper.
+Freedom of the software/method is a core concept behind scientific reproducibility, as opposed to industrial reproducibility where a black box is acceptable/desirable.
+Therefore proprietary solutions like Code Ocean\footnote{\inlinecode{\url{https://codeocean.com}}} or Nextjournal\footnote{\inlinecode{\url{https://nextjournal.com}}} will not be reviewed here.
+Other studies have also attempted to review existing reproducible solutions, for example, see \citeappendix{konkol20}.
+
+
+
+
+
+\subsection{Suggested rules, checklists, or criteria}
+Before going into the various implementations, it is also useful to review existing suggested rules, checklists, or criteria for computationally reproducible research.
+
+All the cases below are primarily targeted at immediate reproducibility and do not explicitly consider longevity.
+They therefore lack a strong/clear completeness criterion: they mainly suggest, rather than require, the recording of versions, and their ultimate suggestion of storing the full binary OS in a VM or container is problematic (as mentioned in Appendix \ref{appendix:independentenvironment} and in \citeappendix{oliveira18}).
+
+Sandve et al. \citeappendix{sandve13} propose ``ten simple rules for reproducible computational research'' that can be applied in any project.
+Generally, these are very similar to the criteria proposed here and follow a similar spirit, but they do not point to any actual research paper that applies all of those rules, nor do they provide a proof of concept.
+The Popper convention \citeappendix{jimenez17} also provides a set of generally useful principles, some of which overlap with the criteria here (for example, automatic validation, and, as in Maneage, providing a template for new users).
+However, its authors do not include completeness as a criterion, nor do they pay attention to longevity (Popper itself is written in Python with many dependencies, and its core operating language has already changed once).
+For more on Popper, please see Section \ref{appendix:popper}.
+
+For Jupyter notebook users, \citeappendix{rule19} propose ten rules to improve reproducibility and also provide links to example implementations.
+These can be very useful for users of Jupyter but are not generic for non-Jupyter-based computational projects.
+Some of the rules (which are indeed very good in a more general context) do not directly relate to reproducibility, for example their Rule 1: ``Tell a Story for an Audience''.
+Generally, as reviewed in
+\ifdefined\separatesupplement
+the main body of this paper (section on the longevity of existing tools)
+\else
+Section \ref{sec:longevityofexisting}
+\fi
+and Section \ref{appendix:jupyter} (below), Jupyter itself has many issues regarding reproducibility.
+To create Docker images, N\"ust et al. propose ``ten simple rules'' in \citeappendix{nust20}.
+They recommend practices that can indeed help increase the quality of Docker images and their production/usage, such as their rule 7 to ``mount datasets [only] at run time'' to separate the computational environment from the data.
+However, the long-term reproducibility of the images is not included as a criterion by these authors.
+For example, they recommend using base operating systems identified only by a brief tag such as \inlinecode{ubuntu:18.04}, which raises serious longevity issues
+\ifdefined\separatesupplement
+(as discussed in the longevity of existing tools section of the main paper).
+\else
+(Section \ref{sec:longevityofexisting}).
+\fi
+Furthermore, in their proof-of-concept Dockerfile (listing 1), \inlinecode{rocker} is used with a tag (not a digest), which can be problematic due to the high risk of ambiguity (as discussed in Section \ref{appendix:containers}).
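+As an illustration of the tag/digest distinction (our sketch, not from \citeappendix{nust20}), the base image of a Dockerfile can be pinned to an immutable digest instead of a mutable tag; the digest below is a placeholder, not a real checksum:
+\begin{verbatim}
+# Fragile: the image behind the '18.04' tag can change over time.
+#   FROM ubuntu:18.04
+# More robust: a digest always resolves to the same image contents
+# (placeholder digest shown; use the one your registry reports).
+FROM ubuntu@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
+\end{verbatim}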
+
+
+
+
+
+\subsection{Reproducible Electronic Documents, RED (1992)}
+\label{appendix:red}
+RED\footnote{\inlinecode{\url{http://sep.stanford.edu/doku.php?id=sep:research:reproducible}}} is the earliest attempt at reproducible research that we could find; see \cite{claerbout1992,schwab2000}.
+It was developed within the Stanford Exploration Project (SEP) for geophysics publications.
+Their introduction on the importance of reproducibility resonates strongly with today's environment in computational science, in particular the heavy investment one has to make in order to redo another scientist's work, even within the same team.
+RED also influenced other early reproducible works, for example \citeappendix{buckheit1995}.
+
+To orchestrate the various figures/results of a project, from 1990 they used ``Cake'' \citeappendix{somogyi87}, a dialect of Make (for more on Make, see Appendix \ref{appendix:jobmanagement}).
+As described in \cite{schwab2000}, in the latter half of that decade they moved to GNU Make, which was much more commonly used and actively developed, and came with a complete, up-to-date manual.
+The basic idea behind RED's solution was to organize the analysis as independent steps, including the generation of plots, and organizing the steps through a Makefile.
+This enabled all the results to be re-executed with a single command.
+Several basic low-level Makefiles were included in the high-level/central Makefile.
+The reader/user of a project had to manually edit the central Makefile and set the variable \inlinecode{RESDIR} (result directory) to the directory where built files are kept.
+The reader could later select which figures/parts of the project to reproduce by manually adding their names to the central Makefile and running Make.
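+A minimal sketch of this convention (ours, not SEP's actual Makefiles) may help clarify the organization:
+\begin{verbatim}
+# Central Makefile (illustrative sketch, not SEP's actual code).
+RESDIR = /path/to/results      # set manually by the reader/user
+
+include figure1.mk             # low-level Makefile, one per result
+include figure2.mk
+
+# The reader adds the desired figures here, then runs Make:
+all: $(RESDIR)/figure1.eps $(RESDIR)/figure2.eps
+\end{verbatim}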
+
+At the time, Make was already used by individual researchers and projects as a job orchestration tool, but SEP's innovation was to standardize it as an internal policy, and to define conventions for the Makefiles to be consistent across projects.
+This enabled new members to benefit from the existing work of previous team members (who had graduated or moved to other jobs).
+However, RED only used the existing software of the host system; it had no means to control software versions.
+Therefore, with wider adoption, they confronted a ``versioning problem'': the host's analysis software had different versions on different hosts, creating different results or crashing \citeappendix{fomel09}.
+Hence in 2006 SEP moved to a new Python-based framework called Madagascar, see Appendix \ref{appendix:madagascar}.
+
+
+
+
+
+\subsection{Apache Taverna (2003)}
+\label{appendix:taverna}
+Apache Taverna\footnote{\inlinecode{\url{https://taverna.incubator.apache.org}}} \citeappendix{oinn04} is a workflow management system written in Java with a graphical user interface which is still being developed.
+A workflow is defined as a directed graph, where nodes are called ``processors''.
+Each processor transforms a set of inputs into a set of outputs; processors are defined in the Scufl language (an XML-based language, where each step is an atomic task).
+Other components of the workflow are ``Data links'' and ``Coordination constraints''.
+The main user interface is graphical, where users move processors in the given space and define links between their inputs and outputs (manually constructing a lineage like
+\ifdefined\separatesupplement
+the lineage figure of the main paper).
+\else
+Figure \ref{fig:datalineage}).
+\fi
+Taverna is only a workflow manager and is not integrated with a package manager, hence the versions of the used software can be different in different runs.
+\citeappendix{zhao12} have studied the problem of workflow decay in Taverna.
+
+
+
+
+
+\subsection{Madagascar (2003)}
+\label{appendix:madagascar}
+Madagascar\footnote{\inlinecode{\url{http://ahay.org}}} \citeappendix{fomel13} is a set of extensions to the SCons job management tool (reviewed in Appendix \ref{appendix:scons}).
+Madagascar is a continuation of the Reproducible Electronic Documents (RED) project that was discussed in Appendix \ref{appendix:red}.
+Madagascar has been used in the production of hundreds of research papers or book chapters\footnote{\inlinecode{\url{http://www.ahay.org/wiki/Reproducible_Documents}}}, 120 prior to \citeappendix{fomel13}.
+
+Madagascar does include project management tools in the form of SCons extensions.
+However, it is not just a reproducible project management tool.
+The Regularly Sampled File (RSF) file format\footnote{\inlinecode{\url{http://www.ahay.org/wiki/Guide\_to\_RSF\_file\_format}}} is a custom plain-text file that points to the location of the actual data files on the file system and acts as the intermediary between Madagascar's analysis programs.
+Therefore, Madagascar is primarily a collection of analysis programs and tools to interact with RSF files and plotting facilities.
+For example, in our test of Madagascar 3.0.1, it installed 855 Madagascar-specific analysis programs (\inlinecode{PREFIX/bin/sf*}).
+The analysis programs mostly target geophysical data analysis, including various project-specific tools: more than half of the total built tools are under the \inlinecode{build/user} directory which includes names of Madagascar users.
+
+Besides the location or contents of the data, an RSF file also contains name/value pairs that can be used as options to Madagascar programs, which are built with inputs and outputs of this format.
+Since RSF also carries the program options, the inputs and outputs of Madagascar's analysis programs are read from standard input and written to standard output.
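+As an illustration, a simplified RSF header (our sketch, not one generated by an actual Madagascar program) might look like the following, with the binary data kept in the separate file named by \inlinecode{in=}:
+\begin{verbatim}
+in="/path/to/data.rsf@"
+esize=4 type=float form=native
+n1=100 d1=0.004 o1=0
+\end{verbatim}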
+
+In terms of completeness, as long as the user only uses Madagascar's own analysis programs, it is fairly complete at a high level (though not for lower-level OS libraries).
+However, this comes at the expense of a large amount of bloatware (programs that a given project may never need, but is forced to build), thus adding complexity.
+Also, the linking between the analysis programs (of a certain user at a certain time) and future versions of those programs (that are updated over time) is not immediately obvious.
+Furthermore, the blending of the workflow component with the low-level analysis components fails the modularity criterion.
+
+
+
+
+
+\subsection{GenePattern (2004)}
+\label{appendix:genepattern}
+GenePattern\footnote{\inlinecode{\url{https://www.genepattern.org}}} \citeappendix{reich06} (first released in 2004) is a client-server software system containing many common analysis functions/modules, primarily focused on gene studies.
+Although it is highly focused on a special research field, it is reviewed here because its concepts/methods are generic and relevant in the context of this paper.
+
+Its server-side software is installed with fixed software packages that are wrapped into GenePattern modules.
+The modules are used through a web interface; the modern implementation is GenePattern Notebook \citeappendix{reich17}.
+It is an extension of the Jupyter notebook (see Appendix \ref{appendix:editors}), which also has a special ``GenePattern'' cell that will connect to GenePattern servers for doing the analysis.
+However, the wrapper modules just call an existing tool on the host system.
+Given that each server may have its own set of installed software, the analysis may differ (or crash) when run on different GenePattern servers, hampering reproducibility.
+
+%% GenePattern shutdown announcement (although as of November 2020, it does not open any more): https://www.genepattern.org/blog/2019/10/01/the-genomespace-project-is-ending-on-november-15-2019
+The primary GenePattern server had been active since 2008, with 40,000 registered users and 2000 to 5000 jobs running every week \citeappendix{reich17}.
+However, it was shut down on November 15th, 2019 due to the end of funding.
+All processing with this server has stopped, and any archived data on it has been deleted.
+Since GenePattern is free software, there are alternative public servers to use, so hopefully, work on it will continue.
+However, funding is limited and those servers may face similar funding problems.
+This is a clear example of the fragility of solutions that depend on archiving and running research codes together with high-level research products (including data and binary/compiled codes that are expensive to keep in one place).
+
+
+
+
+
+\subsection{Kepler (2005)}
+Kepler\footnote{\inlinecode{\url{https://kepler-project.org}}} \citeappendix{ludascher05} is a Java-based graphical workflow management tool.
+Users drag-and-drop analysis components, called ``actors'', into a visual, directional graph, which is the workflow (similar to
+\ifdefined\separatesupplement
+the lineage figure shown in the main paper).
+\else
+Figure \ref{fig:datalineage}).
+\fi
+Each actor is connected to the others through the Ptolemy II framework\footnote{\inlinecode{\url{https://ptolemy.berkeley.edu}}} \citeappendix{eker03}.
+In many respects, the usage of Kepler and its issues for long-term reproducibility are similar to those of Apache Taverna (see Appendix \ref{appendix:taverna}).
+
+
+
+
+
+\subsection{VisTrails (2005)}
+\label{appendix:vistrails}
+VisTrails\footnote{\inlinecode{\url{https://www.vistrails.org}}} \citeappendix{bavoil05} was a graphical workflow managing system.
+According to its web page, VisTrails maintenance stopped in May 2016; its last Git commit, as of this writing, was in November 2017.
+Nevertheless, the fact that it was well maintained for over 10 years is an achievement.
+
+VisTrails (or ``visualization trails'') was initially designed for managing visualizations, but later grew into a generic workflow system with meta-data and provenance features.
+Each analysis step, or module, is recorded in an XML schema, which defines the operations and their dependencies.
+The XML attributes of each module can be used in any XML query language to find certain steps (for example those that used a certain command).
+Since the main goal was visualization (as images), its primary output is apparently in the form of image spreadsheets.
+Its design is based on a change-based provenance model using a custom VisTrails provenance query language (vtPQL), for more see \citeappendix{scheidegger08}.
+Since XML is a plain-text format, as the user inspects the data and makes changes to the analysis, the changes are recorded as ``trails'' in the project's VisTrails repository, which operates very much like common version control systems (see Appendix \ref{appendix:versioncontrol}).
+However, even though XML is plain text, it is very hard to edit manually.
+VisTrails, therefore, provides a graphic user interface with a visual representation of the project's inter-dependent steps (similar to
+\ifdefined\separatesupplement
+the data lineage figure of the main paper).
+\else
+Figure \ref{fig:datalineage}).
+\fi
+Besides the fact that it is no longer maintained, VisTrails does not control the software that is run; it only controls the sequence of the steps that are run.
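+As a purely hypothetical sketch of how a module might be recorded in such an XML file (the actual VisTrails schema differs):
+\begin{verbatim}
+<!-- Hypothetical sketch; not the actual VisTrails schema. -->
+<module id="12" name="ContourPlot">
+  <parameter name="levels" value="10"/>
+  <connection source="8" target="12"/>
+</module>
+\end{verbatim}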
+
+
+
+
+
+\subsection{Galaxy (2010)}
+\label{appendix:galaxy}
+Galaxy\footnote{\inlinecode{\url{https://galaxyproject.org}}} is a web-based Genomics workbench \citeappendix{goecks10}.
+The main user interface is the ``Galaxy Pages'', which does not require any programming: users graphically manipulate abstract ``tools'' which are wrappers over command-line programs.
+Therefore the actual running version of the program can be hard to control across different Galaxy servers.
+Besides the automatically generated metadata of a project (which include version control, or its history), users can also tag/annotate each analysis step, describing its intent/purpose.
+Besides some small differences, Galaxy seems very similar to GenePattern (Appendix \ref{appendix:genepattern}), so most of the same points apply here too, for example, the very large cost of maintaining such a system and its reliance on a graphical environment.
+
+
+
+
+
+\subsection{Image Processing On Line journal, IPOL (2010)}
+\label{appendix:ipol}
+The IPOL journal\footnote{\inlinecode{\url{https://www.ipol.im}}} \citeappendix{limare11} (first published article in July 2010) publishes papers on image processing algorithms, as well as the full code of the proposed algorithm.
+An IPOL paper is a traditional research paper, but with a focus on implementation.
+The published narrative description of the algorithm must be detailed to the level that any specialist can implement it in their own programming language (extremely detailed).
+The author's own implementation of the algorithm is also published with the paper (in C, C++, or MATLAB); the code must be well commented, linking each part of it to the relevant part of the paper.
+The authors must also submit several examples of datasets/scenarios.
+The referee is expected to inspect the code and narrative, confirming that they match with each other, and with the stated conclusions of the published paper.
+After publication, each paper also has a ``demo'' button on its web page, allowing readers to try the algorithm on a web-interface and even provide their own input.
+
+IPOL has grown steadily over the last 10 years, publishing 23 research articles in 2019 alone.
+We encourage the reader to visit its web page and see some of its recent papers and their demos.
+The reason it can be so thorough and complete is its very narrow scope (low-level image processing algorithms), where the published algorithms are highly atomic, not needing significant dependencies (beyond input/output of well-known formats), allowing the referees and readers to go deeply into each implemented algorithm.
+In fact, high-level languages like Perl, Python, or Java are not acceptable in IPOL precisely because of the additional complexities, such as the dependencies that they require.
+However, many data-intensive projects commonly involve dozens of high-level dependencies, with large and complex data formats and analysis, so this solution is not scalable.
+
+IPOL thus fails on our scalability criterion.
+Furthermore, by not publishing/archiving each paper's version control history or directly linking the analysis and produced paper, it fails criteria 6 and 7.
+Note that on the web page, it is possible to change parameters, but that will not affect the produced PDF.
+A paper written in Maneage (the proof-of-concept solution presented in this paper) could be scrutinized at a similar detailed level to IPOL, but for much more complex research scenarios, involving hundreds of dependencies and complex processing of the data.
+
+
+
+
+
+\subsection{WINGS (2010)}
+\label{appendix:wings}
+WINGS\footnote{\inlinecode{\url{https://wings-workflows.org}}} \citeappendix{gil10} is an automatic workflow generation algorithm.
+It runs on a centralized web server, requiring many dependencies (such that it is recommended to download Docker images).
+It allows users to define various workflow components (for example datasets, analysis components, etc), with high-level goals.
+It then uses selection and rejection algorithms to find the best components using a pool of analysis components that can satisfy the requested high-level constraints.
+%\tonote{Read more about this}
+
+
+
+
+
+\subsection{Active Papers (2011)}
+\label{appendix:activepapers}
+Active Papers\footnote{\inlinecode{\url{http://www.activepapers.org}}} attempts to package the code and data of a project into one file (in HDF5 format).
+It was initially written in Java because the compiled bytecode output of the JVM is portable to any machine \citeappendix{hinsen11}.
+However, Java is not a commonly used platform today, hence it was later implemented in Python \citeappendix{hinsen15}.
+
+In the Python version, all processing steps and input data (or references to them) are stored in an HDF5 file.
+%However, it can only account for pure-Python packages using the host operating system's Python modules \tonote{confirm this!}.
+When the Python module contains a component written in other languages (mostly C or C++), it needs to be an external dependency to the Active Paper.
+
+As mentioned in \citeappendix{hinsen15}, the fact that it relies on HDF5 is a caveat of Active Papers, because many tools are necessary to merely open it.
+Downloading the pre-built HDF View binaries (provided by the HDF group) is not possible anonymously/automatically (login is required).
+Installing it using the Debian or Arch Linux package managers also failed due to dependencies in our trials.
+Furthermore, as a high-level data format, HDF5 evolves very fast; for example, HDF5 1.12.0 (February 29th, 2020) is not usable with older libraries provided by the HDF5 team.
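+For example, merely listing the contents of an Active Papers file requires an HDF5 library; below is a generic sketch with Python's \inlinecode{h5py} (not Active Papers' own interface; the file name is hypothetical):
+\begin{verbatim}
+# Generic HDF5 inspection with h5py (hypothetical file name).
+import h5py
+with h5py.File("paper.ap", "r") as f:
+    f.visit(print)   # print the name of every group/dataset
+\end{verbatim}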
+
+While data and code are indeed fundamentally similar concepts technically \citeappendix{hinsen16}, they are used by humans differently due to their volume: the code of a large project involving terabytes of data can be less than a megabyte.
+Hence, storing code and data together becomes a burden when large datasets are used; this was also acknowledged in \citeappendix{hinsen15}.
+Also, if the data are proprietary (for example medical patient data), the data must not be released, but the methods that were applied to them can be published.
+Furthermore, since all reading and writing is done in the HDF5 file, it can easily bloat the file to very large sizes due to temporary files.
+These files can later be removed as part of the analysis, but this makes the code more complicated and hard to read/maintain.
+For example, the Active Papers HDF5 file of \citeappendix[in \href{https://doi.org/10.5281/zenodo.2549987}{zenodo.2549987}]{kneller19} is 1.8 gigabytes.
+
+In many scenarios, peers just want to inspect the processing by reading the code and checking a very specific part of it (one or two lines, for example, just to see the option values of one step).
+They do not necessarily need to run it or obtain the output datasets (which may be published elsewhere).
+Hence the extra volume for data, and the obscure HDF5 format that needs special tools for reading its plain-text internals, are an issue.
+
+
+
+
+
+\subsection{Collage Authoring Environment (2011)}
+\label{appendix:collage}
+The Collage Authoring Environment \citeappendix{nowakowski11} was the winner of the Elsevier Executable Paper Grand Challenge \citeappendix{gabriel11}.
+It is based on the GridSpace2\footnote{\inlinecode{\url{http://dice.cyfronet.pl}}} distributed computing environment, which has a web-based graphic user interface.
+Through its web-based interface, viewers of a paper can actively experiment with the parameters of a published paper's displayed outputs (for example figures).
+%\tonote{See how it containerizes the software environment}
+
+
+
+
+
+\subsection{SHARE (2011)}
+\label{appendix:SHARE}
+SHARE\footnote{\inlinecode{\url{https://is.ieis.tue.nl/staff/pvgorp/share}}} \citeappendix{vangorp11} is a web portal that hosts virtual machines (VMs) for storing the environment of a research project.
+SHARE took second place in the Elsevier Executable Paper Grand Challenge \citeappendix{gabriel11}.
+Simply put, SHARE was just a VM library that users could download or connect to, and run.
+The limitations of VMs for reproducibility were discussed in Appendix \ref{appendix:virtualmachines}, and the SHARE system does not specify any requirements on making the VM itself reproducible.
+As of January 2021, the top SHARE web page still works.
+However, upon selecting any operation, a notice is printed that ``SHARE is offline'' since 2019; the reason is not mentioned.
+
+
+
+
+
+\subsection{Verifiable Computational Result, VCR (2011)}
+\label{appendix:verifiableidentifier}
+A ``verifiable computational result''\footnote{\inlinecode{\url{http://vcr.stanford.edu}}} is an output (table, figure, etc) that is associated with a ``verifiable result identifier'' (VRI), see \citeappendix{gavish11}.
+It was awarded the third prize in the Elsevier Executable Paper Grand Challenge \citeappendix{gabriel11}.
+
+A VRI is created using tags within the programming source that produced that output, also recording its version control or history.
+This enables the exact identification and citation of results.
+The VRIs are automatically generated web URLs that link to public VCR repositories containing the data, inputs, and scripts, which may be re-executed.
+According to \citeappendix{gavish11}, the VRI generation routine has been implemented in MATLAB, R, and Python, although only the MATLAB version was available during the writing of this paper.
+VCR also has special \LaTeX{} macros for loading the respective VRI into the generated PDF.
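+As a sketch of the idea only (the macro name and URL below are hypothetical; the actual VCR interface could not be verified at the time of writing), a paper might load a result as:
+\begin{verbatim}
+% Hypothetical macro name; illustrates the concept only.
+\vcrfigure{vcr://vcr.stanford.edu/repository/figure3}
+\end{verbatim}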
+
+Unfortunately, most parts of the web page are not complete at the time of this writing.
+The VCR web page contains an example PDF\footnote{\inlinecode{\url{http://vcr.stanford.edu/paper.pdf}}} that is generated with this system, but the linked VCR repository\footnote{\inlinecode{\url{http://vcr-stat.stanford.edu}}} does not exist at the time of this writing.
+Finally, the date of the files in the MATLAB extension tarball is set to 2011, hinting that VCR was probably abandoned soon after the publication of \citeappendix{gavish11}.
+
+
+
+
+
+\subsection{SOLE (2012)}
+\label{appendix:sole}
+SOLE (Science Object Linking and Embedding) defines ``science objects'' (SOs) that can be manually linked with phrases of the published paper \citeappendix{pham12,malik13}.
+An SO is any code/content that is wrapped in begin/end tags with an associated type and name.
+For example, special commented lines in a Python, R, or C program.
+The SOLE command-line program parses the tagged file, generating metadata elements unique to the SO (including its URI).
+SOLE also supports workflows as Galaxy tools \citeappendix{goecks10}.
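+For instance, a science object could be delimited by commented begin/end tags inside a Python script; this is an illustrative sketch, as SOLE's exact tag syntax may differ:
+\begin{verbatim}
+# <so type="code" name="calibration">   (illustrative tag syntax)
+gain = 1.25            # option value discussed in the paper
+flux = gain * counts
+# </so>
+\end{verbatim}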
+
+For reproducibility, \citeappendix{pham12} suggest building a SOLE-based project in a virtual machine, using any custom package manager that is hosted on a private server to obtain a usable URI.
+However, as described in Appendices \ref{appendix:independentenvironment} and \ref{appendix:packagemanagement}, unless virtual machines are built with robust package managers, this is not a sustainable solution (the virtual machine itself is not reproducible).
+Also, hosting a large virtual machine server with a fixed IP on a hosting service like Amazon (as suggested there), for every project and in perpetuity, will be very expensive.
+The manual/artificial definition of tags to connect parts of the paper with the analysis scripts is also a caveat due to human error and incompleteness (the authors may not consider tags as important things, but they may be useful later).
+In Maneage, instead of using artificial/commented tags, the analysis inputs and outputs are automatically linked into the paper's text through \LaTeX{} macros that are the backbone of the whole system (they are not artificial/extra features).
+
+
+
+
+
+\subsection{Sumatra (2012)}
+Sumatra\footnote{\inlinecode{\url{http://neuralensemble.org/sumatra}}} \citeappendix{davison12} attempts to capture the environment information of a running project.
+It is written in Python and is a command-line wrapper over the analysis script.
+By controlling a project at running-time, Sumatra is able to capture the environment it was run in.
+The captured environment can be viewed in plain text or a web interface.
+Sumatra also provides \LaTeX/Sphinx features, which will link the paper with the project's Sumatra database.
+This enables researchers to use a fixed version of a project's figures in the paper, even at later times (while the project is being developed).
+
+The actual code that Sumatra wraps around must itself be under version control, and it does not run if there are uncommitted changes (although it is not clear what happens if a commit is amended).
+Since information on the environment has been captured, Sumatra is able to identify whether the environment has changed since a previous run of the project.
+Note that Sumatra does not attempt to store the environment of the analysis itself, as Sciunit does (see Appendix \ref{appendix:sciunit}); it only stores information about it.
+Sumatra thus needs to know the language of the running program and is not generic.
+It just captures the environment; it does not store \emph{how} that environment was built.
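+A typical session might look like the following sketch, based on Sumatra's \inlinecode{smt} command (exact options vary between versions):
+\begin{verbatim}
+# Sketch of typical Sumatra usage (options vary by version).
+smt init MyProject                # create the record store
+smt configure --executable=python --main=main.py
+smt run default.param             # run and capture the environment
+smt list                          # inspect the captured runs
+\end{verbatim}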
+
+
+
+
+
+\subsection{Research Object (2013)}
+\label{appendix:researchobject}
+The Research Object\footnote{\inlinecode{\url{http://www.researchobject.org}}} is a collection of meta-data ontologies to describe the aggregation of resources, or workflows; see \citeappendix{bechhofer13} and \citeappendix{belhajjame15}.
+It thus provides resources to link various workflow/analysis components (see Appendix \ref{appendix:existingtools}) into a final workflow.
+
+\citeappendix{bechhofer13} describes how a workflow in Apache Taverna (Appendix \ref{appendix:taverna}) can be translated into research objects.
+Importantly, the research object concept is not specific to any special workflow; it is just a metadata bundle/standard, which is only as robust in reproducing the result as the underlying workflow.
+
+
+
+
+
+\subsection{Sciunit (2015)}
+\label{appendix:sciunit}
+Sciunit\footnote{\inlinecode{\url{https://sciunit.run}}} \citeappendix{meng15} defines ``sciunits'' that keep the executed commands for an analysis and all the necessary programs and libraries that are used in those commands.
+It automatically parses all the executable files in the script and copies them, and their dependency libraries (down to the C library), into the sciunit.
+Because the sciunit contains all the programs and necessary libraries, it is possible to run it readily on other systems that have a similar CPU architecture.
+Sciunit was originally written in Python 2 (which reached its end of life on January 1st, 2020); Sciunit2 is a new implementation in Python 3.
+
+The main issue with Sciunit's approach is that the copied binaries are just black boxes: it is not possible to see how the used binaries from the initial system were built.
+This is a major problem for scientific projects: in principle (not knowing how the programs were built) and in practice (archiving a large-volume sciunit for every step of the analysis requires a lot of storage space).
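+A typical session might look like the following sketch (command names follow Sciunit's documentation, but may vary between versions; the script name is hypothetical):
+\begin{verbatim}
+# Sketch of typical sciunit usage (may vary between versions).
+sciunit create my-analysis        # open a new sciunit
+sciunit exec python analyze.py    # run; copy programs/libraries in
+sciunit repeat e1                 # re-execute the captured run
+\end{verbatim}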
+
+
+
+
+
+\subsection{Umbrella (2015)}
+Umbrella \citeappendix{meng15b} is a high-level wrapper script for isolating the environment of the analysis.
+The user specifies the necessary operating system and the necessary packages for the analysis steps in various JSON files.
+Umbrella will then study the host operating system and the various necessary inputs (including data and software), through a process similar to Sciunit (mentioned above), to find the best environment isolator (for example, Linux containers or VMs).
+We could not find a URL to the source code of Umbrella (no source code repository is mentioned in the papers we reviewed), but from the descriptions in \citeappendix{meng17}, it is written in Python 2.6 (which is now \new{deprecated}).
+
+
+
+
+
+\subsection{ReproZip (2016)}
+ReproZip\footnote{\inlinecode{\url{https://www.reprozip.org}}} \citeappendix{chirigati16} is a Python package that is designed to automatically track all the necessary data files, libraries, and environment variables into a single bundle.
+The tracking is done at the kernel system-call level, so any file that is accessed during the running of the project is identified.
+The tracked files can be packaged into a \inlinecode{.rpz} bundle that can then be unpacked into another system.
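+A typical session looks like the following sketch (the analysis script name is hypothetical):
+\begin{verbatim}
+# On the original machine: trace the run, then pack the bundle.
+reprozip trace ./run-analysis.sh
+reprozip pack analysis.rpz
+# On another machine: unpack and re-execute (here through Docker).
+reprounzip docker setup analysis.rpz ./analysis
+reprounzip docker run ./analysis
+\end{verbatim}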
+
+ReproZip is therefore very good for taking a ``snapshot'' of the running environment into a single file.
+However, the bundle can become very large when many/large datasets are involved, or if the software environment is complex (many dependencies).
+Since it copies the binary software libraries, it can only be run on systems with a similar CPU architecture to the original.
+Furthermore, ReproZip just copies the binary/compiled files used in a project, it has no way to know how the software was built.
+As mentioned in this paper and in \citeappendix{oliveira18}, the question of ``how'' the environment was built is critical for understanding the results; simply having the binaries is not necessarily useful.
+
+Similarly for the data: it is not possible to extract which data server it came from.
+Hence two projects that each use a 1-terabyte dataset will need a full copy of that same 1-terabyte file in their bundle, making long-term preservation extremely expensive.
+
+
+
+
+
+\subsection{Binder (2017)}
+Binder\footnote{\inlinecode{\url{https://mybinder.org}}} is used to containerize already existing Jupyter-based processing steps.
+Users simply add a set of Binder-recognized configuration files to their repository, and Binder will build a Docker image and install all the dependencies inside of it with Conda (the necessary packages are obtained from Conda).
+One good feature of Binder is that the imported Docker image must be pinned (with something like a checksum).
+This ensures that future/latest updates of the imported Docker image are not mistakenly used.
+However, it does not ensure that the Dockerfile used by the imported Docker image also follows a similar convention.
+Binder is used by \citeappendix{jones19}.
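+For example, a minimal \inlinecode{environment.yml} in the repository instructs Binder to build a Conda environment (the package versions shown here are placeholders):
+\begin{verbatim}
+# environment.yml (read by Binder; versions are placeholders).
+channels:
+  - conda-forge
+dependencies:
+  - python=3.7
+  - numpy=1.17
+\end{verbatim}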
+
+
+
+
+
+\subsection{Gigantum (2017)}
+%% The date was taken from their PyPI page, where the first version 0.1 was published in November 2016.
+Gigantum\footnote{\inlinecode{\url{https://gigantum.com}}} is a client/server system, in which the client is a web-based (graphical) interface that is installed as ``Gigantum Desktop'' within a Docker image.
+Gigantum uses Docker containers for an independent environment, Conda (or Pip) to install packages, Jupyter notebooks to edit and run code, and Git to store its history.
+Simply put, it is a high-level wrapper for combining these components.
+Internally, a Gigantum project is organized as files in a directory that can be opened even without the Gigantum client.
+The file structure (which is under version control) includes codes, input data, and output data.
+As acknowledged on their own web page, this greatly reduces the speed of Git operations and of transmitting or archiving the project.
+Therefore, there are limits on the dataset/code sizes.
+However, there is one directory that can be used to store files that must not be tracked.
+
+
+
+
+
+\subsection{Popper (2017)}
+\label{appendix:popper}
+Popper\footnote{\inlinecode{\url{https://falsifiable.us}}} is a software implementation of the Popper Convention \citeappendix{jimenez17}.
+The Popper team's own solution is through a command-line program called \inlinecode{popper}.
+The \inlinecode{popper} program itself is written in Python.
+However, job management was initially based on the HashiCorp configuration language (HCL), because HCL was used by ``GitHub Actions'' to manage workflows at the time.
+From October 2019, GitHub changed to a custom YAML-based language, so Popper also deprecated HCL.
+This demonstrates the fragility of basing low-level choices on external service providers.
+
+To start a project, the \inlinecode{popper} command-line program builds a template, or ``scaffold'', which is a minimal set of files that can be run.
+However, as of this writing, the scaffold is not complete: it lacks a manuscript and validation of outputs (as mentioned in the convention).
+By default, Popper runs in a Docker image (so root permissions are necessary, and the reproducibility issues with Docker images discussed above apply), but Singularity is also supported.
+See Appendix \ref{appendix:independentenvironment} for more on containers, and Appendix \ref{appendix:highlevelinworkflow} for using high-level languages in the workflow.
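+A typical session might look like the following sketch (command and option names may vary between Popper versions):
+\begin{verbatim}
+# Sketch of typical Popper usage (may vary between versions).
+popper scaffold          # generate the template ("scaffold") files
+popper run -f wf.yml     # run the workflow inside a container
+\end{verbatim}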
+
+Popper does not comply with the completeness, minimal complexity, or inclusion-of-narrative criteria.
+Moreover, the scaffold that is provided by Popper is an output of the program that is not directly under version control.
+Hence, tracking future changes in Popper and how they relate to the high-level projects that depend on it will be very hard.
+In Maneage, the same \inlinecode{maneage} git branch is shared by the developers and users; any new feature or change in Maneage can thus be directly tracked with Git when the high-level project merges their branch with Maneage.
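+A sketch of this Git workflow (the branch and remote names follow Maneage's documented conventions):
+\begin{verbatim}
+# Start a project on top of the 'maneage' branch.
+git clone https://git.maneage.org/project.git my-project
+cd my-project
+git checkout -b project maneage
+# ... develop and commit on the 'project' branch ...
+# Later, import new Maneage features:
+git fetch origin maneage:maneage
+git merge maneage
+\end{verbatim}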
+
+
+
+
+
+\subsection{Whole Tale (2017)}
+\label{appendix:wholetale}
+Whole Tale\footnote{\inlinecode{\url{https://wholetale.org}}} is a web-based platform for managing a project and organizing data provenance; see \citeappendix{brinckman17}.
+It uses online editors like Jupyter or RStudio (see Appendix \ref{appendix:editors}) that are encapsulated in a Docker container (see Appendix \ref{appendix:independentenvironment}).
+
+The web-based nature of Whole Tale's approach and its dependency on many tools (which have many dependencies themselves) is a major limitation for future reproducibility.
+For example, when following their own tutorial on ``Creating a new tale'', the provided Jupyter notebook could not be executed because of a dependency problem.
+This was reported to the authors as issue 113\footnote{\inlinecode{\url{https://github.com/whole-tale/wt-design-docs/issues/113}}}, but as all the second-order dependencies evolve, it is not hard to envisage such dependency incompatibilities being the primary issue for older projects on Whole Tale.
+Furthermore, the fact that a Tale is stored as a binary Docker container causes two important problems:
+1) it requires a very large storage capacity for every project that is hosted there, making it very expensive to scale as demand expands;
+2) it is not possible to see accurately how the environment was built (for example, when the Dockerfile uses \inlinecode{apt}).
+This issue with Whole Tale (and generally all other solutions that only rely on preserving a container/VM) was also mentioned in \citeappendix{oliveira18}, for more on this, please see Appendix \ref{appendix:packagemanagement}.
+
+
+
+
+
+\subsection{Occam (2018)}
+Occam\footnote{\inlinecode{\url{https://occam.cs.pitt.edu}}} \citeappendix{oliveira18} is a web-based application to preserve software and its execution.
+To achieve long-term reproducibility, Occam includes its own package manager (instructions to build software and their dependencies) to be in full control of the software build instructions, similar to Maneage.
+Besides Nix and Guix (which are primarily package managers that can also do job management), Occam has been the only solution in our survey that attempts to be complete in this aspect.
+
+However, it is incomplete from the perspective of requirements: it works within a Docker image (which requires root permissions) and currently only runs on Debian-based, Red Hat-based, and Arch-based GNU/Linux operating systems, which respectively use the \inlinecode{apt}, \inlinecode{yum}, or \inlinecode{pacman} package managers.
+It is also itself written in Python (version 3.4 or above).
+
+Furthermore, it does not satisfy the minimal complexity criterion, because the instructions to build the software and their versions are not immediately viewable or modifiable by the user.
+Occam contains its own JSON database that should be parsed with its own custom program.
+The analysis phase of Occam is also through a drag-and-drop interface (similar to Taverna, Appendix \ref{appendix:taverna}), which is a web-based graphical user interface.
+All the connections between various phases of the analysis need to be pre-defined in a JSON file and manually linked in the GUI.
+Hence for complex data analysis operations that involve thousands of steps, it is not scalable.