Diffstat (limited to 'tex/src/appendix-existing-solutions.tex')
-rw-r--r--tex/src/appendix-existing-solutions.tex46
1 files changed, 23 insertions, 23 deletions
diff --git a/tex/src/appendix-existing-solutions.tex b/tex/src/appendix-existing-solutions.tex
index 1d515e4..4ca31d6 100644
--- a/tex/src/appendix-existing-solutions.tex
+++ b/tex/src/appendix-existing-solutions.tex
@@ -36,8 +36,8 @@ Therefore proprietary solutions like Code Ocean\footnote{\inlinecode{\url{https:
Other studies have also attempted to review existing reproducible solutions, for example, see Konkol et al.\citeappendix{konkol20}.
We have tried our best to test and read through the documentation of almost all reviewed solutions to a sufficient level.
-However, due to time constraints, it is inevitable that we may have missed some aspects the solutions, or incorrectly interpreted their behavior and outputs.
-In this case, please let us know and we will correct it in the text on the paper's Git repository and publish the updated PDF on \href{https://doi.org/10.5281/zenodo.3872247}{zenodo.3872247} (this is the version-independent DOI, that always points to the most recent Zenodo upload).
+However, due to time constraints, it is inevitable that we may have missed some aspects of the solutions, or incorrectly interpreted their behavior and outputs.
+In such cases, please let us know and we will correct it in the text on the paper's Git repository and publish the updated (postprint) PDF on \href{https://doi.org/10.5281/zenodo.3872247}{zenodo.3872247} (this is the version-independent DOI, which always points to the most recent Zenodo upload).
\subsection{Suggested rules, checklists, or criteria}
@@ -79,8 +79,8 @@ Therefore, they lack a strong/clear completeness criterion (they mainly only sug
\label{appendix:red}
RED\footnote{\inlinecode{\url{http://sep.stanford.edu/doku.php?id=sep:research:reproducible}}} is the earliest attempt\cite{claerbout1992,schwab2000} that we could find at doing reproducible research.
It was developed within the Stanford Exploration Project (SEP) for Geophysics publications.
-Their introductions on the importance of reproducibility, resonate a lot with today's environment in computational sciences.
-In particular, the heavy investment one has to make in order to re-do another scientist's work, even in the same team.
+Their introductions on the importance of reproducibility resonate a lot with today's environment in computational sciences.
+In particular, the authors highlight the heavy investment one has to make in order to re-do another scientist's work, even in the same team.
RED also influenced other early reproducible works, for example Buckheit \& Donoho\citeappendix{buckheit1995}.
To orchestrate the various figures/results of a project, from 1990 they used ``Cake''\citeappendix{somogyi87}, a dialect of Make (for more on Make, see Appendix \ref{appendix:jobmanagement}).
@@ -88,14 +88,14 @@ As described in Schwab et al.\cite{schwab2000}, in the latter half of that decad
The basic idea behind RED's solution was to organize the analysis as independent steps, including the generation of plots, and to orchestrate those steps through a Makefile.
This enabled all the results to be re-executed with a single command.
Several basic low-level Makefiles were included in the high-level/central Makefile.
-The reader/user of a project had to manually edit the central Makefile and set the variable \inlinecode{RESDIR} (result directory), this is the directory where built files are kept.
-The reader could later select which figures/parts of the project to reproduce by manually adding its name in the central Makefile, and running Make.
+The reader/user of a project had to manually edit the central Makefile and set the variable \inlinecode{RESDIR} (result directory), the directory where built files are kept.
+The reader could later select which figures/parts of the project to reproduce by manually adding their names to the central Makefile, and running Make.
-At the time, Make was already practiced by individual researchers and projects as a job orchestration tool, but SEP's innovation was to standardize it as an internal policy, and define conventions for the Makefiles to be consistent across projects.
+At the time, Make was already used by individual researchers and projects as a job orchestration tool, but SEP's innovation was to standardize it as an internal policy, and define conventions for the Makefiles to be consistent across projects.
This enabled new members to benefit from the already existing work of previous team members (who had graduated or moved to other jobs).
-However, RED only used the existing software of the host system, it had no means to control them.
+However, RED only used the existing software of the host system, with no means to control that software.
Therefore, with wider adoption, they confronted a ``versioning problem'' where the host's analysis software had different versions on different hosts, creating different results, or crashing\citeappendix{fomel09}.
-Hence in 2006 SEP moved to a new Python-based framework called Madagascar, see Appendix \ref{appendix:madagascar}.
+Hence, in 2006, SEP moved to a new Python-based framework called Madagascar; see Appendix \ref{appendix:madagascar}.
@@ -141,7 +141,7 @@ Since RSF contains program options also, the inputs and outputs of Madagascar's
In terms of completeness, as long as the user only uses Madagascar's own analysis programs, it is fairly complete at a high level (not lower-level OS libraries).
However, this comes at the expense of a large amount of bloatware (programs that one project may never need, but is forced to build), thus adding complexity.
Also, the link between the analysis programs (of a certain user at a certain time) and future versions of those programs (which are updated over time) is not immediately obvious.
-Furthermore, the blending of the workflow component with the low-level analysis components fails the modularity criteria.
+Furthermore, the blending of the workflow component with the low-level analysis components fails the modularity criterion.
@@ -299,9 +299,9 @@ It is based on the GridSpace2\footnote{\inlinecode{\url{http://dice.cyfronet.pl}
Through its web-based interface, viewers of a paper can actively experiment with the parameters of the paper's displayed outputs (for example, figures).
In their Figure 3, they nicely visualize how the ``Executable Paper'' of Collage operates through two servers and a computing backend.
-Unfortunately in the paper no webpage has been provided follow up on the work and find its current status.
-A web search also only pointed us to its main paper\citeappendix{nowakowski11}.
-In the paper they do not discuss the major issue of software versioning and its verification to ensure that future updates to the backend do not affect the result; apparently it just assumes the software exist on the ``Computing backend''.
+Unfortunately, the paper does not provide a webpage for following up on the work or finding its current status.
+A web search only pointed us to its main paper\citeappendix{nowakowski11}.
+In the paper, the authors do not discuss the major issue of software versioning and its verification to ensure that future updates to the backend do not affect the result; apparently, Collage simply assumes that the software exists on the ``Computing backend''.
Since we could not access or test it, we can only judge from the descriptions in the paper: it seems to be very similar to the modern-day Jupyter notebook concept (see Appendix \ref{appendix:jupyter}), which had not yet been created in its current form in 2011.
So we expect similar longevity issues with Collage.
@@ -329,14 +329,14 @@ This enables the exact identification and citation of results.
The VRIs are automatically generated web URLs that link to public VCR repositories containing the data, inputs, and scripts that may be re-executed.
According to Gavish \& Donoho\citeappendix{gavish11}, the VRI generation routine has been implemented in MATLAB, R, and Python, although only the MATLAB version was available on the webpage in January 2021.
VCR also has special \LaTeX{} macros for loading the respective VRI into the generated PDF.
-In effect this is very similar to what have done at the end of the caption of
+In effect this is very similar to what we have done at the end of the caption of
\ifdefined\separatesupplement
the first figure in the main body of the paper,
\else
Figure \ref{fig:datalineage},
\fi
where you can click on the given Zenodo link and be taken to the raw data that created the plot.
-However, instead of a long and hard to read hash, we simply point to the plotted file's source as a Zenodo DOI (which has long term funding for longevity).
+However, instead of a long and hard-to-read hash, we point to the plotted file's source as a Zenodo DOI (which has long-term funding for longevity).
Unfortunately, most parts of the web page are not complete as of January 2021.
The VCR web page contains an example PDF\footnote{\inlinecode{\url{http://vcr.stanford.edu/paper.pdf}}} that is generated with this system, but the linked VCR repository\footnote{\inlinecode{\url{http://vcr-stat.stanford.edu}}} did not exist (again, as of January 2021).
@@ -519,14 +519,14 @@ This issue with Whole Tale (and generally all other solutions that only rely on
\subsection{Occam (2018)}
\label{appendix:occam}
Occam\footnote{\inlinecode{\url{https://occam.cs.pitt.edu}}}\citeappendix{oliveira18} is a web-based application to preserve software and its execution.
-To achieve long-term reproducibility, Occam includes its own package manager (instructions to build software and their dependencies) to be in full control of the software build instructions, similar to Maneage.
-Besides Nix or Guix (which are primarily a package manager that can also do job management), Occam has been the only solution in our survey here that attempts to be complete in this aspect.
+To achieve long-term reproducibility, Occam includes its own package manager (instructions to build software and its dependencies) in order to be in full control of the software build instructions, similarly to Maneage.
+Besides Nix or Guix (which are primarily package managers that can also do job management), Occam is the only solution in our survey that attempts to be complete in this aspect.
-However it is incomplete from the perspective of requirements: it works within a Docker image (that requires root permissions) and currently only runs on Debian-based, Red Hat based, and Arch-based GNU/Linux operating systems that respectively use the \inlinecode{apt}, \inlinecode{pacman} or \inlinecode{yum} package managers.
+However, it is incomplete from the perspective of requirements: it works within a Docker image (that requires root permissions) and currently only runs on Debian-based, Red Hat-based, and Arch-based GNU/Linux operating systems that respectively use the \inlinecode{apt}, \inlinecode{yum} or \inlinecode{pacman} package managers.
It is also itself written in Python (version 3.4 or above).
-Furthermore, it does not account for the minimal complexity criteria because the instructions to build the software and their versions are not immediately viewable or modifiable by the user.
-Occam contains its own JSON database that should be parsed with its own custom program.
-The analysis phase of Occam is also through a drag-and-drop interface (similar to Taverna, Appendix \ref{appendix:taverna}) that is a web-based graphic user interface.
-All the connections between various phases of the analysis need to be pre-defined in a JSON file and manually linked in the GUI.
-Hence for complex data analysis operations that involve thousands of steps, it is not scalable.
+Furthermore, it does not satisfy the minimal complexity criterion, because the instructions to build the software packages and their versions are not immediately viewable or modifiable by the user.
+Occam contains its own JSON database that should be parsed by Occam's own custom program.
+The analysis phase of Occam is through a drag-and-drop interface (similar to Taverna, Appendix \ref{appendix:taverna}), which is provided as a web-based graphical user interface.
+All the connections between the various phases of the analysis need to be pre-defined in a JSON file and manually linked in the GUI.
+Hence, for complex data analysis operations that involve thousands of steps, this is not scalable.