Diffstat (limited to 'tex/src/appendix-existing-solutions.tex')
-rw-r--r-- tex/src/appendix-existing-solutions.tex 37
1 files changed, 31 insertions, 6 deletions
diff --git a/tex/src/appendix-existing-solutions.tex b/tex/src/appendix-existing-solutions.tex
index 919f4e5..d7888ad 100644
--- a/tex/src/appendix-existing-solutions.tex
+++ b/tex/src/appendix-existing-solutions.tex
@@ -41,12 +41,22 @@ For more on Popper, please see Section \ref{appendix:popper}.
To improve reproducibility for Jupyter notebook users, \citeappendix{rule19} propose ten rules and also provide links to example implementations.
These can be very useful for users of Jupyter, but they do not generalize to computational projects that are not based on Jupyter.
Some criteria (which are indeed very good in a more general context) do not directly relate to reproducibility, for example their Rule 1: ``Tell a Story for an Audience''.
-Generally, as reviewed in Sections \ref{sec:longevityofexisting} and \ref{appendix:jupyter}, Jupyter itself has many issues regarding reproducibility.
-
+Generally, as reviewed in
+\ifdefined\separatesupplement
+the main body of this paper (section on longevity of existing tools)
+\else
+Section \ref{sec:longevityofexisting}
+\fi
+and Section \ref{appendix:jupyter} (below), Jupyter itself has many issues regarding reproducibility.
To create Docker images, N\"ust et al. propose ``ten simple rules'' in \citeappendix{nust20}.
They recommend some practices that can indeed help increase the quality of Docker images and their production/usage, such as their rule 7 to ``mount datasets [only] at run time'' to separate the computational environment from the data.
However, long-term reproducibility of the images is not included as a criterion by these authors.
-For example, they recommend using base operating systems, with version identification limited to a single brief identifier such as \inlinecode{ubuntu:18.04}, which has a serious problem with longevity issues (Section \ref{sec:longevityofexisting}).
+For example, they recommend using base operating systems, with version identification limited to a single brief identifier such as \inlinecode{ubuntu:18.04}, which has serious longevity problems
+\ifdefined\separatesupplement
+(as discussed in the longevity of existing tools section of the main paper).
+\else
+(Section \ref{sec:longevityofexisting}).
+\fi
Furthermore, in their proof-of-concept Dockerfile (listing 1), \inlinecode{rocker} is used with a tag (not a digest), which can be problematic due to the high risk of ambiguity (as discussed in Section \ref{appendix:containers}).
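As a minimal sketch of the tag/digest distinction (not taken from \citeappendix{nust20}), the two alternative forms of a Dockerfile \inlinecode{FROM} line are shown below; \inlinecode{<digest>} is a placeholder, not a real image hash, and only one of the two lines would be used in practice:
\begin{verbatim}
# Alternative 1, tag: a mutable label that can point to
# a different image at a later date.
FROM ubuntu:18.04

# Alternative 2, digest: a content hash identifying one specific image.
# Replace <digest> with the actual SHA-256 value of the desired image.
FROM ubuntu@sha256:<digest>
\end{verbatim}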
\subsection{Reproducible Electronic Documents, RED (1992)}
@@ -82,7 +92,12 @@ Apache Taverna\footnote{\inlinecode{\url{https://taverna.incubator.apache.org}}}
A workflow is defined as a directed graph, where nodes are called ``processors''.
Each processor transforms a set of inputs into a set of outputs and they are defined in the Scufl language (an XML-based language, where each step is an atomic task).
Other components of the workflow are ``Data links'' and ``Coordination constraints''.
-The main user interface is graphical, where users move processors in the given space and define links between their inputs outputs (manually constructing a lineage like Figure \ref{fig:datalineage}).
+The main user interface is graphical, where users move processors in the given space and define links between their inputs and outputs (manually constructing a lineage like
+\ifdefined\separatesupplement
+the lineage figure of the main paper).
+\else
+Figure \ref{fig:datalineage}).
+\fi
Taverna is only a workflow manager and is not integrated with a package manager, hence the versions of the software used can differ between runs.
\citeappendix{zhao12} have studied the problem of workflow decay in Taverna.
@@ -136,7 +151,12 @@ This is a very nice example of the fragility of solutions that depend on archivi
\subsection{Kepler (2005)}
Kepler\footnote{\inlinecode{\url{https://kepler-project.org}}} \citeappendix{ludascher05} is a Java-based workflow management tool with a graphical user interface.
-Users drag-and-drop analysis components, called ``actors'', into a visual, directional graph, which is the workflow (similar to Figure \ref{fig:datalineage}).
+Users drag and drop analysis components, called ``actors'', into a visual, directed graph, which is the workflow (similar to
+\ifdefined\separatesupplement
+the lineage figure shown in the main paper).
+\else
+Figure \ref{fig:datalineage}).
+\fi
Each actor is connected to others through Ptolemy II\footnote{\inlinecode{\url{https://ptolemy.berkeley.edu}}} \citeappendix{eker03}.
In many aspects, the usage of Kepler and its issues for long-term reproducibility are similar to those of Apache Taverna (see Section \ref{appendix:taverna}).
@@ -159,7 +179,12 @@ Its design is based on a change-based provenance model using a custom VisTrails
Since XML is a plain text format, as the user inspects the data and makes changes to the analysis, the changes are recorded as ``trails'' in the project's VisTrails repository that operates very much like common version control systems (see Appendix \ref{appendix:versioncontrol}).
However, even though XML is in plain text, it is very hard to edit manually.
-VisTrails therefore provides a graphic user interface with a visual representation of the project's inter-dependent steps (similar to Figure \ref{fig:datalineage}).
+VisTrails therefore provides a graphical user interface with a visual representation of the project's inter-dependent steps (similar to
+\ifdefined\separatesupplement
+the data lineage figure of the main paper).
+\else
+Figure \ref{fig:datalineage}).
+\fi
Besides the fact that it is no longer maintained, VisTrails does not control the software that is run; it only controls the sequence in which the steps are run.