Diffstat (limited to 'tex/src/appendix-existing-solutions.tex')
-rw-r--r-- | tex/src/appendix-existing-solutions.tex | 37 |
1 files changed, 31 insertions, 6 deletions
diff --git a/tex/src/appendix-existing-solutions.tex b/tex/src/appendix-existing-solutions.tex
index 919f4e5..d7888ad 100644
--- a/tex/src/appendix-existing-solutions.tex
+++ b/tex/src/appendix-existing-solutions.tex
@@ -41,12 +41,22 @@ For more on Popper, please see Section \ref{appendix:popper}.
 For improved reproducibility in Jupyter notebook users, \citeappendix{rule19} propose ten rules to improve reproducibility and also provide links to example implementations.
 These can be very useful for users of Jupyter, but are not generic for non-Jupyter-based computational projects.
 Some criteria (which are indeed very good in a more general context) do not directly relate to reproducibility, for example their Rule 1: ``Tell a Story for an Audience''.
-Generally, as reviewed in Sections \ref{sec:longevityofexisting} and \ref{appendix:jupyter}, Jupyter itself has many issues regarding reproducibility.
-
+Generally, as reviewed in
+\ifdefined\separatesupplement
+the main body of this paper (section on longevity of existing tools)
+\else
+Section \ref{sec:longevityofexisting}
+\fi
+and Section \ref{appendix:jupyter} (below), Jupyter itself has many issues regarding reproducibility.
 To create Docker images, N\"ust et al. propose ``ten simple rules'' in \citeappendix{nust20}.
 They recommend some issues that can indeed help increase the quality of Docker images and their production/usage, such as their rule 7 to ``mount datasets [only] at run time'' to separate the computational environment from the data.
 However, long-term reproducibility of the images is not included as a criterion by these authors.
-For example, they recommend using base operating systems, with version identification limited to a single brief identifier such as \inlinecode{ubuntu:18.04}, which has a serious problem with longevity issues (Section \ref{sec:longevityofexisting}).
+For example, they recommend using base operating systems, with version identification limited to a single brief identifier such as \inlinecode{ubuntu:18.04}, which has a serious problem with longevity issues
+\ifdefined\separatesupplement
+(as discussed in the longevity of existing tools section of the main paper).
+\else
+(Section \ref{sec:longevityofexisting}).
+\fi
 Furthermore, in their proof-of-concept Dockerfile (listing 1), \inlinecode{rocker} is used with a tag (not a digest), which can be problematic due to the high risk of ambiguity (as discussed in Section \ref{appendix:containers}).
 \subsection{Reproducible Electronic Documents, RED (1992)}
@@ -82,7 +92,12 @@ Apache Taverna\footnote{\inlinecode{\url{https://taverna.incubator.apache.org}}}
 A workflow is defined as a directed graph, where nodes are called ``processors''.
 Each Processor transforms a set of inputs into a set of outputs and they are defined in the Scufl language (an XML-based language, were each step is an atomic task).
 Other components of the workflow are ``Data links'' and ``Coordination constraints''.
-The main user interface is graphical, where users move processors in the given space and define links between their inputs outputs (manually constructing a lineage like Figure \ref{fig:datalineage}).
+The main user interface is graphical, where users move processors in the given space and define links between their inputs outputs (manually constructing a lineage like
+\ifdefined\separatesupplement
+lineage figure of the main paper.
+\else
+Figure \ref{fig:datalineage}).
+\fi
 Taverna is only a workflow manager and is not integrated with a package manager, hence the versions of the used software can be different in different runs.
 \citeappendix{zhao12} have studied the problem of workflow decays in Taverna.
@@ -136,7 +151,12 @@ This is a very nice example of the fragility of solutions that depend on archivi
 \subsection{Kepler (2005)}
 Kepler\footnote{\inlinecode{\url{https://kepler-project.org}}} \citeappendix{ludascher05} is a Java-based Graphic User Interface workflow management tool.
-Users drag-and-drop analysis components, called ``actors'', into a visual, directional graph, which is the workflow (similar to Figure \ref{fig:datalineage}).
+Users drag-and-drop analysis components, called ``actors'', into a visual, directional graph, which is the workflow (similar to
+\ifdefined\separatesupplement
+the lineage figure shown in the main paper.
+\else
+Figure \ref{fig:datalineage}).
+\fi
 Each actor is connected to others through the Ptolemy II\footnote{\inlinecode{\url{https://ptolemy.berkeley.edu}}} \citeappendix{eker03}.
 In many aspects, the usage of Kepler and its issues for long-term reproducibility is like Apache Taverna (see Section \ref{appendix:taverna}).
@@ -159,7 +179,12 @@ Its design is based on a change-based provenance model using a custom VisTrails
 Since XML is a plane text format, as the user inspects the data and makes changes to the analysis, the changes are recorded as ``trails'' in the project's VisTrails repository that operates very much like common version control systems (see Appendix \ref{appendix:versioncontrol}). .
 However, even though XML is in plain text, it is very hard to edit manually.
-VisTrails therefore provides a graphic user interface with a visual representation of the project's inter-dependent steps (similar to Figure \ref{fig:datalineage}).
+VisTrails therefore provides a graphic user interface with a visual representation of the project's inter-dependent steps (similar to
+\ifdefined\separatesupplement
+the data lineage figure of the main paper).
+\else
+Figure \ref{fig:datalineage}).
+\fi
 Besides the fact that it is no longer maintained, VisTrails does not control the software that is run, it only controls the sequence of steps that they are run in.