Diffstat (limited to 'tex/src')
-rw-r--r-- tex/src/appendix-existing-solutions.tex 46
-rw-r--r-- tex/src/appendix-existing-tools.tex 27
2 files changed, 36 insertions, 37 deletions
diff --git a/tex/src/appendix-existing-solutions.tex b/tex/src/appendix-existing-solutions.tex
index 1d515e4..4ca31d6 100644
--- a/tex/src/appendix-existing-solutions.tex
+++ b/tex/src/appendix-existing-solutions.tex
@@ -36,8 +36,8 @@ Therefore proprietary solutions like Code Ocean\footnote{\inlinecode{\url{https:
Other studies have also attempted to review existing reproducible solutions, for example, see Konkol et al.\citeappendix{konkol20}.
We have tried our best to test and read through the documentation of almost all reviewed solutions to a sufficient level.
-However, due to time constraints, it is inevitable that we may have missed some aspects the solutions, or incorrectly interpreted their behavior and outputs.
-In this case, please let us know and we will correct it in the text on the paper's Git repository and publish the updated PDF on \href{https://doi.org/10.5281/zenodo.3872247}{zenodo.3872247} (this is the version-independent DOI, that always points to the most recent Zenodo upload).
+However, due to time constraints, it is likely that we have missed some aspects of the solutions, or incorrectly interpreted their behavior and outputs.
+If so, please let us know, and we will correct the text in the paper's Git repository and publish the updated (postprint) PDF on \href{https://doi.org/10.5281/zenodo.3872247}{zenodo.3872247} (this is the version-independent DOI, which always points to the most recent Zenodo upload).
\subsection{Suggested rules, checklists, or criteria}
@@ -79,8 +79,8 @@ Therefore, they lack a strong/clear completeness criterion (they mainly only sug
\label{appendix:red}
RED\footnote{\inlinecode{\url{http://sep.stanford.edu/doku.php?id=sep:research:reproducible}}} is the earliest attempt\cite{claerbout1992,schwab2000} at doing reproducible research that we could find.
It was developed within the Stanford Exploration Project (SEP) for Geophysics publications.
-Their introductions on the importance of reproducibility, resonate a lot with today's environment in computational sciences.
-In particular, the heavy investment one has to make in order to re-do another scientist's work, even in the same team.
+Their introductions, which discuss the importance of reproducibility, resonate strongly with today's environment in computational sciences.
+In particular, the authors highlight the heavy investment one has to make in order to re-do another scientist's work, even in the same team.
RED also influenced other early reproducible works, for example Buckheit \& Donoho\citeappendix{buckheit1995}.
From 1990, to orchestrate the various figures/results of a project, they used ``Cake''\citeappendix{somogyi87}, a dialect of Make (for more on Make, see Appendix \ref{appendix:jobmanagement}).
@@ -88,14 +88,14 @@ As described in Schwab et al.\cite{schwab2000}, in the latter half of that decad
The basic idea behind RED's solution was to organize the analysis as independent steps, including the generation of plots, and to orchestrate these steps through a Makefile.
This enabled all the results to be re-executed with a single command.
Several basic low-level Makefiles were included in the high-level/central Makefile.
-The reader/user of a project had to manually edit the central Makefile and set the variable \inlinecode{RESDIR} (result directory), this is the directory where built files are kept.
-The reader could later select which figures/parts of the project to reproduce by manually adding its name in the central Makefile, and running Make.
+The reader/user of a project had to manually edit the central Makefile and set the variable \inlinecode{RESDIR} (result directory), the directory where built files are kept.
+The reader could later select which figures/parts of the project to reproduce by manually adding their names to the central Makefile, and running Make.
-At the time, Make was already practiced by individual researchers and projects as a job orchestration tool, but SEP's innovation was to standardize it as an internal policy, and define conventions for the Makefiles to be consistent across projects.
+At the time, Make was already used by individual researchers and projects as a job orchestration tool, but SEP's innovation was to standardize it as an internal policy, and define conventions for the Makefiles to be consistent across projects.
This enabled new members to benefit from the already existing work of previous team members (who had graduated or moved to other jobs).
-However, RED only used the existing software of the host system, it had no means to control them.
+However, RED only used the existing software of the host system, with no means to control that software.
Therefore, with wider adoption, they confronted a ``versioning problem'': the host's analysis software had different versions on different hosts, producing different results or crashing\citeappendix{fomel09}.
-Hence in 2006 SEP moved to a new Python-based framework called Madagascar, see Appendix \ref{appendix:madagascar}.
+Hence, in 2006, SEP moved to a new Python-based framework called Madagascar; see Appendix \ref{appendix:madagascar}.
@@ -141,7 +141,7 @@ Since RSF contains program options also, the inputs and outputs of Madagascar's
In terms of completeness, as long as the user only uses Madagascar's own analysis programs, it is fairly complete at a high level (though not at the level of lower-level OS libraries).
However, this comes at the expense of a large amount of bloatware (programs that one project may never need, but is forced to build), thus adding complexity.
Also, the link between the analysis programs (of a certain user at a certain time) and future versions of those programs (which are updated over time) is not immediately obvious.
-Furthermore, the blending of the workflow component with the low-level analysis components fails the modularity criteria.
+Furthermore, the blending of the workflow component with the low-level analysis components fails the modularity criterion.
@@ -299,9 +299,9 @@ It is based on the GridSpace2\footnote{\inlinecode{\url{http://dice.cyfronet.pl}
Through its web-based interface, viewers of a paper can actively experiment with the parameters of the paper's displayed outputs (for example, figures).
In their Figure 3, they nicely visualize how the ``Executable Paper'' of Collage operates through two servers and a computing backend.
-Unfortunately in the paper no webpage has been provided follow up on the work and find its current status.
-A web search also only pointed us to its main paper\citeappendix{nowakowski11}.
-In the paper they do not discuss the major issue of software versioning and its verification to ensure that future updates to the backend do not affect the result; apparently it just assumes the software exist on the ``Computing backend''.
+Unfortunately, the paper does not provide a webpage for following up on the work and finding its current status.
+A web search only pointed us to its main paper\citeappendix{nowakowski11}.
+In the paper, the authors do not discuss the major issue of software versioning and its verification to ensure that future updates to the backend do not affect the result; apparently, Collage simply assumes that the software exists on the ``Computing backend''.
Since we could not access or test it, we rely on the paper's descriptions, from which it seems to be very similar to the modern-day Jupyter notebook concept (see Appendix \ref{appendix:jupyter}), which had not yet been created in its current form in 2011.
So we expect similar longevity issues with Collage.
@@ -329,14 +329,14 @@ This enables the exact identification and citation of results.
The VRIs are automatically generated web-URLs that link to public VCR repositories containing the data, inputs, and scripts, which may be re-executed.
According to Gavish \& Donoho\citeappendix{gavish11}, the VRI generation routine has been implemented in MATLAB, R, and Python, although only the MATLAB version was available on the webpage in January 2021.
VCR also has special \LaTeX{} macros for loading the respective VRI into the generated PDF.
-In effect this is very similar to what have done at the end of the caption of
+In effect this is very similar to what we have done at the end of the caption of
\ifdefined\separatesupplement
the first figure in the main body of the paper,
\else
Figure \ref{fig:datalineage},
\fi
where you can click on the given Zenodo link and be taken to the raw data that created the plot.
-However, instead of a long and hard to read hash, we simply point to the plotted file's source as a Zenodo DOI (which has long term funding for longevity).
+However, instead of a long and hard-to-read hash, we point to the plotted file's source as a Zenodo DOI (which has long-term funding for longevity).
Unfortunately, most parts of the web page are not complete as of January 2021.
The VCR web page contains an example PDF\footnote{\inlinecode{\url{http://vcr.stanford.edu/paper.pdf}}} that is generated with this system, but the linked VCR repository\footnote{\inlinecode{\url{http://vcr-stat.stanford.edu}}} did not exist (again, as of January 2021).
@@ -519,14 +519,14 @@ This issue with Whole Tale (and generally all other solutions that only rely on
\subsection{Occam (2018)}
\label{appendix:occam}
Occam\footnote{\inlinecode{\url{https://occam.cs.pitt.edu}}}\citeappendix{oliveira18} is a web-based application to preserve software and its execution.
-To achieve long-term reproducibility, Occam includes its own package manager (instructions to build software and their dependencies) to be in full control of the software build instructions, similar to Maneage.
-Besides Nix or Guix (which are primarily a package manager that can also do job management), Occam has been the only solution in our survey here that attempts to be complete in this aspect.
+To achieve long-term reproducibility, Occam includes its own package manager (instructions to build software and its dependencies) in order to be in full control of the software build instructions, similarly to Maneage.
+Besides Nix or Guix (which are primarily package managers that can also do job management), Occam is the only solution in our survey that attempts to be complete in this aspect.
-However it is incomplete from the perspective of requirements: it works within a Docker image (that requires root permissions) and currently only runs on Debian-based, Red Hat based, and Arch-based GNU/Linux operating systems that respectively use the \inlinecode{apt}, \inlinecode{pacman} or \inlinecode{yum} package managers.
+However, it is incomplete from the perspective of requirements: it works within a Docker image (which requires root permissions) and currently only runs on Debian-based, Red Hat-based, and Arch-based GNU/Linux operating systems that respectively use the \inlinecode{apt}, \inlinecode{yum} or \inlinecode{pacman} package managers.
It is also itself written in Python (version 3.4 or above).
-Furthermore, it does not account for the minimal complexity criteria because the instructions to build the software and their versions are not immediately viewable or modifiable by the user.
-Occam contains its own JSON database that should be parsed with its own custom program.
-The analysis phase of Occam is also through a drag-and-drop interface (similar to Taverna, Appendix \ref{appendix:taverna}) that is a web-based graphic user interface.
-All the connections between various phases of the analysis need to be pre-defined in a JSON file and manually linked in the GUI.
-Hence for complex data analysis operations that involve thousands of steps, it is not scalable.
+Furthermore, it does not satisfy the minimal complexity criterion, because the instructions to build the software packages and their versions are not immediately viewable or modifiable by the user.
+Occam contains its own JSON database that should be parsed by Occam's own custom program.
+The analysis phase of Occam is through a drag-and-drop interface (similar to Taverna, Appendix \ref{appendix:taverna}), which is provided as a web-based graphic user interface.
+All the connections between the various phases of the analysis need to be pre-defined in a JSON file and manually linked in the GUI.
+Hence, for complex data analysis operations that involve thousands of steps, this is not scalable.
diff --git a/tex/src/appendix-existing-tools.tex b/tex/src/appendix-existing-tools.tex
index 43a0ef9..99a4284 100644
--- a/tex/src/appendix-existing-tools.tex
+++ b/tex/src/appendix-existing-tools.tex
@@ -129,7 +129,7 @@ Because it is highly intertwined with the way software is built and installed, t
Maneage (the solution proposed in this paper) also follows a similar approach of building and installing its own software environment within the host's file system, but without depending on it beyond the kernel.
However, unlike the third-party package manager mentioned above, Maneage's software management is not detached from the specific research/analysis project: the instructions to build the full isolated software environment are maintained with the high-level analysis steps of the project, and the narrative paper/report of the project.
-This is fundamental to achieve the Completeness criteria.
+This is fundamental to achieving the completeness criterion.
@@ -191,7 +191,7 @@ That hash is then prefixed to the software's installation directory.
As an example from Dolstra et al.\citeappendix{dolstra04}, if a certain build of GNU C Library 2.3.2 has a hash of \inlinecode{8d013ea878d0}, then it is installed under \inlinecode{/nix/store/8d013ea878d0-glibc-2.3.2}, and all software that is compiled with it (and thus needs it to run) will link to this unique address.
This allows for multiple versions of the software to co-exist on the system, while keeping an accurate dependency tree.
-As mentioned in Court{\'e}s \& Wurmus\citeappendix{courtes15}, one major caveat with using these package managers is that they require a daemon with root privileges (failing our completeness criteria).
+As mentioned in Court{\'e}s \& Wurmus\citeappendix{courtes15}, one major caveat with using these package managers is that they require a daemon with root privileges (failing our completeness criterion).
This is necessary ``to use the Linux kernel container facilities that allow it to isolate build processes and maximize build reproducibility''.
This is because Nix and Guix focus on creating bitwise reproducible software binaries, which is necessary from the security and development perspectives.
However, in a non-computer-science analysis (for example, in the natural sciences), the main aim is reproducible \emph{results}, which can also be obtained with the same software version even when its binaries are not bitwise identical (for example, when it is installed in another location, because the installation location is hard-coded in the software binary, or when it is built for a different CPU architecture).
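To make the store-path idea above concrete, the following is a simplified Python sketch under stated assumptions; it is not the actual algorithm of Nix or Guix, which hash the complete build recipe and use their own encoding. The hypothetical function \inlinecode{store\_path} hashes a package's name, version, and the store paths of its dependencies, truncates the hash to 12 hexadecimal characters (as in the Dolstra et al. example), and prefixes it to the installation directory.

```python
import hashlib


def store_path(name, version, deps, store="/nix/store"):
    """Return a hypothetical Nix-like installation path.

    Simplified illustration: the hash covers only the package name,
    version, and the (sorted) store paths of its dependencies, so a
    change in any of them yields a different directory. Real Nix
    hashes the whole build recipe and uses its own encoding.
    """
    h = hashlib.sha256()
    for part in [name, version, *sorted(deps)]:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator, so ("ab", "c") differs from ("a", "bc")
    digest = h.hexdigest()[:12]  # truncated, like the 12-character example
    return f"{store}/{digest}-{name}-{version}"


glibc = store_path("glibc", "2.3.2", [])
# The same (hypothetical) compiler version built against two C libraries:
gcc_a = store_path("gcc", "3.3", [glibc])
gcc_b = store_path("gcc", "3.3", [store_path("glibc", "2.3.3", [])])
print(glibc)
print(gcc_a != gcc_b)  # different dependencies give different install paths
```

Because the dependencies' paths enter the hash, rebuilding the same software against a different library lands in a new directory, which is what allows multiple builds to coexist without overwriting each other.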
@@ -278,7 +278,7 @@ In conclusion for all package managers, there are two common issues regarding ge
This is another consequence of the detachment of the package manager from the project doing the analysis.
\end{itemize}
-Addressing these issues has been the basic reason behind the proposed solution: based on the completeness criteria, instructions to download and build the packages are included within the actual science project, and no special/new syntax/language is used.
+Addressing these issues has been the basic reason behind the proposed solution: based on the completeness criterion, instructions to download and build the packages are included within the actual science project, and no special/new syntax/language is used.
Software download, build, and installation are done with the same language/syntax that researchers use to manage their research: the shell (by default GNU Bash in Maneage) for low-level steps and Make (by default, GNU Make in Maneage) for job management.
@@ -541,7 +541,7 @@ To solve this problem there are advanced text editors like GNU Emacs that allow
However, editors that can execute or debug the source (like GNU Emacs) just run external programs for these jobs (for example, GNU GCC or GNU GDB), just as if those programs were called from outside the editor.
With text editors, the final edited file is independent of the actual editor and can be further edited with another editor, or executed without it.
-This is a very important feature and corresponds to the modularity criteria of this paper.
+This is a very important feature and corresponds to the modularity criterion of this paper.
This type of modularity is not commonly present in the other solutions mentioned below (where the source can only be edited/run in a specific browser).
Another very important advantage of advanced text editors like GNU Emacs or Vi(m) is that they can also be run without a graphic user interface, directly on the command-line.
This feature is critical when working on remote systems, in particular high performance computing (HPC) facilities that do not provide a graphic user interface.
@@ -573,7 +573,7 @@ However, Jupyter currently only supports a linear run of the cells: always from
It is possible to manually execute only one cell, but the previous/next cells that may depend on it also have to be run manually (a common source of human error, and frustration for complex operations).
Integration of directional graph features (dependencies between the cells) into Jupyter has been discussed, but as of this publication, there is no plan to implement it (see Jupyter's GitHub issue 1175\footnote{\inlinecode{\url{https://github.com/jupyter/notebook/issues/1175}}}).
-The fact that the \inlinecode{.ipynb} format stores narrative text, code, and multi-media visualization of the outputs in one file, is another major hurdle and against the modularity criteria proposed here.
+The fact that the \inlinecode{.ipynb} format stores narrative text, code, and multi-media visualization of the outputs in one file is another major hurdle, and goes against the modularity criterion proposed here.
The files can easily become very large (in volume/bytes) and hard to read when the Jupyter web-interface is not accessible.
Both are critical for scientific processing, especially the latter, since a web browser with the required JavaScript features may not be available in a few years.
This is further exacerbated by the fact that binary data (for example images) are not directly supported in JSON and have to be converted into a much less memory-efficient textual encoding.
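The dependency-driven cell execution that Jupyter currently lacks (see the GitHub issue cited above) can be sketched in a few lines of Python (3.9 or newer, for the standard \inlinecode{graphlib} module). This is a hypothetical illustration, not a Jupyter feature or API: the cells and their dependencies are invented, and when one cell is edited, only that cell and the cells downstream of it are re-run, in dependency order, much as Make does for files.

```python
from graphlib import TopologicalSorter

# Hypothetical notebook: each cell lists the cells whose outputs it uses.
deps = {
    "load": set(),
    "clean": {"load"},
    "fit": {"clean"},
    "plot_raw": {"load"},
    "plot_fit": {"fit"},
}


def cells_to_rerun(deps, edited):
    """Return the edited cell and every cell downstream of it, in a valid run order."""
    # Grow the "dirty" set until no further cell depends on a dirty cell.
    dirty = {edited}
    changed = True
    while changed:
        changed = False
        for cell, parents in deps.items():
            if cell not in dirty and parents & dirty:
                dirty.add(cell)
                changed = True
    # Order only the dirty cells, so each runs after its dirty parents.
    graph = {cell: deps[cell] & dirty for cell in dirty}
    return list(TopologicalSorter(graph).static_order())


print(cells_to_rerun(deps, "clean"))  # ['clean', 'fit', 'plot_fit']
```

Editing the cleaning cell re-runs the fit and its plot but leaves the independent raw-data plot untouched, which is exactly the step a linear top-to-bottom run cannot express.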
@@ -606,7 +606,7 @@ In this context, it is more focused on the latter.
Because of their nature, higher-level languages evolve very fast, creating incompatibilities along the way.
The most prominent example is the transition from Python 2 (released in 2000) to Python 3 (released in 2008).
Python 3 was incompatible with Python 2, and it was decided to abandon Python 2 by 2015.
-However, due to community pressure, this was delayed to January 1st, 2020.
+However, due to community pressure, this was delayed to 1 January 2020.
The end-of-life of Python 2 caused many problems for projects that had invested heavily in Python 2: all their previous work had to be translated, for example, see Jenness\citeappendix{jenness17} or Appendix \ref{appendix:sciunit}.
Some projects could not make this investment and their developers decided to stop maintaining them, for example VisTrails (see Appendix \ref{appendix:vistrails}).
@@ -617,7 +617,7 @@ This is not particular to Python, a similar evolution occurred in Perl: in 2000
However, the Perl community decided not to abandon Perl 5, and Perl 6 was eventually defined as a new language that is now officially called ``Raku'' (\url{https://raku.org}).
It is unreasonably optimistic to assume that high-level languages will not undergo similar incompatible evolutions in the (not too distant) future.
-For industial software developers, this is not a major problem: non-scientific software, and the general population's usage of them, has a similarly fast evolution and shelf-life.
+For industrial software developers, this is not a major problem: non-scientific software, and the general population's usage of it, have a similarly fast evolution and short shelf-life.
Hence, it is rarely (if ever) necessary to look into industrial/business codes that are more than a couple of years old.
However, in the sciences (which are commonly funded by public money) this is a major caveat for the longer-term usability of solutions.
@@ -627,10 +627,10 @@ Beyond technical, low-level, problems for the developers mentioned above, this c
\subsubsection{Dependency hell}
The evolution of high-level languages is extremely fast, even within one version.
-For example, packages that are written in Python 3 often only work with a special interval of Python 3 versions.
-For example Snakemake and Occam which can only be run on Python versions 3.4 and 3.5 or newer respectively, see Appendices \ref{appendix:snakemake} and \ref{appendix:occam}.
-This is not just limited to the core language, much faster changes occur in their higher-level libraries.
-For example version 1.9 of Numpy (Python's numerical analysis module) discontinued support for Numpy's predecessor (called Numeric), causing many problems for scientific users\citeappendix{hinsen15}.
+For example, packages that are written in Python 3 often only work with a specific interval of Python 3 versions.
+For example, Snakemake and Occam can only be run on Python versions 3.4 and 3.5 or newer, respectively; see Appendices \ref{appendix:snakemake} and \ref{appendix:occam}.
+This is not just limited to the core language; much faster changes occur in their higher-level libraries.
+For example, version 1.9 of Numpy (Python's numerical analysis module) discontinued support for Numpy's predecessor (called Numeric), causing many problems for scientific users\citeappendix{hinsen15}.
Moreover, the dependency graph of tools written in high-level languages is often extremely complex.
For example, see Figure 1 of Alliez et al.\cite{alliez19}.
@@ -640,10 +640,9 @@ Acceptable version intervals between the dependencies will cause incompatibiliti
Since a domain scientist does not always have the resources/knowledge to modify the conflicting part(s), many are forced to create complex environments with different versions of Python and pass the data between them (for example just to use the work of a previous PhD student in the team).
This greatly increases the complexity of the project, even for the principal author.
A well-designed reproducible workflow like Maneage that has no dependencies beyond a C compiler in a Unix-like operating system can account for this.
-However, when the actual workflow system (not the analysis software) is written in a high-level language like the examples above.
+However, when the workflow system itself (not the analysis software) is written in a high-level language like the examples above, the workflow becomes exposed to this same dependency hell.
-Another relevant example of the dependency hell is mentioned here:
-merely installing the Python installer (\inlinecode{pip}) on a Debian system (with \inlinecode{apt install pip2} for Python 2 packages), required 32 other packages as dependencies.
+Another relevant example of dependency hell is the following: installing the Python package installer (\inlinecode{pip}) on a Debian system (with \inlinecode{apt install pip2} for Python 2 packages) required 32 other packages as dependencies.
\inlinecode{pip} is necessary to install Popper and Sciunit (Appendices \ref{appendix:popper} and \ref{appendix:sciunit}).
As of this writing, the \inlinecode{pip3 install popper} and \inlinecode{pip2 install sciunit2} commands required 17 and 26 Python modules as dependencies, respectively.
It is impossible to run either of these solutions if there is a single conflict in this very complex dependency graph.
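The version-interval conflicts described above can be illustrated with a small Python sketch. The requirement ranges are hypothetical (inspired by the NumPy 1.9 example earlier in this section), versions are assumed to be purely numeric dotted strings, and real installers such as \inlinecode{pip} implement far richer rules. The point is only that each package's acceptable interval narrows the set of usable versions, and a single pair of disjoint intervals leaves no valid environment.

```python
def parse(version):
    """Turn a dotted version like '1.10' into a comparable tuple (1, 10)."""
    return tuple(int(x) for x in version.split("."))


def intersect(intervals):
    """Intersect half-open version intervals [low, high).

    Each interval is (low, high); None means unbounded on that side.
    Returns the common interval, or None if no version satisfies all.
    """
    low, high = None, None
    for lo, hi in intervals:
        if lo is not None and (low is None or parse(lo) > parse(low)):
            low = lo  # keep the tightest lower bound
        if hi is not None and (high is None or parse(hi) < parse(high)):
            high = hi  # keep the tightest upper bound
    if low is not None and high is not None and parse(low) >= parse(high):
        return None  # empty intersection: the requirements conflict
    return (low, high)


# Hypothetical requirements of two packages on one shared library.
requirements = {
    "old_phd_code": (None, "1.9"),  # relies on behavior dropped in 1.9
    "new_library": ("1.9", None),   # needs features from 1.9 onwards
}
print(intersect(requirements.values()))  # None: no single environment works
```

Comparing tuples rather than strings matters here: as strings, "1.10" would sort before "1.9", whereas (1, 10) correctly compares greater than (1, 9).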