Diffstat (limited to 'paper.tex')
-rw-r--r-- paper.tex 8
1 files changed, 4 insertions, 4 deletions
diff --git a/paper.tex b/paper.tex
index 20b463a..8c74578 100644
--- a/paper.tex
+++ b/paper.tex
@@ -230,10 +230,10 @@ Fewer explicit execution requirements would mean higher \emph{execution possibil
\textbf{Criterion 2: Modularity.}
A modular project enables and encourages independent modules with well-defined inputs/outputs and minimal side effects.
-\new{In terms of file management, a modular project will \emph{only} contain the hand-written project source of that particular high-level project: no automaticly generated files (e.g., built binaries), software source code (maintained separately), or data (archived separately) should be included.
+\new{In terms of file management, a modular project will \emph{only} contain the hand-written project source of that particular high-level project: no automatically generated files (e.g., built binaries), software source code (maintained separately), or data (archived separately) should be included.
The latter two (developing low-level software or collecting data) are separate projects in themselves and can be used in other high-level projects.}
Explicit communication between various modules enables optimizations on many levels:
-(1) Storage and archival cost (no duplicate software or data files): a snapshot of a project will be far less than a mega byte.
+(1) Storage and archival cost (no duplicate software or data files): a snapshot of a project should be less than a megabyte.
(2) Modular analysis components can be executed in parallel and avoid redundancies (when a dependency of a module has not changed, it will not be re-run).
(3) Usage in other projects.
(4) Easy debugging and improvements.
@@ -857,7 +857,7 @@ The main reply they got in the discussion is to build the Conda environment in a
However, as described in Appendix \ref{appendix:independentenvironment} containers just hide the reproducibility problem, they do not fix it: containers are not static and need to evolve (i.e., re-built) with the project.
Given these limitations, \citeappendix{uhse19} are forced to host their conda-packaged software as tarballs on a separate repository.
-Conda installs with a shell script that contains a binary-blob (+500 mega bytes, embedded in the shell script).
+Conda installs with a shell script that contains a binary-blob (+500 megabytes, embedded in the shell script).
This is the first major issue with Conda: from the shell script, it is not clear what is in this binary blob and what it does.
After installing Conda in any location, users can easily activate that environment by loading a special shell script into their shell.
However, the resulting environment is not fully independent of the host operating system as described below:
@@ -1585,7 +1585,7 @@ Furthermore, ReproZip just copies the binary/compiled files used in a project, i
As mentioned in this paper, and also \citeappendix{oliveira18} the question of ``how'' the environment was built is critical for understanding the results and simply having the binaries cannot necessarily be useful.
For the data, it is similarly not possible to extract which data server they came from.
-Hence two projects that each use a 1 terra byte dataset will need a full copy of that same 1 terra byte file in their bundle, making long term preservation extremely expensive.
+Hence two projects that each use a 1-terabyte dataset will need a full copy of that same 1-terabyte file in their bundle, making long term preservation extremely expensive.