author Boud Roukema <boud@cosmo.torun.pl> 2020-04-19 16:55:51 +0200
committer Boud Roukema <boud@cosmo.torun.pl> 2020-04-19 16:55:51 +0200
commit 6e97fdde49333084d9cc9185f29c38a43d0b460f
tree aca45efe8a92d1f5575bd633daccce1c509abad5 /paper.tex
parent 49cdb178c5ba2712bf922ab178d0379949381c5f
Principles - P1 - Complete
Compression by about 40 words.

Updating python2 to python3 is often nothing more than modifying print statements, so removing this doesn't weaken the text by much.

Re-creation helps avoid thinking of watching movies, going to the beach, reading a novel, when seeing the word "recreation": https://en.wiktionary.org/wiki/recreation#Usage_notes

The matplotlib sentence was not so clear: now it's a bit shorter and hopefully clearer.
Diffstat (limited to 'paper.tex')
-rw-r--r-- paper.tex 38
1 files changed, 19 insertions, 19 deletions
diff --git a/paper.tex b/paper.tex
index b0b3242..7cf43bc 100644
--- a/paper.tex
+++ b/paper.tex
@@ -210,30 +210,30 @@ A detailed list of principles shows how Maneage is unique compared to these othe
(i) does not depend on anything beyond the Portable operating system Interface (POSIX),
(ii) does not affect the host system,
(iii) does not require root/administrator privileges,
- (iv) does not need an internet connection (when its inputs are on the file-system), and
- (v) is stored in a format that does not require any software beyond POSIX tools to open, parse or execute.
+ (iv) does not need an internet connection (its inputs can be stored on the local file system), and
+ (v) is stored in a format that only needs POSIX tools to open, parse or execute.
A complete project can
(i) automatically access the inputs (see definition \ref{definition:input}),
- (ii) build its necessary software,
+ (ii) build the software it needs,
(iii) do the analysis (run the software on the data) and
- (iv) create the final narrative report/paper as well as its visualizations, in its final format (usually in PDF or HTML).
- No manual/human interaction is required to run a complete project, as \citet{claerbout1992} put it: ``\emph{a clerk can do it}''.
- Generally, manual intervention in any of the steps above, or an interactive interface, constitutes an incompleteness.
- Lastly, the plain-text format is particularly important because any other storage format will require specialized software \emph{before} the project can be opened.
-
- \emph{Comparison with existing:} Except for IPOL, none of the tools above are complete as they all have many dependencies far beyond POSIX.
- For example, most of the recent ones use Python (the project/workflow, not the analysis), or rely on Jupyter notebooks.
- Such high-level tools have very short lifespans and evolve very fast (e.g., Python 2 code cannot run with Python 3).
- They also have a very complex dependency trees, making them extremely vulnerable and hard to maintain. For example, see Figure 1 of \citet{alliez19} on the dependency tree of Matplotlib (one of the smaller Jupyter dependencies).
- It is important to remember that the longevity of a workflow (not the analysis itself) is determined by its shortest-lived dependency.
-
- Many existing tools therefore do not attempt to store the project as plain text, but pre-built binary blobs (containers or virtual machines) that can rarely be recreated and also have a short lifespan.
- Their recreation is hard because most are built with the package manager of the blob's OS, or Conda.
- Both are highly dependent on the time they are executed: precise versions are rarely stored, and the servers remove old binaries.
- Docker containers are a good example of their short lifespan: Docker only runs on long-term support OSs, not older.
+ (iv) create the final narrative report/paper and its visualizations in their final format (e.g., PDF/HTML).
+ No manual/human interaction is required to run a complete project (``\emph{a clerk can do it}''; \citet{claerbout1992}).
+ A need for manual intervention in any of the steps above, or an interactive interface, constitutes incompleteness.
+ Plain-text format is vital because any other storage format will require specialized software \emph{before} the project can be opened.
+
+ \emph{Comparison with existing:} Except for IPOL, none of the tools above are complete, as they all have many dependencies far beyond POSIX.
+ For example, most recent projects use Python (for project/workflow, not analysis), or rely on Jupyter notebooks.
+ Such high-level tools have short lifespans and evolve fast.
+ They also have complex dependency trees, making them vulnerable and hard to maintain. For example, see the dependency tree of Matplotlib (one of the smaller Jupyter dependencies; \citet[][Fig.~1]{alliez19}).
+ The longevity of a workflow (not the analysis itself) is determined by its shortest-lived dependency.
+
+ Many existing tools do not store the project as plain text, but instead provide pre-built binary blobs (containers or virtual machines) that can rarely be re-created; these have a short lifespan.
+ Their re-creation is difficult because most are built with the package manager of the blob's OS, or Conda.
+ Both are highly dependent on the date of execution: precise versions are rarely stored, and the servers remove old binaries.
+ Docker containers are a good example of the short lifespan problem: Docker only runs on long-term support OSs, not older ones.
In GNU/Linux systems, this corresponds to Linux kernel 3.2.x (initially released in 2012) and above.
- As plain-text, besides being extremely low volume ($\sim100$ kilobytes), the project is still human-readable and parsable by any machine, even when it can't be executed.
+ A plain-text project, besides being extremely low volume ($\sim100$ kilobytes), is human-readable and parsable by any machine, even if it can't be executed.
\item \label{principle:modularity}\textbf{Modularity:}
A project should be compartmentalized or partitioned into independent modules or components with well-defined inputs/outputs having no side-effects.