Diffstat (limited to 'paper.tex')
-rw-r--r-- paper.tex 15
1 files changed, 8 insertions, 7 deletions
diff --git a/paper.tex b/paper.tex
index 70b0346..7757060 100644
--- a/paper.tex
+++ b/paper.tex
@@ -231,12 +231,13 @@ To help in the comparison, the founding principles of Maneage are listed below.
(P1.7) requires no manual/human interaction and can run automatically \citep[according to][``\emph{a clerk can do it}'']{claerbout1992}.
\emph{Comparison with existing:} with many dependencies beyond POSIX, except for IPOL, none of the tools above are complete.
- For example, most recent solutions need Python (for the workflow, not the analysis), or rely on Jupyter notebooks.
- Because of the complexity in maintaining such high-level tools (see \ref{principle:complexity}), the primary storage format of most recent solutions is pre-built binary blobs like containers or virtual machines.
- They are large (Giga-bytes) and expensive to archive, furthermore third-party package managers (e.g., Conda), or the OS's (e.g., \inlinecode{apt} or \inlinecode{yum}) are used to setup its environment.
- However, exact versions of \emph{every software} are rarely included, and the servers remove old binaries, hence recreating them is very hard.
- Blobs also have a short lifespan, e.g., Docker containers made today can only run on Linux 3.2.x (after 2012), it does not promise longevity.
- A plain-text project is human-readable and parsable by any machine (even if it can't be executed) and consumes no less than a megabyte.
+ For example, the workflow of most recent solutions need Python or Jupyter notebooks.
+ Because of their complexity (see \ref{principle:complexity}), pre-built binary blobs like containers or virtual machines are the chosen storage format, which are large (Giga-bytes) and expensive to archive.
+ Furthermore, third-party package managers setup the environment, like Conda, or the OS's, like \inlinecode{apt} or \inlinecode{yum}.
+ However, exact versions of \emph{every software} are rarely included, and the servers remove old binaries, hence blobs are hard to recreate.
+ Blobs also have a short lifespan, e.g., Docker containers made today, may not be operable with future versions of Docker or Linux (currently Linux 3.2.x is the earliest supported version, released in 2012).
+ In general they mostly aim for short-term reproducibility.
+ A plain-text project is readable by humans and machines (even if it can't be executed) and consumes no less than a megabyte.
\item \label{principle:modularity}\textbf{Modularity:}
A project should be compartmentalized into independent modules with well-defined inputs/outputs having no side effects.
@@ -707,7 +708,7 @@ Once adopted on a wide scale, Maneage projects can be fed them into machine lear
Because Maneage is complete, even inputs (software algorithms and data selection), or failed tests can enter this optimization.
Furthermore, since it connects the analysis directly to the narrative and history of a project, this can include natural language processing.
On the other hand, parsers can be written over Maneage-derived projects for meta-research and data provenance studies, for example to generate Research Objects.
-For example if a bug is found in a software, all affected projects can be found and the scale of the effect can be measured.
+For example, when a bug is found in one software, all affected projects can be found and the scale of the effect can be measured.
Combined with Software Heritage, precise parts Maneage projects (high-level science) can be cited, at various points in its history (e.g., failed/abandoned tests).
Many components of Machine actionable data management plans \citep{miksa19b} can also be automatically filled with Maneage, useful for project PIs and and grant organizations.