-rw-r--r-- paper.tex 43
1 files changed, 15 insertions, 28 deletions
diff --git a/paper.tex b/paper.tex
index 5d6b88e..0fb14d9 100644
--- a/paper.tex
+++ b/paper.tex
@@ -24,7 +24,7 @@
-\title{Maneage, a Customizable Framework for Managing Data Lineage}
+\title{Achieving long-term and archivable reproducibility}
\author{\large\mpregular \authoraffil{Mohammad Akhlaghi}{1,2,3},
\large\mpregular \authoraffil{Ra\'ul Infante-Sainz}{1,2},
\large\mpregular \authoraffil{Boudewijn F. Roukema}{4,3},
@@ -52,35 +52,22 @@
%% Abstract
{\noindent\mpregular
- %%CONTEXT
+ %% CONTEXT
Many reproducible workflow solutions have been proposed during recent decades.
- Most use high-level technology that is popular, providing immediate reproducibility that is not sustainable in the long term.
- This creates generational gaps between scientists and makes it hard to build upon previous work.
- Decades after their results are published, scientists lack the resources to re-write their project software.
- %% [This is probably the sentence in this section that could most easily be
- %%removed: it more or less repeats the basic issue of reproducibility.]
- %%AIM
- We aim to introduce a standard of reproducibility criteria that is more rigorous than those previously adopted.
- %%METHOD
- In this paper, we propose this new standard: completeness (no dependency beyond a POSIX-compatible operating system, no administrator privileges, and no network connection); modular and straightforward design; temporal provenance; scalability; and free software.
- %% I would suggest "free-licensed software" or "free-and-open-source
- %% software". RMS would scream at us, but the risk is that the editor (or
- %% reader) thinks of free-as-in-beer software. Alternatives include "free
- %% software (as in free speech)" - but that looks a bit too informal - or
- %% long expressions such as "free software (in the sense of the Free
- %% Software Definition). If we have enough words available, "software
- %% satisfying the Free Software Definition" would be clear and formal (but
- %% probably too specific, since there's also the Open Source Software
- %% Definition of the OSI, and Debian's DFSG).
- %
- %%RESULTS
- We demonstrate that these criteria are achievable by presenting a concrete example that satisfies these criteria.
- "Maneage" (managing+lineage) is stored in machine-actionable, human-readable plain-text format, with version-control, archival, automatic parsing to extract data provenance, and peer-reviewable paper verification.
- %% It can build its environment automatically or can be placed in a container as a binary blob for immediate/fast reproduction.
- %% [This sentence is probably sort of true for many systems, and is less critical to the "research question"; I suggest dropping it.]
- Maneage has already been used in several scientific publications including the present one, with snapshot \projectversion.
+ Most use the high-level technologies that were popular when they were created, providing an immediate solution that is not sustainable in the long term.
+ However, decades later, scientists lack the resources to re-write their projects, while still being accountable for their results.
+ This creates generational gaps and, because the technologies have become obsolete, impedes reproducibility and building upon previous work.
+ %% AIM
+ We aim to introduce a set of criteria to address this problem and demonstrate their practicality.
+ %% METHOD
+ The criteria are: completeness (i.e., no dependency beyond a POSIX-compatible operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; temporal provenance; scalability; and free-and-open-source software.
+ %% RESULTS
+ Their usefulness is tested through an implementation: "Maneage" (managing+lineage).
+ It is stored in machine-actionable and human-readable plain text, enabling version control, cheap archival, automatic parsing to extract data provenance, and peer-reviewable verification.
+ Furthermore, we show that these criteria are not limited to long-term reproducibility but also apply to the immediate/fast regime.
+ Maneage has been tested in several research publications, including the present one, with snapshot \projectversion.
%%CONCLUSION
- Thus, it is realistic to require that reproducibility solutions satisfy our newly proposed standard.
+ We conclude that requiring longevity from solutions is realistic, and discuss the benefits of these criteria for scientific progress.
\horizontalline