author Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-05-01 03:29:19 +0100
committer Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-05-01 03:39:38 +0100
commit 2e525d9a1e1bd6829fb97ca2f1e39309852c179c
tree 18e3f4932c4a46101348c07bbc8af1d68c5aff72
parent 8f14213f53ce59a7bc13a8018b5b937f53976fda
Edited abstract for more clarity, still in the 250 word limit
Boud's suggestions in the previous commit were great and really helped improve the tone of the abstract (and thus, shortly, the whole paper!), placing it better in the big picture. I had forgotten to give the exact word limit (which was 250), so Boud had set it to a very conservative value of 190. I added around 22 words to better highlight the points we want to make, while still staying below the limit.
-rw-r--r-- paper.tex 43
1 files changed, 15 insertions, 28 deletions
diff --git a/paper.tex b/paper.tex
index 5d6b88e..0fb14d9 100644
--- a/paper.tex
+++ b/paper.tex
@@ -24,7 +24,7 @@
-\title{Maneage, a Customizable Framework for Managing Data Lineage}
+\title{Achieving long-term and archive-able reproducibility}
\author{\large\mpregular \authoraffil{Mohammad Akhlaghi}{1,2,3},
\large\mpregular \authoraffil{Ra\'ul Infante-Sainz}{1,2},
\large\mpregular \authoraffil{Boudewijn F. Roukema}{4,3},
@@ -52,35 +52,22 @@
%% Abstract
{\noindent\mpregular
- %%CONTEXT
+ %% CONTEXT
Many reproducible workflow solutions have been proposed during recent decades.
- Most use high-level technology that is popular, providing immediate reproducibility that is not sustainable in the long term.
- This creates generational gaps between scientists and makes it hard to build upon previous work.
- Decades after their results are published, scientists lack the resources to re-write their project software.
- %% [This is probably the sentence in this section that could most easily be
- %%removed: it more or less repeats the basic issue of reproducibility.]
- %%AIM
- We aim to introduce a standard of reproducibility criteria that is more rigorous than those previously adopted.
- %%METHOD
- In this paper, we propose this new standard: completeness (no dependency beyond a POSIX-compatible operating system, no administrator privileges, and no network connection); modular and straightforward design; temporal provenance; scalability; and free software.
- %% I would suggest "free-licensed software" or "free-and-open-source
- %% software". RMS would scream at us, but the risk is that the editor (or
- %% reader) thinks of free-as-in-beer software. Alternatives include "free
- %% software (as in free speech)" - but that looks a bit too informal - or
- %% long expressions such as "free software (in the sense of the Free
- %% Software Definition). If we have enough words available, "software
- %% satisfying the Free Software Definition" would be clear and formal (but
- %% probably too specific, since there's also the Open Source Software
- %% Definition of the OSI, and Debian's DFSG).
- %
- %%RESULTS
- We demonstrate that these criteria are achievable by presenting a concrete example that satisfies these criteria.
- "Maneage" (managing+lineage) is stored in machine-actionable, human-readable plain-text format, with version-control, archival, automatic parsing to extract data provenance, and peer-reviewable paper verification.
- %% It can build its environment automatically or can be placed in a container as a binary blob for immediate/fast reproduction.
- %% [This sentence is probably sort of true for many systems, and is less critical to the "research question"; I suggest dropping it.]
- Maneage has already been used in several scientific publications including the present one, with snapshot \projectversion.
+ Most use the high-level technologies that were popular when they were created, providing an immediate solution that is not sustainable in the long term.
+ However, decades later, scientists lack the resources to re-write their projects, while still being accountable for their results.
+ This creates generational gaps and, due to the obsolete technologies, impedes reproducibility or building upon previous work.
+ %% AIM
+ We aim to introduce a set of criteria to address this problem and demonstrate their practicality.
+ %% METHOD
+ The criteria are: completeness (i.e., no dependency beyond a POSIX-compatible operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; temporal provenance; scalability; and free-and-open-source software.
+ %% RESULTS
+ Their usefulness is tested through an implementation: "Maneage" (managing+lineage).
+ It is stored in machine-actionable and human-readable plain-text, enabling version-control, cheap archival, automatic parsing to extract data provenance, and peer-reviewable verification.
+ Furthermore, we show that these criteria are not limited to long-term reproducibility but also apply to the immediate/fast regime.
+ It has been tested in several research publications including the present one, with snapshot \projectversion.
%%CONCLUSION
- Thus, it is realistic to require that reproducibility solutions satisfy our newly proposed standard.
+ We conclude that requiring longevity from solutions is realistic, and discuss the benefits of these criteria for scientific progress.
\horizontalline