| -rw-r--r-- | paper.tex | 43 | 
1 files changed, 15 insertions, 28 deletions
@@ -24,7 +24,7 @@
-\title{Maneage, a Customizable Framework for Managing Data Lineage}
+\title{Achieving long-term and archive-able reproducibility}
 \author{\large\mpregular \authoraffil{Mohammad Akhlaghi}{1,2,3},
         \large\mpregular \authoraffil{Ra\'ul Infante-Sainz}{1,2},
         \large\mpregular \authoraffil{Boudewijn F. Roukema}{4,3},
@@ -52,35 +52,22 @@
 %% Abstract
 {\noindent\mpregular
-  %%CONTEXT
+  %% CONTEXT
   Many reproducible workflow solutions have been proposed during recent decades.
-  Most use high-level technology that is popular, providing immediate reproducibility that is not sustainable in the long term.
-  This creates generational gaps between scientists and makes it hard to build upon previous work.
-  Decades after their results are published, scientists lack the resources to re-write their project software.
-  %% [This is probably the sentence in this section that could most easily be
-  %%removed: it more or less repeats the basic issue of reproducibility.]
-  %%AIM
-  We aim to introduce a standard of reproducibility criteria that is more rigorous than those previously adopted.
-  %%METHOD
-  In this paper, we propose this new standard: completeness (no dependency beyond a POSIX-compatible operating system, no administrator privileges, and no network connection); modular and straightforward design; temporal provenance; scalability; and free software.
-  %% I would suggest "free-licensed software" or "free-and-open-source
-  %% software". RMS would scream at us, but the risk is that the editor (or
-  %% reader) thinks of free-as-in-beer software. Alternatives include "free
-  %% software (as in free speech)" - but that looks a bit too informal - or
-  %% long expressions such as "free software (in the sense of the Free
-  %% Software Definition). If we have enough words available, "software
-  %% satisfying the Free Software Definition" would be clear and formal (but
-  %% probably too specific, since there's also the Open Source Software
-  %% Definition of the OSI, and Debian's DFSG).
-  %
-  %%RESULTS
-  We demonstrate that these criteria are achievable by presenting a concrete example that satisfies these criteria.
-  "Maneage" (managing+lineage) is stored in machine-actionable, human-readable plain-text format, with version-control, archival, automatic parsing to extract data provenance, and peer-reviewable paper verification.
-  %% It can build its environment automatically or can be placed in a container as a binary blob for immediate/fast reproduction.
-  %% [This sentence is probably sort of true for many systems, and is less critical to the "research question"; I suggest dropping it.]
-  Maneage has already been used in several scientific publications including the present one, with snapshot \projectversion.
+  Most use the popular high-level technologies when they were created, providing an immediate solution that is not sustainable in the long term.
+  However, decades later, scientists lack the resources to re-write their projects, while still being accountable for their results.
+  This creates generational gaps and, due to the obsolete technologies, impedes reproducibility or building upon previous work.
+  %% AIM
+  We aim to introduce a set of criteria to address this problem and demonstrate their practicality.
+  %% METHOD
+  The criteria are: completeness (i.e., no dependency beyond a POSIX-compatible operating system, no administrator privileges, no network connection and primarily stored in plain-text); modular design; temporal provenance; scalability; and free-and-open-source software.
+  %% RESULTS
+  Their usefulness is tested through an implementation: "Maneage" (managing+lineage).
+  It is stored in machine-actionable and human-readable plain-text, enabling version-control, cheap archival, automatic parsing to extract data provenance, and peer-reviewable verification.
+  Furthermore, we show that these criteria are not limited to long-term reproducibility but also apply to the immediate/fast regime.
+  It has been tested in several research publications including the present one, with snapshot \projectversion.
   %%CONCLUSION
-  Thus, it is realistic to require that reproducibility solutions satisfy our newly proposed standard.
+  We conclude that requiring longevity from solutions is realistic, and discuss the benefits of these criteria for scientific progress.
   \horizontalline
