-rw-r--r-- | paper.tex | 43 |
1 files changed, 15 insertions, 28 deletions
@@ -24,7 +24,7 @@
-\title{Maneage, a Customizable Framework for Managing Data Lineage}
+\title{Achieving long-term and archivable reproducibility}
 \author{\large\mpregular \authoraffil{Mohammad Akhlaghi}{1,2,3},
 \large\mpregular \authoraffil{Ra\'ul Infante-Sainz}{1,2},
 \large\mpregular \authoraffil{Boudewijn F. Roukema}{4,3},
@@ -52,35 +52,22 @@
 %% Abstract
 {\noindent\mpregular
- %%CONTEXT
+ %% CONTEXT
  Many reproducible workflow solutions have been proposed during recent decades.
- Most use high-level technology that is popular, providing immediate reproducibility that is not sustainable in the long term.
- This creates generational gaps between scientists and makes it hard to build upon previous work.
- Decades after their results are published, scientists lack the resources to re-write their project software.
- %% [This is probably the sentence in this section that could most easily be
- %%removed: it more or less repeats the basic issue of reproducibility.]
- %%AIM
- We aim to introduce a standard of reproducibility criteria that is more rigorous than those previously adopted.
- %%METHOD
- In this paper, we propose this new standard: completeness (no dependency beyond a POSIX-compatible operating system, no administrator privileges, and no network connection); modular and straightforward design; temporal provenance; scalability; and free software.
- %% I would suggest "free-licensed software" or "free-and-open-source
- %% software". RMS would scream at us, but the risk is that the editor (or
- %% reader) thinks of free-as-in-beer software. Alternatives include "free
- %% software (as in free speech)" - but that looks a bit too informal - or
- %% long expressions such as "free software (in the sense of the Free
- %% Software Definition). If we have enough words available, "software
- %% satisfying the Free Software Definition" would be clear and formal (but
- %% probably too specific, since there's also the Open Source Software
- %% Definition of the OSI, and Debian's DFSG).
- %
- %%RESULTS
- We demonstrate that these criteria are achievable by presenting a concrete example that satisfies these criteria.
- "Maneage" (managing+lineage) is stored in machine-actionable, human-readable plain-text format, with version-control, archival, automatic parsing to extract data provenance, and peer-reviewable paper verification.
- %% It can build its environment automatically or can be placed in a container as a binary blob for immediate/fast reproduction.
- %% [This sentence is probably sort of true for many systems, and is less critical to the "research question"; I suggest dropping it.]
- Maneage has already been used in several scientific publications including the present one, with snapshot \projectversion.
+ Most use high-level technologies that were popular when they were created, providing an immediate solution that is not sustainable in the long term.
+ However, decades later, scientists lack the resources to rewrite their projects, while still being accountable for their results.
+ This creates generational gaps and, owing to obsolete technologies, impedes reproducibility and building upon previous work.
+ %% AIM
+ We aim to introduce a set of criteria to address this problem and demonstrate their practicality.
+ %% METHOD
+ The criteria are: completeness (i.e., no dependency beyond a POSIX-compatible operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; temporal provenance; scalability; and free-and-open-source software.
+ %% RESULTS
+ Their usefulness is tested through an implementation: ``Maneage'' (managing+lineage).
+ It is stored in machine-actionable and human-readable plain text, enabling version control, cheap archival, automatic parsing to extract data provenance, and peer-reviewable verification.
+ Furthermore, we show that these criteria are not limited to long-term reproducibility but also apply to the immediate/fast regime.
+ It has been tested in several research publications, including the present one, with snapshot \projectversion.
  %%CONCLUSION
- Thus, it is realistic to require that reproducibility solutions satisfy our newly proposed standard.
+ We conclude that requiring longevity from solutions is realistic, and discuss the benefits of these criteria for scientific progress.
  \horizontalline