Diffstat (limited to 'paper.tex')
-rw-r--r-- paper.tex 27
1 files changed, 19 insertions, 8 deletions
diff --git a/paper.tex b/paper.tex
index d6ea107..115f5f7 100644
--- a/paper.tex
+++ b/paper.tex
@@ -170,7 +170,7 @@ Because of this, in October 2020 Docker Hub (where many workflows are archived)
Furthermore, Docker requires root permissions, and only supports recent (LTS) versions of the host kernel.
Hence older Docker images may not be executable (their longevity is determined by the host kernel, typically a decade).
-Once the host OS is ready, PMs are used to install the software or environment.
+Once the host OS is ready, package managers (PMs) are used to install the software or environment.
Usually the OS's PM, such as `\inlinecode{apt}' or `\inlinecode{yum}', is used first and higher-level software are built with generic PMs.
The former has the same longevity as the OS, while some of the latter (such as Conda and Spack) are written in high-level languages like Python, so the PM itself depends on the host's Python installation with a typical longevity of a few years.
Nix and GNU Guix produce bit-wise identical programs with considerably better longevity; that of their supported CPU architectures.
@@ -206,9 +206,9 @@ Notebooks can therefore rarely deliver their promised potential \cite{rule18} an
\section{Proposed criteria for longevity}
\label{criteria}
The main premise here is that starting a project with a robust data management strategy (or tools that provide it) is much more effective, for researchers and the community, than imposing it just before publication \cite{austin17,fineberg19}.
-In this context, researchers play a critical role \cite{austin17} in making their research more Findable, Accessible, Interoperable, and Reusable (the FAIR principles).
+In this context, researchers play a critical role \cite{austin17} in making their research more Findable, Accessible, Interoperable, and Reusable (the FAIR principles\footnote{FAIR originally targeted data, work is ongoing to adopt it for software through initiatives like FAIR4RS (FAIR for Research Software).}).
Simply archiving a project workflow in a repository after the project is finished is, on its own, insufficient, and maintaining it by repository staff is often either practically unfeasible or unscalable.
-We argue and propose that workflows satisfying the following criteria can not only improve researcher flexibility during a research project, but can also increase the FAIRness of the deliverables for future researchers:
+We argue and propose that workflows satisfying the following criteria can not only improve researcher flexibility during a research project, but can also increase the FAIRness of the deliverables for future researchers.
\textbf{Criterion 1: Completeness.}
A project that is complete (self-contained) has the following properties.
@@ -449,8 +449,9 @@ The core Maneage git repository is hosted at \href{http://git.maneage.org/projec
Derived projects start by creating a branch and customizing it (e.g., adding a title, data links, narrative, and subMakefiles for its particular analysis, see Listing \ref{code:branching}).
There is a thoroughly elaborated customization checklist in \inlinecode{README-hacking.md}.
-The current project's Git hash is provided to the authors as a \LaTeX{} macro (shown here at the end of the abstract), as well as the Git hash of the last commit in the Maneage branch (shown in the acknowledgments).
-These macros are created in \inlinecode{initialize.mk}, with other basic information from the running system like the CPU architecture, byte order or address sizes (shown in the acknowledgments).
+The current project's Git hash is provided to the authors as a \LaTeX{} macro (shown here in the abstract and acknowledgments), as well as the Git hash of the last commit in the Maneage branch (shown here in the acknowledgments).
+These macros are created in \inlinecode{initialize.mk}, with other basic information from the running system like the CPU details (shown in the acknowledgments).
+As opposed to Git ``tag''s, the hash is a core concept in the Git paradigm and is immutable for a given history, it is therefore the recommended timestamp.
Figure \ref{fig:branching} shows how projects can re-import Maneage at a later time (technically: \emph{merge}), thus improving their low-level infrastructure: in (a) authors do the merge during an ongoing project;
in (b) readers do it after publication; e.g., the project remains reproducible but the infrastructure is outdated, or a bug is fixed in Maneage.
@@ -543,10 +544,16 @@ The completeness criterion implies that algorithms and data selection can be inc
Furthermore, through elements like the macros, natural language processing can also be included, automatically analyzing the connection between an analysis with the resulting narrative \emph{and} the history of that analysis+narrative.
Parsers can be written over projects for meta-research and provenance studies, e.g., to generate Research Objects
\ifdefined\separatesupplement
-(see the supplement appendix).
+(see supplement appendix B).
\else
(see Appendix \ref{appendix:researchobject}).
\fi
+or allow interoperability with Common Workflow Language (CWL) or higher-level concepts like Canonical Workflow Framework for Research, or CWFR
+\ifdefined\separatesupplement
+(see supplement appendix A).
+\else
+(see Appendix \ref{appendix:genericworkflows}).
+\fi
Likewise, when a bug is found in one science software, affected projects can be detected and the scale of the effect can be measured.
Combined with SoftwareHeritage, precise high-level science components of the analysis can be accurately cited (e.g., even failed/abandoned tests at any historical point).
Many components of ``machine-actionable'' data management plans can also be automatically completed as a byproduct, useful for project PIs and grant funders.
@@ -581,17 +588,21 @@ Konrad Hinsen,
Marios Karouzos,
Johan Knapen,
Tamara Kovazh,
+Sebastian Luna Valero,
Terry Mahoney,
+Javier Mold\'on,
Ryan O'Connor,
Mervyn O'Luing,
Simon Portegies Zwart,
+Susana Sanchez Exposito,
Idafen Santana-P\'erez,
Elham Saremi,
Yahya Sefidbakht,
Zahra Sharbaf,
Nadia Tonello,
-Ignacio Trujillo and
-the AMIGA team at the Instituto de Astrof\'isica de Andaluc\'ia for their useful help, suggestions, and feedback on Maneage and this paper.
+Ignacio Trujillo
+and Lourdes Verdes-Montenegro
+for their useful help, suggestions, and feedback on Maneage and this paper.
The five referees and editors of CiSE (Lorena Barba and George Thiruvathukal) provided many points that greatly helped to clarify this paper.
This project (commit \inlinecode{\projectversion}) is maintained in Maneage (\emph{Man}aging data lin\emph{eage}).