-rw-r--r-- paper.tex 18
1 files changed, 9 insertions, 9 deletions
diff --git a/paper.tex b/paper.tex
index 758c418..63db54e 100644
--- a/paper.tex
+++ b/paper.tex
@@ -52,15 +52,15 @@
%% Abstract
{\noindent\mpregular
- The era of big data has ushered an era of big responsibility.
- In the absence of reproducibility, as a test on understanding data lineage, the result can be the subject of perpetual debate.
- To address this problem, we introduce Maneage (management + lineage) which is founded on the principles of completeness (e.g., no dependency beyond a POSIX-compatible operating system, no administrator privileges, and no network connection), modular and straightforward design, temporal provenance, scalability, and free software.
- A project using Maneage is fully stored in machine\--action\-able, and human\--read\-able plain-text format, facilitating version-control, publication, archival, and automatic parsing to extract data provenance.
- The provided lineage is not limited to high-level processing, but also includes building the necessary software from source with fixed versions and build configurations.
- Additionally, a project's final visualizations and narrative report are also included, establishing direct links between the analysis and the narrative or visualizations, to the precision of a word within a sentence or a point in a plot.
- Maneage also enables incremental projects, where a new project can branch off an existing one, with moderate changes to enable experimentation on published methods.
- Once Maneage is implemented in a sufficiently wide scale, automatic and optimized workflow creation through machine learning, or automating data management plans, can easily be set up.
- Maneage was a recipient of a research data alliance (RDA) Europe Adoption Grant in 2019, and has already been tested and used in several scientific papers, including the present one, with snapshot \projectversion.
+ Over the last 30 years, many reproducible workflow solutions have been proposed, mostly using the common high-level technology of the day.
+ Thus providing immediate reproducibility, but problematic in the long-term because high-level technologies evolve.
+ Scientists are accountable to their results decades later, and don't have the resources to re-write their projects.
+ This creates generational gaps between scientists and makes it hard to build upon previous work.
+ In this paper, we report the result of our research project on a fundamentally new design that is founded on the principles of completeness (e.g., no dependency beyond a POSIX-compatible operating system, no administrator privileges, and no network connection), with modular and straightforward design, temporal provenance, scalability, and free software.
+ It is called Maneage (managing+lineage) and is stored in machine-actionable, and human-readable plain-text format.
+ Facilitating version-control, publication, archival, and automatic parsing to extract data provenance.
+ It can build its environment automatically, possibly, in virtual machines, containers or any future technology as a binary blob for immediate/fast reproduction.
+ Maneage has already been used in several scientific publications including the present one, with snapshot \projectversion.
\horizontalline
\noindent