-rw-r--r-- | paper.tex | 18 |
1 files changed, 9 insertions, 9 deletions
@@ -52,15 +52,15 @@
 %% Abstract
 {\noindent\mpregular
- The era of big data has ushered an era of big responsibility.
- In the absence of reproducibility, as a test on understanding data lineage, the result can be the subject of perpetual debate.
- To address this problem, we introduce Maneage (management + lineage) which is founded on the principles of completeness (e.g., no dependency beyond a POSIX-compatible operating system, no administrator privileges, and no network connection), modular and straightforward design, temporal provenance, scalability, and free software.
- A project using Maneage is fully stored in machine\--action\-able, and human\--read\-able plain-text format, facilitating version-control, publication, archival, and automatic parsing to extract data provenance.
- The provided lineage is not limited to high-level processing, but also includes building the necessary software from source with fixed versions and build configurations.
- Additionally, a project's final visualizations and narrative report are also included, establishing direct links between the analysis and the narrative or visualizations, to the precision of a word within a sentence or a point in a plot.
- Maneage also enables incremental projects, where a new project can branch off an existing one, with moderate changes to enable experimentation on published methods.
- Once Maneage is implemented in a sufficiently wide scale, automatic and optimized workflow creation through machine learning, or automating data management plans, can easily be set up.
- Maneage was a recipient of a research data alliance (RDA) Europe Adoption Grant in 2019, and has already been tested and used in several scientific papers, including the present one, with snapshot \projectversion.
+ Over the last 30 years, many reproducible workflow solutions have been proposed, mostly using the common high-level technology of the day.
+ Thus providing immediate reproducibility, but problematic in the long-term because high-level technologies evolve.
+ Scientists are accountable to their results decades later, and don't have the resources to re-write their projects.
+ This creates generational gaps between scientists and makes it hard to build upon previous work.
+ In this paper, we report the result of our research project on a fundamentally new design that is founded on the principles of completeness (e.g., no dependency beyond a POSIX-compatible operating system, no administrator privileges, and no network connection), with modular and straightforward design, temporal provenance, scalability, and free software.
+ It is called Maneage (managing+lineage) and is stored in machine-actionable, and human-readable plain-text format.
+ Facilitating version-control, publication, archival, and automatic parsing to extract data provenance.
+ It can build its environment automatically, possibly, in virtual machines, containers or any future technology as a binary blob for immediate/fast reproduction.
+ Maneage has already been used in several scientific publications including the present one, with snapshot \projectversion.
 \horizontalline
 \noindent