-rw-r--r-- paper.tex | 6
1 files changed, 3 insertions, 3 deletions
@@ -51,10 +51,10 @@
 {\noindent\mpregular
 The era of big data has ushered an era of big responsibility.
 In the absence of reproducibility, as a test on understanding data lineage, the result can be the subject of perpetual debate.
- To address this problem, we introduce Maneage (management + lineage), founded on the principles of completeness (e.g., no dependency beyond a POSIX-compatible operating system, no administrator privileges, and no network connection), modular and straightforward design, temporal lineage and free software.
+ To address this problem, we introduce Maneage (management + lineage) which is founded on the principles of completeness (e.g., no dependency beyond a POSIX-compatible operating system, no administrator privileges, and no network connection), modular and straightforward design, temporal lineage and free software.
 A project using Maneage is fully stored in machine\--action\-able, and human\--read\-able plain-text format, facilitating version-control, publication, archival, and automatic parsing to extract data provenance.
 The provided lineage is not limited to high-level processing, but also includes building the necessary software from source with fixed versions and build configurations.
- Additionally, a project's final visualizations and narrative report are also included, establishing direct links between the analysis and the narrative or visualizations, to the precision of a word within a sentence or each point in a plot.
+ Additionally, a project's final visualizations and narrative report are also included, establishing direct links between the analysis and the narrative or visualizations, to the precision of a word within a sentence or a point in a plot.
 Maneage also enables incremental projects, where a new project can branch off an existing one, with moderate changes to enable experimentation on published methods.
 Once Maneage is implemented in a sufficiently wide scale, automatic and optimized workflow creation through machine learning, or automating data management plans, can easily be set up.
 Maneage was a recipient of a research data alliance (RDA) Europe Adoption Grant in 2019, and has already been tested and used in several scientific papers, including the present one, with snapshot \projectversion.
@@ -126,7 +126,7 @@ Yet, this is not a new problem in the sciences: back in 2011, Elsevier conducted
 Even before that, in an attempt to simulate research projects, \citet{ioannidis05} proved that ``\emph{most claimed research findings are false}''.
 In the 1990s, \citet{schwab2000, buckheit1995, claerbout1992} described the same problem very eloquently and also provided some solutions they used.\tonote{DVG: more details here, one is left wondering ...}
 Even earlier, through his famous quartet, \citet{anscombe73} qualitatively showed how distancing of researchers from the intricacies of algorithms/methods can lead to misinterpretation of the results.
-One of the earliest such efforts we are aware of is the work of \citet{roberts69}, who discussed conventions in \texttt{FORTRAN} programming and documentation to help in publishing research codes.
+One of the earliest such efforts we are aware of is the work of \citet{roberts69}, who discussed conventions in Fortran programming and documentation to help in publishing research codes.
 While the situation has somewhat improved, all these papers still resonate strongly with the frustrations of today's scientists.
 In this paper, we introduce Maneage as a solution to the collective problem of preserving a project's data lineage and its software dependencies.