Diffstat (limited to 'paper.tex')
-rw-r--r-- | paper.tex | 19 |
1 files changed, 10 insertions, 9 deletions
@@ -48,12 +48,12 @@ %% Abstract
 {\noindent\mpregular
 The era of big data has ushered an era of big responsibility.
- In the absence of reproducibility, as a test on controlling the data lineage, the result's integrity will be subject to perpetual debate.
- Maneage (management + lineage) is introduced here as a host to the computational and narrative components of an analysis.
- Analysis steps are added to a new project with lineage in mind, thus facilitating the project's execution and testing as the project evolves, while being friendly to publishing and archival because it is wholly in machine\--action\-able, and human\--read\-able, plain-text.
- Maneage is founded on the principles of completeness (e.g., no dependency beyond a POSIX-compatible operating system, no administrator privileges, or no network connection), modular and straight-forward design, temporal lineage and free software.
- The lineage is not limited to downloading the inputs and processing them automatically, but also includes building the necessary software with fixed versions and build configurations.
- Additionally, Maneage also builds the final PDF report of the project, establishing direct and automatic links between the data analysis and the narrative, with the precision of a word in a sentence.
+ In the absence of reproducibility, as a test on the reported data lineage, the result's integrity will be subject to perpetual debate.
+ To address this problem, we introduce Maneage (management + lineage) which has already been tested and used in several scientific papers.
+ Maneage is founded on the principles of completeness (e.g., no dependency beyond a POSIX-compatible operating system, no administrator privileges, or no network connection), modular and straight-forward design, temporal lineage and free software, to enable precise reproducibility.
+ The Maneage lineage, or workflow, is in machine\--action\-able, and human\--read\-able, plain-text format, facilitating version-control, publication, archival, or automatic parsing to extract data provenance.
+ The lineage is not limited to high-level processing, but also includes building the necessary software from source with fixed versions and build configurations.
+ Additionally, the project's final visualizations and narrative report are also included, establishing direct, and parse-able, links between the data analysis and the narrative or plots, with the precision of a word in a sentence or a point in a plot.
 Maneage enables incremental projects, where a new project can branch off an existing one, with moderate changes to enable experimentation on published methods.
 Once Maneage is implemented in a sufficiently wide scale, it can aid in automatic and optimized workflow creation through machine learning, or automating data management plans.
 Maneage was a recipient of the research data alliance (RDA) Europe Adoption Grant in 2019.
@@ -86,6 +86,7 @@ What operations were done on those inputs? How were the configurations or traini
 How did the quantitative results get visualized into the final demonstration plots, figures or narrative/qualitative interpretation?
 May there be a bias in the visualization?
 See Figure \ref{fig:questions} for a more detailed visual representation of such questions for various stages of the workflow.
+\tonote{Johan: add some general references.}
 In data science and database management, this type of metadata are commonly known as \emph{data provenance}, and the lower-level implementation is \emph{data lineage} (for more on the definitions, see Section \ref{sec:definitions}).
 Data lineage is being increasingly demanded for integrity checking from both the scientific and industrial/legal domains.
@@ -798,13 +799,13 @@ Once the improvements become substantial, new paper(s) will be written to comple
 \section{Discussion}
 \label{sec:discussion}
-
-
+\section{Summary and conclusion}
+\label{sec:conclusion}
 %% Acknowledgements
 \section{Acknowledgments}
-The authors wish to thank Pedram Ashofteh Ardakani, Zahra Sharbaf and Surena Fatemi for their useful suggestions and feedback on Maneage and this paper and to David Valls-Gabaud, Ignacio Trujillo, Johan Knapen, Roland Bacon for their support.
+The authors wish to thank Pedram Ashofteh Ardakani, Elham Saremi, Zahra Sharbaf and Surena Fatemi for their useful suggestions and feedback on Maneage and this paper and to David Valls-Gabaud, Ignacio Trujillo, Johan Knapen, Roland Bacon for their support.
 We also thank Julia Aguilar-Cabello for designing the Maneage logo.
 Work on the reproducible paper template has been funded by the Japanese Ministry of Education, Culture, Sports, Science, and Technology ({\small MEXT}) scholarship and its Grant-in-Aid for Scientific Research (21244012, 24253003), the European Research Council (ERC) advanced grant 339659-MUSICOS, European Union’s Horizon 2020 research and innovation programme under Marie Sklodowska-Curie grant agreement No 721463 to the SUNDIAL ITN, and from the Spanish Ministry of Economy and Competitiveness (MINECO) under grant number AYA2016-76219-P.