author: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-04-10 01:03:59 +0100
committer: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-04-10 01:03:59 +0100
commit: 276ad6df42f5bcac66c96440d7254dc119a2446e
tree: 9520a37e94bb08cc55ea0e8dbb293100053b8243 /paper.tex
parent: 29f1313a45910e1199d812a41d4461a8c0baa221
Minor corrections based on Zahra's suggestions
A parenthetical remark was added to the abstract to highlight the importance of data lineage for reproducibility. Also, the definitions of reproducibility that Zahra had given were added as comments above the part that defines reproducibility. We'll decide later how to blend them in, if possible.
Diffstat (limited to 'paper.tex')
-rw-r--r-- paper.tex 16
1 file changed, 9 insertions, 7 deletions
diff --git a/paper.tex b/paper.tex
index f4895b0..259c2e7 100644
--- a/paper.tex
+++ b/paper.tex
@@ -45,16 +45,12 @@
%% Abstract
{\noindent\mpregular
- The era of big data has also ushered an era of big responsability.Without it, the integrity of the result will be a subject of perpetual debate.
-\tonote{
-Because one of the most important properties of Maneage is reproducibility. I think is it better to say something about it in the abstract, like the thing that you do in your speech.
-Something like this:
-Reproducibility (as a test on sufficiently conveying the data lineage) is necessary for other scientists to study, check and build-upon each other’s work ( I got this sentence from the introduction). According to a U.S. National Science Foundation (NSF), the definition of reproducibility is “reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator. That is, a second researcher might use the same raw data to build the same analysis files and implement the same statistical analysis in an attempt to yield the same results…. Reproducibility is a minimum necessary condition for a finding to be believable and informative.”(K. Bollen, J. T. Cacioppo, R. Kaplan, J. Krosnick, J. L. Olds, Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science (National Science Foundation, Arlington, VA, 2015)).
-}
+ The era of big data has also ushered an era of big responsability.
+ Without it, the integrity of the result will be a subject of perpetual debate.
In this paper Maneage is introduced as a low-level solution to this problem.
Maneage (management + lineage) is an executable workflow for project authors and readers in the sciences or the industry.
It is designed following principles: complete (e.g., not requiring anything beyond a POSIX-compatible system, administrator previlages or a network connection), modular, fully in plain-text, minimal complexity in design, verifiable inputs and outputs, temporal lineage/provenance, and free software (in scientific applications).
- A project that uses Maneage will have full control over the data lineage, making it exactly reproducible.
+ A project that uses Maneage will be able to publish the complete data lineage, making it exactly reproducible (as a test on sufficiently conveying the data lineage).
This control goes as far back as the automatic downloading of input data, and automatic building of necessary software that are used to analyze the data, with fixed versions and build configurations.
It also contains the narrative description of the final project's report (built into a PDF), while providing automatic and direct links between the analysis and the part of the narrative description that it was used.
Also, starting new projects, or editing previously published papers is trivial because of its version control system.
@@ -153,6 +149,7 @@ However, this is not a practical solution because software updates are necessary
Generally, software is not a secular component of projects, where one software can easily be swapped with another.
Projects are built around specific software technologies, and research in software methods and implementations is itself a vibrant research topic in many domains \citep{dicosmo19}.
+\tonote{add a short summary of the advantages of Maneage.}
This paper introduces Maneage as a solution to these important issues.
Section \ref{sec:definitions} defines the necessay concepts and terminology used in this paper leading to a discussion of the necessary guiding principles in Section \ref{sec:principles}.
@@ -276,6 +273,11 @@ But before doing so, it is important to highlight that in this paper, we are onl
Therefore, many of the definitions reviewed in \citet{plesser18}, that are about data collection, are out of context here.
We adopt the same definition of \citet{leek17,fineberg19}, among others:
+%% From Zahra Sharbaf:
+%% According to a U.S. National Science Foundation (NSF), the definition of reproducibility is “reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator.
+%% That is, a second researcher might use the same raw data to build the same analysis files and implement the same statistical analysis in an attempt to yield the same results….
+%% Reproducibility is a minimum necessary condition for a finding to be believable and informative.”(K. Bollen, J. T. Cacioppo, R. Kaplan, J. Krosnick, J. L. Olds, Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science (National Science Foundation, Arlington, VA, 2015)).
+
\begin{itemize}
\item {\bf\small Reproducibility:} (same inputs $\rightarrow$ consistant result).
Formally: ``obtaining consistent [not necessarily identical] results using the same input data; computational steps, methods, and code; and conditions of analysis'' \citep{fineberg19}.