author Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-04-23 23:05:09 +0100
committer Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-04-23 23:05:09 +0100
commit 8bdf5f8e8fefa857c8082acf45685dbefd7174c3
tree df03f3cbd43cac1053d078f0a955e870558bbe29
parent 6e2ea987a8972b1f0d8f07be47e535e9495d1caf
Minor edits on Boud's great corrections
Reading over Boud's edits, I noticed a few other parts that I could summarize more and corrected one or two other parts to fit the original purpose of the sentence better.
-rw-r--r-- paper.tex 25
1 files changed, 12 insertions, 13 deletions
diff --git a/paper.tex b/paper.tex
index d8cb91c..d2aef68 100644
--- a/paper.tex
+++ b/paper.tex
@@ -693,27 +693,26 @@ For example, \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481
\label{sec:discussion}
The primordial implementation was written for \citet{akhlaghi15}.
-This later evolved in \citet{bacon17}, in which two particular sections were done by M. Akhlaghi (\href{http://doi.org/10.5281/zenodo.1163746}{zenodo.1163746} and \href{http://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}).
-In these two sections, the customizable skeleton was separated from the flesh as a more abstract ``template''.
+To use in other projects without a full re-write, the skeleton was separated from the flesh as a more abstract ``template'' that was used in \citet{bacon17}, in particular Sections 4 and 7.3 (respectively in \href{http://doi.org/10.5281/zenodo.1163746}{zenodo.1163746} and \href{http://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}).
Later, software building was incorporated and used in \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}]{akhlaghi19} and \citet[\href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}]{infante20}.
-After this paper is published, bugs will still be found and Maneage will continue to evolve and improve, with the most significant changes to be listed in \inlinecode{README-hacking.md}.
+After this paper is published, bugs will still be found and Maneage will continue to evolve and improve, significant changes from this paper will be listed in \inlinecode{README-hacking.md}.
Adoption of Maneage projects on a wide scale will make it possible to feed these into machine learning (ML) tools for automatic workflow generation, optimized for desired characteristics of the results.
-Because Maneage is complete, choices of algorithms and data selection methods or failed tests can be optimized.
-Furthermore, since Maneage connects the analysis directly to the narrative and history of a project, natural language processing can be studied.
-Parsers can be written over Maneage-derived projects for meta-research and data provenance studies, to generate ``research objects''.
-For example, when a bug is found in one software package, all affected projects can be found and the scale of the effect can be measured.
-Combined with software heritage, precise high-level science parts of Maneage projects can be accurately cited (e.g., failed/abandoned tests).
+Because Maneage is complete, algorithms and data selection methods can be optimized and by connecting the analysis directly to the narrative and history of a project, natural language processing can be studied.
+Parsers can be written over Maneage-derived projects for meta-research and data provenance studies, for example to generate ``research objects''.
+As another example, when a bug is found in one software package, all affected projects can be found and the scale of the effect can be measured.
+Combined with SoftwareHeritage, precise high-level science parts of Maneage projects can be accurately cited (e.g., failed/abandoned tests at any historical point).
Many components of ``machine-actionable'' data management plans \citep{miksa19b} can be automatically filled out by Maneage, which is useful for project PIs and and grant funders.
-Following a Research Data Alliance (RDA) adoption grant for Maneage for implementing the recommendations of the Publishing Data Workflows working group \citep{austin17}, Maneage's user base and development grew phenomenally, highlighting caveats.
+Maneage was awarded a Research Data Alliance (RDA) adoption grant for implementing the recommendations of the Publishing Data Workflows working group \citep{austin17}.
+Maneage's user base and development grew phenomenally, highlighting caveats.
Firstly, Maneage uses very low-level tools that are not widely used by scientists, e.g., Git, \LaTeX, Make and the command line.
This is primarily because of a lack of exposure.
-Many (especially early career researchers) have started mastering these tools as they adopt Maneage.
+Witnessing the improvements in their research, many (especially early career researchers) have started mastering these tools as they adopt Maneage.
We are thus working on tutorials and improving documentation.
-Secondly, the variety of software packages used on various POSIX-compatible systems require maintenance.
-However, because Maneage builds its software in the same Make framework as the analysis, users' experience with Make in analysis empowers them to add/fix their required software with the same Make tools.
+Secondly, the many software packages used on various POSIX-compatible systems require maintenance.
+However, because Maneage builds its software in the same Make framework as the analysis, users' experience in the analysis empowers them to add/fix their required software with the same Make tools.
This has already happened, with improvements contributed to the core Maneage branch, propagating to all projects.
Thirdly, publishing a project's reproducible data lineage immediately after publication enables others to continue with followup papers in competition with the original authors.
@@ -734,7 +733,7 @@ This is a long-term goal and requires major changes to academic value systems.
\section{Conclusion \& Summary}
\label{sec:conclusion}
-To effectively leverage the scientific power of big data, we need to have a complete view of its lineage.
+To optimally extract the potentials of big data in science, we need to have a complete view of its lineage.
Scientists are, however, rarely trained sufficiently in data management or software development, and the plethora of high-level tools that change every few years does not help.
Such high-level tools are primarily targetted at software developers, who are paid to learn them and use them effectively for short-term projects.
Scientists, on the other hand, need to focus on their own research fields, and need to think about longevity.