From 8bdf5f8e8fefa857c8082acf45685dbefd7174c3 Mon Sep 17 00:00:00 2001
From: Mohammad Akhlaghi
Date: Thu, 23 Apr 2020 23:05:09 +0100
Subject: Minor edits on Boud's great corrections

Reading over Boud's edits, I noticed a few other parts that I could summarize more and corrected one or two other parts to fit the original purpose of the sentence better.
---
 paper.tex | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/paper.tex b/paper.tex
index d8cb91c..d2aef68 100644
--- a/paper.tex
+++ b/paper.tex
@@ -693,27 +693,26 @@ For example, \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481
 \label{sec:discussion}

 The primordial implementation was written for \citet{akhlaghi15}.
-This later evolved in \citet{bacon17}, in which two particular sections were done by M. Akhlaghi (\href{http://doi.org/10.5281/zenodo.1163746}{zenodo.1163746} and \href{http://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}).
-In these two sections, the customizable skeleton was separated from the flesh as a more abstract ``template''.
+To use in other projects without a full re-write, the skeleton was separated from the flesh as a more abstract ``template'' that was used in \citet{bacon17}, in particular Sections 4 and 7.3 (respectively in \href{http://doi.org/10.5281/zenodo.1163746}{zenodo.1163746} and \href{http://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}).
 Later, software building was incorporated and used in \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}]{akhlaghi19} and \citet[\href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}]{infante20}.
-After this paper is published, bugs will still be found and Maneage will continue to evolve and improve, with the most significant changes to be listed in \inlinecode{README-hacking.md}.
+After this paper is published, bugs will still be found and Maneage will continue to evolve and improve, significant changes from this paper will be listed in \inlinecode{README-hacking.md}.

 Adoption of Maneage projects on a wide scale will make it possible to feed these into machine learning (ML) tools for automatic workflow generation, optimized for desired characteristics of the results.
-Because Maneage is complete, choices of algorithms and data selection methods or failed tests can be optimized.
-Furthermore, since Maneage connects the analysis directly to the narrative and history of a project, natural language processing can be studied.
-Parsers can be written over Maneage-derived projects for meta-research and data provenance studies, to generate ``research objects''.
-For example, when a bug is found in one software package, all affected projects can be found and the scale of the effect can be measured.
-Combined with software heritage, precise high-level science parts of Maneage projects can be accurately cited (e.g., failed/abandoned tests).
+Because Maneage is complete, algorithms and data selection methods can be optimized and by connecting the analysis directly to the narrative and history of a project, natural language processing can be studied.
+Parsers can be written over Maneage-derived projects for meta-research and data provenance studies, for example to generate ``research objects''.
+As another example, when a bug is found in one software package, all affected projects can be found and the scale of the effect can be measured.
+Combined with SoftwareHeritage, precise high-level science parts of Maneage projects can be accurately cited (e.g., failed/abandoned tests at any historical point).
 Many components of ``machine-actionable'' data management plans \citep{miksa19b} can be automatically filled out by Maneage, which is useful for project PIs and and grant funders.

-Following a Research Data Alliance (RDA) adoption grant for Maneage for implementing the recommendations of the Publishing Data Workflows working group \citep{austin17}, Maneage's user base and development grew phenomenally, highlighting caveats.
+Maneage was awarded a Research Data Alliance (RDA) adoption grant for implementing the recommendations of the Publishing Data Workflows working group \citep{austin17}.
+Maneage's user base and development grew phenomenally, highlighting caveats.
 Firstly, Maneage uses very low-level tools that are not widely used by scientists, e.g., Git, \LaTeX, Make and the command line.
 This is primarily because of a lack of exposure.
-Many (especially early career researchers) have started mastering these tools as they adopt Maneage.
+Witnessing the improvements in their research, many (especially early career researchers) have started mastering these tools as they adopt Maneage.
 We are thus working on tutorials and improving documentation.

-Secondly, the variety of software packages used on various POSIX-compatible systems require maintenance.
-However, because Maneage builds its software in the same Make framework as the analysis, users' experience with Make in analysis empowers them to add/fix their required software with the same Make tools.
+Secondly, the many software packages used on various POSIX-compatible systems require maintenance.
+However, because Maneage builds its software in the same Make framework as the analysis, users' experience in the analysis empowers them to add/fix their required software with the same Make tools.
 This has already happened, with improvements contributed to the core Maneage branch, propagating to all projects.

 Thirdly, publishing a project's reproducible data lineage immediately after publication enables others to continue with followup papers in competition with the original authors.
@@ -734,7 +733,7 @@ This is a long-term goal and requires major changes to academic value systems.
 \section{Conclusion \& Summary}
 \label{sec:conclusion}

-To effectively leverage the scientific power of big data, we need to have a complete view of its lineage.
+To optimally extract the potentials of big data in science, we need to have a complete view of its lineage.
 Scientists are, however, rarely trained sufficiently in data management or software development, and the plethora of high-level tools that change every few years does not help.
 Such high-level tools are primarily targetted at software developers, who are paid to learn them and use them effectively for short-term projects.
 Scientists, on the other hand, need to focus on their own research fields, and need to think about longevity.
-- 
cgit v1.2.1