author | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2020-04-23 23:05:09 +0100 |
---|---|---|
committer | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2020-04-23 23:05:09 +0100 |
commit | 8bdf5f8e8fefa857c8082acf45685dbefd7174c3 (patch) | |
tree | df03f3cbd43cac1053d078f0a955e870558bbe29 | |
parent | 6e2ea987a8972b1f0d8f07be47e535e9495d1caf (diff) |
Minor edits on Boud's great corrections
Reading over Boud's edits, I noticed a few other parts that I could
summarize more and corrected one or two other parts to fit the original
purpose of the sentence better.
-rw-r--r-- | paper.tex | 25 |
1 files changed, 12 insertions, 13 deletions
@@ -693,27 +693,26 @@ For example, \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481
 \label{sec:discussion}
 The primordial implementation was written for \citet{akhlaghi15}.
-This later evolved in \citet{bacon17}, in which two particular sections were done by M. Akhlaghi (\href{http://doi.org/10.5281/zenodo.1163746}{zenodo.1163746} and \href{http://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}).
-In these two sections, the customizable skeleton was separated from the flesh as a more abstract ``template''.
+To use in other projects without a full re-write, the skeleton was separated from the flesh as a more abstract ``template'' that was used in \citet{bacon17}, in particular Sections 4 and 7.3 (respectively in \href{http://doi.org/10.5281/zenodo.1163746}{zenodo.1163746} and \href{http://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}).
 Later, software building was incorporated and used in \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}]{akhlaghi19} and \citet[\href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}]{infante20}.
-After this paper is published, bugs will still be found and Maneage will continue to evolve and improve, with the most significant changes to be listed in \inlinecode{README-hacking.md}.
+After this paper is published, bugs will still be found and Maneage will continue to evolve and improve, significant changes from this paper will be listed in \inlinecode{README-hacking.md}.
 Adoption of Maneage projects on a wide scale will make it possible to feed these into machine learning (ML) tools for automatic workflow generation, optimized for desired characteristics of the results.
-Because Maneage is complete, choices of algorithms and data selection methods or failed tests can be optimized.
-Furthermore, since Maneage connects the analysis directly to the narrative and history of a project, natural language processing can be studied.
-Parsers can be written over Maneage-derived projects for meta-research and data provenance studies, to generate ``research objects''.
-For example, when a bug is found in one software package, all affected projects can be found and the scale of the effect can be measured.
-Combined with software heritage, precise high-level science parts of Maneage projects can be accurately cited (e.g., failed/abandoned tests).
+Because Maneage is complete, algorithms and data selection methods can be optimized and by connecting the analysis directly to the narrative and history of a project, natural language processing can be studied.
+Parsers can be written over Maneage-derived projects for meta-research and data provenance studies, for example to generate ``research objects''.
+As another example, when a bug is found in one software package, all affected projects can be found and the scale of the effect can be measured.
+Combined with SoftwareHeritage, precise high-level science parts of Maneage projects can be accurately cited (e.g., failed/abandoned tests at any historical point).
 Many components of ``machine-actionable'' data management plans \citep{miksa19b} can be automatically filled out by Maneage, which is useful for project PIs and and grant funders.
-Following a Research Data Alliance (RDA) adoption grant for Maneage for implementing the recommendations of the Publishing Data Workflows working group \citep{austin17}, Maneage's user base and development grew phenomenally, highlighting caveats.
+Maneage was awarded a Research Data Alliance (RDA) adoption grant for implementing the recommendations of the Publishing Data Workflows working group \citep{austin17}.
+Maneage's user base and development grew phenomenally, highlighting caveats.
 Firstly, Maneage uses very low-level tools that are not widely used by scientists, e.g., Git, \LaTeX, Make and the command line.
 This is primarily because of a lack of exposure.
-Many (especially early career researchers) have started mastering these tools as they adopt Maneage.
+Witnessing the improvements in their research, many (especially early career researchers) have started mastering these tools as they adopt Maneage.
 We are thus working on tutorials and improving documentation.
-Secondly, the variety of software packages used on various POSIX-compatible systems require maintenance.
-However, because Maneage builds its software in the same Make framework as the analysis, users' experience with Make in analysis empowers them to add/fix their required software with the same Make tools.
+Secondly, the many software packages used on various POSIX-compatible systems require maintenance.
+However, because Maneage builds its software in the same Make framework as the analysis, users' experience in the analysis empowers them to add/fix their required software with the same Make tools.
 This has already happened, with improvements contributed to the core Maneage branch, propagating to all projects.
 Thirdly, publishing a project's reproducible data lineage immediately after publication enables others to continue with followup papers in competition with the original authors.
@@ -734,7 +733,7 @@ This is a long-term goal and requires major changes to academic value systems.
 \section{Conclusion \& Summary}
 \label{sec:conclusion}
-To effectively leverage the scientific power of big data, we need to have a complete view of its lineage.
+To optimally extract the potentials of big data in science, we need to have a complete view of its lineage.
 Scientists are, however, rarely trained sufficiently in data management or software development, and the plethora of high-level tools that change every few years does not help.
 Such high-level tools are primarily targetted at software developers, who are paid to learn them and use them effectively for short-term projects.
 Scientists, on the other hand, need to focus on their own research fields, and need to think about longevity.