From 1de931db43851c60a5ebaece053c69cf07bfc66d Mon Sep 17 00:00:00 2001
From: Boud Roukema
Date: Thu, 23 Apr 2020 17:12:56 +0200
Subject: Discussion/caveats section.

Reduction of about 50 words.

There were a couple of expressions that look a bit like some sort of software/research analysis jargon, such as `Research Objects`, `Software Heritage` and `Machine actionable`. Unless these are defined, capitalising them makes the reader assume that there is some well-known formal meaning and that s/he has to search for that him/herself. In lower case, these expressions let the reader guess some reasonable meanings.

The word "embargo" was introduced for proposal 2) to handle the third caveat.
---
 paper.tex | 54 +++++++++++++++++++++++++++---------------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/paper.tex b/paper.tex
index a4f546d..99cc4da 100644
--- a/paper.tex
+++ b/paper.tex
@@ -693,34 +693,34 @@ For example, \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481
 \label{sec:discussion}
 
 The primordial implementation was written for \citet{akhlaghi15}.
-It later evolved in \citet{bacon17}, and in particular the two sections of that paper that were done by M. Akhlaghi (\href{http://doi.org/10.5281/zenodo.1163746}{zenodo.1163746} and \href{http://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}).
-With these, the customizable skeleton was separated from the flesh as a more abstract ``template''.
-Later, software building was also included and used in \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}]{akhlaghi19} and \citet[\href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}]{infante20}.
-After this paper is published, bugs will still be found and Maneage will continue to evolve and improve, notable changes beyond this paper will be kept in \inlinecode{README-hacking.md}.
-
-Once adopted on a wide scale, Maneage projects can be fed them into machine learning (ML) tools for automatic workflow generation, optimized for certain aspects of the result.
-Because Maneage is complete, even inputs (software algorithms and data selection), or failed tests can enter this optimization.
-Furthermore, since it connects the analysis directly to the narrative and history of a project, this can include natural language processing.
-On the other hand, parsers can be written over Maneage-derived projects for meta-research and data provenance studies, for example to generate Research Objects.
-For example, when a bug is found in one software, all affected projects can be found and the scale of the effect can be measured.
-Combined with Software Heritage, precise parts Maneage projects (high-level science) can be cited, at various points in its history (e.g., failed/abandoned tests).
-Many components of Machine actionable data management plans \citep{miksa19b} can also be automatically filled with Maneage, useful for project PIs and and grant organizations.
-
-Maneage was awarded a Research Data Alliance (RDA) adoption grant for implementing the recommendations of the Publishing Data Workflows working group \citep{austin17}.
-Its user base, and thus its development, grew phenomenally afterwards and highlighted some caveats.
-The first is that Maneage uses very low-level tools that are not widely used by scientists, e.g., Git, \LaTeX, Make and the command-line.
-We have discovered that this is primarily because of a lack of exposure.
-Many (in particular early career researchers) have started mastering them as they adopt Maneage, but it does take time.
-We are thus working on several tutorials and improving the documentation.
-
-A second caveat is the maintenance of the various software packages on the many POSIX-compatible systems.
-However, because Maneage builds its software in same framework as the analysis (in Make), users are empowered to add/fix their necessary software without learning anything new.
-This has already happened, with submitted changes to the core Maneage branch, which are propagated to all projects.
-Another caveat that has been raised is that publishing the project's reproducible data lineage immediately after publication enables others to continue with followup papers before they can do it themselves.
+This later evolved in \citet{bacon17}, in which two particular sections were done by M. Akhlaghi (\href{http://doi.org/10.5281/zenodo.1163746}{zenodo.1163746} and \href{http://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}).
+In these two sections, the customizable skeleton was separated from the flesh as a more abstract ``template''.
+Later, software building was incorporated and used in \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}]{akhlaghi19} and \citet[\href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}]{infante20}.
+After this paper is published, bugs will still be found and Maneage will continue to evolve and improve, with the most significant changes to be listed in \inlinecode{README-hacking.md}.
+
+Adoption of Maneage projects on a wide scale will make it possible to feed these into machine learning (ML) tools for automatic workflow generation, optimized for desired characteristics of the results.
+Because Maneage is complete, the choices of algorithms and data selection methods, and even failed tests, can be included in this optimization.
+Furthermore, since Maneage connects the analysis directly to the narrative and history of a project, natural language processing can also be applied.
+Parsers can be written over Maneage-derived projects for meta-research and data provenance studies, e.g., to generate ``research objects''.
+For example, when a bug is found in one software package, all affected projects can be found and the scale of the effect can be measured.
+Combined with software heritage, precise high-level science parts of Maneage projects can be accurately cited at various points in their history (e.g., failed/abandoned tests).
+Many components of ``machine-actionable'' data management plans \citep{miksa19b} can be automatically filled out by Maneage, which is useful for project PIs and grant funders.
+
+Following a Research Data Alliance (RDA) adoption grant awarded to Maneage for implementing the recommendations of the Publishing Data Workflows working group \citep{austin17}, Maneage's user base and development grew phenomenally, highlighting several caveats.
+Firstly, Maneage uses very low-level tools that are not widely used by scientists, e.g., Git, \LaTeX, Make and the command line.
+This is primarily because of a lack of exposure.
+Many (especially early-career researchers) have started mastering these tools as they adopt Maneage.
+We are thus working on tutorials and improving documentation.
+
+Secondly, the many software packages used on various POSIX-compatible systems require maintenance.
+However, because Maneage builds its software in the same Make framework as the analysis, users' experience with Make in the analysis empowers them to add or fix their required software with the same tools.
+This has already happened, with improvements contributed to the core Maneage branch and propagated to all projects.
+
+Thirdly, publishing a project's reproducible data lineage immediately after publication enables others to continue with follow-up papers in competition with the original authors.
 We propose these solutions:
-1) Through the Git history, the added work by another team, at any phase of the project, can be quantified, contributing to a new concept of authorship in scientific projects and helping to quantify Newton's famous ``\emph{standing on the shoulders of giants}'' quote.
-This is however a long-term goal and requires major changes to academic value systems.
-2) Authors can be given a grace period where the journal, or some third authority, keeps the source and publishes it a certain interval after publication.
+1) Through the Git history, the work added by another team at any phase of the project can be quantified, contributing to a new concept of authorship in scientific projects and helping to quantify Newton's famous ``\emph{standing on the shoulders of giants}'' quote.
+This is a long-term goal and requires major changes to academic value systems.
+2) Authors can be given a grace period, during which the journal or a third party embargoes the source, keeping it private until the embargo ends and then publishing it.
-- 
cgit v1.2.1