author Boud Roukema <boud@cosmo.torun.pl> 2020-04-23 17:12:56 +0200
committer Boud Roukema <boud@cosmo.torun.pl> 2020-04-23 17:12:56 +0200
commit 1de931db43851c60a5ebaece053c69cf07bfc66d
tree 7f88f9c9e9738c3e02870783029542ef07975640
parent 3c5ae2cb26ac9f244714ae251855944b2da5e8f1
Discussion/caveats section.
Reduction of about 50 words. There were a couple of expressions that look a bit like some sort of software/research analysis jargon, such as `Research Objects`, `Software Heritage`, `Machine actionable`. Unless these are defined, capitalising them makes the reader assume that there is some well-known formal meaning and that s/he has to search for that him/herself. As lower case expressions, the reader can guess some reasonable meanings of these. The word "embargo" was introduced for proposal 2) to handle the third caveat.
-rw-r--r-- paper.tex 54
1 files changed, 27 insertions, 27 deletions
diff --git a/paper.tex b/paper.tex
index a4f546d..99cc4da 100644
--- a/paper.tex
+++ b/paper.tex
@@ -693,34 +693,34 @@ For example, \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481
\label{sec:discussion}
The primordial implementation was written for \citet{akhlaghi15}.
-It later evolved in \citet{bacon17}, and in particular the two sections of that paper that were done by M. Akhlaghi (\href{http://doi.org/10.5281/zenodo.1163746}{zenodo.1163746} and \href{http://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}).
-With these, the customizable skeleton was separated from the flesh as a more abstract ``template''.
-Later, software building was also included and used in \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}]{akhlaghi19} and \citet[\href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}]{infante20}.
-After this paper is published, bugs will still be found and Maneage will continue to evolve and improve, notable changes beyond this paper will be kept in \inlinecode{README-hacking.md}.
-
-Once adopted on a wide scale, Maneage projects can be fed them into machine learning (ML) tools for automatic workflow generation, optimized for certain aspects of the result.
-Because Maneage is complete, even inputs (software algorithms and data selection), or failed tests can enter this optimization.
-Furthermore, since it connects the analysis directly to the narrative and history of a project, this can include natural language processing.
-On the other hand, parsers can be written over Maneage-derived projects for meta-research and data provenance studies, for example to generate Research Objects.
-For example, when a bug is found in one software, all affected projects can be found and the scale of the effect can be measured.
-Combined with Software Heritage, precise parts Maneage projects (high-level science) can be cited, at various points in its history (e.g., failed/abandoned tests).
-Many components of Machine actionable data management plans \citep{miksa19b} can also be automatically filled with Maneage, useful for project PIs and and grant organizations.
-
-Maneage was awarded a Research Data Alliance (RDA) adoption grant for implementing the recommendations of the Publishing Data Workflows working group \citep{austin17}.
-Its user base, and thus its development, grew phenomenally afterwards and highlighted some caveats.
-The first is that Maneage uses very low-level tools that are not widely used by scientists, e.g., Git, \LaTeX, Make and the command-line.
-We have discovered that this is primarily because of a lack of exposure.
-Many (in particular early career researchers) have started mastering them as they adopt Maneage, but it does take time.
-We are thus working on several tutorials and improving the documentation.
-
-A second caveat is the maintenance of the various software packages on the many POSIX-compatible systems.
-However, because Maneage builds its software in same framework as the analysis (in Make), users are empowered to add/fix their necessary software without learning anything new.
-This has already happened, with submitted changes to the core Maneage branch, which are propagated to all projects.
-Another caveat that has been raised is that publishing the project's reproducible data lineage immediately after publication enables others to continue with followup papers before they can do it themselves.
+This later evolved in \citet{bacon17}, in which two particular sections were done by M. Akhlaghi (\href{http://doi.org/10.5281/zenodo.1163746}{zenodo.1163746} and \href{http://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}).
+In these two sections, the customizable skeleton was separated from the flesh as a more abstract ``template''.
+Later, software building was incorporated and used in \citet[\href{https://doi.org/10.5281/zenodo.3408481}{zenodo.3408481}]{akhlaghi19} and \citet[\href{https://doi.org/10.5281/zenodo.3524937}{zenodo.3524937}]{infante20}.
+After this paper is published, bugs will still be found and Maneage will continue to evolve and improve, with the most significant changes to be listed in \inlinecode{README-hacking.md}.
+
+Adoption of Maneage projects on a wide scale will make it possible to feed these into machine learning (ML) tools for automatic workflow generation, optimized for desired characteristics of the results.
+Because Maneage is complete, choices of algorithms and data selection methods or failed tests can be optimized.
+Furthermore, since Maneage connects the analysis directly to the narrative and history of a project, natural language processing can be studied.
+Parsers can be written over Maneage-derived projects for meta-research and data provenance studies, to generate ``research objects''.
+For example, when a bug is found in one software package, all affected projects can be found and the scale of the effect can be measured.
+Combined with software heritage, precise high-level science parts of Maneage projects can be accurately cited (e.g., failed/abandoned tests).
+Many components of ``machine-actionable'' data management plans \citep{miksa19b} can be automatically filled out by Maneage, which is useful for project PIs and and grant funders.
+
+Following a Research Data Alliance (RDA) adoption grant for Maneage for implementing the recommendations of the Publishing Data Workflows working group \citep{austin17}, Maneage's user base and development grew phenomenally, highlighting caveats.
+Firstly, Maneage uses very low-level tools that are not widely used by scientists, e.g., Git, \LaTeX, Make and the command line.
+This is primarily because of a lack of exposure.
+Many (especially early career researchers) have started mastering these tools as they adopt Maneage.
+We are thus working on tutorials and improving documentation.
+
+Secondly, the variety of software packages used on various POSIX-compatible systems require maintenance.
+However, because Maneage builds its software in the same Make framework as the analysis, users' experience with Make in analysis empowers them to add/fix their required software with the same Make tools.
+This has already happened, with improvements contributed to the core Maneage branch, propagating to all projects.
+
+Thirdly, publishing a project's reproducible data lineage immediately after publication enables others to continue with followup papers in competition with the original authors.
We propose these solutions:
-1) Through the Git history, the added work by another team, at any phase of the project, can be quantified, contributing to a new concept of authorship in scientific projects and helping to quantify Newton's famous ``\emph{standing on the shoulders of giants}'' quote.
-This is however a long-term goal and requires major changes to academic value systems.
-2) Authors can be given a grace period where the journal, or some third authority, keeps the source and publishes it a certain interval after publication.
+1) Through the Git history, the work added by another team at any phase of the project can be quantified, contributing to a new concept of authorship in scientific projects and helping to quantify Newton's famous ``\emph{standing on the shoulders of giants}'' quote.
+This is a long-term goal and requires major changes to academic value systems.
+2) Authors can be given a grace period where the journal or a third party embargoes the source, keeping it private for the embargo period and then publishing it.