author Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-04-14 06:00:43 +0100
committer Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-04-14 06:00:43 +0100
commit de23a683ec8b57e75ef47d9298d7ce15e67c9af9
tree af7f096efdf6c127c30b55fd136af9dfad94c0ce /paper.tex
parent 08091a0ee5403ffd36cbadc8b0d0ed712ef9577f
Added first summarized draft of discussion and conclusions
A first draft for these was added and will probably become much better in the next few iterations.
Diffstat (limited to 'paper.tex')
-rw-r--r-- paper.tex 47
1 files changed, 31 insertions, 16 deletions
diff --git a/paper.tex b/paper.tex
index a2a1054..9ec7eb4 100644
--- a/paper.tex
+++ b/paper.tex
@@ -642,6 +642,7 @@ Another useful scenario is reviving a finished/published project at later date,
In that figure, a new team of researchers have decided to experiment on the results of the published paper and have merged it with the Maneage branch (commit \inlinecode{a92b25a}) to make it usable for their system (e.g., assuming the original project was completed years ago, and is no longer directly executable).
Other scenarios include a third project that can easily merge various high-level components from different projects into its own branch, thus adding a temporal dimension to their data lineage.
+This structure also enables easy propagation of low-level fixes to all projects using Maneage.
Modern version control systems provide many more capabilities that can be leveraged through Maneage in project management, thanks to the shared branch it has with \emph{all} derived projects, and that it is complete (\ref{principle:complete}).
\subsection{Multi-user collaboration on single build directory}
@@ -679,7 +680,7 @@ For example in \citet{akhlaghi19}, along with the source files mentioned above,
-\section{Discussion}
+\section{Discussion \& Caveats}
\label{sec:discussion}
Maneage is the final product of various research projects (in astrophysics) over the last 5 years.
@@ -693,35 +694,49 @@ That template later matured into Maneage by including the installation of all ne
In the last year and with the Research Data Alliance (RDA) grant that was awarded to Maneage, its user base (and thus its development) grew phenomenally and it has evolved to become much more customizable, well-tested and well-documented.
But it is far from complete: its core architecture will continue to evolve after the publication of this paper; therefore, a list of the notable changes made after publication will be kept in the \inlinecode{README-hacking.md} file.
-Based on user input, we have seen the following caveats for Maneage.
-The first caveat is regarding its widespread adoption: by principle, it uses very low-level tools like Git, \LaTeX, Make and command-line tools to run in non-interactive mode.
-We understand that a very large fraction of the scientific community are accustomed to interactive graphic user interface (GUI) tools.
-Thanks to the RDA grant, we have had the chance to introduce Maneage to several early career researchers and the feedback we have got was very promising: most were simply not familiar with any of these tools.
-After seeing them in action together (as a \emph{complete} Maneage project) they have started using and enjoying them.
-Unfortunately by their low-level nature, their documentation alone is directly useful for scientists.
-Based on this positive feedback, we thus working on several tutorials and scientist-friendly documentation of such tools for usage in Maneage.
+Based on feedback from early adopters, we have identified the following caveats for Maneage.
+The first caveat concerns its widespread adoption: by design, Maneage uses very low-level tools like Git, \LaTeX, Make and command-line tools to run in non-interactive mode.
+However, a large fraction of the scientific community are accustomed to interactive graphical user interface (GUI) tools.
+This is often not a deliberate choice: some of our early users were simply unaware that such tools existed.
+After seeing them in action together (as a \emph{complete} Maneage project) they started using these tools effectively.
+Unfortunately, because of their low-level nature, the documentation of these tools alone discourages scientists; we are thus working on several tutorials and scientist-friendly documentation of such tools, hopefully in collaboration with efforts like \href{http://software.ac.uk}{software.ac.uk} and \href{http://urssi.us}{urssi.us}.
+\citet{fineberg19} also note the importance of adopting good practice from the start of a project, rather than forcing it in at the end.
-A second caveat is the fact that Maneage is effectively a distribution of the GNU operating system, tailored to each project, but built ontop of an existing POSIX-compatible operating system, but not using its environment.
+A second caveat is that Maneage is effectively an almost complete GNU operating system, tailored to each project.
+It is built on top of an existing POSIX-compatible operating system, using only its kernel.
Maneage has many generic scripts for simplifying the software packaging.
-However, maintaining them (updating versions or fixing bugs on some hosts) can take time and is annoying for a small team.
-Because package management (Section \ref{sec:projectconfigure}) is in the same language as analysis, some users have learnt to package their necessary software tools themselves.
-They later send those additions as merge-requests to the core Maneage branch, making them available to any future project.
-With a larger user-base we hope the fraction of such volunteers increases and decreases the burden.
+However, maintaining them (updating versions or fixing bugs on some hosts) can take time for a small team.
+Because package management (Section \ref{sec:projectconfigure}) is in the same language as the analysis, some users have learnt to package their necessary software or correct some bugs themselves.
+They later send those additions as merge-requests to the core Maneage branch, thus propagating the improvement to all projects using Maneage.
+With a larger user base, we hope the fraction of such volunteers will increase, reducing the burden on our core team.
Another caveat raised by some is that releasing the project's reproducible data lineage immediately upon publication may hamper the authors' ability to continue harvesting the results of all their hard work.
Given the strong integrity checks in Maneage, we believe it has features to address this problem in the following ways:
1) Through the Git history, it is clear how much extra work the other team has added.
In this way, Maneage can contribute to a new concept of authorship in scientific projects and help to quantify Newton's famous ``standing on the shoulders of giants'' quote.
However, this is a long term goal and requires major changes to academic value systems.
-2) Authors can be given a grace period where the journal, or some third authority, keeps the source and publishes it a certain time after publication.
+2) Authors can be given a grace period during which the journal, or some third authority, keeps the source and releases it a fixed interval after publication.
-\section{Summary and conclusion}
+Once Maneage is widely adopted within a particular field, the resulting projects could be fed into machine learning algorithms for automatic workflow generation, optimized for certain aspects of the result.
+Because Maneage is complete and also includes the project's history, even the inputs (software and input data) or failed tests during a project can enter this optimization process.
+Furthermore, writing parsers that generate Research Objects from Maneage projects is straightforward, which would be very useful for meta-research and data provenance studies.
+
+\section{Conclusion \& Summary}
\label{sec:conclusion}
+To effectively leverage the power of big data, we need a complete view of its lineage.
+However, scientists are rarely trained sufficiently in data management or software development, and the plethora of high-level tools that change every few years does not help.
+Maneage is designed as a complete template, providing scientists with a pre-built low-level skeleton that they can customize for any project, thereby adopting modern, robust and efficient data management in practice.
+
+In this paper, we introduced Maneage and described how it is built upon the principles of completeness, modularity, minimal complexity, verifiable inputs and outputs, temporal provenance, and free software.
+We showed how these principles are implemented in a pre-built structure that users only need to customize for the high-level aspects of their projects, and discussed the caveats and advantages of this implementation.
+With a larger user base and wider adoption in scientific (and hopefully industrial) applications, Maneage will certainly grow and become even more stable and user-friendly.
+
+\tonote{One more paragraph will be added here.}
%% Acknowledgements
\section{Acknowledgments}
-The authors wish to thank David Valls-Gabaud, Konrad Hinsen, Yahya Sefidbakht, Pedram Ashofteh Ardakani, Elham Saremi, Zahra Sharbaf and Surena Fatemi for their useful suggestions and feedback on Maneage and this paper and to Ignacio Trujillo, Johan Knapen, Roland Bacon for their support.
+The authors wish to thank David Valls-Gabaud, Johan Knapen, Ignacio Trujillo, Roland Bacon, Konrad Hinsen, Yahya Sefidbakht, Simon Portegies Zwart, Pedram Ashofteh Ardakani, Elham Saremi, Zahra Sharbaf and Surena Fatemi for their useful suggestions and feedback on Maneage and this paper.
We also thank Julia Aguilar-Cabello for designing the Maneage logo.
During its development, Maneage has been partially funded (in historical order) by the following institutions:
The Japanese Ministry of Education, Culture, Sports, Science, and Technology ({\small MEXT}) PhD scholarship to M.A and its Grant-in-Aid for Scientific Research (21244012, 24253003).