From 1b513de78bb77d276ed337e1f03aa3a8168eb1d3 Mon Sep 17 00:00:00 2001
From: Mohammad Akhlaghi
Date: Mon, 23 Nov 2020 15:27:51 +0000
Subject: Minor edits and corrections

Raul's added point on the answer to the referee was very good, so I edited
it a little to be more clear (and removed his name). Also, after looking in
a few parts of the text, I fixed a few typos.
---
 paper.tex | 12 ++++++------
 peer-review/1-answer.txt | 13 ++++++++-----
 reproduce/analysis/config/metadata.conf | 2 +-
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/paper.tex b/paper.tex
index 16658a4..3907c8e 100644
--- a/paper.tex
+++ b/paper.tex
@@ -329,16 +329,16 @@ Figure \ref{fig:datalineage} (right) is the data lineage graph that produced it
 \vspace{-3mm}
 \caption{\label{fig:datalineage}
  Left: an enhanced replica of Figure 1C in \cite{menke20}, shown here for demonstrating Maneage.
- It shows the ratio of the number of papers mentioning software tools (green line, left vertical axis) to the total number of papers studied in that year (light red bars, right vertical axis on a log scale).
+ It shows the fraction of the number of papers mentioning software tools (green line, left vertical axis) in each year (red bars, right vertical axis on a log scale).
  Right: Schematic representation of the data lineage, or workflow, to generate the plot on the left.
- Each colored box is a file in the project and \new{arrows show the operation of various software, showing what inputs it takes and what outputs it produces}.
+ Each colored box is a file in the project and \new{arrows show the operation of various software: linking input file(s) to output file(s)}.
  Green files/boxes are plain-text files that are under version control and in the project source directory.
  Blue files/boxes are output files in the build directory, shown within the Makefile (\inlinecode{*.mk}) where they are defined as a \emph{target}.
  For example, \inlinecode{paper.pdf} \new{is created by running \LaTeX{} on} \inlinecode{project.tex} (in the build directory; generated automatically) and \inlinecode{paper.tex} (in the source directory; written manually).
  \new{Other software are used in other steps.}
- The solid arrows and full-opacity built boxes correspond to this paper.
- The dotted arrows and built boxes show the scalability by adding hypothetical steps to the project.
- The underlying data of the top plot is available at
+ The solid arrows and full-opacity built boxes correspond to the lineage of this paper.
+ The dotted arrows and built boxes show the scalability of Maneage (ease of adding hypothetical steps to the project as it evolves).
+ The underlying data of the left plot is available at
  \href{https://zenodo.org/record/\projectzenodoid/files/tools-per-year.txt}{zenodo.\projectzenodoid/tools-per-year.txt}.
 }
 \end{figure*}
@@ -741,7 +741,7 @@ For example \citeappendix{lofstead19} propose a ``data pallet'' concept to conta
 In summary, containers or VMs are just a built product themselves.
 If they are built properly (for example building a Maneage'd project inside a Docker container), they can be useful for immediate usage and fast moving of the project from one system to another.
 With robust building, the container or VM can also be exactly reproduced later.
-However, attempting to archive the actual binary container or VM files as a black box (not knowing the precise versions of the software in them) is expensive, and will not be able to answer the most fundamental
+However, attempting to archive the actual binary container or VM files as a black box (not knowing the precise versions of the software in them, and \emph{how} they were built) is expensive, and will not be able to answer the most fundamental questions.
 
 \subsubsection{Independent build in host's file system}
 \label{appendix:independentbuild}
diff --git a/peer-review/1-answer.txt b/peer-review/1-answer.txt
index 6ccf8d4..6600d2b 100644
--- a/peer-review/1-answer.txt
+++ b/peer-review/1-answer.txt
@@ -925,11 +925,14 @@ VMs in 2011 and 2014 are no longer active, and how even Dockerhub will be
 deleting containers that are not used for more than 6 months in free
 accounts (due to the large storage costs).
 
-Raul: it would be interesting to mention here that Maneage has the criterion of
-"Minimal complexity". This means that even if for any reason the project is not
-able to be run in the future, the content, analysis scripts, etc. are accesible
-for the interested reader (because it is in plain text). So, it is transparent
-in any case and the interested reader can follow the analysis and study the
+Furthermore, As a unique new feature, Maneage has the criterion of "Minimal
+complexity". This means that even if for any reason the project is not able
+to be run in the future, the content, analysis scripts, etc. are accesible
+for the interested reader (because it is in plain text). Unlike Nix or Guix
+it also doesn't have a third-party package package manager: the
+instructions of building all the software of a project are directly in the
+same project as the high-level analysis software. So, it is transparent in
+any case and the interested reader can follow the analysis and study the
 different decissions of each step (why and how the analysis was done).
 
 ------------------------------
diff --git a/reproduce/analysis/config/metadata.conf b/reproduce/analysis/config/metadata.conf
index a06b43c..07a1145 100644
--- a/reproduce/analysis/config/metadata.conf
+++ b/reproduce/analysis/config/metadata.conf
@@ -10,7 +10,7 @@
 # warranty.
 
 # Project information
-metadata-title = Long-term and Archivable Reproducibility
+metadata-title = Towards Long-term and Archivable Reproducibility
 
 # DOIs and identifiers.
 metadata-arxiv = 2006.03018
-- 
cgit v1.2.1