author Boud Roukema <boud@cosmo.torun.pl> 2020-04-22 18:08:31 +0200
committer Boud Roukema <boud@cosmo.torun.pl> 2020-04-22 18:08:31 +0200
commit 8d88566f2b5c976f779b81a612b30c23fd4208f1
tree 148b6ae7884c9f2623b4a5f5760a8f6bcc55c253
parent f0622d8a21b15c4c9374ffc50a20a14056d42e09
4.3 Project analysis intro
Minor rewording of 4.3 Project analysis - introduction. Reduction of about 40 words. 4.2 `parallel` quote: s/http:/https:/
-rw-r--r-- paper.tex 34
1 files changed, 17 insertions, 17 deletions
diff --git a/paper.tex b/paper.tex
index 54ac4f5..e866635 100644
--- a/paper.tex
+++ b/paper.tex
@@ -432,7 +432,7 @@ This paper uses basic software without associated scientific papers. For softwar
This is particularly important for research software, where citation is critical to justify continued development.
A notable example is GNU Parallel \citep{tange18} which prints citation information each time it is run, proposing to either cite the paper or support it with 10000 euros.
It provides a \inlinecode{--citation} option to disable the notice.
-In \href{http://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt?h=master}{its FAQ} this is justified by ``\emph{If you feel the benefit from using GNU Parallel is too small to warrant a citation, then prove that by simply using another tool}''.
+In \href{https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt?h=master}{its FAQ} this is justified by ``\emph{If you feel the benefit from using GNU Parallel is too small to warrant a citation, then prove that by simply using another tool}''.
Most software does not resort to such drastic measures. However, proper citation is not only useful practically, it is also an ethical imperative.
Given the increasing role of software in research \citep{clement19}, automatic citation, is a robust solution.
@@ -449,18 +449,18 @@ These will be tested and enabled in Maneage.
The analysis operations run with no influence from the host OS, enabling an isolated environment without the extra layer of containers or a virtual machine.
In Maneage, a project's analysis is broken into two phases: 1) preparation, and 2) analysis.
-The former is mostly necessary to optimize extremely large datasets and is only useful for advanced users, while following an identical internal structure to the later.
-We will therefore not go any further into it and refer the interested reader to the documentation.
+Both have an identical internal structure.
+The preparation phase is usually only necessary for advanced users who need to optimize extremely large datasets.
-A project consists of many steps, including data access (possibly by downloading), running various steps of the analysis on the raw inputs, and creating the necessary plots, figures or tables for a published report, or output datasets for a database.
-If all of these steps are organized in a single Makefile, it will become very long, and would be hard to maintain, extend/grow, read, reuse, and cite.
+The analysis phase consists of many steps, including data access (possibly by downloading), running various steps of the analysis on the raw inputs, and creating the necessary figures or tables for a published report, or output datasets for a database.
+If all of these steps were organized in a single Makefile, it would become very long, and would be hard to maintain, extend, read, reuse, and cite.
Large files are in general a bad practice and against the modularity and minimal complexity principles (\ref{principle:modularity} \& \ref{principle:complexity}).
Maneage is thus designed to encourage and facilitate modularity by distributing the analysis into many Makefiles that contain contextually-similar analysis steps.
-Hereafter, these modular or lower-level Makefiles will be called \emph{subMakefiles}.
-When run with the \inlinecode{make} argument, the \inlinecode{project} script (Section \ref{sec:maneage}), calls \inlinecode{top-make.mk} which loads the subMakefiles with a certain order into itself (see Section \ref{sec:analysis}).
-All the analysis Makefiles are in \inlinecode{re\-produce\-/anal\-ysis\-/make} (see Figure \ref{fig:files}) and Figure \ref{fig:datalineage} shows their inter-relation with the target/built files that they manage.
-To keep the project's logic clear and simple (minimal complexity principle, \ref{principle:complexity}), by default recursion is not used (where one instance of Make, calls Make within itself).
+Hereafter, these lower-level Makefiles are termed \emph{subMakefiles}.
+When run with the \inlinecode{make} argument, the \inlinecode{project} script (Section \ref{sec:maneage}), calls \inlinecode{top-make.mk}, which loads the subMakefiles using the \inlinecode{include} directive (see Section \ref{sec:analysis}).
+All the analysis Makefiles are in \inlinecode{re\-produce\-/anal\-ysis\-/make} (see Figure \ref{fig:files}). Figure~\ref{fig:datalineage} shows their relationship with the target/built files that they manage.
+To keep the project's logic clear and simple (minimal complexity principle, \ref{principle:complexity}), recursion (where one instance of Make calls Make internally) is, by default, not used.
\begin{figure}[t]
\begin{center}
@@ -469,20 +469,20 @@ To keep the project's logic clear and simple (minimal complexity principle, \ref
\vspace{-7mm}
\caption{\label{fig:datalineage}Schematic representation of a project's data lineage, or workflow, for the demonstration analysis of this paper.
Each colored box is a file in the project and the arrows show the dependencies between them.
- Green files/boxes are plain text files that are under version control and in the source directory.
- Blue files/boxes are output files of various steps in the build-directory, shown within the Makefile (\inlinecode{*.mk}) where they are defined as a \emph{target}.
- For example, \inlinecode{paper.pdf} depends on \inlinecode{project.tex} (in the build directory and generated automatically) and \inlinecode{paper.tex} (in the source directory and written by hand).
- The solid arrows and built boxes with full opacity are described in Section \ref{sec:projectanalysis}.
- The dashed arrows and lower opacity built boxes, show the scalability by adding hypothetical steps to the project.
+ Green files/boxes are plain-text files that are under version control and in the source directory.
+ Blue files/boxes are output files in the build-directory, shown within the Makefile (\inlinecode{*.mk}) where they are defined as a \emph{target}.
+ For example, \inlinecode{paper.pdf} depends on \inlinecode{project.tex} (in the build directory; generated automatically) and \inlinecode{paper.tex} (in the source directory; written manually).
+ The solid arrows and full-opacity built boxes are described in Section \ref{sec:projectanalysis}.
+ The dashed arrows and low-opacity built boxes show the scalability by adding hypothetical steps to the project.
}
\end{figure}
To avoid getting too abstract in the subsections below, where necessary we will do a basic analysis on the data of \citet{menke20} (hereafter M20) and replicate one of the results.
-Note that because we are not using the same software, this is not a reproduction (see \ref{definition:reproduction}).
-We cannot use the same software because M20 use Microsoft Excel for the analysis which violates several of our principles: \ref{principle:complete}, \ref{principle:complexity} and \ref{principle:freesoftware}.
+We cannot use the same software as M20, because M20 used Microsoft Excel for their analysis, violating several of our principles: \ref{principle:complete}, \ref{principle:complexity} and \ref{principle:freesoftware}.
+Since we do not use the same software, this does not qualify as a reproduction (see \ref{definition:reproduction}).
In the subsections below, this paper's analysis on that dataset is described using the data lineage graph of Figure \ref{fig:datalineage}.
We will follow Make's paradigm (see Section \ref{sec:usingmake}) of starting the lineage backwards form the ultimate target in Section \ref{sec:paperpdf} (bottom of Figure \ref{fig:datalineage}) to the configuration files \ref{sec:configfiles} (top of Figure \ref{fig:datalineage}).
-To better understand this project, we encourage looking into this paper's own Maneage source, published as a supplement.
+To better understand this project, we recommend study of this paper's own Maneage source, published as a supplement.
\subsubsection{Ultimate target: the project's paper or report (\inlinecode{paper.pdf})}
\label{sec:paperpdf}