| -rw-r--r-- | paper.tex | 16 | 
1 files changed, 8 insertions, 8 deletions
@@ -10,7 +10,7 @@
 %% When defined (value is irrelevant), `\highlightchanges' will cause text
 %% in `\tonote' and `\new' to become colored. This is useful in cases that
 %% you need to distribute drafts that is undergoing revision and you want
-%% to hightlight to your colleagues which parts are new and which parts are
+%% to highlight to your colleagues which parts are new and which parts are
 %% only for discussion.
 \newcommand{\highlightchanges}{}
@@ -369,7 +369,7 @@ The plain-text files containing Make rules and their components are called Makef
 Hence Make doesn't replace scripting languages like the shell, Python or R, it is a higher-level structure enabling modular/atomic scripts (in any language) to be put in a workflow.
 The formal connection of targets with prerequisites that is defined in Make, enables creation of a precise lineage as a formal/codified executable system that is very mature and has stood the test of time: Make is actively developed and used in the building of most OS components.
-Besides formalizing data lineage, Make also greatly encourages experimentation in a project because a recipe is executed only when atleast one prerequisite is more recent than its target.
+Besides formalizing data lineage, Make also greatly encourages experimentation in a project because a recipe is executed only when at least one prerequisite is more recent than its target.
 Therefore when only $5\%$ of a project's targets are affected by a change, only they will be recreated, the other $95\%$ remain untouched.
 Furthermore, Make first examines the full lineage before starting the execution of recipes.
 It can thus execute independent rules in parallel, further improving the speed and encouraging experimentation.
@@ -401,7 +401,7 @@ Sections \ref{sec:buildsoftware} and \ref{sec:softwarecitation} elaborate more o
 \label{sec:buildsoftware}
 To compile the necessary software from source Maneage currently needs the host to have a C compiler (available on any POSIX-compliant OS).
-This C compiler will be used by maneage to install all the Maneage meta-software (software to build other software) with fixed versions, this includes GNU Bash, GNU AWK, GNU Coreutils, and many more on all supported operating systems (including macOS).
+This C compiler will be used by Maneage to install all the Maneage meta-software (software to build other software) with fixed versions, this includes GNU Bash, GNU AWK, GNU Coreutils, and many more on all supported operating systems (including macOS).
 For example the full list of installed software for this paper is available in Acknowledgments of this paper.
 On GNU/Linux OSs, a fixed version of the GNU Binutils and GNU C Compiler (GCC) is also included, and soon Maneage will also install its own fixed version of the GNU C Library to be fully independent of the host on such systems (Task 15390\footnote{\url{https://savannah.nongnu.org/task/?15390}}).
 In effect, except for the Kernel, Maneage builds all other components of the GNU OS on the host from source.
@@ -421,7 +421,7 @@ However the important factor is that such binary blobs are an optional output of
 Maneage contains the full list of built software for each project, their versions and their configuration options.
 However, this information is buried deep into each project's source.
-Maneage also prints a distilled fraction of this informationin the project's final report, blended into the narrative, as seen in the Acknowledgments of this paper.
+Maneage also prints a distilled fraction of this information in the project's final report, blended into the narrative, as seen in the Acknowledgments of this paper.
 Furthermore, when the software is associate with a published paper, that paper's Bib\TeX{} entry is also added to the final report and is cited with the software's name and version.
 This is particularly important in the case for research software, where the researcher has invested significant time in building the software, and requires official citation to justify continued work on it.
@@ -449,7 +449,7 @@ Once the project is configured (Section \ref{sec:projectconfigure}), a unique an
 All analysis operations run such that the host OS settings cannot penetrate it, enabling an isolated environment without the extra layer of containers or a virtual machine.
 In Maneage, a project's analysis is broken into two phases: data preparation and analysis.
 The former is mostly necessary in special situations where the datasets are extremely large and some initial preparation needs to be done on them to avoid slowing down the whole project in each run.
-That phase is organized in an identical manner as the analysis phase, so we won't to into it any furhter here and refer the interested reader to the documentation of Maneage.
+That phase is organized in an identical manner as the analysis phase, so we won't to into it any further here and refer the interested reader to the documentation of Maneage.
 A project consists of many steps, including data access (possibly by downloading), running various steps of the analysis on the obtained data, and creating the necessary plots, figures or tables for a published report, or output datasets for a database.
 If all of these steps are organized in a single Makefile, it will become very large, or long, and will be hard to maintain, extend/grow, read, reuse, and cite.
@@ -510,7 +510,7 @@ The ma\-cro ``\inlinecode{\small\textbackslash{}demosfoptimizedsn}'' is automati
 The built \inlinecode{project.tex} file stores all such reported values.
 However, managing all the necessary \LaTeX{} macros in one file is against the modularity principle and can be frustrating and buggy.
-To address this problem, Maneage has the convention that all subMakefiles \emph{must} contain a fixed target with the same base-name, but with a \inlinecode{.tex} suffix to store reporeted values generated in that subMakefile.
+To address this problem, Maneage has the convention that all subMakefiles \emph{must} contain a fixed target with the same base-name, but with a \inlinecode{.tex} suffix to store reported values generated in that subMakefile.
 In Figure \ref{fig:datalineage}, these macro files can be seen in every subMakefile, except for \inlinecode{paper.mk} (which doesn't need it).
 These \LaTeX{} macro files thus form the core skeleton of a Maneage project: as shown in Figure \ref{fig:datalineage}, the outward arrows of all built files of any subMakefile ultimately leads to one of these \LaTeX{} macro files, possibly in another subMakefile.
@@ -606,7 +606,7 @@ Each external dataset has some basic information, including its expected name on
 In Maneage, such information regarding a project's input dataset(s) is in the \inlinecode{INPUTS.conf} file.
 See Figures \ref{fig:files} \& \ref{fig:datalineage} for the position of \inlinecode{INPUTS.conf} in the project's file structure and data lineage respectively.
 For demonstration, we are using the datasets of \citet{menke20} which are stored in one \inlinecode{.xlsx} file on bioXriv\footnote{\label{footnote:dataurl}Full data URL: \url{\menketwentyurl}}.
-Figure \ref{fig:inputconf} shows the corresponding \inlinecode{INPUTS.conf} where the the necessary information are stored as Make variables and are automatically loaded into the full project when Make starts (and is most often used in \inlinecode{download.mk}).
+Figure \ref{fig:inputconf} shows the corresponding \inlinecode{INPUTS.conf} where the necessary information are stored as Make variables and are automatically loaded into the full project when Make starts (and is most often used in \inlinecode{download.mk}).
 \begin{figure}[t]
   \input{tex/src/figure-src-inputconf.tex}
@@ -681,7 +681,7 @@ This is demonstrated in the first phase of Figure \ref{fig:branching} where a pr
 After a project starts, Maneage will evolve.
 For example new features will be added, low-level bugs will be fixed that are useful for any project.
-Because all the changes in Maneage are committed on the \inlinecode{maneage} branch, and all projects branch-off from it, updating the project's lowlevel infra-structure is as easy as merging the \inlinecode{maneage} branch into the project's branch.
+Because all the changes in Maneage are committed on the \inlinecode{maneage} branch, and all projects branch-off from it, updating the project's low-level infra-structure is as easy as merging the \inlinecode{maneage} branch into the project's branch.
 For example in Figure \ref{fig:branching} (phase 1), see how Maneage's \inlinecode{3c05235} commit has been merged into project's branch trough commit \inlinecode{2ed0c82} .
 This doesn't just apply to the pre-publication phase, when done in Maneage, a project can be revived at any later date by other researchers as shown in phase 2 of Figure \ref{fig:branching}.
