From 59715dbdb3582707e0b2d87c797f5e21ea716894 Mon Sep 17 00:00:00 2001
From: Boud Roukema
Date: Thu, 23 Apr 2020 01:50:56 +0200
Subject: 4.3.6 Project analysis - configure files

Length reduction by about 15 words.

A semantically significant change is from `leading to more robust
scientific results` to `evolves in the case of exploratory research
papers, and better self-consistency in hypothesis testing papers`.

I said this in a previous commit, but it can't hurt to repeat it: in the
covidian epoch (though not only), it is especially important to
distinguish Bayesian-type exploratory research (typical in astronomy, or
in searching for a good COVID-19 treatment or vaccine) from hypothesis
testing (clinical testing in double-blind randomised controlled trials,
with the trial methods published on a public registry before the trials
take place). In the latter case, you want your results to be analysed
consistently with the plan published before the trials even begin, and
ideally you want them to be published (or at least posted on the trial
registry website) even if your results are insignificant, to avoid a
publication bias in favour of significant results. Test homeopathy
against placebos in 1000 independent experiments, analyse them all the
same way, and 2-3 experiments will be significant at the 3 sigma
level...
---
 paper.tex | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/paper.tex b/paper.tex
index 1ee0bbd..773f835 100644
--- a/paper.tex
+++ b/paper.tex
@@ -598,25 +598,26 @@ Figure \ref{fig:inputconf} shows the corresponding \inlinecode{INPUTS.conf} wher
 
 \subsubsection{Configuration files}
 \label{sec:configfiles}
-The subMakefiles discussed above should only contain the organization of an analysis, they should not contain any fixed numbers, settings or parameters, as
-such elements should only be used as variables which are defined in configuration files.
-Configuration files enable the logical separation between the low-level implementation and high-level running of a project.
-In the data lineage plot of Figure \ref{fig:datalineage}, configuration files are shown as the sharp-edged, green \inlinecode{*.conf} files in the top row (for example, the \inlinecode{INPUTS.conf} file that was shown in Figure \ref{fig:inputconf} and mentioned in Section \ref{sec:download}).
-All the configuration files of a project are placed under the \inlinecode{reproduce/analysis/config} (see Figure \ref{fig:files}) subdirectory, and are loaded into \inlinecode{top-make.mk} before any of the subMakefiles, see Figure \ref{fig:topmake}.
-
-The demo analysis of Section \ref{sec:analysis} is a good demonstration of their usage: during that discussion we reported the number of papers studied by M20 in \menkenumpapersdemoyear.
-However, the year's number is not written by hand in \inlinecode{demo-plot.mk}.
-It is referenced through the \inlinecode{menke-year-demo} variable, which is defined in \inlinecode{menke-demo-year.conf}, that is a prerequisite of the \inlinecode{demo-plot.tex} rule.
+The subMakefiles discussed above should only organize the analysis, they should not contain any fixed numbers, settings or parameters, which
+should instead be set as variables in configuration files.
+Configuration files logically separate the low-level implementation from the high-level running of a project.
+In the data lineage plot of Figure \ref{fig:datalineage}, configuration files are shown as sharp-edged, green \inlinecode{*.conf} boxes in the top row (for example, the file \inlinecode{INPUTS.conf} that was shown in Figure \ref{fig:inputconf} and mentioned in Section \ref{sec:download}).
+All the configuration files of a project are placed under the \inlinecode{reproduce/analysis/config} (see Figure \ref{fig:files}) subdirectory, and are loaded into \inlinecode{top-make.mk} before any of the subMakefiles (Figure \ref{fig:topmake}).
+
+The example analysis in Section \ref{sec:analysis}, in which we reported the number of papers studied by M20 in \menkenumpapersdemoyear, illustrates this.
+The year ``\menkenumpapersdemoyear'' is not written by hand in \inlinecode{demo-plot.mk}.\sloppy
+It is referenced through the \inlinecode{menke-year-demo} variable, which is defined in \inlinecode{menke-demo-year.conf}, which is a prerequisite of the \inlinecode{demo-plot.tex} rule.
 This is also visible in the data lineage of Figure \ref{fig:datalineage}.
-If we later would decide to report the number in another year, we would simply have to change the value in \inlinecode{menke-demo-year.conf}.
-A configuration file is a prerequisite of the target that uses it, hence its date will be newer than \inlinecode{demo-plot.tex}.
-Therefore Make will re-execute the recipe to generate the macro file before this paper is re-built and the corresponding year and value will be updated in this paper, always in synchronization with each other and no matter how many times they are used.
+If we wished to report the number in a different year, it would be sufficient to change the value in \inlinecode{menke-demo-year.conf}.
+A configuration file is a prerequisite of the target that uses it, so its timestamp will be newer than \inlinecode{demo-plot.tex}.
+Thus, Make will re-execute the recipe to generate the macro file before this paper is re-built and the corresponding year and value will be updated in this paper, always in synchronization with each other and no matter how many times they are used.
 Combined with the fact that all source files in Maneage are under version control, this encourages testing of various settings of the
-analysis as the project evolves, leading to more robust scientific results.
+analysis as the project evolves in the case of exploratory research papers, and better self-consistency in hypothesis testing papers.
 
 
 \subsubsection{Project initialization (\inlinecode{initialize.mk})}
 \label{sec:initialize}
+\fussy
 The \inlinecode{initial\-ize\-.mk} subMakefile is present in all projects and is the first subMakefile that is loaded into \inlinecode{top-make.mk} (see Figure \ref{fig:datalineage}).
 It does not contain any analysis or major processing steps, it just initializes the system by setting the necessary Make environment as well as other general jobs like defining the Git commit hash of the run as a \LaTeX{} (\inlinecode{\textbackslash{}projectversion}) macro that can be loaded into the narrative.
 Papers using Maneage usually put this hash as the last word in their abstract, for example, see \citet{akhlaghi19} and \citet{infante20}.
-- 
cgit v1.2.1