+%% Beamer settings.
+\setbeamertemplate{footline}[frame number]
+%% Packages to import.
+\usepackage{tcolorbox} %For a color-box.
+\usepackage{textcomp} %For a copyright sign.
+%% To simplify arXiv links
+ (\textcolor{blue}{\href{https://arxiv.org/abs/#1}{arXiv:#1}})}}
+%% Set the title
+\title{Reproducible scientific research in the era of big data}
+%% Set the author
+\author{Mohammad Akhlaghi\\\vspace{2mm}\footnotesize Centre de
+ Recherche Astrophysique de Lyon({\scriptsize CRAL}),\\Universit\'e de
+ Lyon, France.\\
+ \vspace{1.5cm}
+ \includegraphics[width=3.5cm]{img/muse.png}\\
+ \includegraphics[width=1.4cm]{img/cral.png}
+ \includegraphics[width=1.9cm]{img/univ-lyon.png}
+ \includegraphics[width=1cm]{img/cnrs.png}
+ \includegraphics[width=1cm]{img/erc.png}\\
+%% Set the date and institutional logos.
+ \begin{frame}
+ \titlepage
+ \end{frame}
+ \begin{frame}{Necessity of (exactly) reproducible research}
+ \begin{itemize}
+ \setlength\itemsep{0.3cm}
+ \item To be considered \alert{scientific}, any result has to be
+ reproducible.
+ \item The tsunami of data, fast internet, and high processing
+ power have made it very easy to \alert{promptly arrive at a
+ result}.
+ \item But these factors have also greatly increased the
+ \alert{complexity} of an analysis, making it impossible to
+ exactly describe all steps in a published paper.
+ \item Most scientific papers thus omit what their authors
+ consider to be ``details''.
+ \item But due to the complexity, even a small deviation from the
+ exact result can originate in many different parts of the
+ analysis. Hence, it is \alert{critical to exactly reproduce} a
+ result.
+ \item The software used, its configuration files, the order of
+ steps taken, and the input data are all necessary for
+ reproducibility.
+ \item \alert{A solution} is proposed here which, if adopted from
+ the start, can greatly \alert{simplify a scientific research
+ project} and \alert{allow full/exact reproducibility} once it
+ is published.
+ \end{itemize}
+ \end{frame}
+ \begin{frame}{Values in final report/paper}
+ All necessary analysis/processing \alert{input} and \alert{output}
+ values are written into the final report as \LaTeX{} macros. Shown
+ here is a portion of the \textsf{NoiseChisel} paper and its source
+ (\textcolor{blue}{\small\href{https://arxiv.org/abs/1505.01664}{arXiv:1505.01664}}).
+ \vspace{1.2cm}
+ \includegraphics[width=\linewidth]{img/reproducible-latex.png}
+ \end{frame}
+ \begin{frame}{Values in final report/paper}
+ All necessary analysis/processing \alert{input} and \alert{output}
+ values are written into the final report as \LaTeX{} macros. Shown
+ here is a portion of the \textsf{NoiseChisel} paper and its source
+ (\textcolor{blue}{\small\href{https://arxiv.org/abs/1505.01664}{arXiv:1505.01664}}).
+ \vspace{1.2cm}
+ \includegraphics[width=\linewidth]{img/reproducible-latex-highlighted.png}
+ \end{frame}
+ \begin{frame}{Values are the pipeline's final product}
+ All the \LaTeX{} macros (processing inputs and outputs) come from
+ a \alert{single file}. This file is the \alert{final product} of
+ the \emph{reproduction pipeline}.
+ \begin{center}
+ \includegraphics[width=0.8\linewidth]{img/reproducible-macros.png}
+ \end{center}
+ \end{frame}
+ \begin{frame}{Values are the pipeline's final product}
+ All the \LaTeX{} macros (processing inputs and outputs) come from
+ a \alert{single file}. This file is the \alert{final product} of
+ the \emph{reproduction pipeline}.
+ \begin{center}
+ \includegraphics[width=0.8\linewidth]{img/reproducible-macros-highlighted.png}
+ \end{center}
+ \end{frame}
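+ \begin{frame}{Sketch: one macro, from pipeline to paper}
+ A minimal, hypothetical illustration of the idea above. The macro
+ name \texttt{examplethreshold} and its value are assumptions made
+ for this sketch; they are not taken from the \textsf{NoiseChisel}
+ paper or its pipeline.
+ \begin{itemize}
+ \item[] \texttt{\% Line written by the pipeline into the macro file:}
+ \item[] \texttt{\textbackslash newcommand\{\textbackslash examplethreshold\}\{3.5\}}
+ \item[] \texttt{\% Line in the paper's source that uses it:}
+ \item[] \texttt{The threshold was \textbackslash examplethreshold\{\} times the noise.}
+ \end{itemize}
+ Because the number appears only in the macro file, re-running the
+ pipeline with a different configuration would update the sentence
+ in the paper without editing its text by hand.
+ \end{frame}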
+ \begin{frame}{Values written during analysis}
+ Various steps of the analysis pipeline write the macro values as
+ soon as they are calculated internally.
+ \begin{center}
+ \includegraphics[width=0.8\linewidth]{img/reproducible-write-macro.png}
+ \end{center}
+ \end{frame}
+ \begin{frame}{Values written during analysis}
+ Various steps of the analysis pipeline write the macro values as
+ soon as they are calculated internally.
+ \begin{center}
+ \includegraphics[width=0.8\linewidth]{img/reproducible-write-macro-highlight.png}
+ \end{center}
+ \end{frame}
+ \begin{frame}{Reproducible science: Pipeline is managed through a Makefile}
+ \small
+ \begin{columns}
+ \column{5.5cm}
+ The whole pipeline is managed by Makefiles (example from
+ \textcolor{blue}{\small\href{https://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}}):
+ \begin{itemize}
+ \setlength\itemsep{0.2cm}
+ \item Unlike a script, which always starts from the top, a
+ Makefile \alert{starts from the end}, and steps whose inputs
+ have not changed are left untouched (not remade).
+ \item A single \emph{rule} can \alert{manage any number of
+ files}. See the examples here where \textsf{NoiseChisel} and
+ \textsf{MakeCatalog} are run separately on \alert{$\sim20$
+ files} (different filters/fields) with a single rule.
+ \item Make can identify independent steps internally and do them
+ in \alert{parallel}.
+ \item Make was \alert{designed for complex problems} with
+ thousands of files (it is used to build all major components of
+ Unix-like systems), so it is highly evolved and efficient.
+ \item Make is a very \alert{simple} and \alert{small} language,
+ so it is easy to learn, and it has excellent free documentation
+ (for example
+ \textcolor{blue}{\href{https://www.gnu.org/software/make/manual/}{GNU
+ Make's manual}}, which is also useful for learning other implementations).
+ \end{itemize}
+ \column{5.5cm}
+ \includegraphics[width=\linewidth]{img/reproducible-makefile.png}
+ \end{columns}
+ \end{frame}
+ \begin{frame}{Reproducing the result and report/paper}
+ Once software dependencies are installed, the two \alert{simple}
+ and \alert{familiar} commands below are enough to exactly
+ reproduce the results at any time (as in
+ \textcolor{blue}{\small\href{https://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}}):
+ \begin{itemize}
+ \item[] \texttt{\$ ./configure{ }{ }{ }{ }{ }{ }\# To
+ define top-level local directories.}
+ \item[] \texttt{\$ make{ }{ }{ }{ }{ }{ }{ }{ }{ }{ }{ }{ }{ }\# To reproduce the analysis and paper.}
+ \end{itemize}
+ \vspace{0.5cm} Enabling version control (e.g. \alert{Git}) will
+ make it very easy to test different ideas while not harming the
+ initial/base result (thus encouraging \alert{creativity} and
+ brainstorming during the project).
+ \vspace{0.5cm} The pipeline can also \alert{download input} data
+ from online archives (databases) if not locally available (as in
+ \textcolor{blue}{\small\href{https://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}}
+ and
+ \textcolor{blue}{\href{https://gitlab.com/makhlaghi/reproduction-pipeline-template}{template}}).
+ \vspace{0.5cm} After publication, \alert{readers} can
+ \alert{change} the input configurations, and the numbers and
+ figures of the reproduced paper will change accordingly. This
+ encourages creativity and brainstorming after the project, as well
+ as sharing of hard-earned experience with the whole
+ community.
+ \end{frame}
+ \begin{frame}{Publication of the pipeline}
+ A reproduction pipeline like this will have the following
+ (\alert{plain text}) components:
+ \begin{itemize}
+ \item Makefiles.
+ \item \LaTeX{} source files.
+ \item Configuration files.
+ \item Scripts/programming files (e.g., Python, Shell, AWK, C).
+ \end{itemize}
+ The \alert{volume} of the reproduction pipeline will thus be
+ \alert{negligible} compared to a single figure in a paper
+ (especially after compression).
+ \vspace{1.5cm} The reproduction pipeline can be \alert{published} in
+ \begin{itemize}
+ \item \alert{arXiv}: uploaded with the \TeX{} source to always
+ stay with the paper \\(for example
+ \textcolor{blue}{\small\href{https://arxiv.org/abs/1505.01664}{arXiv:1505.01664}}). The
+ file containing all macros must also be uploaded so arXiv's
+ server can easily build the \LaTeX{} source.
+ \item \alert{Zenodo}: uploaded along with all the input datasets (many
+ gigabytes) and software \\(for example
+ \textcolor{blue}{\small\href{https://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}}), and given a unique DOI.
+ \end{itemize}
+ \end{frame}
+ \begin{frame}
+ A template/blank pipeline has been written and is ready to use,
+ with implementation guidelines and practical tips and
+ recommendations:
+ \textcolor{blue}{\url{https://gitlab.com/makhlaghi/reproducible-paper}}
+ \vspace{2.5cm}
+ Please see this page for more:
+ \textcolor{blue}{\url{http://akhlaghi.org/reproducible-science.html}}
+ \end{frame}