aboutsummaryrefslogtreecommitdiff
path: root/reproduction-pipeline.tex
diff options
context:
space:
mode:
authorMohammad Akhlaghi <mohammad@akhlaghi.org>2018-03-13 11:45:40 +0100
committerMohammad Akhlaghi <mohammad@akhlaghi.org>2018-03-13 11:45:40 +0100
commit1f063ee8276a5653ec59939d79f1a999bf7c529d (patch)
treee71389e235cddc64b1a3b31d663024fb2be44628 /reproduction-pipeline.tex
parent10f9dfe4801268b68d8526ba4507abb324cf349c (diff)
Renamed main file to reproducible-paper.tex
The name of the repository and the various links to it are `reproducibile-paper', so its better that the slides also have a similar name.
Diffstat (limited to 'reproduction-pipeline.tex')
-rw-r--r--reproduction-pipeline.tex258
1 files changed, 0 insertions, 258 deletions
diff --git a/reproduction-pipeline.tex b/reproduction-pipeline.tex
deleted file mode 100644
index d9ce898..0000000
--- a/reproduction-pipeline.tex
+++ /dev/null
@@ -1,258 +0,0 @@
-\documentclass[9pt]{beamer}
-
-
-%% Beamer settings.
-\setbeamertemplate{footline}[frame number]
-
-
-%% Packages to import.
-\usepackage{tcolorbox} %For a color-box.
-\usepackage{textcomp} %For a copyright sign.
-
-
-%% To simplify arXiv links
-\newcommand{\arxivlink}[1]{{\footnotesize
- (\textcolor{blue}{\href{https://arxiv.org/abs/#1}{arXiv:#1}})}}
-
-
-
-%% Set the title
-\title{Reproducible scientific research in the era of big data}
-
-
-%% Set the author
-\author{Mohammad Akhlaghi\\\vspace{2mm}\footnotesize Centre de
- Recherche Astrophysique de Lyon({\scriptsize CRAL}),\\Universit\'e de
- Lyon, France.\\
- \vspace{1.5cm}
- \includegraphics[width=3.5cm]{img/muse.png}\\
- \includegraphics[width=1.4cm]{img/cral.png}
- \includegraphics[width=1.9cm]{img/univ-lyon.png}
- \includegraphics[width=1cm]{img/cnrs.png}
- \includegraphics[width=1cm]{img/erc.png}\\
-}
-
-
-%% Set the date and insitutional logos.
-\date{}
-
-
-
-
-
-
-
-
-
-
-\begin{document}
-
- \begin{frame}
- \titlepage
- \end{frame}
-
-
- \begin{frame}{Necessity of (exactly) reproducible research}
- \begin{itemize}
- \setlength\itemsep{0.3cm}
- \item To be considered \alert{scientific}, any result has to be
- reproducible.
- \item The tsunami of data, fast internet, and high processing
- power have made it very easy to \alert{promptly arrive at a
- result}.
- \item But these factors have also greatly increased the
- \alert{complexity} of an analysis. Making it impossible to
- exactly descibe all steps in a published paper.
- \item Most scientific papers thus ignore the ``details'' (as they
- interpret it).
- \item But due to the complexity, even a small deviation from the
- exact result, can be due to many different parts of the
- analysis. Hence, its \alert{critical to exactly reproduce} a
- result.
- \item The software(s) used, configuration file(s), the order of
- steps taken, along with the input data are necessary for
- reproducibility.
- \item \alert{A solution} is proposed here, which if adopted from
- the start, can greatly \alert{simplify a scientific research
- project} and \alert{allow full/exact reproducibility} once it
- is published.
- \end{itemize}
- \end{frame}
-
-
-
- \begin{frame}{Values in final report/paper}
- All necessary analysis/processing \alert{input} and \alert{output}
- values are writen into the final report as \LaTeX{} macros. Shown
- here is a portion of the \textsf{NoiseChisel} paper and its source
- (\textcolor{blue}{\small\href{https://arxiv.org/abs/1505.01664}{arXiv:1505.01664}}).
-
- \vspace{1.2cm}
- \includegraphics[width=\linewidth]{img/reproducible-latex.png}
- \end{frame}
-
- \begin{frame}{Values in final report/paper}
- All necessary analysis/processing \alert{input} and \alert{output}
- values are writen into the final report as \LaTeX{} macros. Shown
- here is a portion of the \textsf{NoiseChisel} paper and its source
- (\textcolor{blue}{\small\href{https://arxiv.org/abs/1505.01664}{arXiv:1505.01664}}).
-
- \vspace{1.2cm}
- \includegraphics[width=\linewidth]{img/reproducible-latex-highlighted.png}
- \end{frame}
-
-
- \begin{frame}{Values are the pipeline's final product}
- All the \LaTeX{} macros (processing inputs and outputs) come from
- a \alert{single file}. This file is the \alert{final product} of
- the \emph{reproduction pipeline}.
-
- \begin{center}
- \includegraphics[width=0.8\linewidth]{img/reproducible-macros.png}
- \end{center}
- \end{frame}
-
-
-
- \begin{frame}{Values are the pipeline's final product}
- All the \LaTeX{} macros (processing inputs and outputs) come from
- a \alert{single file}. This file is the \alert{final product} of
- the \emph{reproduction pipeline}.
-
- \begin{center}
- \includegraphics[width=0.8\linewidth]{img/reproducible-macros-highlighted.png}
- \end{center}
- \end{frame}
-
-
- \begin{frame}{Values written during analysis}
- Various steps of the analysis pipeline write the macro values as
- soon as they are calculated internally.
-
- \begin{center}
- \includegraphics[width=0.8\linewidth]{img/reproducible-write-macro.png}
- \end{center}
- \end{frame}
-
-
- \begin{frame}{Values written during analysis}
- Various steps of the analysis pipeline write the macro values as
- soon as they are calculated internally.
-
- \begin{center}
- \includegraphics[width=0.8\linewidth]{img/reproducible-write-macro-highlight.png}
- \end{center}
- \end{frame}
-
-
- \begin{frame}{Reproducible science: Pipeline is managed through a Makefile}
- \small
- \begin{columns}
- \column{5.5cm}
-
- The whole pipeline is managed by Makefiles (example from
- \textcolor{blue}{\small\href{https://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}}):
-
- \begin{itemize}
- \setlength\itemsep{0.2cm}
- \item Unlike a script which always starts from the top, a
- Makefile \alert{starts from the end} and steps that don't
- change will be left untouched (not remade).
- \item A single \emph{rule} can \alert{manage any number of
- files}. See the examples here where \textsf{NoiseChisel} and
- \textsf{MakeCatalog} are run separately on \alert{$\sim20$
- files} (different filters/fields) with a single rule.
- \item Make can identify independent steps internally and do them
- in \alert{parallel}.
- \item Make was \alert{designed for complex problems} with
- thousands of files (all major Unix-like components), so it is
- highly evolved and efficient.
- \item Make is a very \alert{simple} and \alert{small} language,
- thus easy to learn with great and free documentation (for
- example
- \textcolor{blue}{\href{https://www.gnu.org/software/make/manual/}{GNU
- Make's manual}}, usable to learn all implementations).
- \end{itemize}
-
- \column{5.5cm}
- \includegraphics[width=\linewidth]{img/reproducible-makefile.png}
- \end{columns}
- \end{frame}
-
-
- \begin{frame}{Reproducing the result and report/paper}
- Once software dependencies are installed, the two \alert{simple}
- and \alert{familiar} commands below are enough to exactly
- reproduce the results at any time (as in
- \textcolor{blue}{\small\href{https://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}}):
-
- \begin{itemize}
- \item[] \texttt{\$ ./configure{ }{ }{ }{ }{ }{ }\# To
- define top-level local directories.}
- \item[] \texttt{\$ make{ }{ }{ }{ }{ }{ }{ }{ }{ }{ }{ }{ }{ }\# To reproduce the analysis and paper.}
- \end{itemize}
-
- \vspace{0.5cm} Enabling version control (e.g. \alert{Git}) will
- make it very easy to test different ideas while not harming the
- initial/base result (thus encouraging \alert{creativity} and
- brainstorming during the project).
-
- \vspace{0.5cm} The pipeline can also \alert{download input} data
- from online archives (databases) if not locally available (as in
- \textcolor{blue}{\small\href{https://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}}
- and
- \textcolor{blue}{\href{https://gitlab.com/makhlaghi/reproduction-pipeline-template}{template}}).
-
- \vspace{0.5cm} After publication, \alert{readers} can
- \alert{change} the input configurations and the numbers and
- figures of the reproduced paper will respectively change. This
- encourages creativity and brainstorming after the project as well
- as sharing of (the hardly gained) experiences with the whole
- community.
- \end{frame}
-
-
-
- \begin{frame}{Publication of the pipeline}
-
- A reproduction pipeline like this will have the following
- (\alert{plain text}) components:
- \begin{itemize}
- \item Makefiles.
- \item \LaTeX{} source files.
- \item Configuration files.
- \item Scripts/programming files (e.g., Python, Shell, AWK, C).
- \end{itemize}
- The \alert{volume} of the reproduction pipeline will thus be
- \alert{negligible} compared to a single figure in a paper
- (especially after compression).
-
- \vspace{1.5cm} The reproduction pipeline can be \alert{published} in
- \begin{itemize}
- \item \alert{arXiv}: uploaded with the \TeX{} source to always
- stay with the paper \\(for example
- \textcolor{blue}{\small\href{https://arxiv.org/abs/1505.01664}{arXiv:1505.01664}}). The
- file containing all macros must also be uploaded so arXiv's
- server can easily build the \LaTeX{} source.
- \item \alert{Zenodo}: Along with all the input datasets (many
- Gigabytes) and software \\(for example
- \textcolor{blue}{\small\href{https://doi.org/10.5281/zenodo.1164774}{zenodo.1164774}}) and given a unique DOI.
- \end{itemize}
-
- \end{frame}
-
-
-
- \begin{frame}
- A template/blank pipeline has been written and is ready to use,
- with implementation guidelines and practical tips and
- recommendations:
-
- \textcolor{blue}{\url{https://gitlab.com/makhlaghi/reproducible-paper}}
-
- \vspace{2.5cm}
- Please see this page for more:
-
- \textcolor{blue}{\url{http://akhlaghi.org/reproducible-science.html}}
- \end{frame}
-\end{document}