Reproducible source for Akhlaghi et al. (2020, arXiv:2006.03018)
----------------------------------------------------------------

Copyright (C) 2018-2020 Mohammad Akhlaghi\
See the end of the file for license conditions.

This is the reproducible project source for the paper titled "**Towards
Long-term and Archivable Reproducibility**", by Mohammad Akhlaghi, Raúl
Infante-Sainz, Boudewijn F. Roukema, David Valls-Gabaud and Roberto
Baena-Gallé; see [arXiv:2006.03018](https://arxiv.org/abs/2006.03018) or
[zenodo.3872247](https://doi.org/10.5281/zenodo.3872247).

To learn more about the purpose, principles and technicalities of this
reproducible paper, please see `README-hacking.md`. The "Quick start"
section below shows a minimal set of commands to clone and reproduce the
full project using Git. The next section explains those commands in more
detail. The section after that describes how to work with a tarball of the
project's source (without Git). The last section describes building the
project within a Docker container.

### Quick start (using Git, with internet access)

Run these commands to clone this project's history, enter it, configure it
(let it build and install its own software) and "make" it (let it reproduce
its analysis). If you already have the project on your system, you can
ignore the first step (cloning). In the core Maneage branch, all operations
are done in the build directory that you specify at configure time: no
root permissions are required and no other part of your filesystem is
affected.

```shell
$ git clone https://gitlab.com/makhlaghi/maneage-paper
$ cd maneage-paper
$ ./project configure
$ ./project make
```

### Building the project

This project was designed to have as few dependencies as possible and to
work without root/administrator permissions.

1. Necessary dependencies:

   1.1: Minimal software building tools like a C compiler and other very
   basic POSIX tools found on any Unix-like operating system (GNU/Linux,
   BSD, macOS, and others). All other necessary dependencies will be built
   from source (for use only within this project) by the `./project
   configure` script (next step). Note that **Git is not mandatory**: if
   you don't have Git to run the first command above, open the URL given in
   that command in your browser and download the project's source (there is
   a button to download a compressed tarball of the project). You can also
   get the project's source as a tarball from arXiv or Zenodo.

   1.2: (OPTIONAL) Tarballs of dependencies. If they are already present
   (in a directory given at configuration time), they will be used.
   Otherwise, a downloader (`wget` or `curl`) is needed to download any
   missing tarball. The necessary tarballs are also collected in the
   archived project on Zenodo (link below). Just unpack that tarball, and
   when `./project configure` asks for the "software tarball directory",
   give the address of the unpacked directory:
   https://doi.org/10.5281/zenodo.3911395

2. Configure the environment (top-level directories in particular) and
   build all the necessary software for use in the next step. It is
   recommended to set directories outside the current directory. Please
   read the description of each necessary input carefully and set the best
   value. Note that the configure script also downloads, builds and locally
   installs (only for this project, no root privileges necessary) many
   programs (project dependencies), so it may take a while to complete.

   ```shell
   $ ./project configure
   ```

3. Run the following command to reproduce all the analysis and build the
   final `paper.pdf` on `8` threads. If your CPU has a different number of
   threads, change the number (you can see the number of threads available
   to your operating system by running `./.local/bin/nproc`).

   ```shell
   $ ./project make -j8
   ```

### Building project tarball (without Git)

If the paper is also published on arXiv, it is highly likely that the
authors also uploaded/published the full project there along with the LaTeX
sources. If you have downloaded (or plan to download) this source from
arXiv, some minor extra steps are necessary, as listed below. This is
because the arXiv tarball is mainly tailored to automatic creation of the
final PDF without actually using the `./project` command. You can directly
run `latex` in this directory and the paper will be built with no analysis
(all necessary built products are already included).

#### Only building PDF using tarball (no analysis)

1. If you got the tarball from arXiv and the arXiv code for the paper is
   1234.56789, then the downloaded source will be called `1234.56789` (no
   special identification suffix). However, it is actually a `.tar.gz`
   file, so take these steps to unpack it and see its contents.

   ```shell
   $ arxiv=1234.56789
   $ mv $arxiv $arxiv.tar.gz
   $ mkdir $arxiv
   $ cd $arxiv
   $ tar xf ../$arxiv.tar.gz
   ```

2. No matter how you got the tarball, if you just want to build the PDF
   paper from the tarball, simply run the command below. Note that this
   won't install any software or do any analysis: it will just use your
   host operating system to build the PDF and assumes you already have all
   the necessary LaTeX packages.

   ```shell
   $ make              # Build PDF in tarball without doing analysis
   ```

3. If you want to re-build the figures from scratch, you need to make the
   following correction to the paper's main LaTeX source (`paper.tex`):
   uncomment (remove the starting `%` of) the line containing
   `\newcommand{\makepdf}{}`. See the comments above it for more
   information.

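The uncommenting described in step 3 can also be scripted. The following is only a minimal sketch, not something shipped with the project: the helper name `uncomment_makepdf` is hypothetical, and it assumes that the relevant line of `paper.tex` starts with `%` (optionally followed by spaces) and then `\newcommand{\makepdf}{}`. Please check your own `paper.tex` before relying on it.

```shell
# Hypothetical helper (NOT part of the project): uncomment the line
# that defines '\makepdf' in the LaTeX file given as first argument.
uncomment_makepdf () {
    file="$1"

    # Remove the leading '%' (and any spaces after it) only on the line
    # that contains '\newcommand{\makepdf}{}'. All other lines, including
    # other comments, are written out unchanged. A temporary file is used
    # instead of 'sed -i', because that option differs between GNU and
    # BSD implementations of sed.
    sed -e 's|^%[ ]*\(\\newcommand{\\makepdf}{}\)|\1|' "$file" \
        > "$file.tmp" \
        && mv "$file.tmp" "$file"
}
```

After defining the function in your shell, it would be called as `uncomment_makepdf paper.tex` from the top directory of the unpacked tarball. If the line in your `paper.tex` is formatted differently, the file is left unchanged and you will need to edit it by hand.
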
#### Building full project from tarball (custom software and analysis)

Since the tarball is mainly geared to building only the final PDF, a few
small tweaks are necessary to build the full project from scratch (download
the necessary software and data, build them, run the analysis and finally
create the final paper).

1. If you got the tarball from arXiv, it is necessary to set the executable
   flag of the `./project` script before following the standard procedure
   described at the top of this file. arXiv removes the executable flag
   from the files (for its own security).

   ```shell
   $ chmod +x project
   ```

2. Make the following changes in two of the LaTeX files so LaTeX attempts
   to build the figures from scratch (to make the tarball, it was
   configured to avoid building the figures and just use the ones that came
   with the tarball).

   - `paper.tex`: uncomment (remove the starting `%` of) the line
     containing `\newcommand{\makepdf}{}`. See the comments above it for
     more information.

   - `tex/src/preamble-pgfplots.tex`: set the `tikzsetexternalprefix`
     variable value to `tikz/`, so it looks like this:
     `\tikzsetexternalprefix{tikz/}`.

3. Remove extra files. In order to make sure arXiv can build the paper
   (resolve conflicts due to different versions of LaTeX packages), it is
   sometimes necessary to copy raw LaTeX package files into the tarball
   uploaded to arXiv. Later, we will implement a feature to automatically
   delete these extra files, but for now, the project's top directory
   should only have the following contents (where `reproduce` and `tex` are
   directories). You can safely remove any other file/directory.

   ```shell
   $ ls
   COPYING  paper.tex  project  README-hacking.md  README.md  reproduce  tex
   ```

### Building in Docker containers

Docker containers are a common way to build projects in an almost
independent filesystem and an almost independent operating system, without
the overheads of a virtual machine. They also allow using a minimal
GNU/Linux operating system for each project within proprietary operating
systems like macOS or Windows. Furthermore, they allow easy movement of a
built project from one system to another. Please note that Docker images
are large binary files (more than 1 gigabyte) and may not be usable in the
future. They are mainly good for temporary/testing phases of a project.
Hence, if you want to save and move your maneaged project as a Docker
image, be sure to commit all your project's source files and push them to
your external Git repository (you can do this within the Docker image, as
explained below).

#### Constructing the Dockerfile for Maneaged project and building it

Below is a series of recommendations on the various components of a
`Dockerfile` optimised to store the *built state of a maneaged project* as
a Docker image. Each component is accompanied by explanations. Simply copy
the code blocks under each item into a plain-text file in the same order
and implement the corrections mentioned in each step (in particular step
4). Then save the plain-text file as `Dockerfile` and run the following
command to build the Docker image. Just set a `NAME` for your project and
note that Docker commands usually need to be run as root.

```shell
docker build -t NAME ./
```

**NOTE: Internet necessary for TeXLive:** You can optionally disable the
image's internet just after downloading the necessary packages (step 2).
However, until [task 15267](https://savannah.nongnu.org/task/?15267) is
complete, the project will need internet access to download the necessary
TeXLive packages in the `./project configure` phase. TeXLive is needed to
build the final PDF. Without TeXLive, the analysis will be exactly
reproduced, LaTeX macros will be created and everything will be verified
successfully (all in the build directory). However, no PDF will be built to
visualize/combine them in one easy-to-read file.

1. **Choose the base operating system:** The first step is to select the
   operating system that will be used in the Docker image. Note that your
   choice of operating system also determines the commands of the next step
   to install core software.

   ```shell
   FROM debian:stable-slim
   ```

2. **The C/C++ compiler:** By default the "slim" versions of the operating
   systems don't contain a compiler, so you need to use the selected
   operating system's package manager to install one. It is also
   recommended to include your favorite text editor so you can modify the
   project's source files if necessary.

   ```shell
   # C and C++ compiler.
   RUN apt-get update && apt-get install -y gcc g++

   # Uncomment this to add a text editor (to modify files later).
   #RUN apt-get install -y nano
   ```

3. **Define a user:** Some core software packages will complain if you try
   to install them as the default (root) container user. Generally, it is
   also good practice to avoid being the root user. After building the
   Docker image, you can always run it as root with this command: `docker
   run -u 0 -it XXXXXXX` (where `XXXXXXX` is the image identifier). With
   the commands below we define a `maneager` user and activate it for the
   next steps.

   ```shell
   RUN useradd -ms /bin/sh maneager
   USER maneager
   WORKDIR /home/maneager
   ```

4. **Copy project files into the container:** These commands make the
   following assumptions:

   * The project's source is in the `maneaged-project/` subdirectory of the
     directory that you will run `docker build` in. The source can either
     be cloned from Git (recommended!) or unpacked from a tarball. Both are
     described above (note that arXiv's tarball needs to be corrected as
     mentioned above).

   * (OPTIONAL, with internet) By default the project's necessary software
     source tarballs will be downloaded when necessary during the
     `./project configure` phase. But if you already have the sources, it
     is better to use them and not waste network traffic (and the resulting
     carbon footprint!). Maneaged projects usually come with a
     `software-XXXX.tar.gz` file that is published on Zenodo (link above).
     If you have this file, put it in the same directory as your
     `Dockerfile` and include the relevant lines below.

   * (OPTIONAL, with internet) The project's input data. The `INPUT-FILES`
     depend on the project; please look into the project's
     `reproduce/analysis/config/INPUTS.conf` for the URLs and file names.
     Similar to the software source files, this is not mandatory: if you
     have internet, the project will download its necessary input data
     automatically in the `./project make` phase.

   ```shell
   # Make the project's build directory and copy the project source
   RUN mkdir build
   COPY --chown=maneager:maneager ./maneaged-project /home/maneager/source

   # Optional (for software)
   COPY --chown=maneager:maneager ./software-XXXX.tar.gz /home/maneager/
   RUN tar xf software-XXXX.tar.gz && mv software-XXXX software && rm software-XXXX.tar.gz

   # Optional (for data)
   RUN mkdir data
   COPY --chown=maneager:maneager ./INPUT-FILES /home/maneager/data
   ```

5. **Configure the project:** With this line, the Docker image will
   configure the project (let the project build all its necessary
   software). This will usually take about an hour on an 8-core system.

   ```shell
   RUN cd /home/maneager/source \
          && ./project configure --build-dir=/home/maneager/build \
                                 --software-dir=/home/maneager/software \
                                 --input-dir=/home/maneager/data
   ```

6. **Do the project's analysis:** You are now ready to add the instruction
   to automatically reproduce the project's analysis. The duration of this
   step and its storage/memory requirements depend strongly on the
   particular project.

   ```shell
   RUN cd /home/maneager/source && ./project make
   ```

#### Interactive tests on built container

If you later want to start a container with the built image and enter it in
interactive mode (for example for temporary tests), please run the
following command. Just replace `NAME` with the same name you specified
when building the project. You can always exit the container with the
`exit` command.

```shell
docker run -it NAME
```

#### Running your own project's shell for same analysis environment

The container's default operating system only has minimal features. To get
the same environment as your running project (with easy access to all the
software built in the project), enter the maneaged project's source
directory and start the project's own shell. For example, the project
builds Git within itself, as well as many other tools that aren't present
in the core operating system.

```shell
# Once you are in the docker container
cd source
./project shell
```

#### Preserving the state of a built container

All changes you make in interactive mode will be deleted as soon as you
exit the container. THIS IS A VERY GOOD FEATURE! In general, if you want to
make persistent changes, you should make them in the project's plain-text
source and commit them into your project's online Git repository. But you
can also preserve changes within the built container. If you want to keep
the state of your changes after your `exit`, you need to `commit` the
container (and thus save it as a Docker "image"). To do this, while the
container is still running, run these commands in another terminal:

```shell
# These two commands should be done in another terminal
docker container list

# Get 'XXXXXXX' from the first column of output above.
# Give the new image a name by replacing 'NEW-IMAGE-NAME'.
docker commit XXXXXXX NEW-IMAGE-NAME
```

### Copyright information

This file and `.file-metadata` (a binary file, used by Metastore to store
file dates when doing Git checkouts) are part of the reproducible project
mentioned above and share the same copyright notice (at the start of this
file) and license notice (below).


This project is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option)
any later version.

This project is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along
with this project. If not, see <https://www.gnu.org/licenses/>.