Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 630 |
1 file changed, 70 insertions, 560 deletions
@@ -292,571 +292,81 @@ light and should be very fast. -### Building in Docker containers - -Docker containers are a common way to build projects in an independent -filesystem, and an almost independent operating system. Containers thus -allow using GNU/Linux operating systems within proprietary operating -systems like macOS or Windows. But without the overhead and huge file size -of virtual machines. Furthermore containers allow easy movement of built -projects from one system to another without rebuilding. Just note that -Docker images are large binary files (+1 Gigabytes) and may not be usable -in the future (for example with new Docker versions not reading old -images). Containers are thus good for temporary/testing phases of a -project, but shouldn't be what you archive for the long term! - -Hence if you want to save and move your maneaged project within a Docker -image, be sure to commit all your project's source files and push them to -your external Git repository (you can do these within the Docker image as -explained below). This way, you can always recreate the container with -future technologies too. Generally, if you are developing within a -container, its good practice to recreate it from scratch every once in a -while, to make sure you haven't forgot to include parts of your work in -your project's version-controlled source. In the sections below we also -describe how you can use the container **only for the software -environment** and keep your data and project source on your host. - -#### Dockerfile for a Maneaged project, and building a Docker image - -Below is a series of recommendations on the various components of a -`Dockerfile` optimized to store the *built state of a maneaged project* as -a Docker image. Each component is also accompanied with -explanations. Simply copy the code blocks under each item into a plain-text -file called `Dockerfile`, in the same order of the items. Don't forget to -implement the suggested corrections (in particular step 4). - -**NOTE: Internet for TeXLive installation:** If you have the project -software tarballs and input data (optional features described below) you -can disable internet. In this situation, the configuration and analysis -will be exactly reproduced, the final LaTeX macros will be created, and all -results will be verified successfully. However, no final `paper.pdf` will -be created to visualize/combine everything in one easy-to-read file. Until -[task 15267](https://savannah.nongnu.org/task/?15267) is complete, we need -internet to install TeXLive packages (using TeXLive's own package manager -`tlmgr`) in the `./project configure` phase. This won't stop the -configuration, and it will finish successfully (since all the analysis can -still be reproduced). We are working on completing this task as soon as -possible, but until then, if you want to disable internet *and* you want to -build the final PDF, please disable internet after the configuration -phase. Note that only the necessary TeXLive packages are installed (~350 -MB), not the full TeXLive collection! - - 0. **Summary:** If you are already familiar with Docker, then the full - Dockerfile to get the project environment setup is shown here (without - any comments or explanations, because explanations are done in the next - items). Note that the last two `COPY` lines (to copy the directory - containing software tarballs used by the project and the possible input - databases) are optional because they will be downloaded if not - available. 
You can also avoid copying over all, and simply mount your - host directories within the image, we have a separate section on doing - this below ("Only software environment in the Docker image"). Once you - build the Docker image, your project's environment is setup and you can - go into it to run `./project make` manually. - - ```shell - FROM debian:stable-slim - RUN apt update && apt install -y gcc g++ wget - RUN useradd -ms /bin/sh maneager - RUN printf '123\n123' | passwd root - USER maneager - WORKDIR /home/maneager - RUN mkdir build - RUN mkdir software - COPY --chown=maneager:maneager ./project-source /home/maneager/source - COPY --chown=maneager:maneager ./software-dir /home/maneager/software - COPY --chown=maneager:maneager ./data-dir /home/maneager/data - RUN cd /home/maneager/source \ - && ./project configure --build-dir=/home/maneager/build \ - --software-dir=/home/maneager/software \ - --input-dir=/home/maneager/data - ``` - - 1. **Choose the base operating system:** The first step is to select the - operating system that will be used in the docker image. Note that your - choice of operating system also determines the commands of the next - step to install core software. - - ```shell - FROM debian:stable-slim - ``` - - 2. **Maneage dependencies:** By default the "slim" versions of the - operating systems don't contain a compiler (needed by Maneage to - compile precise versions of all the tools). You thus need to use the - selected operating system's package manager to import them (below is - the command for Debian). Optionally, if you don't have the project's - software tarballs, and want the project to download them automatically, - you also need a downloader. - - ```shell - # C and C++ compiler. - RUN apt update && apt install -y gcc g++ - - # Uncomment this if you don't have 'software-XXXX.tar.gz' (below). - #RUN apt install -y wget - ``` - - 3. **Define a user:** Some core software packages will complain if you try - to install them as the default (root) user. Generally, it is also good - practice to avoid being the root user. Hence with the commands below we - define a `maneager` user and activate it for the next steps. But just - in case root access is necessary temporarily, with the `passwd` - command, we are setting the root password to `123`. - - ```shell - RUN useradd -ms /bin/sh maneager - RUN printf '123\n123' | passwd root - USER maneager - WORKDIR /home/maneager - ``` - - 4. **Copy project files into the container:** these commands make the - assumptions listed below. IMPORTANT: you can also avoid copying over - all, and simply mount your host directories within the image, we have a - separate section on doing this below ("Only software environment in the - Docker image"). - - * The project's source is in the `maneaged/` sub-directory and this - directory is in the same directory as the `Dockerfile`. The source - can either be from cloned from Git (highly recommended!) or from a - tarball. Both are described above (note that arXiv's tarball needs to - be corrected as mentioned above). - - * (OPTIONAL) By default the project's necessary software source - tarballs will be downloaded when necessary during the `./project - configure` phase. But if you already have the sources, its better to - use them and not waste network traffic (and resulting carbon - footprint!). Maneaged projects usually come with a - `software-XXXX.tar.gz` file that is published on Zenodo (link above). 
- If you have this file, put it in the same directory as your - `Dockerfile` and include the relevant lines below. - - * (OPTIONAL) The project's input data. The `INPUT-FILES` depends on the - project, please look into the project's - `reproduce/analysis/config/INPUTS.conf` for the URLs and the file - names of input data. Similar to the software source files mentioned - above, if you don't have them, the project will attempt to download - its necessary data automatically in the `./project make` phase. - - ```shell - # Make the project's build directory and copy the project source - RUN mkdir build - COPY --chown=maneager:maneager ./maneaged /home/maneager/source - - # Optional (for software) - COPY --chown=maneager:maneager ./software-XXXX.tar.gz /home/maneager/ - RUN tar xf software-XXXX.tar.gz && mv software-XXXX software && rm software-XXXX.tar.gz - - # Optional (for data) - RUN mkdir data - COPY --chown=maneager:maneager ./INPUT-FILES /home/maneager/data - ``` - - 5. **Configure the project:** With this line, the Docker image will - configure the project (build all its necessary software). This will - usually take about an hour on an 8-core system. You can also optionally - avoid putting this step (and the next) in the `Dockerfile` and simply - execute them in the Docker image in interactive mode (as explained in - the sub-section below, in this case don't forget to preserve the build - container after you are done). - - ```shell - # Configure project (build full software environment). - RUN cd /home/maneager/source \ - && ./project configure --build-dir=/home/maneager/build \ - --software-dir=/home/maneager/software \ - --input-dir=/home/maneager/data - ``` - - 6. **Project's analysis:** With this line, the Docker image will do the - project's analysis and produce the final `paper.pdf`. The time it takes - for this step to finish, and the storage/memory requirements highly - depend on the particular project. - - ```shell - # Run the project's analysis - RUN cd /home/maneager/source && ./project make - ``` - - 7. **Build the Docker image:** The `Dockerfile` is now ready! In the - terminal, go to its directory and run the command below to build the - Docker image. We recommend to keep the `Dockerfile` in **an empty - directory** and run it from inside that directory too. This is because - Docker considers that directories contents to be part of the - environment. Finally, just set a `NAME` for your project and note that - Docker only runs as root. - - ```shell - sudo su - docker build -t NAME ./ - ``` - - - -#### Interactive tests on built container - -If you later want to start a container with the built image and enter it in -interactive mode (for example for temporary tests), please run the -following command. Just replace `NAME` with the same name you specified -when building the project. You can always exit the container with the -`exit` command (note that all your changes will be discarded once you exit, -see below if you want to preserve your changes after you exit). - -```shell -docker run -it NAME -``` - - - -#### Running your own project's shell for same analysis environment - -The default operating system only has minimal features: not having many of -the tools you are accustomed to in your daily command-line operations. But -your maneaged project has a very complete (for the project!) environment -which is fully built and ready to use interactively with the commands -below. 
For example the project also builds Git within itself, as well as -many other high-level tools that are used in your project and aren't -present in the container's operating system. - -```shell -# Once you are in the docker container -cd source -./project shell -``` - - - -#### Preserving the state of a built container - -All interactive changes in a container will be deleted as soon as you exit -it. THIS IS A VERY GOOD FEATURE IN GENERAL! If you want to make persistent -changes, you should do it in the project's plain-text source and commit -them into your project's online Git repository. As described in the Docker -introduction above, we strongly recommend to **not rely on a built container -for archival purposes**. - -But for temporary tests it is sometimes good to preserve the state of an -interactive container. To do this, you need to `commit` the container (and -thus save it as a Docker "image"). To do this, while the container is still -running, open another terminal and run these commands: - -```shell -# These two commands should be done in another terminal -docker container list - -# Get 'XXXXXXX' of your desired container from the first column above. -# Give the new image a name by replacing 'NEW-IMAGE-NAME'. -docker commit XXXXXXX NEW-IMAGE-NAME -``` - - - -#### Copying files from the Docker image to host operating system - -The Docker environment's file system is completely indepenent of your host -operating system. One easy way to copy files to and from an open container -is to use the `docker cp` command (very similar to the shell's `cp` -command). - -```shell -docker cp CONTAINER:/file/path/within/container /host/path/target -``` - - - - - -#### Only software environment in the Docker image - -You can set the docker image to only contain the software environment and -keep the project source and built analysis files (data and PDF) on your -host operating system. This enables you to keep the size of the Docker -image to a minimum (only containing the built software environment) to -easily move it from one computer to another. Below we'll summarize the -steps. - - 1. Get your user ID with this command: `id -u`. - - 2. Make a new (empty) directory called `docker` temporarily (will be - deleted later). - - ```shell - mkdir docker-tmp - cd docker-tmp - ``` - - 3. Make a `Dockerfile` (within the new/empty directory) with the - following contents. Just replace `UID` with your user ID (found in - step 1 above). Note that we are manually setting the `maneager` (user) - password to `123` and the root password to '456' (both should be - repeated because they must be confirmed by `passwd`). To install other - operating systems, just change the contents on the `FROM` line. For - example, for CentOS 7 you can use `FROM centos:centos7`, for the - latest CentOS, you can use `FROM centos:latest` (you may need to add - this line `RUN yum install -y passwd` before the `RUN useradd ...` - line.). - - ``` - FROM debian:stable-slim - RUN useradd -ms /bin/sh --uid UID maneager; \ - printf '123\n123' | passwd maneager; \ - printf '456\n456' | passwd root - USER maneager - WORKDIR /home/maneager - RUN mkdir build; mkdir build/analysis - ``` - - 4. Create a Docker image based on the `Dockerfile` above. Just replace - `MANEAGEBASE` with your desired name (this won't be your final image, - so you can safely use a name like `maneage-base`). Note that you need - to have root/administrator previlages when running it, so - - ```shell - sudo docker build -t MANEAGEBASE ./ - ``` - - 5. 
You don't need the temporary directory any more (the docker image is - saved in Docker's own location, and accessible from anywhere). - - ```shell - cd .. - rm -rf docker-tmp - ``` - - 6. Put the following contents into a newly created plain-text file called - `docker-run`, while setting the mandatory variables based on your - system. The name `docker-run` is already inside Maneage's `.gitignore` - file, so you don't have to worry about mistakenly commiting this file - (which contains private information: directories in this computer). - - ``` - #!/bin/sh - # - # Create a Docker container from an existing image of the built - # software environment, but with the source, data and build (analysis) - # directories directly within the host file system. This script should - # be run in the top project source directory (that has 'README.md' and - # 'paper.tex'). If not, replace the '$(pwd)' part with the project - # source directory. - - # MANDATORY: Name of Docker container - docker_name=MANEAGEBASE - - # MANDATORY: Location of "build" directory on this system (to host the - # 'analysis' sub-directory for output data products and possibly others). - build_dir=/PATH/TO/THIS/PROJECT/S/BUILD/DIR - - # OPTIONAL: Location of project's input data in this system. If not - # present, a 'data' directory under the build directory will be created. - data_dir=/PATH/TO/THIS/PROJECT/S/DATA/DIR - - # OPTIONAL: Location of software tarballs to use in building Maneage's - # internal software environment. - software_dir=/PATH/TO/SOFTWARE/TARBALL/DIR - - - - - - # Internal proceessing - # -------------------- - # - # Sanity check: Make sure that the build directory actually exists. - if ! [ -d $build_dir ]; then - echo "ERROR: '$build_dir' doesn't exist"; exit 1; - fi - - # If the host operating system has '/dev/shm', then give Docker access - # to it also for improved speed in some scenarios (like configuration). - if [ -d /dev/shm ]; then shmopt="-v /dev/shm:/dev/shm"; - else shmopt=""; fi - - # If the 'analysis' and 'data' directories (that are mounted), don't exist, - # then create them (otherwise Docker will create them as 'root' before - # creating the container, and we won't have permission to write in them. - analysis_dir="$build_dir"/analysis - if ! [ -d $analysis_dir ]; then mkdir $analysis_dir; fi - - # If the data or software directories don't exist, put them in the build - # directory (they will remain empty, but this helps in simplifiying the - # mounting command!). - if ! [ x$data_dir = x ]; then - data_dir="$build_dir"/data - if ! [ -d $data_dir ]; then mkdir $data_dir; fi - fi - if ! [ x$software_dir = x ]; then - software_dir="$build_dir"/tarballs-software - if ! [ -d $software_dir ]; then mkdir $software_dir; fi - fi - - # Run the Docker image while setting up the directories. - sudo docker run -v "$software_dir":/home/maneager/tarballs-software \ - -v "$analysis_dir":/home/maneager/build/analysis \ - -v "$data_dir":/home/maneager/data \ - -v "$(pwd)":/home/maneager/source \ - $shmopt -it $docker_name - ``` - - 7. Make the `docker-run` script executable. - - ```shell - chmod +x docker-run - ``` - - 8. Start the Docker daemon (root permissions required). If the operating - system uses systemd you can use the command below. If you want the - Docker daemon to be available after a reboot also (so you don't have - to restart it after turning off your computer), run this command again - but replacing `start` with `enable`. - - ```shell - systemctl start docker - ``` - - 9. 
You can now start the Docker image by executing your newly added - script like below (it will ask for your root password). You will - notice that you are in the Docker container with the changed prompt. - - ```shell - ./docker-run - ``` - - 10. You are now within the container. First, we'll add the GNU C and C++ - compilers (which are necessary to build our own programs in Maneage) - and the GNU WGet downloader (which may be necessary if you don't have - a core software's tarball already). Maneage will build pre-defined - versions of both and will use them. But for the very first packages, - they are necessary. In the process, by setting the `PS1` environment - variable, we'll define a color-coding for the interactive shell prompt - (red for root and purple for the user). If you build another operating - system, replace the `apt` commands accordingly (for example on CentOS, - you don't need the `apt update` line and you should use `yum install - -y gcc gcc-c++ wget glibc-static` to install the three basic - dependencies). - - ```shell - su - echo 'export PS1="[\[\033[01;31m\]\u@\h \W\[\033[32m\]\[\033[00m\]]# "' >> ~/.bashrc - source ~/.bashrc - apt update - apt install -y gcc g++ wget - exit - echo 'export PS1="[\[\033[01;35m\]\u@\h \W\[\033[32m\]\[\033[00m\]]$ "' >> ~/.bashrc - source ~/.bashrc - ``` - - 11. Now that the compiler is ready, we can start Maneage's - configuration. So let's go into the project source directory and run - these commands to build the software environment. - - ```shell - cd source - ./project configure --input-dir=/home/maneager/data \ - --build-dir=/home/maneager/build \ - --software-dir=/home/maneager/tarballs-software - ``` - - 12. After the configuration finishes successfully, it will say so. It will - then ask you to run `./project make`. **But don't do that - yet**. Keep this Docker container open and don't exit the container or - terminal. Open a new terminal, and follow the steps described in the - sub-section above to preserve (or "commit") the built container as a - Docker image. Let's assume you call it `MY-PROJECT-ENV`. After the new - image is made, you should be able to see the new image in the list of - images with this command (in yet another terminal): - - ```shell - docker image list # In the other terminal. - ``` - - 13. Now that you have safely "committed" your current Docker container - into a separate Docker image, you can **exit the container** safely - with the `exit` command. Don't worry, you won't loose the built - software environment: it is all now saved separately within the Docker - image. - - 14. Re-open your `docker-run` script and change `MANEAGEBASE` to - `MY-PROJECT-ENV` (or any other name you set for the environment you - committed above). - - ```shell - emacs docker-run - ``` - - 15. That is it! You can now always easily enter your container (only for - the software environemnt) with the command below. Within the - container, any file you save/edit in the `source` directory of the - docker container is the same file on your host OS and any file you - build in your `build/analysis` directory (within the Maneage'd - project) will be on your host OS. You can even use your container's - Git to store the history of your project in your host OS. See the next - step in case you want to move your built software environment to - another computer. - - ```shell - ./docker-run - ``` - - 16. In case you want to store the image as a single file as backup or to - move to another computer, you can run the commands below. 
     They will produce a single `my-project-env.tar.gz` file.

     ```shell
     docker save -o my-project-env.tar MY-PROJECT-ENV
     gzip --best my-project-env.tar
     ```

 17. To load the tarball above into a clean Docker environment (for example
     on another system), copy the `my-project-env.tar.gz` file there and
     run the command below. You can then create the `docker-run` script for
     that system and run it to enter. Just don't forget that your
     `analysis_dir` directory will be empty on the new/clean system, so you
     should first run the same `./project configure ...` command above in
     the Docker image so it connects the environment to your source. Don't
     worry, it won't build any software and should finish in a second or
     two. Afterwards, you can safely run `./project make` and continue
     working like you did on the old system.

     ```shell
     docker load --input my-project-env.tar.gz
     ```

#### Deleting all Docker images

After doing your tests/work, you may no longer need the multi-gigabyte
image files, so it's best to just delete them. To do this, run the two
commands below to first remove all containers and then delete all the
images:

```shell
docker ps -a -q | xargs docker rm
docker images -a -q | xargs docker rmi -f
```

### Copyright information

This file and `.file-metadata` (a binary file, used by Metastore to store
file dates when doing Git checkouts) are part of the reproducible project
mentioned above and share the same copyright notice (at the start of this
file) and license notice (below).

### Building in containers

Containers are a common way to build projects in an independent filesystem
and an almost independent operating system, without the overhead (in size
and speed) of a virtual machine. As a result, containers allow easy
movement of built projects from one system to another without rebuilding.
However, they are still large binary files (over 1 GB) and may not be
usable in the future (for example if new software versions cannot read old
images, or due to old/new kernel issues). Containers are thus good for the
execution/testing phases of a project, but shouldn't be what you archive
for the long term!

It is therefore very important that, if you want to save and move your
maneaged project within containers, you commit all your project's source
files and push them to your external Git repository (you can do this
within the container as explained below). This way, you can always
recreate the container with future technologies too. Generally, if you are
developing within a container, it's good practice to recreate it from
scratch every once in a while, to make sure you haven't forgotten to
include parts of your work in your project's version-controlled source. In
the sections below we also describe how you can use the container **only
for the software environment** and keep your data and project source on
your host.

If you have the necessary software tarballs and input data (optional
features described below) you can disable internet access. In this
situation, the configuration and analysis will be exactly reproduced, the
final LaTeX macros will be created, and all results will be verified
successfully. However, no final `paper.pdf` will be created to
visualize/combine everything in one easy-to-read file. Until [task
15267](https://savannah.nongnu.org/task/?15267) is complete, Maneage only
needs internet to install TeXLive packages (using TeXLive's own package
manager `tlmgr`) in the `./project configure` phase. This won't stop the
configuration (since all the analysis can still be reproduced). We are
working on completing this task as soon as possible, but until then, if
you want to disable internet *and* you want to build the final PDF, please
disable internet after the configuration phase. Note that only the
necessary TeXLive packages are installed (~350 MB), not the full TeXLive
collection!

The container technologies that Maneage has been tested on, and for which
documentation exists in this project (within the
`reproduce/software/containers` directory), are listed below. See the
respective `README-*.md` file in that directory for the details:

  - [Apptainer](https://apptainer.org): useful in high performance
    computing (HPC) facilities (where you do not have root
    permissions). Apptainer is fully free and open source software.
    Apptainer containers can only be created and used on GNU/Linux
    operating systems, but are stored as files (easy to manage).

  - [Docker](https://www.docker.com): requires root access, but is useful
    on virtual private servers (VPSs). Docker images are stored and
    managed by a root-level daemon, so you can only manage them through
    its own interface. A Docker container built on a GNU/Linux host can
    also be executed on Windows or macOS. However, while the Docker engine
    and its command-line interface on GNU/Linux are free and open source
    software, its desktop application (with a GUI and components necessary
    for Windows or macOS) is not (it requires payment for large
    companies).
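For a rough sense of how these are used, the sketch below shows one typical
way of entering a Maneage'd project's environment with either technology.
It is only an illustration: the file names (`maneage.def`, `maneage.sif`),
the image `NAME` and the host paths are placeholders, not names defined by
this project. The `README-*.md` files in `reproduce/software/containers`
are the authoritative recipes.

```shell
# Apptainer: build an image from a definition file, then open a shell
# inside it (no root needed to run it), binding the host's project
# source and build directories into the container.
apptainer build maneage.sif maneage.def
apptainer shell --bind "$(pwd)":/home/maneager/source \
                --bind /path/to/build:/home/maneager/build \
                maneage.sif

# Docker: build an image from a 'Dockerfile' in the current (empty)
# directory and start an interactive container with the same host
# directories mounted ('NAME' is any name you choose).
sudo docker build -t NAME ./
sudo docker run -v "$(pwd)":/home/maneager/source \
                -v /path/to/build:/home/maneager/build \
                -it NAME
```

In both cases, once inside the container you can go into the mounted
source directory and run `./project configure` and `./project make` as
usual.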
## Copyright information

This file is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option)
any later version.

This file is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along
with this file. If not, see <https://www.gnu.org/licenses/>.