Diffstat (limited to 'README.md')
-rw-r--r-- README.md 630
1 files changed, 70 insertions, 560 deletions
diff --git a/README.md b/README.md
index 6e5a2ad..79106ec 100644
--- a/README.md
+++ b/README.md
@@ -292,571 +292,81 @@ light and should be very fast.
-### Building in Docker containers
-
-Docker containers are a common way to build projects in an independent
-filesystem, and an almost independent operating system. Containers thus
-allow using GNU/Linux operating systems within proprietary operating
-systems like macOS or Windows. But without the overhead and huge file size
-of virtual machines. Furthermore containers allow easy movement of built
-projects from one system to another without rebuilding. Just note that
-Docker images are large binary files (over 1 gigabyte) and may not be usable
-in the future (for example with new Docker versions not reading old
-images). Containers are thus good for temporary/testing phases of a
-project, but shouldn't be what you archive for the long term!
-
-Hence if you want to save and move your maneaged project within a Docker
-image, be sure to commit all your project's source files and push them to
-your external Git repository (you can do these within the Docker image as
-explained below). This way, you can always recreate the container with
-future technologies too. Generally, if you are developing within a
-container, it's good practice to recreate it from scratch every once in a
-while, to make sure you haven't forgotten to include parts of your work in
-your project's version-controlled source. In the sections below we also
-describe how you can use the container **only for the software
-environment** and keep your data and project source on your host.
-
-#### Dockerfile for a Maneaged project, and building a Docker image
-
-Below is a series of recommendations on the various components of a
-`Dockerfile` optimized to store the *built state of a maneaged project* as
-a Docker image. Each component is also accompanied with
-explanations. Simply copy the code blocks under each item into a plain-text
-file called `Dockerfile`, in the same order as the items. Don't forget to
-implement the suggested corrections (in particular step 4).
-
-**NOTE: Internet for TeXLive installation:** If you have the project
-software tarballs and input data (optional features described below) you
-can disable internet. In this situation, the configuration and analysis
-will be exactly reproduced, the final LaTeX macros will be created, and all
-results will be verified successfully. However, no final `paper.pdf` will
-be created to visualize/combine everything in one easy-to-read file. Until
-[task 15267](https://savannah.nongnu.org/task/?15267) is complete, we need
-internet to install TeXLive packages (using TeXLive's own package manager
-`tlmgr`) in the `./project configure` phase. This won't stop the
-configuration, and it will finish successfully (since all the analysis can
-still be reproduced). We are working on completing this task as soon as
-possible, but until then, if you want to disable internet *and* you want to
-build the final PDF, please disable internet after the configuration
-phase. Note that only the necessary TeXLive packages are installed (~350
-MB), not the full TeXLive collection!
-
- 0. **Summary:** If you are already familiar with Docker, then the full
- Dockerfile to get the project environment set up is shown here (without
- any comments or explanations, because explanations are done in the next
- items). Note that the last two `COPY` lines (to copy the directory
- containing software tarballs used by the project and the possible input
- databases) are optional because they will be downloaded if not
- available. You can also avoid copying them all, and simply mount your
- host directories within the image; we have a separate section on doing
- this below ("Only software environment in the Docker image"). Once you
- build the Docker image, your project's environment is set up and you can
- go into it to run `./project make` manually.
-
- ```shell
- FROM debian:stable-slim
- RUN apt update && apt install -y gcc g++ wget
- RUN useradd -ms /bin/sh maneager
- RUN printf '123\n123' | passwd root
- USER maneager
- WORKDIR /home/maneager
- RUN mkdir build
- RUN mkdir software
- COPY --chown=maneager:maneager ./project-source /home/maneager/source
- COPY --chown=maneager:maneager ./software-dir /home/maneager/software
- COPY --chown=maneager:maneager ./data-dir /home/maneager/data
- RUN cd /home/maneager/source \
- && ./project configure --build-dir=/home/maneager/build \
- --software-dir=/home/maneager/software \
- --input-dir=/home/maneager/data
- ```
-
- 1. **Choose the base operating system:** The first step is to select the
- operating system that will be used in the docker image. Note that your
- choice of operating system also determines the commands of the next
- step to install core software.
-
- ```shell
- FROM debian:stable-slim
- ```
-
- 2. **Maneage dependencies:** By default the "slim" versions of the
- operating systems don't contain a compiler (needed by Maneage to
- compile precise versions of all the tools). You thus need to use the
- selected operating system's package manager to install them (below is
- the command for Debian). Optionally, if you don't have the project's
- software tarballs, and want the project to download them automatically,
- you also need a downloader.
-
- ```shell
- # C and C++ compiler.
- RUN apt update && apt install -y gcc g++
-
- # Uncomment this if you don't have 'software-XXXX.tar.gz' (below).
- #RUN apt install -y wget
- ```
-
- 3. **Define a user:** Some core software packages will complain if you try
- to install them as the default (root) user. Generally, it is also good
- practice to avoid being the root user. Hence with the commands below we
- define a `maneager` user and activate it for the next steps. But just
- in case root access is necessary temporarily, with the `passwd`
- command, we are setting the root password to `123`.
-
- ```shell
- RUN useradd -ms /bin/sh maneager
- RUN printf '123\n123' | passwd root
- USER maneager
- WORKDIR /home/maneager
- ```
-
- 4. **Copy project files into the container:** these commands make the
- assumptions listed below. IMPORTANT: you can also avoid copying them
- all, and simply mount your host directories within the image; we have a
- separate section on doing this below ("Only software environment in the
- Docker image").
-
- * The project's source is in the `maneaged/` sub-directory and this
- directory is in the same directory as the `Dockerfile`. The source
- can either be cloned from Git (highly recommended!) or from a
- tarball. Both are described above (note that arXiv's tarball needs to
- be corrected as mentioned above).
-
- * (OPTIONAL) By default the project's necessary software source
- tarballs will be downloaded when necessary during the `./project
- configure` phase. But if you already have the sources, it's better to
- use them and not waste network traffic (and resulting carbon
- footprint!). Maneaged projects usually come with a
- `software-XXXX.tar.gz` file that is published on Zenodo (link above).
- If you have this file, put it in the same directory as your
- `Dockerfile` and include the relevant lines below.
-
- * (OPTIONAL) The project's input data. The `INPUT-FILES` depends on the
- project, please look into the project's
- `reproduce/analysis/config/INPUTS.conf` for the URLs and the file
- names of input data. Similar to the software source files mentioned
- above, if you don't have them, the project will attempt to download
- its necessary data automatically in the `./project make` phase.
-
- ```shell
- # Make the project's build directory and copy the project source
- RUN mkdir build
- COPY --chown=maneager:maneager ./maneaged /home/maneager/source
-
- # Optional (for software)
- COPY --chown=maneager:maneager ./software-XXXX.tar.gz /home/maneager/
- RUN tar xf software-XXXX.tar.gz && mv software-XXXX software && rm software-XXXX.tar.gz
-
- # Optional (for data)
- RUN mkdir data
- COPY --chown=maneager:maneager ./INPUT-FILES /home/maneager/data
- ```
-
- 5. **Configure the project:** With this line, the Docker image will
- configure the project (build all its necessary software). This will
- usually take about an hour on an 8-core system. You can also optionally
- avoid putting this step (and the next) in the `Dockerfile` and simply
- execute them in the Docker image in interactive mode (as explained in
- the sub-section below; in this case, don't forget to preserve the built
- container after you are done).
-
- ```shell
- # Configure project (build full software environment).
- RUN cd /home/maneager/source \
- && ./project configure --build-dir=/home/maneager/build \
- --software-dir=/home/maneager/software \
- --input-dir=/home/maneager/data
- ```
-
- 6. **Project's analysis:** With this line, the Docker image will do the
- project's analysis and produce the final `paper.pdf`. The time it takes
- for this step to finish, and the storage/memory requirements highly
- depend on the particular project.
-
- ```shell
- # Run the project's analysis
- RUN cd /home/maneager/source && ./project make
- ```
-
- 7. **Build the Docker image:** The `Dockerfile` is now ready! In the
- terminal, go to its directory and run the command below to build the
- Docker image. We recommend keeping the `Dockerfile` in **an empty
- directory** and running it from inside that directory too. This is because
- Docker considers that directory's contents to be part of the
- environment. Finally, just set a `NAME` for your project and note that
- Docker only runs as root.
-
- ```shell
- sudo su
- docker build -t NAME ./
- ```
-
-
-
-#### Interactive tests on built container
-
-If you later want to start a container with the built image and enter it in
-interactive mode (for example for temporary tests), please run the
-following command. Just replace `NAME` with the same name you specified
-when building the project. You can always exit the container with the
-`exit` command (note that all your changes will be discarded once you exit,
-see below if you want to preserve your changes after you exit).
-
-```shell
-docker run -it NAME
-```
-
-
-
-#### Running your own project's shell for same analysis environment
-
-The default operating system only has minimal features and lacks many of
-the tools you are accustomed to in your daily command-line operations. But
-your maneaged project has a very complete (for the project!) environment
-which is fully built and ready to use interactively with the commands
-below. For example the project also builds Git within itself, as well as
-many other high-level tools that are used in your project and aren't
-present in the container's operating system.
-
-```shell
-# Once you are in the docker container
-cd source
-./project shell
-```
-
-
-
-#### Preserving the state of a built container
-
-All interactive changes in a container will be deleted as soon as you exit
-it. THIS IS A VERY GOOD FEATURE IN GENERAL! If you want to make persistent
-changes, you should do it in the project's plain-text source and commit
-them into your project's online Git repository. As described in the Docker
-introduction above, we strongly recommend **not relying on a built container
-for archival purposes**.
-
-But for temporary tests it is sometimes good to preserve the state of an
-interactive container. To do this, you need to `commit` the container (and
-thus save it as a Docker "image"). While the container is still
-running, open another terminal and run these commands:
-
-```shell
-# These two commands should be done in another terminal
-docker container list
-
-# Get 'XXXXXXX' of your desired container from the first column above.
-# Give the new image a name by replacing 'NEW-IMAGE-NAME'.
-docker commit XXXXXXX NEW-IMAGE-NAME
-```
-
-
-
-#### Copying files from the Docker image to host operating system
-
-The Docker environment's file system is completely independent of your host
-operating system. One easy way to copy files to and from an open container
-is to use the `docker cp` command (very similar to the shell's `cp`
-command).
-
-```shell
-docker cp CONTAINER:/file/path/within/container /host/path/target
-```
-
-
-
-
-
-#### Only software environment in the Docker image
-
-You can set the docker image to only contain the software environment and
-keep the project source and built analysis files (data and PDF) on your
-host operating system. This enables you to keep the size of the Docker
-image to a minimum (only containing the built software environment) to
-easily move it from one computer to another. Below we'll summarize the
-steps.
-
- 1. Get your user ID with this command: `id -u`.
-
- 2. Make a new (empty) directory called `docker-tmp` temporarily (it will
- be deleted later).
-
- ```shell
- mkdir docker-tmp
- cd docker-tmp
- ```
-
- 3. Make a `Dockerfile` (within the new/empty directory) with the
- following contents. Just replace `UID` with your user ID (found in
- step 1 above). Note that we are manually setting the `maneager` (user)
- password to `123` and the root password to `456` (both should be
- repeated because they must be confirmed by `passwd`). To install other
- operating systems, just change the contents on the `FROM` line. For
- example, for CentOS 7 you can use `FROM centos:centos7`, for the
- latest CentOS, you can use `FROM centos:latest` (you may need to add
- the line `RUN yum install -y passwd` before the `RUN useradd ...`
- line).
-
- ```
- FROM debian:stable-slim
- RUN useradd -ms /bin/sh --uid UID maneager; \
- printf '123\n123' | passwd maneager; \
- printf '456\n456' | passwd root
- USER maneager
- WORKDIR /home/maneager
- RUN mkdir build; mkdir build/analysis
- ```
-
- 4. Create a Docker image based on the `Dockerfile` above. Just replace
- `MANEAGEBASE` with your desired name (this won't be your final image,
- so you can safely use a name like `maneage-base`). Note that you need
- to have root/administrator privileges when running it:
-
- ```shell
- sudo docker build -t MANEAGEBASE ./
- ```
-
- 5. You don't need the temporary directory any more (the docker image is
- saved in Docker's own location, and accessible from anywhere).
-
- ```shell
- cd ..
- rm -rf docker-tmp
- ```
-
- 6. Put the following contents into a newly created plain-text file called
- `docker-run`, while setting the mandatory variables based on your
- system. The name `docker-run` is already inside Maneage's `.gitignore`
- file, so you don't have to worry about mistakenly committing this file
- (which contains private information: directories in this computer).
-
- ```
- #!/bin/sh
- #
- # Create a Docker container from an existing image of the built
- # software environment, but with the source, data and build (analysis)
- # directories directly within the host file system. This script should
- # be run in the top project source directory (that has 'README.md' and
- # 'paper.tex'). If not, replace the '$(pwd)' part with the project
- # source directory.
-
- # MANDATORY: Name of Docker container
- docker_name=MANEAGEBASE
-
- # MANDATORY: Location of "build" directory on this system (to host the
- # 'analysis' sub-directory for output data products and possibly others).
- build_dir=/PATH/TO/THIS/PROJECT/S/BUILD/DIR
-
- # OPTIONAL: Location of project's input data in this system. If not
- # present, a 'data' directory under the build directory will be created.
- data_dir=/PATH/TO/THIS/PROJECT/S/DATA/DIR
-
- # OPTIONAL: Location of software tarballs to use in building Maneage's
- # internal software environment.
- software_dir=/PATH/TO/SOFTWARE/TARBALL/DIR
-
-
-
-
-
- # Internal processing
- # --------------------
- #
- # Sanity check: Make sure that the build directory actually exists.
- if ! [ -d "$build_dir" ]; then
- echo "ERROR: '$build_dir' doesn't exist"; exit 1;
- fi
-
- # If the host operating system has '/dev/shm', then give Docker access
- # to it also for improved speed in some scenarios (like configuration).
- if [ -d /dev/shm ]; then shmopt="-v /dev/shm:/dev/shm";
- else shmopt=""; fi
-
- # If the mounted 'analysis' and 'data' directories don't exist, then
- # create them (otherwise Docker will create them as 'root' before
- # creating the container, and we won't have permission to write in them).
- analysis_dir="$build_dir"/analysis
- if ! [ -d "$analysis_dir" ]; then mkdir "$analysis_dir"; fi
-
- # If the data or software directories weren't given, put them in the
- # build directory (they will remain empty, but this helps in
- # simplifying the mounting command!).
- if [ x"$data_dir" = x ]; then
- data_dir="$build_dir"/data
- if ! [ -d "$data_dir" ]; then mkdir "$data_dir"; fi
- fi
- if [ x"$software_dir" = x ]; then
- software_dir="$build_dir"/tarballs-software
- if ! [ -d "$software_dir" ]; then mkdir "$software_dir"; fi
- fi
-
- # Run the Docker image while setting up the directories.
- sudo docker run -v "$software_dir":/home/maneager/tarballs-software \
- -v "$analysis_dir":/home/maneager/build/analysis \
- -v "$data_dir":/home/maneager/data \
- -v "$(pwd)":/home/maneager/source \
- $shmopt -it $docker_name
- ```
-
- 7. Make the `docker-run` script executable.
-
- ```shell
- chmod +x docker-run
- ```
-
- 8. Start the Docker daemon (root permissions required). If the operating
- system uses systemd you can use the command below. If you want the
- Docker daemon to be available after a reboot also (so you don't have
- to restart it after turning off your computer), run this command again
- but replacing `start` with `enable`.
-
- ```shell
- systemctl start docker
- ```
-
- 9. You can now start the Docker image by executing your newly added
- script like below (it will ask for your root password). You will
- notice that you are in the Docker container with the changed prompt.
-
- ```shell
- ./docker-run
- ```
-
- 10. You are now within the container. First, we'll add the GNU C and C++
- compilers (which are necessary to build our own programs in Maneage)
- and the GNU Wget downloader (which may be necessary if you don't have
- a core software's tarball already). Maneage will build pre-defined
- versions of both and will use them. But for the very first packages,
- they are necessary. In the process, by setting the `PS1` environment
- variable, we'll define a color-coding for the interactive shell prompt
- (red for root and purple for the user). If you build another operating
- system, replace the `apt` commands accordingly (for example on CentOS,
- you don't need the `apt update` line and you should use `yum install
- -y gcc gcc-c++ wget glibc-static` to install the three basic
- dependencies).
-
- ```shell
- su
- echo 'export PS1="[\[\033[01;31m\]\u@\h \W\[\033[32m\]\[\033[00m\]]# "' >> ~/.bashrc
- source ~/.bashrc
- apt update
- apt install -y gcc g++ wget
- exit
- echo 'export PS1="[\[\033[01;35m\]\u@\h \W\[\033[32m\]\[\033[00m\]]$ "' >> ~/.bashrc
- source ~/.bashrc
- ```
-
- 11. Now that the compiler is ready, we can start Maneage's
- configuration. So let's go into the project source directory and run
- these commands to build the software environment.
-
- ```shell
- cd source
- ./project configure --input-dir=/home/maneager/data \
- --build-dir=/home/maneager/build \
- --software-dir=/home/maneager/tarballs-software
- ```
-
- 12. After the configuration finishes successfully, it will say so. It will
- then ask you to run `./project make`. **But don't do that
- yet**. Keep this Docker container open and don't exit the container or
- terminal. Open a new terminal, and follow the steps described in the
- sub-section above to preserve (or "commit") the built container as a
- Docker image. Let's assume you call it `MY-PROJECT-ENV`. After the new
- image is made, you should be able to see the new image in the list of
- images with this command (in yet another terminal):
-
- ```shell
- docker image list # In the other terminal.
- ```
-
- 13. Now that you have safely "committed" your current Docker container
- into a separate Docker image, you can **exit the container** safely
- with the `exit` command. Don't worry, you won't lose the built
- software environment: it is all now saved separately within the Docker
- image.
-
- 14. Re-open your `docker-run` script and change `MANEAGEBASE` to
- `MY-PROJECT-ENV` (or any other name you set for the environment you
- committed above).
-
- ```shell
- emacs docker-run
- ```
-
- 15. That is it! You can now always easily enter your container (only for
- the software environment) with the command below. Within the
- container, any file you save/edit in the `source` directory of the
- docker container is the same file on your host OS and any file you
- build in your `build/analysis` directory (within the Maneage'd
- project) will be on your host OS. You can even use your container's
- Git to store the history of your project in your host OS. See the next
- step in case you want to move your built software environment to
- another computer.
-
- ```shell
- ./docker-run
- ```
-
- 16. In case you want to store the image as a single file as backup or to
- move to another computer, you can run the commands below. They will
- produce a single `my-project-env.tar.gz` file.
-
- ```shell
- docker save -o my-project-env.tar MY-PROJECT-ENV
- gzip --best my-project-env.tar
- ```
-
- 17. To load the tarball above into a clean docker environment (for example
- on another system) copy the `my-project-env.tar.gz` file there and run
- the command below. You can then create the `docker-run` script for
- that system and run it to enter. Just don't forget that your
- `analysis_dir` directory will be empty on the new/clean system, so you
- should first run the same `./project configure ...` command above in
- the docker image so it connects the environment to your source. Don't
- worry, it won't build any software and should finish in a second or
- two. Afterwards, you can safely run `./project make` and continue
- working like you did on the old system.
-
- ```shell
- docker load --input my-project-env.tar.gz
- ```
-
-
-
-
-
-#### Deleting all Docker images
-
-After doing your tests/work, you may no longer need the multi-gigabyte
-image files, so it's best to just delete them. To do this, run the two
-commands below to first stop all running containers and then to delete all
-the images:
-
-```shell
-docker ps -a -q | xargs docker rm
-docker images -a -q | xargs docker rmi -f
-```
-
-
-
-
-
-### Copyright information
-
-This file and `.file-metadata` (a binary file, used by Metastore to store
-file dates when doing Git checkouts) are part of the reproducible project
-mentioned above and share the same copyright notice (at the start of this
-file) and license notice (below).
-
-This project is free software: you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the Free
+### Building in containers
+
+Containers are a common way to build projects in an independent filesystem
+and an almost independent operating system without the overhead (in size
+and speed) of a virtual machine. As a result, containers allow easy
+movement of built projects from one system to another without
+rebuilding. However, they are still large binary files (over 1 gigabyte) and
+may not be usable in the future (for example with new software versions not
+reading old images or old/new kernel issues). Containers are thus good for
+execution/testing phases of a project, but shouldn't be what you archive
+for the long term!
+
+Therefore, if you want to save and move your maneaged project within
+containers, be sure to commit all your project's
+source files and push them to your external Git repository (you can do
+these within the container as explained below). This way, you can always
+recreate the container with future technologies too. Generally, if you are
+developing within a container, it's good practice to recreate it from
+scratch every once in a while, to make sure you haven't forgotten to include
+parts of your work in your project's version-controlled source. In the
+sections below we also describe how you can use the container **only for
+the software environment** and keep your data and project source on your
+host.
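The in-container commit workflow can be sketched as below (a self-contained demonstration in a throw-away scratch repository, so it is safe to run anywhere; the identity, file name, and commit message are all placeholders, and in your real project you would run the `git` commands inside the container's `source` directory and finish with a `git push` to your external repository):

```shell
# Self-contained sketch of the commit workflow in a scratch repository.
# In a real Maneage'd project, run the 'git' commands in the project
# source directory and finish with 'git push' to your external remote.
tmpdir=$(mktemp -d)
cd "$tmpdir"
git init -q .
git config user.name  "Maneager"               # placeholder identity
git config user.email "maneager@example.org"   # placeholder identity

echo "minor analysis tweak" > paper.tex        # placeholder change
git status --short            # check for files you forgot to add
git add paper.tex
git commit -q -m "Describe your change here"

last_msg=$(git log -1 --pretty=%s)             # latest commit subject
echo "$last_msg"
cd / && rm -rf "$tmpdir"
```

The important habit is the `git status` check before committing: it is what catches files you forgot to put under version control before the container is discarded.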
+
+If you have the necessary software tarballs and input data (optional
+features described below) you can disable internet. In this situation, the
+configuration and analysis will be exactly reproduced, the final LaTeX
+macros will be created, and all results will be verified
+successfully. However, no final `paper.pdf` will be created to
+visualize/combine everything in one easy-to-read file. Until [task
+15267](https://savannah.nongnu.org/task/?15267) is complete, Maneage only
+needs internet to install TeXLive packages (using TeXLive's own package
+manager `tlmgr`) in the `./project configure` phase. This won't stop the
+configuration (since all the analysis can still be reproduced). We are
+working on completing this task as soon as possible, but until then, if you
+want to disable internet *and* you want to build the final PDF, please
+disable internet after the configuration phase. Note that only the
+necessary TeXLive packages are installed (~350 MB), not the full TeXLive
+collection!
+
+The container technologies that Maneage has been tested with, and for which
+documentation exists in this project (within the
+`reproduce/software/containers` directory), are listed below. See the
+respective `README-*.md` file in that directory for the details:
+
+ - [Apptainer](https://apptainer.org): useful in high performance
+ computing (HPC) facilities (where you do not have root
+ permissions). Apptainer is fully free and open source software.
+ Apptainer containers can only be created and used on GNU/Linux
+ operating systems, but are stored as files (easy to manage).
+
+ - [Docker](https://www.docker.com): requires root access, but useful on
+ virtual private servers (VPSs). Docker images are stored and managed by
+ a root-level daemon, so you can only manage them through its own
+ interface. A Docker container built on a GNU/Linux host can also be
+ executed on Windows or macOS. However, while the Docker engine and its
+ command-line interface on GNU/Linux are free and open source software,
+ its desktop application (with a GUI and components necessary for
+ Windows or macOS) is not (requires payment for large companies).
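As a rough illustration of this difference, the two tools are driven as below (the image names here are hypothetical placeholders, and the base-image choice is only an example; see the respective `README-*.md` files for the actual Maneage'd usage):

```shell
# Docker: images live in the root-level daemon's own storage, so
# building and running generally need root (or 'docker' group) access.
sudo docker build -t my-project ./
sudo docker run -it my-project

# Apptainer: the image is a plain '.sif' file that you can copy or
# archive like any other file; no daemon, and no root access to run it.
apptainer build my-project.sif docker://debian:stable-slim
apptainer shell my-project.sif
```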
+
+
+
+
+
+## Copyright information
+
+This file is free software: you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option)
any later version.
-This project is distributed in the hope that it will be useful, but WITHOUT
+This file is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
You should have received a copy of the GNU General Public License along
-with this project. If not, see <https://www.gnu.org/licenses/>.
+with this file. If not, see <https://www.gnu.org/licenses/>.