From a1f8947ab7784af4b7e66c617ce19a8bdd9c99ed Mon Sep 17 00:00:00 2001
From: Giacomo Lorenzetti
Date: Thu, 27 Feb 2025 17:49:28 +0100
Subject: IMPORTANT: Apptainer and Docker containers, minor restructuring

Summary: it is necessary to re-configure your project after this commit
(just run './project configure -e'; there is no need to delete
'build/software' and re-build the software). See the "Affected files"
item below.

Until now, we only had a relatively long set of manual instructions for
building Maneage within Docker in the top-level README. This was hard to
automate, forcing Maneage users to write custom commands based on the
instructions and maintain those scripts outside of Maneage. As a result,
experience could not be shared between projects (or at most in the README
file!).

With this commit, a new 'reproduce/software/containers' directory has been
created within Maneage that contains two scripts (with a unified
interface) greatly simplifying the building of the project's software
environment within a container (one script for Apptainer and one for
Docker). Two READMEs have been added (one for each container technology)
to help with first-time usage. Also, the old checklist within the main
README has been replaced with a short introduction on containers that
points interested readers to the custom README of each container
technology.

Since we wanted the containers to be read-only after build, we needed to
fully decouple the 'build/software' and 'build/analysis' directories, such
that './project configure' only writes to the former and './project make'
only writes to the latter. The files and directories listed under
"Affected files" below are the cases where both project phases were
writing to the 'build/software' and 'build/analysis' directories.

Affected files: 'preparation-done.mk' and 'lockdir', which were previously
in the 'build/software' directory, are now made during the 'make' phase,
and the 'configure' phase no longer builds 'build/analysis' or anything
within it. Also, the software version LaTeX macros (which were previously
written during the 'configure' phase in the 'analysis' directory) are now
written in the software directory and copied into the analysis directory
for usage in LaTeX while building the paper.

Other minor additions in this commit:

 - The './project' script has a new '--timing' option to write the
   starting and ending times of the project in a file. It also builds the
   high-level analysis directories when './project make' is called (but
   before calling 'top-make.mk').

 - The 'tar' calls in the custom build commands of the software building
   Makefiles now have the '--no-same-owner --no-same-permissions' options
   like the 'tar' call within the 'uncompress' function of
   'build-rules.mk'.

This commit was originally written by Giacomo Lorenzetti only for
Apptainer on the registered commit date. It was later re-implemented from
scratch by Mohammad Akhlaghi to have a unified interface for both
Apptainer and Docker and merged into Maneage on 2025-04-23.
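The '--timing' behaviour described above can be sketched as the
stand-alone script below. This is only an illustration under stated
assumptions, not the code of the './project' script: the 'timing_file'
variable and the temporary file are hypothetical names used here so the
sketch can run anywhere (the project itself appends to 'timing.txt' when
called as './project make --timing'), and the analysis is replaced by a
no-op placeholder.

```shell
#!/bin/sh
# Stand-alone illustration of the '--timing' logic. 'timing_file' is a
# hypothetical name for this sketch; the project writes to 'timing.txt'.
set -e
timing=1
timing_file=$(mktemp)

# Append (rather than overwrite) so that the records of earlier runs are
# preserved until the user deletes the file.
if [ $timing = 1 ]; then echo "start: $(date)" >> "$timing_file"; fi

# Placeholder for the analysis (in the project, this is where
# 'top-prepare.mk' and 'top-make.mk' are called).
:

if [ $timing = 1 ]; then echo "end: $(date)" >> "$timing_file"; fi

# Show the recorded start and end times.
cat "$timing_file"
```

Because the lines are appended, running the project several times with
this option should leave one 'start:'/'end:' pair per run in the file.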
--- .gitignore | 10 +- README.md | 630 +++------------------- project | 64 ++- reproduce/analysis/make/initialize.mk | 6 +- reproduce/analysis/make/paper.mk | 41 +- reproduce/analysis/make/prepare.mk | 2 +- reproduce/analysis/make/top-prepare.mk | 2 +- reproduce/software/containers/README-apptainer.md | 69 +++ reproduce/software/containers/README-docker.md | 180 +++++++ reproduce/software/containers/apptainer.sh | 441 +++++++++++++++ reproduce/software/containers/docker.sh | 486 +++++++++++++++++ reproduce/software/make/basic.mk | 46 +- reproduce/software/make/build-rules.mk | 2 +- reproduce/software/make/high-level.mk | 54 +- reproduce/software/make/python.mk | 4 +- reproduce/software/shell/configure.sh | 33 +- reproduce/software/shell/pre-make-build.sh | 2 +- 17 files changed, 1416 insertions(+), 656 deletions(-) create mode 100644 reproduce/software/containers/README-apptainer.md create mode 100644 reproduce/software/containers/README-docker.md create mode 100755 reproduce/software/containers/apptainer.sh create mode 100755 reproduce/software/containers/docker.sh diff --git a/.gitignore b/.gitignore index 6c46b87..eed4fdf 100644 --- a/.gitignore +++ b/.gitignore @@ -18,13 +18,14 @@ *~ *\# -*.txt *.aux *.log -*.pdf *.out -*.zip +*.pdf +*.sif *.swp +*.txt +*.zip .nfs* mmap_* *.tar.gz @@ -32,6 +33,7 @@ mmap_* .tex build +run.sh .local .build Makefile @@ -40,7 +42,7 @@ tex/tikz .DS_Store .texlive* LOCAL.conf -docker-run +timing.txt tex/pipeline LOCAL_tmp.mk LOCAL_old.mk diff --git a/README.md b/README.md index 6e5a2ad..79106ec 100644 --- a/README.md +++ b/README.md @@ -292,571 +292,81 @@ light and should be very fast. -### Building in Docker containers - -Docker containers are a common way to build projects in an independent -filesystem, and an almost independent operating system. Containers thus -allow using GNU/Linux operating systems within proprietary operating -systems like macOS or Windows. But without the overhead and huge file size -of virtual machines. 
Furthermore containers allow easy movement of built -projects from one system to another without rebuilding. Just note that -Docker images are large binary files (+1 Gigabytes) and may not be usable -in the future (for example with new Docker versions not reading old -images). Containers are thus good for temporary/testing phases of a -project, but shouldn't be what you archive for the long term! - -Hence if you want to save and move your maneaged project within a Docker -image, be sure to commit all your project's source files and push them to -your external Git repository (you can do these within the Docker image as -explained below). This way, you can always recreate the container with -future technologies too. Generally, if you are developing within a -container, its good practice to recreate it from scratch every once in a -while, to make sure you haven't forgot to include parts of your work in -your project's version-controlled source. In the sections below we also -describe how you can use the container **only for the software -environment** and keep your data and project source on your host. - -#### Dockerfile for a Maneaged project, and building a Docker image - -Below is a series of recommendations on the various components of a -`Dockerfile` optimized to store the *built state of a maneaged project* as -a Docker image. Each component is also accompanied with -explanations. Simply copy the code blocks under each item into a plain-text -file called `Dockerfile`, in the same order of the items. Don't forget to -implement the suggested corrections (in particular step 4). - -**NOTE: Internet for TeXLive installation:** If you have the project -software tarballs and input data (optional features described below) you -can disable internet. In this situation, the configuration and analysis -will be exactly reproduced, the final LaTeX macros will be created, and all -results will be verified successfully. 
However, no final `paper.pdf` will -be created to visualize/combine everything in one easy-to-read file. Until -[task 15267](https://savannah.nongnu.org/task/?15267) is complete, we need -internet to install TeXLive packages (using TeXLive's own package manager -`tlmgr`) in the `./project configure` phase. This won't stop the -configuration, and it will finish successfully (since all the analysis can -still be reproduced). We are working on completing this task as soon as -possible, but until then, if you want to disable internet *and* you want to -build the final PDF, please disable internet after the configuration -phase. Note that only the necessary TeXLive packages are installed (~350 -MB), not the full TeXLive collection! - - 0. **Summary:** If you are already familiar with Docker, then the full - Dockerfile to get the project environment setup is shown here (without - any comments or explanations, because explanations are done in the next - items). Note that the last two `COPY` lines (to copy the directory - containing software tarballs used by the project and the possible input - databases) are optional because they will be downloaded if not - available. You can also avoid copying over all, and simply mount your - host directories within the image, we have a separate section on doing - this below ("Only software environment in the Docker image"). Once you - build the Docker image, your project's environment is setup and you can - go into it to run `./project make` manually. 
- - ```shell - FROM debian:stable-slim - RUN apt update && apt install -y gcc g++ wget - RUN useradd -ms /bin/sh maneager - RUN printf '123\n123' | passwd root - USER maneager - WORKDIR /home/maneager - RUN mkdir build - RUN mkdir software - COPY --chown=maneager:maneager ./project-source /home/maneager/source - COPY --chown=maneager:maneager ./software-dir /home/maneager/software - COPY --chown=maneager:maneager ./data-dir /home/maneager/data - RUN cd /home/maneager/source \ - && ./project configure --build-dir=/home/maneager/build \ - --software-dir=/home/maneager/software \ - --input-dir=/home/maneager/data - ``` - - 1. **Choose the base operating system:** The first step is to select the - operating system that will be used in the docker image. Note that your - choice of operating system also determines the commands of the next - step to install core software. - - ```shell - FROM debian:stable-slim - ``` - - 2. **Maneage dependencies:** By default the "slim" versions of the - operating systems don't contain a compiler (needed by Maneage to - compile precise versions of all the tools). You thus need to use the - selected operating system's package manager to import them (below is - the command for Debian). Optionally, if you don't have the project's - software tarballs, and want the project to download them automatically, - you also need a downloader. - - ```shell - # C and C++ compiler. - RUN apt update && apt install -y gcc g++ - - # Uncomment this if you don't have 'software-XXXX.tar.gz' (below). - #RUN apt install -y wget - ``` - - 3. **Define a user:** Some core software packages will complain if you try - to install them as the default (root) user. Generally, it is also good - practice to avoid being the root user. Hence with the commands below we - define a `maneager` user and activate it for the next steps. But just - in case root access is necessary temporarily, with the `passwd` - command, we are setting the root password to `123`. 
- - ```shell - RUN useradd -ms /bin/sh maneager - RUN printf '123\n123' | passwd root - USER maneager - WORKDIR /home/maneager - ``` - - 4. **Copy project files into the container:** these commands make the - assumptions listed below. IMPORTANT: you can also avoid copying over - all, and simply mount your host directories within the image, we have a - separate section on doing this below ("Only software environment in the - Docker image"). - - * The project's source is in the `maneaged/` sub-directory and this - directory is in the same directory as the `Dockerfile`. The source - can either be from cloned from Git (highly recommended!) or from a - tarball. Both are described above (note that arXiv's tarball needs to - be corrected as mentioned above). - - * (OPTIONAL) By default the project's necessary software source - tarballs will be downloaded when necessary during the `./project - configure` phase. But if you already have the sources, its better to - use them and not waste network traffic (and resulting carbon - footprint!). Maneaged projects usually come with a - `software-XXXX.tar.gz` file that is published on Zenodo (link above). - If you have this file, put it in the same directory as your - `Dockerfile` and include the relevant lines below. - - * (OPTIONAL) The project's input data. The `INPUT-FILES` depends on the - project, please look into the project's - `reproduce/analysis/config/INPUTS.conf` for the URLs and the file - names of input data. Similar to the software source files mentioned - above, if you don't have them, the project will attempt to download - its necessary data automatically in the `./project make` phase. 
- - ```shell - # Make the project's build directory and copy the project source - RUN mkdir build - COPY --chown=maneager:maneager ./maneaged /home/maneager/source - - # Optional (for software) - COPY --chown=maneager:maneager ./software-XXXX.tar.gz /home/maneager/ - RUN tar xf software-XXXX.tar.gz && mv software-XXXX software && rm software-XXXX.tar.gz - - # Optional (for data) - RUN mkdir data - COPY --chown=maneager:maneager ./INPUT-FILES /home/maneager/data - ``` - - 5. **Configure the project:** With this line, the Docker image will - configure the project (build all its necessary software). This will - usually take about an hour on an 8-core system. You can also optionally - avoid putting this step (and the next) in the `Dockerfile` and simply - execute them in the Docker image in interactive mode (as explained in - the sub-section below, in this case don't forget to preserve the build - container after you are done). - - ```shell - # Configure project (build full software environment). - RUN cd /home/maneager/source \ - && ./project configure --build-dir=/home/maneager/build \ - --software-dir=/home/maneager/software \ - --input-dir=/home/maneager/data - ``` - - 6. **Project's analysis:** With this line, the Docker image will do the - project's analysis and produce the final `paper.pdf`. The time it takes - for this step to finish, and the storage/memory requirements highly - depend on the particular project. - - ```shell - # Run the project's analysis - RUN cd /home/maneager/source && ./project make - ``` - - 7. **Build the Docker image:** The `Dockerfile` is now ready! In the - terminal, go to its directory and run the command below to build the - Docker image. We recommend to keep the `Dockerfile` in **an empty - directory** and run it from inside that directory too. This is because - Docker considers that directories contents to be part of the - environment. Finally, just set a `NAME` for your project and note that - Docker only runs as root. 
- - ```shell - sudo su - docker build -t NAME ./ - ``` - - - -#### Interactive tests on built container - -If you later want to start a container with the built image and enter it in -interactive mode (for example for temporary tests), please run the -following command. Just replace `NAME` with the same name you specified -when building the project. You can always exit the container with the -`exit` command (note that all your changes will be discarded once you exit, -see below if you want to preserve your changes after you exit). - -```shell -docker run -it NAME -``` - - - -#### Running your own project's shell for same analysis environment - -The default operating system only has minimal features: not having many of -the tools you are accustomed to in your daily command-line operations. But -your maneaged project has a very complete (for the project!) environment -which is fully built and ready to use interactively with the commands -below. For example the project also builds Git within itself, as well as -many other high-level tools that are used in your project and aren't -present in the container's operating system. - -```shell -# Once you are in the docker container -cd source -./project shell -``` - - - -#### Preserving the state of a built container - -All interactive changes in a container will be deleted as soon as you exit -it. THIS IS A VERY GOOD FEATURE IN GENERAL! If you want to make persistent -changes, you should do it in the project's plain-text source and commit -them into your project's online Git repository. As described in the Docker -introduction above, we strongly recommend to **not rely on a built container -for archival purposes**. - -But for temporary tests it is sometimes good to preserve the state of an -interactive container. To do this, you need to `commit` the container (and -thus save it as a Docker "image"). 
To do this, while the container is still -running, open another terminal and run these commands: - -```shell -# These two commands should be done in another terminal -docker container list - -# Get 'XXXXXXX' of your desired container from the first column above. -# Give the new image a name by replacing 'NEW-IMAGE-NAME'. -docker commit XXXXXXX NEW-IMAGE-NAME -``` - - - -#### Copying files from the Docker image to host operating system - -The Docker environment's file system is completely indepenent of your host -operating system. One easy way to copy files to and from an open container -is to use the `docker cp` command (very similar to the shell's `cp` -command). - -```shell -docker cp CONTAINER:/file/path/within/container /host/path/target -``` - - - - - -#### Only software environment in the Docker image - -You can set the docker image to only contain the software environment and -keep the project source and built analysis files (data and PDF) on your -host operating system. This enables you to keep the size of the Docker -image to a minimum (only containing the built software environment) to -easily move it from one computer to another. Below we'll summarize the -steps. - - 1. Get your user ID with this command: `id -u`. - - 2. Make a new (empty) directory called `docker` temporarily (will be - deleted later). - - ```shell - mkdir docker-tmp - cd docker-tmp - ``` - - 3. Make a `Dockerfile` (within the new/empty directory) with the - following contents. Just replace `UID` with your user ID (found in - step 1 above). Note that we are manually setting the `maneager` (user) - password to `123` and the root password to '456' (both should be - repeated because they must be confirmed by `passwd`). To install other - operating systems, just change the contents on the `FROM` line. 
For - example, for CentOS 7 you can use `FROM centos:centos7`, for the - latest CentOS, you can use `FROM centos:latest` (you may need to add - this line `RUN yum install -y passwd` before the `RUN useradd ...` - line.). - - ``` - FROM debian:stable-slim - RUN useradd -ms /bin/sh --uid UID maneager; \ - printf '123\n123' | passwd maneager; \ - printf '456\n456' | passwd root - USER maneager - WORKDIR /home/maneager - RUN mkdir build; mkdir build/analysis - ``` - - 4. Create a Docker image based on the `Dockerfile` above. Just replace - `MANEAGEBASE` with your desired name (this won't be your final image, - so you can safely use a name like `maneage-base`). Note that you need - to have root/administrator previlages when running it, so - - ```shell - sudo docker build -t MANEAGEBASE ./ - ``` - - 5. You don't need the temporary directory any more (the docker image is - saved in Docker's own location, and accessible from anywhere). - - ```shell - cd .. - rm -rf docker-tmp - ``` - - 6. Put the following contents into a newly created plain-text file called - `docker-run`, while setting the mandatory variables based on your - system. The name `docker-run` is already inside Maneage's `.gitignore` - file, so you don't have to worry about mistakenly commiting this file - (which contains private information: directories in this computer). - - ``` - #!/bin/sh - # - # Create a Docker container from an existing image of the built - # software environment, but with the source, data and build (analysis) - # directories directly within the host file system. This script should - # be run in the top project source directory (that has 'README.md' and - # 'paper.tex'). If not, replace the '$(pwd)' part with the project - # source directory. - - # MANDATORY: Name of Docker container - docker_name=MANEAGEBASE - - # MANDATORY: Location of "build" directory on this system (to host the - # 'analysis' sub-directory for output data products and possibly others). 
- build_dir=/PATH/TO/THIS/PROJECT/S/BUILD/DIR - - # OPTIONAL: Location of project's input data in this system. If not - # present, a 'data' directory under the build directory will be created. - data_dir=/PATH/TO/THIS/PROJECT/S/DATA/DIR - - # OPTIONAL: Location of software tarballs to use in building Maneage's - # internal software environment. - software_dir=/PATH/TO/SOFTWARE/TARBALL/DIR - - - - - - # Internal proceessing - # -------------------- - # - # Sanity check: Make sure that the build directory actually exists. - if ! [ -d $build_dir ]; then - echo "ERROR: '$build_dir' doesn't exist"; exit 1; - fi - - # If the host operating system has '/dev/shm', then give Docker access - # to it also for improved speed in some scenarios (like configuration). - if [ -d /dev/shm ]; then shmopt="-v /dev/shm:/dev/shm"; - else shmopt=""; fi - - # If the 'analysis' and 'data' directories (that are mounted), don't exist, - # then create them (otherwise Docker will create them as 'root' before - # creating the container, and we won't have permission to write in them. - analysis_dir="$build_dir"/analysis - if ! [ -d $analysis_dir ]; then mkdir $analysis_dir; fi - - # If the data or software directories don't exist, put them in the build - # directory (they will remain empty, but this helps in simplifiying the - # mounting command!). - if ! [ x$data_dir = x ]; then - data_dir="$build_dir"/data - if ! [ -d $data_dir ]; then mkdir $data_dir; fi - fi - if ! [ x$software_dir = x ]; then - software_dir="$build_dir"/tarballs-software - if ! [ -d $software_dir ]; then mkdir $software_dir; fi - fi - - # Run the Docker image while setting up the directories. - sudo docker run -v "$software_dir":/home/maneager/tarballs-software \ - -v "$analysis_dir":/home/maneager/build/analysis \ - -v "$data_dir":/home/maneager/data \ - -v "$(pwd)":/home/maneager/source \ - $shmopt -it $docker_name - ``` - - 7. Make the `docker-run` script executable. - - ```shell - chmod +x docker-run - ``` - - 8. 
Start the Docker daemon (root permissions required). If the operating - system uses systemd you can use the command below. If you want the - Docker daemon to be available after a reboot also (so you don't have - to restart it after turning off your computer), run this command again - but replacing `start` with `enable`. - - ```shell - systemctl start docker - ``` - - 9. You can now start the Docker image by executing your newly added - script like below (it will ask for your root password). You will - notice that you are in the Docker container with the changed prompt. - - ```shell - ./docker-run - ``` - - 10. You are now within the container. First, we'll add the GNU C and C++ - compilers (which are necessary to build our own programs in Maneage) - and the GNU WGet downloader (which may be necessary if you don't have - a core software's tarball already). Maneage will build pre-defined - versions of both and will use them. But for the very first packages, - they are necessary. In the process, by setting the `PS1` environment - variable, we'll define a color-coding for the interactive shell prompt - (red for root and purple for the user). If you build another operating - system, replace the `apt` commands accordingly (for example on CentOS, - you don't need the `apt update` line and you should use `yum install - -y gcc gcc-c++ wget glibc-static` to install the three basic - dependencies). - - ```shell - su - echo 'export PS1="[\[\033[01;31m\]\u@\h \W\[\033[32m\]\[\033[00m\]]# "' >> ~/.bashrc - source ~/.bashrc - apt update - apt install -y gcc g++ wget - exit - echo 'export PS1="[\[\033[01;35m\]\u@\h \W\[\033[32m\]\[\033[00m\]]$ "' >> ~/.bashrc - source ~/.bashrc - ``` - - 11. Now that the compiler is ready, we can start Maneage's - configuration. So let's go into the project source directory and run - these commands to build the software environment. 
- - ```shell - cd source - ./project configure --input-dir=/home/maneager/data \ - --build-dir=/home/maneager/build \ - --software-dir=/home/maneager/tarballs-software - ``` - - 12. After the configuration finishes successfully, it will say so. It will - then ask you to run `./project make`. **But don't do that - yet**. Keep this Docker container open and don't exit the container or - terminal. Open a new terminal, and follow the steps described in the - sub-section above to preserve (or "commit") the built container as a - Docker image. Let's assume you call it `MY-PROJECT-ENV`. After the new - image is made, you should be able to see the new image in the list of - images with this command (in yet another terminal): - - ```shell - docker image list # In the other terminal. - ``` - - 13. Now that you have safely "committed" your current Docker container - into a separate Docker image, you can **exit the container** safely - with the `exit` command. Don't worry, you won't loose the built - software environment: it is all now saved separately within the Docker - image. - - 14. Re-open your `docker-run` script and change `MANEAGEBASE` to - `MY-PROJECT-ENV` (or any other name you set for the environment you - committed above). - - ```shell - emacs docker-run - ``` - - 15. That is it! You can now always easily enter your container (only for - the software environemnt) with the command below. Within the - container, any file you save/edit in the `source` directory of the - docker container is the same file on your host OS and any file you - build in your `build/analysis` directory (within the Maneage'd - project) will be on your host OS. You can even use your container's - Git to store the history of your project in your host OS. See the next - step in case you want to move your built software environment to - another computer. - - ```shell - ./docker-run - ``` - - 16. 
In case you want to store the image as a single file as backup or to - move to another computer, you can run the commands below. They will - produce a single `project-env.tar.gz` file. - - ```shell - docker save -o my-project-env.tar MY-PROJECT-ENV - gzip --best project-env.tar - ``` - - 17. To load the tarball above into a clean docker environment (for example - on another system) copy the `my-project-env.tar.gz` file there and run - the command below. You can then create the `docker-run` script for - that system and run it to enter. Just don't forget that if your - `analysis_dir` directory is empty on the new/clean system. So you - should first run the same `./project configure ...` command above in - the docker image so it connects the environment to your source. Don't - worry, it won't build any software and should finish in a second or - two. Afterwards, you can safely run `./project make` and continue - working like you did on the old system. - - ```shell - docker load --input my-project-env.tar.gz - ``` - - - - - -#### Deleting all Docker images - -After doing your tests/work, you may no longer need the multi-gigabyte -files images, so its best to just delete them. To do this, just run the two -commands below to first stop all running containers and then to delete all -the images: - -```shell -docker ps -a -q | xargs docker rm -docker images -a -q | xargs docker rmi -f -``` - - - - - -### Copyright information - -This file and `.file-metadata` (a binary file, used by Metastore to store -file dates when doing Git checkouts) are part of the reproducible project -mentioned above and share the same copyright notice (at the start of this -file) and license notice (below). 
-
-This project is free software: you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the Free
+### Building in containers
+
+Containers are a common way to build projects in an independent filesystem
+and an almost independent operating system without the overhead (in size
+and speed) of a virtual machine. As a result, containers allow easy
+movement of built projects from one system to another without
+rebuilding. However, they are still large binary files (+1 Gigabytes) and
+may not be usable in the future (for example with new software versions not
+reading old images or old/new kernel issues). Containers are thus good for
+execution/testing phases of a project, but shouldn't be what you archive
+for the long term!
+
+It is therefore very important that if you want to save and move your
+maneaged project within containers, you commit all your project's
+source files and push them to your external Git repository (you can do
+these within the container as explained below). This way, you can always
+recreate the container with future technologies too. Generally, if you are
+developing within a container, it's good practice to recreate it from
+scratch every once in a while, to make sure you haven't forgotten to include
+parts of your work in your project's version-controlled source. In the
+sections below we also describe how you can use the container **only for
+the software environment** and keep your data and project source on your
+host.
+
+If you have the necessary software tarballs and input data (optional
+features described below) you can disable internet. In this situation, the
+configuration and analysis will be exactly reproduced, the final LaTeX
+macros will be created, and all results will be verified
+successfully. However, no final `paper.pdf` will be created to
+visualize/combine everything in one easy-to-read file. Until [task
+15267](https://savannah.nongnu.org/task/?15267) is complete, Maneage only
+needs internet to install TeXLive packages (using TeXLive's own package
+manager `tlmgr`) in the `./project configure` phase. This won't stop the
+configuration (since all the analysis can still be reproduced). We are
+working on completing this task as soon as possible, but until then, if you
+want to disable internet *and* you want to build the final PDF, please
+disable internet after the configuration phase. Note that only the
+necessary TeXLive packages are installed (~350 MB), not the full TeXLive
+collection!
+
+The container technologies that Maneage has been tested on, and for which
+documentation exists in this project (in `reproduce/software/containers`),
+are listed below. See the respective `README-*.md` file in that directory
+for the details:
+
+  - [Apptainer](https://apptainer.org): useful in high performance
+    computing (HPC) facilities (where you do not have root
+    permissions). Apptainer is fully free and open source software.
+    Apptainer containers can only be created and used on GNU/Linux
+    operating systems, but are stored as files (easy to manage).
+
+  - [Docker](https://www.docker.com): requires root access, but useful on
+    virtual private servers (VPSs). Docker images are stored and managed by
+    a root-level daemon, so you can only manage them through its own
+    interface. A Docker container built on a GNU/Linux host can also be
+    executed on Windows or macOS. However, while the Docker engine and its
+    command-line interface on GNU/Linux are free and open source software,
+    its desktop application (with a GUI and components necessary for
+    Windows or macOS) is not (requires payment for large companies).
+ + + + + +## Copyright information + +This file is free software: you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. -This project is distributed in the hope that it will be useful, but WITHOUT +This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along -with this project. If not, see . +with this file. If not, see . diff --git a/project b/project index ac801b8..c30bfbf 100755 --- a/project +++ b/project @@ -33,6 +33,7 @@ set -e jobs=0 # 0 is for the default for the 'configure.sh' script. group= debug= +timing=0 host_cc=0 offline= operation= @@ -89,7 +90,7 @@ RECOMMENDATION: If this is the first time you are configuring this template, please don't use the options and let the script explain each parameter in full detail by simply running './project configure'. -Project 'make' special features. +Project 'make' special targets ./project make Build the project on one thread ./project make -jN Built the project in parallel on N threads. ./project make clean Clean all files generated by 'make' (not software). @@ -127,6 +128,7 @@ Configure and Make options: Make (analysis) options: -p, --prepare-redo Re-do preparation (only done automatically once). + -t, --timing Starting and ending times written in 'timing.txt'. Make (final PDF) options: --refresh-bib Force refresh the bibliography. 
@@ -216,6 +218,8 @@ do # Make options (analysis): -p|--prepare-redo) prepare_redo=1; shift;; -p=*|--prepare-redo=*) on_off_option_error --prepare-redo; shift;; + -t|--timing) timing=1; shift;; + -t=*|--timing=*) on_off_option_error --timing; shift;; # Make options (final PDF): --refresh-bib) [ -f tex/src/references.tex ] && touch tex/src/references.tex; shift;; @@ -389,6 +393,7 @@ EOF # Run operations in controlled environment # ---------------------------------------- +perms="u+r,u+w,g+r,g+w,o-r,o-w,o-x" controlled_env() { # Get the full address of the build directory: @@ -423,7 +428,6 @@ controlled_env() { # Do requested operation # ---------------------- -perms="u+r,u+w,g+r,g+w,o-r,o-w,o-x" configscript=./reproduce/software/shell/configure.sh case $operation in @@ -444,8 +448,11 @@ case $operation in # to make sure they have them, we are activating the executable # flags by default here every time './project configure' is run. If # any other file in your project needs such flags, add them here. - chmod +x reproduce/software/shell/* reproduce/software/config/*.sh \ - reproduce/analysis/bash/* + if ! [ -x reproduce/software/shell/configure.sh ]; then + chmod +x reproduce/analysis/bash/* \ + reproduce/software/shell/* \ + reproduce/software/config/*.sh + fi # If the user requested, clean the TeX directory from the extra # (to-be-built) directories that may already be there (and will not @@ -499,22 +506,62 @@ case $operation in configuration_necessary fi + # Make sure that the necessary analysis directories exist + # in the build directory. These will be necessary in various phases + # of the analysis and having them inside the lower-level Make steps + # will require setting them as prerequisites for many basic jobs + # (thus making the Makefiles harder to read and adding potential for + # bugs: forgetting to add them for example). Also, we don't want + # the configure phase to make any edits in the analysis directory, + # so they are not built there.
+ badir=.build/analysis + texdir=$badir/tex + mtexdir=$texdir/macros + if ! [ -d $badir ]; then mkdir $badir; fi + if ! [ -d $texdir ]; then mkdir $texdir; fi + if ! [ -d $mtexdir ]; then mkdir $mtexdir; fi + + # TeX build directory. If built in a group scenario, the TeX build + # directory must be separate for each member (so they can work on their + # relevant parts of the paper without conflicting with each other). + if [ "x$maneage_group_name" = x ]; then + texbdir="$texdir"/build + else + user=$(whoami) + texbdir="$texdir"/build-$user + fi + tikzdir="$texbdir"/tikz + if ! [ -L tex/build ]; then ln -s "$(pwd -P)/$texdir" tex/build; fi + if ! [ -L tex/tikz ]; then ln -s "$(pwd -P)/$tikzdir" tex/tikz; fi + + # Register the start of this run (we are appending the new + # information so previous information is preserved until the user + # intentionally deletes/cleans it). + if [ $timing = 1 ]; then echo "start: $(date)" >> timing.txt; fi + # Run data preparation phase (optionally build Makefiles with # special values for optimizing the main 'top-make.mk'). But note # that data preparation is only done automatically the first time - # the project is built (when '.build/software/preparation-done.mk' + # the project is built (when '.build/analysis/preparation-done.mk' # doesn't yet exist). After that, if the user wants to re-do the # preparation they have to use the '--prepare-redo' option. - if ! [ -f .build/software/preparation-done.mk ] \ + if ! [ -f .build/analysis/preparation-done.mk ] \ || [ x"$prepare_redo" = x1 ]; then controlled_env reproduce/analysis/make/top-prepare.mk fi - # Run the actual project. + # Call top-make (highest level analysis Makefile). controlled_env reproduce/analysis/make/top-make.mk + + # Register the time of the project's ending. + if [ $timing = 1 ]; then echo "end: $(date)" >> timing.txt; fi ;; + + + + # Interactive shell of Maneage. 
shell) # Make sure the configure script has been completed properly @@ -550,6 +597,9 @@ case $operation in ;; + + + # Operation not specified. *) cat <\ +Copyright (C) 2025-2025 Giacomo Lorenzetti \ +See the end of the file for license conditions. + +For an introduction on containers, see the "Building in containers" section +of the `README.md` file within the top-level directory of this +project. Here, we focus on Apptainer with a simple checklist on how to use +the `apptainer.sh` script that we have already prepared in this +directory for easy usage in a Maneage'd project. + + + + + +## Building your Maneage'd project in Apptainer + +Through the steps below, you will create an Apptainer image that will only +contain the software environment and keep the project source and built +analysis files (data and PDF) on your host operating system. This enables +you to keep the size of the image to a minimum (only containing the built +software environment) to easily move it from one computer to another. + + 1. Using your favorite text editor, create an `apptainer-local.sh` in your + project's top directory that contains the usage command shown at the + top of the `apptainer.sh` script and take the following steps: + * Set the respective directories based on your own preferences. + * The `--software-dir` is optional: if you don't have the source + tarballs, Maneage will download them automatically, but that requires + internet (which may not always be available). If you regularly build + Maneage'd projects, you can clone the repository containing all the + tarballs at https://gitlab.cefca.es/maneage/tarballs-software + * Add an extra `--build-only` for the first run so it doesn't go on to + do the analysis and just builds the image. After it has completed, + remove the `--build-only` and it will only run the analysis of your + project. + + 2. Once step one finishes, the build directory will contain two + Singularity Image Format (SIF) files listed below.
You can move them to + any other (more permanent) positions in your filesystem or to other + computers as needed. + * `maneage-base.sif`: image containing the base operating system that + was used to build your project. You can safely delete this unless you + need to keep it for future builds without internet (you can give it + to the `--base-name` option of this script). If you want a different + name for this, put the same option in your + * `maneaged.sif`: image with the full software environment of your + project. This file is necessary for future runs of your project + within the container. + + + + + +## Copyright information + +This file is free software: you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation, either version 3 of the License, or (at your option) +any later version. + +This file is distributed in the hope that it will be useful, but WITHOUT +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +more details. + +You should have received a copy of the GNU General Public License along +with this file. If not, see . diff --git a/reproduce/software/containers/README-docker.md b/reproduce/software/containers/README-docker.md new file mode 100644 index 0000000..f86dceb --- /dev/null +++ b/reproduce/software/containers/README-docker.md @@ -0,0 +1,180 @@ +# Maneage'd projects in Docker + +Copyright (C) 2021-2025 Mohammad Akhlaghi \ +See the end of the file for license conditions. + +For an introduction on containers, see the "Building in containers" section +of the `README.md` file within the top-level directory of this +project. Here, we focus on Docker with a simple checklist on how to use the +`docker.sh` script that we have already prepared in this directory for easy +usage in a Maneage'd project. 
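
As a hedged sketch of where this checklist leads, the snippet below writes a small `docker-local.sh` wrapper into the current directory. The `--build-dir` and `--software-dir` options are an assumption here: they are taken from the usage example of `apptainer.sh` (the two scripts are described as having a unified interface), so confirm them with `docker.sh --help` before relying on them. The `/PATH/TO/...` values are placeholders to be replaced.

```shell
#!/bin/sh
# Sketch only: create a local wrapper around 'docker.sh'. The options
# '--build-dir' and '--software-dir' are ASSUMED to mirror 'apptainer.sh';
# the '/PATH/TO/...' values are placeholders.
cat > docker-local.sh <<'EOF'
#!/bin/sh
./reproduce/software/containers/docker.sh \
    --build-dir=/PATH/TO/BUILD/DIRECTORY \
    --software-dir=/PATH/TO/SOFTWARE/TARBALLS \
    --build-only
EOF

# Make the wrapper executable so it can be run as './docker-local.sh'.
chmod +x docker-local.sh
```

Keeping `--build-only` on its own line makes it easy to delete once the image has been built and only the analysis should run.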
+ + + + + +## Building your Maneage'd project in Docker + +Through the steps below, you will create a Docker image that will only +contain the software environment and keep the project source and built +analysis files (data and PDF) on your host operating system. This enables +you to keep the size of the image to a minimum (only containing the built +software environment) to easily move it from one computer to another. + + 0. Add your user to the `docker` group: `usermod -aG docker + USERNAME`. This is only necessary once on an operating system. + + 1. Start the Docker daemon (root permissions required). If the operating + system uses systemd you can use the command below. If you want the + Docker daemon to be available after a reboot also (so you don't have to + restart it after turning off your computer), run this command again but + replacing `start` with `enable` (this is not recommended if you don't + regularly use Docker: it will slow the boot time of your OS). + + ```shell + systemctl start docker + ``` + + 2. Using your favorite text editor, create a `docker-local.sh` in your top + Maneage directory (as described in the comments at the start of the + `docker.sh` script in this directory). Just activate `--build-only` on + the first run so it doesn't go onto doing the analysis and just sets up + the software environment. + + 3. After the setup is complete, run the following command to confirm that + the `maneage-base` (the OS of the container) and `maneaged` (your + project's full Maneage'd environment) images are available. If you want + different names for these images, add the `--project-name` and + `--base-name` options to the `docker.sh` call. + + ```shell + docker image list + ``` + + 4. You are now ready to do your analysis by removing the `--build-only` + option. + + + + + +## Script usage tips + +The `docker.sh` script introduced above has many options allowing certain +customizations that you can see when running it with the `--help` +option. 
The tips below are some of the more useful scenarios that we have +encountered so far. + +### Docker image in a single file + +You may want to store the image as a single file, as a backup or to move it +to another computer. For such cases, run the `docker.sh` script with the +`--image-file` option (for example `--image-file=myproj.tar.gz`). After +moving the file to the other system, run `docker.sh` with the same option. + +When the file given to `docker.sh` already exists, it will only be used for +loading the environment. When it doesn't exist, the script will save the +image into it. + + + + + +## Docker usage tips + +Below are some useful Docker usage scenarios that have proved to be +relevant for us in Maneage'd projects. + +### Cleaning up + +Docker stores many large files in your operating system that can drain +valuable storage space. The cached files usually take orders +of magnitude more storage than what you see in `docker image list`! So after +doing your work, it is best to clean up all those files. If you feel you +may need the image later, you can save it in a single file as mentioned +above and delete all the unnecessary cached files. Afterwards, when you +load the image, only that image will be present with nothing extra. + +The easiest and most powerful way to clean up everything in Docker is the +two commands below. The first will remove all containers that are not running. The second +will remove all stopped containers, all networks not used by at least one +container, all images without at least one container associated to them, +and all build cache. + +```shell +docker ps -a -q | xargs docker rm +docker system prune -a +``` + +If you only want to delete the existing images, run the command +below. But be careful that the cache is the largest storage consumer! So +the command above is the solution if your OS's root partition is close to +getting filled.
+ +```shell +docker images -a -q | xargs docker rmi -f +``` + + +### Preserving the state of an open container + +All interactive changes in a container will be deleted as soon as you exit +it. This is a very good feature of Docker in general! If you want to make +persistent changes, you should do it in the project's plain-text source and +commit them into your project's online Git repository. But in certain +situations, it is necessary to preserve the state of an interactive +container. To do this, you need to `commit` the container (and thus save it +as a Docker "image"): while the container is still running, +open another terminal and run these commands: + +```shell +# These two commands should be done in another terminal +docker container list + +# Get the 'XXXXXXX' of your desired container from the first column above. +# Give the new image a name by replacing 'NEW-IMAGE-NAME'. +docker commit XXXXXXX NEW-IMAGE-NAME +``` + + +### Interactive tests on built container + +If you later want to start a container with the built image and enter it in +interactive mode (for example for temporary tests), run the following +command. Just replace `NAME` with the same name you specified when building +the project. You can always exit the container with the `exit` command +(note that all your changes will be discarded once you exit, see above if +you want to preserve your changes after you exit). + +```shell +docker run -it NAME +``` + + +### Copying files from the Docker image to host operating system + +Except for the mounted directories, the Docker environment's file system is +independent of your host operating system. One easy way to copy files to and +from an open container is to use the `docker cp` command (very similar to +the shell's `cp` command).
+ +```shell +docker cp CONTAINER:/file/path/within/container /host/path/target +``` + + + +## Copyright information + +This file is free software: you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation, either version 3 of the License, or (at your option) +any later version. + +This file is distributed in the hope that it will be useful, but WITHOUT +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +more details. + +You should have received a copy of the GNU General Public License along +with this file. If not, see . diff --git a/reproduce/software/containers/apptainer.sh b/reproduce/software/containers/apptainer.sh new file mode 100755 index 0000000..52315f6 --- /dev/null +++ b/reproduce/software/containers/apptainer.sh @@ -0,0 +1,441 @@ +#!/bin/sh +# +# Create an Apptainer container from an existing image of the built software +# environment, but with the source, data and build (analysis) directories +# directly within the host file system. This script is assumed to be run in +# the top project source directory (that has 'README.md' and +# 'paper.tex'). If not, use the '--source-dir' option to specify where the +# Maneage'd project source is located. +# +# Usage: +# +# - When you are at the top Maneage'd project directory, you can run this +# script like the example below. Just set all the '/PATH/TO/...' +# directories. See the items below for optional values. +# +# ./reproduce/software/containers/apptainer.sh \ +# --build-dir=/PATH/TO/BUILD/DIRECTORY \ +# --software-dir=/PATH/TO/SOFTWARE/TARBALLS +# +# - Non-mandatory options: +# +# - If you already have the input data that is necessary for your +# project, use the '--input-dir' option to specify its location +# on your host file system. Otherwise the necessary analysis +# files will be downloaded directly into the build +# directory.
Note that this is only necessary when '--build-only' +# is not given. +# +# - The '--software-dir' is only useful if you want to build a +# container. Even in that case, it is not mandatory: if not +# given, the software tarballs will be downloaded (thus requiring +# internet). +# +# - To avoid having to set them every time you want to start the +# apptainer environment, you can put this command (with the proper +# directories) into a 'run.sh' script in the top Maneage'd project +# source directory and simply execute that. The special name 'run.sh' +# is in Maneage's '.gitignore', so it will not be included in your +# git history by mistake. +# +# Known problems: +# +# Copyright (C) 2025-2025 Mohammad Akhlaghi +# Copyright (C) 2025-2025 Giacomo Lorenzetti +# +# This script is free software: you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by the +# Free Software Foundation, either version 3 of the License, or (at your +# option) any later version. +# +# This script is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General +# Public License for more details. +# +# You should have received a copy of the GNU General Public License along +# with this script. If not, see . + + + + + +# Script settings +# --------------- +# Stop the script if there are any errors. +set -e + + + + + +# Default option values +jobs= +quiet=0 +source_dir= +build_only= +base_name="" +shm_size=20gb +scriptname="$0" +project_name="" +project_shell=0 +container_shell=0 +base_os=debian:stable-slim + +print_help() { + # Print the output. + cat <