author | Giacomo Lorenzetti <glorenzetti@cefca.es> | 2025-04-03 15:21:16 +0200 |
---|---|---|
committer | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2025-05-12 10:59:26 +0200 |
commit | df9e291826fbc7e717b40d2d07f1d7607a2f2455 (patch) | |
tree | d40c7aeb6b7ef6b09bb7df21080606b58245819a /reproduce/software/containers | |
parent | 2881fc0a6205d593512458c24f3b681d12921005 (diff) |
IMPORTANT: software configuration optimized and better modularized
Summary: after merging this commit into your project, it should be
re-configured since the location of software installation files like
'LOCAL.conf' or the LaTeX macros of the software environment have
changed. But it should not affect the analysis phase of your project.
Until this commit, it was not possible to run a pre-built Maneage'd project
(in a container) on a newly cloned Maneage'd project source. This was
because the containers should be read-only, but during the various checks
of the configuration (to verify that we are using the same software
environment in the container and the source), we were writing/testing many
things in the build directory, and 'LOCAL.conf' which was actually in the
source directory!
Furthermore, the '.local' and '.build' symbolic links were built at configure time, making
it hard to run the same container from a newly cloned Maneage'd project. To
make things harder for the scenario above, the 'configure.sh' script would
pause on every message and didn't have a quiet mode (making it practically
impossible to run './project configure' before './project make' on every
container run).
With this commit, all these issues have been addressed and it is now
possible to simply get a built container, clone a Maneage'd project and run
the analysis (using the built environment of the container that is verified
on every run). The respective changes/additions are described below:
- The high-level container scripts ('apptainer.sh' and 'docker.sh', along
with their READMEs) have been moved to the 'reproduce/software/shell'
directory and the old 'reproduce/software/containers' directory has been
deleted. This is because we have classified the software files by their
language/format and the container scripts are scripts in the end.
- The './project' script:
- Now has two extra options: '--quiet' and '--no-pause'. Both are
directly passed to the 'configure.sh' script. They will respectively
disable any informative printed message or any pause after that
message (if it is printed).
- The '--build-dir' option is now also relevant for './project make':
when it is given, it will re-create the two '.build' and '.local'
symbolic links at the top source directory in all scenarios
('configure', 'make' or 'shell'). This will allow the
configuration, analysis and shell phases to safely assume they exist
and match the user's desire at run-time.
- The build/analysis directory's sub-directories that need to be built
before 'top-make.mk' are now built in a separate function to help in
readability.
- The 'configure.sh' script:
- For developers: a new 'check_elapsed' variable has been defined that
will enable the newly added 'elapsed_time_from_prev_step'
function. This function should be used from now on at the end of
every major step to help find bottlenecks.
- The targets of the software in 'pre-make-build.sh' now also have the
version of the software in their file name. Until now, they didn't have
the version, so there was no way to detect if the software has been
updated or not in the source. For Lzip and Make (that also get built
after GCC), the ones in this script have a '-pre-make' suffix also.
- 'LOCAL.conf.in' now has descriptions for every variable.
- The '-std=gnu17' option is now used instead of '-std=c17' for basic
software that cannot be built without specifying the C standard in GCC
15.1 (described in previous commit: 2881fc0a6205). See [1] for more
details; in summary: '-std=gnu17' is also supported on macOS's Clang and
has some features that 'pkg-config' needs.
- Generally: some longer code lines have been broken or indentation
decreased to fit the 75 character line length. This has not reduced
readability however. For example the long 'echo' commands are now
replaced by multiple 'printf's, or the indentation is still clearly
visible.
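As an illustration of the developer-oriented timing check described above
for 'configure.sh', here is a minimal sketch. Only the two names
'check_elapsed' and 'elapsed_time_from_prev_step' come from this commit
message: the body of the function, the 'chel_*' helper variables and the
output format are assumptions and may differ from the actual script. It
also uses 'date +%s', which is widely available but not strictly POSIX.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the actual code of 'configure.sh': only the
# names 'check_elapsed' and 'elapsed_time_from_prev_step' are taken from
# the commit message; the rest is assumed for illustration.
check_elapsed=1                 # Set to 0 to disable the reports.
chel_prev=$(date +%s)           # Seconds at the end of the previous step.

elapsed_time_from_prev_step() {
    if [ "$check_elapsed" = 1 ]; then
        chel_now=$(date +%s)
        chel_diff=$((chel_now - chel_prev))

        # A long 'echo' broken into short 'printf's so the source lines
        # stay within the 75 character limit.
        printf 'elapsed time: step "%s" ' "$1"
        printf 'took %d seconds\n' "$chel_diff"

        # The next call measures from the end of this step.
        chel_prev=$chel_now
    fi
}

# Usage: call the function at the end of every major step.
sleep 1
elapsed_time_from_prev_step "dummy-step"
```

With 'check_elapsed=0' the function prints nothing, so the calls can stay
in the script permanently without affecting normal runs.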
The seeds of the work on this commit started with a branch containing three
commits by Giacomo Lorenzetti (133 insertions, 100 deletions). Upon merging
with the main 'maneage' branch, they were generalized and re-organized to
become this commit.
The following issues have also been addressed with this commit:
- The LaTeX calls (during the building of 'paper.pdf') do not contain
Maneage'd dynamic libraries. This is because we don't build the LaTeX
binaries from source, and the TeXLive manager uses the host environment.
- The 'docker.sh' script:
- Adds the '--project-name' option: its internal variable existed, but
the option for the user to define it at run-time did not.
- Ported to macOS: it does not check being a member of the 'docker'
group, and finds the number of threads using macOS-specific tools.
- The 'apptainer.sh' script:
- Now installs 'wget' in the base container also (necessary when the
user doesn't have the tarballs).
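For the macOS port of 'docker.sh' mentioned above, the thread count can
no longer rely only on 'nproc' (which the deleted 'apptainer.sh' uses).
The sketch below shows one possible way to do this. It is an assumption:
this commit message does not say which macOS-specific tool is used, and
'sysctl -n hw.logicalcpu' is just one common choice.

```shell
#!/bin/sh
# Sketch under stated assumptions (not necessarily what 'docker.sh'
# does): pick the tool that reports the number of threads based on the
# kernel name printed by 'uname -s'.
if [ "$(uname -s)" = Darwin ]; then
    # macOS: number of logical CPUs.
    jobs=$(sysctl -n hw.logicalcpu)
else
    # GNU/Linux: 'nproc' is in GNU Coreutils; fall back to 'getconf' if
    # it is not installed.
    jobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
fi

printf 'number of threads: %s\n' "$jobs"
```

The result can then be used wherever a '--jobs' value was not given by
the user.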
[1] https://savannah.nongnu.org/bugs/?67068#comment2
Diffstat (limited to 'reproduce/software/containers')
-rw-r--r-- | reproduce/software/containers/README-apptainer.md | 71 | ||||
-rw-r--r-- | reproduce/software/containers/README-docker.md | 201 | ||||
-rwxr-xr-x | reproduce/software/containers/apptainer.sh | 441 | ||||
-rwxr-xr-x | reproduce/software/containers/docker.sh | 486 |
4 files changed, 0 insertions, 1199 deletions
diff --git a/reproduce/software/containers/README-apptainer.md b/reproduce/software/containers/README-apptainer.md
deleted file mode 100644
index a7826ec..0000000
--- a/reproduce/software/containers/README-apptainer.md
+++ /dev/null
@@ -1,71 +0,0 @@
-# Maneage'd projects in Apptainer
-
-Copyright (C) 2025-2025 Mohammad Akhlaghi <mohammad@akhlaghi.org>\
-Copyright (C) 2025-2025 Giacomo Lorenzetti <glorenzetti@cefca.es>\
-See the end of the file for license conditions.
-
-For an introduction on containers, see the "Building in containers" section
-of the `README.md` file within the top-level directory of this
-project. Here, we focus on Apptainer with a simple checklist on how to use
-the `apptainer-run.sh` script that we have already prepared in this
-directory for easy usage in a Maneage'd project.
-
-
-
-
-
-## Building your Maneage'd project in Apptainer
-
-Through the steps below, you will create an Apptainer image that will only
-contain the software environment and keep the project source and built
-analysis files (data and PDF) on your host operating system. This enables
-you to keep the size of the image to a minimum (only containing the built
-software environment) to easily move it from one computer to another.
-
-  1. Using your favorite text editor, create a `run.sh` in your top Maneage
-     directory (as described in the comments at the start of the
-     `apptainer.sh` script in this directory). Just add `--build-only` on
-     the first run so it doesn't go onto doing the analysis and just sets up
-     the software environment. Set the respective directory(s) based on your
-     filesystem (the software directory is optional). The `run.sh` file name
-     is already in `.gitignore` (because it contains local directories), so
-     Git will ignore it and it won't be committed by mistake.
-
-  2. Make the script executable with `chmod +x ./run.sh`, and run it with
-     `./run.sh`.
-
-  3. Once the build finishes, the build directory (on your host) will
-     contain two Singularity Image Format (SIF) files listed below. You can
-     move them to any other (more permanent) positions in your filesystem or
-     to other computers as needed.
-     * `maneage-base.sif`: image containing the base operating system that
-       was used to build your project. You can safely delete this unless you
-       need to keep it for future builds without internet (you can give it
-       to the `--base-name` option of this script). If you want a different
-       name for this, put the same option in your
-     * `maneaged.sif`: image with the full software environment of your
-       project. This file is necessary for future runs of your project
-       within the container.
-
-  3. To execute your project remote the `--build-only` and use `./run.sh` to
-     execute it. If you want to enter your Maneage'd project shell, add the
-     `--project-shell` option to the call inside `./run.sh`.
-
-
-
-
-
-## Copyright information
-
-This file is free software: you can redistribute it and/or modify it under
-the terms of the GNU General Public License as published by the Free
-Software Foundation, either version 3 of the License, or (at your option)
-any later version.
-
-This file is distributed in the hope that it will be useful, but WITHOUT
-ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-more details.
-
-You should have received a copy of the GNU General Public License along
-with this file. If not, see <https://www.gnu.org/licenses/>.
diff --git a/reproduce/software/containers/README-docker.md b/reproduce/software/containers/README-docker.md
deleted file mode 100644
index d651e22..0000000
--- a/reproduce/software/containers/README-docker.md
+++ /dev/null
@@ -1,201 +0,0 @@
-# Maneage'd projects in Docker
-
-Copyright (C) 2021-2025 Mohammad Akhlaghi <mohammad@akhlaghi.org>\
-See the end of the file for license conditions.
-
-For an introduction on containers, see the "Building in containers" section
-of the `README.md` file within the top-level directory of this
-project. Here, we focus on Docker with a simple checklist on how to use the
-`docker.sh` script that we have already prepared in this directory for easy
-usage in a Maneage'd project.
-
-
-
-
-
-## Building your Maneage'd project in Docker
-
-Through the steps below, you will create a Docker image that will only
-contain the software environment and keep the project source and built
-analysis files (data and PDF) on your host operating system. This enables
-you to keep the size of the image to a minimum (only containing the built
-software environment) to easily move it from one computer to another.
-
-  0. Add your user to the `docker` group: `usermod -aG docker
-     USERNAME`. This is only necessary once on an operating system.
-
-  1. Start the Docker daemon (root permissions required). If the operating
-     system uses systemd you can use the command below. If you want the
-     Docker daemon to be available after a reboot also (so you don't have to
-     restart it after turning off your computer), run this command again but
-     replacing `start` with `enable` (this is not recommended if you don't
-     regularly use Docker: it will slow the boot time of your OS).
-
-     ```shell
-     systemctl start docker
-     ```
-
-  2. Using your favorite text editor, create a `run.sh` in your top Maneage
-     directory (as described in the comments at the start of the `docker.sh`
-     script in this directory). Just activate `--build-only` on the first
-     run so it doesn't go onto doing the analysis and just sets up the
-     software environment. Set the respective directory(s) based on your
-     filesystem (the software directory is optional). The `run.sh` file name
-     is already in `.gitignore` (because it contains local directories), so
-     Git will ignore it and it won't be committed by mistake.
-
-  3. After the setup is complete, remove the `--build-only` and run the
-     command below to confirm that `maneage-base` (the OS of the container)
-     and `maneaged` (your project's full Maneage'd environment) images are
-     available. If you want different names for these images, add the
-     `--project-name` and `--base-name` options to the `docker.sh` call.
-
-     ```shell
-     docker image list
-     ```
-
-  4. You are now ready to do your analysis by removing the `--build-only`
-     option.
-
-
-
-
-
-## Script usage tips
-
-The `docker.sh` script introduced above has many options allowing certain
-customizations that you can see when running it with the `--help`
-option. The tips below are some of the more useful scenarios that we have
-encountered so far.
-
-### Docker image in a single file
-
-In case you want to store the image as a single file as backup or to move
-to another computer. For such cases, run the `docker.sh` script with the
-`--image-file` option (for example `--image-file=myproj.tar.gz`). After
-moving the file to the other system, run `docker.sh` with the same option.
-
-When the given file to `docker.sh` already exists, it will only be used for
-loading the environment. When it doesn't exist, the script will save the
-image into it.
-
-
-
-
-
-## Docker usage tips
-
-Below are some useful Docker usage scenarios that have proved to be
-relevant for us in Maneage'd projects.
-
-### Saving and loading an image as a file
-
-Docker keeps its images in hard to access (by humans) location on the
-operating system. Very much like Git, but with much less elegance: the
-place is shared by all users and projects of the system. So they are not
-easy to archive for usage on another system at a low-level. But it does
-have an interface (`docker save`) to copy all the relevant files within an
-image into a tar ball that you can archive externally. There is also a
-separate interface to load the tarball back into docker (`docker load`).
-
-Both of these have been implemented as the `--image-file` option of the
-`docker.sh` script. If you want to save your Maneage'd image into an image,
-simply give the tarball name to this option. Alternatively, if you already
-have a tarball and want to load it into Docker, give it to this option once
-(until you "clean up", as explained below). In fact, docker images take a
-lot of space and it is better to "clean up" regularly. And the only way you
-can clean up safely is through saving your needed images as a file.
-
-### Cleaning up
-
-Docker has stored many large files in your operating system that can drain
-valuable storage space. The storage of the cached files are usually orders
-of magnitudes larger than what you see in `docker image list`! So after
-doing your work, it is best to clean up all those files. If you feel you
-may need the image later, you can save it in a single file as mentioned
-above and delete all the un-necessary cached files. Afterwards, when you
-load the image, only that image will be present with nothing extra.
-
-The easiest and most powerful way to clean up everything in Docker is the
-two commands below. The first will close all open containers. The second
-will remove all stopped containers, all networks not used by at least one
-container, all images without at least one container associated to them,
-and all build cache.
-
-```shell
-docker ps -a -q | xargs docker rm
-docker system prune -a
-```
-
-If you only want to delete the existing used images, run the command
-below. But be careful that the cache is the largest storage consumer! So
-the command above is the solution if your OS's root partition is close to
-getting filled.
-
-```shell
-docker images -a -q | xargs docker rmi -f
-```
-
-
-### Preserving the state of an open container
-
-All interactive changes in a container will be deleted as soon as you exit
-it. This is a very good feature of Docker in general! If you want to make
-persistent changes, you should do it in the project's plain-text source and
-commit them into your project's online Git repository. But in certain
-situations, it is necessary to preserve the state of an interactive
-container. To do this, you need to `commit` the container (and thus save it
-as a Docker "image"). To do this, while the container is still running,
-open another terminal and run these commands:
-
-```shell
-# These two commands should be done in another terminal
-docker container list
-
-# Get the 'XXXXXXX' of your desired container from the first column above.
-# Give the new image a name by replacing 'NEW-IMAGE-NAME'.
-docker commit XXXXXXX NEW-IMAGE-NAME
-```
-
-
-### Interactive tests on built container
-
-If you later want to start a container with the built image and enter it in
-interactive mode (for example for temporary tests), run the following
-command. Just replace `NAME` with the same name you specified when building
-the project. You can always exit the container with the `exit` command
-(note that all your changes will be discarded once you exit, see below if
-you want to preserve your changes after you exit).
-
-```shell
-docker run -it NAME
-```
-
-
-### Copying files from the Docker image to host operating system
-
-Except for the mounted directories, the Docker environment's file system is
-indepenent of your host operating system. One easy way to copy files to and
-from an open container is to use the `docker cp` command (very similar to
-the shell's `cp` command).
-
-```shell
-docker cp CONTAINER:/file/path/within/container /host/path/target
-```
-
-
-
-## Copyright information
-
-This file is free software: you can redistribute it and/or modify it under
-the terms of the GNU General Public License as published by the Free
-Software Foundation, either version 3 of the License, or (at your option)
-any later version.
-
-This file is distributed in the hope that it will be useful, but WITHOUT
-ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-more details.
-
-You should have received a copy of the GNU General Public License along
-with this file. If not, see <https://www.gnu.org/licenses/>.
diff --git a/reproduce/software/containers/apptainer.sh b/reproduce/software/containers/apptainer.sh
deleted file mode 100755
index 52315f6..0000000
--- a/reproduce/software/containers/apptainer.sh
+++ /dev/null
@@ -1,441 +0,0 @@
-#!/bin/sh
-#
-# Create a Apptainer container from an existing image of the built software
-# environment, but with the source, data and build (analysis) directories
-# directly within the host file system. This script is assumed to be run in
-# the top project source directory (that has 'README.md' and
-# 'paper.tex'). If not, use the '--source-dir' option to specify where the
-# Maneage'd project source is located.
-#
-# Usage:
-#
-#   - When you are at the top Maneage'd project directory, you can run this
-#     script like the example below. Just set all the '/PATH/TO/...'
-#     directories. See the items below for optional values.
-#
-#       ./reproduce/software/containers/apptainer.sh \
-#                   --build-dir=/PATH/TO/BUILD/DIRECTORY \
-#                   --software-dir=/PATH/TO/SOFTWARE/TARBALLS
-#
-#   - Non-mandatory options:
-#
-#       - If you already have the input data that is necessary for your
-#         project's, use the '--input-dir' option to specify its location
-#         on your host file system. Otherwise the necessary analysis
-#         files will be downloaded directly into the build
-#         directory. Note that this is only necessary when '--build-only'
-#         is not given.
-#
-#       - The '--software-dir' is only useful if you want to build a
-#         container. Even in that case, it is not mandatory: if not
-#         given, the software tarballs will be downloaded (thus requiring
-#         internet).
-#
-#   - To avoid having to set them every time you want to start the
-#     apptainer environment, you can put this command (with the proper
-#     directories) into a 'run.sh' script in the top Maneage'd project
-#     source directory and simply execute that. The special name 'run.sh'
-#     is in Maneage's '.gitignore', so it will not be included in your
-#     git history by mistake.
-#
-# Known problems:
-#
-# Copyright (C) 2025-2025 Mohammad Akhlaghi <mohammad@akhlaghi.org>
-# Copyright (C) 2025-2025 Giacomo Lorenzetti <glorenzetti@cefca.es>
-#
-# This script is free software: you can redistribute it and/or modify it
-# under the terms of the GNU General Public License as published by the
-# Free Software Foundation, either version 3 of the License, or (at your
-# option) any later version.
-#
-# This script is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
-# Public License for more details.
-#
-# You should have received a copy of the GNU General Public License along
-# with this script. If not, see <http://www.gnu.org/licenses/>.
-
-
-
-
-
-# Script settings
-# ---------------
-# Stop the script if there are any errors.
-set -e
-
-
-
-
-
-# Default option values
-jobs=
-quiet=0
-source_dir=
-build_only=
-base_name=""
-shm_size=20gb
-scriptname="$0"
-project_name=""
-project_shell=0
-container_shell=0
-base_os=debian:stable-slim
-
-print_help() {
-    # Print the output.
-    cat <<EOF
-Usage: $scriptname [OPTIONS]
-
-Top-level script to build and run a Maneage'd project within Apptainer.
-
-  Host OS directories (to be mounted in the container):
-  -b, --build-dir=STR      Dir. to build in (only analysis in host).
-  -i, --input-dir=STR      Dir. of input datasets (optional).
-  -s, --software-dir=STR   Directory of necessary software tarballs.
-      --source-dir=STR     Directory of source code (default: 'pwd -P').
-
-  Apptainer images
-      --base-os=STR        Base OS name (default: '$base_os').
-      --base-name=STR      Base OS apptainer image (a '.sif' file).
-      --project-name=STR   Project's apptainer image (a '.sif' file).
-
-  Interactive shell
-      --project-shell      Open the project's shell within the container.
-      --container-shell    Open the container shell.
-
-  Operating mode:
-      --quiet              Do not print informative statements.
-  -?, --help               Give this help list.
-  -j, --jobs=INT           Number of threads to use in each phase.
-      --build-only         Just build the container, don't run it.
-
-Mandatory or optional arguments to long options are also mandatory or
-optional for any corresponding short options.
-
-Maneage URL: https://maneage.org
-
-Report bugs to mohammad@akhlaghi.org
-EOF
-}
-
-on_off_option_error() {
-    if [ "x$2" = x ]; then
-        echo "$scriptname: '$1' doesn't take any values"
-    else
-        echo "$scriptname: '$1' (or '$2') doesn't take any values"
-    fi
-    exit 1
-}
-
-check_v() {
-    if [ x"$2" = x ]; then
-        printf "$scriptname: option '$1' requires an argument. "
-        printf "Try '$scriptname --help' for more information\n"
-        exit 1;
-    fi
-}
-
-while [ $# -gt 0 ]
-do
-    case $1 in
-
-        # OS directories
-        -b|--build-dir) build_dir="$2"; check_v "$1" "$build_dir"; shift;shift;;
-        -b=*|--build-dir=*) build_dir="${1#*=}"; check_v "$1" "$build_dir"; shift;;
-        -b*) build_dir=$(echo "$1" | sed -e's/-b//'); check_v "$1" "$build_dir"; shift;;
-        -i|--input-dir) input_dir="$2"; check_v "$1" "$input_dir"; shift;shift;;
-        -i=*|--input-dir=*) input_dir="${1#*=}"; check_v "$1" "$input_dir"; shift;;
-        -i*) input_dir=$(echo "$1" | sed -e's/-i//'); check_v "$1" "$input_dir"; shift;;
-        -s|--software-dir) software_dir="$2"; check_v "$1" "$software_dir"; shift;shift;;
-        -s=*|--software-dir=*) software_dir="${1#*=}"; check_v "$1" "$software_dir"; shift;;
-        -s*) software_dir=$(echo "$1" | sed -e's/-s//'); check_v "$1" "$software_dir"; shift;;
-        --source-dir) source_dir="$2"; check_v "$1" "$source_dir"; shift;shift;;
-        --source-dir=*) source_dir="${1#*=}"; check_v "$1" "$source_dir"; shift;;
-
-        # Container options.
-        --base-name) base_name="$2"; check_v "$1" "$base_name"; shift;shift;;
-        --base-name=*) base_name="${1#*=}"; check_v "$1" "$base_name"; shift;;
-        --project-name) project_name="$2"; check_v "$1" "$project_name"; shift;shift;;
-        --project-name=*) project_name="${1#*=}"; check_v "$1" "$project_name"; shift;;
-
-        # Interactive shell.
-        --project-shell) project_shell=1; shift;;
-        --project_shell=*) on_off_option_error --project-shell;;
-        --container-shell) container_shell=1; shift;;
-        --container_shell=*) on_off_option_error --container-shell;;
-
-        # Operating mode
-        --quiet) quiet=1; shift;;
-        --quiet=*) on_off_option_error --quiet;;
-        -j|--jobs) jobs="$2"; check_v "$1" "$jobs"; shift;shift;;
-        -j=*|--jobs=*) jobs="${1#*=}"; check_v "$1" "$jobs"; shift;;
-        -j*) jobs=$(echo "$1" | sed -e's/-j//'); check_v "$1" "$jobs"; shift;;
-        --build-only) build_only=1; shift;;
-        --build-only=*) on_off_option_error --build-only;;
-        --shm-size) shm_size="$2"; check_v "$1" "$shm_size"; shift;shift;;
-        --shm-size=*) shm_size="${1#*=}"; check_v "$1" "$shm_size"; shift;;
-        -'?'|--help) print_help; exit 0;;
-        -'?'*|--help=*) on_off_option_error --help -?;;
-
-        # Unrecognized option:
-        -*) echo "$scriptname: unknown option '$1'"; exit 1;;
-    esac
-done
-
-
-
-
-
-# Sanity checks
-# -------------
-#
-# Make sure that the build directory is given and that it exists.
-if [ x$build_dir = x ]; then
-    printf "$scriptname: '--build-dir' not provided, this is the location "
-    printf "that all built analysis files will be kept on the host OS\n"
-    exit 1;
-else
-    if ! [ -d $build_dir ]; then
-        printf "$scriptname: '$build_dir' (value to '--build-dir') doesn't "
-        printf "exist\n"
-        exit 1;
-    fi
-fi
-
-# Set the default project and base-OS image names (inside the build
-# directory).
-if [ x"$base_name" = x ]; then base_name=$build_dir/maneage-base.sif; fi
-if [ x"$project_name" = x ]; then project_name=$build_dir/maneaged.sif; fi
-
-
-
-
-
-# Directory preparations
-# ----------------------
-#
-# If the host operating system has '/dev/shm', then give Apptainer access
-# to it also for improved speed in some scenarios (like configuration).
-if [ -d /dev/shm ]; then
-    shm_mnt="--mount type=bind,src=/dev/shm,dst=/dev/shm";
-else shm_mnt="";
-fi
-
-# If the following directories do not exist within the build directory,
-# create them to make sure the '--mount' commands always work and
-# that any file. Ideally, the 'input' directory should not be under the 'build'
-# directory, but if the user hasn't given it then they don't care about
-# potentially deleting it later (Maneage will download the inputs), so put
-# it in the build directory.
-analysis_dir="$build_dir"/analysis
-if ! [ -d $analysis_dir ]; then mkdir $analysis_dir; fi
-analysis_dir_mnt="--mount type=bind,src=$analysis_dir,dst=/home/maneager/build/analysis"
-
-# If no '--source-dir' was given, set it to the output of 'pwd -P' (to get
-# the direct path without potential symbolic links) in the running directory.
-if [ x"$source_dir" = x ]; then source_dir=$(pwd -P); fi
-source_dir_mnt="--mount type=bind,src=$source_dir,dst=/home/maneager/source"
-
-# Only when an an input directory is given, we need the respective 'mount'
-# option for the 'apptainer run' command.
-input_dir_mnt=""
-if ! [ x"$input_dir" = x ]; then
-    input_dir_mnt="--mount type=bind,src=$input_dir,dst=/home/maneager/input"
-fi
-
-# If no '--jobs' has been specified, use the maximum available jobs to the
-# operating system.
-if [ x$jobs = x ]; then jobs=$(nproc); fi
-
-# [APPTAINER-ONLY] Optional mounting option for the software directory.
-software_dir_mnt=""
-if ! [ x"$software_dir" = x ]; then
-    software_dir_mnt="--mount type=bind,src=$software_dir,dst=/home/maneager/tarballs-software"
-fi
-
-# [APPTAINER-ONLY] Since the container is read-only and is run with the
-# '--contain' option (which makes an empty '/tmp'), we need to make a
-# dedicated directory for the container to be able to write to. This is
-# necessary because some software (Biber in particular on the default
-# branch) need to write there! See https://github.com/plk/biber/issues/494.
-# We'll keep the directory on the host OS within the build directory, but
-# as a hidden file (since it is not necessary in other types of build and
-# ultimately only contains temporary files of programs that need it).
-toptmp=$build_dir/.apptainer-tmp-$(whoami)
-if ! [ -d $toptmp ]; then mkdir $toptmp; fi
-rm -rf $toptmp/* # So previous runs don't affect this run.
-
-
-
-
-
-# Maneage'd Apptainer SIF container
-# ---------------------------------
-#
-# Build the base operating system using Maneage's './project configure'
-# step.
-if [ -f $project_name ]; then
-    if [ $quiet = 0 ]; then
-        printf "$scriptname: info: project's image ('$project_name') "
-        printf "already exists and will be used. If you want to build a "
-        printf "new project image, give a new name to '--project-name'. "
-        printf "To remove this message run with '--quiet'\n"
-    fi
-else
-
-    # Build the basic definition, with just Debian and gcc/g++
-    if [ -f $base_name ]; then
-        if [ $quiet = 0 ]; then
-            printf "$scriptname: info: base OS docker image ('$base_name') "
-            printf "already exists and will be used. If you want to build a "
-            printf "new base OS image, give a new name to '--base-name'. "
-            printf "To remove this message run with '--quiet'\n"
-        fi
-    else
-
-        base_def=$build_dir/base.def
-        cat <<EOF > $base_def
-Bootstrap: docker
-From: $base_os
-
-%post
-    apt-get update && apt-get install -y gcc g++
-EOF
-        # Build the base operating system container and delete the
-        # temporary definition file.
-        apptainer build $base_name $base_def
-        rm $base_def
-    fi
-
-    # Build the Maneage definition file.
-    #   - About the '$jobs' variable: this definition file is temporarily
-    #     built and deleted immediately after the SIF file is created. So
-    #     instead of using Apptainer's more complex '{{ jobs }}' format to
-    #     pass an argument, we simply write the value of the configure
-    #     script's '--jobs' option as a shell variable here when we are
-    #     building that file.
-    #   - About the removal of Maneage'd tarballs: we are doing this so if
-    #     Maneage has downloaded tarballs during the build they do not
-    #     unecessarily bloat the container. Even when the user has given a
-    #     software tarball directory, they will all be symbolic links that
-    #     aren't valid when the user runs the container (since we only
-    #     mount the software tarballs at build time).
-    maneage_def=$build_dir/maneage.def
-    cat <<EOF > $maneage_def
-Bootstrap: localimage
-From: $base_name
-
-%setup
-    mkdir -p \${APPTAINER_ROOTFS}/home/maneager/input
-    mkdir -p \${APPTAINER_ROOTFS}/home/maneager/source
-    mkdir -p \${APPTAINER_ROOTFS}/home/maneager/build/analysis
-    mkdir -p \${APPTAINER_ROOTFS}/home/maneager/tarballs-software
-
-%post
-    cd /home/maneager/source
-    ./project configure --jobs=$jobs \\
-              --input-dir=/home/maneager/input \\
-              --build-dir=/home/maneager/build \\
-              --software-dir=/home/maneager/tarballs-software
-    rm /home/maneager/build/software/tarballs/*
-
-%runscript
-    cd /home/maneager/source
-    if [ x"\$maneage_apptainer_stat" = xshell ]; then \\
-        ./project shell; \\
-    elif [ x"\$maneage_apptainer_stat" = xrun ]; then \\
-        if [ x"\$maneage_jobs" = x ]; then \\
-            ./project make; \\
-        else \\
-            ./project make --jobs=\$maneage_jobs; \\
-        fi; \\
-    else \\
-        printf "$scriptname: '\$maneage_apptainer_stat' (value "; \\
-        printf "to 'maneage_apptainer_stat' environment variable) "; \\
-        printf "is not recognized: should be either 'shell' or 'run'"; \\
-        exit 1; \\
-    fi
-EOF
-
-    # Build the maneage container. The last two are arguments (where order
-    # matters). The first few are options where order does not matter (so
-    # we have sorted them by line length).
-    apptainer build \
-              $shm_mnt \
-              $input_dir_mnt \
-              $source_dir_mnt \
-              $analysis_dir_mnt \
-              $software_dir_mnt \
-              --ignore-fakeroot-command \
-              \
-              $project_name \
-              $maneage_def
-
-    # Clean up.
-    rm $maneage_def
-fi
-
-# If the user just wanted to build the base operating system, abort the
-# script here.
-if ! [ x"$build_only" = x ]; then
-    if [ $quiet = 0 ]; then
-        printf "$scriptname: info: Maneaged project has been configured "
-        printf "successfully in the '$project_name' image"
-    fi
-    exit 0
-fi
-
-
-
-
-
-# Run the Maneage'd container
-# ---------------------------
-#
-# Set the high-level Apptainer operational mode.
-if [ $container_shell = 1 ]; then
-    aopt="shell"
-elif [ $project_shell = 1 ]; then
-    aopt="run --env maneage_apptainer_stat=shell"
-else
-    aopt="run --env maneage_apptainer_stat=run --env maneage_jobs=$jobs"
-fi
-
-# Build the hostname from the name of the SIF file of the project name.
-hstname=$(echo "$project_name" \
-              | awk 'BEGIN{FS="/"}{print $NF}' \
-              | sed -e's|.sif$||')
-
-# Execute Apptainer:
-#
-#   - We are not using '--unsquash' (to run within a sandbox) because it
-#     loads the full multi-gigabyte container into RAM (which we usually
-#     need for data processing). The container is read-only and we are
-#     using the following two options instead to ensure that we have no
-#     influence from outside the container. (description of each is from
-#     the Apptainer manual)
-#       --contain: use minimal /dev and empty other directories (e.g. /tmp
-#           and $HOME) instead of sharing filesystems from your host.
-#       --cleanenv: clean environment before running container".
-#
-#   - We are not mounting '/dev/shm' since Apptainer prints a warning that
-#     it is already mounted (apparently does not need it at run time).
-#
-#   --no-home and --home: the first ensures that the 'HOME' variable is
-#     different from the user's home on the host operating system, the
-#     second sets it to a directory we specify (to keep things like
-#     '.bash_history').
-apptainer $aopt \
-          --no-home \
-          --contain \
-          --cleanenv \
-          --home $toptmp \
-          $input_dir_mnt \
-          $source_dir_mnt \
-          $analysis_dir_mnt \
-          --workdir $toptmp \
-          --hostname $hstname \
-          --cwd /home/maneager/source \
-          \
-          $project_name
diff --git a/reproduce/software/containers/docker.sh b/reproduce/software/containers/docker.sh
deleted file mode 100755
index d5b5682..0000000
--- a/reproduce/software/containers/docker.sh
+++ /dev/null
@@ -1,486 +0,0 @@
-#!/bin/sh
-#
-# Create a Docker container from an existing image of the built software
-# environment, but with the source, data and build (analysis) directories
-# directly within the host file system. This script is assumed to be run in
-# the top project source directory (that has 'README.md' and
-# 'paper.tex'). If not, use the '--source-dir' option to specify where the
-# Maneage'd project source is located.
-#
-# Usage:
-#
-#   - When you are at the top Maneage'd project directory, you can run this
-#     script like the example below. Just set all the '/PATH/TO/...'
-#     directories (see below for '--tmp-dir'). See the items below for
-#     optional values.
-#
-#       ./reproduce/software/containers/docker.sh --shm-size=15gb \
-#                  --software-dir=/PATH/TO/SOFTWARE/TARBALLS \
-#                  --build-dir=/PATH/TO/BUILD/DIRECTORY
-#
-#   - Non-mandatory options:
-#
-#     - If you already have the input data that is necessary for your
-#       project's, use the '--input-dir' option to specify its location
-#       on your host file system. Otherwise the necessary analysis
-#       files will be downloaded directly into the build
-#       directory. Note that this is only necessary when '--build-only'
-#       is not given.
-#
-#     - The '--software-dir' is only useful if you want to build a
-#       container. Even in that case, it is not mandatory: if not
-#       given, the software tarballs will be downloaded (thus requiring
-#       internet).
-#
-#   - To avoid having to set the directory(s) every time you want to
-#     start the docker environment, you can put this command (with the
-#     proper directories) into a 'run.sh' script in the top Maneage'd
-#     project source directory and simply execute that. The special name
-#     'run.sh' is in Maneage's '.gitignore', so it will not be included
-#     in your git history by mistake.
-#
-# Known problems:
-#
-#   - As of 2025-04-06 the log file containing the output of the 'docker
-#     build' command that configures the Maneage'd project does not keep
-#     all the output (which gets clipped by Docker). with a "[output
-#     clipped, log limit 2MiB reached]" message. We need to find a way to
-#     fix this (so nothing gets clipped: useful for debugging).
-#
-# Copyright (C) 2021-2025 Mohammad Akhlaghi <mohammad@akhlaghi.org>
-#
-# This script is free software: you can redistribute it and/or modify it
-# under the terms of the GNU General Public License as published by the
-# Free Software Foundation, either version 3 of the License, or (at your
-# option) any later version.
-#
-# This script is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
-# Public License for more details.
-#
-# You should have received a copy of the GNU General Public License along
-# with this script. If not, see <http://www.gnu.org/licenses/>.
-
-
-
-
-
-# Script settings
-# ---------------
-# Stop the script if there are any errors.
-set -e
-
-
-
-
-
-# Default option values
-jobs=
-quiet=0
-source_dir=
-build_only=
-image_file=""
-shm_size=20gb
-scriptname="$0"
-project_shell=0
-container_shell=0
-project_name=maneaged
-base_name=maneage-base
-base_os=debian:stable-slim
-
-print_help() {
-    # Print the output.
-    cat <<EOF
-Usage: $scriptname [OPTIONS]
-
-Top-level script to build and run a Maneage'd project within Docker.
-
-  Host OS directories (to be mounted in the container):
-  -b, --build-dir=STR      Dir. to build in (only analysis in host).
-  -i, --input-dir=STR      Dir. of input datasets (optional).
-  -s, --software-dir=STR   Directory of necessary software tarballs.
-      --source-dir=STR     Directory of source code (default: 'pwd -P').
-
-  Docker images
-      --base-os=STR        Base OS name (default: '$base_os').
-      --base-name=STR      Base OS docker image (default: $base_name).
-      --project-name=STR   Project's docker image (default: $project_name).
-      --image-file=STR     [Docker only] Load (if given file exists), or
-                           save (if given file does not exist), the image.
-                           For saving, the given name has to have an
-                           '.tar.gz' suffix.
-
-  Interactive shell
-      --project-shell      Open the project's shell within the container.
-      --container-shell    Open the container shell.
-
-  Operating mode:
-      --quiet              Do not print informative statements.
-  -?, --help               Give this help list.
-      --shm-size=STR       Passed to 'docker build' (default: $shm_size).
-  -j, --jobs=INT           Number of threads to use in each phase.
-      --build-only         Just build the container, don't run it.
-
-Mandatory or optional arguments to long options are also mandatory or
-optional for any corresponding short options.
-
-Maneage URL: https://maneage.org
-
-Report bugs to mohammad@akhlaghi.org
-EOF
-}
-
-on_off_option_error() {
-    if [ "x$2" = x ]; then
-        echo "$scriptname: '$1' doesn't take any values"
-    else
-        echo "$scriptname: '$1' (or '$2') doesn't take any values"
-    fi
-    exit 1
-}
-
-check_v() {
-    if [ x"$2" = x ]; then
-        printf "$scriptname: option '$1' requires an argument. "
-        printf "Try '$scriptname --help' for more information\n"
-        exit 1;
-    fi
-}
-
-while [ $# -gt 0 ]
-do
-    case $1 in
-
-        # OS directories
-        -b|--build-dir) build_dir="$2"; check_v "$1" "$build_dir"; shift;shift;;
-        -b=*|--build-dir=*) build_dir="${1#*=}"; check_v "$1" "$build_dir"; shift;;
-        -b*) build_dir=$(echo "$1" | sed -e's/-b//'); check_v "$1" "$build_dir"; shift;;
-        -i|--input-dir) input_dir="$2"; check_v "$1" "$input_dir"; shift;shift;;
-        -i=*|--input-dir=*) input_dir="${1#*=}"; check_v "$1" "$input_dir"; shift;;
-        -i*) input_dir=$(echo "$1" | sed -e's/-i//'); check_v "$1" "$input_dir"; shift;;
-        -s|--software-dir) software_dir="$2"; check_v "$1" "$software_dir"; shift;shift;;
-        -s=*|--software-dir=*) software_dir="${1#*=}"; check_v "$1" "$software_dir"; shift;;
-        -s*) software_dir=$(echo "$1" | sed -e's/-s//'); check_v "$1" "$software_dir"; shift;;
-        --source-dir) source_dir="$2"; check_v "$1" "$source_dir"; shift;shift;;
-        --source-dir=*) source_dir="${1#*=}"; check_v "$1" "$source_dir"; shift;;
-
-        # Container options.
-        --base-name) base_name="$2"; check_v "$1" "$base_name"; shift;shift;;
-        --base-name=*) base_name="${1#*=}"; check_v "$1" "$base_name"; shift;;
-
-        # Interactive shell.
-        --project-shell) project_shell=1; shift;;
-        --project_shell=*) on_off_option_error --project-shell;;
-        --container-shell) container_shell=1; shift;;
-        --container_shell=*) on_off_option_error --container-shell;;
-
-        # Operating mode
-        --quiet) quiet=1; shift;;
-        --quiet=*) on_off_option_error --quiet;;
-        -j|--jobs) jobs="$2"; check_v "$1" "$jobs"; shift;shift;;
-        -j=*|--jobs=*) jobs="${1#*=}"; check_v "$1" "$jobs"; shift;;
-        -j*) jobs=$(echo "$1" | sed -e's/-j//'); check_v "$1" "$jobs"; shift;;
-        --build-only) build_only=1; shift;;
-        --build-only=*) on_off_option_error --build-only;;
-        --shm-size) shm_size="$2"; check_v "$1" "$shm_size"; shift;shift;;
-        --shm-size=*) shm_size="${1#*=}"; check_v "$1" "$shm_size"; shift;;
-        -'?'|--help) print_help; exit 0;;
-        -'?'*|--help=*) on_off_option_error --help -?;;
-
-        # Output file
-        --image-file) image_file="$2"; check_v "$1" "$image_file"; shift;shift;;
-        --image-file=*) image_file="${1#*=}"; check_v "$1" "$image_file"; shift;;
-
-        # Unrecognized option:
-        -*) echo "$scriptname: unknown option '$1'"; exit 1;;
-    esac
-done
-
-
-
-
-
-# Sanity checks
-# -------------
-#
-# Make sure that the build directory is given and that it exists.
-if [ x$build_dir = x ]; then
-    printf "$scriptname: '--build-dir' not provided, this is the location "
-    printf "that all built analysis files will be kept on the host OS\n"
-    exit 1;
-else
-    if ! [ -d $build_dir ]; then
-        printf "$scriptname: '$build_dir' (value to '--build-dir') doesn't "
-        printf "exist\n"; exit 1;
-    fi
-fi
-
-# The temporary directory to place the Dockerfile.
-tmp_dir="$build_dir"/temporary-docker-container-dir
-
-
-
-
-# Directory preparations
-# ----------------------
-#
-# If the host operating system has '/dev/shm', then give Docker access
-# to it also for improved speed in some scenarios (like configuration).
-if [ -d /dev/shm ]; then shm_mnt="-v /dev/shm:/dev/shm";
-else shm_mnt=""; fi
-
-# If the following directories do not exist within the build directory,
-# create them to make sure the '--mount' commands always work and
-# that any file. Ideally, the 'input' directory should not be under the 'build'
-# directory, but if the user hasn't given it then they don't care about
-# potentially deleting it later (Maneage will download the inputs), so put
-# it in the build directory.
-analysis_dir="$build_dir"/analysis
-if ! [ -d $analysis_dir ]; then mkdir $analysis_dir; fi
-
-# If no '--source-dir' was given, set it to the output of 'pwd -P' (to get
-# the path without potential symbolic links) in the running directory.
-if [ x"$source_dir" = x ]; then source_dir=$(pwd -P); fi
-
-# Only when an an input directory is given, we need the respective 'mount'
-# option for the 'docker run' command.
-input_dir_mnt=""
-if ! [ x"$input_dir" = x ]; then
-    input_dir_mnt="-v $input_dir:/home/maneager/input"
-fi
-
-# If no '--jobs' has been specified, use the maximum available jobs to the
-# operating system.
-if [ x$jobs = x ]; then jobs=$(nproc); fi
-
-# [DOCKER-ONLY] Make sure the user is a member of the 'docker' group:
-glist=$(groups $(whoami) | awk '/docker/')
-if [ x"$glist" = x ]; then
-    printf "$scriptname: you are not a member of the 'docker' group "
-    printf "You can run the following command as root to fix this: "
-    printf "'usermod -aG docker $(whoami)'\n"
-    exit 1
-fi
-
-# [DOCKER-ONLY] Function to check the temporary directory for building the
-# base operating system docker image. It is necessary that this directory
-# be empty because Docker will inherit the sub-directories of the directory
-# that the Dockerfile is located in.
-tmp_dir_check () {
-    if [ -d $tmp_dir ]; then
-        printf "$scriptname: '$tmp_dir' already exists, please "
-        printf "delete it and re-run this script. This is a temporary "
-        printf "directory only necessary when building a Docker image "
-        printf "and gets deleted automatically after a successful "
-        printf "build. The fact that it remains hints at a problem "
-        printf "in a previous attempt to build a Docker image\n"
-        exit 1
-    else
-        mkdir $tmp_dir
-    fi
-}
-
-
-
-
-
-# Base operating system
-# ---------------------
-#
-# If the base image does not exist, then create it. If it does, inform the
-# user that it will be used.
-if docker image list | grep $base_name &> /dev/null; then
-    if [ $quiet = 0 ]; then
-        printf "$scriptname: info: base OS docker image ('$base_name') "
-        printf "already exists and will be used. If you want to build a "
-        printf "new base OS image, give a new name to '--base-name'. "
-        printf "To remove this message run with '--quiet'\n"
-    fi
-else
-
-    # In case an image file is given, load the environment from that (no
-    # need to build the environment from scratch).
-    if ! [ x"$image_file" = x ] && [ -f "$image_file" ]; then
-        docker load --input $image_file
-    else
-
-        # Build the temporary directory.
-        tmp_dir_check
-
-        # Build the Dockerfile.
-        uid=$(id -u)
-        cat <<EOF > $tmp_dir/Dockerfile
-FROM $base_os
-RUN useradd -ms /bin/sh --uid $uid maneager; \\
-    printf '123\n123' | passwd maneager; \\
-    printf '456\n456' | passwd root
-RUN apt update; apt install -y gcc g++ wget; echo 'export PS1="[\[\033[01;31m\]\u@\h \W\[\033[32m\]\[\033[00m\]]# "' >> ~/.bashrc
-USER maneager
-WORKDIR /home/maneager
-RUN mkdir build; mkdir build/analysis; echo 'export PS1="[\[\033[01;35m\]\u@\h \W\[\033[32m\]\[\033[00m\]]$ "' >> ~/.bashrc
-EOF
-
-        # Build the base-OS container and delete the temporary directory.
-        curdir="$(pwd)"
-        cd $tmp_dir
-        docker build ./ \
-               -t $base_name \
-               --shm-size=$shm_size
-        cd "$curdir"
-        rm -rf $tmp_dir
-    fi
-fi
-
-
-
-
-
-# Maneage software configuration
-# ------------------------------
-#
-# Having the base operating system in place, we can now construct the
-# project's docker file.
-if docker image list | grep $project_name &> /dev/null; then
-    if [ $quiet = 0 ]; then
-        printf "$scriptname: info: project's image ('$project_name') "
-        printf "already exists and will be used. If you want to build a "
-        printf "new project image, give a new name to '--project-name'. "
-        printf "To remove this message run with '--quiet'\n"
-    fi
-else
-
-    # Build the temporary directory.
-    tmp_dir_check
-    df=$tmp_dir/Dockerfile
-
-    # The only way to mount a directory inside the Docker build environment
-    # is the 'RUN --mount' command. But Docker doesn't recognize things
-    # like symbolic links. So we need to copy the project's source under
-    # this temporary directory.
-    sdir=source
-    mkdir $tmp_dir/$sdir
-    dsr=/home/maneager/source-raw
-    cp -r $source_dir/* $source_dir/.git $tmp_dir/$sdir
-
-    # Start constructing the Dockerfile.
-    #
-    # Note on the printf's '\x5C\n' part: this will print out as a
-    # backslash at the end of the line to allow easy human readability of
-    # the Dockerfile (necessary for debugging!).
-    echo "FROM $base_name" > $df
-    printf "RUN --mount=type=bind,source=$sdir,target=$dsr \x5C\n" >> $df
-
-    # If a software directory was given, copy it and add its line.
-    tsdir=tarballs-software
-    dts=/home/maneager/tarballs-software
-    if ! [ x"$software_dir" = x ]; then
-
-        # Make the directory to host the software and copy the contents
-        # that the user gave there.
-        mkdir $tmp_dir/$tsdir
-        cp -r "$software_dir"/* $tmp_dir/$tsdir/
-        printf " --mount=type=bind,source=$tsdir,target=$dts \x5C\n" >> $df
-    fi
-
-    # Construct the rest of the 'RUN' command.
-    printf " cp -r $dsr /home/maneager/source; \x5C\n" >> $df
-    printf " cd /home/maneager/source; \x5C\n" >> $df
-    printf " ./project configure --jobs=$jobs \x5C\n" >> $df
-    printf " --build-dir=/home/maneager/build \x5C\n" >> $df
-    printf " --input-dir=/home/maneager/input \x5C\n" >> $df
-    printf " --software-dir=$dts; \x5C\n" >> $df
-
-    # We are deleting the '.build/software/tarballs' directory because this
-    # directory is not relevant for the analysis of the project. But in
-    # case any tarball was downloaded, it will consume space within the
-    # container.
-    printf " rm -rf .build/software/tarballs; \x5C\n" >> $df
-
-    # We are deleting the source directory becaues later (at 'docker run'
-    # time), the 'source' will be mounted directly from the host operating
-    # system.
-    printf " cd /home/maneager; \x5C\n" >> $df
-    printf " rm -rf source\n" >> $df
-
-    # Build the Maneage container and delete the temporary directory. The
-    # '--progress plain' option is for Docker to print all the outputs
-    # (otherwise, it will only print a very small part!).
-    cd $tmp_dir
-    docker build ./ -t $project_name \
-           --progress=plain \
-           --shm-size=$shm_size \
-           --no-cache \
-           2>&1 | tee build.log
-    cd ..
-    rm -rf $tmp_dir
-fi
-
-# If the user wants to save the container (into a file that does not
-# exist), do it here. If the file exists, it will only be used for creating
-# the container in the previous stages.
-if ! [ x"$image_file" = x ] && ! [ -f "$image_file" ]; then
-
-    # Save the image into a tarball
-    tarname=$(echo $image_file | sed -e's|.gz$||')
-    if [ $quiet = 0 ]; then
-        printf "$scriptname: info: saving docker image to '$tarname'"
-    fi
-    docker save -o $tarname $project_name
-
-    # Compress the saved image
-    if [ $quiet = 0 ]; then
-        printf "$scriptname: info: compressing to '$image_file' (can "
-        printf "take +10 minutes, but volume decreases by more than half!)"
-    fi
-    gzip --best $tarname
-fi
-
-# If the user just wanted to build the base operating system, abort the
-# script here.
-if ! [ x"$build_only" = x ]; then
-    if [ $quiet = 0 ]; then
-        printf "$scriptname: info: Maneaged project has been configured "
-        printf "successfully in the '$project_name' image"
-    fi
-    exit 0
-fi
-
-
-
-
-
-# Run the analysis within the Maneage'd container
-# -----------------------------------------------
-#
-# The startup command of the container is managed though the 'shellopt'
-# variable that starts here.
-shellopt=""
-if [ $container_shell = 1 ] || [ $project_shell = 1 ]; then
-
-    # If the user wants to start the project shell within the container,
-    # add the necessary command.
-    if [ $project_shell = 1 ]; then
-        shellopt="/bin/bash -c 'cd source; ./project shell;'"
-    fi
-
-    # Finish the 'shellop' string with a single quote (necessary in any
-    # case) and run Docker.
-    interactiveopt="-it"
-
-# No interactive shell requested, just run the project.
-else
-    interactiveopt=""
-    shellopt="/bin/bash -c 'cd source; ./project make --jobs=$jobs;'"
-fi
-
-# Execute Docker. The 'eval' is because the 'shellopt' variable contains a
-# single-quote that the shell should "evaluate".
-eval docker run \
-     -v "$analysis_dir":/home/maneager/build/analysis \
-     -v "$source_dir":/home/maneager/source \
-     $input_dir_mnt \
-     $shm_mnt \
-     $interactiveopt \
-     $project_name \
-     $shellopt |
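Reviewer note on the file-name handling in the two removed scripts: both derive a name by piping through 'sed' (the hostname from the SIF path via "sed -e's|.sif$||'", the tarball name via "sed -e's|.gz$||'"). The dot in those patterns is a regular-expression wildcard, so it matches any character before the suffix, not only a literal '.'. The sketch below shows one possible alternative using only POSIX parameter expansion, which strips the literal suffix and needs no external process. The function names 'sif_hostname' and 'tar_name' are hypothetical and are not part of Maneage; this is an illustration, not the code that replaced these scripts.

```shell
#!/bin/sh
# Hypothetical helpers (NOT part of Maneage): the same two file-name
# manipulations as the removed scripts, done with POSIX parameter
# expansion instead of 'awk' and 'sed'.

# Derive a container hostname from the path of a SIF image:
# '/path/to/project.sif' becomes 'project'.
sif_hostname() {
    name=${1##*/}                   # Drop everything up to the last '/'.
    printf '%s\n' "${name%.sif}"    # Drop a literal '.sif' suffix, if any.
}

# Derive the uncompressed tarball name from an '--image-file' value:
# 'image.tar.gz' becomes 'image.tar'.
tar_name() {
    printf '%s\n' "${1%.gz}"        # Drop a literal '.gz' suffix, if any.
}

sif_hostname /data/images/maneaged.sif     # prints: maneaged
tar_name maneaged.tar.gz                   # prints: maneaged.tar
```

A name that merely ends in 'sif' without the dot (for example 'foo_sif') is left untouched by this version, whereas the 'sed' pattern would shorten it to 'foo'.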