Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 481
1 file changed, 130 insertions, 351 deletions
@@ -1,7 +1,7 @@
Reproducible source for XXXXXXXXXXXXXXXXX
-------------------------------------------------------------------------
Copyright (C) 2018-2025 Mohammad Akhlaghi <mohammad@akhlaghi.org>\
See the end of the file for license conditions.

This is the reproducible project source for the paper titled "**XXX XXXXX

@@ -188,411 +188,190 @@
analysis and finally create the final paper).

### Building in Docker containers

Docker containers are a common way to build projects in an independent
filesystem and an almost independent operating system. Containers thus
allow using GNU/Linux operating systems within proprietary operating
systems like macOS or Windows, but without the overhead and huge file size
of virtual machines. Furthermore, containers allow easy movement of built
projects from one system to another without rebuilding. Just note that
Docker images are large binary files (+1 Gigabyte) and may not be usable
in the future (for example with new Docker versions not reading old
images). Containers are thus good for the temporary/testing phases of a
project, but shouldn't be what you archive for the long term!

Hence, if you want to save and move your maneaged project within a Docker
image, be sure to commit all your project's source files and push them to
your external Git repository (you can do this within the Docker image as
explained below). This way, you can always recreate the container with
future technologies too. Generally, if you are developing within a
container, it is good practice to recreate it from scratch every once in a
while, to make sure you haven't forgotten to include parts of your work in
your project's version-controlled source. In the sections below we also
describe how you can use the container **only for the software
environment** and keep your data and project source on your host.

#### Dockerfile for a Maneaged project, and building a Docker image

Below is a series of recommendations on the various components of a
`Dockerfile` optimized to store the *built state of a maneaged project* as
a Docker image. Each component is accompanied by an explanation. Simply
copy the code blocks under each item into a plain-text file called
`Dockerfile`, in the same order as the items. Don't forget to implement
the suggested corrections (in particular step 4).

**NOTE: Internet for TeXLive installation:** If you have the project's
software tarballs and input data (optional features described below) you
can disable internet. In this situation, the configuration and analysis
will be exactly reproduced, the final LaTeX macros will be created, and
all results will be verified successfully. However, no final `paper.pdf`
will be created to visualize/combine everything in one easy-to-read
file. Until [task 15267](https://savannah.nongnu.org/task/?15267) is
complete, we need internet to install TeXLive packages (using TeXLive's
own package manager `tlmgr`) in the `./project configure` phase. This
won't stop the configuration, and it will finish successfully (since all
the analysis can still be reproduced). We are working on completing this
task as soon as possible, but until then, if you want to disable internet
*and* you want to build the final PDF, please disable internet after the
configuration phase. Note that only the necessary TeXLive packages are
installed (~350 MB), not the full TeXLive collection!
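For example, assuming a host where you can toggle networking yourself (a
minimal sketch using only the `./project` commands described in this
file), an offline reproduction could look like this:

```shell
# Configure the project with internet available: besides any missing
# software/data tarballs, 'tlmgr' downloads its TeXLive packages here.
./project configure

# Internet can now be disabled.

# Reproduce the full analysis (and build 'paper.pdf') offline.
./project make
```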
 0. **Summary:** If you are already familiar with Docker, the full
    Dockerfile to set up the project environment is shown here (without
    any comments or explanations, because the explanations come in the
    next items). Note that the last two `COPY` lines (copying the
    directory containing the software tarballs used by the project, and
    the possible input databases) are optional because they will be
    downloaded if not available. You can also avoid copying them over
    entirely and simply mount your host directories within the image; we
    have a separate section on doing this below ("Only software
    environment in the Docker image"). Once you build the Docker image,
    your project's environment is set up and you can go into it to run
    `./project make` manually.

    ```shell
    FROM debian:stable-slim
    RUN apt-get update && apt-get install -y gcc g++ wget
    RUN useradd -ms /bin/sh maneager
    USER maneager
    WORKDIR /home/maneager
    RUN mkdir build
    RUN mkdir software
    COPY --chown=maneager:maneager ./project-source /home/maneager/source
    COPY --chown=maneager:maneager ./software-dir /home/maneager/software
    COPY --chown=maneager:maneager ./data-dir /home/maneager/data
    RUN cd /home/maneager/source \
        && ./project configure --build-dir=/home/maneager/build \
                               --software-dir=/home/maneager/software \
                               --input-dir=/home/maneager/data
    ```

 1. **Choose the base operating system:** The first step is to select the
    operating system that will be used in the Docker image. Note that
    your choice of operating system also determines the commands of the
    next step to install core software.

    ```shell
    FROM debian:stable-slim
    ```

 2. **Maneage dependencies:** By default the "slim" versions of the
    operating systems don't contain a compiler (needed by Maneage to
    compile precise versions of all the tools). You thus need to use the
    selected operating system's package manager to import them (below is
    the command for Debian). Optionally, if you don't have the project's
    software tarballs and want the project to download them
    automatically, you also need a downloader.

    ```shell
    # C and C++ compiler.
    RUN apt-get update && apt-get install -y gcc g++

    # Uncomment this if you don't have 'software-XXXX.tar.gz' (below).
    #RUN apt-get install -y wget
    ```

 3. **Define a user:** Some core software packages will complain if you
    try to install them as the default (root) user. Generally, it is also
    good practice to avoid being the root user. After building the Docker
    image, you can always run it as root with this command: `docker run
    -u 0 -it XXXXXXX` (where `XXXXXXX` is the image identifier). Hence,
    with the commands below we define a `maneager` user and activate it
    for the next steps.

    ```shell
    RUN useradd -ms /bin/sh maneager
    USER maneager
    WORKDIR /home/maneager
    ```

 4. **Copy project files into the container:** These commands make the
    assumptions listed below. IMPORTANT: you can also avoid copying over
    everything and simply mount your host directories within the image;
    we have a separate section on doing this below ("Only software
    environment in the Docker image").

    * The project's source is in the `maneaged/` sub-directory and this
      directory is in the same directory as the `Dockerfile`. The source
      can either be cloned from Git (highly recommended!) or come from a
      tarball. Both are described above (note that arXiv's tarball needs
      to be corrected as mentioned above).
    * (OPTIONAL) By default the project's necessary software source
      tarballs will be downloaded when necessary during the `./project
      configure` phase. But if you already have the sources, it's better
      to use them and not waste network traffic (and the resulting carbon
      footprint!). Maneaged projects usually come with a
      `software-XXXX.tar.gz` file that is published on Zenodo (link
      above). If you have this file, put it in the same directory as your
      `Dockerfile` and include the relevant lines below.

    * (OPTIONAL) The project's input data. The `INPUT-FILES` depend on
      the project; please look into the project's
      `reproduce/analysis/config/INPUTS.conf` for the URLs and the file
      names of input data. Similar to the software source files mentioned
      above, if you don't have them, the project will attempt to download
      its necessary data automatically in the `./project make` phase.

    ```shell
    # Make the project's build directory and copy the project source.
    RUN mkdir build
    COPY --chown=maneager:maneager ./maneaged /home/maneager/source

    # Optional (for software).
    COPY --chown=maneager:maneager ./software-XXXX.tar.gz /home/maneager/
    RUN tar xf software-XXXX.tar.gz && mv software-XXXX software && rm software-XXXX.tar.gz

    # Optional (for data).
    RUN mkdir data
    COPY --chown=maneager:maneager ./INPUT-FILES /home/maneager/data
    ```

 5. **Configure the project:** With this line, the Docker image will
    configure the project (build all its necessary software). This will
    usually take about an hour on an 8-core system. You can also
    optionally avoid putting this step (and the next) in the `Dockerfile`
    and simply execute them in the Docker image in interactive mode (as
    explained in the sub-section below); in this case, don't forget to
    preserve the built container after you are done.

    ```shell
    # Configure project (build full software environment).
    RUN cd /home/maneager/source \
        && ./project configure --build-dir=/home/maneager/build \
                               --software-dir=/home/maneager/software \
                               --input-dir=/home/maneager/data
    ```

 6. **Project's analysis:** With this line, the Docker image will do the
    project's analysis and produce the final `paper.pdf`. The time it
    takes for this step to finish, and the storage/memory requirements,
    highly depend on the particular project.

    ```shell
    # Run the project's analysis.
    RUN cd /home/maneager/source && ./project make
    ```

 7. **Build the Docker image:** The `Dockerfile` is now ready! In the
    terminal, go to its directory and run the command below to build the
    Docker image. We recommend keeping the `Dockerfile` in **an empty
    directory** and running it from inside that directory too, because
    Docker considers that directory's contents to be part of the build
    environment. Finally, just set a `NAME` for your project and note
    that Docker only runs as root. A hypothetical staging sketch follows
    this list.

    ```shell
    sudo su
    docker build -t NAME ./
    ```
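For instance, a hypothetical staging sequence for the build context
described in the items above might look like this (the host-side paths
are only placeholders; adapt them to your own system):

```shell
# Stage a dedicated, otherwise-empty build-context directory (item 7).
mkdir maneaged-docker
cd maneaged-docker

# Copy the project source and (optionally) the software tarball (item 4).
cp -r /path/to/your/project ./maneaged
cp /path/to/software-XXXX.tar.gz ./

# Write the 'Dockerfile' from the items above into this directory,
# then build the image (as root, see item 7).
sudo docker build -t NAME ./
```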
#### Interactive tests on built container

If you later want to start a container with the built image and enter it
in interactive mode (for example for temporary tests), please run the
following command. Just replace `NAME` with the same name you specified
when building the project. You can always exit the container with the
`exit` command (note that all your changes will be discarded once you
exit; see below if you want to preserve your changes after you exit).

```shell
docker run -it NAME
```

#### Running your own project's shell for same analysis environment

The default operating system only has minimal features, lacking many of
the tools you are accustomed to in your daily command-line operations.
But your maneaged project has a very complete (for the project!)
environment which is fully built and ready to use interactively with the
commands below. For example, the project also builds Git within itself,
as well as many other high-level tools that are used in your project and
aren't present in the container's operating system.

```shell
# Once you are in the Docker container.
cd source
./project shell
```

#### Preserving the state of a built container

All interactive changes in a container will be deleted as soon as you
exit it. THIS IS A VERY GOOD FEATURE IN GENERAL! If you want to make
persistent changes, you should do it in the project's plain-text source
and commit them into your project's online Git repository. As described
in the Docker introduction above, we strongly recommend **not relying on
a built container for archival purposes**.

But for temporary tests it is sometimes good to preserve the state of an
interactive container. To do this, you need to `commit` the container
(and thus save it as a Docker "image"). While the container is still
running, open another terminal and run these commands:

```shell
# These two commands should be run in another terminal.
docker container list

# Get 'XXXXXXX' of your desired container from the first column above.
# Give the new image a name by replacing 'NEW-IMAGE-NAME'.
docker commit XXXXXXX NEW-IMAGE-NAME
```

#### Copying files from the Docker image to host operating system

The Docker environment's file system is completely independent of your
host operating system. One easy way to copy files to and from an open
container is to use the `docker cp` command (very similar to the shell's
`cp` command).

```shell
docker cp CONTAINER:/file/path/within/container /host/path/target
```

#### Only software environment in the Docker image

You can set up the Docker image to only contain the software environment
and keep the project source and built analysis files (data and PDF) on
your host operating system. This enables you to keep the size of the
Docker image to a minimum (only containing the built software
environment) to easily move it from one computer to another. Below we'll
summarize the steps.

1. Get your user ID with this command: `id -u`.

2. Put the following lines into a `Dockerfile` of an otherwise empty
directory, replacing `UID` with your user ID (found in the step
above). This will build the basic directory structure for the next steps.

```shell
FROM debian:stable-slim
RUN apt-get update && apt-get install -y gcc g++ wget
RUN useradd -ms /bin/sh --uid UID maneager
USER maneager
WORKDIR /home/maneager
RUN mkdir build
```

3. Create an image based on the `Dockerfile` above. Just replace
`PROJECT` with your desired name.

```shell
docker build -t PROJECT ./
```

4. Run the command below to create a container based on the image and
mount the desired directories on your host into the special directories
of your container. Just don't forget to replace `PROJECT` and set the
`/PATH`s to the respective paths in your host operating system.

```shell
docker run -v /PATH/TO/PROJECT/SOURCE:/home/maneager/source \
           -v /PATH/TO/PROJECT/ANALYSIS/OUTPUTS:/home/maneager/build/analysis \
           -v /PATH/TO/SOFTWARE/SOURCE/CODE/DIR:/home/maneager/software \
           -v /PATH/TO/RAW/INPUT/DATA:/home/maneager/data \
           -it PROJECT
```

5. After running the command above, you are within the container. Go into
the project source directory and run these commands to build the software
environment.

```shell
cd /home/maneager/source
./project configure --build-dir=/home/maneager/build \
                    --software-dir=/home/maneager/software \
                    --input-dir=/home/maneager/data
```

6. After the configuration finishes successfully, it will say so and ask
you to run `./project make`. But don't do that yet. Keep this Docker
container open and don't exit the container or terminal. Open a new
terminal, and follow the steps described in the sub-section above to
preserve the built container as a Docker image. Let's assume you call it
`PROJECT-ENV`. After the new image is made, you should be able to see it
in the list of images with this command (in the terminal where you
created the image):

```shell
docker image list    # In the other terminal.
```

7. Now you can run `./project make` in the initial container. You will
see that all the built products (temporary or final datasets or PDFs)
will be written in the `/PATH/TO/PROJECT/ANALYSIS/OUTPUTS` directory of
your host. You can even change the source of your project on your host
operating system and re-run Make to see the effect on the outputs, and
add/commit the changes to your Git history within your host. You can also
exit the container at any time. You can later load the `PROJECT-ENV`
environment image into a new container with the same `docker run -v ...`
command above; just use `PROJECT-ENV` instead of `PROJECT`.

8. In case you want to store the image as a single file as a backup or to
move it to another computer, you can run the commands below. They will
produce a single `project-env.tar.gz` file.

```shell
docker save -o project-env.tar PROJECT-ENV
gzip --best project-env.tar
```

9. To load the tarball above into a clean Docker environment (either on
the same system or on another system), run the command below; you can
then create a new container from the image as above (with the `docker run
-v ...` command). Just don't forget that if your
`/PATH/TO/PROJECT/ANALYSIS/OUTPUTS` directory is empty on the new/clean
system, you should first run `./project configure -e` in the Docker image
so it builds the core file structure there. Don't worry, it won't build
any software and should finish in a second or two. Afterwards, you can
safely run `./project make`.

```shell
docker load --input project-env.tar.gz
```

#### Deleting all Docker images

After doing your tests/work, you may no longer need the multi-gigabyte
image files, so it's best to just delete them. To do this, just run the
two commands below to first remove all the containers and then delete all
the images:

```shell
docker ps -a -q | xargs docker rm
docker images -a -q | xargs docker rmi -f
```

### Building on older systems (+10 year old compilers)

Maneage builds almost all its software itself. But to do that, it needs a
C and C++ compiler on the host. The C++ standard in particular is updated
regularly. Therefore, gradually the basic software packages (that are
used to build the internal Maneage C compiler and other necessary tools)
will start using the newer language features in their newer versions. As
a result, if a host doesn't update its compiler for more than a decade,
some of the basic software may not get built.

Note that this is only a problem for the "basic" software of Maneage
(that is used to build Maneage's own C compiler), not the high-level (or
science) software. On GNU/Linux systems, the high-level software gets
built with Maneage's own C compiler. Therefore, once Maneage's C compiler
is built, you don't need to worry about the versions of the high-level
software.

One solution in such cases is to downgrade the versions of the basic
software that can't be built. For example, when building Maneage in
August 2022 on an old Debian GNU/Linux system from 2010 (with GCC 4.4.5,
GNU C Library 2.11.3 and Linux kernel 2.6.32-5 on an amd64 architecture),
the following low-level packages needed to be downgraded to slightly
earlier versions.

| Program name                  | 2022-08 version | Version for old system |
|:------------------------------|:---------------:|:----------------------:|
| PatchELF                      | 0.13            | 0.9                    |
| GNU Binutils                  | 2.39            | 2.37                   |
| GNU Compiler Collection (GCC) | 12.1.0          | 10.2.0                 |

As you can see above, fortunately most of the basic software in Maneage
respects +10 year old compilers and is buildable with them. So your
higher-level science software should be buildable without changing its
version. It is _highly improbable_ that these downgrades will affect your
final science result.
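As a sketch of how such a downgrade might look in practice (assuming the
Maneage convention of keeping software versions in
`reproduce/software/config/versions.conf`; confirm the file and variable
names in your own checkout), the three packages of the table above would
be pinned like this:

```shell
# Hypothetical check of the pinned versions in versions.conf (the
# variable names follow the 'NAME-version' convention assumed above).
$ grep -E '^(patchelf|binutils|gcc)-version' \
       reproduce/software/config/versions.conf
patchelf-version = 0.9
binutils-version = 2.37
gcc-version = 10.2.0
```

After editing the versions, re-run the configuration step (`./project
configure`) so the affected packages, and everything depending on them,
are rebuilt.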
### Building on ARM

As of 2021-10-13, very little testing of Maneage has been done on arm64
(tested on [aarch64](https://en.wikipedia.org/wiki/AArch64)). However,
_some_ testing has been done on [the
PinePhone](https://en.wikipedia.org/wiki/PinePhone), running
[Debian/Mobian](https://wiki.mobian-project.org/doku.php?id=pinephone).
In principle, the default Maneage branch (not all high-level software
have been tested) should run fully (configure + make) from the raw source
to the final verified PDF. Some issues that you might need to be aware of
are listed below.

#### Older packages

Old packages that may still be needed and that have an old
`config.guess` file (e.g., from 2002, such as fftw2-2.1.5-4.2; these are
not in the base Maneage branch) may crash during the build. A workaround
is to provide an updated (e.g., 2018) `config.guess` file (generated with
`automake --add-missing --force-missing --copy`) in
`reproduce/software/patches/` and copy it over the old file during the
build of the package.
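A hypothetical shell session implementing this workaround (the target
file name under `reproduce/software/patches/` is only an illustration;
match it to how the build recipe of your package copies patches):

```shell
# In a package directory with a recent GNU Automake installed, generate
# fresh auxiliary scripts (among them an up-to-date 'config.guess').
automake --add-missing --force-missing --copy

# Keep the updated file with the project's patches, so the package's
# build recipe can copy it over the outdated one during the build.
cp config.guess reproduce/software/patches/config.guess
```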
#### An un-killable running job

Vampires may be a problem on the PinePhone/aarch64. A "vampire" is
defined here as a job that is in the "R" (running) state, using nearly
95-100% of a CPU for an extremely long time (hours) without producing any
output to its log file, and that is immune to being killed by the user or
root with `kill -9`. A reboot and relaunch of the `./project configure
--existing-conf` command is the only solution currently known (as of
2021-10-13) for vampires. These are known to have occurred with
linux-image-5.13-sunxi64.

#### RAM/swap space

Adding at least 3 GB of swap space (see `man swapon`, `man mkswap` and
`man dd`) on the eMMC may help to reduce the chance of having errors due
to the lack of RAM.
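For example, a minimal sketch of adding a 3 GB swap file (run as root;
the file name and location are arbitrary, see the manuals referenced
above for the details):

```shell
# Allocate a 3 GiB file, restrict its permissions, format it as swap
# space and activate it.
dd if=/dev/zero of=/swapfile bs=1M count=3072
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
```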
#### Time scale

On the PinePhone v1.2b, apart from the time wasted by vampires, expect
roughly 24 hours' wall time in total for the full `configure` phase. The
default `maneage` example calculations, diagrams and PDF production are
light and should be very fast.

### Building in containers

Containers are a common way to build projects in an independent
filesystem and an almost independent operating system without the
overhead (in size and speed) of a virtual machine. As a result,
containers allow easy movement of built projects from one system to
another without rebuilding. However, they are still large binary files
(+1 Gigabyte) and may not be usable in the future (for example with new
software versions not reading old images, or old/new kernel
issues). Containers are thus good for the execution/testing phases of a
project, but shouldn't be what you archive for the long term!

It is therefore very important that if you want to save and move your
maneaged project within containers, you commit all your project's source
files and push them to your external Git repository (you can do this
within the container as explained below). This way, you can always
recreate the container with future technologies too. Generally, if you
are developing within a container, it's good practice to recreate it from
scratch every once in a while, to make sure you haven't forgotten to
include parts of your work in your project's version-controlled
source. In the sections below we also describe how you can use the
container **only for the software environment** and keep your data and
project source on your host.

If you have the necessary software tarballs and input data (optional
features described below) you can disable internet. In this situation,
the configuration and analysis will be exactly reproduced, the final
LaTeX macros will be created, and all results will be verified
successfully. However, no final `paper.pdf` will be created to
visualize/combine everything in one easy-to-read file. Until [task
15267](https://savannah.nongnu.org/task/?15267) is complete, Maneage only
needs internet to install TeXLive packages (using TeXLive's own package
manager `tlmgr`) in the `./project configure` phase. This won't stop the
configuration (since all the analysis can still be reproduced). We are
working on completing this task as soon as possible, but until then, if
you want to disable internet *and* you want to build the final PDF,
please disable internet after the configuration phase. Note that only the
necessary TeXLive packages are installed (~350 MB), not the full TeXLive
collection!

The container technologies that Maneage has a high-level interface for
(through the `reproduce/software/shell` directory) are listed below; a
hypothetical usage sketch follows the list. Each has a dedicated shell
script in that directory with an (almost) identical interface. See the
respective `*-README.md` file in that directory for more details; you can
also run your desired script with `--help`, or read the comments at the
top of its file.

  - [Apptainer](https://apptainer.org): useful in high performance
    computing (HPC) facilities (where you do not have root
    permissions). Apptainer is fully free and open source software.
    Apptainer containers can only be created and used on GNU/Linux
    operating systems, but are stored as a single file (very easy to
    manage).

  - [Docker](https://www.docker.com): requires root access, but useful on
    virtual private servers (VPSs). Docker images are stored and managed
    by a root-level daemon, so you can only manage them through its own
    interface (making containers built by any user visible and accessible
    to all other users of a system by default). A Docker container built
    on a GNU/Linux host can also be executed on Windows or macOS.
    However, while the Docker engine and its command-line interface on
    GNU/Linux are free and open source software, its desktop application
    (with a GUI and components necessary for Windows or macOS) is not (it
    requires payment for large companies).
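As a rough illustration of the shared interface (the script names here
are assumptions; check the actual file and option names in
`reproduce/software/shell` and its `*-README.md` files):

```shell
# Read the built-in help of the (hypothetical) container scripts.
./reproduce/software/shell/docker.sh --help
./reproduce/software/shell/apptainer.sh --help
```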
## Copyright information

This file is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This file is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along
with this file. If not, see <https://www.gnu.org/licenses/>.