README.md: improved explanation on running Docker

With the new features in Maneage to install the necessary Xorg libraries, the explanations of the Docker image creation also needed to be updated.
author: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-06-30 15:21:02 +0100
committer: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-06-30 15:21:02 +0100
commit: 5b8bf4bcd0aee71b3d8ca76fca7dc7bd6b51713b (patch)
tree: 8f9b5e4626e12ecf3bdffc0547216cb131779766 /README.md
parent: da5678f7d23cc1142b9a1236389eb1f52e0a5678 (diff)
1 files changed, 124 insertions, 57 deletions
diff --git a/README.md b/README.md
index 3de9300..909ab8a 100644
--- a/README.md
+++ b/README.md
@@ -196,63 +196,74 @@ finally create the final paper).
 ### Building in Docker containers
 
 Docker containers are a common way to build projects in an almost
-independent filesystem and operating system. They also allow using a
-minimal GNU/Linux operating system for each project within proprietary
-operating systems like macOS or Windows. The steps below describe the
-necessary components of a `Dockerfile` to build this project in a Docker
-image with some explanations on each. You can just copy the code parts of
-each item into a plain-text file called `Dockerfile` and apply the
-necessary corrections in the copying phase (step 4), then run this command
-to build the Docker image (note that Docker only runs as root!):
+independent filesystem, and almost independent operating system without the
+overheads of a virtual machine. They also allow using a minimal GNU/Linux
+operating system for each project within proprietary operating systems like
+macOS or Windows. Furthermore, they make it easy to move a built project
+from one system to another. Please note that Docker images are large
+binary files (over 1 gigabyte) and may not be usable in the future. They are
+mainly good for temporary/testing phases of a project. Hence, if you want to
+save and move your maneaged project as a Docker image, be sure to commit
+all your project's source files and push them to your external Git
+repository (you can do this within the Docker image, as explained below).
+
+#### Constructing the Dockerfile for a maneaged project and building it
+
+Below is a series of recommendations on the various components of a
+`Dockerfile` optimised to store the *built state of a maneaged project* as
+a Docker image. Each component is also accompanied by
+explanations. Simply copy the code blocks under each item into a plain-text
+file in the same order and implement the corrections mentioned in each step
+(in particular step 4). Then save the plain-text file as `Dockerfile` and
+run the following command to build the Docker image. Just set a `NAME` for
+your project and note that Docker only runs as root.
 
 ```shell
-docker build ./
+docker build -t NAME ./
 ```
 
-**NOTE: Internet necessary for TeXLive:** With the commands below in your
-`Dockerfile`, you can disable the image's internet just after downloading
-the necessary packages (step 2). However, until [task
-15267](https://savannah.nongnu.org/task/?15267) is complete, the project
-will need internet access to download the necessary TeXLive packages (in
-the `./project configure` phase) to build the final PDF. Without TeXLive,
-the analysis will be exactly reproduced, LaTeX macros will be created and
-everything will be verified successfully (all in the build directory),
-however, no PDF will be built to visualize/combine them in one file.
+**NOTE: Internet necessary for TeXLive:** You can optionally disable the
+image's internet just after downloading the necessary packages (step
+2). However, until [task 15267](https://savannah.nongnu.org/task/?15267) is
+complete, the project will need internet access to download the necessary
+TeXLive packages in the `./project configure` phase. TeXLive is needed to
+build the final PDF. Without TeXLive, the analysis will be exactly
+reproduced, LaTeX macros will be created and everything will be verified
+successfully (all in the build directory). However, no PDF will be built to
+visualize/combine them in one easy-to-read file.
 
  1. **Choose the base operating system:** The first step is to select the
     operating system that will be used in the docker image. Note that your
     choice of operating system also determines the commands of the next
-    step.
+    step to install core software.
 
-    ```
+    ```shell
     FROM debian:stable-slim
     ```
 
- 2. **Necessary packages:** By default the "slim" versions of the operating
-    systems don't contain a compiler, so you need to use their package
-    managers to get them. Also, currently (until [task
-    15481](https://savannah.nongnu.org/task/?15481) is complete), Maneage
-    doesn't yet build Xorg libraries that are necessary in tools like
-    Ghostscript to build PDFs (not related to the project's analysis).
+ 2. **The C/C++ compiler:** By default the "slim" versions of the operating
+    systems don't contain a compiler, so you need to use the selected
+    operating system's package manager to include them. It is also
+    recommended to include your favorite text editor so you can modify the
+    project's source files if necessary.
 
-    ```
+    ```shell
     # C and C++ compiler.
     RUN apt-get update && apt-get install -y gcc g++
 
-    # Necessary Xorg libraries (which aren't yet installed, see task 15481).
-    RUN apt-get install -y libxext-dev libxt-dev libsm-dev libice-dev
-
-    # Uncomment this for a text editor (to modify files after image is built).
+    # Uncomment this to add a text editor (to modify files later).
     #RUN apt-get install -y nano
     ```
 
- 3. **Define a user:** Some packages will complain if you try to install
-    them as the default (root) container user. Generally, its also good
-    practice to avoid being the root user. After building the Docker image,
-    you can always run it as root with this command: `docker run -u 0 -it
-    XXXXXXX` (where `XXXXXXX` is the image identifier).
+ 3. **Define a user:** Some core software packages will complain if you try
+    to install them as the default (root) container user. Generally, it is
+    also good practice to avoid being the root user. After building the
+    Docker image, you can always run it as root with this command: `docker
+    run -u 0 -it XXXXXXX` (where `XXXXXXX` is the image identifier). With
+    the commands below we define a `maneager` user and activate it for the
+    next steps.
 
-    ```
+    ```shell
     RUN useradd -ms /bin/sh maneager
     USER maneager
     WORKDIR /home/maneager
@@ -263,37 +274,45 @@ however, no PDF will be built to visualize/combine them in one file.
 
     * The project's source is in the `maneaged-project/` subdirectory of
       the directory that you will run `docker build` in. The source can
-      either from Git or from a tarball, both described above (note that
-      arXiv's tarball needs to be corrected as mentioned above).
-
-    * (OPTIONAL, with internet) The project's software tarball (packaged in
-      `software-XXXX.tar.gz` and downloadable from the Zenodo link above,
-      just correct the `XXXX` part manually) is the same directory that you
-      will run `docker build` in. This is not mandatory: if you have
-      internet, the project will download its necessary software
-      automatically.
+      either be cloned from Git (recommended!) or come from a tarball. Both
+      are described above (note that arXiv's tarball needs to be corrected
+      as mentioned above).
+
+    * (OPTIONAL, with internet) By default the project's necessary software
+      source tarballs will be downloaded when necessary during the
+      `./project configure` phase. But if you already have the sources, it is
+      better to use them and not waste network traffic (and the resulting
+      carbon footprint!). Maneaged projects usually come with a
+      `software-XXXX.tar.gz` file that is published on Zenodo (link above).
+      If you have this file, put it in the same directory as your
+      `Dockerfile` and include the relevant lines below.
 
     * (OPTIONAL, with internet) The project's input data. The `INPUT-FILES`
       depends on the project, please look into the project's
       `reproduce/analysis/config/INPUTS.conf` for the URLs and file
-      names. This is not mandatory: if you have internet, the project will
-      download its necessary software automatically.
+      names. Similar to the software source files, this is not mandatory:
+      if you have internet, the project will download its necessary
+      input data automatically in the `./project make` phase.
 
-    ```
+    ```shell
+    # Make the project's build directory and copy the project source
     RUN mkdir build
     COPY --chown=maneager:maneager ./maneaged-project /home/maneager/source
 
-    # Optional (for software and data, if internet is available)
-    RUN mkdir data
-    COPY --chown=maneager:maneager ./INPUT-FILES /home/maneager/data
+    # Optional (for software)
     COPY --chown=maneager:maneager ./software-XXXX.tar.gz /home/maneager/
     RUN tar xf software-XXXX.tar.gz && mv software-XXXX software && rm software-XXXX.tar.gz
+
+    # Optional (for data)
+    RUN mkdir data
+    COPY --chown=maneager:maneager ./INPUT-FILES /home/maneager/data
     ```
 
- 5. **Configure the project:** The Docker image will configure the project
-    (let the project build all its necessary software).
+ 5. **Configure the project:** With this line, the Docker image will
+    configure the project (let the project build all its necessary
+    software). This will usually take about an hour on an 8-core system.
 
-    ```
+    ```shell
     RUN cd /home/maneager/source \
            && ./project configure --build-dir=/home/maneager/build \
                                   --software-dir=/home/maneager/software \
@@ -301,14 +320,62 @@ however, no PDF will be built to visualize/combine them in one file.
     ```
 
  6. **Do the project's analysis:** You are now ready to add the instruction
-    to automatically reproduce the project's analysis.
+    to automatically reproduce the project's analysis. The length of this
+    step and the storage/memory requirements depend heavily on the
+    particular project.
 
-    ```
+    ```shell
     RUN cd /home/maneager/source && ./project make
     ```
 
+#### Interactive tests on built container
 
+If you later want to start a container with the built image and enter it in
+interactive mode (for example for temporary tests), please run the
+following command. Just replace `NAME` with the same name you specified
+when building the project. You can always exit the container with the
+`exit` command.
 
+```shell
+docker run -it NAME
+```
+
+#### Running your own project's shell for the same analysis environment
+
+The default operating system has minimal features. You can enter the
+maneaged project's source directory and use the project's shell to
+have the same environment as your running project (with easy access to all
+the software built in the project). For example, the project builds Git
+within itself as well as many other tools that aren't present in the core
+operating system.
+
+```shell
+# Once you are in the docker container
+cd source
+./project shell
+```
+
+#### Preserving the state of a built container
+
+All changes you make in interactive mode will be deleted as soon as you exit
+the container. THIS IS A VERY GOOD FEATURE! In general, if you want to make
+persistent changes, you should do it in the project's plain-text source and
+commit it into your project's online Git repository. But you can also do
+this within the built container.
+
+If you want to preserve the state of your changes after your `exit`, you
+need to `commit` the container (and thus save it as a Docker "image"). To
+do this, while the container is still running, in another terminal, run
+these commands:
+
+```shell
+# These two commands should be done in another terminal
+docker container list
+
+# Get 'XXXXXXX' from the first column of output above.
+# Give the new image a name by replacing 'NEW-IMAGE-NAME'.
+docker commit XXXXXXX NEW-IMAGE-NAME
+```
 
 
 ### Copyright information
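
The Docker commands introduced in this patch (build, interactive run, run as root, and commit) could be wrapped in a few small shell functions. This is only a sketch and not part of the commit: the function names and the `DOCKER` override variable are hypothetical, chosen here for illustration. Setting `DOCKER=echo` prints each command instead of running it, which is handy for checking the arguments before touching a real image.

```shell
# Hypothetical helpers (names are illustrative, not part of Maneage).
# DOCKER defaults to 'docker'; set DOCKER=echo for a dry run.

# Build the image from the Dockerfile in the current directory.
# $1: the NAME to give the image.
maneage_build() {
    ${DOCKER:-docker} build -t "$1" ./
}

# Start an interactive container from a built image (leave it with 'exit').
# $1: the NAME given when building.
maneage_enter() {
    ${DOCKER:-docker} run -it "$1"
}

# Same as above, but as the root user inside the container.
maneage_enter_root() {
    ${DOCKER:-docker} run -u 0 -it "$1"
}

# Save a running container as a new image. Run this from another terminal.
# $1: container ID (first column of 'docker container list').
# $2: name for the new image.
maneage_save() {
    ${DOCKER:-docker} commit "$1" "$2"
}
```

For example, `DOCKER=echo maneage_build myproject` only prints the arguments that would be passed to `docker`, while `maneage_build myproject` actually builds the image.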