Edited the Docker container explanations in README.md

The explanations are now clearer for someone who is less familiar with Docker.
author: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-06-30 22:44:16 +0100
committer: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2020-07-01 01:20:08 +0100
commit: 34406fda1c132d8bee85e7b0e94e76905a9b6553 (patch)
tree: 65782fdae3bdb19a3aff4ce4eef70e8daaa542d1 /README.md
parent: ede22ee5721690ac5782efdfe15dedbf9a300469 (diff)
1 files changed, 99 insertions, 79 deletions
diff --git a/README.md b/README.md
index 909ab8a..3b533b8 100644
--- a/README.md
+++ b/README.md
@@ -132,7 +132,7 @@ necessary built products are already included).
 2. No matter how you got the tarball, if you just want to build the PDF
    paper from the tarball, simply run the command below. Note that this
    won't actually install any software or do any analysis, it will just use
-   your host operating system to build the PDF and assums you already have
+   your host operating system to build the PDF and assumes you already have
    all the necessary LaTeX packages.
 
    ```shell
@@ -195,32 +195,33 @@ finally create the final paper).
 
 ### Building in Docker containers
 
-Docker containers are a common way to build projects in an almost
-independent filesystem, and almost independent operating system without the
-overheads of a virtual machine. They also allow using a minimal GNU/Linux
-operating system for each project within proprietary operating systems like
-macOS or Windows. Furthermore they allow easy movement of built project
-from one system to another. Just please note that Docker images are large
-binary files (+1 Gigabytes) and may not be usable in the future. They are
-mainly good for temporary/testing phases of a project. Hence if you want to
-save and move your maneaged project as a Docker image, be sure to commit
-all your project's source files and push them to your external Git
-repository (you can do these within the Docker image as explained below).
-
-#### Constructing the Dockerfile for Maneaged project and building it
+Docker containers are a common way to build projects in an independent
+filesystem, and an almost independent operating system. Containers thus
+allow using GNU/Linux operating systems within proprietary operating
+systems like macOS or Windows, but without the overhead and huge file size
+of virtual machines. Furthermore, containers allow easy movement of built
+projects from one system to another without rebuilding. Just note that
+Docker images are large binary files (over 1 Gigabyte) and may not be
+usable in the future (for example with new Docker versions not reading old
+images). Containers are thus good for temporary/testing phases of a
+project, but shouldn't be what you archive! Hence if you want to save and
+move your maneaged project within a Docker image, be sure to commit all
+your project's source files and push them to your external Git repository
+(you can do these within the Docker image as explained below). This way,
+you can always recreate the container with future technologies
+too. Generally, if you are developing within a container, it's good practice
+to recreate it from scratch every once in a while, to make sure you haven't
+forgotten to include parts of your work in your project's version-controlled
+source.
+
+#### Dockerfile for a Maneaged project, and building a Docker image
 
 Below is a series of recommendations on the various components of a
-`Dockerfile` optimised to store the *built state of a maneaged project* as
+`Dockerfile` optimized to store the *built state of a maneaged project* as
 a Docker image. Each component is also accompanied with
 explanations. Simply copy the code blocks under each item into a plain-text
-file in the same order and implement the corrections mentioned in each step
-(in particular step 4). Then save the plain-text file as `Dockerfile` and
-run the following command to build the Docker image. Just set a `NAME` for
-your project and note that Docker only runs as root.
-
-```shell
-docker build -t NAME ./
-```
+file called `Dockerfile`, in the same order as the items. Don't forget to
+implement the suggested corrections (in particular step 4).
 
 **NOTE: Internet necessary for TeXLive:** You can optionally disable the
 image's internet just after downloading the necessary packages (step
@@ -229,8 +230,8 @@ complete, the project will need internet access to download the necessary
 TeXLive packages in the `./project configure` phase. TeXLive is needed to
 build the final PDF. Without TeXLive, the analysis will be exactly
 reproduced, LaTeX macros will be created and everything will be verified
-successfully (all in the build directory). However, no PDF will be built to
-visualize/combine them in one easy-to-read file.
+successfully (all in the build directory). However, no final `paper.pdf`
+will be created to visualize/combine everything in one easy-to-read file.
 
  1. **Choose the base operating system:** The first step is to select the
     operating system that will be used in the docker image. Note that your
@@ -241,27 +242,32 @@ visualize/combine them in one easy-to-read file.
     FROM debian:stable-slim
     ```
 
- 2. **The C/C++ compiler:** By default the "slim" versions of the operating
-    systems don't contain a compiler, so you need to use the selected
-    operating system's package manager to include them. It is also
-    recommended to include your favorite text editor so you can modify the
-    project's source files if necessary.
+ 2. **Maneage dependencies:** By default the "slim" versions of the
+    operating systems don't contain compilers, so you need to use the
+    selected operating system's package manager to install them. You can
+    optionally install two other programs: 1) To inspect/edit the project's
+    source files later, install your favorite text editor. 2) If you don't
+    have the project's software tarballs, and want the project to download
+    them automatically, you also need a downloader.
 
     ```shell
     # C and C++ compiler.
     RUN apt-get update && apt-get install -y gcc g++
 
-    # Uncomment this to add a text editor (to modify files later).
+    # Uncomment this to add a text editor (to modify source files later).
     #RUN apt-get install -y nano
+
+    # Uncomment this if you don't have 'software-XXXX.tar.gz'
+    #RUN apt-get install -y wget
     ```
 
  3. **Define a user:** Some core software packages will complain if you try
-    to install them as the default (root) container user. Generally, it is
-    also good practice to avoid being the root user. After building the
-    Docker image, you can always run it as root with this command: `docker
-    run -u 0 -it XXXXXXX` (where `XXXXXXX` is the image identifier). With
-    the commands below we define a `maneager` user and activate it for the
-    next steps.
+    to install them as the default (root) user. Generally, it is also good
+    practice to avoid being the root user. After building the Docker image,
+    you can always run it as root with this command: `docker run -u 0 -it
+    XXXXXXX` (where `XXXXXXX` is the image identifier). Hence with the
+    commands below we define a `maneager` user and activate it for the next
+    steps.
 
     ```shell
     RUN useradd -ms /bin/sh maneager
@@ -272,32 +278,32 @@ visualize/combine them in one easy-to-read file.
  4. **Copy project files into the container:** these commands make the
     following assumptions:
 
-    * The project's source is in the `maneaged-project/` subdirectory of
-      the directory that you will run `docker build` in. The source can
-      either be from cloned from Git (recommended!) or from a tarball. Both
-      are described above (note that arXiv's tarball needs to be corrected
-      as mentioned above).
-
-    * (OPTIONAL, with internet) By default the project's necessary software
-      source tarballs will be downloaded when necessary during the
-      `./project configure` phase. But if you already have the sources, its
-      better to use them and not waste network traffic (and resulting
-      carbon footprint!). Maneaged projects usually come with a
+    * The project's source is in the `maneaged/` sub-directory and this
+      directory is in the same directory as the `Dockerfile`. The source
+      can either be cloned from Git (highly recommended!) or unpacked from a
+      tarball. Both are described above (note that arXiv's tarball needs to
+      be corrected as mentioned above).
+
+    * (OPTIONAL) By default the project's necessary software source
+      tarballs will be downloaded when necessary during the `./project
+      configure` phase. But if you already have the sources, it's better to
+      use them and not waste network traffic (and resulting carbon
+      footprint!). Maneaged projects usually come with a
       `software-XXXX.tar.gz` file that is published on Zenodo (link above).
-      If you have this file, you put it in the same directory as your
+      If you have this file, put it in the same directory as your
       `Dockerfile` and include the relevant lines below.
 
-    * (OPTIONAL, with internet) The project's input data. The `INPUT-FILES`
-      depends on the project, please look into the project's
-      `reproduce/analysis/config/INPUTS.conf` for the URLs and file
-      names. Similar to the software source files, this is not mandatory:
-      if you have internet, the project will download its necessary
-      software automatically in the `./project make` phase.
+    * (OPTIONAL) The project's input data. The `INPUT-FILES` depend on the
+      project; please look into the project's
+      `reproduce/analysis/config/INPUTS.conf` for the URLs and the file
+      names of input data. Similar to the software source files mentioned
+      above, if you don't have them, the project will attempt to download
+      its necessary data automatically in the `./project make` phase.
 
     ```shell
     # Make the project's build directory and copy the project source
     RUN mkdir build
-    COPY --chown=maneager:maneager ./maneaged-project /home/maneager/source
+    COPY --chown=maneager:maneager ./maneaged /home/maneager/source
 
     # Optional (for software)
     COPY --chown=maneager:maneager ./software-XXXX.tar.gz /home/maneager/
@@ -309,32 +315,44 @@ visualize/combine them in one easy-to-read file.
     ```
 
  5. **Configure the project:** With this line, the Docker image will
-    configure the project (let the project build all its necessary
-    software). This will usually take about an hour on an 8-core system.
+    configure the project (build all its necessary software). This will
+    usually take about an hour on an 8-core system.
 
     ```shell
+    # Configure project (build full software environment).
     RUN cd /home/maneager/source \
            && ./project configure --build-dir=/home/maneager/build \
                                   --software-dir=/home/maneager/software \
                                   --input-dir=/home/maneager/data
     ```
 
- 6. **Do the project's analysis:** You are now ready to add the instruction
-    to automatically reproduce the project's analysis. The length of this
-    step and the storage/memory requirements highly depend on the
-    prarticular project.
+ 6. **Project's analysis:** With this line, the Docker image will do the
+    project's analysis and produce the final `paper.pdf`. The time it takes
+    for this step to finish, and its storage/memory requirements, depend
+    heavily on the particular project.
 
     ```shell
+    # Run the project's analysis
     RUN cd /home/maneager/source && ./project make
     ```
 
+ 7. **Build the Docker image:** The `Dockerfile` is now ready! In the
+    terminal, go to its directory and run the command below to build the
+    Docker image. Just set a `NAME` for your project and note that Docker
+    commands usually need root privileges (for example through `sudo`).
+
+    ```shell
+    docker build -t NAME ./
+    ```
+
 #### Interactive tests on built container
 
 If you later want to start a container with the built image and enter it in
 interactive mode (for example for temporary tests), please run the
 following command. Just replace `NAME` with the same name you specified
 when building the project. You can always exit the container with the
-`exit` command.
+`exit` command (note that all your changes will be discarded once you exit;
+see below if you want to preserve them).
 
 ```shell
 docker run -it NAME
@@ -342,12 +360,13 @@ docker run -it NAME
 
 #### Running your own project's shell for same analysis environment
 
-But the default operating system has minimal features. You can enter the
-maneaged project's source directory and use the project's environment to
-have the same environment as your running project (with easy access to all
-the software built in the project). For example the project builds Git
-within itself as well as many other tools that aren't present in the core
-operating system.
+The default operating system only has minimal features: it lacks many of
+the tools you are accustomed to in your daily command-line operations. But
+your maneaged project has a very complete (for the project!) environment
+which is fully built and ready to use interactively with the commands
+below. For example the project also builds Git within itself, as well as
+many other high-level tools that are used in your project and aren't
+present in the container's operating system.
 
 ```shell
 # Once you are in the docker container
@@ -357,22 +376,23 @@ cd source
 
 #### Preserving the state of a built container
 
-All changes you do in interactive mode will be deleted as soon as you exit
-the container. THIS IS A VERY GOOD FEATURE! In general, if you want to make
-persistant changes, you should do it in the project's plain-text source and
-commit it into your project's online Git repository. But you can also do
-this within the built container.
+All interactive changes in a container will be deleted as soon as you exit
+it. THIS IS A VERY GOOD FEATURE IN GENERAL! If you want to make persistent
+changes, you should make them in the project's plain-text source and commit
+them into your project's online Git repository. As described in the Docker
+introduction above, we strongly recommend **not relying on a built container
+for archival purposes**.
 
-If you want to preserve the state of your changes after your `exit`, you
-need to `commit` the container (and thus save it as a Docker "image"). To
-do this, while the container is still running, in another terminal, run
-these commands:
+But for temporary tests it is sometimes good to preserve the state of an
+interactive container. For this, you need to `commit` the container (and
+thus save it as a Docker "image"). To do so, while the container is still
+running, open another terminal and run these commands:
 
 ```shell
 # These two commands should be done in another terminal
 docker container list
 
-# Get 'XXXXXXX' from the first column of output above.
+# Get 'XXXXXXX' of your desired container from the first column above.
 # Give the new image a name by replacing 'NEW-IMAGE-NAME'.
 docker commit XXXXXXX NEW-IMAGE-NAME
 ```
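
Editorial sketch (not part of this commit): steps 4 and 7 of the new text assume that the `Dockerfile` and the `maneaged/` source directory sit side by side in the directory where `docker build` is run. The small shell function below checks that layout before a build is started. The function name `check_build_context` is hypothetical and does not exist in Maneage or Docker; it only uses standard `sh` tests.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not part of
# Maneage): verify the directory layout that the Dockerfile expects.
check_build_context() {
    dir=${1:-.}      # Directory to check (defaults to the current one).
    status=0

    # The Dockerfile itself must be present.
    if [ ! -f "$dir/Dockerfile" ]; then
        echo "missing: $dir/Dockerfile" >&2
        status=1
    fi

    # Step 4 copies './maneaged' into the image, so it must exist.
    if [ ! -d "$dir/maneaged" ]; then
        echo "missing: $dir/maneaged/" >&2
        status=1
    fi

    return $status
}

# Example use ('NAME' is the placeholder from step 7):
#   check_build_context . && docker build -t NAME ./
```

Running the check first avoids starting a long `docker build` that would only fail at the `COPY` line of step 4.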