-rw-r--r--   README.md                              | 145
-rw-r--r--   reproduce/analysis/make/initialize.mk  |  53
-rw-r--r--   reproduce/analysis/make/prepare.mk     |   2
-rw-r--r--   reproduce/software/make/basic.mk       |   2
-rw-r--r--   reproduce/software/make/high-level.mk  |   2
-rwxr-xr-x   reproduce/software/shell/configure.sh  |  73
6 files changed, 211 insertions(+), 66 deletions(-)
@@ -199,15 +199,18 @@ projects from one system to another without rebuilding. Just note that
 Docker images are large binary files (+1 Gigabytes) and may not be usable
 in the future (for example with new Docker versions not reading old
 images). Containers are thus good for temporary/testing phases of a
-project, but shouldn't be what you archive! Hence if you want to save and
-move your maneaged project within a Docker image, be sure to commit all
-your project's source files and push them to your external Git repository
-(you can do these within the Docker image as explained below). This way,
-you can always recreate the container with future technologies
-too. Generally, if you are developing within a container, its good practice
-to recreate it from scratch every once in a while, to make sure you haven't
-forgot to include parts of your work in your project's version-controlled
-source.
+project, but shouldn't be what you archive for the long term!
+
+Hence if you want to save and move your Maneaged project within a Docker
+image, be sure to commit all your project's source files and push them to
+your external Git repository (you can do this within the Docker image as
+explained below). This way, you can always recreate the container with
+future technologies too. Generally, if you are developing within a
+container, it's good practice to recreate it from scratch every once in a
+while, to make sure you haven't forgotten to include parts of your work in
+your project's version-controlled source. In the sections below we also
+describe how you can use the container **only for the software
+environment** and keep your data and project source on your host.
 
 #### Dockerfile for a Maneaged project, and building a Docker image
@@ -240,8 +243,11 @@ MB), not the full TeXLive collection!
    items).
   Note that the last two `COPY` lines (to copy the directory containing
   software tarballs used by the project and the possible input databases)
   are optional because they will be downloaded if not
-   available. Once you build the Docker image, your project's environment
-   is setup and you can go into it to run `./project make` manually.
+   available. You can also avoid copying them altogether and simply
+   mount your host directories into the container; we have a separate
+   section on doing this below ("Only software environment in the Docker
+   image"). Once you build the Docker image, your project's environment
+   is set up and you can go into it to run `./project make` manually.

   ```shell
   FROM debian:stable-slim
@@ -300,7 +306,10 @@ MB), not the full TeXLive collection!
    ```

 4. **Copy project files into the container:** these commands make the
-   following assumptions:
+   assumptions listed below. IMPORTANT: you can also avoid copying them
+   altogether and simply mount your host directories into the container;
+   we have a separate section on doing this below ("Only software
+   environment in the Docker image").

    * The project's source is in the `maneaged/` sub-directory and this
      directory is in the same directory as the `Dockerfile`. The source
@@ -377,6 +386,8 @@ MB), not the full TeXLive collection!
    docker build -t NAME ./
    ```

+
+
 #### Interactive tests on built container

 If you later want to start a container with the built image and enter it in
@@ -390,6 +401,8 @@ see below if you want to preserve your changes after you exit).
docker run -it NAME
```

+
+
 #### Running your own project's shell for same analysis environment

 The default operating system only has minimal features: not having many of
@@ -406,6 +419,8 @@ cd source
./project shell
```

+
+
 #### Preserving the state of a built container

 All interactive changes in a container will be deleted as soon as you exit
@@ -429,6 +444,8 @@ docker container list
docker commit XXXXXXX NEW-IMAGE-NAME
```

+
+
 #### Copying files from the Docker image to host operating system

 The Docker environment's file system is completely independent of your host
@@ -440,6 +457,110 @@ command).
docker cp CONTAINER:/file/path/within/container /host/path/target
```

+
+
+#### Only software environment in the Docker image
+
+You can set up the Docker image to contain only the software environment,
+and keep the project source and built analysis files (data and PDF) on
+your host operating system. This keeps the size of the Docker image to a
+minimum (only containing the built software environment), so you can
+easily move it from one computer to another. Below we'll summarize the
+steps.
+
+1. Get your user ID with this command: `id -u`.
+
+2. Put the following lines into a `Dockerfile` of an otherwise empty
+directory. Just replace `UID` with your user ID (found in the step
+above). This will build the basic directory structure for the next steps.
+
+```shell
+FROM debian:stable-slim
+RUN apt-get update && apt-get install -y gcc g++ wget
+RUN useradd -ms /bin/sh --uid UID maneager
+USER maneager
+WORKDIR /home/maneager
+RUN mkdir build
+```
+
+3. Create an image based on the `Dockerfile` above. Just replace `PROJECT`
+with your desired name.
+
+```shell
+docker build -t PROJECT ./
+```
+
+4. Run the command below to create a container based on the image and mount
+the desired directories on your host into the special directories of your
+container.
Just don't forget to replace `PROJECT` and set the `/PATH`s to
+the respective paths in your host operating system.
+
+```shell
+docker run -v /PATH/TO/PROJECT/SOURCE:/home/maneager/source \
+           -v /PATH/TO/PROJECT/ANALYSIS/OUTPUTS:/home/maneager/build/analysis \
+           -v /PATH/TO/SOFTWARE/SOURCE/CODE/DIR:/home/maneager/software \
+           -v /PATH/TO/RAW/INPUT/DATA:/home/maneager/data \
+           -it PROJECT
+```
+
+5. After running the command above, you are within the container. Go into
+the project source directory and run these commands to build the software
+environment.
+
+```shell
+cd /home/maneager/source
+./project configure --build-dir=/home/maneager/build \
+                    --software-dir=/home/maneager/software \
+                    --input-dir=/home/maneager/data
+```
+
+6. After the configuration finishes successfully, it will say so and ask
+you to run `./project make`. But don't do that yet. Keep this Docker
+container open and don't exit the container or terminal. Open a new
+terminal, and follow the steps described in the sub-section above to
+preserve the built container as a Docker image. Let's assume you call it
+`PROJECT-ENV`. After the new image is made, you should be able to see the
+new image in the list of images with this command (in the same terminal
+where you created the image):
+
+```shell
+docker image list    # In the other terminal.
+```
+
+7. Now you can run `./project make` in the initial container. You will see
+that all the built products (temporary or final datasets or PDFs) will be
+written in the `/PATH/TO/PROJECT/ANALYSIS/OUTPUTS` directory of your
+host. You can even change the source of your project on your host operating
+system and re-run Make to see the effect on the outputs and add/commit the
+changes to your Git history within your host. You can also exit the
+container any time. You can later load the `PROJECT-ENV` environment image
+into a new container with the same `docker run -v ...` command above; just
+use `PROJECT-ENV` instead of `PROJECT`.
+
+8.
In case you want to store the image as a single file as a backup, or to
+move it to another computer, you can run the commands below. They will
+produce a single `project-env.tar.gz` file.
+
+```shell
+docker save -o project-env.tar PROJECT-ENV
+gzip --best project-env.tar
+```
+
+9. To load the tarball above into a clean Docker environment (either on
+the same system or on another system), run the command below. You can then
+create a new container from the image like above (the `docker run -v ...`
+command). Just don't forget that if your
+`/PATH/TO/PROJECT/ANALYSIS/OUTPUTS` directory is empty on the new/clean
+system, you should first run `./project configure -e` in the Docker image
+so it builds the core file structure there. Don't worry, it won't build
+any software and should finish in a second or two. Afterwards, you can
+safely run `./project make`.
+
+```shell
+docker load --input project-env.tar.gz
+```
+
+
+
 #### Deleting all Docker images

 After doing your tests/work, you may no longer need the multi-gigabyte
diff --git a/reproduce/analysis/make/initialize.mk b/reproduce/analysis/make/initialize.mk
index a5d5b92..3b1ffe5 100644
--- a/reproduce/analysis/make/initialize.mk
+++ b/reproduce/analysis/make/initialize.mk
@@ -30,14 +30,24 @@
 # parallel. Also, some programs may not be thread-safe, therefore it will
 # be necessary to put a lock on them. This project uses the `flock' program
 # to achieve this.
-texdir = $(BDIR)/tex
-lockdir = $(BDIR)/locks
-indir = $(BDIR)/inputs
-prepdir = $(BDIR)/prepare
+#
+# To help with modularity and clarity of the build directory (not mixing
+# software-environment built-products with products built by the analysis),
+# it is recommended to put all your analysis outputs in the 'analysis'
+# subdirectory of the top-level build directory.
+badir=$(BDIR)/analysis
+bsdir=$(BDIR)/software
+
+# Derived directories (the locks directory can be shared with the software
+# directory, which already has this sub-directory).
+texdir = $(badir)/tex
+lockdir = $(bsdir)/locks
+indir = $(badir)/inputs
+prepdir = $(badir)/prepare
 mtexdir = $(texdir)/macros
+installdir = $(bsdir)/installed
 bashdir = reproduce/analysis/bash
 pconfdir = reproduce/analysis/config
-installdir = $(BDIR)/software/installed
@@ -56,7 +66,7 @@ installdir = $(BDIR)/software/installed
 ifeq (x$(project-phase),xprepare)
 $(prepdir):; mkdir $@
 else
-include $(BDIR)/software/preparation-done.mk
+include $(bsdir)/preparation-done.mk
 ifeq (x$(include-prepare-results),xyes)
 include $(prepdir)/*.mk
 endif
@@ -193,7 +203,7 @@ export MPI_PYTHON3_SITEARCH :=
 # option: they add too many extra checks that make it hard to find what you
 # are looking for in the outputs.
 .SUFFIXES:
-$(lockdir): | $(BDIR); mkdir $@
+$(lockdir): | $(bsdir); mkdir $@
@@ -228,8 +238,8 @@ clean-mmap:; rm -f reproduce/config/gnuastro/mmap*
 texclean:
	rm *.pdf
-	rm -rf $(BDIR)/tex/build/*
-	mkdir $(BDIR)/tex/build/tikz # 'tikz' is assumed to already exist.
+	rm -rf $(texdir)/build/*
+	mkdir $(texdir)/build/tikz # 'tikz' is assumed to already exist.

 clean: clean-mmap
        # Delete the top-level PDF file.
@@ -241,10 +251,10 @@ clean: clean-mmap
        # features like ignoring the listing of a file with `!()' that we
        # are using afterwards.
	shopt -s extglob
-	rm -rf $(BDIR)/tex/macros/!(dependencies.tex|dependencies-bib.tex|hardware-parameters.tex)
-	rm -rf $(BDIR)/!(software|tex) $(BDIR)/tex/!(macros|$(texbtopdir))
-	rm -rf $(BDIR)/tex/build/!(tikz) $(BDIR)/tex/build/tikz/*
-	rm -rf $(BDIR)/software/preparation-done.mk
+	rm -rf $(texdir)/macros/!(dependencies.tex|dependencies-bib.tex|hardware-parameters.tex)
+	rm -rf $(badir)/!(tex) $(texdir)/!(macros|$(texbtopdir))
+	rm -rf $(texdir)/build/!(tikz) $(texdir)/build/tikz/*
+	rm -rf $(bsdir)/preparation-done.mk

 distclean: clean
        # Without cleaning the Git hooks, we won't be able to easily
@@ -403,14 +413,15 @@ dist-zip: $(project-package-contents)
 dist-software:
        curdir=$$(pwd)
        dirname=software-$(project-commit-hash)
-	cd $(BDIR)
+	cd $(bsdir)
+	if [ -d $$dirname ]; then rm -rf $$dirname; fi
        mkdir $$dirname
-	cp -L software/tarballs/* $$dirname/
+	cp -L tarballs/* $$dirname/
        tar -cf $$dirname.tar $$dirname
        gzip -f --best $$dirname.tar
        rm -rf $$dirname
        cd $$curdir
-	mv $(BDIR)/$$dirname.tar.gz ./
+	mv $(bsdir)/$$dirname.tar.gz ./
@@ -427,9 +438,11 @@ dist-software:
 #
 # 1. Those data that also go into LaTeX (for example to give to LaTeX's
 #    PGFPlots package to create the plot internally) should be under the
-#    '$(BDIR)/tex' directory (because other LaTeX producers may also need
-#    it for example when using './project make dist'). The contents of
-#    this directory are directly taken into the tarball.
+#    '$(texdir)' directory (because other LaTeX producers may also need it
+#    for example when using './project make dist', or you may want to
+#    publish the raw data behind the plots, like:
+#    https://zenodo.org/record/4291207/files/tools-per-year.txt). The
+#    contents of this directory are also directly taken into the tarball.
 #
 # 2. The data that aren't included directly in the LaTeX run of the paper,
 #    can be seen as supplements.
A good place to keep them is under your
@@ -441,7 +454,7 @@ dist-software:
 #    (or paper's tex/appendix), you will put links to the dataset on servers
 #    like Zenodo (see the "Publication checklist" in 'README-hacking.md').
 tex-publish-dir = $(texdir)/to-publish
-data-publish-dir = $(BDIR)/data-to-publish
+data-publish-dir = $(badir)/data-to-publish
 $(tex-publish-dir):; mkdir $@
 $(data-publish-dir):; mkdir $@
diff --git a/reproduce/analysis/make/prepare.mk b/reproduce/analysis/make/prepare.mk
index 995132c..d0b61d9 100644
--- a/reproduce/analysis/make/prepare.mk
+++ b/reproduce/analysis/make/prepare.mk
@@ -23,7 +23,7 @@
 #
 # Without this file, `./project make' won't work.
 prepare-dep = $(subst prepare, ,$(makesrc))
-$(BDIR)/software/preparation-done.mk: \
+$(bsdir)/preparation-done.mk: \
                $(foreach s, $(prepare-dep), $(mtexdir)/$(s).tex)

        # If you need to add preparations define targets above to do the
diff --git a/reproduce/software/make/basic.mk b/reproduce/software/make/basic.mk
index 58ebdb2..9217ee9 100644
--- a/reproduce/software/make/basic.mk
+++ b/reproduce/software/make/basic.mk
@@ -48,7 +48,7 @@ include reproduce/software/config/checksums.conf
 include reproduce/software/config/urls.conf

 # Basic directories
-lockdir = $(BDIR)/locks
+lockdir = $(BDIR)/software/locks
 tdir = $(BDIR)/software/tarballs
 ddir = $(BDIR)/software/build-tmp
 idir = $(BDIR)/software/installed
diff --git a/reproduce/software/make/high-level.mk b/reproduce/software/make/high-level.mk
index 948b23a..d69722e 100644
--- a/reproduce/software/make/high-level.mk
+++ b/reproduce/software/make/high-level.mk
@@ -43,7 +43,7 @@ include reproduce/software/config/TARGETS.conf
 include reproduce/software/config/texlive-packages.conf

 # Basic directories (similar to 'basic.mk').
-lockdir = $(BDIR)/locks
+lockdir = $(BDIR)/software/locks
 tdir = $(BDIR)/software/tarballs
 ddir = $(BDIR)/software/build-tmp
 idir = $(BDIR)/software/installed
diff --git a/reproduce/software/shell/configure.sh b/reproduce/software/shell/configure.sh
index 24e8409..812f3d3 100755
--- a/reproduce/software/shell/configure.sh
+++ b/reproduce/software/shell/configure.sh
@@ -44,8 +44,8 @@ need_gfortran=0

-# Internal directories
-# --------------------
+# Internal source directories
+# ---------------------------
 #
 # These are defined to help make this script more readable.
 topdir="$(pwd)"
@@ -679,14 +679,14 @@ EOF
 fi

 # Then, see if the Fortran compiler works
-testsource=$compilertestdir/test.f
+testsourcef=$compilertestdir/test.f
 echo; echo; echo "Checking host Fortran compiler...";
-echo "      PRINT *, \"... Fortran Compiler works.\"" > $testsource
-echo "      END" >> $testsource
-if gfortran $testsource -o$testprog && $testprog; then
-    rm $testsource $testprog
+echo "      PRINT *, \"... Fortran Compiler works.\"" > $testsourcef
+echo "      END" >> $testsourcef
+if gfortran $testsourcef -o$testprog && $testprog; then
+    rm $testsourcef $testprog
 else
-    rm $testsource
+    rm $testsourcef
     cat <<EOF
 ______________________________________________________
@@ -1165,8 +1165,8 @@ rm -f "$finaltarget"

-# Project's top-level directories
-# -------------------------------
+# Project's top-level built software directories
+# ----------------------------------------------
 #
 # These directories are possibly needed by many steps of process, so to
 # avoid too many directory dependencies throughout the software and
@@ -1200,15 +1200,41 @@ if ! [ -d "$ictdir" ]; then mkdir "$ictdir"; fi
 itidir="$verdir"/tex
 if ! [ -d "$itidir" ]; then mkdir "$itidir"; fi

+# Temporary software un-packing/build directory: if the host has the
+# standard `/dev/shm' mounting-point, we'll do it in shared memory (on the
+# RAM), to avoid harming/over-using the HDDs/SSDs.
The RAM of most systems
+# today (>8GB) is large enough for the parallel building of the software.
+#
+# For the name of the directory under `/dev/shm' (for this project), we'll
+# use the names of the two parent directories to the current/running
+# directory, separated by a `-' instead of `/'. We'll then append that
+# with the user's name (in case multiple users may be working on similar
+# project names). Maybe later, we can use something like `mktemp' to add
+# random characters to this name and make it unique to every run (even for
+# a single user).
+tmpblddir="$sdir"/build-tmp
+rm -rf "$tmpblddir"/* "$tmpblddir" # If it's a link, we need to empty its
+                                   # contents first, then itself.
+
+
+
+
+
+# Project's top-level built analysis directories
+# ----------------------------------------------
+
+# Top-level built analysis directories.
+badir="$bdir"/analysis
+if ! [ -d "$badir" ]; then mkdir "$badir"; fi
+
 # Top-level LaTeX.
-texdir="$bdir"/tex
+texdir="$badir"/tex
 if ! [ -d "$texdir" ]; then mkdir "$texdir"; fi

 # LaTeX macros.
 mtexdir="$texdir"/macros
 if ! [ -d "$mtexdir" ]; then mkdir "$mtexdir"; fi
-
 # TeX build directory. If built in a group scenario, the TeX build
 # directory must be separate for each member (so they can work on their
 # relevant parts of the paper without conflicting with each other).
@@ -1224,7 +1250,6 @@ if ! [ -d "$texbdir" ]; then mkdir "$texbdir"; fi
 tikzdir="$texbdir"/tikz
 if ! [ -d "$tikzdir" ]; then mkdir "$tikzdir"; fi
-
 # If 'tex/build' and 'tex/tikz' are symbolic links then 'rm -f' will delete
 # them and we can continue. However, when the project is being built from
 # the tarball, these two are not symbolic links but actual directories with
@@ -1239,7 +1264,6 @@ else
     mv tex/build tex/build-from-tarball
 fi
-
 # Set the symbolic links for easy access to the top project build
 # directories.
Note that these are put in each user's source/cloned
 # directory, not in the build directory (which can be shared between many
@@ -1247,7 +1271,9 @@ fi
 #
 # Note: if we don't delete them first, it can happen that an extra link
 # will be created in each directory that points to its parent. So to be
-# safe, we are deleting all the links on each re-configure of the project.
+# safe, we are deleting all the links on each re-configure of the
+# project. Note that at this stage, we are using the host's 'ln', not our
+# own, so it's best not to assume anything (like 'ln -sf').
 rm -f .build .local
 ln -s "$bdir" .build
@@ -1260,21 +1286,6 @@ rm -f .gnuastro

 # ------------------------------------------

-# Temporary software un-packing/build directory: if the host has the
-# standard `/dev/shm' mounting-point, we'll do it in shared memory (on the
-# RAM), to avoid harming/over-using the HDDs/SSDs. The RAM of most systems
-# today (>8GB) is large enough for the parallel building of the software.
-#
-# For the name of the directory under `/dev/shm' (for this project), we'll
-# use the names of the two parent directories to the current/running
-# directory, separated by a `-' instead of `/'. We'll then appended that
-# with the user's name (in case multiple users may be working on similar
-# project names). Maybe later, we can use something like `mktemp' to add
-# random characters to this name and make it unique to every run (even for
-# a single user).
-tmpblddir="$sdir"/build-tmp
-rm -rf "$tmpblddir"/* "$tmpblddir" # If its a link, we need to empty its
-                                   # contents first, then itself.

 # Set the top-level shared memory location.
 if [ -d /dev/shm ]; then shmdir=/dev/shm
@@ -1300,7 +1311,7 @@ fi
 # symbolic link to it. Otherwise, just build the temporary build
 # directory under the project build directory.
 if [ x"$tbshmdir" = x ]; then mkdir "$tmpblddir";
-else ln -s "$tbshmdir" "$tmpblddir";
+else ln -s "$tbshmdir" "$tmpblddir"; fi
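To make the build-directory split that this commit introduces concrete, here is a minimal, self-contained sketch. The `analysis`/`software` names and the derived sub-directories (`locks`, `tarballs`, `installed`, `tex/macros`, `inputs`) come from the `badir`/`bsdir` definitions in the diff above; the throw-away `BDIR` temporary directory and the `find`/`sed` listing are purely illustrative:

```shell
# Sketch of the new layout: software products under BDIR/software,
# analysis products under BDIR/analysis (BDIR is a temporary stand-in
# for the project's top-level build directory).
BDIR=$(mktemp -d)

# Software-environment directories ('bsdir' in initialize.mk).
mkdir -p "$BDIR"/software/locks       # 'flock' lock files
mkdir -p "$BDIR"/software/tarballs    # downloaded software tarballs
mkdir -p "$BDIR"/software/installed   # installed programs and libraries

# Analysis directories ('badir' in initialize.mk).
mkdir -p "$BDIR"/analysis/tex/macros  # LaTeX macros written by the analysis
mkdir -p "$BDIR"/analysis/inputs      # input datasets

# List the layout relative to BDIR (e.g. 'analysis/tex/macros').
find "$BDIR" -mindepth 1 -type d | sed "s|$BDIR/||" | sort
```

With this split, everything under `software/` can be discarded or rebuilt without touching analysis outputs, which is what lets the Docker sections above keep only the software environment inside the image.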