From d9a6855948fad17fa0fbc2017ab2be0238ca8b72 Mon Sep 17 00:00:00 2001
From: Mohammad Akhlaghi
Date: Sat, 9 Jan 2021 01:34:15 +0000
Subject: IMPORTANT: analysis outputs written in BDIR/analysis

Until now, the build directory contained a 'software/' directory (that
hosted all the built software), a 'tex/' subdirectory for the final
building of the paper, and many other directories containing
intermediate/final data of the specific project. But this mixing of built
software and data is against our modularity and minimal complexity
principles: built software and built data are separate things, and keeping
them separate will enable many optimizations.

With this commit, the build directory of the core Maneage branch will only
contain two sub-directories: 'software/' and 'analysis/'. The 'software/'
directory has the same contents as before and is not touched in this
commit. However, the 'analysis/' directory is new, and everything created
in the './project make' phase of the project will be created inside of
this directory.

To facilitate easy access to these top-level built directories, two new
variables are defined at the top of 'initialize.mk': 'badir', which is
short for "built-analysis directory", and 'bsdir', which is short for
"built-software directory".

HOW TO IMPLEMENT THIS CHANGE IN YOUR PROJECT.

It is easy: simply replace all occurrences of '$(BDIR)' in your project's
subMakefiles (except the ones below) with '$(badir)'. To confirm that
everything is fine before building your project from scratch after
merging, you can run the following command to see where 'BDIR' is used and
confirm the only remaining cases.
    $ grep -r BDIR reproduce/analysis/*
    --> make/verify.mk: innobdir=$$(echo $$infile | sed -e's|$(BDIR)/||g'); \
    --> make/initialize.mk:badir=$(BDIR)/analysis
    --> make/initialize.mk:bsdir=$(BDIR)/software
    --> make/initialize.mk: $$sys_rm -rf $(BDIR)
    --> make/top-prepare.mk:all: $(BDIR)/software/preparation-done.mk

'BDIR' should only be present in the lines listed above. If you see
'$(BDIR)' used anywhere else, simply change it to '$(badir)'. Of course, if
your project uses BDIR in other contexts, feel free to keep it. It will not
conflict.

If anything unexpected happens, please post a comment on the link below
(you need to be registered on Savannah to post a comment):

    https://savannah.nongnu.org/task/?15855

One consequence of this change is that the 'analysis/' subdirectory can
optionally be mounted on a separate partition. The need for this actually
came up for some new users of Maneage in a Docker image. Docker can fix
portability problems on systems that we haven't yet supported (even
Windows!), or on which we haven't had a chance to fix low-level issues.
However, Docker doesn't have a GUI. So to see the built PDF or
intermediate data, it was necessary to copy the built data to the host
system after every change, which is annoying while working on a
project. It would also need two copies of the source: one in the host, one
in the container. All these frustrations can be fixed with this new
feature.

To describe this scenario, README.md now has a new section titled "Only
software environment in the Docker image". It explains step-by-step how
you can make a Docker image that only hosts the built software
environment, while your project's source, software tarballs and
'BDIR/analysis' directories stay on your host operating system. It was
tested before this commit and works very nicely.
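For projects with many subMakefiles, the replacement can also be
scripted. The POSIX shell function below is only a sketch. It is a
hypothetical helper and is not part of Maneage. It assumes your
subMakefiles are in 'reproduce/analysis/make' and skips 'initialize.mk'
entirely. It also leaves any line containing 'innobdir=' or
'$(BDIR)/software' untouched, so other '$(BDIR)' uses on those same lines
are not converted. Please review the result with 'git diff' before
committing.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of Maneage: rewrite '$(BDIR)' as
# '$(badir)' in a project's analysis subMakefiles.
#
# Assumptions (adjust to your project):
#   - The subMakefiles are in 'reproduce/analysis/make' (or in the
#     directory given as the first argument).
#   - 'initialize.mk' is skipped, because it defines 'badir' and 'bsdir'
#     from 'BDIR' and must keep using 'BDIR'.
#   - Any line containing 'innobdir=' (in 'verify.mk') or
#     '$(BDIR)/software' (like the one in 'top-prepare.mk') is kept as-is.
migrate_bdir () {
    makedir=${1:-reproduce/analysis/make}
    if ! [ -d "$makedir" ]; then
        echo "$makedir: no such directory" >&2
        return 1
    fi

    for f in "$makedir"/*.mk; do
        [ -f "$f" ] || continue              # No '.mk' files at all.
        case "$(basename "$f")" in
            initialize.mk) continue ;;
        esac

        # Only rewrite the files that actually use '$(BDIR)'.
        if grep -q '\$(BDIR)' "$f"; then
            # 'b' without a label jumps to the end of the sed script, so
            # the protected lines are printed unchanged.
            if sed -e '/innobdir=/b' \
                   -e '/\$(BDIR)\/software/b' \
                   -e 's|\$(BDIR)|$(badir)|g' "$f" > "$f.tmp"; then
                mv "$f.tmp" "$f"
                echo "Updated: $f"
            else
                rm -f "$f.tmp"
            fi
        fi
    done

    # List the remaining uses so they can be confirmed by eye.
    grep -rn 'BDIR' "$makedir" || true
}

# Example (from the top of your project's source directory):
#   migrate_bdir reproduce/analysis/make
```

After it runs, the final 'grep' output should only contain the lines
listed above.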
--- README.md | 145 +++++++++++++++++++++++++++++++--- reproduce/analysis/make/initialize.mk | 53 ++++++++----- reproduce/analysis/make/prepare.mk | 2 +- reproduce/software/make/basic.mk | 2 +- reproduce/software/make/high-level.mk | 2 +- reproduce/software/shell/configure.sh | 73 +++++++++-------- 6 files changed, 211 insertions(+), 66 deletions(-) diff --git a/README.md b/README.md index 98ba390..07f900f 100644 --- a/README.md +++ b/README.md @@ -199,15 +199,18 @@ projects from one system to another without rebuilding. Just note that Docker images are large binary files (+1 Gigabytes) and may not be usable in the future (for example with new Docker versions not reading old images). Containers are thus good for temporary/testing phases of a -project, but shouldn't be what you archive! Hence if you want to save and -move your maneaged project within a Docker image, be sure to commit all -your project's source files and push them to your external Git repository -(you can do these within the Docker image as explained below). This way, -you can always recreate the container with future technologies -too. Generally, if you are developing within a container, its good practice -to recreate it from scratch every once in a while, to make sure you haven't -forgot to include parts of your work in your project's version-controlled -source. +project, but shouldn't be what you archive for the long term! + +Hence if you want to save and move your maneaged project within a Docker +image, be sure to commit all your project's source files and push them to +your external Git repository (you can do these within the Docker image as +explained below). This way, you can always recreate the container with +future technologies too. Generally, if you are developing within a +container, it's good practice to recreate it from scratch every once in a +while, to make sure you haven't forgotten to include parts of your work in +your project's version-controlled source. 
In the sections below we also +describe how you can use the container **only for the software +environment** and keep your data and project source on your host. #### Dockerfile for a Maneaged project, and building a Docker image @@ -240,8 +243,11 @@ MB), not the full TeXLive collection! items). Note that the last two `COPY` lines (to copy the directory containing software tarballs used by the project and the possible input databases) are optional because they will be downloaded if not - available. Once you build the Docker image, your project's environment - is setup and you can go into it to run `./project make` manually. + available. You can also avoid copying them at all and simply mount your + host directories within the container; we have a separate section on + doing this below ("Only software environment in the Docker image"). Once + you build the Docker image, your project's environment is set up and you + can go into it to run `./project make` manually. ```shell FROM debian:stable-slim @@ -300,7 +306,10 @@ MB), not the full TeXLive collection! ``` 4. **Copy project files into the container:** these commands make the - following assumptions: + assumptions listed below. IMPORTANT: you can also avoid copying them at + all and simply mount your host directories within the container; we have + a separate section on doing this below ("Only software environment in + the Docker image"). * The project's source is in the `maneaged/` sub-directory and this directory is in the same directory as the `Dockerfile`. The source @@ -377,6 +386,8 @@ MB), not the full TeXLive collection! docker build -t NAME ./ ``` + + #### Interactive tests on built container If you later want to start a container with the built image and enter it in @@ -390,6 +401,8 @@ see below if you want to preserve your changes after you exit). 
docker run -it NAME ``` + + #### Running your own project's shell for same analysis environment The default operating system only has minimal features: not having many of @@ -406,6 +419,8 @@ cd source ./project shell ``` + + #### Preserving the state of a built container All interactive changes in a container will be deleted as soon as you exit @@ -429,6 +444,8 @@ docker container list docker commit XXXXXXX NEW-IMAGE-NAME ``` + + #### Copying files from the Docker image to host operating system The Docker environment's file system is completely indepenent of your host @@ -440,6 +457,110 @@ command). docker cp CONTAINER:/file/path/within/container /host/path/target ``` + + +#### Only software environment in the Docker image + +You can set the Docker image to only contain the software environment and +keep the project source and built analysis files (data and PDF) on your +host operating system. This enables you to keep the size of the Docker +image to a minimum (only containing the built software environment) to +easily move it from one computer to another. Below we'll summarize the +steps. + +1. Get your user ID with this command: `id -u`. + +2. Put the following lines into a `Dockerfile` of an otherwise empty +directory. Just replace `UID` with your user ID (found in the step +above). This will build the basic directory structure for the next steps. + +```shell +FROM debian:stable-slim +RUN apt-get update && apt-get install -y gcc g++ wget +RUN useradd -ms /bin/sh --uid UID maneager +USER maneager +WORKDIR /home/maneager +RUN mkdir build +``` + +3. Create an image based on the `Dockerfile` above. Just replace `PROJECT` +with your desired name. + +```shell +docker build -t PROJECT ./ +``` + +4. Run the command below to create a container based on the image and mount +the desired directories on your host into the special directories of your +container. 
Just don't forget to replace `PROJECT` and set the `/PATH`s to +the respective paths in your host operating system. + +```shell +docker run -v /PATH/TO/PROJECT/SOURCE:/home/maneager/source \ + -v /PATH/TO/PROJECT/ANALYSIS/OUTPUTS:/home/maneager/build/analysis \ + -v /PATH/TO/SOFTWARE/SOURCE/CODE/DIR:/home/maneager/software \ + -v /PATH/TO/RAW/INPUT/DATA:/home/maneager/data \ + -it PROJECT +``` + +5. After running the command above, you are within the container. Go into +the project source directory and run these commands to build the software +environment. + +```shell +cd /home/maneager/source +./project configure --build-dir=/home/maneager/build \ + --software-dir=/home/maneager/software \ + --input-dir=/home/maneager/data +``` + +6. After the configuration finishes successfully, it will say so and ask +you to run `./project make`. But don't do that yet. Keep this Docker +container open and don't exit the container or terminal. Open a new +terminal, and follow the steps described in the sub-section above to +preserve the built container as a Docker image. Let's assume you call it +`PROJECT-ENV`. After the new image is made, you should be able to see the +new image in the list of images with this command (in the same terminal +in which you created the image): + +```shell +docker image list # In the other terminal. +``` + +7. Now you can run `./project make` in the initial container. You will see +that all the built products (temporary or final datasets or PDFs) will be +written in the `/PATH/TO/PROJECT/ANALYSIS/OUTPUTS` directory of your +host. You can even change the source of your project on your host operating +system and re-run Make to see the effect on the outputs and add/commit the +changes to your Git history within your host. You can also exit the +container at any time. You can later load the `PROJECT-ENV` environment image +into a new container with the same `docker run -v ...` command above; just +use `PROJECT-ENV` instead of `PROJECT`. + +8. 
In case you want to store the image as a single file as a backup or to +move to another computer, you can run the commands below. They will produce +a single `project-env.tar.gz` file. + +```shell +docker save -o project-env.tar PROJECT-ENV +gzip --best project-env.tar +``` + +9. Use the command below to load the tarball above into a clean Docker +environment (either on the same system or on another one), then create a +new container from the image like above (the `docker run -v ...` command). Just don't forget that +if your `/PATH/TO/PROJECT/ANALYSIS/OUTPUTS` directory is empty on the +new/clean system, you should first run `./project configure -e` in the +Docker container so it builds the core file structure there. Don't worry, it +won't build any software and should finish in a second or two. Afterwards, +you can safely run `./project make`. + +```shell +docker load --input project-env.tar.gz +``` + + + #### Deleting all Docker images After doing your tests/work, you may no longer need the multi-gigabyte diff --git a/reproduce/analysis/make/initialize.mk b/reproduce/analysis/make/initialize.mk index a5d5b92..3b1ffe5 100644 --- a/reproduce/analysis/make/initialize.mk +++ b/reproduce/analysis/make/initialize.mk @@ -30,14 +30,24 @@ # parallel. Also, some programs may not be thread-safe, therefore it will # be necessary to put a lock on them. This project uses the `flock' program # to achieve this. -texdir = $(BDIR)/tex -lockdir = $(BDIR)/locks -indir = $(BDIR)/inputs -prepdir = $(BDIR)/prepare +# +# To help with modularity and clarity of the build directory (not mixing +# software-environment built-products with products built by the analysis), +# it is recommended to put all your analysis outputs in the 'analysis' +# subdirectory of the top-level build directory. +badir=$(BDIR)/analysis +bsdir=$(BDIR)/software + +# Derived directories (the locks directory can be shared with software +# which already has this directory). 
+texdir = $(badir)/tex +lockdir = $(bsdir)/locks +indir = $(badir)/inputs +prepdir = $(badir)/prepare mtexdir = $(texdir)/macros +installdir = $(bsdir)/installed bashdir = reproduce/analysis/bash pconfdir = reproduce/analysis/config -installdir = $(BDIR)/software/installed @@ -56,7 +66,7 @@ installdir = $(BDIR)/software/installed ifeq (x$(project-phase),xprepare) $(prepdir):; mkdir $@ else -include $(BDIR)/software/preparation-done.mk +include $(bsdir)/preparation-done.mk ifeq (x$(include-prepare-results),xyes) include $(prepdir)/*.mk endif @@ -193,7 +203,7 @@ export MPI_PYTHON3_SITEARCH := # option: they add too many extra checks that make it hard to find what you # are looking for in the outputs. .SUFFIXES: -$(lockdir): | $(BDIR); mkdir $@ +$(lockdir): | $(bsdir); mkdir $@ @@ -228,8 +238,8 @@ clean-mmap:; rm -f reproduce/config/gnuastro/mmap* texclean: rm *.pdf - rm -rf $(BDIR)/tex/build/* - mkdir $(BDIR)/tex/build/tikz # 'tikz' is assumed to already exist. + rm -rf $(texdir)/build/* + mkdir $(texdir)/build/tikz # 'tikz' is assumed to already exist. clean: clean-mmap # Delete the top-level PDF file. @@ -241,10 +251,10 @@ clean: clean-mmap # features like ignoring the listing of a file with `!()' that we # are using afterwards. 
shopt -s extglob - rm -rf $(BDIR)/tex/macros/!(dependencies.tex|dependencies-bib.tex|hardware-parameters.tex) - rm -rf $(BDIR)/!(software|tex) $(BDIR)/tex/!(macros|$(texbtopdir)) - rm -rf $(BDIR)/tex/build/!(tikz) $(BDIR)/tex/build/tikz/* - rm -rf $(BDIR)/software/preparation-done.mk + rm -rf $(texdir)/macros/!(dependencies.tex|dependencies-bib.tex|hardware-parameters.tex) + rm -rf $(badir)/!(tex) $(texdir)/!(macros|$(texbtopdir)) + rm -rf $(texdir)/build/!(tikz) $(texdir)/build/tikz/* + rm -rf $(bsdir)/preparation-done.mk distclean: clean # Without cleaning the Git hooks, we won't be able to easily @@ -403,14 +413,15 @@ dist-zip: $(project-package-contents) dist-software: curdir=$$(pwd) dirname=software-$(project-commit-hash) - cd $(BDIR) + cd $(bsdir) + if [ -d $$dirname ]; then rm -rf $$dirname; fi mkdir $$dirname - cp -L software/tarballs/* $$dirname/ + cp -L tarballs/* $$dirname/ tar -cf $$dirname.tar $$dirname gzip -f --best $$dirname.tar rm -rf $$dirname cd $$curdir - mv $(BDIR)/$$dirname.tar.gz ./ + mv $(bsdir)/$$dirname.tar.gz ./ @@ -427,9 +438,11 @@ dist-software: # # 1. Those data that also go into LaTeX (for example to give to LateX's # PGFPlots package to create the plot internally) should be under the -# '$(BDIR)/tex' directory (because other LaTeX producers may also need -# it for example when using './project make dist'). The contents of -# this directory are directly taken into the tarball. +# '$(texdir)' directory (because other LaTeX producers may also need it +# for example when using './project make dist', or you may want to +# publish the raw data behind the plots, like: +# https://zenodo.org/record/4291207/files/tools-per-year.txt). The +# contents of this directory are also directly taken into the tarball. # # 2. The data that aren't included directly in the LaTeX run of the paper, # can be seen as supplements. 
A good place to keep them is under your @@ -441,7 +454,7 @@ dist-software: # (or paper's tex/appendix), you will put links to the dataset on servers # like Zenodo (see the "Publication checklist" in 'README-hacking.md'). tex-publish-dir = $(texdir)/to-publish -data-publish-dir = $(BDIR)/data-to-publish +data-publish-dir = $(badir)/data-to-publish $(tex-publish-dir):; mkdir $@ $(data-publish-dir):; mkdir $@ diff --git a/reproduce/analysis/make/prepare.mk b/reproduce/analysis/make/prepare.mk index 995132c..d0b61d9 100644 --- a/reproduce/analysis/make/prepare.mk +++ b/reproduce/analysis/make/prepare.mk @@ -23,7 +23,7 @@ # # Without this file, `./project make' won't work. prepare-dep = $(subst prepare, ,$(makesrc)) -$(BDIR)/software/preparation-done.mk: \ +$(bsdir)/preparation-done.mk: \ $(foreach s, $(prepare-dep), $(mtexdir)/$(s).tex) # If you need to add preparations define targets above to do the diff --git a/reproduce/software/make/basic.mk b/reproduce/software/make/basic.mk index 58ebdb2..9217ee9 100644 --- a/reproduce/software/make/basic.mk +++ b/reproduce/software/make/basic.mk @@ -48,7 +48,7 @@ include reproduce/software/config/checksums.conf include reproduce/software/config/urls.conf # Basic directories -lockdir = $(BDIR)/locks +lockdir = $(BDIR)/software/locks tdir = $(BDIR)/software/tarballs ddir = $(BDIR)/software/build-tmp idir = $(BDIR)/software/installed diff --git a/reproduce/software/make/high-level.mk b/reproduce/software/make/high-level.mk index 948b23a..d69722e 100644 --- a/reproduce/software/make/high-level.mk +++ b/reproduce/software/make/high-level.mk @@ -43,7 +43,7 @@ include reproduce/software/config/TARGETS.conf include reproduce/software/config/texlive-packages.conf # Basic directories (similar to 'basic.mk'). 
-lockdir = $(BDIR)/locks +lockdir = $(BDIR)/software/locks tdir = $(BDIR)/software/tarballs ddir = $(BDIR)/software/build-tmp idir = $(BDIR)/software/installed diff --git a/reproduce/software/shell/configure.sh b/reproduce/software/shell/configure.sh index 24e8409..812f3d3 100755 --- a/reproduce/software/shell/configure.sh +++ b/reproduce/software/shell/configure.sh @@ -44,8 +44,8 @@ need_gfortran=0 -# Internal directories -# -------------------- +# Internal source directories +# --------------------------- # # These are defined to help make this script more readable. topdir="$(pwd)" @@ -679,14 +679,14 @@ EOF fi # Then, see if the Fortran compiler works - testsource=$compilertestdir/test.f + testsourcef=$compilertestdir/test.f echo; echo; echo "Checking host Fortran compiler..."; - echo " PRINT *, \"... Fortran Compiler works.\"" > $testsource - echo " END" >> $testsource - if gfortran $testsource -o$testprog && $testprog; then - rm $testsource $testprog + echo " PRINT *, \"... Fortran Compiler works.\"" > $testsourcef + echo " END" >> $testsourcef + if gfortran $testsourcef -o$testprog && $testprog; then + rm $testsourcef $testprog else - rm $testsource + rm $testsourcef cat <8GB) is large enough for the parallel building of the software. +# +# For the name of the directory under `/dev/shm' (for this project), we'll +# use the names of the two parent directories to the current/running +# directory, separated by a `-' instead of `/'. We'll then append the +# user's name to that (in case multiple users may be working on similar +# project names). Maybe later, we can use something like `mktemp' to add +# random characters to this name and make it unique to every run (even for +# a single user). +tmpblddir="$sdir"/build-tmp +rm -rf "$tmpblddir"/* "$tmpblddir" # If it's a link, we need to empty its + # contents first, then itself. 
+ + + + +# Project's top-level built analysis directories +# ---------------------------------------------- + +# Top-level built analysis directories. +badir="$bdir"/analysis +if ! [ -d "$badir" ]; then mkdir "$badir"; fi + # Top-level LaTeX. -texdir="$bdir"/tex +texdir="$badir"/tex if ! [ -d "$texdir" ]; then mkdir "$texdir"; fi # LaTeX macros. mtexdir="$texdir"/macros if ! [ -d "$mtexdir" ]; then mkdir "$mtexdir"; fi - # TeX build directory. If built in a group scenario, the TeX build # directory must be separate for each member (so they can work on their # relevant parts of the paper without conflicting with each other). @@ -1224,7 +1250,6 @@ if ! [ -d "$texbdir" ]; then mkdir "$texbdir"; fi tikzdir="$texbdir"/tikz if ! [ -d "$tikzdir" ]; then mkdir "$tikzdir"; fi - # If 'tex/build' and 'tex/tikz' are symbolic links then 'rm -f' will delete # them and we can continue. However, when the project is being built from # the tarball, these two are not symbolic links but actual directories with @@ -1239,7 +1264,6 @@ else mv tex/build tex/build-from-tarball fi - # Set the symbolic links for easy access to the top project build # directories. Note that these are put in each user's source/cloned # directory, not in the build directory (which can be shared between many @@ -1247,7 +1271,9 @@ fi # # Note: if we don't delete them first, it can happen that an extra link # will be created in each directory that points to its parent. So to be -# safe, we are deleting all the links on each re-configure of the project. +# safe, we are deleting all the links on each re-configure of the +# project. Note that at this stage, we are using the host's 'ln', not our +# own, so it's best not to assume anything (like 'ln -sf'). 
rm -f .build .local ln -s "$bdir" .build @@ -1260,21 +1286,6 @@ rm -f .gnuastro # ------------------------------------------ -# Temporary software un-packing/build directory: if the host has the -# standard `/dev/shm' mounting-point, we'll do it in shared memory (on the -# RAM), to avoid harming/over-using the HDDs/SSDs. The RAM of most systems -# today (>8GB) is large enough for the parallel building of the software. -# -# For the name of the directory under `/dev/shm' (for this project), we'll -# use the names of the two parent directories to the current/running -# directory, separated by a `-' instead of `/'. We'll then appended that -# with the user's name (in case multiple users may be working on similar -# project names). Maybe later, we can use something like `mktemp' to add -# random characters to this name and make it unique to every run (even for -# a single user). -tmpblddir="$sdir"/build-tmp -rm -rf "$tmpblddir"/* "$tmpblddir" # If its a link, we need to empty its - # contents first, then itself. # Set the top-level shared memory location. if [ -d /dev/shm ]; then shmdir=/dev/shm @@ -1300,7 +1311,7 @@ fi # symbolic link to it. Otherwise, just build the temporary build # directory under the project build directory. if [ x"$tbshmdir" = x ]; then mkdir "$tmpblddir"; -else ln -s "$tbshmdir" "$tmpblddir"; +else ln -s "$tbshmdir" "$tmpblddir"; fi -- cgit v1.2.1