From 42d3cef11bd9a84d11eb48a4ff9686d2e0ce5436 Mon Sep 17 00:00:00 2001
From: Mohammad Akhlaghi
Date: Fri, 18 Jan 2019 20:36:11 +0000
Subject: Sanity check to run the Make with proper group permissions

If the `./for-group' script is not used properly, it can lead to the
whole pipeline being re-run. Therefore, it is important to do a sanity
check immediately at the start of Make's processing and inform the user
if there is a problem.

With this commit, `./for-group' exports the
`reproducible_paper_for_group' variable, which is used both by the
initial `./configure' script and by each later call to Make. The
`./configure' script uses it to write a value in
`reproduce/config/pipeline/LOCAL.mk' and Make uses it to compare with
the value in `reproduce/config/pipeline/LOCAL.mk'. If there is an
inconsistency, Make will not even attempt to build anything and will
just print a message and abort.
---
 README-pipeline.md                    | 146 ++++++++++++++++++++--------------
 configure                             |  40 ++++++++++
 for-group                             |  16 +++-
 reproduce/config/pipeline/LOCAL.mk.in |  24 ++++++
 reproduce/src/make/top.mk             |  72 ++++++++++++++---
 5 files changed, 225 insertions(+), 73 deletions(-)

diff --git a/README-pipeline.md b/README-pipeline.md
index f98a7ee..2d288da 100644
--- a/README-pipeline.md
+++ b/README-pipeline.md
@@ -245,50 +245,51 @@ pipeline (described in `README.md`: first run `./configure`, then
 `.local/bin/make -j8`) without any change, just to see how it works.

 In order to obtain a reproducible result it is important to have an
-identical environment (for example same versions the programs that it will
-use). This also has the added advantage that in your separate research
-projects, you can use different versions of a single software and they
-won't interfere. Therefore, the pipeline builds its own dependencies during
-the `./configure` step. Building of the dependencies is managed by
+identical environment (for example same versions of the programs that it
+will use). Therefore, the pipeline builds its own dependencies during the
+`./configure` step. Building of the dependencies is managed by
 `reproduce/src/make/dependencies-basic.mk` and
 `reproduce/src/make/dependencies.mk`. These Makefiles are called by the
-`./configure` script. The first is intended for downloading and building
-the most basic tools like GNU Bash, GNU Make, and GNU Tar. Therefore it
-must only contain very basic and portable Make and shell features. The
-second is called after the first, thus enabling usage of the modern and
-advanced features of GNU Bash and GNU Make, similar to the rest of the
-pipeline. Later, if you add a new program/library for your research, you
-will need to include a rule on how to download and build it (in
-`reproduce/src/make/dependencies.mk`).
-
-After configuring, the `.local/bin/make` command will start the processing
-with the custom version of Make that was locally installed during
-configuration. The first file that is read is the top-level
-`Makefile`. Therefore, we'll start our navigation/discussion with this
-file. This file is relatively short and heavily commented so hopefully the
-descriptions in each comment will be enough to understand the general
+`./configure` script and not used afterwards. The first is intended for
+downloading and building the most basic tools like GNU Bash, GNU Make, and
+GNU Tar. Therefore it must only contain very basic and portable Make and
+shell features. The second is called after the first, thus enabling usage
+of the modern and advanced features of GNU Bash and GNU Make, similar to
+the rest of the pipeline. Later, if you add a new program/library for your
+research, you will need to include a rule on how to download and build it
+(in `reproduce/src/make/dependencies.mk`).
+
+After it finishes, `./configure` will create a `Makefile` in the top
+directory (a symbolic link to `reproduce/src/make/top.mk`) and a `.local`
+directory (a link for easy access to the custom built software
+packages). The `.local/bin/make` command will then use our custom version
+of GNU Make to do the analysis. The first file that is read by Make is the
+top-level `Makefile`. Therefore, we'll start our navigation/discussion with
+this file. This file is relatively short and heavily commented so hopefully
+the descriptions in each comment will be enough to understand the general
 details. As you read this section, please also look at the contents of the
 mentioned files and directories to fully understand what is going on.

 Before starting to look into the top `Makefile`, it is important to recall
-that Make defines dependencies by files. Therefore, the input and output of
-every step must be a file. Also recall that Make will use the modification
-date of the prerequisite and target files to see if the target must be
-re-built or not. Therefore during the processing, _many_ intermediate files
-will be created (see the tips section below on a good strategy to deal with
-large/huge files).
-
-To keep the source and (intermediate) built files separate, at
-configuration time, the user _must_ define a top-level build directory
-variable (or `$(BDIR)`) to host all the intermediate files. This directory
+that Make defines dependencies by files. Therefore, the input/prerequisite
+and output of every step/rule must be a file. Also recall that Make will
+use the modification date of the prerequisite and target files to see if
+the target must be re-built or not. Therefore during the processing, _many_
+intermediate files will be created (see the tips section below on a good
+strategy to deal with large/huge files).
+
+To keep the source and (intermediate) built files separate, you _must_
+define a top-level build directory variable (or `$(BDIR)`) to host all the
+intermediate files (it was defined in `./configure`). This directory
 doesn't need to be version controlled or even synchronized, or backed-up in
 other servers: its contents are all products of the pipeline, and can be
 easily re-created any time. As you define targets for your new rules, it is
 thus important to place them all under sub-directories of `$(BDIR)`.

 In this architecture, we have two types of Makefiles that are loaded into
-one: _configuration-Makefiles_ (only independent variables/configurations)
-and _workhorse-Makefiles_ (Makefiles that actually contain rules).
+the top `Makefile`: _configuration-Makefiles_ (only independent
+variables/configurations) and _workhorse-Makefiles_ (Makefiles that
+actually contain rules).

 The configuration-Makefiles are those that satisfy this wildcard:
 `reproduce/config/pipeline/*.mk`. These Makefiles don't actually have any
@@ -297,41 +298,68 @@ analysis/processing. Open a few of them to see for your self.
 These Makefiles must only contain raw Make variables (pipeline
 configurations). By raw we mean that the Make variables in these files must
 not depend on variables in any other configuration-Makefile. This is
-because we don't want to assume any order in reading them. It is very
-important to *not* define any rule or other Make construct in any of these
-configuration-Makefiles. This will enable you to set the respective
-Makefiles in this directory as a prerequisite to any target that depends on
-their variable values. Therefore, if you change any of their values, all
-targets that depend on those values will be re-built.
-
-The workhorse-Makefiles are those within the `reproduce/src/make`
-directory. They contain the details of the processing steps (Makefiles
-containing rules). But in this phase *order is important*, because the
-prerequisites of most rules will be the targets of other rules that will be
-defined prior to them (not a fixed name like `paper.pdf`). The lower-level
-rules must be imported into Make before the higher-level ones. Hence, we
-can't use a simple wildcard like when we imported configuration-Makefiles
-above.
+because we don't want to assume any order in reading them. It is also very
+important to *not* define any rule or other Make construct in any of these
+configuration-Makefiles.
+
+These conditions will enable you to set these configuration-Makefiles as a
+prerequisite to any target that depends on their variable
+values. Therefore, if you change any of their values, all targets that
+depend on those values will be re-built. This is very convenient as your
+project scales up and gets more complex.
+
+The workhorse-Makefiles are those satisfying this wildcard:
+`reproduce/src/make/*.mk`. They contain the details of the processing steps
+(Makefiles containing rules). Therefore, in this phase *order is
+important*, because the prerequisites of most rules will be the targets of
+other rules that will be defined prior to them (not a fixed name like
+`paper.pdf`). The lower-level rules must be imported into Make before the
+higher-level ones.

 All processing steps are assumed to ultimately (usually after many rules)
 end up in some number, image, figure, or table that are to be included in
 the paper. The writing of these results into the final report/paper is
 managed through separate LaTeX files that only contain macros (a name given
 to a number/string to be used in the LaTeX source, which will be replaced
-when compiling it to the final PDF). So usually the last target in a
+when compiling it to the final PDF). So the last target in a
 workhorse-Makefile is a `.tex` file (with the same base-name as the
 Makefile, but in `$(BDIR)/tex/macros`). As a result, if the targets in a
 workhorse-Makefile aren't directly a prerequisite of other
-workhorse-Makefile targets, they should be a pre-requisite of that
-intermediate LaTeX macro file. Otherwise, they will be ignored by Make.
+workhorse-Makefile targets, they can be a pre-requisite of that
+intermediate LaTeX macro file and thus be called when necessary. Otherwise,
+they will be ignored by Make.
+
+This pipeline also has a mode to share the build directory between several
+users of a Unix group (when working on large computer clusters). In this
+scenario, each user can have their own cloned pipeline source, but share
+the large built files with each other. To do this, it is necessary for
+all built files to give full permission to group members while not allowing
+any other user access to the contents. Therefore, the `./configure` and
+Make steps must be called with special conditions which are managed in the
+`for-group` file.

 Let's see how this design is implemented. When the `./configure` finishes,
 it makes a `Makefile` in the top directory. This Makefile is just a
 symbolic link to `reproduce/src/make/top.mk`. Please open and inspect it as
-we go along here. The first step (un-commented line) defines the ultimate
-target (`paper.pdf`). You shouldn't modify this line. The rule to build
-`paper.pdf` is in `reproduce/src/make/paper.mk` that will be imported into
-this top Makefile later.
+we go along here. The first step (un-commented line) is to import the local
+configuration (answers to the questions `./configure` asked you). They are
+defined in the configuration-Makefile `reproduce/config/pipeline/LOCAL.mk`
+which was also built by `./configure` (based on the `LOCAL.mk.in`
+template).
+
+The next non-commented set of lines defines the ultimate target of the whole
+pipeline (`paper.pdf`). But a sanity check is necessary for situations when
+the user is not careful (for example, has configured the pipeline for group
+access but forgets to run the pipeline with `./for-group`, or the
+opposite). Therefore, we use a Make conditional to define the `all` target
+based on the group permissions being consistent between the initial
+configuration and the current run.
+
+If there is a problem, `all` will not depend on anything and will just print
+a warning to inform you of the problem. When the group conditions are fine,
+`all` will depend on `paper.pdf` (which is defined in
+`reproduce/src/make/paper.mk` and will be imported into this top Makefile
+later).

 Having defined the top target, our next step is to include all the other
 necessary Makefiles. But order matters in the importing of
@@ -339,10 +367,12 @@ workhorse-Makefiles and each must also have a TeX macro file with the same
 base name (without a suffix). Therefore, the next step in the top-level
 Makefile is to define a `makesrc` variable to keep the base names (without
 a `.mk` suffix) of the workhorse-Makefiles that must be imported, in the
-proper order. Having defined `makesrc`, in the next step, we'll just import
-all the configuration-Makefiles with a wildcard and all workhorse-Makefiles
-using a Make `foreach` loop to preserve the order. This finishes the
-general view of the pipeline's implementation.
+proper order.
+
+Finally, we'll just import all the configuration-Makefiles with a wildcard
+(while ignoring `LOCAL.mk` that was imported before). Also, all
+workhorse-Makefiles are imported in the proper order using a Make `foreach`
+loop. This finishes the general view of the pipeline's implementation.

 In short, to keep things modular, readable and managable, follow these
 recommendations: 1) Set clear-to-understand names for the
diff --git a/configure b/configure
index c2ab5e7..df87435 100755
--- a/configure
+++ b/configure
@@ -162,6 +162,34 @@ fi



+# Make sure the group permissions satisfy the previous configuration (if it
+# exists and we don't want to re-write it).
+if [ $rewritepconfig = no ]; then
+    oldforgroup=$(awk '/FOR-GROUP/ && c==0 {c=1; print $3}' $pconf)
+    if [ "x$oldforgroup" = xyes ]; then
+        if [ "x$reproducible_paper_for_group" = x ]; then
+            echo "-----------------------------"
+            echo "!!!!!!!! ERROR !!!!!!!!"
+            echo "-----------------------------"
+            echo "Previous pipeline was configured for groups."
+            echo "Either enable re-write, or use './for-group'."
+            exit 1
+        fi
+    else
+        if [ "x$reproducible_paper_for_group" = xyes ]; then
+            echo "-----------------------------"
+            echo "!!!!!!!! ERROR !!!!!!!!"
+            echo "-----------------------------"
+            echo "Previous pipeline was not configured for groups."
+            echo "Either enable re-write, or don't use './for-group'."
+            exit 1
+        fi
+    fi
+fi
+
+
+
+

 # Identify the downloader tool
 # ----------------------------
@@ -376,11 +404,23 @@ fi

 # Write the parameters into the local configuration file.
 if [ $rewritepconfig = yes ]; then
+
+    # Make the pipeline configuration's initial comments.
     create_file_with_notice $pconf
+
+    # Fix the group settings.
+    if [ "x$reproducible_paper_for_group" = xyes ]; then
+        for_group=yes
+    else
+        for_group=no
+    fi
+
+    # Write the values.
     sed -e's|@bdir[@]|'"$bdir"'|' \
         -e's|@indir[@]|'"$indir"'|' \
         -e's|@ddir[@]|'"$ddir"'|' \
         -e's|@downloader[@]|'"$downloader"'|' \
+        -e's|@forgroup[@]|'"$for_group"'|' \
         $pconf.in >> $pconf
 else
     # Read the values from existing configuration file.
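The `awk` call in the first `configure` hunk is what recovers the previous setting from `LOCAL.mk`. As a small self-contained sketch, the script below runs that same `awk` program on a throw-away file; the file and its contents are made up for illustration (they only mimic the layout that `LOCAL.mk.in` produces) and are not real pipeline output. It suggests why the `c==0` guard is there: the generated `LOCAL.mk` mentions `FOR-GROUP` again inside the `$(shell ...)` block, and only the first match should be read.

```shell
#!/bin/sh
# Illustration only: a temporary file standing in for LOCAL.mk. The
# values are hypothetical; only the layout mimics LOCAL.mk.in.
pconf=$(mktemp)
cat > "$pconf" <<'EOF'
BDIR = /hypothetical/build
FOR-GROUP = yes
good-group-configuration := $(shell \
    if [ "x$(FOR-GROUP)" = xyes ]; then echo yes; else echo no; fi)
EOF

# Same awk program as in `./configure': on the first line matching
# /FOR-GROUP/, print the third whitespace-separated field ("yes" in
# "FOR-GROUP = yes"), then set `c' so later matches are ignored.
oldforgroup=$(awk '/FOR-GROUP/ && c==0 {c=1; print $3}' "$pconf")
echo "oldforgroup=$oldforgroup"

# Without the guard, the `$(FOR-GROUP)' reference inside the $(shell ...)
# call would also match and its third field would be printed as well.
awk '/FOR-GROUP/ {print $3}' "$pconf"

rm -f "$pconf"
```

Running it prints `oldforgroup=yes`, followed by the two lines the unguarded program returns (`yes` and `"x$(FOR-GROUP)"`).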
diff --git a/for-group b/for-group
index 66eebf4..7484a09 100755
--- a/for-group
+++ b/for-group
@@ -40,7 +40,7 @@


 # Desired group
-thisgroup=ourgroup
+thisgroup=YOUR-GROUP



@@ -65,5 +65,15 @@ else
     echo "$0: argument must be 'configure' or 'make'"
     exit 1
 fi
-echo; echo $script; echo
-sg $thisgroup "umask g+w && $script"
+
+
+
+
+
+# Define the group, and set the permission so the user and group both have
+# read and write permissions. Then run the respective script.
+#
+# We are also exporting a special variable so `./configure' and Make can
+# prepare for sanity checks and avoid re-doing the whole analysis with a
+# typo (not using this script properly after configuration).
+sg $thisgroup "umask u+r,u+w,g+r,g+w,o-r,o-w,o-x && export reproducible_paper_for_group=yes && $script"
diff --git a/reproduce/config/pipeline/LOCAL.mk.in b/reproduce/config/pipeline/LOCAL.mk.in
index 89e3e23..846a5b8 100644
--- a/reproduce/config/pipeline/LOCAL.mk.in
+++ b/reproduce/config/pipeline/LOCAL.mk.in
@@ -6,3 +6,27 @@ BDIR = @bdir@
 INDIR = @indir@
 DEPENDENCIES-DIR = @ddir@
 DOWNLOADER = @downloader@
+FOR-GROUP = @forgroup@
+
+
+
+
+
+# In the top Makefile (which is created after running `./configure' and is
+# actually a symbolic link to `reproduce/src/make/top.mk'), we need to
+# start by checking that the group settings of the current run don't
+# conflict with the group settings the pipeline was configured with.
+good-group-configuration := $(shell \
+    if [ "x$(FOR-GROUP)" = xyes ]; then \
+        if [ "x$(reproducible_paper_for_group)" = xyes ]; then \
+            echo "yes"; \
+        else \
+            echo "no"; \
+        fi; \
+    else \
+        if [ "x$(reproducible_paper_for_group)" = xyes ]; then \
+            echo "no"; \
+        else \
+            echo "yes"; \
+        fi; \
+    fi)
diff --git a/reproduce/src/make/top.mk b/reproduce/src/make/top.mk
index 5d4c210..25c4f0b 100644
--- a/reproduce/src/make/top.mk
+++ b/reproduce/src/make/top.mk
@@ -23,17 +23,59 @@



+# Load the local configuration (created after running `./configure').
+include reproduce/config/pipeline/LOCAL.mk
+
+
+
+
+
 # Ultimate target of this pipeline
 # --------------------------------
 #
-# The final paper (in PDF format) is the main target of this whole
-# reproduction pipeline. So as defined in the Make paradigm, we are
-# defining it here.
+# The final paper/report (`paper.pdf') is the main target of this whole
+# reproduction pipeline. So as defined in the Make paradigm, it is the
+# first target that we define (immediately after loading the local
+# configuration settings, necessary for a group building scenario mentioned
+# next).
+#
+# Group build
+# -----------
 #
-# Note that if you don't have LaTeX to build the PDF, or generally are just
-# interested in the processing, you can skip create the final PDF creation
-# with `pdf-build-final' of `reproduce/config/pipeline/pdf-build.mk'.
+# This pipeline can also be configured to have a shared build directory
+# between multiple users. In this scenario, many users (on a server) can
+# have their own/separate version-controlled copy of the pipeline's
+# source, but share the same build outputs (in a common directory). This
+# will allow a group to work separately, on parallel parts of the analysis.
+# It is thus very useful in cases where special storage requirements or CPU
+# power are necessary and it's not possible/efficient for each user to have
+# a fully separate copy of the build directory.
+#
+# `FOR-GROUP': from `LOCAL.mk' (which was built by `./configure').
+# `reproducible_paper_for_group': from the `./for-group' script.
+#
+# The final paper is only built when both have a value of `yes', or when
+# `FOR-GROUP' is `no' and `./for-group' wasn't called (if `./for-group' is
+# called before `make', then `reproducible_paper_for_group==yes').
+#
+# Only processing, no LaTeX PDF
+# -----------------------------
+#
+# If you are just interested in the processing and don't want to build the
+# PDF, you can skip the creation of the final PDF by removing the value
+# of `pdf-build-final' in `reproduce/config/pipeline/pdf-build.mk'.
+ifeq ($(good-group-configuration),yes)
 all: paper.pdf
+else
+all:
+	@if [ "x$(reproducible_paper_for_group)" = xyes ]; then \
+	  echo "Pipeline is NOT configured for groups, please run"; \
+	  echo "  $$ .local/bin/make"; \
+	else \
+	  echo "Pipeline is configured for groups, please run"; \
+	  echo "  $$ ./for-group make"; \
+	fi
+endif



@@ -77,11 +119,17 @@ makesrc = initialize \



-# Include necessary Makefiles
-# ---------------------------
+# Include all Makefiles
+# ---------------------
+#
+# We have two classes of Makefiles, separated by context and their location:
+#
+# 1) First, we'll include all the configuration-Makefiles. These
+#    Makefiles only define variables with no rules or order. We just
+#    won't include `LOCAL.mk' because it has already been included
+#    above.
 #
-# First, we'll include all the configuration-Makefiles (only defining
-# variables with no rules or order), then the workhorse Makefiles which
-# contain rules and order matters for them.
-include reproduce/config/pipeline/*.mk
+# 2) Then, we'll import the workhorse-Makefiles which contain rules to
+#    actually do the processing of this pipeline.
+include $(filter-out %LOCAL.mk, $(wildcard reproduce/config/pipeline/*.mk))
 include $(foreach s,$(makesrc), reproduce/src/make/$(s).mk)
-- 
cgit v1.2.1
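Both the `./configure` check and `good-group-configuration` in `LOCAL.mk.in` implement the same four-case rule: the run is accepted only when the configured `FOR-GROUP` value and the presence of `reproducible_paper_for_group` agree. As a sketch, that rule can be written as one POSIX shell function. `group_config_ok` is a hypothetical name used only here; the pipeline itself keeps the logic inline in `configure` and in the Makefile.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline) mirroring the
# `good-group-configuration' logic of LOCAL.mk.in:
#   $1: FOR-GROUP value written by `./configure' ("yes" or "no").
#   $2: value of `reproducible_paper_for_group' in the current run
#       ("yes" when started through `./for-group', otherwise empty).
# Prints "yes" when the two settings agree and "no" otherwise.
group_config_ok () {
    if [ "x$1" = xyes ]; then
        if [ "x$2" = xyes ]; then echo yes; else echo no; fi
    else
        if [ "x$2" = xyes ]; then echo no; else echo yes; fi
    fi
}

# Print all four combinations as a small truth table.
for configured in yes no; do
    for running in yes ""; do
        printf 'FOR-GROUP=%s for-group=%s -> %s\n' \
               "$configured" "${running:-(unset)}" \
               "$(group_config_ok "$configured" "$running")"
    done
done
```

The two `no` rows are the situations in which the `all` target only prints the reminder instead of depending on `paper.pdf`.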