From 7caa2845304c40540a336f840b3ca468bf6c8697 Mon Sep 17 00:00:00 2001
From: Mohammad Akhlaghi
Date: Tue, 1 Oct 2019 16:17:59 +0100
Subject: Preparation phase added before final building

In many real-world scenarios, `./project make' can really benefit from
having some basic information about the data before being run. For
example, when querying a server: if we know how many datasets were
downloaded and their general properties, it can greatly optimize the
process when we are designing the solution to be run in `./project make'.

Therefore with this commit, a new phase has been added to the template's
design: `./project prepare'. In the raw template this is empty, because the
simple analysis done in the template doesn't warrant it. But everything is
ready for projects using the template to add preparation phases prior to
the analysis.
---
 .file-metadata                         | Bin 6250 -> 6573 bytes
 README-hacking.md                      |  88 ++++++++++++---------
 README.md                              |  23 ++++--
 project                                | 128 +++++++++++++++++++++++--------
 reproduce/analysis/make/prepare.mk     |  35 +++++++++
 reproduce/analysis/make/top-make.mk    | 136 +++++++++++++++++++++++++++++++++
 reproduce/analysis/make/top-prepare.mk |  91 ++++++++++++++++++++++
 reproduce/analysis/make/top.mk         | 136 ---------------------------------
 reproduce/software/bash/configure.sh   |   6 +-
 9 files changed, 433 insertions(+), 210 deletions(-)
 create mode 100644 reproduce/analysis/make/prepare.mk
 create mode 100644 reproduce/analysis/make/top-make.mk
 create mode 100644 reproduce/analysis/make/top-prepare.mk
 delete mode 100644 reproduce/analysis/make/top.mk

diff --git a/.file-metadata b/.file-metadata
index b9fb074..f77bb41 100644
Binary files a/.file-metadata and b/.file-metadata differ
diff --git a/README-hacking.md b/README-hacking.md
index 30065c2..338f03a 100644
--- a/README-hacking.md
+++ b/README-hacking.md
@@ -245,11 +245,11 @@ In order to customize this template to your research, it is important to
 first understand its architecture so you can navigate your way
in the directories and understand how to implement your research project within its framework: where to add new files and which existing files to modify -for what purpose. But before reading this theoretical discussion, please -run the template (described in `README.md`: first run `./project -configure`, then `./project make -j8`) without any change, just to see how -it works (note that the configure step builds all necessary software, so it -can take long, but you can read along while its working). +for what purpose. But if this is the first time you are using this template, +before reading this theoretical discussion, please run the template once +from scratch without any changes (described in `README.md`). You will see +how it works (note that the configure step builds all necessary software, +so it can take long, but you can continue reading while it's working). The project has two top-level directories: `reproduce` and `tex`. `reproduce` hosts all the software building and analysis @@ -266,28 +266,44 @@ do your project's analysis. After it finishes, `./project configure` will create the following symbolic links in the project's top source directory: `.build` which points to the top build directory and `.local` for easy access to the custom built -software installation directory. - -Once the project is configured for your system, `./project make` will doing -the project's analysis with its own custom version of software. The process -is managed through Make and `./project make` will start with -`reproduce/analysis/make/top.mk` (called `top.mk` from now on). - -Let's continue the template's architecture with this file. `top.mk` is -relatively short and heavily commented so hopefully the descriptions in -each comment will be enough to understand the general details. As you read -this section, please also look at the contents of the mentioned files and -directories to fully understand what is going on. 
- -Before starting to look into the top `Makefile`, it is important to recall -that Make defines dependencies by files. Therefore, the input/prerequisite -and output of every step/rule must be a file. Also recall that Make will -use the modification date of the prerequisite(s) and target files to see if -the target must be re-built or not. Therefore during the processing, _many_ -intermediate files will be created (see the tips section below on a good -strategy to deal with large/huge files). - -To keep the source and (intermediate) built files separate, you _must_ +software installation directory. With these you can easily access the build +directory and project-specific software from your top source directory. For +example if you run `.local/bin/ls` you will be using the `ls` of the +template, which is probably different from your system's `ls` (run them +both with `--version` to check). + +Once the project is configured for your system, `./project prepare` and +`./project make` will do the basic preparations and run the project's +analysis with the custom version of software. The `project` script is just +a wrapper, and with the commands above, it will call `top-prepare.mk` and +`top-make.mk` (both are in the `reproduce/analysis/make` directory). + +In the template, no particular preparation is necessary, so it will +immediately finish and instruct you to run `./project make`. But in some +projects, it can be very useful to do some very basic preparatory steps on +the input data that can greatly optimize running of `./project make`. For +example, you may need to query a server, to find how many input files there +are. Once that number is known in the preparation phase, `./project make` +can parallelize the analysis much more effectively. + +In terms of organization, `top-prepare.mk` and `top-make.mk` have an +identical design, with only a minor difference. So, let's continue the +template's architecture with `top-make.mk`. 
Once you understand that, +you'll clearly understand `top-prepare.mk` also. These very high-level +files are relatively short and heavily commented so hopefully the +descriptions in each comment will be enough to understand the general +details. As you read this section, please also look at the contents of the +mentioned files and directories to fully understand what is going on. + +Before starting to look into the top `top-make.mk`, it is important to +recall that Make defines dependencies by files. Therefore, the +input/prerequisite and output of every step/rule must be a file. Also +recall that Make will use the modification date of the prerequisite(s) and +target files to see if the target must be re-built or not. Therefore during +the processing, _many_ intermediate files will be created (see the tips +section below on a good strategy to deal with large/huge files). + +To keep the source and (intermediate) built files separate, the user _must_ define a top-level build directory variable (or `$(BDIR)`) to host all the intermediate files (you defined it during `./project configure`). This directory doesn't need to be version controlled or even synchronized, or @@ -295,7 +311,9 @@ backed-up in other servers: its contents are all products, and can be easily re-created any time. As you define targets for your new rules, it is thus important to place them all under sub-directories of `$(BDIR)`. As mentioned above, you always have fast access to this "build"-directory with -the `.build` symbolic link. +the `.build` symbolic link. Also, beware to *never* make any manual change +in the files of the build-directory, just delete them (so they are +re-built). In this architecture, we have two types of Makefiles that are loaded into the top `Makefile`: _configuration-Makefiles_ (only independent @@ -350,10 +368,10 @@ other users access to the contents. 
Therefore the `./project configure` and `./project make` steps must be called with special conditions which are managed in the `--group` option. -Let's see how this design is implemented. Please open and inspect `top.mk` -it as we go along here. The first step (un-commented line) is to import the -local configuration (your answers to the questions of `./project -configure`). They are defined in the configuration-Makefile +Let's see how this design is implemented. Please open and inspect +`top-make.mk` as we go along here. The first step (un-commented line) is +to import the local configuration (your answers to the questions of +`./project configure`). They are defined in the configuration-Makefile `reproduce/software/config/installation/LOCAL.mk` which was also built by `./project configure` (based on the `LOCAL.mk.in` template of the same directory). @@ -607,9 +625,9 @@ First custom commit grants. Since you are using it in your work, it is necessary to acknowledge them in your work also. - - `reproduce/analysis/make/top.mk`: Delete the `delete-me` line in the - `makesrc` definition. Just make sure there is no empty line between - the `download \` and `paper` lines. + - `reproduce/analysis/make/top-make.mk`: Delete the `delete-me` line + in the `makesrc` definition. Just make sure there is no empty line + between the `download \` and `paper` lines. - Delete all `delete-me*` files in the following directories: diff --git a/README.md b/README.md index f0f6acc..7b319aa 100644 --- a/README.md +++ b/README.md @@ -21,6 +21,7 @@ received this source from arXiv, please see the respective section below. $ git clone XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX $ cd XXXXXXXXXXXXXXXXXX $ ./project configure +$ ./project prepare $ ./project make ``` @@ -76,11 +77,23 @@ requiring root/administrator permissions. $ ./project configure ``` -3. 
Run the following command (local build of the Make software) to - reproduce all the analysis and build the final `paper.pdf` on `8` - threads. If your CPU has a different number of threads, change the - number (you can see the number of threads available to your operating - system by running `./.local/bin/nproc`) +3. In some cases, the project's analysis may need some preparations to + optimize its processing. This is usually related to input data, + and some very basic calculations that can help the management of the + overall project in the main/next step. To do the basic preparations + on `8` threads, please run the command below. If your + CPU has a different number of threads, change the number (you can see + the number of threads available to your operating system by running + `./.local/bin/nproc`) + + ```shell + $ ./project prepare -j8 + ``` + +4. Run the following command to reproduce all the analysis and build the + final `paper.pdf` on `8` threads. If your CPU has a different number of + threads, change the number (you can see the number of threads available + to your operating system by running `./.local/bin/nproc`) ```shell $ ./project make -j8 diff --git a/project b/project index 14fc272..fcf32fd 100755 --- a/project +++ b/project @@ -65,12 +65,14 @@ print_help() { # Print the output. cat < +# +# This Makefile is free software: you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by the +# Free Software Foundation, either version 3 of the License, or (at your +# option) any later version. +# +# This Makefile is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General +# Public License for more details. See . + + + + + +# Final-target +# +# Without this file, `./project make' won't work. 
+$(BDIR)/software/preparation-done.txt: + + # If you need to add preparations define targets above to do the + # preparations. Recall that before this file, `top-prepare.mk' + # loads `initialize.mk' and `download.mk', so you can safely assume + # everything that is defined there in this Makefile. + # + # TIP: the targets can actually be automatically generated + # Makefiles that are used by `./project make'. They can include + # variables, or actual rules. Just make sure that those Makefiles + # aren't written in the source directory! Even though they are + # Makefiles, they are automatically built, so they should be + # somewhere under $(BDIR). + @touch $@ diff --git a/reproduce/analysis/make/top-make.mk b/reproduce/analysis/make/top-make.mk new file mode 100644 index 0000000..7d20800 --- /dev/null +++ b/reproduce/analysis/make/top-make.mk @@ -0,0 +1,136 @@ +# Top-level Makefile (first to be loaded). +# +# Copyright (C) 2018-2019 Mohammad Akhlaghi +# +# This Makefile is free software: you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by the +# Free Software Foundation, either version 3 of the License, or (at your +# option) any later version. +# +# This Makefile is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General +# Public License for more details. +# +# A copy of the GNU General Public License is available at +# . + + + + + +# Load the local configuration (created after running +# `./project configure'). +include reproduce/software/config/installation/LOCAL.mk + + + + + +# Ultimate target of this project +# ------------------------------- +# +# The final paper/report (`paper.pdf') is the main target of this +# project. 
As defined in the Make paradigm, it must be the first target +# that Make encounters (immediately after loading the local configuration +# settings, necessary for a group building scenario mentioned next). +# +# +# Group build +# ----------- +# +# This project can also be configured to have a shared build directory +# between multiple users. In this scenario, many users (on a server) can +# have their own/separate version controlled project source, but share the +# same build outputs (in a common directory). This will allow a group to +# work separately, on parallel parts of the analysis that don't +# interfere. It is thus very useful in cases were special storage +# requirements or CPU power is necessary and its not possible/efficient for +# each user to have a fully separate copy of the build directory. +# +# Controlling this requires two variables that are available at this stage: +# +# - `GROUP-NAME': from `LOCAL.mk' (which was built by `./project configure'). +# - `reproducible_paper_group_name': value to the `--group' option. +# +# The analysis is only done when both have the same group name. Note that +# when the project isn't being built for a group, both variables will be an +# empty string. +# +# +# Only processing, no LaTeX PDF +# ----------------------------- +# +# If you are just interested in the processing and don't want to build the +# PDF, you can skip the creatation of the final PDF by removing the value +# of `pdf-build-final' in `reproduce/analysis/config/pdf-build.mk'. 
+ifeq (x$(reproducible_paper_group_name),x$(GROUP-NAME)) +all: paper.pdf +else +all: + @if [ "x$(GROUP-NAME)" = x ]; then \ + echo "Project is NOT configured for groups, please run"; \ + echo " $$ ./project make"; \ + else \ + echo "Project is configured for groups, please run"; \ + echo " $$ ./project make --group=$(GROUP-NAME) -j8"; \ + fi +endif + + + + + +# Define source Makefiles +# ----------------------- +# +# To keep things clean, managable and readable, each set of operations +# is (and must be) classified (modularized) by context into separate +# Makefiles: the more the better. These modular steps are then +# included in this top-level Makefile through the `include' command of +# the next step. Each Makefile should also produce a LaTeX macro file +# with the same fixed name (used to keep all the parameters and +# relevant outputs of the steps in it for the final paper). +# +# In the rare case that no special LaTeX macros are necessary in a +# workhorse Makefile, you can simply make an empty file with `touch +# $@'. This will not add any lines to the final combined LaTeX macros +# file, but will create the file that is a prerequisite to the final +# paper generation. +# +# To (significantly) help in readability, this top-level Makefile should be +# the only one in charge of including Makefiles. So if you care about easy +# maintainence and understandability (even for your self, in one year! It +# is VERY IMPORTANT and as a scientist, you MUST care about it!), do not +# include Makefiles from any other Makefile. +# +# IMPORTANT NOTE: order matters in the inclusion of the processing +# Makefiles. As the project grows, some Makefiles will define +# variables/dependencies that later Makefiles need. Therefore we are using +# a `foreach' loop in the next step to explicitly request loading them in +# the same order that they are defined here (we aren't just using a +# wild-card like the configuration Makefiles). 
+makesrc = initialize \ + download \ + delete-me \ + paper + + + + + +# Include all analysis Makefiles +# ------------------------------ +# +# 1) All the analysis configuration-Makefiles (Makefiles that only define +# variables with no rules or order). +# +# 2) From the software configuration-Makefiles, we only include the one +# containing software versions, just incase its necessary to +# use/report outside of the acknowledgments section of the paper. +# +# 3) Finally, we'll import all the analysis workhorse-Makefiles which +# contain rules to actually do this project's processing. +include reproduce/analysis/config/*.mk +include reproduce/software/config/installation/versions.mk +include $(foreach s,$(makesrc), reproduce/analysis/make/$(s).mk) diff --git a/reproduce/analysis/make/top-prepare.mk b/reproduce/analysis/make/top-prepare.mk new file mode 100644 index 0000000..3353638 --- /dev/null +++ b/reproduce/analysis/make/top-prepare.mk @@ -0,0 +1,91 @@ +# Do basic preparations to optimize the project's running. +# +# NOTE: This file is very similar to `top-make.mk', so the large comments +# are not included here. Please see that file for thorough comments on each +# step. +# +# Copyright (C) 2019 Mohammad Akhlaghi +# +# This Makefile is free software: you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by the +# Free Software Foundation, either version 3 of the License, or (at your +# option) any later version. +# +# This Makefile is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General +# Public License for more details. +# +# A copy of the GNU General Public License is available at +# . + + + + + +# Load the local configuration (created after running +# `./project configure'). 
+include reproduce/software/config/installation/LOCAL.mk + + + + + +# Ultimate target of this project +# ------------------------------- +# +# See `top-make.mk' for complete explanation. +ifeq (x$(reproducible_paper_group_name),x$(GROUP-NAME)) +all: $(BDIR)/software/preparation-done.txt + @echo ""; + echo "----------------" + echo "Project preparation has been completed without any errors." + echo "" + echo "Please run the following command to start building the project." + echo "(Replace '8' with the number of CPU threads on your system)" + echo "" + if [ "x$(GROUP-NAME)" = x ]; then + echo " $$ ./project make" + else + echo " $$ ./project make --group=$(GROUP-NAME) -j8" + fi + echo "" +else +all: + @if [ "x$(GROUP-NAME)" = x ]; then + echo "Project is NOT configured for groups, please run" + echo " $$ ./project prepare" + else + echo "Project is configured for groups, please run" + echo " $$ ./project prepare --group=$(GROUP-NAME) -j8" + fi +endif + + + + + +# Define source Makefiles +# ----------------------- +# +# See `top-make.mk' for complete explanation. +# +# To ensure that `prepare' and `make' have the same basic definitions and +# environment and that all `downloads' are managed in one place, both +# `./project prepare' and `./project make' will first read `initialize.mk' +# and `download.mk'. +makesrc = initialize \ + download \ + prepare + + + + + +# Include all analysis Makefiles +# ------------------------------ +# +# See `top-make.mk' for complete explanation. +include reproduce/analysis/config/*.mk +include reproduce/software/config/installation/versions.mk +include $(foreach s,$(makesrc), reproduce/analysis/make/$(s).mk) diff --git a/reproduce/analysis/make/top.mk b/reproduce/analysis/make/top.mk deleted file mode 100644 index 7d20800..0000000 --- a/reproduce/analysis/make/top.mk +++ /dev/null @@ -1,136 +0,0 @@ -# Top-level Makefile (first to be loaded). 
-# -# Copyright (C) 2018-2019 Mohammad Akhlaghi -# -# This Makefile is free software: you can redistribute it and/or modify it -# under the terms of the GNU General Public License as published by the -# Free Software Foundation, either version 3 of the License, or (at your -# option) any later version. -# -# This Makefile is distributed in the hope that it will be useful, but -# WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General -# Public License for more details. -# -# A copy of the GNU General Public License is available at -# . - - - - - -# Load the local configuration (created after running -# `./project configure'). -include reproduce/software/config/installation/LOCAL.mk - - - - - -# Ultimate target of this project -# ------------------------------- -# -# The final paper/report (`paper.pdf') is the main target of this -# project. As defined in the Make paradigm, it must be the first target -# that Make encounters (immediately after loading the local configuration -# settings, necessary for a group building scenario mentioned next). -# -# -# Group build -# ----------- -# -# This project can also be configured to have a shared build directory -# between multiple users. In this scenario, many users (on a server) can -# have their own/separate version controlled project source, but share the -# same build outputs (in a common directory). This will allow a group to -# work separately, on parallel parts of the analysis that don't -# interfere. It is thus very useful in cases were special storage -# requirements or CPU power is necessary and its not possible/efficient for -# each user to have a fully separate copy of the build directory. -# -# Controlling this requires two variables that are available at this stage: -# -# - `GROUP-NAME': from `LOCAL.mk' (which was built by `./project configure'). -# - `reproducible_paper_group_name': value to the `--group' option. 
-# -# The analysis is only done when both have the same group name. Note that -# when the project isn't being built for a group, both variables will be an -# empty string. -# -# -# Only processing, no LaTeX PDF -# ----------------------------- -# -# If you are just interested in the processing and don't want to build the -# PDF, you can skip the creatation of the final PDF by removing the value -# of `pdf-build-final' in `reproduce/analysis/config/pdf-build.mk'. -ifeq (x$(reproducible_paper_group_name),x$(GROUP-NAME)) -all: paper.pdf -else -all: - @if [ "x$(GROUP-NAME)" = x ]; then \ - echo "Project is NOT configured for groups, please run"; \ - echo " $$ ./project make"; \ - else \ - echo "Project is configured for groups, please run"; \ - echo " $$ ./project make --group=$(GROUP-NAME) -j8"; \ - fi -endif - - - - - -# Define source Makefiles -# ----------------------- -# -# To keep things clean, managable and readable, each set of operations -# is (and must be) classified (modularized) by context into separate -# Makefiles: the more the better. These modular steps are then -# included in this top-level Makefile through the `include' command of -# the next step. Each Makefile should also produce a LaTeX macro file -# with the same fixed name (used to keep all the parameters and -# relevant outputs of the steps in it for the final paper). -# -# In the rare case that no special LaTeX macros are necessary in a -# workhorse Makefile, you can simply make an empty file with `touch -# $@'. This will not add any lines to the final combined LaTeX macros -# file, but will create the file that is a prerequisite to the final -# paper generation. -# -# To (significantly) help in readability, this top-level Makefile should be -# the only one in charge of including Makefiles. So if you care about easy -# maintainence and understandability (even for your self, in one year! 
It -# is VERY IMPORTANT and as a scientist, you MUST care about it!), do not -# include Makefiles from any other Makefile. -# -# IMPORTANT NOTE: order matters in the inclusion of the processing -# Makefiles. As the project grows, some Makefiles will define -# variables/dependencies that later Makefiles need. Therefore we are using -# a `foreach' loop in the next step to explicitly request loading them in -# the same order that they are defined here (we aren't just using a -# wild-card like the configuration Makefiles). -makesrc = initialize \ - download \ - delete-me \ - paper - - - - - -# Include all analysis Makefiles -# ------------------------------ -# -# 1) All the analysis configuration-Makefiles (Makefiles that only define -# variables with no rules or order). -# -# 2) From the software configuration-Makefiles, we only include the one -# containing software versions, just incase its necessary to -# use/report outside of the acknowledgments section of the paper. -# -# 3) Finally, we'll import all the analysis workhorse-Makefiles which -# contain rules to actually do this project's processing. -include reproduce/analysis/config/*.mk -include reproduce/software/config/installation/versions.mk -include $(foreach s,$(makesrc), reproduce/analysis/make/$(s).mk) diff --git a/reproduce/software/bash/configure.sh b/reproduce/software/bash/configure.sh index 5c46496..7ef576a 100755 --- a/reproduce/software/bash/configure.sh +++ b/reproduce/software/bash/configure.sh @@ -1387,9 +1387,9 @@ echo `.local/bin/date` > $finaltarget # The configuration is now complete, we can inform the user on the next # step(s) to take. if [ x$reproducible_paper_group_name = x ]; then - buildcommand="./project make -j8" + buildcommand="./project prepare -j8" else - buildcommand="./project make --group=$reproducible_paper_group_name -j8" + buildcommand="./project prepare --group=$reproducible_paper_group_name -j8" fi cat <