Next: Customization checklist, Previous: Why Make?, Up: About
In order to customize Maneage to your research, it is important to first
understand its architecture so you can navigate your way in the directories
and understand how to implement your research project within its framework:
where to add new files and which existing files to modify for what
purpose. But if this the first time you are using Maneage, before reading
this theoretical discussion, please run Maneage once from scratch without
any changes (described in README.md
). You will see how it works (note that
the configure step builds all necessary software, so it can take long, but
you can continue reading while its working).
The project has two top-level directories: reproduce
and
tex
. reproduce
hosts all the software building and analysis
steps. tex
contains all the final paper's components to be compiled into
a PDF using LaTeX.
The reproduce
directory has two sub-directories: software
and
analysis
. As the name says, the former contains all the instructions to
download, build and install (independent of the host operating system) the
necessary software (these are called by the ./project configure
command). The latter contains instructions on how to use those software to
do your project's analysis.
After it finishes, ./project configure
will create the following symbolic
links in the project's top source directory: .build
which points to the
top build directory and .local
for easy access to the custom built
software installation directory. With these you can easily access the build
directory and project-specific software from your top source directory. For
example if you run .local/bin/ls
you will be using the ls
of Maneage,
which is probably different from your system's ls
(run them both with
--version
to check).
Once the project is configured for your system, ./project make
will do
the basic preparations and run the project's analysis with the custom
version of software. The project
script is just a wrapper, and with the
make
argument, it will first call top-prepare.mk
and top-make.mk
(both are in the reproduce/analysis/make
directory).
In terms of organization, top-prepare.mk
and top-make.mk
have an
identical design, only minor differences. So, let's continue Maneage's
architecture with top-make.mk
. Once you understand that, you'll clearly
understand top-prepare.mk
also. These very high-level files are
relatively short and heavily commented so hopefully the descriptions in
each comment will be enough to understand the general details. As you read
this section, please also look at the contents of the mentioned files and
directories to fully understand what is going on.
Before starting to look into the top top-make.mk
, it is important to
recall that Make defines dependencies by files. Therefore, the
input/prerequisite and output of every step/rule must be a file. Also
recall that Make will use the modification date of the prerequisite(s) and
target files to see if the target must be re-built or not. Therefore during
the processing, many intermediate files will be created (see the tips
section below on a good strategy to deal with large/huge files).
To keep the source and (intermediate) built files separate, the user must
define a top-level build directory variable (or $(BDIR)
) to host all the
intermediate files (you defined it during ./project configure
). This
directory doesn't need to be version controlled or even synchronized, or
backed-up in other servers: its contents are all products, and can be
easily re-created any time. As you define targets for your new rules, it is
thus important to place them all under sub-directories of $(BDIR)
. As
mentioned above, you always have fast access to this "build"-directory with
the .build
symbolic link. Also, beware to never make any manual change
in the files of the build-directory, just delete them (so they are
re-built).
In this architecture, we have two types of Makefiles that are loaded into
the top Makefile
: configuration-Makefiles (only independent
variables/configurations) and workhorse-Makefiles (Makefiles that
actually contain analysis/processing rules).
The configuration-Makefiles are those that satisfy these two wildcards:
reproduce/software/config/*.conf
(for building the necessary software
when you run ./project configure
) and reproduce/analysis/config/*.conf
(for the high-level analysis, when you run ./project make
). These
Makefiles don't actually have any rules, they just have values for various
free parameters throughout the configuration or analysis. Open a few of
them to see for yourself. These Makefiles must only contain raw Make
variables (project configurations). By "raw" we mean that the Make
variables in these files must not depend on variables in any other
configuration-Makefile. This is because we don't want to assume any order
in reading them. It is also very important to not define any rule, or
other Make construct, in these configuration-Makefiles.
Following this rule-of-thumb enables you to set these configure-Makefiles as a prerequisite to any target that depends on their variable values. Therefore, if you change any of their values, all targets that depend on those values will be re-built. This is very convenient as your project scales up and gets more complex.
The workhorse-Makefiles are those satisfying this wildcard
reproduce/software/make/*.mk
and reproduce/analysis/make/*.mk
. They
contain the details of the processing steps (Makefiles containing
rules). Therefore, in this phase order is important, because the
prerequisites of most rules will be the targets of other rules that will be
defined prior to them (not a fixed name like paper.pdf
). The lower-level
rules must be imported into Make before the higher-level ones.
All processing steps are assumed to ultimately (usually after many rules)
end up in some number, image, figure, or table that will be included in the
paper. The writing of these results into the final report/paper is managed
through separate LaTeX files that only contain macros (a name given to a
number/string to be used in the LaTeX source, which will be replaced when
compiling it to the final PDF). So the last target in a workhorse-Makefile
is a .tex
file (with the same base-name as the Makefile, but in
$(BDIR)/tex/macros
). As a result, if the targets in a workhorse-Makefile
aren't directly a prerequisite of other workhorse-Makefile targets, they
can be a prerequisite of that intermediate LaTeX macro file and thus be
called when necessary. Otherwise, they will be ignored by Make.
Maneage also has a mode to share the build directory between several
users of a Unix group (when working on large computer clusters). In this
scenario, each user can have their own cloned project source, but share the
large built files between each other. To do this, it is necessary for all
built files to give full permission to group members while not allowing any
other users access to the contents. Therefore the ./project configure
and
./project make
steps must be called with special conditions which are
managed in the --group
option.
Let's see how this design is implemented. Please open and inspect
top-make.mk
it as we go along here. The first step (un-commented line) is
to import the local configuration (your answers to the questions of
./project configure
). They are defined in the configuration-Makefile
reproduce/software/config/LOCAL.conf
which was also built by ./project
configure
(based on the LOCAL.conf.in
template of the same directory).
The next non-commented set of the top Makefile
defines the ultimate
target of the whole project (paper.pdf
). But to avoid mistakes, a sanity
check is necessary to see if Make is being run with the same group settings
as the configure script (for example when the project is configured for
group access using the ./for-group
script, but Make isn't). Therefore we
use a Make conditional to define the all
target based on the group
permissions.
Having defined the top/ultimate target, our next step is to include all the
other necessary Makefiles. However, order matters in the importing of
workhorse-Makefiles and each must also have a TeX macro file with the same
base name (without a suffix). Therefore, the next step in the top-level
Makefile is to define the makesrc
variable to keep the base names
(without a .mk
suffix) of the workhorse-Makefiles that must be imported,
in the proper order.
Finally, we import all the necessary remaining Makefiles: 1) All the
analysis configuration-Makefiles with a wildcard. 2) The software
configuration-Makefile that contains their version (just in case its
necessary). 3) All workhorse-Makefiles in the proper order using a Make
foreach
loop.
In short, to keep things modular, readable and manageable, follow these recommendations: 1) Set clear-to-understand names for the configuration-Makefiles, and workhorse-Makefiles, 2) Only import other Makefiles from top Makefile. These will let you know/remember generally which step you are taking before or after another. Projects will scale up very fast. Thus if you don't start and continue with a clean and robust convention like this, in the end it will become very dirty and hard to manage/understand (even for yourself). As a general rule of thumb, break your rules into as many logically-similar but independent steps as possible.
The reproduce/analysis/make/paper.mk
Makefile must be the final Makefile
that is included. This workhorse Makefile ends with the rule to build
paper.pdf
(final target of the whole project). If you look in it, you
will notice that this Makefile starts with a rule to create
$(mtexdir)/project.tex
(mtexdir
is just a shorthand name for
$(BDIR)/tex/macros
mentioned before). As you see, the only dependency of
$(mtexdir)/project.tex
is $(mtexdir)/verify.tex
(which is the last
analysis step: it verifies all the generated results). Therefore,
$(mtexdir)/project.tex
is the connection between the
processing/analysis steps of the project, and the steps to build the final
PDF.
During the research, it often happens that you want to test a step that is not a prerequisite of any higher-level operation. In such cases, you can (temporarily) define that processing as a rule in the most relevant workhorse-Makefile and set its target as a prerequisite of its TeX macro. If your test gives a promising result and you want to include it in your research, set it as prerequisites to other rules and remove it from the list of prerequisites for TeX macro file. In fact, this is how a project is designed to grow in this framework.
While Git does an excellent job at keeping a history of the contents of files, it makes no effort in keeping the file meta data, and in particular the dates of files. Therefore when you checkout to a different branch, files that are re-written by Git will have a newer date than the other project files. However, file dates are important in the current design of Maneage: Make checks the dates of the prerequisite files and target files to see if the target should be re-built.
To fix this problem, for Maneage we use a forked version of
Metastore. Metastore use
a binary database file (which is called .file-metadata
) to keep the
modification dates of all the files under version control. This file is
also under version control, but is hidden (because it shouldn't be modified
by hand). During the project's configuration, Maneage installs to Git hooks
to run Metastore 1) before making a commit to update its database with the
file dates in a branch, and 2) after doing a checkout, to reset the
file-dates after the checkout is complete and re-set the file dates back to
what they were.
In practice, Metastore should work almost fully invisibly within your
project. The only place you might notice its presence is that you'll see
.file-metadata
in the list of modified/staged files (commonly after
merging your branches). Since its a binary file, Git also won't show you
the changed contents. In a merge, you can simply accept any changes with
git add -u
. But if Git is telling you that it has changed without a merge
(for example if you started a commit, but canceled it in the middle), you
can just do git checkout .file-metadata
and set it back to its original
state.
Based on the explanation above, some major design points you should have in mind are listed below.
Define new reproduce/analysis/make/XXXXXX.mk
workhorse-Makefile(s)
with good and human-friendly name(s) replacing XXXXXX
.
Add XXXXXX
, as a new line, to the values in makesrc
of the top-level
Makefile
.
Do not use any constant numbers (or important names like filter names)
in the workhorse-Makefiles or paper's LaTeX source. Define such
constants as logically-grouped, separate configuration-Makefiles in
reproduce/analysis/config/XXXXX.conf
. Then set this
configuration-Makefiles file as a prerequisite to any rule that uses
the variable defined in it.
Through any number of intermediate prerequisites, all processing steps
should end in (be a prerequisite of) $(mtexdir)/verify.tex
(defined in
reproduce/analysis/make/verify.mk
). $(mtexdir)/verify.tex
is the sole
dependency of $(mtexdir)/project.tex
, which is the bridge between the
processing steps and PDF-building steps of the project.
Next: Customization checklist, Previous: Why Make?, Up: About