Age | Commit message (Collapse) | Author | Lines |
|
Many projects use Python so it is necessary include it in the
pipeline.
|
|
In the example running code of the wrapper script, I had just written
`./download-multi-try', but this script is meant to be run from the top of
the project directory. This could cause confusion.
So the example script now starts with `/path/to/download-multi-try'.
|
|
We don't have a `.sh' suffix in the other scripts of `reproduce/src/bash',
so it was also removed from this script.
|
|
Until now, downloading was treated similar to any other operation in the
Makefile: if it crashes, the pipeline would crash. But network errors
aren't like processing errors: attempting to download a second time will
probably not crash (network relays are very complex and not reproducible
and packages get lost all the time)!
This is usually not felt in downloading one or two files, but when
downloading many thousands of files, it will happen every once and a while
and its a real waste of time until you check to just press enter again!
With this commit we have the `reproduce/src/bash/download-multi-try.sh'
script in the pipeline which will repeat the downoad several times (with
incrasing time intervals) before crashing and thus fix the problem.
|
|
In order to collaborate effectively in the project, even project members
that don't necessarily want (or have the capacity) to do the whole analysis
must be able to contribute to the project. Until now, the users of the
distributed tarball could only modify the text and not the figures (built
with PGFPlots) of the paper.
With this commit, the management of TeX source files in the pipeline was
slightly modified to allow this as cleanly as I could think of now! In
short, the hand-written TeX files are now kept in `tex/src' and for the
pipeline's generated TeX files (in particular the old `tex/pipeline.tex'),
we now have a `tex/pipeline' symbolic-link/directory that points to the
`tex' directory under the build directory.
When packaging the project, `tex/pipeline' will be a full directory with a
copy of all the necessary files. Therefore as far as LaTeX is concerned,
having a build-directory is no longer relevant. Many other small changes
were made to do this job cleanly which will just make this commit message
too long!
Also, the old `tarball' and `zip' targets are now `dist' and `dist-zip' (as
in the standard GNU Build system).
|
|
With this commit, it is now possible to package the project into a tarball
or zip file, ready to be distributed to collaborators who only want to
modify the final paper (and not do the analysis technicalities), or for
uploading to sites like arXiv, or online LaTeX sharing pages.
|
|
A few issues came up while testing the `for-group' script in one of the
projects based on this pipeline that are being fixed with this commit:
1) We are ultimately using the `sg' command to use the specified group,
not `chgrp'. So in cases where `chgrp' has problems, this would cause
a wrong error. So for the test of the given group's existance, we are
now directly calling `sg'.
2) In the call to `make' we were mistakenly giving make the `$2' (which
is `make' on the command-line) argument. Since `./for-group' now takes
the group name as its first argument, this should have been `$3'.
3) To help in readability, and also allow for group names with a space,
`reproducible_paper_group_name' is now defined and exported before the
final call to `sg'.
|
|
Until now, the group name to build the project actually went into the Git
source of the project! This doesn't allow exact reproducibility on
different machines (where the group name may be different).
With this commit, the `for-group' script has been modified to accept the
group name as its first argument and pass that onto `configure' and
Make. This is much better now, because not only the existance of a group
installation is checked, but also the name of the group. It also made
things simpler (in particular in `LOCAL.mk.in').
|
|
Until now, the `./configure' script would only print the `.local/bin/make
-j8' command. But when configured for groups, a different command should be
used. It now does a check just before running and suggests the proper
command.
|
|
Until now it was `/bin/sh', but on Debian systems, this can cause problems
because by default they use a much weaker shell (dash) which doesn't
recognize functions.
|
|
I recently found another fork of metastore that allows its build on macOS
systems (https://github.com/mpctx/metastore). So I forked it into my own
fork with several other corrections (mostly cosmetic!), so it is now much
better suited for this pipeline.
Raul Infante-Sainz has already tested the building of metastore on his
macOS. In a previous test, we also noticed that libbsd should not be built
on Mac systems, so it is now a conditional prerequisite to metastore.
|
|
While editing files, some editors create temporary `~' files that can cause
problems in metastore's ability to delete their host directory if its not
on the other branch. With this commit, a `find' call was added to the post
checkout Git hook to remove such temporary files before metastore is
called.
Also, some comments were added to both git hooks to make them easier to
understand for a beginner.
|
|
I needed to take these steps in a few occasions on a project I am building
over this pipeline. This will commonly happen when a team starts using this
pipeline, so it was added to make things easier.
|
|
To be more generic and recognizable, the `README-pipeline.md' script was
renamed to `README-hacking.md'. In essence, it is just that: to hack the
existing pipeline for your own project. We follow a similar naming
convention in many GNU software.
|
|
Two corrections were made in the Git hooks of Metastore.
1) The shebang at the start of the scripts now uses the absolute adress of
our installed bash, not the relative `.local/bin/bash'. Note that it is
possible to use Git within subdirectories and in that scenario, the
`.local' will fail.
2) The `$$user' section was removed from the command to find the user's
group. With the user as an argument, `groups' may print the user's name
first, then their list of groups. When this happens, the script would
be just repeating the user's name. But the raw `groups' command will
list the groups of the running user.
|
|
Until now, the check to see if the patchelf program should be used or not
(for GNU/Linux vs. Mac installations) was mistakenly added over the step
that we define the `sh' symbolic link, not over the call to patchelf. This
is corrected with this commit.
|
|
In this version, too many extra notices (just regarding a change from
branch to branch) are not printed with `-q'. Instead only a one line
statement is printed that it is saved or applied.
|
|
Until we see what happens with the pull request of our suggested features
in metastore, its version isn't written directly into the executable, so we
won't actually check it, but write the version directly into the paper.
|
|
After testing the built of Metastore on a server, I noticed that because
its `/etc/passwd' doesn't have the list of users, the `getpwuid' call
within metastore failed and wouldn't let it finish.
So I looked into the code and was able to implement a solution to this
problem by adding two options to it for default values for the user and
group. Also, file attributes are not necessary in our (current) use case of
metastore and caused crashes on our server, so they are also disabled.
|
|
Metastore depends on `bsd/string.h' to work properly (atleast on GNU/Linux
systems). The first system I tried building with had that library, so I
didn't notice! With this commit, we also build `libbsd' as part of the
pipeline.
Also, I couldn't find libbsd's version in any of its installed headers, so
like Libjpeg, we can't actually check and will directly write our internal
version into the paper.
|
|
The pipeline heavily depends on file meta data (and in particular the
modification dates), for example the configuration-Makefiles within the
pipeline are set as prerequisites to the rules of the pipeline.
However, when Git checks out a branch, it doesn't preserve the meta-data of
the files unique to that branch (for example program source files or
configuration-Makefiles). As a result, the rules that depend on them will
be re-done.
This is especially troublesome in the scenario of this reproducible paper
project because we commonly need to switch between branches (for example to
import recent work in the pipeline into the projects). After some
searching, I think the Metastore program is the best solution. Metastore is
now built as part of the pipeline and through two Git hooks, it is called
by Git to store the original meta-data of files into a binary file that is
version controlled (and managed by Metastore).
|