Age | Commit message (Collapse) | Author | Lines |
|
Until now, we were using `flock' (file-lock) for downloading the input
datasets in series. But we couldn't do this when downloading the software
tarballs because `flock' wasn't yet available. Generally, unlike
processing, downloading is much better done in series than in parallel.
To enable serial downloads of the software also, with this commit we are
installing `flock' in the configure script (not in a Makefile). As a
result, besides `flock', we can also benefit from the other good features
of the `reproduce/src/bash/download-multi-try' script *(for example
attempting download again after some time).
Some GNU mirrors may have problems at the time of download, so with this
commit, we are using the main GNU FTP server for GNU programs.
|
|
Until now, once the Git hooks have been installed (after the
installation of Metastore), if metastore doesn't exist (for example by
manually deleting the build directory for a re-build with same
configurations as before) we can't run `git commit' and `git checkout'
will print an ugly warning.
With this commit, the two Git hooks check for the existance of Metastore
and if it doesn't exist, they won't do anything.
|
|
In the example running code of the wrapper script, I had just written
`./download-multi-try', but this script is meant to be run from the top of
the project directory. This could cause confusion.
So the example script now starts with `/path/to/download-multi-try'.
|
|
We don't have a `.sh' suffix in the other scripts of `reproduce/src/bash',
so it was also removed from this script.
|
|
Until now, downloading was treated similar to any other operation in the
Makefile: if it crashes, the pipeline would crash. But network errors
aren't like processing errors: attempting to download a second time will
probably not crash (network relays are very complex and not reproducible
and packages get lost all the time)!
This is usually not felt in downloading one or two files, but when
downloading many thousands of files, it will happen every once and a while
and its a real waste of time until you check to just press enter again!
With this commit we have the `reproduce/src/bash/download-multi-try.sh'
script in the pipeline which will repeat the downoad several times (with
incrasing time intervals) before crashing and thus fix the problem.
|
|
While editing files, some editors create temporary `~' files that can cause
problems in metastore's ability to delete their host directory if its not
on the other branch. With this commit, a `find' call was added to the post
checkout Git hook to remove such temporary files before metastore is
called.
Also, some comments were added to both git hooks to make them easier to
understand for a beginner.
|
|
Two corrections were made in the Git hooks of Metastore.
1) The shebang at the start of the scripts now uses the absolute adress of
our installed bash, not the relative `.local/bin/bash'. Note that it is
possible to use Git within subdirectories and in that scenario, the
`.local' will fail.
2) The `$$user' section was removed from the command to find the user's
group. With the user as an argument, `groups' may print the user's name
first, then their list of groups. When this happens, the script would
be just repeating the user's name. But the raw `groups' command will
list the groups of the running user.
|
|
After testing the built of Metastore on a server, I noticed that because
its `/etc/passwd' doesn't have the list of users, the `getpwuid' call
within metastore failed and wouldn't let it finish.
So I looked into the code and was able to implement a solution to this
problem by adding two options to it for default values for the user and
group. Also, file attributes are not necessary in our (current) use case of
metastore and caused crashes on our server, so they are also disabled.
|
|
The pipeline heavily depends on file meta data (and in particular the
modification dates), for example the configuration-Makefiles within the
pipeline are set as prerequisites to the rules of the pipeline.
However, when Git checks out a branch, it doesn't preserve the meta-data of
the files unique to that branch (for example program source files or
configuration-Makefiles). As a result, the rules that depend on them will
be re-done.
This is especially troublesome in the scenario of this reproducible paper
project because we commonly need to switch between branches (for example to
import recent work in the pipeline into the projects). After some
searching, I think the Metastore program is the best solution. Metastore is
now built as part of the pipeline and through two Git hooks, it is called
by Git to store the original meta-data of files into a binary file that is
version controlled (and managed by Metastore).
|