Age | Commit message (Collapse) | Author | Lines |
|
Raul Infante-Sainz added the building of Python (along with the Numpy and
Astropy packages) into the pipeline. That work is now being merged into the
main pipeline branch.
There was only this small problem that needed to be fixed: the Python
tarball's name after unpacking is actually `Python-X.X.X' (with a captial
P), not `python-X.X.X'. This has been corrected with this merge.
|
|
The zip program wasn't placed correctly (in alphabetical order) and its URL
command had the wrong indentation! Both have no effect at all on the
processing and are only cosmetic (to help in readability).
|
|
Astropy was added and one very important thing is that we have to
use the pypi tarball (https://pypi.org/) (which is bootstrapped)
and not the github tarball.
|
|
Python needs some packages to be really useful. Numpy is the most
important package for using Python and a lot of other packages
depend on it.
In this commit we add numpy to the pipeline. The tarball of numpy
right now is fossies.
|
|
Many projects use Python so it is necessary include it in the
pipeline.
|
|
In the example running code of the wrapper script, I had just written
`./download-multi-try', but this script is meant to be run from the top of
the project directory. This could cause confusion.
So the example script now starts with `/path/to/download-multi-try'.
|
|
We don't have a `.sh' suffix in the other scripts of `reproduce/src/bash',
so it was also removed from this script.
|
|
Until now, downloading was treated similar to any other operation in the
Makefile: if it crashes, the pipeline would crash. But network errors
aren't like processing errors: attempting to download a second time will
probably not crash (network relays are very complex and not reproducible
and packages get lost all the time)!
This is usually not felt in downloading one or two files, but when
downloading many thousands of files, it will happen every once and a while
and its a real waste of time until you check to just press enter again!
With this commit we have the `reproduce/src/bash/download-multi-try.sh'
script in the pipeline which will repeat the downoad several times (with
incrasing time intervals) before crashing and thus fix the problem.
|
|
In order to collaborate effectively in the project, even project members
that don't necessarily want (or have the capacity) to do the whole analysis
must be able to contribute to the project. Until now, the users of the
distributed tarball could only modify the text and not the figures (built
with PGFPlots) of the paper.
With this commit, the management of TeX source files in the pipeline was
slightly modified to allow this as cleanly as I could think of now! In
short, the hand-written TeX files are now kept in `tex/src' and for the
pipeline's generated TeX files (in particular the old `tex/pipeline.tex'),
we now have a `tex/pipeline' symbolic-link/directory that points to the
`tex' directory under the build directory.
When packaging the project, `tex/pipeline' will be a full directory with a
copy of all the necessary files. Therefore as far as LaTeX is concerned,
having a build-directory is no longer relevant. Many other small changes
were made to do this job cleanly which will just make this commit message
too long!
Also, the old `tarball' and `zip' targets are now `dist' and `dist-zip' (as
in the standard GNU Build system).
|
|
With this commit, it is now possible to package the project into a tarball
or zip file, ready to be distributed to collaborators who only want to
modify the final paper (and not do the analysis technicalities), or for
uploading to sites like arXiv, or online LaTeX sharing pages.
|
|
Until now, the group name to build the project actually went into the Git
source of the project! This doesn't allow exact reproducibility on
different machines (where the group name may be different).
With this commit, the `for-group' script has been modified to accept the
group name as its first argument and pass that onto `configure' and
Make. This is much better now, because not only the existance of a group
installation is checked, but also the name of the group. It also made
things simpler (in particular in `LOCAL.mk.in').
|
|
I recently found another fork of metastore that allows its build on macOS
systems (https://github.com/mpctx/metastore). So I forked it into my own
fork with several other corrections (mostly cosmetic!), so it is now much
better suited for this pipeline.
Raul Infante-Sainz has already tested the building of metastore on his
macOS. In a previous test, we also noticed that libbsd should not be built
on Mac systems, so it is now a conditional prerequisite to metastore.
|
|
While editing files, some editors create temporary `~' files that can cause
problems in metastore's ability to delete their host directory if its not
on the other branch. With this commit, a `find' call was added to the post
checkout Git hook to remove such temporary files before metastore is
called.
Also, some comments were added to both git hooks to make them easier to
understand for a beginner.
|
|
Two corrections were made in the Git hooks of Metastore.
1) The shebang at the start of the scripts now uses the absolute adress of
our installed bash, not the relative `.local/bin/bash'. Note that it is
possible to use Git within subdirectories and in that scenario, the
`.local' will fail.
2) The `$$user' section was removed from the command to find the user's
group. With the user as an argument, `groups' may print the user's name
first, then their list of groups. When this happens, the script would
be just repeating the user's name. But the raw `groups' command will
list the groups of the running user.
|
|
Until now, the check to see if the patchelf program should be used or not
(for GNU/Linux vs. Mac installations) was mistakenly added over the step
that we define the `sh' symbolic link, not over the call to patchelf. This
is corrected with this commit.
|
|
In this version, too many extra notices (just regarding a change from
branch to branch) are not printed with `-q'. Instead only a one line
statement is printed that it is saved or applied.
|
|
Until we see what happens with the pull request of our suggested features
in metastore, its version isn't written directly into the executable, so we
won't actually check it, but write the version directly into the paper.
|
|
After testing the built of Metastore on a server, I noticed that because
its `/etc/passwd' doesn't have the list of users, the `getpwuid' call
within metastore failed and wouldn't let it finish.
So I looked into the code and was able to implement a solution to this
problem by adding two options to it for default values for the user and
group. Also, file attributes are not necessary in our (current) use case of
metastore and caused crashes on our server, so they are also disabled.
|
|
Metastore depends on `bsd/string.h' to work properly (atleast on GNU/Linux
systems). The first system I tried building with had that library, so I
didn't notice! With this commit, we also build `libbsd' as part of the
pipeline.
Also, I couldn't find libbsd's version in any of its installed headers, so
like Libjpeg, we can't actually check and will directly write our internal
version into the paper.
|
|
The pipeline heavily depends on file meta data (and in particular the
modification dates), for example the configuration-Makefiles within the
pipeline are set as prerequisites to the rules of the pipeline.
However, when Git checks out a branch, it doesn't preserve the meta-data of
the files unique to that branch (for example program source files or
configuration-Makefiles). As a result, the rules that depend on them will
be re-done.
This is especially troublesome in the scenario of this reproducible paper
project because we commonly need to switch between branches (for example to
import recent work in the pipeline into the projects). After some
searching, I think the Metastore program is the best solution. Metastore is
now built as part of the pipeline and through two Git hooks, it is called
by Git to store the original meta-data of files into a binary file that is
version controlled (and managed by Metastore).
|
|
When building in group mode, users can manage them selves to work on
independent analysis steps and thus not cause conflicts. However, until
now, there was no way to avoid conflicts in building the final paper.
To fix this problem, when we are in group mode, the pipeline will create a
separate LaTeX build director for each user and also a separate PDF file
for each user. This will ensure that their compilations don't conflict.
|
|
With the current build system, Bash and AWK don't write RPATH into the
executables. This causes many problems in the pipeline (for example when
using the `$(shell)' function in Make which doesn't have
`LD_LIBRARY_PATH').
After consulting the Bash and Make mailing lists, so far, the best solution
was to use the Patchelf program to manually write RPATH in these
executables. With this commit, Patchelf is now installed in the pipeline
and used in Bash and AWK to fix this problem.
|
|
The build of bash has been made a little cleaner to help in readability and
management of the code.
|
|
The TIFF library can optionally depend on webp [1] and zstd [2]. But these
aren't commonly used in scientific datasets so to avoid a longer build and
managing of extra dependencies (atleast for now!), we are disabling
them. The problem is that they cause a dependency on the host system and if
they are updated/removed, the relevant pipeline programs will crash.
[1] https://en.wikipedia.org/wiki/WebP
[2] https://en.wikipedia.org/wiki/Zstandard
|
|
In the previous commit, the copyright year and owner were mistakenly
modified. They are corrected now.
|
|
While working on a pipeline based on this, I noticed many linking errors of
our installed Bash, complaining that it can't link with libreadline. This
was while readline was present in the proper directory and the Bash within
a recipe would work properly.
After some investigation, I found out that this is because Make's `foreach'
function (which was used to define the targets) was apparently calling Bash
without setting `LD_LIBRARY_PATH', causing this error.
To avoid such sitations, Bash now uses its internal build of readline and
we no longer ask it to link with the installed readline.
|
|
The targets of the links to have the extra common `ncurses' packages were
previously just `pkgconfig/*.pc'! But this would only work when run within
the `installed/lib' directory, not any other! So the targets for these
packages now use an absolute address.
|
|
If the `./for-group' script is not used properly, it can lead to the whole
pipeline being re-run. Therefore it is important to do a sanity check
immediately at the start of Make's processing and inform the user if there
is a problem.
With this commit, `./for-group' exports the `reproducible_paper_for_group'
variable which is used by both the initial `./configure' script, and later
in each call to Make. The `./configure' script will use it to write a value
in `reproduce/config/pipeline/LOCAL.mk' and Make will use it to compare
with the value in `reproduce/config/pipeline/LOCAL.mk'.
If there is an inconsistency, Make will not even attempt to build anything
and will just print a message and abort.
|
|
Wget and cURL depend on many network related libraries by default and if
they are present on the host operating system, they will be linked
with. This causes problems for the pipeline when these libraries are
updated on the host system.
With this commit, I went through the configure time options of both Wget
and cURL and removed any library that didn't seem related to merely
downloading of files (possibly with SSL, because we do build OpenSSL in the
pipeline).
Also, I noticed a new version of cURL has come, so that is also updated.
|
|
In a previous implementation, we were using a `target' variable to define
the final target of several links, but with the new `sov' solution, we just
used its base name. However, we had forgot to correct two instances of
`target'. This is corrected now.
Also, the step to clean all already built outputs of the NCURSES library
has been simplified to a platform independent wildcard.
|
|
On Mac OS systems, the full version number is not used in the filename
given to libncurses. For example for version 6.1, it is called
`libncursesw.6.dylib'. So a more generic and easier to maintain and read
script is now used to be able to make links for both Mac and GNU/Linux
systems.
In short, instead of checking if we are in Mac every time, we just set the
suffixes at the start based on the machine once as variables and use those
to define the links.
|
|
The call to SED in `dependencies-build-rules.mk' had the file name before
the options. On some verions of SED, this would cause problems. So the
filename is now given after the options.
|
|
The new `--colormap' option was added to the call to Gnuastro's ConvertType
program. Since Gnuastro 0.8, ConvertType needs this option for converting a
single-channel dataset to a color-supporting format.
|
|
Readline is a prerequisite of Bash and AWK, while NCURSES is a prerequisite
of Readline. With the recent update of GNU Bash (and thus GNU Readline) on
my host operating system, the pipeline crashed and I noticed this hole in
the pipeline. In particular, AWK (which linked with Readline 7.0) would
complain about not finding it and abort.
|
|
Git needs cURL in its build. Until now, by chance cURL was always built
before Git, but while building this pipeline on a system, Git was built
before cURL and we found the problem.
I also noticed that we hadn't added `Your name <your@email.address>' to the
`for-group' script. This has been corrected now.
|
|
ccache is a super annoying program in the context of the reproduction
pipeline. On systems that use it, the `gcc' and `g++' that are found in
PATH are actually calls to `ccache' (so it can manage their call)!
Two steps have been taken to ignore and disable ccache (if it isn't ignored
properly!): 1) when making symbolic links to compilers, if a directory
containing `ccache' is present in the PATH, it is first removed, then we
look for the low-level programs that we won't be building. 2) The
`CCACHE_DISABLE' environment variable is set to 1 where necessary (with the
other environment variables).
|
|
During the last month, several core GNU programs were updated, so their
versions in the pipeline have also been updated.
|
|
After installing Bash, we would just blindly try to build the $(ibdir)/sh'
symbolic link. But that could fail if it already existed. To make things
clean, we now remove any link first before attempting to make a new one.
|
|
Since the current implementation of this pipeline officially started in
2018, all the files only had 2018 in their copyright years. This has now
been corrected to 2018-2019.
|
|
By giving this option specifically at the build time of Pkg-config, we'll
ensure that any package that uses pkg-config will first look into our
locally installed build.
|
|
Both Gzip and Gnuastro were being bootstrapped personally from their Git
repository until now. But fortunately a new release of both came out last
week and so to make things standard we are now using their standard
tarballs.
I also noticed that we weren't checking the version of Gzip or mentioning
it in the acknowledgement section. This was also corrected.
|
|
Bzip2's verison is found differently from the other programs (because it
writes no standard error, not standard output!). So a custom function is
written for it which includes creating a temporary file. But we had forgot
to delete that file after the version is found.
|
|
While checking the build of the previous commit, a failure happened when
linking `reproduce/build/dependencies/installed/bin/sh' with the built Bash
(because the symbolic link already existed!). So a `-f' flag was added to
`ln' to just change it without complaining.
I also noticed that the Git build was also not in verbose mode. So this has
also been corrected.
|
|
While we were testing this pipeline on a Mac OS system, we found and
reported a problem in Gzip's build (bug #33689). However, since the Gzip
build is not verbose, it was necessary to run its `make' with
`V=1'. Generally, since almost all the programs are built in verbose mode
(where you can see the compilation commands), we have also set this flag in
any build to be clear and make it easier to spot bugs in the future.
|
|
A minor correction was made in the checklist (since we only have one
`foreach' loop in the top-level Makefile) and also the version of Gnuastro
was incremented.
|
|
Some problems with using the number of threads in dependency building were
fixed.
|
|
The version of Git was updated to the most recent version (2.20.0).
|
|
Some host Make systems may not allow automatic passing of the number of
threads to sub-Makes. So while building the basic dependencies, we'll need
to explicity add the `-j' option to the Make files that can benefit most
from it: those that are dependencies of many others (Tar & Make), or are
the last to build (Coreutils).
|
|
Gnuastro's BuildProgram is a little special: it is actually built during
the building of Gnuastro to keep important include and lib directory
information and if someone wants to use BuildProgram, this information is
necessary. So a special configuration is added for it in
`reproduce/config/gnuastro'. This configuration file will allow users to
set their own special configuration if they like, then it will load the
installed BuildProgram configuration file.
|
|
On Mac OS systems, CFITSIO doesn't use path to find the `curl-config'
program (used by to give the library header and linking options), but uses
an absolute path. Therefore the only way we can ask CFITSIO to look into
our build of cURL is to manually change that absolute address.
Also, since all the libraries are now linked dynamically, we don't need the
extra linking flags when building WCSLIB (so it finds CFITSIO).
|