Age | Commit message (Collapse) | Author | Lines |
|
Until now, the files where the people were meant to change didn't have a
proper copyright notice (for example `Copyright (C) YOUR NAME.'). This was
wrong because the license does not convey copyright ownership. So the name
of the file's original author must always be included and when people
modify it (and add their own copyright-able modifications).
With this commit, the file's original author (and email) are added to the
copyright notice and when more than one person modified a file, both names
have their individual copyright notice.
Based on this, the description for adding a copyright notice in
`README-hacking.md' has also been modified.
|
|
Until now, there was a single `tex/src/references.tex' file that housed the
BibTex entries for everything (software and non-software).
Since we have started to include the BibTeX entry for more software, it
will be hard to manage the large (sometime unused) BibTeX entries of the
software in the middle of the non-software related citations in the text of
the paper.
Therefore, with this commit, a `tex/dependencies' directory has been made
which has a separate BibTeX entry file for each software that needs
one. After the software is built, this file is copied to the new
`.local/version-info/cite' directory. At the end, the configure script will
concatenate all the files in this directory into one file which will later
be used with `tex/src/references.tex' by BibLaTeX.
This greatly simplifies managing of citations. Allowing us to focus on the
software-building and paper-writing citations separately/cleanly (and thus
be more efficient in both).
|
|
With this commit, we are applying the new style of citing software within
the build rule of Gnuastro.
|
|
On some systems, M4 isn't available, so the linking to the host system
fails, as a result, we can't build GNU Libtool.
The main reason we weren't building M4 was a bug with the most recent GNU C
library
(http://lists.gnu.org/archive/html/bug-gnulib/2019-04/msg00004.html). But I
found a patch used by Arch Linux which fixes the issue and allows M4 to be
built. As a result, the pipeline is now building M4 also and the patched M4
tarball is now uploaded to my own webpage as backup.
While doing the steps above, I also noticed that we weren't using a tab at
the start of the link definitions of `dependencies-basic.mk'. Although its
not necessary, to be consistent, its good for the lines to always start
with a tab.
|
|
Until recently we were using an actual installed executable file for the
programs. So for Gnuastro, the target was called `astnoisechisel'. But
recently, this approach was changed and the target for each software is a
simple text file with the official software name and version.
So with this commit, we are simply using `gnuastro' for Gnuastro, not
`astnoisechisel'.
|
|
On some GNU/Linux distros, the compiler is separated into `multilib' mode
(for 32-bit and 64-bit support) and by default the extra component of the
compiler is not installed! In such systems for now, we are just creating
symbolic links to the host's compiler (similar to Mac).
While testing, I noticed that we weren't passing a "$downloader" option to
the downloading script of `dependencies.mk' and `dependencies-python.mk'.
Also, I noticed that the Cython and Python-pkg-config packages didn't have
setuptools as a dependency! Both have now been fixed. Also, Cython's
tarball name is now all small-caps (as in all the other tarballs).
|
|
Until now, management of the software names and versions in the paper was
done manually (a macro had to be defined in `initialize.mk', then used in
`paper.tex', so they had to be manually set in two places). Managing this
was not easy.
To fix this, with this commit, each software building rule's target is a
text file that contains its human-readable name and its version. In the
end, the configure script sorts them by their name and writes them into a
LaTeX input file that we can easily import as a file into the main paper.
|
|
We were developing the build of Numpy and Scipy on Mac in a parallel thread
and things seems to be working relatively nice now. There were only two
problems:
1) GCC still has some random building issues on Mac.
2) ATLAS shared libraries can't be built on Mac (so we used OpenBLAS to
build Numpy and Scipy on both Mac and GNU/Linux).
But for now, none of these problems are critical. So, we can progress in
one branch.
There were only very minor conflicts in the merge.
|
|
We wer not able to build `gcc' on Mac, so we are using links to the host
compilers. In this commit we also found that on Mac the HDF5 library
needs an explicit definition of the compilers.
|
|
After trying the build a system with no Python library, I noticed that
Python's HDF5 module (`h5py') needs the HDF5 library and OpenMPI (to work
in parallel). So they were added. Finally `h5py' uses the `mpi4py' module
to communicate with OpenMPI, so it was also added. However, for some
reason, mpi4py doesn't work with this version of OpenMPI (as described in
the comments above).
So for now, h5py doesn't use it and can only work on a single thread, while
the HDF5 C library links with OpenMPI with no problem.
|
|
Until this commit, the installation of all Python packages were
done in a separate Makefile.
With this commit, the pipeline install Python packages as part of the
hight level software. All Python packages rules them remain in a
separate Makefile, but this Makefile is included in the high level
dependency `reproduce/src/make/dependencies.mk'.
|
|
We could not get ATLAS shared libraries on Mac (while the static ATLAS
libraries are built and can be used successfully on Mac). So, the
pipeline now builds OpenBLAS, which both Numpy and Scipy can use on Mac
and GNU/Linux.
We also added FFTW as a dependency of Numpy. Altough Numpy is not linking to
FFTW for some reason. However, since FFTW is a low level library used by
many programs, we have kept it as a dependency of Numpy anyway for now.
|
|
The Makefile that build the shared libraries comes from Arch Linux so it
does not work easily on Mac. But the full ATLAS build goes successfully
for static libraries. For now we are disabling shared libraries on Mac.
Python was built explicity with `clang' on Mac.
|
|
Until now, we were using `flock' (file-lock) for downloading the input
datasets in series. But we couldn't do this when downloading the software
tarballs because `flock' wasn't yet available. Generally, unlike
processing, downloading is much better done in series than in parallel.
To enable serial downloads of the software also, with this commit we are
installing `flock' in the configure script (not in a Makefile). As a
result, besides `flock', we can also benefit from the other good features
of the `reproduce/src/bash/download-multi-try' script *(for example
attempting download again after some time).
Some GNU mirrors may have problems at the time of download, so with this
commit, we are using the main GNU FTP server for GNU programs.
|
|
Until now, we were simply using the host's GCC for Mac systems. But we
found that except for a single step (to fixing `rpath'), it works on
Mac!!! So, GCC is now part of the Mac build as well.
However, we are still having some problems in building ATLAS on Mac. It
works on GNU/Linux, but not in Mac. So for the time being (just
temporarily), we are avoiding ATLAS (and thus Scipy) on Mac systems. We
just filed an issue on the ATLAS discussion list to hopefully fix the
problem soon.
|
|
Until now we were using a symbolic link to replace GCC, but Make doesn't
treat symbolic links like files. So it would rebuild the links every
time. With this commit, only for GCC on Mac systems, we are actually
copying the host's GCC executable to avoid this problem.
Also, a wrong comment for cURL was removed.
|
|
We generalized the libraries suffixes to work on Mac and GNU/Linux.
|
|
Numpy needs ATLAS as shared libraries. So we also need to build Python with
shared libraries. We also need to define site.cfg for numpy and scipy so we
define a master template:
`reproduce/config/pipeline/dependency-numpy-scipy.cfg'
Also `Openssl' did not have rpath so we added with this commit.
|
|
An initial installation of atlas is now included in the pipeline,
but we are still trying to make it compile and build smoothly. In
the process, we found that GCC also needs some modifications
(for example rpath issues).
|
|
Until this commit, we had some of the python packages intalled
but they did not work properly because of the `PYTHONPATH' variables.
That is, the pipeline's `python' was the `python' of the system
instead of the pipeline's `python'.
With this commit this issue has been fixed by setting the correct
`PYTHONPATH'. In this commit we also modify the installation of
`bzip2' because `CMake' was complaining about some libraries built
statically.
|
|
In the libpng installation there was `ilibdir' instead of `ilidir'.
|
|
Until now the installation of Python and its packages (numpy, astropy,
astroquery, etc.) were done in the same `makefile'.
With this commit the installation of Python and its packages have been
split and now it is independent of the other programs. The installation
of all Python packages needs to be written explicitely because pip is
not used anymore.
|
|
All dependencies for building astroquery package have been done.
Until nowthe Python dependencies were built in the same Makefile
as the high level libraries and programs. But, because astroquery
has many dependencies we split the Python and Python packages
installation in a new Makefile.
The installation of differents packages are done using Python and
not pip, because we found some problems when doing it with pip.
Apparently there are some interferences between the packages
installed by the pip of the system and the pip installed as part
of Python in the pipeline.
|
|
Raul Infante-Sainz added the building of Python (along with the Numpy and
Astropy packages) into the pipeline. That work is now being merged into the
main pipeline branch.
There was only this small problem that needed to be fixed: the Python
tarball's name after unpacking is actually `Python-X.X.X' (with a captial
P), not `python-X.X.X'. This has been corrected with this merge.
|
|
The zip program wasn't placed correctly (in alphabetical order) and its URL
command had the wrong indentation! Both have no effect at all on the
processing and are only cosmetic (to help in readability).
|
|
Astropy was added and one very important thing is that we have to
use the pypi tarball (https://pypi.org/) (which is bootstrapped)
and not the github tarball.
|
|
Python needs some packages to be really useful. Numpy is the most
important package for using Python and a lot of other packages
depend on it.
In this commit we add numpy to the pipeline. The tarball of numpy
right now is fossies.
|
|
Many projects use Python so it is necessary include it in the
pipeline.
|
|
With this commit, it is now possible to package the project into a tarball
or zip file, ready to be distributed to collaborators who only want to
modify the final paper (and not do the analysis technicalities), or for
uploading to sites like arXiv, or online LaTeX sharing pages.
|
|
I recently found another fork of metastore that allows its build on macOS
systems (https://github.com/mpctx/metastore). So I forked it into my own
fork with several other corrections (mostly cosmetic!), so it is now much
better suited for this pipeline.
Raul Infante-Sainz has already tested the building of metastore on his
macOS. In a previous test, we also noticed that libbsd should not be built
on Mac systems, so it is now a conditional prerequisite to metastore.
|
|
While editing files, some editors create temporary `~' files that can cause
problems in metastore's ability to delete their host directory if its not
on the other branch. With this commit, a `find' call was added to the post
checkout Git hook to remove such temporary files before metastore is
called.
Also, some comments were added to both git hooks to make them easier to
understand for a beginner.
|
|
Two corrections were made in the Git hooks of Metastore.
1) The shebang at the start of the scripts now uses the absolute adress of
our installed bash, not the relative `.local/bin/bash'. Note that it is
possible to use Git within subdirectories and in that scenario, the
`.local' will fail.
2) The `$$user' section was removed from the command to find the user's
group. With the user as an argument, `groups' may print the user's name
first, then their list of groups. When this happens, the script would
be just repeating the user's name. But the raw `groups' command will
list the groups of the running user.
|
|
After testing the built of Metastore on a server, I noticed that because
its `/etc/passwd' doesn't have the list of users, the `getpwuid' call
within metastore failed and wouldn't let it finish.
So I looked into the code and was able to implement a solution to this
problem by adding two options to it for default values for the user and
group. Also, file attributes are not necessary in our (current) use case of
metastore and caused crashes on our server, so they are also disabled.
|
|
Metastore depends on `bsd/string.h' to work properly (atleast on GNU/Linux
systems). The first system I tried building with had that library, so I
didn't notice! With this commit, we also build `libbsd' as part of the
pipeline.
Also, I couldn't find libbsd's version in any of its installed headers, so
like Libjpeg, we can't actually check and will directly write our internal
version into the paper.
|
|
The pipeline heavily depends on file meta data (and in particular the
modification dates), for example the configuration-Makefiles within the
pipeline are set as prerequisites to the rules of the pipeline.
However, when Git checks out a branch, it doesn't preserve the meta-data of
the files unique to that branch (for example program source files or
configuration-Makefiles). As a result, the rules that depend on them will
be re-done.
This is especially troublesome in the scenario of this reproducible paper
project because we commonly need to switch between branches (for example to
import recent work in the pipeline into the projects). After some
searching, I think the Metastore program is the best solution. Metastore is
now built as part of the pipeline and through two Git hooks, it is called
by Git to store the original meta-data of files into a binary file that is
version controlled (and managed by Metastore).
|
|
The TIFF library can optionally depend on webp [1] and zstd [2]. But these
aren't commonly used in scientific datasets so to avoid a longer build and
managing of extra dependencies (atleast for now!), we are disabling
them. The problem is that they cause a dependency on the host system and if
they are updated/removed, the relevant pipeline programs will crash.
[1] https://en.wikipedia.org/wiki/WebP
[2] https://en.wikipedia.org/wiki/Zstandard
|
|
Wget and cURL depend on many network related libraries by default and if
they are present on the host operating system, they will be linked
with. This causes problems for the pipeline when these libraries are
updated on the host system.
With this commit, I went through the configure time options of both Wget
and cURL and removed any library that didn't seem related to merely
downloading of files (possibly with SSL, because we do build OpenSSL in the
pipeline).
Also, I noticed a new version of cURL has come, so that is also updated.
|
|
Git needs cURL in its build. Until now, by chance cURL was always built
before Git, but while building this pipeline on a system, Git was built
before cURL and we found the problem.
I also noticed that we hadn't added `Your name <your@email.address>' to the
`for-group' script. This has been corrected now.
|
|
ccache is a super annoying program in the context of the reproduction
pipeline. On systems that use it, the `gcc' and `g++' that are found in
PATH are actually calls to `ccache' (so it can manage their call)!
Two steps have been taken to ignore and disable ccache (if it isn't ignored
properly!): 1) when making symbolic links to compilers, if a directory
containing `ccache' is present in the PATH, it is first removed, then we
look for the low-level programs that we won't be building. 2) The
`CCACHE_DISABLE' environment variable is set to 1 where necessary (with the
other environment variables).
|
|
Since the current implementation of this pipeline officially started in
2018, all the files only had 2018 in their copyright years. This has now
been corrected to 2018-2019.
|
|
Both Gzip and Gnuastro were being bootstrapped personally from their Git
repository until now. But fortunately a new release of both came out last
week and so to make things standard we are now using their standard
tarballs.
I also noticed that we weren't checking the version of Gzip or mentioning
it in the acknowledgement section. This was also corrected.
|
|
While checking the build of the previous commit, a failure happened when
linking `reproduce/build/dependencies/installed/bin/sh' with the built Bash
(because the symbolic link already existed!). So a `-f' flag was added to
`ln' to just change it without complaining.
I also noticed that the Git build was also not in verbose mode. So this has
also been corrected.
|
|
Some problems with using the number of threads in dependency building were
fixed.
|
|
On Mac OS systems, CFITSIO doesn't use path to find the `curl-config'
program (used by to give the library header and linking options), but uses
an absolute path. Therefore the only way we can ask CFITSIO to look into
our build of cURL is to manually change that absolute address.
Also, since all the libraries are now linked dynamically, we don't need the
extra linking flags when building WCSLIB (so it finds CFITSIO).
|
|
Mac OS's `install_name_tool' program's command is broken up into two lines,
but I had forgotten to add a line-break so the command would fail. I didn't
notice it myself because this error only shows up on Mac OS systems that
actually need to parse it.
|
|
The build systems of Libgit2 and WCSLIB on Mac OS does not account for
installation in non-standard addresses: `Libgit2' keeps the absolute
address of its build directory (not the installation directory) and WCSLIB
doesn't write any absolute address at all (so the system uses the first one
it finds).
To address these issues, we are now using Mac OS's `install_name_tool'
program to fix the absolute path within the installed shared library.
Since the version of the library is actually present in its shared library
name, in `dependency-versions.mk' we have also separated these two
libraries so later when their version is changed, we are careful in
correcting the shared library name also.
|
|
In the previous commit, I forgot to actually add some changes to the
staging area before committing an pushing. So some of the changes discussed
in the previous commit and now commited.
|
|
Make builds the dependencies of each package based on the order in the
prerequisites list. So when building in parallel, it can greatly help the
over-all build speed if larger packages are built first. Therefore the
three larger Gnuastro dependencies are now placed at earlier places of the
prerequisites.
|
|
Some high-level programs like Wget and cURL need to be built in shared mode
because they also include dynamic loading of libraries. Therefore, if we
only build the lower-level libraries in static mode, our own build will be
ignored and they will go and find the system's shared libraries to link
with. Because of this, for now, we have manually set the `static_build'
variable in the configure script to `no'.
Also, if the downloader fails, we'll delete the output (an empty file in
the case of Wget) because it interefers with a target definition.
|
|
The TeX Live installer needs Wget to operate smoothly, especially on recent
Mac OS systems that don't have Wget pre-installed. Also, it would be good
for the pipeline to have its own downloader. So with this commit, the
pipeline also installs Wget and OpenSSL which is a dependency.
Many other small changes/fixes were done in this process.
|