Age | Commit message (Collapse) | Author | Lines |
|
Until now, the initial project scripts were primarily tested with GNU
Bash. But Bash is not generally available on all systems (it has many
features beyond POSIX). Because of this, effectively we were imposing the
requirement on the user that they must have Bash installed. We recently
started this with setting the shebang of `project' and
`reproduce/software/bash/configure.sh' to `/bin/sh'. After doing so, Raul
and Gaspar reported an error on their systems.
To fix the problem, I installed Dash (a minimalist POSIX-compliant shell)
on my computer and temporarily set the shebangs to `/bin/dash', ran the
project configuration step and fixed all issues that came up. With this
commit, it can go all the way to building GCC on my system's Dash. After
this stage (when `high-level.mk' is called), there is no problem, because
we have our own version of GNU Bash and that installed version is used.
Probably some more issues still remain and will hopefully be found in the
future.
While doing this, I also noticed the following two minor issues:
- The `./project configure' option `--input-dir' was not recognized
because it was mistakenly checking `--inputdir'. It has been corrected.
- The test C programs now use the `<<EOF' method instead of `echo'.
- In `basic.mk', the extra space between `syspath' and `:=' was removed
(it was an ancient relic!).
|
|
Until now, the hashbang of these two shell scripts was set to `/bin/bash',
hence assuming that GNU Bash exists on the host system! But this is an
extra requirement on the host operating system and these two scripts should
be written such that they operate on a POSIX shell (the generic `/bin/sh'
which can point to any shell program).
With this commit this has been implemented! We may confront some errors as
the system is run on other systems, but we should fix such errors and work
hard to make these two scripts as POSIX-compatible as possible (runnable on
any shell, so as not to force users to install Bash before running the
project).
This completes Task #15525.
|
|
Until now, the main commands to run the project were these: `./project
configure' (to build the software), `./project prepare' (to possibly
arrange input datasets and build special configuration Makefiles) and
finally `./project make' to run the project.
The main logic behind the "prepare" phase `top-prepare.mk' is to build
configuration files that can be fed into the "make" step and optimize its
operation. For example when the total number of necessary inputs for the
majority of the analysis is not as large as the total number of
inputs. With "prepare" (when necessary), you go through the raw inputs,
select the ones that are necessary for the rest of the project. The output
of `top-prepare.mk' is a configuration file (a Make variable) that keeps
the IDs (numbers, names, etc). That configuration file would then be used
in the `top-make.mk' to identify the lower level targets and allow optimal
project organization and management.
But the last two are both part of the analysis, and while they indeed need
different calls to Make to be executed, many projects don't actually need a
preparation phase: ultimately, its an implementation choice by the project
developers and doesn't concern the project users (or the developers when
they are running it).
To avoid confusing the users, or simply annoying them when a projet doesn't
need it, with this commit, the top-level `top-prepare.mk' and `top-make.mk'
Makefiles are called with the single `./project make' command and
`./project prepare' has been dropped. I noticed this while writing the
paper on this system.
|
|
Until now, the explanation for a missing static C library didn't actually
guide the users to look above and see the error message! So with this
commit, I edited it a little to be more clear (and mention to look
above). Also, I noticed that on Amazon AWS systems, the static C library is
installed as a separate package, so to help the users, I added the
necessary command and some better explanation.
|
|
Until now, the configuration Makefiles (in
`reproduce/software/config/installation' and `reproduce/analysis/config')
had a `.mk' suffix, similar to the workhorse Makefiles. Although they are
indeed Makefiles, but given their nature (to only keep configuration
parameters), it is confusing (especially to early users) for them to also
have a `.mk' (similar to the analysis or software building Makefiles).
To address this issue, with this commit, all the configuration Makefiles
(in those directories) are now given a `.conf' suffix. This is also assumed
for all the files that are loaded.
The configuration (software building) and running of the template have been
checked with this change from scratch, but please report any error that may
not have been noticed.
THIS IS AN IMPORTANT CHANGE AND WILL CAUSE CRASHES OR UNEXPECTED BEHAVIORS
FOR PROJECTS THAT HAVE BRANCHED FROM THIS TEMPLATE. PLEASE CORRECT THE
SUFFIX OF ALL YOUR PROJECT'S CONFIGURATION MAKEFILES (IN THE DIRECTORIES
ABOVE), OTHERWISE THEY AREN'T AUTOMATICALLY LOADED ANYMORE.
|
|
In the previous commmit, I had forgot to add a `\' after the newly added
`sys_library_path' variable to the `high-level.mk' call.
|
|
Until now, GCC wouldn't build properly on Debian-based operating systems
because `ld' needed to link with several necessary C library features like
`crti.o' and `crtn.o' (this is an `ld' issue, not GCC). The solution is to
add the directory containing them to `LIBRARY_PATH'. In the previous
commit, I actually searched for these files, but while testing on another
system, I noticed that it can be problematic (other architectures may
exist).
With this commit, we are actually finding the build architecture of the
running GCC (which is the same as the `ld') and using that to fix a fixed
directory to `LIBRARY_PATH'.
|
|
Until now, to see if a working static C library and `sys/cdefs.h' exist, we
were checking absolute locations like `/usr/include/sys/cdefs.h' or
`/usr/lib/libc.a' and `/usr/lib64/libc.a'. But this is not robust because
on different systems, they can be in different locations.
With this commit, we actually use `find' to find the location of `libc.a'
and use that to add elements to CPPFLAGS and LDFLAGS. This should fix the
problem on systems that have them on non-standard locations.
|
|
Now that its 2020, its necessary to include this year in the copyright
statements.
|
|
While working on a different branch to build the GNU C Library, I noticed a
few places in the template that need corrections which are now applied:
1. A new-line character after the "C compiler works" notice at the start
of the configure script.
2. Removing possible `::' in the `LD_LIBRARY_PATH' definition of
`basic.mk'. Note that its not necessary in the other steps because we
don't use any outside-defined `LD_LIBRARY_PATH'.
3. Building GMP for C++ and also with `--enable-fat'.
4. Removing the unpacked Perl tarball directory after its installation.
|
|
Since ImageMagick can take long to build, we are now building it in
parallel. Also, the part where we replace an `_' with `\_' in the software
version at the end of the configure script was removed. It is more
clear/readable that the actual rule that includes such a name deals with
the underline (as is the case for `sip_tpv' which already dealt with it).
Finally, I noticed that the checks at the start of `top-prepare' were
missing new-lines. I had forgot that the Make single-shell variable isn't
activated in this stage yet.
|
|
In many real-world scenarios, `./project make' can really benefit from
having some basic information about the data before being run. For example
when quering a server. If we know how many datasets were downloaded and
their general properties, it can greatly optmize the process when we are
designing the solution to be run in `./project make'.
Therefore with this commit, a new phase has been added to the template's
design: `./project prepare'. In the raw template this is empty, because the
simple analysis done in the template doesn't warrant it. But everything is
ready for projects using the template to add preparation phases prior to
the analysis.
|
|
It was some time since these three software were not updated! With this
commit the template now uses the most recent stable release of these
packages.
Also, the hosting server for ImageMagick was moved to my own webpage
because unfortunately ImageMagick removes its tarballs from its own
version.
|
|
Users that are not familiar with the file structure of the project may
specify the current directory (to-level source directory) as their
build-directory. This will cause a crash right after answering the
questions, where `rm' will complain about `tex/build' not being deleted
because it exists as a directory.
To avoid such confusing situtations, the configure script now checks if the
build directory is actually a sub-directory of the source. If it is, it
will complain with a short message and abort. Also, a `CAUTION' statment
has been put in the initial description, right ontop of the question.
This bug was reported Carlos Allende Prieto and David Valls-Gabaud.
|
|
Until now, when building PatchELF, we would always require that it be done
statically. However, some systems don't have a static C library available
for linking. This cause a crash in the static building of PatchELF. But a
static PatchELF is necessary for correcting RPATH in GCC's outputs.
With this commit, in the configure script we check if a static C library is
linkable for the compiler. If it isn't then `host_cc' will be set to 1 and
GCC won't be built. We also pass the result of this test to `basic.mk'
(through `good_static_lib'), so if a static C library isn't available, it
builds a dynamically linked PatchELF.
This bug was reported by Elham Saremi.
|
|
Until now, the Fortran compiler check wouldn't delete the files it creates
in the temporary software building directory.
With this commit, the cleaning steps have been added.
|
|
Until now, we were just checking for the existance of a C and Fortran
compiler. But it can happen that even if they exist, they don't operate
properly, for example see some errors that have been reported until now in
P.S. (both on different macOS systems). But finding this source after the
programs have started is frustrating for the user.
With this commit, before we start building anything, we'll check these two
compilers with a simple program and see if they can indeed compile, and if
their compiled program can run. If it doesn't work an elaborate error
message is printed to help the users navigate to a solution.
Also, the building of `flock' within `configure.sh' has been moved just
before calling `basic.mk'. This was done so any warning/error message
is printed before actually building anything.
This fixes bug #56715.
P.S. The error messages:
C compiler
----------
conftest.c:9:19: fatal error: stdio.h: No such file or directory
^
compilation terminated.
----------
Fortran compiler
----------------
dyld: Library not loaded: @rpath/libisl.10.dylib
Referenced from:
/path/to/anaconda2/gcc/libexec/gcc/x86_64-apple-darwin15.5.0/4.9.3/f951
Reason: image not found
gfortran: internal compiler error: Abort trap: 6
----------------
|
|
Until now the only way to define the environment of the Make recipes was
through the exported Make variables (mostly in `initialize.mk' for the
analysis steps for example). However, there is only so much you can do with
environment variables! In some situations you want slightly more
complicated environment control, like setting an alias or running of
scripts (things that are commonly done in the `~/.bashrc' file of users to
configure their interactive, non-login shells).
With this commit, a `reproduce/software/bash/bashrc.sh' has been defined
for this job (which is currently empty!). Every major Make step of the
project adds this file as the `BASH_ENV' environment variable, so the shell
that is created to execute a recipe first executes this file, then the
recipe. Each top-level Makefile also defines a `PROJECT_STATUS' environment
variable that enables users to limit their envirnoment setup based on the
condition it is being setup (in particular in the early phase of
`basic.mk', where the user can't make any assumption about the programs and
has to write a portable shell script).
|
|
Until now, there was no check on the integrity of the contents of the
downloaded/copied software tarballs, we only relied on the tarball
name. This could be bad for reproducibility and security, for example on
one server the name of a tarball may be the same but with different
content.
With this commit, the SHA512 checksums of all the software are stored in
the newly created `checksums.mk' (similar to how the versions are stored in
the `versions.mk'). The resulting variable is then defined for each
software and after downloading/copying the file we check to see if the new
tarball has the same checksum as the stored value. If it doesn't the script
will crash with an error, informing the user of the problem.
The only limitation now is a bootstrapping problem: if the host system
doesn't already an `sha512sum' executable, we will not do any checksum
verification until we install our `sha512sum' (as part of GNU
Coreutils). All the tarballs downloaded after GNU Coreutils are built will
have their checksums validated. By default almost all GNU/Linux systems
will have a usable `sha512sum' (its part of GNU Coreutils after all for a
long time: from the Coreutils Changelog file atleast since 2013).
This completes task #15347.
|
|
The configuration step (building all the ncessary software) can take some
time. It is natual for the user to want to see how the build is going
(which software is being built at every moment). So far, we have only put a
"Inspecting status" section in `README-hacking.md' that describes a
solution, but some early users may not have read it yet.
With this commit a short tip was added in the initial installation notice
to inform the user of this very useful command.
|
|
Until now, to work on a project, it was necessary to `./configure' it and
build the software. Then we had to run `.local/bin/make' to run the project
and do the analysis every time. If the project was a shared project between
many users on a large server, it was necessary to call the `./for-group'
script.
This way of managing the project had a major problem: since the user
directly called the lower-level `./configure' or `.local/bin/make' it was
not possible to provide high-level control (for example limiting the
environment variables). This was especially noticed recently with a bug
that was related to environment variables (bug #56682).
With this commit, this problem is solved using a single script called
`project' in the top directory. To configure and build the project, users
can now run these commands:
$ ./project configure
$ ./project make
To work on the project with other users in a group these commands can be
used:
$ ./project configure --group=GROUPNAME
$ ./project make --group=GROUPNAME
The old options to both configure and make the project are still valid. Run
`./project --help' to see a list. For example:
$ ./project configure -e --host-cc
$ ./project make -j8
The old `configure' script has been moved to
`reproduce/software/bash/configure.sh' and is called by the new `./project'
script. The `./project' script now just manages the options, then passes
control to the `configure.sh' script. For the "make" step, it also reads
the options, then calls Make. So in the lower-level nothing has
changed. Only the `./project' script is now the single/direct user
interface of the project.
On a parallel note: as part of bug #56682, we also found out that on some
macOS systems, the `DYLD_LIBRARY_PATH' environment variable has to be set
to blank. This is no problem because RPATH is automatically set in macOS
and the executables and libraries contain the absolute address of the
libraries they should link with. But having `DYLD_LIBRARY_PATH' can
conflict with some low-level system libraries and cause very hard to debug
linking errors (like that reported in the bug report).
This fixes bug #56682.
|
|
Until now, the software building and analysis steps of the pipeline were
intertwined. However, these steps (of how to build a software, and how to
use it) are logically completely independent.
Therefore with this commit, the pipeline now has a new architecture
(particularly in the `reproduce' directory) to emphasize this distinction:
The `reproduce' directory now has the two `software' and `analysis'
subdirectories and the respective parts of the previous architecture have
been broken up between these two based on their function. There is also no
more `src' directory. The `config' directory for software and analysis is
now mixed with the language-specific directories.
Also, some of the software versions were also updated after some checks
with their webpages.
This new architecture will allow much more focused work on each part of the
pipeline (to install the software and to run them for an analysis).
|