Age | Commit message (Collapse) | Author | Lines |
|
The main problems with this dataset was the names of the journals (which
sometimes have single quotes or apostrophes in them that is really annoying
for SED)! But ultimately, for the simple study we want to do here, the
journal names are irrelevant, so in the end I just ignored the names. Later
we can set an identifier for the journals if necessary.
But now we have the basic information in a way that is usable in a plot to
show in this paper.
|
|
Now that its 2020, its necessary to include this year in the copyright
statements.
|
|
While working on a different branch to build the GNU C Library, I noticed a
few places in the template that need corrections which are now applied:
1. A new-line character after the "C compiler works" notice at the start
of the configure script.
2. Removing possible `::' in the `LD_LIBRARY_PATH' definition of
`basic.mk'. Note that its not necessary in the other steps because we
don't use any outside-defined `LD_LIBRARY_PATH'.
3. Building GMP for C++ and also with `--enable-fat'.
4. Removing the unpacked Perl tarball directory after its installation.
|
|
Since ImageMagick can take long to build, we are now building it in
parallel. Also, the part where we replace an `_' with `\_' in the software
version at the end of the configure script was removed. It is more
clear/readable that the actual rule that includes such a name deals with
the underline (as is the case for `sip_tpv' which already dealt with it).
Finally, I noticed that the checks at the start of `top-prepare' were
missing new-lines. I had forgot that the Make single-shell variable isn't
activated in this stage yet.
|
|
In many real-world scenarios, `./project make' can really benefit from
having some basic information about the data before being run. For example
when quering a server. If we know how many datasets were downloaded and
their general properties, it can greatly optmize the process when we are
designing the solution to be run in `./project make'.
Therefore with this commit, a new phase has been added to the template's
design: `./project prepare'. In the raw template this is empty, because the
simple analysis done in the template doesn't warrant it. But everything is
ready for projects using the template to add preparation phases prior to
the analysis.
|
|
It was some time since these three software were not updated! With this
commit the template now uses the most recent stable release of these
packages.
Also, the hosting server for ImageMagick was moved to my own webpage
because unfortunately ImageMagick removes its tarballs from its own
version.
|
|
Users that are not familiar with the file structure of the project may
specify the current directory (to-level source directory) as their
build-directory. This will cause a crash right after answering the
questions, where `rm' will complain about `tex/build' not being deleted
because it exists as a directory.
To avoid such confusing situtations, the configure script now checks if the
build directory is actually a sub-directory of the source. If it is, it
will complain with a short message and abort. Also, a `CAUTION' statment
has been put in the initial description, right ontop of the question.
This bug was reported Carlos Allende Prieto and David Valls-Gabaud.
|
|
Until now, when building PatchELF, we would always require that it be done
statically. However, some systems don't have a static C library available
for linking. This cause a crash in the static building of PatchELF. But a
static PatchELF is necessary for correcting RPATH in GCC's outputs.
With this commit, in the configure script we check if a static C library is
linkable for the compiler. If it isn't then `host_cc' will be set to 1 and
GCC won't be built. We also pass the result of this test to `basic.mk'
(through `good_static_lib'), so if a static C library isn't available, it
builds a dynamically linked PatchELF.
This bug was reported by Elham Saremi.
|
|
Until now, the Fortran compiler check wouldn't delete the files it creates
in the temporary software building directory.
With this commit, the cleaning steps have been added.
|
|
Until now, we were just checking for the existance of a C and Fortran
compiler. But it can happen that even if they exist, they don't operate
properly, for example see some errors that have been reported until now in
P.S. (both on different macOS systems). But finding this source after the
programs have started is frustrating for the user.
With this commit, before we start building anything, we'll check these two
compilers with a simple program and see if they can indeed compile, and if
their compiled program can run. If it doesn't work an elaborate error
message is printed to help the users navigate to a solution.
Also, the building of `flock' within `configure.sh' has been moved just
before calling `basic.mk'. This was done so any warning/error message
is printed before actually building anything.
This fixes bug #56715.
P.S. The error messages:
C compiler
----------
conftest.c:9:19: fatal error: stdio.h: No such file or directory
^
compilation terminated.
----------
Fortran compiler
----------------
dyld: Library not loaded: @rpath/libisl.10.dylib
Referenced from:
/path/to/anaconda2/gcc/libexec/gcc/x86_64-apple-darwin15.5.0/4.9.3/f951
Reason: image not found
gfortran: internal compiler error: Abort trap: 6
----------------
|
|
Until now the only way to define the environment of the Make recipes was
through the exported Make variables (mostly in `initialize.mk' for the
analysis steps for example). However, there is only so much you can do with
environment variables! In some situations you want slightly more
complicated environment control, like setting an alias or running of
scripts (things that are commonly done in the `~/.bashrc' file of users to
configure their interactive, non-login shells).
With this commit, a `reproduce/software/bash/bashrc.sh' has been defined
for this job (which is currently empty!). Every major Make step of the
project adds this file as the `BASH_ENV' environment variable, so the shell
that is created to execute a recipe first executes this file, then the
recipe. Each top-level Makefile also defines a `PROJECT_STATUS' environment
variable that enables users to limit their envirnoment setup based on the
condition it is being setup (in particular in the early phase of
`basic.mk', where the user can't make any assumption about the programs and
has to write a portable shell script).
|
|
Until now, there was no check on the integrity of the contents of the
downloaded/copied software tarballs, we only relied on the tarball
name. This could be bad for reproducibility and security, for example on
one server the name of a tarball may be the same but with different
content.
With this commit, the SHA512 checksums of all the software are stored in
the newly created `checksums.mk' (similar to how the versions are stored in
the `versions.mk'). The resulting variable is then defined for each
software and after downloading/copying the file we check to see if the new
tarball has the same checksum as the stored value. If it doesn't the script
will crash with an error, informing the user of the problem.
The only limitation now is a bootstrapping problem: if the host system
doesn't already an `sha512sum' executable, we will not do any checksum
verification until we install our `sha512sum' (as part of GNU
Coreutils). All the tarballs downloaded after GNU Coreutils are built will
have their checksums validated. By default almost all GNU/Linux systems
will have a usable `sha512sum' (its part of GNU Coreutils after all for a
long time: from the Coreutils Changelog file atleast since 2013).
This completes task #15347.
|
|
The configuration step (building all the ncessary software) can take some
time. It is natual for the user to want to see how the build is going
(which software is being built at every moment). So far, we have only put a
"Inspecting status" section in `README-hacking.md' that describes a
solution, but some early users may not have read it yet.
With this commit a short tip was added in the initial installation notice
to inform the user of this very useful command.
|
|
Until now, to work on a project, it was necessary to `./configure' it and
build the software. Then we had to run `.local/bin/make' to run the project
and do the analysis every time. If the project was a shared project between
many users on a large server, it was necessary to call the `./for-group'
script.
This way of managing the project had a major problem: since the user
directly called the lower-level `./configure' or `.local/bin/make' it was
not possible to provide high-level control (for example limiting the
environment variables). This was especially noticed recently with a bug
that was related to environment variables (bug #56682).
With this commit, this problem is solved using a single script called
`project' in the top directory. To configure and build the project, users
can now run these commands:
$ ./project configure
$ ./project make
To work on the project with other users in a group these commands can be
used:
$ ./project configure --group=GROUPNAME
$ ./project make --group=GROUPNAME
The old options to both configure and make the project are still valid. Run
`./project --help' to see a list. For example:
$ ./project configure -e --host-cc
$ ./project make -j8
The old `configure' script has been moved to
`reproduce/software/bash/configure.sh' and is called by the new `./project'
script. The `./project' script now just manages the options, then passes
control to the `configure.sh' script. For the "make" step, it also reads
the options, then calls Make. So in the lower-level nothing has
changed. Only the `./project' script is now the single/direct user
interface of the project.
On a parallel note: as part of bug #56682, we also found out that on some
macOS systems, the `DYLD_LIBRARY_PATH' environment variable has to be set
to blank. This is no problem because RPATH is automatically set in macOS
and the executables and libraries contain the absolute address of the
libraries they should link with. But having `DYLD_LIBRARY_PATH' can
conflict with some low-level system libraries and cause very hard to debug
linking errors (like that reported in the bug report).
This fixes bug #56682.
|
|
Until now, the software building and analysis steps of the pipeline were
intertwined. However, these steps (of how to build a software, and how to
use it) are logically completely independent.
Therefore with this commit, the pipeline now has a new architecture
(particularly in the `reproduce' directory) to emphasize this distinction:
The `reproduce' directory now has the two `software' and `analysis'
subdirectories and the respective parts of the previous architecture have
been broken up between these two based on their function. There is also no
more `src' directory. The `config' directory for software and analysis is
now mixed with the language-specific directories.
Also, some of the software versions were also updated after some checks
with their webpages.
This new architecture will allow much more focused work on each part of the
pipeline (to install the software and to run them for an analysis).
|