Age | Commit message (Collapse) | Author | Lines |
|
On systems that allow it (like GNU/Linux systems), Maneage will build the
necessary software in shared memory (a directory that is actually in the
RAM, not on an SSD/HDD, on GNU/Linux systems, it is '/dev/shm'). This
allows Maneage to operate faster and not harm the HDD/SSD with all the
temporary writing of many small files.
Until now, we would only check that this directory exists and that it has
enough space. However, some systems also set the 'noexec' flag on shared
memory for security reasons [1]. This causes Maneage to crash upon building
of the software in later phases.
With this commit, at the very start of the configuration step, and after
all other shared-memory checks are done, a dummy executable script file is
created there and its execution is tested. If it doesn't work, shared
memory will not be used at all.
In the process, the steps dealing with the software building directory in
the configure script have been brought in one place and comments were added
to further clarify every step.
This commit was initially done by Boud Roukema and later edited by Mohammad
Akhlaghi.
[1] https://web.archive.org/web/20210624192819/https://serverfault.com/questions/72356/how-useful-is-mounting-tmp-noexec
|
|
Once a year, the texlive update system becomes incompatible with the
version from the previous year. Since a texlive install failure is
considered non-fatal by 'high-level.mk', so until now, the user could miss
the printed message and mistakenly believe that the configure is valid.
This commit explicitly adds a 10-second delay that should be enough for a
user who does the 'configure --existing-conf' step alone to notice that
there is a TeX Live problem. It also adds the explicit instruction of how
to allow an update from an earlier year's texlive installer to the warning
message (by deleting '.build/software/tarballs/install-tl-unx.tar.gz'). I
had to rediscover this a few times for old Maneage installs.
Also, a few lines in 'reproduce/software/shell/configure.sh' were indented
with a TAB (that is not recommended because TAB is displayed with different
widths on different browsers). So while doing this commit, those TABs were
also converted to a space.
|
|
When built in 'group' mode, the write permissions of all created files will
be activated for a certain group of users in the host operating system. The
user specifies the name of the group with the '--group' option at configure
time. At the very start, the './project' script checks to see if the given
group name actually exists or not (to avoid hard-to-debug errors popping up
later).
Until now, the checking 'sg' command (that was used to build the project
with group-writable permissions) would always fail due to the excessive
number of redirections. Therefore, it would always print the error message
and abort.
With this commit, the output of 'sg' is no longer re-directed (which also
helps users in debuggin). If the group does actually exist, it will just
print a small statement saying so, and if it fails, the error message is
printed. This fixed the problem, allowing maneage to be built in
group-mode.
I also noticed that the variable name keeping the group name
('reproducible_paper_group_name') used the old name for the project (which
was "Reproducible paper template"! So it has been changed/corrected to
'maneage_group_name'.
|
|
Until now the SWIG software would use the host operating system's packages
to find the TCL configuraiton (which we don't install yet in Maneage). In
particular, you can see the error during its configuration here:
....
checking for pkg-config... pkg-config
checking for Tcl configuration... found /usr/lib/tclConfig.sh
/usr/lib/tclConfig.sh: line 2: dpkg-architecture: command not found
/usr/lib//tcl8.6/tclConfig.sh: line 2: dpkg-architecture: com. not found
With this commit, TCL has been disabled when building SWIG with the
'--without-tcl' option. Later, when we add TCL in Maneage, we can remove
this option.
|
|
With a recent update of macOS systems (macOS Big Sur 11.2.3 and Xcode
12.4), there are many warnings when building C programs (for example the
simple program we compile to check the compiler, or some of the software
like `gzip'). It prints hundreds of warning lines for every source file
that are irrelevant for our builds, but really clutters the output.
With this commit, these warnings are disabled by adding
`-Wno-nullability-completeness' to the 'CPPFLAGS' environment
variable. This has also been added to the very first check of the C
compiler in the configure step.
|
|
Until now, each time there was a problem in the configuration of Maneage'd
projects and debugging was necessary, we had to take the following changes:
- Run the configuration on a single thread ('-j1') to see the building of
only the problematic software.
- Disable the Zenodo check manually by commenting those parts of
'reproduce/software/shell/configure.sh'. Because the internet connection
wastes a few seconds and is thus very annoying during repeated runs!
- Manually remove the '-k' option that was passed to Make (when building
the software). With the '-k', Make keeps going with the execution of
other targets if something crashes and this usually causes confusions
during the debugging.
Doing the manual changes within the code was both very annoying and prone
to errors (forgetting to correct it!).
With this commit, the existing '--debug' option has been generalized to the
software configuration phase of Maneage also. Until now, it was only
available in the analysis phase (and would directly be passed to the 'make'
command that would run the analysis). When this option is used, and the
project is in the software configuration phase, the Zenodo check won't be
done, it will use one single thread ('-j1'), and it will stop the execution
as soon as an error occurs (Make is not run with '-k').
|
|
Until now when making a link to the system's 'dl' and 'pthread' libraries
we were simply linking the installed location on the system (in
'/usr/lib'). However, in some systems, these may themselves be links to
other locations and this could cause linking problems.
With this commit, we now use 'realpath' to extract the absolute address of
the final file that the libraries may link to, and directly link to them.
A minor cosmetic correction was also made in the build rule for CFITSIO:
the long line was broken into two!
|
|
Until now, the build directory contained a 'software/' directory (that
hosted all the built software), a 'tex/' subdirectory for the final
building of the paper, and many other directories containing
intermediate/final data of the specific project. But this mixing of built
software and data is against our modularity and minimal complexity
principles: built software and built data are separate things and keeping
them separate will enable many optimizations.
With this commit, the build directory of the core Maneage branch will only
contain two sub-directories: 'software/' and 'analysis/'. The 'software/'
directory has the same contents as before and is not touched in this
commit. However, the 'analysis/' directory is new and everything created in
the './project make' phase of the project will be created inside of this
directory.
To facilitate easy access to these top-level built directories, two new
variables are defined at the top of 'initialize.mk': 'badir', which is
short for "built-analysis directory" and 'bsdir', which is short for
"built-software directory".
HOW TO IMPLEMENT THIS CHANGE IN YOUR PROJECT. It is easy: simply replace
all occurances of '$(BDIR)' in your project's subMakefiles (except the ones
below) to '$(badir)'. To confirm if everything is fine before building your
project from scratch after merging, you can run the following command to
see where 'BDIR' is used and confirm the only remaning cases.
$ grep -r BDIR reproduce/analysis/*
--> make/verify.mk: innobdir=$$(echo $$infile | sed -e's|$(BDIR)/||g'); \
--> make/initialize.mk:badir=$(BDIR)/analysis
--> make/initialize.mk:bsdir=$(BDIR)/software
--> make/initialize.mk: $$sys_rm -rf $(BDIR)
--> make/top-prepare.mk:all: $(BDIR)/software/preparation-done.mk
'BDIR' should only be present in lines of the files above. If you see
'$(BDIR)' used anywhere else, simply change it to '$(badir)'. Ofcourse, if
your project assumes BDIR in other contexts, feel free to keep it, it will
not conflict. If anything un-expected happens, please post a comment on the
link below (you need to be registered on Savannah to post a comment):
https://savannah.nongnu.org/task/?15855
One consequence of this change is that the 'analysis/' subdirectory can be
optionally mounted on a separate partition. The need for this actually came
up for some new users of Maneage in a Docker image. Docker can fix
portability problems on systems that we haven't yet supported (even
Windows!), or had a chance to fix low-level issues on. However, Docker
doesn't have a GUI interface. So to see the built PDF or intermediate data,
it was necessary to copy the built data to the host system after every
change, which is annoying during working on a project. It would also need
two copies of the source: one in the host, one in the container. All these
frustrations can be fixed with this new feature.
To describe this scenario, README.md now has a new section titled "Only
software environment in the Docker image". It explains step-by-step how you
can make a Docker image to only host the built software environment. While
your project's source, software tarballs and 'BDIR/analysis' directories
are on your host operating system. It has been tested before this commit
and works very nicely.
|
|
Until now, when building GNU Binutils on GNU Linux operating systems, we
would simply put a link to the host's core C library components (the
'*crt*' files). However, the symbolic link wasn't "forced"! So if it
already existed in the build directory, it would crash.
With this commit a '-f' option has been added to the 'ln' command and this
fixed the problem.
This bug was reported by Zahra Sharbaf.
|
|
After correctly setting Less to depend on 'ncurses', I noticed its still
not linking to Maneage's 'ncurses', but pointing to my host system's
'ncurses' (that happens to have the same version! So it would crash on a
system with a different version). This shows that like some other software,
we need to manually correct the RPATH inside Less.
With this command, the necessary call to 'patchelf' has been added and with
it, the installed 'less' command properly linked to Maneage's internal
build of 'ncurses'.
|
|
Until now, the 'less' software package (used to view large files easily on
the command-line and used by Git for things like 'git diff' or 'git log')
only depended on 'patchelf' (which is a very low-level software).
However, as Boud reported in bug #59811 [1], building less would crash with
an error saying "Cannot find terminal libraries" in some systems (including
the proposed Docker image of 'README.md' which I confirmed
afterwards). Looking into the 'configure' script of 'less', I noticed that
'less' is actually just checking for some functions provided by the ncurses
library!
With this commit, 'less' depends on 'ncurses'. I was able to confirm that
with this change, 'less' successfully builds within the Docker image.
[1] https://savannah.nongnu.org/bugs/?59811
|
|
Having entered 2021, it was necessary to update the copyright years at the
top of the source files. We recommend that you do this for all your
project-specific source files also.
|
|
Until now, when building the high-level (optional) software, we would give
both 'CPPFLAGS' and 'C_INCLUDE_PATH' the same value/directory in
'high-level.mk'. But we recently found that on macOS's C compiler
('clang'), if a directory is included in both 'CPPFLAGS' and
'C_INCLUDE_PATH', then that directory is ignored in 'CPPFLAGS' (which has
higher priority). This caused linking problems when the version of a
software on the host was different from the Maneage version.
With this commit, 'C_INCLUDE_PATH' is not set on macOS any more and this
fixed the problem on the reported systems.
This bug was fixed with the help of Mohammad Akhlaghi and Mahdieh Navabi.
|
|
Less is rarely used in non-interactive mode and is primarily intended for
interactively viewing large files. So its need within Maneage (for batch
processing) wasn't often felt until now. However, when running './project
shell' (which completely closes-off the outside environment), or building a
Maneage'd project within a minimal container that doesn't have less, it
becomes hard to use Git (and in particular its 'diff' output which depends
on 'less').
With this commit, Less has been added as a dependency of Git in
'basic.mk'. In total its built product is roughly 800KB and builds within a
second or two. So it isn't a burden on any project. But it can be very
useful when the projects are being developed within the Maneage environment
itself.
|
|
In a recent build on a macOS, we recognized that Texinfo needs the
'libintl.h' headers of Gettext. However, Gettext depends on M4, and until
now we had set M4 to depend on Texinfo. Therefore adding Gettext as a
dependency of Texinfo would cause a circular dependency.
On the macOS, we temporarily disabled M4's Texinfo dependency, and the
build went through. I also checked on my GNU/Linux system: temporarily
renamed all Texinfo built files from my system and done a clean build of M4
and it succeeded. To be further safe, I built Maneage from this commit
(where M4 doesn't depend on Texinfo) in a Docker container, and it went
through with no problems. So the current M4 version indeed doesn't need
Texinfo. I think adding Texinfo as a dependency of M4 was a historic issue
from the early days.
In the process, I also cleaned 'basic.mk' a little:
- A "# Level N" comment was added on top of each group of software that
can be built in parallel (generally).
- GNU Nano was moved to the end of the file (to be "Level 6").
- Some comments were edited in some places.
|
|
After a fresh build of Maneage with a newly downloaded TeXLive, I noticed
that it is complaining about not finding 'xstring.sty', apparently some
package that depeneded on it is no longer including it itself!
It is thus now added to the packages that are built by Maneage's TeXLive.
|
|
Until now, the core Maneage branch included some configuration files for
Gnuastro's programs. This was actually a remnant of the distant past when
Maneage didn't actually build its own software and we had to rely on the
host's software versions. This file contained the configuration files
specific to Gnuastro for this project and also had a feature to avoid
checking the host's own configuration files.
However, we now build all our software ourselves with fixed configuration
files (for the version that is being installed and its version is
stored). So those extra configuration files were just extra and caused
confusion and problems in some scenarios. With this commit, those extra
files are now removed.
Also, two small issues are also addressed in parallel with this commit:
- When running './project make clean', the 'hardware-parameters.tex' macro
file (which is created by './project configure' is not deleted.
- The project title is now written into the default output's PDF's
properties (through 'hypersetup' in 'tex/src/preamble-header.tex')
through the LaTeX macro.
All these issues were found and fixed with the help of Samane Raji.
|
|
Until now, during the configure step it was checked if the host Operative
System were GNU/Linux, and if not, we assumed it is macOS. However, it can
be any other different OS! With this commit, now we explicity check if the
system is GNU/Linux or Darwin (macOS). If it is not any of them, a warning
message says to the user that the host system is different from which we
have checked so far (and invite to contact us if there is any problem).
In addition to this, if the system is macOS, now it checks if Xcode is
already installed in the host system. If it is not installed, a warning
message informs the user to do that in case a problem/crash in the
configure step occurs. We have found that it is convenient to have Xcode
installed in order to avoid some problems.
|
|
Tcl/Tk are a set of tools to provide Graphic User Interface (GUI) support
in some software. But they are not yet natively built within Maneage,
primarily because we have higher-priority work right now. GUI tools in
general aren't high on our priority list right now because GUI tools are
generally good for human interaction (which is contrary to the reproducible
philosophy), not automatic analysis (a core concept in reproducibility). So
even later, when we do include Tcl/Tk in Maneage, their direct usage will
be discouraged.
Until this commit, because we don't yet build Tcl/Tk, the default maneage
install of the statistical package R failed on a Debian Stretch, with 6227
repeats of the line:
'/usr/lib//tcl8.5/tclConfig.sh: line 2: dpkg-architecture:
command not found'
To fix this problem (atleast until Tcl/Tk is installed within Maneage), R
is now configured with the '--without-tcltk' option which fixed the
problem. Please see the description above the R installation instructions
in 'reproduce/software/make/high-level.mk' for more.
|
|
Following the previous commit, we recognized that the 'IFS' terms are not
necessary and can be even cause problems. So all their occurances in the
scripts of Maneage have been removed with this commit.
|
|
Until a recent commit, the IFS='"' was added at the start of the variables
in this shell script and as a result, the SPACE character wasn't being used
as a delimiter. This caused a major problem when downloading the tarballs
(all the backup servers were considered as the top link).
With this commit we removed these 'IFS' statements). Because we now check
for the existance of meta-characters in the build directory name, there is
no more problem, and also generally both the calling command and
internally, we have double-qutations around the variable names. So removal
of IFS will not affect the result in this scenario.
This bug was found by Mohammadreza Khellat.
|
|
While a project is under development, the raw analysis software are not the
only necessary software in a project. We also need tools to all the edit
plain-text files within the Maneaged project. Usually people use their
operating system's plain-text editor. However, when working on the project
on a new computer, or in a container, the plain-text editors will have
different versions, or may not be present at all! This can be very annoying
and frustrating!
With this commit, Maneage now installs GNU Nano as part of the basic
tools. GNU Nano is a very simple and small plain text editor (the installed
size is only ~3.5MB, and it is friendly to new users). Therefore, any
Maneaged project can assume atleast Nano will be present (in particular
when no editor is available on the running system!). GNU Emacs and VIM
(both without extra dependencies, in particular without GUI support) are
also optionally available in 'high-level.mk' (by adding them to
'TARGETS.conf').
The basic idea for the more advanced editors (Emacs and VIM) is that
project authors can add their favorite editor while they are working on the
project, but upon publication they can remove them from 'TARGETS.conf'.
A few other minor things came up during this work and are now also fixed:
- The 'file' program and its libraries like 'libmagic' were linking to
system's 'libseccomp'! This dependency then leaked into Nano (which
depends on 'libmagic'). But this is just an extra feature of 'file',
only for the Linux kernel. Also, we have no dependency on it so far. So
'file' is not configured to not build with 'libseccomp'.
- A typo was fixed in the line where the physical core information is
being read on macOS.
- The top-level directories when running './project shell' are now quoted
(in case they have special characters).
|
|
Until now, no machine-related specifications were being documented in the
workflow. This information can become helpful when observing differences in
the outcome of both software and analysis segments of the workflow by
others (some software may behave differently based on host machine).
With this commit, the host machine's 'hardware class' and 'byte-order' are
collected and now available as LaTeX macros for the authors to use in the
paper. Currently it is placed in the acknowledgments, right after
mentioning the Maneage commit.
Furthermore, the project and configuration scripts are now capable of
dealing with input directory names that have SPACE (and other special
characters) by putting them inside double-quotes. However, having spaces
and metacharacters in the address of the build directory could cause
build/install failure for some software source files which are beyond the
control of Maneage. So we now check the user's given build directory
string, and if the string has any '@', '#', '$', '%', '^', '&', '*', '(',
')', '+', ';', and ' ' (SPACE), it will ask the user to provide a different
directory.
|
|
Until now, if the software source tarballs already existed on the system
they would be copied inside the project. However, the software source
tarballs are sometimes/mostly larger than their actual product and can
consume significant space (~375 MB in the core branch!).
With this commit, when the software are present on the system, their
symbolic link will be placed in 'BDIR/software/tarballs', not a full
copy. Also, because the tarballs in software tarball directory may
themselves be links, we use 'realpath' to find the final place of the
actual file and link to that location. Therefore if 'realpath' can't be
found (prior to installing Coreutils in Maneage), we will copy the tarballs
from the given software tarball directory. After Maneage has installed
Coreutils, the project's own 'realpath' will be used. Of course, if the
software are downloaded, their full downloaded copy will be kept in
'BDIR/software/tarballs', nothing has changed in the downloading scenario.
|
|
It was a long time that the Maneage software versions hadn't been updated.
With this commit, the versions of all basic software were checked and 17 of
that had newer versions were updated. Also, 16 high-level programs and
libraries were updated as well as 7 Python modules. The full list is
available below.
Basic Software (affecting all projects)
---------------------------------------
bash 5.0.11 -> 5.0.18
binutils 2.32 -> 2.35
coreutils 8.31 -> 8.32
curl 7.65.3 -> 7.71.1
file 5.36 -> 5.39
gawk 5.0.1 -> 5.1.0
gcc 9.2.0 -> 10.2.0
gettext 0.20.2 -> 0.21
git 2.26.2 -> 2.28.0
gmp 6.1.2 -> 6.2.0
grep 3.3 -> 3.4
libbsd 0.9.1 -> 0.10.0
ncurses 6.1 -> 6.2
perl 5.30.0 -> 5.32.0
sed 4.7 -> 4.8
texinfo 6.6 -> 6.7
xz 5.2.4 -> 5.2.5
Custom programs/libraries
-------------------------
astrometrynet 0.77 -> 0.80
automake 0.16.1 -> 0.16.2
bison 3.6 -> 3.7
cfitsio 3.47 -> 3.48
cmake 3.17.0 -> 3.18.1
freetype 2.9 -> 2.10.2
gdb 8.3 -> 9.2
ghostscript 9.50 -> 9.52
gnuastro 0.11 -> 0.12
libgit2 0.28.2 -> 1.0.1
libidn 1.35 -> 1.36
openmpi 4.0.1 -> 4.0.4
R 3.6.2 -> 4.0.2
python 3.7.4 -> 3.8.5
wcslib 6.4 -> 7.3
yaml 0.2.2 -> 0.2.5
Python modules
--------------
cython 0.29.6 -> 0.29.21
h5py 2.9.0 -> 2.10.0
matplotlib 3.1.1 -> 3.3.0
mpi4py 3.0.2 -> 3.0.3
numpy 1.17.2 -> 1.19.1
pybind11 2.4.3 -> 2.5.0
scipy 1.3.1 -> 1.5.2
|
|
When the host C compiler is used (either by calling '--host-cc' or on OSs
that we can't build the GNU C Compiler), Maneage will also not build the
Fortran compiler 'gfortran'. Until now, the './project configure' script
would give a big warning about the need for 'gfortran' and the fact that it
is missing, and would for 5 seconds, but it would continue anyway.
For projects that don't need 'gfortran', this can be confusing to the users
and for those that need 'gfortran', it means that a lot of time and cpu
cycles are wasted compiling non-fortran software that are unusable in the
end.
With this commit, the 'need_gfortarn' variable has been added
'reproduce/software/shell/configure.sh', in a new part that is devoted to
project-specific features. If it equals '0', then the 'gfortran' test (and
message!) isn't done at all, but if it is set to '1', then the configure
stage will halt immediately gfortran is not found and not built.
The default operations of the core Maneage branch don't need 'gfortran', so
by default it is set to 0. But 'gfortran' is necessary for all projects
that use Numpy (Python's numeric library) for example. So if your project
needs 'gfortran', please set this new variable to 1. As mentioned in the
comments of 'configure.sh', ideally we should detect this automatically,
but we haven't had the time to implement it yet.
|
|
Prior to this commit, compilation of OpenMPI used the default OpenMPI
choices of deciding which libraries should be used in relating to a job
scheduler [1] (such as Slurm [2]). Given that the user on a multi-user
cluster has to accept the sysadmin's choice of a job scheduler, the
question of whether to (1) link with OpenMPI's own libraries (and increase
the reproducibility of the science project) or rather (2) link with the
sysadmin managed libraries (more likely to be compatible with the host's
job scheduler), is an open question of which the best strategy for
reproducibility needs to be debated and studied.
In this commit, strategy (1) is adopted. The options '--withpmix=internal'
and '--with-hwloc=internal' are added to the configure command. The working
assumption is that the Maneage version of OpenMPI is likely to be modern
enough to be compatible with the native job scheduler such as
Slurm. Compilation without any 'pmix' option gave a fail in at least one
case; it appears that an external pmix library was sought by the configure
script.
As of OpenMPI 4.0.1, the internal libevent library is used by default, so
there appears to be no option to force it to be chosen internally.
This commit also includes the option '--without-verbs'. This option
removes a library related to "infiniband", "verbs", "openib" and "BTL";
this library appears to be deprecated. See [3], [4] for discussion.
Please add feedback and discussion to the Maneage task about openmpi
linking strategies (1) (internal) and (2) (external) at Savannah [5].
[1] https://en.wikipedia.org/wiki/Job_scheduler#Batch_queuing_for_HPC_clusters
[2] https://en.wikipedia.org/wiki/Slurm_Workload_Manager - To avoid a name
clash, 'slurm-wlm' is the metapackage in Debian for the client
commands, the compute node daemon, and the central node daemon. An
unrelated package 'slurm' also exists.
[3] https://www-lb.open-mpi.org/faq/?category=openfabrics#ofa-device-error
[4] https://www-lb.open-mpi.org/faq/?category=building
[5] https://savannah.nongnu.org/task/index.php?15737
|
|
Until now, if a project needed the healpy software package, Maneage would
crash with the following error message (abridged for full name in build
directory). This was caused by a typo in the version of 'healpix' (the
dependency of 'healpy').
make: *** No rule to make target '.../version-info/proglib/healpix-'
With this commit, the typo in line 334 of 'python.mk' is fixed, so that
when '$(ipydir)/healpy-$(healpy-version)' gets called it correctly searches
for a rule to make '$(ibidir)/healpix-$(healpix-version)'.
|
|
In the previous commit (Commit 1bc00c9: Only using clang in macOS systems
that also have GCC) we set the used C compiler for high-level programs to
be 'clang' on macOS systems. But I forgot to do the same kind of change in
the configure script (to prefer 'clang' when we are testing for a C
compiler on the host).
With this commit, the compiler checking phases of the configure script have
been improved, so on macOS systems, we now first search for 'clang', then
search for 'gcc'.
While doing this, I also noticed that the 'rpath' checking command was done
before we actually define 'instdir'!!! So in effect, the 'rpath' directory
was being set to '/lib'! So with this commit, this test has been taken to
after defining 'instdir'.
|
|
Until now, when Maneage was built on a macOS that had both a clang and GCC,
we would make links to both. But this cause many conflicts in some
high-level programs (for example Numpy and etc, all the programs where we
have explicity set 'export CC=clang' before the build recipe). This happens
because the GCC that is built on a macOS isn't complete for some
operations.
To fix this problem, when we are on a macOS, we explicity set 'gcc' to
point to 'clang' and 'g++' to point to 'clang++'. We also don't link to the
host's C-preprocessor ('cpp') on macOS systems because this is only a GNU
feature and using the GNU CPP is also known to have some basic
problems. For example this was reported by Mahdieh Nabavi (which was the
main trigger for this work):
ld: Symbol not found: ___keymgr_global
Referenced from: /Users/Mahdieh/build/software/installed/bin/cpp
Expected in: /usr/lib/libSystem.B.dylib
Also, to avoid linking to another link on the host tools (in the 'makelink'
function of 'basic.mk'), we are now using 'realpath'.
|
|
Until now, when reading the host's PATH environment variable we weren't
accounting for directory names with a space character. This was most
prominently visible in the 'low-level-links' step where we put links to
some core system components into the project's build directory (mainly for
prorietary systems like macOS).
To address the problem, double quotations have been placed around the part
that we extract 'ccache' from the PATH, and the part where we make the
symbolic link. In the process the comments above 'makelink' were made more
clear and 'low-level-links' now depends on 'grep' (which is the
highest-level program it uses).
This bug was reported by Mahdieh Navabi.
|
|
Until this commit, once Libidn was installed, insted of its own name and
version, the name and version of Libjpeg were saved (in the target if
Libidn). This robably come from a copy/paste of the rule.
With this commit, this minor bug has been corrected. I also added my name
as an author of `reproduce/software/make/xorg.mk' Makefile since I added
some code there.
|
|
After recently adding util-linux to Maneage build-tree, we had forgot to
delete the unpacked and built source directory after it was installed! This
has been corrected with this commit.
|
|
Until now, when the user specified an input and software directory, the raw
string they entered was used. But when this string was a relative location,
this could be problematic in general scenarios.
With this commit, the same function that finds the absolute location of the
build directory is used to find the absolute address of the data and
software directories.
|
|
Until now, in order to build Ghostscript, the project used the host's Xorg
libraries. This was because we hadn't yet added the necessary build rules
for them.
With this commit, the instructions to build the necessary Xorg libraries
for Ghostscript have also been added. Also, the shared Ghostscript library
has been built with this commit and two sets of standard fonts are also
included, setting us on the path to build TeXLive from source later.
This task was done with the help and support of Raul Infante-Sainz.
|
|
Until this commit, there was a problem when building Bison in parallel in
macOS systems. With this commit, this problem has been fixed by updating
Bison to its most recent version (3.6).
|
|
POSSIBLE EFFECT ON YOUR PROJECT: The changes in this commit may only cause
conflicts to your project if you have changed the software building
Makefiles in your project's branch (e.g., 'basic.mk', 'high-level.mk' and
'python.mk'). If your project has only added analysis, it shouldn't be
affected.
This is a large commit, involving a long series of corrections in a
differnt branch which is now finally being merged into the core Maneage
branch. All changes were related and came up naturally as the low-level
infrastructure was improved. So separating them in the end for the final
merge would have been very time consuming and we are merging them as one
commit.
In general, the software building Makefiles are now much more easier to
read, modify and use, along with several new features that have been
added. See below for the full list.
- Until now, Maneage needed the host to have a 'make' implementation
because Make was necessary to build Lzip. Lzip is then used to
uncompress the source of our own GNU Make. However, in the
minimalist/slim versions of operating systems (for example used to build
Docker images) Make isn't included by default. Since Lzip was the only
program before our own GNU Make was installed, we consulting Antonio
Diaz Diaz (creator of Lzip) and he kindly added the necessary
functionality to a new version of Lzip, which we are using now. Hence we
don't need to assume a Make implementation on the host any more. With
this commit, Lzip and GNU Make are built without Make, allowing
everything else to be safely built with our own custom version of GNU
Make and not using the host's 'make' at all.
- Until recently (Commit 3d8aa5953c4) GNU Make was built in
'basic.mk'. Therefore 'basic.mk' was written in a way that it can be
used with other 'make' implementations also (i.e., important shell
commands starting with '&&' and ending in '\' without any comments
between them!). Furthermore, to help in style uniformity, the rules in
'high-level.mk' and 'python.mk' also followed a similar structure. But
due to the point above, we can now guarantee that GNU Make is used from
the very first Makefile, so this hard-to-read structure has been removed
in the software build recipes and they are much more readable and
edit-friendly now.
- Until now, the default backup servers where at some fixed URLs, on our
own pages or on Gitlab. But recently we uploaded all the necessary
software to Zenodo (https://doi.org/10.5281/zenodo.3883409) which is
more suitable for this task (it promises longevity, has a fixed DOI,
while allowing us to add new content, or new software tarball
versions). With this commit, a small script has been written to extract
the most recent Zenodo upload link from the Zenodo DOI and use it for
downloading the software source codes.
- Until now, we primarily used the webpage of each software for
downloading its tarball. But this caused many problems: 1) Some of them
needed Javascript before the download, 2) Some URLs had a complex
dependency on the version number, 3) some servers would be randomly down
for maintenance and etc. So thanks to the point above, we now use the
Zenodo server as the primary download location. However, if a user wants
to use a custom software that is not (yet!) in Zenodo, the download
script gives priority to a custom URL that the users can give as Make
variables. If that variable is defined, then the script will use that
URL before going onto Zenodo. We now have a special place for such URLs:
'reproduce/software/config/urls.conf'. The old URLs (which are a good
documentation themselves) are preserved here, but are commented by
default.
- The software source code downloading and checksum verification step has
been moved into a Make function called 'import-source' (defined in the
'build-rules.mk' and loaded in all software Makefiles). Having taken all
the low-level steps there, I noticed that there is no more need for
having the tarball as a separate target! So with this commit, a single
rule is the only place that needs to be edited/added (greatly
simplifying the software building Makefiles).
- Following task #15272, A new option has been added to the './project'
script called '--all-highlevel'. When this option is given, the contents
of 'TARGETS.conf' are ignored and all the software in Maneage are built
(selected by parsing the 'versions.conf' file). This new option was
added to confirm the extensive changes made in all the software building
recipes and is great for development/testing purposes.
- Many of the software hadn't been tested for a long time! So after using
the newly added '--all-highlevel', we noticed that some need to be
updated. In general, with this commit, 'libpaper' and 'pcre' were added
as new software, and the versions of the following software was updated:
'boost', 'flex', 'libtirpc', 'openblas' and 'lzip'. A 'run-parts.in'
shell script was added in 'reproduce/software/shell/' which is installed
with 'libpaper'.
- Even though we intentionally add the necessary flags to add RPATH inside
the built executable at compilation time, some software don't do it
(different software on different operating systems!). Until now, for
historical reasons this check was done in different ways for different
software on GNU/Linux sytems. But now it is unified: if 'patchelf' is
present we apply it. Because of this, 'patchelf' has been put as a
top-level prerequisite, right after Tar and is installed before anything
else.
- In 'versions.conf', GNU Libtool is recognized as 'libtool', but in
'basic.mk', it was 'glibtool'! This caused many confusions and is
corrected with this commit (in 'basic.mk', it is also 'libtool').
- A new argument is added to the './project' script to allow easy loading
of the project's shell and environment for fast/temporary testing of
things in the same environment as the project. Before activating the
project's shell, we completely remove all host environment variables to
simulate the project's environment. It can be called with this command:
'./project shell'. A simple prompt has also been added to highlight that
the user is using the Maneage shell!
|
|
Until now, Maneage would accept the given build directory, regardless of
the free memory available there. This could cause confusing situations for
new users who don't know about the minimum storage requirement.
With this commit, after all other checks on the given build directory are
completed, the configure script will check the available space and warns
the user if there is less than almost 5GB free space available in the build
directory (with a 5 second delay).
It won't cause a crash because some projects may require roughly smaller
than this space (the default only needs roughly 2GB). But we also don't
want the host's partition to get too close to being full, causing them
problems elsewhere. We can change the behavior as desired in future
commits.
|
|
In Commit 105467fe6402 (Software tarballs are downloaded even if not
built), we introduced tests to download the tarballs of software even if
they don't need to be built on the respective host. However some small
typos in the checks existed that could cause a crash on macOS. In
particular in the building of PatchELF and libbsd we had forgot to add the
necessary 'x' before the 'yes' in the conditional to check if a we are on
macOS or not.
With this commit these two checks have been corrected. Also, in the
building of 'isl' and 'mpc', we now check for 'host_cc' (signifying that
the user wants to use their host C compiler for the high-level step)
instead of 'on_mac_os'. The reason is that even on non-macOS systems, a
user may not want to build the C compiler from scratch and use the
'--host-cc' option. In such cases, they don't need to compile 'isl' and
'mpc'.
|
|
Until now, the English texts that embeds the list of software to
acknowledge in the paper was hard-wired into the low-level coding
('reproduce/software/shell/configure.sh' to be more specific). But this
file is very low-level, thus discouraging users to modify this surrounding
text.
While the list of software packages can be considered to be 'data' and is
fixed, the surrounding text to describe the lists is something the authors
should decide on. Authors of a scientific research paper take
responsibility for the full paper, including for the style of the
acknowledgments, even if these may well evolve into some standard text.
With this commit, authors who do *not* modify
'reproduce/software/config/acknowledge_software.sh' will have a default
text, with only a minor English correction from earlier versions of
Maneage. However, Authors choosing to use their own wording should be able
to modify the text parameters in
`reproduce/software/config/acknowledge_software.sh` in the obvious
way. This is much more modular than asking project authors to go looking
into the long and technical 'configure.sh' script.
Systematic issues: the file
`reproduce/software/config/acknowledge_software.sh` is an executable shell
script, because it has to be called by
`reproduce/software/shell/configure.sh`, which, in principle, does not yet
have access to `GNU make` (if I understand the bootstrap sequence
correctly). It is placed in `config/` rather than `shell/`, because the
user will expect to find configuration files in `config/`, not in `shell/`.
A possible alternative to avoid having a shell script as a configure file
would be to let `reproduce/software/config/acknowledge_software.sh` appear
to be a `make` file, but analyse it in `configure.sh` using `sed` to remove
whitespace around `=`, and adding other hacks to switch from `make` syntax
to `shell` syntax. However, this risks misleading the user, who will not
know whether s/he should follow `make` conventions or `shell` conventions.
|
|
Some low-level software aren't necessary on some operating systems, for
example GCC can't be built on macOS, hence we don't build it and the
GCC-only dependencies. Also, on GNU/Linux systems users could configure
with '--host-cc' to avoid all the time it takes to build GCC when doing a
fast test.
Until now, in such cases not only was the software not installed, but the
tarballs of the software were also not downloaded. Hence making the output
of '--dist-software' incomplete (as in bug #58561).
With this commit, we now import all the necessary tarballs, when the
software isn't necessary for the particular system, it won't be built or
cited, but its tarball will be present anyway, thus allowing the output of
'--dist-software' to be complete.
|
|
Until now, when making the link to Gnuastro's configuration files, the
'configure.sh' script would incorrectly link to the old configuration
directory under the 'reproduce/software' directory. With this commit, it is
moved to the proper directory under 'reproduce/analysis'.
|
|
Until now, when adding the necessary library flags to the build of XLSX
I/O, we were effectively over-writing the 'LDFLAGS' variables. So the
compiler was effectively not being told where to look for the necessary
libraries.
With this commit, to fix the problem, we now append the new linking flags
to LDFLAGS in XLSX I/O's build, not over-write it.
|
|
After trying a clean build of Maneage in a Docker image (with a minimal
debian:stable-20200607-slim OS), I noticed that the building of OpenSSL is
failing because it doesn't find the proper Perl functionality. To fix it,
with this commit, Perl is set as a prerequisite of OpenSSL and this fixed
the problem.
|
|
The project configuration requires a build-directory at configuration time,
two other directories can optionally be given to avoid downloading the
project's necessary data and software. It is possible to give these three
directories as command-line options, or by interactively giving them after
running the configure script.
Until now, when these directories weren't given as command-line options,
and the running shell was non-interactive, the configure script would crash
on the line trying to interactively read the user's given directories (the
'read' command).
With this commit, all the 'read' commands for these three directories are
now put within an 'if' statement. Therefore, when 'read' fails (the shell
is non-interactive), instead of a quiet crash, a descriptive message is
printed, telling the user that cause of the problem, and suggesting a fix.
This bug was found by Michael R. Crusoe.
|
|
Until now, the description of the input-data directory at configure time
included a description of the input data (created by reading the values of
'INPUTS.conf'). Maintaining this is easy for a single dataset, but it
becomes hard for a general project which may need many input datasets.
To avoid extra complexity (for maintaining this list), the description now
points a user of the project to the 'INPUTS.conf' file and asks them to
look inside of it for seeing the necessary data. This infact helps with the
users becoming familiar with the internal structure of Maneage and will
allow the authors to focus on not having to worry about updating the
low-level 'configure.sh' script.
|
|
When './project configure' is run, after the basic checks of the compiler,
a small statement is printed telling the user that some configuration
questions will now be asked to start building Maneage on the system. Until
now this description was confusing: it lead the reader to think that the
local configuration (which was recommended to read before continuing) is in
another file.
With this commit, the text has been edited to explictly mention that the
description of the steps following this notice should be read
carefully. Thus avoiding that confusion.
This issue was mentioned by Michael R. Crusoe.
|
|
Possible semantic conflicts (that may not show up as Git conflicts but may
cause a crash in your project after the merge):
1) The project title (and other basic metadata) should be set in
'reproduce/analysis/conf/metadata.conf'. Please include this file in
your merge (if it is ignored because of '.gitattributes'!).
2) Consider importing the changes in 'initialize.mk' and 'verify.mk' (if
you have added all analysis Makefiles to the '.gitattributes' file
(thus not merging any change in them with your branch). For example
with this command:
git diff master...maneage -- reproduce/analysis/make/initialize.mk
3) The old 'verify-txt-no-comments-leading-space' function has been
replaced by 'verify-txt-no-comments-no-space'. The new function will
also remove all white-space characters between the columns (not just
white space characters at the start of the line). Thus the resulting
check won't involve spacing between columns.
A common set of steps are always necessary to prepare a project for
publication. Until now, we would simply look at previous submissions and
try to follow them, but that was prone to errors and could cause
confusion. The internal infrastructure also didn't have some useful
features to make good publication possible. Now that the submission of a
paper fully devoted to the founding criteria of Maneage is complete
(arXiv:2006.03018), it was time to formalize the necessary steps for easier
submission of a project using Maneage and implement some low-level features
that can make things easier.
With this commit a first draft of the publication checklist has been added
to 'README-hacking.md', it was tested in the submission of arXiv:2006.03018
and zenodo.3872248. To help guide users on implementing the good practices
for output datasets, the outputs of the default project shown in the paper
now use the new features). After reading the checklist, please inspect
these.
Some other relevant changes in this commit:
- The publication involves a copy of the necessary software
tarballs. Hence a new target ('dist-software') was also added to
package all the project's software tarballs in one tarball for easy
distribution.
- A new 'dist-lzip' target has been defined for those who want to
distribute an Lzip-compressed tarball.
- The '\includetikz' LaTeX macro now has a second argument to allow
configuring the '\includegraphics' call when the plot should not be
built, but just imported.
|
|
When some files should not be merged, until now we were suggesting to also
add deleted files to the '.gitattributes' file. However, this feature of
Git doesn't work for deleted files and they would still show up in the
'master' branch after a merge.
So with this commit, we have added a simple AWK command to run after a
merge that will automatically detect and delete such files (using the
output of 'git status --porcelain').
Also, two minor typos were corrected in the newly added
'servers-backup.conf' file: the copyright year was wrong and there was no
new-line at the end of the file (a good convention!).
|
|
Until now, Maneage would only build Flock before building everything else
using Make (calling 'basic.mk') in parallel. Flock was necessary to avoid
parallel downloads during the building of software (which could cause
network problems). But after recently trying Maneage on FreeBSD (which is
not yet complete, see bug #58465), we noticed that the BSD implemenation of
Make couldn't parse 'basic.mk' (in particular, complaining with the 'ifeq'
parts) and its shell also had some peculiarities.
It was thus decided to also install our own minimalist shell, Make and
compressor program before calling 'basic.mk'. In this way, 'basic.mk' can
now assume the same GNU Make features that high-level.mk and python.mk
assume. The pre-make building of software is now organized in
'reproduce/software/shell/pre-make-build.sh'.
Another nice feature of this commit is for macOS users: until now the
default macOS Make had problems for parallel building of software, so
'basic.mk' was built in one thread. But now that we can build the core
tools with GNU Make on macOS too, it uses all threads. Furthermore, since
we now run 'basic.mk' with GNU Make, we can use '.ONESHELL' and don't have
to finish every line of a long rule with a backslash to keep variables and
such.
Generally, the pre-make software are now organized like this: first we
build Lzip before anything else: it is downloaded as a simple '.tar' file
that is not compressed (only ~400kb). Once Lzip is built, the pre-make
phase continues with building GNU Make, Dash (a minimalist shell) and
Flock. All of their tarballs are in '.tar.lz'. Maneage then enters
'basic.mk' and the first program it builds is GNU Gzip (itself packaged as
'.tar.lz'). Once Gzip is built, we build all the other compression software
(all downloaded as '.tar.gz'). Afterwards, any compression standard for
other software is fine because we have it.
In the process, a bug related to using backup servers was found in
'reproduce/analysis/bash/download-multi-try' for calling outside of
'basic.mk' and removed Bash-specific features. As a result of that bug-fix,
because we now have multiple servers for software tarballs, the backup
servers now have their own configuration file in
'reproduce/software/config/servers-backup.conf'. This makes it much easier
to maintain the backup server list across the multiple places that we need
it.
Some other minor fixes:
- In building Bzip2, we need to specify 'CC' so it doesn't use 'gcc'.
- In building Zip, the 'generic_gcc' Make option caused a crash on FreeBSD
(which doesn't have GCC).
- We are now using 'uname -s' to specify if we are on a Linux kernel or
not, if not, we are still using the old 'on_mac_os' variable.
- While I was trying to build on FreeBSD, I noticed some further
corrections that could help. For example the 'makelink' Make-function
now takes a third argument which can be a different name compared to the
actual program (used for examle to make a link to '/usr/bin/cc' from
'gcc'.
- Until now we didn't know if the host's Make implementation supports
placing a '@' at the start of the recipe (to avoid printing the actual
commands to standard output). Especially in the tarball download phase,
there are many lines that are printed for each download which was really
annoying. We already used '@' in 'high-level.mk' and 'python.mk' before,
but now that we also know that 'basic.mk' is called with our custom GNU
Make, we can use it at the start for a cleaner stdout.
- Until now, WCSLIB assumed a Fortran compiler, but when the user is on a
system where we can't install GCC (or has activated the '--host-cc'
option), it may not be present and the project shouldn't break because
of this. So with this commit, when a Fortran compiler isn't present,
WCSLIB will be built with the '--disable-fortran' configuration option.
This commit (task #15667) was completed with help/checks by Raul
Infante-Sainz and Boud Roukema.
|