| Age | Commit message | Author | Lines | 
|---|
|  | There weren't any conflicts in this merge. | 
|  | Tcl/Tk are a set of tools to provide Graphical User Interface (GUI) support
in some software. But they are not yet natively built within Maneage,
primarily because we have higher-priority work right now. GUI tools in
general aren't high on our priority list right now because GUI tools are
generally good for human interaction (which is contrary to the reproducible
philosophy), not automatic analysis (a core concept in reproducibility). So
even later, when we do include Tcl/Tk in Maneage, their direct usage will
be discouraged.
Until this commit, because we don't yet build Tcl/Tk, the default Maneage
install of the statistical package R failed on Debian Stretch, with 6227
repeats of the line:
'/usr/lib//tcl8.5/tclConfig.sh: line 2: dpkg-architecture:
command not found'
To fix this problem (at least until Tcl/Tk is installed within Maneage), R
is now configured with the '--without-tcltk' option, which fixes the
problem. Please see the description above the R installation instructions
in 'reproduce/software/make/high-level.mk' for more. | 
|  | Following the previous commit, we recognized that the 'IFS' terms are not
necessary and can even cause problems. So all their occurrences in the
scripts of Maneage have been removed with this commit. | 
|  | Until a recent commit, IFS='"' was added at the start of the variables
in this shell script and as a result, the SPACE character wasn't being used
as a delimiter. This caused a major problem when downloading the tarballs
(all the backup servers were treated as part of the top link).
With this commit we removed these 'IFS' statements. Because we now check
for the existence of meta-characters in the build directory name, there is
no more problem. Also, both in the calling command and internally, we
generally have double quotations around the variable names. So the removal
of IFS will not affect the result in this scenario.
This bug was found by Mohammadreza Khellat. | 
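The effect described above can be reproduced with a minimal sketch. The server list and the 'count_fields' helper below are hypothetical (they are not taken from the Maneage scripts); they only show how the value of IFS changes field splitting of an unquoted variable.

```shell
#!/bin/sh
# Hypothetical server list; these are not real Maneage backup servers.
servers="https://a.example.org/src https://b.example.org/src"

# count_fields IFS_VALUE STRING
# Print how many fields STRING splits into when IFS is set to IFS_VALUE.
count_fields() {
    (
        IFS=$1
        set -f        # disable pathname expansion, keep only field splitting
        set -- $2     # deliberately unquoted: this is where IFS is applied
        echo "$#"
    )
}

count_fields ' ' "$servers"    # splitting on SPACE: prints 2
count_fields '"' "$servers"    # splitting only on double-quote: prints 1
```

With the double-quote as the only delimiter, the whole list stays a single field, which matches the symptom reported for the tarball downloads.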
|  | This paper is generally about data analysis pipelines, so the abstract now
starts with "Analysis pipelines" instead of "Reproducible workflows". I
also noticed that the sentence was mistakenly broken into multiple lines. | 
|  | Only two small conflicts came up:
 * The addition of the hardware architecture macro in 'paper.tex' (which
   was removed for now, but will be added as the referee has requested
   within the text).
 * The usage of "" around directory variables in 'paper.mk'. | 
|  | I saw this link today in the news (to be implemented from November 1st,
2020), and because it is directly related to this work, I added it. Many
people assume that simply pushing a Docker image to DockerHub is enough to
preserve it, but ignore how much it costs to maintain the storage and
network capacity. | 
|  | With the previous commit, we now build Nano by default within Maneage, and
project authors can ask to install Emacs and Vim within 'TARGETS.conf'. So
the editor-related instructions for building within a Docker image have been removed. | 
|  | While a project is under development, the raw analysis software is not the
only necessary software in a project. We also need tools to edit all the
plain-text files within the Maneaged project. Usually people use their
operating system's plain-text editor. However, when working on the project
on a new computer, or in a container, the plain-text editors will have
different versions, or may not be present at all! This can be very annoying
and frustrating!
With this commit, Maneage now installs GNU Nano as part of the basic
tools. GNU Nano is a very simple and small plain text editor (the installed
size is only ~3.5MB, and it is friendly to new users). Therefore, any
Maneaged project can assume at least Nano will be present (in particular
when no editor is available on the running system!). GNU Emacs and VIM
(both without extra dependencies, in particular without GUI support) are
also optionally available in 'high-level.mk' (by adding them to
'TARGETS.conf').
The basic idea for the more advanced editors (Emacs and VIM) is that
project authors can add their favorite editor while they are working on the
project, but upon publication they can remove them from 'TARGETS.conf'.
A few other minor things came up during this work and are now also fixed:
 - The 'file' program and its libraries like 'libmagic' were linking to the
   system's 'libseccomp'! This dependency then leaked into Nano (which
   depends on 'libmagic'). But this is just an extra feature of 'file',
   only for the Linux kernel. Also, we have no dependency on it so far. So
   'file' is now configured to not build with 'libseccomp'.
 - A typo was fixed in the line where the physical core information is
   being read on macOS.
 - The top-level directories when running './project shell' are now quoted
   (in case they have special characters). | 
|  | Until now, no machine-related specifications were being documented in the
workflow. This information can become helpful when observing differences in
the outcome of both software and analysis segments of the workflow by
others (some software may behave differently based on host machine).
With this commit, the host machine's 'hardware class' and 'byte-order' are
collected and now available as LaTeX macros for the authors to use in the
paper. Currently they are placed in the acknowledgments, right after
mentioning the Maneage commit.
Furthermore, the project and configuration scripts are now capable of
dealing with input directory names that have SPACE (and other special
characters) by putting them inside double-quotes. However, having spaces
and metacharacters in the address of the build directory could cause
build/install failure for some software source files which are beyond the
control of Maneage. So we now check the user's given build directory
string, and if the string has any of '@', '#', '$', '%', '^', '&', '*', '(',
')', '+', ';', or ' ' (SPACE), it will ask the user to provide a different
directory. | 
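A check like the one described above could be sketched as follows. The 'has_meta_chars' function and the 'meta_chars' variable are hypothetical names chosen for illustration, and the actual test in the configure script may be written differently; only the list of characters comes from the commit message.

```shell
#!/bin/sh
# Characters named in the commit message (the last one is SPACE).
meta_chars='@#$%^&*()+; '

# has_meta_chars DIRNAME
# Succeed (return 0) if DIRNAME contains any of the characters above.
has_meta_chars() {
    case "$1" in
        *[$meta_chars]*) return 0 ;;
        *)               return 1 ;;
    esac
}

# Example with two hypothetical build directories.
for dir in "/home/user/build" "/home/user/my build"; do
    if has_meta_chars "$dir"; then
        echo "'$dir': please provide a different build directory"
    else
        echo "'$dir': accepted"
    fi
done
```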
|  | When building Maneage inside a Docker container, in the end the users want
to extract the final outputs from the container into their host operating
system to inspect them more comfortably. So with this commit, a short
explanation has been added on how to do this.
We also noticed that it is much better if the 'Dockerfile' is stored and
run in an empty directory. Otherwise, it will start parsing the full
directory and its subdirectories as the docker image's environment. | 
|  | Until now, the replicated plot had the width of the full page and the data
lineage graph was under it. Together they were covering more than half of
the height of the page! But the plot showing the number of papers with
tools really doesn't have too much detail, and all the space was being
wasted.
With this commit, the plot is now much much thinner and the data lineage
graph has been fitted to the right of it. | 
|  | Some very minor conflicts came up and were easily corrected. They were
mostly in parts that are also shared with the demonstration in the core
Maneage branch. | 
|  | The '.bbl' suffix in the comment of one call to LaTeX was incorrectly
written as '.bb'. | 
|  | Until now, './project --check-config' would only print the names of the
software that were being built. Besides that, it is also useful to know
which packages have most recently finished.
With this commit, '--check-config' also prints the last 5 built software
packages, and the output has been placed in a row of '='s
to help separate it in each round. Also some more sanity checks have been
added so it doesn't print error messages. | 
|  | Until now, if the software source tarballs already existed on the system
they would be copied inside the project. However, the software source
tarballs are often larger than their actual product and can
consume significant space (~375 MB in the core branch!).
With this commit, when the software tarballs are present on the system, a
symbolic link to them will be placed in 'BDIR/software/tarballs', not a full
copy. Also, because the tarballs in the software tarball directory may
themselves be links, we use 'realpath' to find the final place of the
actual file and link to that location. However, if 'realpath' can't be
found (prior to installing Coreutils in Maneage), we will copy the tarballs
from the given software tarball directory. After Maneage has installed
Coreutils, the project's own 'realpath' will be used. Of course, if the
software tarballs are downloaded, their full downloaded copy will be kept in
'BDIR/software/tarballs'; nothing has changed in the downloading scenario. | 
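The link-or-copy logic described above can be sketched roughly as follows. The function name 'link_or_copy' and its arguments are hypothetical; this is only an illustration of the fallback, not the code used in Maneage.

```shell
#!/bin/sh
# link_or_copy SRC DESTDIR
# Put SRC into DESTDIR as a symbolic link to the real file when
# 'realpath' is available, otherwise fall back to a full copy.
link_or_copy() {
    src=$1
    destdir=$2
    if command -v realpath >/dev/null 2>&1; then
        # Resolve any chain of links first, then link to the final file.
        ln -sf "$(realpath "$src")" "$destdir/$(basename "$src")"
    else
        # No 'realpath' yet (for example before Coreutils is built).
        cp "$src" "$destdir/"
    fi
}

# Usage (hypothetical paths):
#   link_or_copy /path/to/tarballs/pkg-1.0.tar.gz "$BDIR/software/tarballs"
```

Linking to the resolved location means the new link keeps working even if the intermediate link in the user's tarball directory is later removed.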
|  | The Maneage software versions hadn't been updated for a long time.
With this commit, the versions of all basic software were checked and the 17
that had newer versions were updated. Also, 16 high-level programs and
libraries were updated as well as 7 Python modules. The full list is
available below.
Basic Software (affecting all projects)
---------------------------------------
bash            5.0.11 -> 5.0.18
binutils        2.32 -> 2.35
coreutils       8.31 -> 8.32
curl            7.65.3 -> 7.71.1
file            5.36 -> 5.39
gawk            5.0.1 -> 5.1.0
gcc             9.2.0 -> 10.2.0
gettext         0.20.2 -> 0.21
git             2.26.2 -> 2.28.0
gmp             6.1.2 -> 6.2.0
grep            3.3 -> 3.4
libbsd          0.9.1 -> 0.10.0
ncurses         6.1 -> 6.2
perl            5.30.0 -> 5.32.0
sed             4.7 -> 4.8
texinfo         6.6 -> 6.7
xz              5.2.4 -> 5.2.5
Custom programs/libraries
-------------------------
astrometrynet   0.77 -> 0.80
automake        1.16.1 -> 1.16.2
bison           3.6 -> 3.7
cfitsio         3.47 -> 3.48
cmake           3.17.0 -> 3.18.1
freetype        2.9 -> 2.10.2
gdb             8.3 -> 9.2
ghostscript     9.50 -> 9.52
gnuastro        0.11 -> 0.12
libgit2         0.28.2 -> 1.0.1
libidn          1.35 -> 1.36
openmpi         4.0.1 -> 4.0.4
R               3.6.2 -> 4.0.2
python          3.7.4 -> 3.8.5
wcslib          6.4 -> 7.3
yaml            0.2.2 -> 0.2.5
Python modules
--------------
cython          0.29.6 -> 0.29.21
h5py            2.9.0 -> 2.10.0
matplotlib      3.1.1 -> 3.3.0
mpi4py          3.0.2 -> 3.0.3
numpy           1.17.2 -> 1.19.1
pybind11        2.4.3 -> 2.5.0
scipy           1.3.1 -> 1.5.2 | 
|  | When the host C compiler is used (either by calling '--host-cc' or on OSs
where we can't build the GNU C Compiler), Maneage will also not build the
Fortran compiler 'gfortran'. Until now, the './project configure' script
would give a big warning about the need for 'gfortran' and the fact that it
is missing, and would pause for 5 seconds, but it would continue anyway.
For projects that don't need 'gfortran', this can be confusing to the users
and for those that need 'gfortran', it means that a lot of time and CPU
cycles are wasted compiling non-Fortran software that is unusable in the
end.
With this commit, the 'need_gfortran' variable has been added to
'reproduce/software/shell/configure.sh', in a new part that is devoted to
project-specific features. If it equals '0', then the 'gfortran' test (and
message!) isn't done at all, but if it is set to '1', then the configure
stage will halt immediately if 'gfortran' is not found and not built.
The default operations of the core Maneage branch don't need 'gfortran', so
by default it is set to 0. But 'gfortran' is necessary for all projects
that use Numpy (Python's numeric library) for example. So if your project
needs 'gfortran', please set this new variable to 1. As mentioned in the
comments of 'configure.sh', ideally we should detect this automatically,
but we haven't had the time to implement it yet. | 
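As a rough illustration of the behavior described above, the sketch below uses a variable spelled 'need_gfortran' and a hypothetical 'check_gfortran' function. It is simplified: it only looks for 'gfortran' on the PATH, while the real configure script also accounts for whether Maneage itself will build it.

```shell
#!/bin/sh
# Project-specific switch: 0 skips the test entirely, 1 makes it mandatory.
need_gfortran=0

# check_gfortran
# Return non-zero (so the caller can halt) only when the project needs
# 'gfortran' and it cannot be found.
check_gfortran() {
    if [ "$need_gfortran" = 0 ]; then
        return 0                      # no test and no message at all
    fi
    if command -v gfortran >/dev/null 2>&1; then
        return 0
    fi
    echo "ERROR: this project needs 'gfortran', but it was not found" >&2
    return 1
}

check_gfortran && echo "configuration can continue"
```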
|  | One of the LaTeX macros reported by 'initialize.mk' is the git commit hash
of the most recent 'maneage' branch that the project has been branched
from. However, not all projects will retain the maneage reference. This can
happen for example when people don't push the 'maneage' reference to their
repository and then clone from their own repository to a second
computer. Therefore, until now, in such situations, Maneage would break
with an error.
With this commit, in such scenarios, a placeholder string is used instead,
clearly highlighting that there is no 'maneage' reference. | 
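The fallback can be sketched as below. Both the 'maneage_commit' function and the 'NO-MANEAGE-REFERENCE' string are hypothetical, and using 'git merge-base' to find the branching point is an assumption; the recipe in 'initialize.mk' may compute the hash differently.

```shell
#!/bin/sh
# maneage_commit
# Print the commit that the current branch shares with the 'maneage'
# branch, or a placeholder when no 'maneage' reference exists.
maneage_commit() {
    if git rev-parse --verify --quiet maneage >/dev/null 2>&1; then
        git merge-base HEAD maneage
    else
        # Hypothetical placeholder text, chosen for this sketch only.
        echo "NO-MANEAGE-REFERENCE"
    fi
}

echo "Maneage commit: $(maneage_commit)"
```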
|  | Prior to this commit, compilation of OpenMPI used the default OpenMPI
choices for deciding which libraries should be used in relation to a job
scheduler [1] (such as Slurm [2]). Given that the user on a multi-user
cluster has to accept the sysadmin's choice of a job scheduler, the
question of whether to (1) link with OpenMPI's own libraries (and increase
the reproducibility of the science project) or rather (2) link with the
sysadmin managed libraries (more likely to be compatible with the host's
job scheduler), is an open question of which the best strategy for
reproducibility needs to be debated and studied.
In this commit, strategy (1) is adopted. The options '--with-pmix=internal'
and '--with-hwloc=internal' are added to the configure command. The working
assumption is that the Maneage version of OpenMPI is likely to be modern
enough to be compatible with the native job scheduler such as
Slurm. Compilation without any 'pmix' option gave a fail in at least one
case; it appears that an external pmix library was sought by the configure
script.
As of OpenMPI 4.0.1, the internal libevent library is used by default, so
there appears to be no option to force it to be chosen internally.
This commit also includes the option '--without-verbs'.  This option
removes a library related to "infiniband", "verbs", "openib" and "BTL";
this library appears to be deprecated. See [3], [4] for discussion.
Please add feedback and discussion to the Maneage task about openmpi
linking strategies (1) (internal) and (2) (external) at Savannah [5].
[1] https://en.wikipedia.org/wiki/Job_scheduler#Batch_queuing_for_HPC_clusters
[2] https://en.wikipedia.org/wiki/Slurm_Workload_Manager - To avoid a name
    clash, 'slurm-wlm' is the metapackage in Debian for the client
    commands, the compute node daemon, and the central node daemon. An
    unrelated package 'slurm' also exists.
[3] https://www-lb.open-mpi.org/faq/?category=openfabrics#ofa-device-error
[4] https://www-lb.open-mpi.org/faq/?category=building
[5] https://savannah.nongnu.org/task/index.php?15737 | 
|  | Roukema+2020 (arXiv:2007.11779) is a newly published (as preprint) paper
that uses Maneage, so it is being added to the list of published or
submitted papers in 'README-hacking.md'. The Software Heritage URL sticks
out way beyond the standard number of columns in the plain text form of the
updated 'README-hacking.md' file, but when rendered using markdown, it
shouldn't look so bad.
Also, see the related task https://savannah.nongnu.org/task/index.php?15736
(Raul+2020 should be Infante-Sainz+2020) for a suggestion of a more
standard machine-readable format.
It should be mentioned and emphasised to the reader that one should very
carefully and obediently note and pay attention to the noteworthy fact that
a few distracting words [1] such as "Note that" are removed in this
commit. ;)
[1] https://en.wiktionary.org/wiki/pontification | 
|  | There are many different directory trees involved in the Maneage system: the
top directory, the 'reproduce/' directory and its sub-directories,
'.build/' (that points to a user-defined build area), and a possibly
user-defined input directory. Until now, in the case of a download checksum
failure, it was not immediately obvious [1] to the user *where* the file
with a failed checksum is.
To clarify to the user *where* the suspicious file is now located, this
commit adds a line to 'reproduce/analysis/make/download.mk' to print out
this full path location: '$$unchecked' along with the expected and
calculated checksums.
[1] Euphemism for me spending lots of time debugging and being confused. | 
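A message of this kind could look like the sketch below. The 'verify_checksum' function is hypothetical and the use of 'sha512sum' is an assumption made for the example; the actual recipe in 'download.mk' is a Make rule and is not reproduced here.

```shell
#!/bin/sh
# verify_checksum FILE EXPECTED
# Compare the SHA-512 checksum of FILE with EXPECTED. On a mismatch,
# report where the file is, together with both checksums.
verify_checksum() {
    file=$1
    expected=$2
    calculated=$(sha512sum "$file" | awk '{print $1}')
    if [ "$calculated" != "$expected" ]; then
        echo "ERROR: checksum mismatch" >&2
        echo "  File:       $file" >&2
        echo "  Expected:   $expected" >&2
        echo "  Calculated: $calculated" >&2
        return 1
    fi
}

# Usage (hypothetical file and checksum):
#   verify_checksum "$BDIR/inputs/data.fits.unchecked" "ab12..."
```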
|  | This commit clarifies the initial usage of Zenodo for reserving a Zenodo
identifier and starting an 'unpublished' upload. Some other minor wording
changes are done here. | 
|  | Until this commit, the '$(project-package-contents)' rules in
'reproduce/analysis/make/initialize.mk' included a line to provide all
contents, recursively, of the directory 'reproduce/' in the package for
further distribution.
This could potentially lead to the distribution of private working files
that are used during development and not intended for general distribution.
With this commit, only those files in 'reproduce/' and 'tex/src' that are
under version control are copied to the temporary directory (that is later
used for creating an archive). With this change, the archiving commands
actually became more clean (we don't have to manually remove 'LOCAL.conf'
or other temporary files). Extensive comments have also been added above
each step to clarify each step's purpose and method. | 
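The idea of copying only version-controlled files can be sketched with 'git ls-files'. The 'copy_tracked' function below is a hypothetical helper, not the recipe in 'initialize.mk'; it assumes it is run from the top of the Git repository and that file names contain no newlines or other characters that Git would quote.

```shell
#!/bin/sh
# copy_tracked DESTDIR PATH...
# Copy only the files under PATH... that Git tracks into DESTDIR,
# keeping their relative directory structure.
copy_tracked() {
    dest=$1
    shift
    git ls-files -- "$@" | while IFS= read -r f; do
        mkdir -p "$dest/$(dirname "$f")"
        cp "$f" "$dest/$f"
    done
}

# Usage (hypothetical temporary directory):
#   copy_tracked /tmp/project-dist reproduce tex/src
```

Untracked files such as 'LOCAL.conf' never appear in the output of 'git ls-files', so they don't need to be removed by hand afterwards.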
|  | Docker is a "container" technology that allows an almost independent
operating system to run on the host. It is useful when the host OS doesn't
support some features or has internal problems (for example its C library
or C compiler have problems). Fortunately a Maneaged project can easily be
built within a Docker image and a minimal image operating system.
With this commit, a section has been added to 'README.md' to describe this
process. Each step of the Dockerfile is explained, to help users that may
not be too familiar with Docker, or help Docker users who are not familiar
with Maneage. | 
|  | Until now, if a project needed the healpy software package, Maneage would
crash with the following error message (abridged for full name in build
directory). This was caused by a typo in the version of 'healpix' (the
dependency of 'healpy').
  make: *** No rule to make target '.../version-info/proglib/healpix-'
With this commit, the typo in line 334 of 'python.mk' is fixed, so that
when '$(ipydir)/healpy-$(healpy-version)' gets called it correctly searches
for a rule to make '$(ibidir)/healpix-$(healpix-version)'. | 
|  | Until now the './project make dist' command implicitly assumed that the
'tex/tikz' directory always contains PDF files (because of the 'cp
tex/tikz/*.pdf $$dir/tex/tikz' line). This was annoying for projects that
don't use TiKZ or PGFPlots to generate their plots, and they had to
manually comment out this line.
With this commit a check has been placed to see if any PDF files exist in
there at all. If there aren't any PDF files, the 'cp' command above is skipped. | 
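One way to write such a check in a shell is sketched below. The 'copy_pdfs' function is hypothetical; the real check lives in a Make recipe and may use a different test.

```shell
#!/bin/sh
# copy_pdfs SRCDIR DESTDIR
# Copy the PDF files of SRCDIR into DESTDIR, doing nothing (and not
# failing) when SRCDIR has no PDF files.
copy_pdfs() {
    src=$1
    dest=$2
    # If nothing matches, the pattern is kept as a literal string that
    # does not exist as a file, so the '-e' test below fails.
    set -- "$src"/*.pdf
    if [ -e "$1" ]; then
        cp "$@" "$dest/"
    fi
}

# Usage (hypothetical distribution directory):
#   copy_pdfs tex/tikz "$dir/tex/tikz"
```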
|  | In the previous commit (Commit 1bc00c9: Only using clang in macOS systems
that also have GCC) we set the used C compiler for high-level programs to
be 'clang' on macOS systems. But I forgot to do the same kind of change in
the configure script (to prefer 'clang' when we are testing for a C
compiler on the host).
With this commit, the compiler checking phases of the configure script have
been improved, so on macOS systems, we now first search for 'clang', then
search for 'gcc'.
While doing this, I also noticed that the 'rpath' checking command was done
before we actually define 'instdir'!!! So in effect, the 'rpath' directory
was being set to '/lib'! So with this commit, this test has been taken to
after defining 'instdir'. | 
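The search order described above can be illustrated with the following sketch. The 'find_c_compiler' function is hypothetical, and the order used for non-macOS systems ('gcc' first) is an assumption that is not stated in the commit message.

```shell
#!/bin/sh
# find_c_compiler OSNAME
# Print the name of the first C compiler found on the PATH. OSNAME is
# the kernel name as printed by 'uname -s' ('Darwin' on macOS).
find_c_compiler() {
    case "$1" in
        Darwin) candidates="clang gcc" ;;   # macOS: 'clang' first
        *)      candidates="gcc clang" ;;   # assumed order elsewhere
    esac
    for cc in $candidates; do
        if command -v "$cc" >/dev/null 2>&1; then
            echo "$cc"
            return 0
        fi
    done
    return 1
}

find_c_compiler "$(uname -s)" || echo "no C compiler found on this host"
```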
|  | Until now, when the bibliography file ('paper.bbl') had a LaTeX-related
error (for example the journal name was a LaTeX macro that isn't defined),
the first 'pdflatex' command that is run before 'biber' would crash, not
allowing the project to reach 'biber'. So the user would have to manually
remove 'paper.bbl' before running './project make'.
With this commit, we remove any possibly existing 'paper.bbl' file before
rebuilding it. Generally, this also helps in keeping things clean during
the generation of the new bibliography.
This bug was found by Mahdieh Nabavi. | 
|  | Until now, when Maneage was built on a macOS that had both clang and GCC,
we would make links to both. But this caused many conflicts in some
high-level programs (for example Numpy, and all the programs where we
have explicitly set 'export CC=clang' before the build recipe). This happens
because the GCC that is built on a macOS isn't complete for some
operations.
To fix this problem, when we are on a macOS, we explicitly set 'gcc' to
point to 'clang' and 'g++' to point to 'clang++'. We also don't link to the
host's C-preprocessor ('cpp') on macOS systems because this is only a GNU
feature and using the GNU CPP is also known to have some basic
problems. For example this was reported by Mahdieh Nabavi (which was the
main trigger for this work):
  ld: Symbol not found: ___keymgr_global
    Referenced from: /Users/Mahdieh/build/software/installed/bin/cpp
    Expected in: /usr/lib/libSystem.B.dylib
Also, to avoid linking to another link on the host tools (in the 'makelink'
function of 'basic.mk'), we are now using 'realpath'. | 
|  | To help in the documentation, the Git hash of the Maneage branch commit
that the project has most recently merged with (or branched from) is now
also provided as a LaTeX macro ('\maneageversion').
It is calculated in 'reproduce/analysis/make/initialize.mk' (in the recipe
to 'initialize.tex'). | 
|  | Until now, the dataset's configuration names had a 'WFPC2' prefix. But this
is very alien to anyone that is not familiar with the history of the Hubble
Space Telescope (the camera is no longer used! It's just used here since it's
one of the standard FITS files from the FITS standard webpage).
With this commit the variable names have been modified to be more readable
and clear (having a 'DEMO-' prefix). Also the comments of 'INPUTS.conf'
(describing the purpose of each variable) were edited and made more clear. | 
|  | Until now, the 'shell' mode of the './project' script was missing in the
top output of './project --help' and in the error message printed when no
operation was given, or when more than one operation was given.
This is now corrected. | 
|  | In 'README.md' I tried to explain a little better that TeXLive will only
install its necessary packages, not the full TeXLive library! Also in
paper.mk, I slightly improved the comments with very minor edits.
Both these parts are slated to go into the core Maneage branch, so it's
important to maintain them here for now. | 
|  | In the previous commit, the modified abstract of the acknowledgments only
included the URL of Maneage, but it's more formal to cite the Maneage paper;
the URL is already present in the paper. | 
|  | Until now, the acknowledgment section didn't contain the new name of
Maneage and it also included an acknowledgment of Gnuastro (which is not
appropriate for a general project which may not use Gnuastro).
With this commit this is fixed. | 
|  | The explanation was made more clear. | 
|  | Until now, when reading the host's PATH environment variable we weren't
accounting for directory names with a space character. This was most
prominently visible in the 'low-level-links' step where we put links to
some core system components into the project's build directory (mainly for
proprietary systems like macOS).
To address the problem, double quotations have been placed around the part
where we extract 'ccache' from the PATH, and the part where we make the
symbolic link. In the process the comments above 'makelink' were made more
clear and 'low-level-links' now depends on 'grep' (which is the
highest-level program it uses).
This bug was reported by Mahdieh Nabavi. | 
|  | This was pointed out by Mervyn O'Luing. | 
|  | Until this commit, once Libidn was installed, instead of its own name and
version, the name and version of Libjpeg were saved (in the target of
Libidn). This probably came from a copy/paste of the rule.
With this commit, this minor bug has been corrected. I also added my name
as an author of the 'reproduce/software/make/xorg.mk' Makefile since I added
some code there. | 
|  | The explanations are now more clear for someone that is less familiar with
Docker. | 
|  | There weren't any conflicts in this merge. | 
|  | After recently adding util-linux to the Maneage build-tree, we had forgotten to
delete the unpacked and built source directory after it was installed! This
has been corrected with this commit. | 
|  | Until now, when the user specified an input and software directory, the raw
string they entered was used. But when this string was a relative location,
this could be problematic in general scenarios.
With this commit, the same function that finds the absolute location of the
build directory is used to find the absolute address of the data and
software directories. | 
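A function of this kind can be sketched in a few lines. The name 'absolute_dir' and its error message are hypothetical; the function actually used in the configure script is not shown in this commit message.

```shell
#!/bin/sh
# absolute_dir DIR
# Print the absolute location of DIR, which may be given relative to
# the current directory. Fail when DIR is not a directory.
absolute_dir() {
    if [ -d "$1" ]; then
        # The subshell keeps the caller's working directory unchanged.
        (cd "$1" && pwd)
    else
        echo "absolute_dir: '$1' is not a directory" >&2
        return 1
    fi
}

absolute_dir .    # prints the current directory as an absolute path
```

Storing the absolute form means later steps that run from a different working directory still find the same input and software directories.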
|  | Until now, when the user wanted to completely remove all built files
(including software), the './project make distclean' command would fail if
the git hooks weren't installed. They are present when the project's
configuration has been successfully finished, but this bug can happen when
trying to re-do an incomplete build.
With this commit, this is fixed by adding the '-f' option to the
'rm' command for the Git hooks.
This commit was also done in the core Maneage branch. | 
|  | Until now, when the user wanted to completely remove all built files
(including software), the './project make distclean' command would fail if
the git hooks weren't installed. They are present when the project's
configuration has been successfully finished, but this bug can happen when
trying to re-do an incomplete build.
With this commit, this is fixed by adding the '-f' option to the
'rm' command for the Git hooks. | 
|  | With the new features in Maneage to install the necessary Xorg libraries,
the explanations of the Docker image creation also needed to be updated. | 
|  | There weren't any conflicts in this merge. | 
|  | Mervyn had read the paper and provided some interesting thoughts that I
tried to implement. Mervyn's comments are shown below. I just haven't
addressed the last point yet, because I am afraid it may make the text too
long (we are already on the boundary of the word-limit). We have already
discussed that it is a good research topic, and have hopefully triggered
the curiosity of the readers to test it ;-).
-------------------
Page 2: Regarding Criterion 1: Completeness.  A project must be self
contained? So this includes not requiring root or administrator
privileges. This suggests that the project is only made open after the
development has been completed?
Regarding Criterion 5: 'a clerk can do it' -- in the pc world that we live
in could this be taken as a disparaging comment?
Page 5: 'The C library is linked with all programs, and this dependence can
hypothetically hinder exact reproducibility of results, but we have not
encountered this so far.' - what do you think might happen if this does
affect reproducibility? Do you have a plan to deal with this? Or are you
going to wait until you hear of such cases as the number will probably be
small? Have you done probability analysis to show that the rates are likely
to be very small? Or should you have a disclaimer with maneage? | 
|  | Until now, in order to build Ghostscript, the project used the host's Xorg
libraries. This was because we hadn't yet added the necessary build rules
for them.
With this commit, the instructions to build the necessary Xorg libraries
for Ghostscript have also been added. Also, the shared Ghostscript library
has been built with this commit and two sets of standard fonts are also
included, setting us on the path to build TeXLive from source later.
This task was done with the help and support of Raul Infante-Sainz. |