| Age | Commit message (Collapse) | Author | Lines | 
|---|
|  | Until now, we were asking the users of Maneage to cite the first paper that
used its primoridal version (arXiv:1505:01664). But there is now a paper
that fully describes the concept (arXiv:2006.03018).
With this commit, in the 'citation' section of 'README-hacking.md' we now
ask to cite the new paper. | 
|  | Following Boud's great suggestion, I also summarized my CV to be less
than 40 words. | 
|  | Following Boud's great suggestion, I also summarized my CV to be less than
40 words. | 
|  | This commit provides shorter CVs for me (Boud) + David
in order to get closer to the 6500 word limit. Our CVs
are the least significant part of the paper. | 
|  | This commit makes the numbered links to references such as [13]
[14] [15] in the appendices clickable in the pdf. The solution
was to call the "\newcites" command from the "multilibs" package
*after* loading "hyperref".
First do "rm -fv .build/tex/build/*.bbl .build/tex/build/*.aux"
and then "./project make" a few times. | 
|  | This commit fixes the error of trying to run bibtex on
appendix.tex when the --no-appendix option is selected.
A hardwired hack, appropriate only for this specific paper,
replaces the more-than-three-author parts of two long author
lists by "et al."  To test this without having to redownload
the menke file, first do
"rm -fv .build/tex/build/*.aux .build/tex/build/*.bbl"
and then "./project make --no-appendix" a few times.
This commit should reduce the word length by about 70 words. | 
|  | The only issue that still remains is how to address statistical
reproducibility, and I am in touch with Boud to do this in the best way
possible (it has been highlighted with '#####'s in the answers. | 
|  | There is an answer for all the referee points now. I also did some minor
edits in the paper. But we are still over the limit by around 250 words.
The only remaining point that is not yet addressed (and has '####' around
it) is the discussion on parallelization and its effect on reproducibility. | 
|  | This commit is intended to be submittable quality.
Point 56 was removed, and the later points renumbered,
because it was a point of Reviewer 5 described what we
have done - it was not a criticism to respond do. :)
The current word count (without abstract and references)
is 6091. | 
|  | This commit only modifies "peer-review/1-answer.txt", giving
answers to Reviewer 4; these mostly take into account David's
email list of proposed answers. No changes are done to
"paper.tex". | 
|  | Copyediting of points 16 to 32 (paper.tex +
peer-review/1-answer.txt) is done in this commit.
TODO list:
2. paper lacking focus
9. tidy up README-hacking.md for appearance on website
App B.G. similar to Figure ?? - ref missing
29. website: README-hacking.md and tutorial "on same page" | 
|  | This commit updates "paper.tex" and "peer-review/1-answer.txt"
for the first 15 (out of 59!) reviewer points, excluding
points 2 (not yet done) and 9 (README-hacking.md needs
tidying).
A fix to "reproduce/analysis/make/paper.mk" for the
links in the appendices is also done in this commit (the same
algorithm as for paper.tex is added). The links in the appendices
are not (yet) clickable. | 
|  | This commit tidies up minor aspects of the language in the text
marked by "\new", e.g. a "wokflow" would be fine for Chinese
cooking, but is a little off-topic for Maneage. :) The word count
is reduced by about 7 words.
I haven't yet got to the serious part: checked that we've responded
to the referees' points, and completing the responses which we
haven't yet done. | 
|  | This commit does a minor copyedit of "peer-review/1-answer.txt",
mostly just at the top, plus some hashes to highlight an
unanswered concern; and removes the @ symbols (and full stops)
from email addresses in the peer review email in order to reduce
our feeding of email harvesters (spiders that collect email addresses
for spammers). | 
|  | Raul's added point on the answer to the referee was very good, so I edited
it a little to be more clear (and removed his name).
Also, after looking in a few parts of the text, I fixed a few typos. | 
|  | With this commit, I make several minor changes to the text of the final
paper. They are not important, but minor modifications like avoiding
contractions (don't -> do not, and so on). | 
|  | With this commit, I am just adding several minor corrections to the
answer to the referees. They are very minor typos. I would only
emphasize the fact that in Maneage there is the "Minimal complexity"
criteria, and because of that, even if the project is not able to be
executed in the future, the interested reader could have a look at the
analysis steps (because it is in plain text). Note that I put "Raul" at
the beginning of the line, so my name should have to be removed in the
final document to be sent to the referees. | 
|  | A new directory has been added at the top of the project's source called
'peer-review'. The raw reviews of the paper by the editors and referees has
been added there as '1-review.txt'. All the main points raised by the
referees have been listed in a numbered list and addressed (mostly) in
'1-answers.txt'. The text of the paper now also includes all the
implemented answers to the various points. | 
|  | Until now, the core Maneage 'paper.tex' had a '\highlightchanges' macro
that defines two LaTeX macros: '\new' and '\tonote'.
When '\highlightchanges' was defined, anything that was written within
'\new' became dark green (highlighting new things that have been
added). Also, anything that was written in '\tonote' was put within a '[]'
and became dark red (to show that there is a note here that should be
addressed later).
When '\highlightchanges' wasn't defined, anything within the '\new' element
would be black (like the rest of the text), and the things in '\tonote'
would not be shown at all.
Commenting the '\newcommand{\highlightchanges}{}' line within 'paper.tex'
(to toggle the modes above) would create a different Git hash and has to be
committed.
But this different commit hash could create a false sense in the reader
that other things have also been changed and the only way they could
confirm was to actually go and look into the project history (which they
will not usually have time to do, and thus won't be able to trust the two
modes of the text).
Also, the added highlights and the note highlights were bundeled together
into one macro, so you couldn't only have one of them.
With this commit, the choice of highlighting either one of the two is now
done as two new run-time options to the './project' script (which are
passed to the Makefiles, and written into the 'project.tex' file which is
loaded into 'paper.tex'). In this way, we can generate two PDFs with the
same Git commit (project's state): one with the selected highlights and
another one without it.
This issue actually came up for me while implementing the changes here: we
need to submit one PDF to the journal/referees with highlights on the added
features. But we also need to submit another PDF to arXiv and Zenodo
without any highlights. If the PDFs have different commit hashes, the
referees may associate it with other changes in any part of the work. For
example https://oadoi.org/10.22541/au.159724632.29528907 that mentions
"Another version of the manuscript was published on arXiv: 2006.03018",
while the only difference was a few words in the abstract after the journal
complained on the abstract word-count of our first submission (where the
commit hashes matched with arXiv/Zenodo). | 
|  | With the optional appendices added recently to the paper, it was important
to go through them and make them more fitting into the paper. | 
|  | Until now, when the 'pdf-build-final' configuration variable (defined in
'reproduce/analysis/config/pdf-build.conf') was given any string a PDF
would be built. This was very confusing, because people could put a 'no'
and the PDF would still be built!
With this commit, only when this variable has a value of 'yes' will the PDF
be built. If given any other string (or no string at all), it will not
produce a PDF.
This issue was reported by Zahra Sharbaf. | 
|  | Until now we had described the basic commands on how to create and use
Docker images, but we hadn't mentioned how you can delete them.
With this commit the commands necessary for deleting Docker images have
also been added at the bottom of the section on Docker. | 
|  | Given the referee reports, after discussing with the editors of CiSE, we
decided that it is important to include the complete appendix we had before
that included a thorough review of existing tools and methods. However, the
appendix will not be published in the paper (due to the strict word-count
limit). It will only be used in the arXiv/Zenodo versions of the paper.
This actually created a technical problem: we want the commit hash of the
project source to remain the same when the paper is built with an appendix
or without it.
To fix this problem the choice of including an appendix has gone into the
'project' script as a run-time option called '--no-appendix'. So by default
(when someone just runs './project make'), the PDF will have an appendix,
but when we want to submit to the journal, or when the appendix isn't
needed for a certain reason, we can use this new option. The appendix also
has its own separate bibliography.
Some other corrections made in this commit:
 1. Some new references were added that had an '_' in their source, they
    were corrected in 'references.tex'.
 2. I noticed that 'preamble-style.tex' is not actually used in this paper,
    so it has been deleted. | 
|  | The LaTeX macro files for these two subMakefiles are created on every run
of './project make'. So their commands are also printed every time and
hardly ever will a normal user want to modify or change these.
So to avoid populating the standard output of a Maneaged project with all
these extra lines every time (possibly getting mixed with the important
analysis or LaTeX outputs), an '@' has been placed at the start of the
recipes. With an '@' at the start of the recipe, Make is instructed to not
print the commands it wants to run in the standard output. | 
|  | This commit updates README-hacking.md with the URIs for the 'elaphrocentre'
galaxy formation pipeline paper arXiv:2010.03742. This makes three papers
currently in the peer review pipeline: arXiv:2006.03018, arXiv:2007.11779,
and arXiv:2010.03742, each chronologically corresponding to various stages
of the review process. | 
|  | After a fresh build of Maneage with a newly downloaded TeXLive, I noticed
that it is complaining about not finding 'xstring.sty', apparently some
package that depeneded on it is no longer including it itself!
It is thus now added to the packages that are built by Maneage's TeXLive. | 
|  | Until now, the core Maneage branch included some configuration files for
Gnuastro's programs. This was actually a remnant of the distant past when
Maneage didn't actually build its own software and we had to rely on the
host's software versions. This file contained the configuration files
specific to Gnuastro for this project and also had a feature to avoid
checking the host's own configuration files.
However, we now build all our software ourselves with fixed configuration
files (for the version that is being installed and its version is
stored). So those extra configuration files were just extra and caused
confusion and problems in some scenarios. With this commit, those extra
files are now removed.
Also, two small issues are also addressed in parallel with this commit:
 - When running './project make clean', the 'hardware-parameters.tex' macro
   file (which is created by './project configure' is not deleted.
 - The project title is now written into the default output's PDF's
   properties (through 'hypersetup' in 'tex/src/preamble-header.tex')
   through the LaTeX macro.
All these issues were found and fixed with the help of Samane Raji. | 
|  | Until now, during the configure step it was checked if the host Operative
System were GNU/Linux, and if not, we assumed it is macOS.  However, it can
be any other different OS! With this commit, now we explicity check if the
system is GNU/Linux or Darwin (macOS). If it is not any of them, a warning
message says to the user that the host system is different from which we
have checked so far (and invite to contact us if there is any problem).
In addition to this, if the system is macOS, now it checks if Xcode is
already installed in the host system. If it is not installed, a warning
message informs the user to do that in case a problem/crash in the
configure step occurs. We have found that it is convenient to have Xcode
installed in order to avoid some problems. | 
|  | Before this commit, there were no arguments regarding machine related
specifications in the manuscript. This was needed as Mohammad Akhlaghi came
across a review of the artcile by Dylan Aïssi in which Dylan mentioned the
need for discussing CPU architecture dependence in pursuing a long-trem
archivable workflow.
With this commit, the required argument has been added in Sec.IV POC:
Maneage in the paragraph in which it is explained how 'macro files build
the core skeleton of Maneage'. Furthermore, few typos in different places
have been fixed and the 'pre-make-build.sh' has been updated with the
latest fix in Maneage core project. | 
|  | There weren't any conflicts in this merge. | 
|  | Tcl/Tk are a set of tools to provide Graphic User Interface (GUI) support
in some software. But they are not yet natively built within Maneage,
primarily because we have higher-priority work right now. GUI tools in
general aren't high on our priority list right now because GUI tools are
generally good for human interaction (which is contrary to the reproducible
philosophy), not automatic analysis (a core concept in reproducibility). So
even later, when we do include Tcl/Tk in Maneage, their direct usage will
be discouraged.
Until this commit, because we don't yet build Tcl/Tk, the default maneage
install of the statistical package R failed on a Debian Stretch, with 6227
repeats of the line:
'/usr/lib//tcl8.5/tclConfig.sh: line 2: dpkg-architecture:
command not found'
To fix this problem (atleast until Tcl/Tk is installed within Maneage), R
is now configured with the '--without-tcltk' option which fixed the
problem. Please see the description above the R installation instructions
in 'reproduce/software/make/high-level.mk' for more. | 
|  | Following the previous commit, we recognized that the 'IFS' terms are not
necessary and can be even cause problems. So all their occurances in the
scripts of Maneage have been removed with this commit. | 
|  | Until a recent commit, the IFS='"' was added at the start of the variables
in this shell script and as a result, the SPACE character wasn't being used
as a delimiter. This caused a major problem when downloading the tarballs
(all the backup servers were considered as the top link).
With this commit we removed these 'IFS' statements). Because we now check
for the existance of meta-characters in the build directory name, there is
no more problem, and also generally both the calling command and
internally, we have double-qutations around the variable names. So removal
of IFS will not affect the result in this scenario.
This bug was found by Mohammadreza Khellat. | 
|  | This paper is generally about data analysis pipelines, so the abstract now
starts with "Analysis pipelines" instead of "Reproducible workflows". I
also noticed that the sentence was mistakenly broken into multiple lines. | 
|  | Only two small conflicts came up:
 * The addition of the hardware architecture macro in 'paper.tex' (which
   was removed for now, but will be added as the referee has requested
   within the text).
 * The usage of "" around directory variables in 'paper.mk'. | 
|  | I saw this link today in the news (to be implemented from November 1st,
2020), and because it is directly related to this work, I added it. Many
people assume that simply pushing a Docker image to DockerHub is enough to
preserve it, but ignore how much it costs to maintain the storage and
network capacity. | 
|  | With the previous commit, we now build Nano by default within Maneage, and
project authors can ask to install Emacs and Vim within 'TARGETS.conf'. So
in the instructions to build within a Docker image have been removed. | 
|  | While a project is under development, the raw analysis software are not the
only necessary software in a project. We also need tools to all the edit
plain-text files within the Maneaged project. Usually people use their
operating system's plain-text editor. However, when working on the project
on a new computer, or in a container, the plain-text editors will have
different versions, or may not be present at all! This can be very annoying
and frustrating!
With this commit, Maneage now installs GNU Nano as part of the basic
tools. GNU Nano is a very simple and small plain text editor (the installed
size is only ~3.5MB, and it is friendly to new users). Therefore, any
Maneaged project can assume atleast Nano will be present (in particular
when no editor is available on the running system!). GNU Emacs and VIM
(both without extra dependencies, in particular without GUI support) are
also optionally available in 'high-level.mk' (by adding them to
'TARGETS.conf').
The basic idea for the more advanced editors (Emacs and VIM) is that
project authors can add their favorite editor while they are working on the
project, but upon publication they can remove them from 'TARGETS.conf'.
A few other minor things came up during this work and are now also fixed:
 - The 'file' program and its libraries like 'libmagic' were linking to
   system's 'libseccomp'! This dependency then leaked into Nano (which
   depends on 'libmagic'). But this is just an extra feature of 'file',
   only for the Linux kernel. Also, we have no dependency on it so far. So
   'file' is not configured to not build with 'libseccomp'.
 - A typo was fixed in the line where the physical core information is
   being read on macOS.
 - The top-level directories when running './project shell' are now quoted
   (in case they have special characters). | 
|  | Until now, no machine-related specifications were being documented in the
workflow. This information can become helpful when observing differences in
the outcome of both software and analysis segments of the workflow by
others (some software may behave differently based on host machine).
With this commit, the host machine's 'hardware class' and 'byte-order' are
collected and now available as LaTeX macros for the authors to use in the
paper. Currently it is placed in the acknowledgments, right after
mentioning the Maneage commit.
Furthermore, the project and configuration scripts are now capable of
dealing with input directory names that have SPACE (and other special
characters) by putting them inside double-quotes. However, having spaces
and metacharacters in the address of the build directory could cause
build/install failure for some software source files which are beyond the
control of Maneage. So we now check the user's given build directory
string, and if the string has any '@', '#', '$', '%', '^', '&', '*', '(',
')', '+', ';', and ' ' (SPACE), it will ask the user to provide a different
directory. | 
|  | When building Maneage inside a Docker container, in the end the users want
to extract the final outputs from the container into their host operating
system to inspect more comfortably. So with this commit, a short
examplanation has been added on how to do this.
We also noticed that it is much better if the 'Dockerfile' is stored and
run in an empty directory, otherwise, it will start parsing the full
directory and its subdirectories as the docker image's environment. | 
|  | Until now, the replicated plot had the width of the full page and the data
lineage graph was under it. Together they were covering more than half of
the height of the page! But the plot showing the number of papers with
tools really doesn't have too much detail, and all the space was being
wasted.
With this commit, the plot is now much much thinner and the data lineage
graph has been fitted to the right of it. | 
|  | Some very minor conflicts came up and were easily corrected. They were
mostly in parts that are also shared with the demonstration in the core
Maneage branch. | 
|  | The '.bbl' suffix in the comment of one call to LaTeX was incorrectly
written as '.bb'. | 
|  | Until now, './project --check-config' would only print the names of the
software that were being built. Besides that, it is also useful to know
which packages have most recently finished.
With this commit, we now print the last 5 built software packages with
'--check-config' also, and the output has also been placed in a row of '='s
to help separate it in each round. Also some more sanity checks have been
added so it doesn't print error messages. | 
|  | Until now, if the software source tarballs already existed on the system
they would be copied inside the project. However, the software source
tarballs are sometimes/mostly larger than their actual product and can
consume significant space (~375 MB in the core branch!).
With this commit, when the software are present on the system, their
symbolic link will be placed in 'BDIR/software/tarballs', not a full
copy. Also, because the tarballs in software tarball directory may
themselves be links, we use 'realpath' to find the final place of the
actual file and link to that location. Therefore if 'realpath' can't be
found (prior to installing Coreutils in Maneage), we will copy the tarballs
from the given software tarball directory. After Maneage has installed
Coreutils, the project's own 'realpath' will be used. Of course, if the
software are downloaded, their full downloaded copy will be kept in
'BDIR/software/tarballs', nothing has changed in the downloading scenario. | 
|  | It was a long time that the Maneage software versions hadn't been updated.
With this commit, the versions of all basic software were checked and 17 of
that had newer versions were updated. Also, 16 high-level programs and
libraries were updated as well as 7 Python modules. The full list is
available below.
Basic Software (affecting all projects)
---------------------------------------
bash            5.0.11 -> 5.0.18
binutils        2.32 -> 2.35
coreutils       8.31 -> 8.32
curl            7.65.3 -> 7.71.1
file            5.36 -> 5.39
gawk            5.0.1 -> 5.1.0
gcc             9.2.0 -> 10.2.0
gettext         0.20.2 -> 0.21
git             2.26.2 -> 2.28.0
gmp             6.1.2 -> 6.2.0
grep            3.3 -> 3.4
libbsd          0.9.1 -> 0.10.0
ncurses         6.1 -> 6.2
perl            5.30.0 -> 5.32.0
sed             4.7 -> 4.8
texinfo         6.6 -> 6.7
xz              5.2.4 -> 5.2.5
Custom programs/libraries
-------------------------
astrometrynet   0.77 -> 0.80
automake        0.16.1 -> 0.16.2
bison           3.6 -> 3.7
cfitsio         3.47 -> 3.48
cmake           3.17.0 -> 3.18.1
freetype        2.9 -> 2.10.2
gdb             8.3 -> 9.2
ghostscript     9.50 -> 9.52
gnuastro        0.11 -> 0.12
libgit2         0.28.2 -> 1.0.1
libidn          1.35 -> 1.36
openmpi         4.0.1 -> 4.0.4
R               3.6.2 -> 4.0.2
python          3.7.4 -> 3.8.5
wcslib          6.4 -> 7.3
yaml            0.2.2 -> 0.2.5
Python modules
--------------
cython          0.29.6 -> 0.29.21
h5py            2.9.0 -> 2.10.0
matplotlib      3.1.1 -> 3.3.0
mpi4py          3.0.2 -> 3.0.3
numpy           1.17.2 -> 1.19.1
pybind11        2.4.3 -> 2.5.0
scipy           1.3.1 -> 1.5.2 | 
|  | When the host C compiler is used (either by calling '--host-cc' or on OSs
that we can't build the GNU C Compiler), Maneage will also not build the
Fortran compiler 'gfortran'. Until now, the './project configure' script
would give a big warning about the need for 'gfortran' and the fact that it
is missing, and would for 5 seconds, but it would continue anyway.
For projects that don't need 'gfortran', this can be confusing to the users
and for those that need 'gfortran', it means that a lot of time and cpu
cycles are wasted compiling non-fortran software that are unusable in the
end.
With this commit, the 'need_gfortarn' variable has been added
'reproduce/software/shell/configure.sh', in a new part that is devoted to
project-specific features. If it equals '0', then the 'gfortran' test (and
message!) isn't done at all, but if it is set to '1', then the configure
stage will halt immediately gfortran is not found and not built.
The default operations of the core Maneage branch don't need 'gfortran', so
by default it is set to 0. But 'gfortran' is necessary for all projects
that use Numpy (Python's numeric library) for example. So if your project
needs 'gfortran', please set this new variable to 1. As mentioned in the
comments of 'configure.sh', ideally we should detect this automatically,
but we haven't had the time to implement it yet. | 
|  | One of the LaTeX macros reported by 'initialize.mk' is the git commit hash
of the most recent 'maneage' branch that the project has been branched
from. However, not all projects will retain the maneage reference. This can
happen for example when people don't push the 'maneage' reference to their
repository and then clone from their own repository to a second
computer. Therefore, until now, in such situations, Maneage would break
with an error.
With this commit, in such scenarios, a place holder string is used instead,
clearly highlighting that there is no 'maneage' reference. | 
|  | Prior to this commit, compilation of OpenMPI used the default OpenMPI
choices of deciding which libraries should be used in relating to a job
scheduler [1] (such as Slurm [2]). Given that the user on a multi-user
cluster has to accept the sysadmin's choice of a job scheduler, the
question of whether to (1) link with OpenMPI's own libraries (and increase
the reproducibility of the science project) or rather (2) link with the
sysadmin managed libraries (more likely to be compatible with the host's
job scheduler), is an open question of which the best strategy for
reproducibility needs to be debated and studied.
In this commit, strategy (1) is adopted. The options '--withpmix=internal'
and '--with-hwloc=internal' are added to the configure command. The working
assumption is that the Maneage version of OpenMPI is likely to be modern
enough to be compatible with the native job scheduler such as
Slurm. Compilation without any 'pmix' option gave a fail in at least one
case; it appears that an external pmix library was sought by the configure
script.
As of OpenMPI 4.0.1, the internal libevent library is used by default, so
there appears to be no option to force it to be chosen internally.
This commit also includes the option '--without-verbs'.  This option
removes a library related to "infiniband", "verbs", "openib" and "BTL";
this library appears to be deprecated. See [3], [4] for discussion.
Please add feedback and discussion to the Maneage task about openmpi
linking strategies (1) (internal) and (2) (external) at Savannah [5].
[1] https://en.wikipedia.org/wiki/Job_scheduler#Batch_queuing_for_HPC_clusters
[2] https://en.wikipedia.org/wiki/Slurm_Workload_Manager - To avoid a name
    clash, 'slurm-wlm' is the metapackage in Debian for the client
    commands, the compute node daemon, and the central node daemon. An
    unrelated package 'slurm' also exists.
[3] https://www-lb.open-mpi.org/faq/?category=openfabrics#ofa-device-error
[4] https://www-lb.open-mpi.org/faq/?category=building
[5] https://savannah.nongnu.org/task/index.php?15737 | 
|  | Roukema+2020 (arXiv:2007.11779) is a newly published (as preprint) paper
that uses Maneage, so it is being added to the list of published or
submitted papers in 'README-hacking.md'. The Software Heritage URL sticks
out way beyond the standard number of columns in the plain text form of the
updated 'README-hacking.md' file, when rendered using markdown, it
shouldn't look so bad.
Also, see the related task https://savannah.nongnu.org/task/index.php?15736
(Raul+2020 should be Infante-Sainz+2020) for a suggestion of a more
standard machine-readable format.
It should be mentioned and emphasised to the reader that one should very
carefully and obediently note and pay attention to the noteworthy fact that
a few distracting words [1] such as "Note that" are removed in this
commit. ;)
[1] https://en.wiktionary.org/wiki/pontification |