| Age | Commit message | Author | Lines | 
|---|
|  | The project configuration requires a build-directory at configuration time;
two other directories can optionally be given to avoid downloading the
project's necessary data and software. It is possible to give these three
directories as command-line options, or by interactively giving them after
running the configure script.
Until now, when these directories weren't given as command-line options,
and the running shell was non-interactive, the configure script would crash
on the line trying to interactively read the user's given directories (the
'read' command).
With this commit, all the 'read' commands for these three directories are
now put within an 'if' statement. Therefore, when 'read' fails (the shell
is non-interactive), instead of a quiet crash, a descriptive message is
printed, telling the user the cause of the problem, and suggesting a fix.
This bug was found by Michael R. Crusoe. | 
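The guarded prompt described in this commit can be sketched as follows. This is a minimal illustration, not the actual code of the configure script: the function name `ask_dir` and the wording of the messages are hypothetical.

```shell
# Hypothetical helper: prompt for a directory and fail with a clear
# message when nothing can be read (for example when standard input is
# not a terminal and has reached end-of-file).
ask_dir() {
    # The prompt goes to standard error, so standard output only
    # carries the answer.
    printf '%s: ' "$1" >&2
    if read -r answer; then
        printf '%s\n' "$answer"
    else
        echo "ERROR: could not read '$1' interactively." >&2
        echo "Please give it as a command-line option instead." >&2
        return 1
    fi
}
```

A caller could then stop cleanly instead of crashing quietly, for example with `builddir=$(ask_dir "Build directory") || exit 1`.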
|  | Until now, the description of the input-data directory at configure time
included a description of the input data (created by reading the values of
'INPUTS.conf'). Maintaining this is easy for a single dataset, but it
becomes hard for a general project which may need many input datasets.
To avoid extra complexity (for maintaining this list), the description now
points a user of the project to the 'INPUTS.conf' file and asks them to
look inside it to see the necessary data. This in fact helps users become
familiar with the internal structure of Maneage and frees the authors from
having to update the low-level 'configure.sh' script. | 
|  | When './project configure' is run, after the basic checks of the compiler,
a small statement is printed telling the user that some configuration
questions will now be asked to start building Maneage on the system. Until
now this description was confusing: it led the reader to think that the
local configuration (which was recommended to read before continuing) is in
another file.
With this commit, the text has been edited to explicitly mention that the
description of the steps following this notice should be read carefully,
thus avoiding that confusion.
This issue was mentioned by Michael R. Crusoe. | 
|  | The default 'paper.tex' starts by defining some macros and comments
describing them. Until now, the text was not very clear and could be
confusing for someone who is not at all familiar with Maneage.
With this commit, the comments have been edited to be clearer for a
first-time reader. For example, they all start with FULL CAPS
summaries.
Two other small things were corrected in 'tex/src/preamble-necessary.tex':
 - Until now 'project.tex' was included in this preamble. However, because
   of its importance in Maneage, and prominent place in the demonstration
   plot of the paper introducing Maneage, it is now included directly in
   'paper.tex'. This also allows users to safely ignore/delete this
   preamble file if their LaTeX style is different.
 - I noticed that some macros for some astronomical software names from the
   very first commits in Maneage were still present here! They are no
   longer used, so they have been removed. | 
|  | Until now, we were saying "POSIX is defined by the IEEE", but in issue #12,
Michael Crusoe pointed out that this is not accurate. It is actually
jointly developed and operated by the IEEE, The Open Group and ISO/IEC JTC
1/SC 22, which together form the Austin Group.
So the sentence was modified to say that the IEEE (potential publisher of
this paper) is part of the Austin Group that develops the POSIX standard.
Thanks a lot for bringing this up Michael. | 
|  | Until now, we were using three EPS files (created from SVG) that were downloaded
from https://www.flaticon.com. Therefore it was necessary to acknowledge
the creators and put a link to the webpage. This consumed space in the
caption and decreased the originality of the plot.
Another problem was that the "collaboration" icon (with three people in it)
had arrows, and some of those arrows pointed downwards, creating ambiguity
in relation to the upward arrows under the commits.
With this commit, three alternative icons are added that I made from
scratch, using Inkscape. The collaboration icon now is two figures and two
speech-bubbles, without any arrows. | 
|  | Since recently, Maneage by default no longer takes the title directly in
the PDF: the title should be given in the 'metadata.conf' file and it is
passed onto LaTeX as a variable. So the comment to "add project title" in
the listing could be confusing. To avoid confusion, I edited it to "Set
your name as author". The comments above the '\title' part are very
complete and users will clearly be able to modify the title if they want.
Also, we had an extra ')' in the line just under it which is now corrected. | 
|  | As described in Maneage's commit 2bd2e2f18 (which I found while testing
this project), the existing download recipe had problems when using a local
copy of the input dataset. It was first fixed here, then implemented there.
Also, to clarify things for a new user, some long comments were added at
the top of 'INPUTS.conf' to describe each of the variables; that comment
has also been put here (and is also in commit 2bd2e2f18 of Maneage). | 
|  | The text of the default paper hadn't been changed for a very long time! In
this time, three papers using Maneage have been published (which can be
very good as examples), and Maneage now also has a webpage!
With this commit, these examples and the webpage have been added, and the
text was also generally polished a little to hopefully be more useful. | 
|  | Summary of possible semantic conflicts
 1. The recipe to download input datasets has been modified. You have to
    re-set the old 'origname' variable to 'localname' (to avoid confusion)
    and the default dataset URL should now be complete (including the
    actual filename). See the newly added descriptions in 'INPUTS.conf' for
    more on this.
Until now, when the dataset was already present on the host system, a link
couldn't be made to it, causing the project to crash in the checksum
phase. This has been fixed by properly naming the main variable as
'localname' to avoid the confusion that caused it.
Some other problems have been fixed in this recipe in the meantime:
 - When the checksum is different, the expected and calculated checksums
   are printed.
 - In the default paper, we now print the full URL of the dataset, not just
   the server, so the checksum of the 'download.tex' step has been updated. | 
|  | Two words that made sentences grammatically wrong were corrected in the
text (they were actually typos! historically they were correct, but we
later changed the latter part of the sentence without fixing the first
part). | 
|  | Until now, in the 'print-copyright' function of 'initialize.mk' (that
prints a fixed set of common metadata necessary in plain-text files), we were
simply printing this line:
  # Pre-print server: arXiv:1234.56789
But given that all the other elements are click-able URLs, it now prints:
  # Pre-print server: https://arxiv.org/abs/1234.56789 | 
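The conversion from identifier to URL can be sketched in shell as below. The actual change lives in the 'print-copyright' Make function of 'initialize.mk'; the helper name `arxiv_url` is hypothetical and only illustrates the idea.

```shell
# Hypothetical helper: turn an 'arXiv:NNNN.NNNNN' identifier into a URL.
arxiv_url() {
    # '${1#arXiv:}' strips a leading 'arXiv:' prefix when present.
    printf 'https://arxiv.org/abs/%s\n' "${1#arXiv:}"
}

# Print the header line in the new, click-able form.
printf '# Pre-print server: %s\n' "$(arxiv_url arXiv:1234.56789)"
```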
|  | There were two small warnings that are removed with this commit:
 - In the end, when we print the number of words in the PDF, we hadn't
   accounted for the fact that 'paper.pdf' doesn't always exist (for
   example when './project make clean' is run). So a check was added to
   only print the number of words when a PDF exists.
 - I noticed that the '$(texdir)/to-publish' directory was being built both
   in 'initialize.mk' and in 'demo-plot.mk'. So the one in 'demo-plot.mk'
   has been removed. | 
|  | Some minor conflicts came up in 'initialize.mk' and 'verify.mk'. For the
former, I chose the version on Maneage, for the latter, I kept the 'master'
version on the checksums of this project, but kept the Maneage version for
the rest of the improvements there (like printing the verified files as
LaTeX comments in 'verify.tex').
While testing the conflicts, I noticed a bug (in the LaTeX macro for the
number of years in the Menke+20 paper) in the previous build, thanks to the
verification step :-)! Fortunately it wasn't actually printed in the PDF,
so a normal reader won't have noticed it.
The bug was caused by the recently added meta-data/commented lines in the
'tools-per-year.txt' file: when calculating the number of years studied in
that paper, we were simply counting all the lines and we had forgotten to
correct this after adding comments. As a result, the un-used LaTeX macro
file was saying that they had studied 47 years instead of the real 31
years! This element was actually used in the very first (+40 page!) draft
of the paper that was summarized to fit into the journal limits. | 
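The difference between counting all lines and counting only data rows can be sketched as follows. This is an illustration, not the project's actual rule: the function name `count_data_rows` is hypothetical, and comment lines are assumed to start with '#'.

```shell
# Count the rows of a plain-text table, skipping comment lines (assumed
# here to start with '#') and empty lines.
count_data_rows() {
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

In contrast, `wc -l` on the same file also counts the metadata lines, which is the kind of over-count described in this commit.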
|  | The git history of the project is now archived on SoftwareHeritage and a
link to it was added in the "Reproducible supplement" tag just under the
abstract.
Some corrections were also made in the text. In particular, the part
explaining the separation of software and data reproducibility was slightly
clarified. | 
|  | Possible semantic conflicts (that may not show up as Git conflicts but may
cause a crash in your project after the merge):
   1) The project title (and other basic metadata) should be set in
      'reproduce/analysis/conf/metadata.conf'. Please include this file in
      your merge (if it is ignored because of '.gitattributes'!).
   2) Consider importing the changes in 'initialize.mk' and 'verify.mk' (if
      you have added all analysis Makefiles to the '.gitattributes' file,
      thus not merging any change in them with your branch). For example
      with this command:
        git diff master...maneage -- reproduce/analysis/make/initialize.mk
   3) The old 'verify-txt-no-comments-leading-space' function has been
      replaced by 'verify-txt-no-comments-no-space'. The new function will
      also remove all white-space characters between the columns (not just
      white space characters at the start of the line). Thus the resulting
      check won't involve spacing between columns.
A common set of steps is always necessary to prepare a project for
publication. Until now, we would simply look at previous submissions and
try to follow them, but that was prone to errors and could cause
confusion. The internal infrastructure also didn't have some useful
features to make good publication possible. Now that the submission of a
paper fully devoted to the founding criteria of Maneage is complete
(arXiv:2006.03018), it was time to formalize the necessary steps for easier
submission of a project using Maneage and implement some low-level features
that can make things easier.
With this commit a first draft of the publication checklist has been added
to 'README-hacking.md', it was tested in the submission of arXiv:2006.03018
and zenodo.3872248. To help guide users on implementing the good practices
for output datasets, the outputs of the default project shown in the paper
now use the new features. After reading the checklist, please inspect
these.
Some other relevant changes in this commit:
  - The publication involves a copy of the necessary software
    tarballs. Hence a new target ('dist-software') was also added to
    package all the project's software tarballs in one tarball for easy
    distribution.
  - A new 'dist-lzip' target has been defined for those who want to
    distribute an Lzip-compressed tarball.
  - The '\includetikz' LaTeX macro now has a second argument to allow
    configuring the '\includegraphics' call when the plot should not be
    built, but just imported. | 
|  | Upon submission to CiSE we were informed that the abstract has to be less
than 150 words to be processed. So with this commit, I am shrinking the
abstract slightly, trying to remove some points that are less important and
trying to shrink some of the sentences.
Also, to avoid confusion and be more clear, the term "temporal provenance"
has been replaced by "Recorded history". | 
|  | Until now, when the figures were built directly from EPS
('\newcommand{\makepdf}{}' was commented), they would take the full
line-width becoming a little too large! I noticed this after letting arXiv
build the PDF.
With this commit, the 'includetikz' tool takes a second argument to be a
parameter given to 'includegraphics' (which is scale in this case). | 
|  | Everything else regarding the submission to arXiv and Zenodo has been
completed, so I did a final read, making some minor edits to hopefully make
the text easier to read. | 
|  | The previous explanation was not too clear and simply following it was
confusing. The issue was that with the tarball you have three scenarios: 1)
only build the PDF using existing figures, 2) only build the PDF, but build
the figures yourself, 3) build the full Maneaged project.
Hopefully this distinction is now clearer from the README.md file. | 
|  | Some extra explanation can help the user understand the difference between
a Git-based project and a distributed tarball. | 
|  | When the project is being re-built from the tarball (not the Git
repository), the 'tex/build' and 'tex/tikz' addresses are actual
directories, not symbolic links. In this case, when someone runs './project
configure', it will complain about not being able to delete them (it
assumes they are symbolic links!).
So with this commit, we first check if they are deletable without '-r'. If
not, then they are full directories and we rename them to a backup directory
to allow the rest of the project to continue building a link there. | 
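The check in this commit relies on 'rm -f' (without '-r') removing a symbolic link but failing on a real directory. A minimal sketch of that logic is below; the function name `free_link_slot` and the '-from-tarball' backup suffix are hypothetical, not necessarily what the configure script uses.

```shell
# Hypothetical helper: make sure the given path is free for a new
# symbolic link. 'rm -f' (without '-r') removes a link or a file, but
# fails on a directory; in that case the directory is kept under a
# backup name (the '-from-tarball' suffix is an assumption).
free_link_slot() {
    if rm -f "$1" 2>/dev/null; then
        return 0
    else
        mv "$1" "$1-from-tarball"
    fi
}
```

Calling it on both 'tex/build' and 'tex/tikz' before creating the links would leave any directories shipped in the tarball intact under their backup names.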
|  | This paper doesn't use pdflatex or biblatex, so it was necessary to make
some small corrections in the make-dist rule of initialize.mk. Also, while
testing the upload on arXiv, I noticed that it complains about an empty
'verify.tex' file, so that is also corrected. | 
|  | All the steps of the to-be-added publication checklist (in
'README-hacking.md') prior to the final check from a new clone have been
completed:
 - 'README.md' file has been set.
 - "Reproducible supplement" was added just above the keywords, pointing to
   Zenodo.
 - A link to the to-be-uploaded data underlying the plot was added in the
   caption of the tools-per-year plot.
 - A new meta-data configuration file was added to store basic project
   metadata to be used throughout the project. This will later be taken
   into Maneage. For example the project title is now stored here and
   written into the paper's LaTeX source and output datasets automatically.
 - Verification was activated and plot's data and LaTeX macro files are now
   automatically verified.
 - Complete metadata was added for the data underlying the plot.
 - A generic function was added in 'initialize.mk' that will automatically
   write project info and copyright in all plain-text outputs. | 
|  | The recently added description for this step in the last commit needed some
edits to be more clear and encourage re-building the project from scratch
anytime authors merge with Maneage. | 
|  | The minor conflict was with 'reproduce/software/make/high-level.mk', and in
particular because we implemented the fix to Maneage's Task #15664 in this
project first. After it was moved to the main Maneage branch some minor
stylistic corrections were done to it, thus causing the conflict. To
resolve the conflict, I simply imported the full Maneage version of the
file with this command:
  git checkout maneage -- reproduce/software/make/high-level.mk
The other conflicts were due to the deleted files (that were resolved as
described in 'README-hacking.md') and the LaTeX files that I had told
'.gitattributes' to ignore from the Maneage branch. | 
|  | When some files should not be merged, until now we were suggesting to also
add deleted files to the '.gitattributes' file. However, this feature of
Git doesn't work for deleted files and they would still show up in the
'master' branch after a merge.
So with this commit, we have added a simple AWK command to run after a
merge that will automatically detect and delete such files (using the
output of 'git status --porcelain').
Also, two minor typos were corrected in the newly added
'servers-backup.conf' file: the copyright year was wrong and there was no
new-line at the end of the file (a good convention!). | 
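The idea of detecting such files from 'git status --porcelain' can be sketched as below. This is not the exact command added to 'README-hacking.md': the function name `list_deleted_by_us` is hypothetical, and it assumes the files of interest are those that Git marks as 'DU' (unmerged, deleted by us) after the merge.

```shell
# Hypothetical helper: read 'git status --porcelain' output on standard
# input and print the paths marked 'DU' (unmerged, deleted by us), one
# per line. Paths containing spaces are not handled in this sketch.
list_deleted_by_us() {
    awk '$1 == "DU" { print $2 }'
}
```

After a merge, the printed names could be passed to 'git rm', for example with `git status --porcelain | list_deleted_by_us | while read -r f; do git rm "$f"; done`.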
|  | Following a test merge, I noticed that the '.gitattributes' file is not
doing anything about the deleted files and also that all the files in
'tex/src/*.txt' should be added (they are too project-specific). So now it
only includes the files that aren't deleted.
For the files that are deleted, in the Maneage 'README-hacking.md' file, I
added an AWK command to easily remove them. | 
|  | I noticed that we hadn't included the publication of the workflow and the
advantage that Maneage provides in this regard. So it was added at the end
of the proof-of-concept section. However, it was necessary to summarize
some other parts to not increase the wordcount. | 
|  | Until now, Maneage would only build Flock before building everything else
using Make (calling 'basic.mk') in parallel. Flock was necessary to avoid
parallel downloads during the building of software (which could cause
network problems). But after recently trying Maneage on FreeBSD (which is
not yet complete, see bug #58465), we noticed that the BSD implementation of
Make couldn't parse 'basic.mk' (in particular, complaining about the 'ifeq'
parts) and its shell also had some peculiarities.
It was thus decided to also install our own minimalist shell, Make and
compressor program before calling 'basic.mk'. In this way, 'basic.mk' can
now assume the same GNU Make features that high-level.mk and python.mk
assume. The pre-make building of software is now organized in
'reproduce/software/shell/pre-make-build.sh'.
Another nice feature of this commit is for macOS users: until now the
default macOS Make had problems for parallel building of software, so
'basic.mk' was built in one thread. But now that we can build the core
tools with GNU Make on macOS too, it uses all threads. Furthermore, since
we now run 'basic.mk' with GNU Make, we can use '.ONESHELL' and don't have
to finish every line of a long rule with a backslash to keep variables and
such.
Generally, the pre-make software is now organized like this: first we
build Lzip before anything else: it is downloaded as a simple '.tar' file
that is not compressed (only ~400kb). Once Lzip is built, the pre-make
phase continues with building GNU Make, Dash (a minimalist shell) and
Flock. All of their tarballs are in '.tar.lz'. Maneage then enters
'basic.mk' and the first program it builds is GNU Gzip (itself packaged as
'.tar.lz'). Once Gzip is built, we build all the other compression software
(all downloaded as '.tar.gz'). Afterwards, any compression standard for
other software is fine because we have it.
In the process, a bug related to using backup servers was found in
'reproduce/analysis/bash/download-multi-try' when calling it outside of
'basic.mk'; it was fixed and the Bash-specific features of the script were
removed. As a result of that bug-fix,
because we now have multiple servers for software tarballs, the backup
servers now have their own configuration file in
'reproduce/software/config/servers-backup.conf'. This makes it much easier
to maintain the backup server list across the multiple places that we need
it.
Some other minor fixes:
 - In building Bzip2, we need to specify 'CC' so it doesn't use 'gcc'.
 - In building Zip, the 'generic_gcc' Make option caused a crash on FreeBSD
   (which doesn't have GCC).
 - We are now using 'uname -s' to check whether we are on a Linux kernel
   or not; if not, we are still using the old 'on_mac_os' variable.
 - While I was trying to build on FreeBSD, I noticed some further
   corrections that could help. For example the 'makelink' Make-function
   now takes a third argument which can be a different name compared to the
   actual program (used for example to make a link to '/usr/bin/cc' from
   'gcc').
 - Until now we didn't know if the host's Make implementation supports
   placing a '@' at the start of the recipe (to avoid printing the actual
   commands to standard output). Especially in the tarball download phase,
   there are many lines that are printed for each download which was really
   annoying. We already used '@' in 'high-level.mk' and 'python.mk' before,
   but now that we also know that 'basic.mk' is called with our custom GNU
   Make, we can use it at the start for a cleaner stdout.
 - Until now, WCSLIB assumed a Fortran compiler, but when the user is on a
   system where we can't install GCC (or has activated the '--host-cc'
   option), it may not be present and the project shouldn't break because
   of this. So with this commit, when a Fortran compiler isn't present,
   WCSLIB will be built with the '--disable-fortran' configuration option.
This commit (task #15667) was completed with help/checks by Raul
Infante-Sainz and Boud Roukema. | 
|  | These are some corrections that David sent to me by email and I am
committing here. | 
|  | Antonio Diaz Diaz (author of the Lzip program/library), has had a very
supportive role in what became Maneage in the last 4 years. For example I
really started to appreciate the value of simplicity and archivability
while reading Lzip's documentation.
Fortunately he also read a recent version of the paper and was again very
supportive. Some of the minor points he raised had already been fixed, but
using 'supplier' instead of 'server' (in the Free Software criterion) was
new so I implemented it here with this commit. With this, I am also
thanking him for all his wonderful support and encouragement in the last 4
years. | 
|  | Boud's point about a "random reader" not being a good example case was
correct. But "user" also gives it a software perspective that is of course
not wrong, it can just be confusing. So I thought of changing it to
"interested reader".
In the part about the C-library dependency of high-level software, from
Boud's correction, I found out that it is very hard to convey what I wanted
to say (that separating errors due to C-library implementation and
measurement errors will be easy, because they should be on much different
scales). But I then corrected it to give it a slightly better tone while
mentioning the same thing: that with Maneage we can now accurately measure
the effect of the C library. | 
|  | Changes with this commit are mostly minor and obvious. Some worth
commenting on include:
* `technologies develop very fast` - As a general statement, this
is too jargony, since technology is much wider than just
`software`; `some technologies` makes it clear that we're referring
to the specific case of the previous sentence
* `in a functional-like paradigm, enabling exact provenance` -
While `make` is not an imperative programming language, I don't
see how `make` is `like` a functional programming language.
Classifying it as a declarative and a dataflow programming
language and as a metaprogramming language would seem to go in
the right direction [1-3]. I also couldn't see how the language
type relates to tracking exact provenance.
But since we don't want to lengthen the text, my proposal is to
put `and efficient in managing exact provenance` without trying
to explain this in terms of a taxonomy of programming languages.
[1] https://en.wikipedia.org/wiki/Functional_programming
[2] https://en.wikipedia.org/wiki/Comparison_of_multi-paradigm_programming_languages
[3] https://en.wikipedia.org/wiki/Dataflow_programming
* `A random reader` - In the scientific programming context, `random`
has quite specific meanings which we are not using here; a `reader`
has not necessarily tried to reproduce the project. So I've proposed
`A user` here - with the idea that a `user` is more likely to be someone
who has done `./project configure && ./project make`.
* `studying this is another research project` - the present tense `is`
doesn't sound so good; I've put what seems to be about the shortest
natural equivalent.
Pdf word count: 5856 | 
|  | An "internally" was added to the part about core GNU tools accounting for
the differences between POSIX-compatible systems. One extra word was also
removed in the next sentence. | 
|  | Hopefully, it is more to the point with these few word-corrections. | 
|  | Konrad raised some very interesting points in particular about the
limitations of POSIX as a fuzzy standard that does not guarantee
reproducibility. A relatively long paragraph was thus added in the
discussion to address this important point.
In order to fit it in, the paragraph on "unwanted competition" was removed
since the POSIX issue was much more relevant for a curious
reader. Throughout the text, some other parts were edited to decrease the
length of the paper while making it easier to read. | 
|  | Some of the redundant sentences have been removed and some minor edits
made. | 
|  | The changes in this commit are best shown with `git diff --word-diff`
or `git show --word-diff`. There are about half a dozen changes
of 1-2 words or a comma; the reasons should be obvious.
The sentence with "can not just" seems to be correct formally, but
"can not only" seems to me better to warn the reader that this
is a phrase of the form "can not only do X but can also do Y";
"can not just" sounds a bit like "You cannot just enter the room
without knocking" - it doesn't require a second part. | 
|  | One major point was that following Konrad's suggestion the issue of not
being familiar with the Lisp/Scheme framework of GWL is now removed. We
actually mention the main problem we have had with Guix, but also highlight
that their solution was one of the main inspirations for this work. | 
|  | Until this commit, when the user had a previous TeX tarball already
present, the project crashed when trying to re-configure if there was a
newer version of TeX. This is because TeX is updated yearly. With this
commit, this bug has been fixed. Now, during the installation of TeX, it
checks if this problem happens. If this is the case, then it moves the
old tarball, downloads the new one and installs it. If not, it will just
install the already present tarball or crash because of any other
reason. This problem was recurrent, and each time TeX was updated, the
previous tarball had to be removed manually. But now, with this commit,
it is done automatically. The detection and fix of this bug has been
possible with the help of Mohammad Akhlaghi, thanks! | 
|  | Until this commit, there was only a small description of me. With this
commit, I have added a small paragraph with my biography. I know we are
very restricted because of the word limit so I tried to be very short! | 
|  | With this commit, I have corrected several minor typos. | 
|  | With this commit, I made some minor changes in these Sections. The main
changes are: define the contraction `OS' from Operating System and use
only `OS' later on, and not use contractions like `isn't'. | 
|  | Before this commit: Roberto's bio was about 120 words. With this
commit: it is now less than 100 words. A comment about
reproducibility has been added. | 
|  | Publishing a paper on reproducible research without making it easy for
readers to read the references would defeat the point. Of course we have to
make some compromises with some journals' reluctance to shift towards the
free world, but to satisfy scientific ethics, we should at least provide
clickable URLs to the references, preferably to the ArXiv version if
available [1], and also to the DOI, again, preferably to an open-access
version of the URL if available.
I was not able to fully get this done in the .bst file, so there's an
sed/tr hack done to the .bbl file in `reproduce/analysis/make/paper.mk` to
tidy up commas and spaces.
This commit also reverts some of the hacks in the Akhlaghi IAU Symposium
`tex/src/references.tex` entry, to match the improved .bst file,
`tex/src/IEEEtran_openaccess.bst`, provided here with a different name to
the original, in order to satisfy the LaTeX licence.
[1] https://cosmo.torun.pl/blog/arXiv_refs | 
|  | David and Raul had both reported that because 'pdftotext' wasn't available
on their system, the project failed (even though the PDF was built!). So
with this commit, we first check if the system has 'pdftotext' and call it
only if it is available.
Some minor edits were made, building upon Boud's previous commit. | 
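The availability check described in this commit can be sketched as follows. It is an illustration under assumptions, not the project's actual Make recipe: the function name `count_pdf_words` is hypothetical, and the extra test for the PDF's existence is added here only as a further guard.

```shell
# Hypothetical helper: print the number of words in a PDF, or print
# nothing when the PDF does not exist or 'pdftotext' is not installed.
count_pdf_words() {
    if [ -f "$1" ] && command -v pdftotext >/dev/null 2>&1; then
        # 'pdftotext FILE -' writes the extracted text to standard output.
        pdftotext "$1" - | wc -w
    fi
}
```

With this guard, a missing 'pdftotext' only skips the word count and no longer makes the whole build fail after the PDF has been created.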
|  | This commit provides mostly small changes. There didn't seem much point in
repeating the `lessons learned` jargon and claiming that we draw good
conclusions - insights - from our experience.  Better just state what
hypotheses we have generated from the experience rather than give the
misleading impression that our hypotheses are well-established facts. In
the comments, I put a suggested translation of what the `lessons learned`
jargon means. I seem to have first heard this term in the mainstream media
a few years after the US 2003 attack on Iraq, when a US military
representative stated that the US forces had "learned lessons" after having
started a war of aggression against Iraq. | 
|  | This commit changes two lines.
 (1) Keeping the exact quote with the clerk while having a sentence that
     makes sense in plain English cannot be done, it seems to me, without
     making the sentence a bit longer. Here's one option that seems about
     the best we can do, even though it still sounds a bit funny, because
     it's hard to write a future conditional with the present "can". Since
     it's a quote, it will probably survive the proofreaders.
 (2) Software is an uncountable noun [1], so we say "software is", like
     "water is"; "used software" sounds odd; I added "is itself" to
     emphasise that we're especially talking about the full chain of
     software for running the project. This commit modifies the "When the
     ..." sentence and hopefully sounds better.
[1] https://en.wiktionary.org/wiki/software#Noun | 
|  | To help show the simplicity of 'top-make.mk', it was included as a
listing. I also went over some of Boud's corrections and made small
edits. In particular:
 - The '\label' and '\ref' to a section were removed. I did this after
   inspecting some of their recent papers and noticing that they generally
   have a simple flow, without such redirections.
 - In the part about the RDA adoption grant, I moved the "from the
   researcher perspective" to the end, because Austin+2017 is mainly
   focused on data-center management, not the researcher's perspective.
   They do touch upon researcher solutions that can help data-base
   managers, but not directly the researchers. In effect with this grant,
   they acknowledged that our researcher-focused solution conforms with
   their criteria for data-base management. |