|
Until now, we were using three EPS icons (created from SVG) that were downloaded
from https://www.flaticon.com. Therefore it was necessary to acknowledge
the creators and put a link to the webpage. This consumed space in the
caption and decreased the originality of the plot.
Another problem was that the "collaboration" icon (with three people in it)
had arrows, and some of those arrows pointed downwards, creating ambiguity
with the upward arrows under the commits.
With this commit, three alternative icons that I made from scratch using
Inkscape have been added. The collaboration icon is now two figures and two
speech bubbles, without any arrows.
|
|
Recently, Maneage stopped setting the title directly in the paper's LaTeX
source by default: the title should be given in the 'metadata.conf' file,
and it is passed to LaTeX as a variable. So the comment "add project title"
in the listing could be confusing. To avoid confusion, I changed it to "Set
your name as author". The comments above the '\title' part are very
complete, so users will easily be able to modify the title if they want.
Also, there was an extra ')' in the line just under it, which is now corrected.
|
|
The text of the default paper hadn't been changed for a very long time! In
the meantime, three papers using Maneage have been published (which are
very good examples), and Maneage now also has a webpage!
With this commit, these examples and the webpage have been added, and the
text was also generally polished a little to hopefully be more useful.
|
|
Two words that made their sentences grammatically wrong were corrected in
the text (they were effectively typos: historically they were correct, but
we later changed the latter part of each sentence without fixing the first
part).
|
|
The git history of the project is now archived on Software Heritage, and a
link to it was added in the "Reproducible supplement" tag just under the
abstract.
Some corrections were also made in the text. In particular, the part
explaining the separation of software and data reproducibility was slightly
clarified.
|
|
Possible semantic conflicts (that may not show up as Git conflicts but may
cause a crash in your project after the merge):
1) The project title (and other basic metadata) should be set in
'reproduce/analysis/config/metadata.conf'. Please include this file in
your merge (if it is ignored because of '.gitattributes'!).
2) If you have added all analysis Makefiles to the '.gitattributes' file
(thus not merging any change in them into your branch), consider importing
the changes in 'initialize.mk' and 'verify.mk', for example with this
command:
git diff master...maneage -- reproduce/analysis/make/initialize.mk
3) The old 'verify-txt-no-comments-leading-space' function has been
replaced by 'verify-txt-no-comments-no-space'. The new function also
removes all white-space characters between the columns (not just the
white-space characters at the start of the line). Thus the resulting
check won't depend on the spacing between columns.
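To make the behaviour described in item 3 concrete, here is a minimal, hypothetical POSIX shell sketch of a comment- and space-insensitive checksum. It is not Maneage's actual Make function from 'verify.mk': the function name 'txt_checksum_no_comments_no_space' is made up for illustration, and it uses the POSIX 'cksum' for portability instead of whatever hash the real implementation uses.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the 'verify-txt-no-comments-no-space' function
# from Maneage's 'verify.mk': it only illustrates a checksum that ignores
# comments and every white-space character.
txt_checksum_no_comments_no_space() {
    # 1) Remove everything from a '#' to the end of the line (comments).
    # 2) Remove ALL white-space: leading AND between columns.
    # 3) Delete lines that became empty.
    # 4) Print only the CRC field of POSIX 'cksum' as the checksum.
    sed -e 's/#.*$//' \
        -e 's/[[:space:]]//g' \
        -e '/^$/d' "$1" | cksum | awk '{print $1}'
}
```

With this approach, a table written as '1   2' and one written as '1 2' give the same checksum, while changing any value changes it. A side effect worth keeping in mind is that '1 2' and '12' are also treated as identical.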
A common set of steps is always necessary to prepare a project for
publication. Until now, we would simply look at previous submissions and
try to follow them, but that was prone to errors and could cause
confusion. The internal infrastructure also lacked some features that
make a good publication possible. Now that the submission of a paper fully
devoted to the founding criteria of Maneage is complete (arXiv:2006.03018),
it was time to formalize the necessary steps for easier submission of a
project using Maneage, and to implement some low-level features that can
make things easier.
With this commit, a first draft of the publication checklist has been added
to 'README-hacking.md'; it was tested in the submission of arXiv:2006.03018
and zenodo.3872248. To help guide users in implementing good practices
for output datasets, the outputs of the default project (shown in the
paper) now use the new features. After reading the checklist, please
inspect these.
Some other relevant changes in this commit:
- The publication includes a copy of the necessary software
tarballs. Hence a new target ('dist-software') was also added to
package all the project's software tarballs in one tarball for easy
distribution.
- A new 'dist-lzip' target has been defined for those who want to
distribute an Lzip-compressed tarball.
- The '\includetikz' LaTeX macro now has a second argument to allow
configuring the '\includegraphics' call when the plot should not be
built, but just imported.
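At its core, a 'dist-software' rule boils down to a few shell commands. The following is only a hypothetical sketch: the function name 'dist_software', its two arguments and the single top-level directory inside the tarball are assumptions for illustration, not Maneage's actual Makefile recipe.

```shell
#!/bin/sh
# Hypothetical sketch of what a 'dist-software' rule might run: gather
# every software tarball used by the project into ONE tarball.
# Arguments (assumed): $1 = directory holding the software tarballs,
#                      $2 = base name of the output (no suffix).
dist_software() {
    srcdir=$1
    name=$2
    workdir=$(mktemp -d) || return 1
    # Put all tarballs under a single top-level directory, so unpacking
    # the distributed tarball doesn't scatter files in the user's directory.
    mkdir "$workdir/$name" || return 1
    cp "$srcdir"/* "$workdir/$name"/ || return 1
    # The individual tarballs are already compressed, so a plain '.tar'
    # container is used here (re-compressing them would gain little).
    tar -cf "$name.tar" -C "$workdir" "$name" || return 1
    rm -rf "$workdir"
    printf '%s\n' "$name.tar"
}
```

For example, 'dist_software path/to/tarballs software-abc123' (both arguments hypothetical) would create 'software-abc123.tar' in the current directory. For the project's own source, which is not pre-compressed, a 'dist-lzip'-like rule could instead pass the tarball through 'lzip'.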
|
|
Upon submission to CiSE, we were informed that the abstract must be shorter
than 150 words to be processed. So with this commit, I am shortening the
abstract slightly, removing some less important points and shortening some
of the sentences.
Also, to avoid confusion and be clearer, the term "temporal provenance"
has been replaced by "Recorded history".
|
|
Until now, when the figures were built directly from EPS
('\newcommand{\makepdf}{}' was commented out), they would take the full
line width, becoming a little too large! I noticed this after letting arXiv
build the PDF.
With this commit, the '\includetikz' macro takes a second argument that is
passed to '\includegraphics' (in this case, the scale).
|
|
Everything else regarding the submission to arXiv and Zenodo is complete,
so I did a final read, making some minor edits to hopefully make the text
easier to read.
|
|
All the steps of the publication checklist (to be added in
'README-hacking.md') that come before the final check from a new clone have
been implemented:
- The 'README.md' file has been set.
- "Reproducible supplement" was added just above the keywords, pointing to
Zenodo.
- A link to the to-be-uploaded data underlying the plot was added in the
caption of the tools-per-year plot.
- A new metadata configuration file was added to store basic project
metadata to be used throughout the project. This will later be taken
into Maneage. For example, the project title is now stored here and
written into the paper's LaTeX source and output datasets automatically.
- Verification was activated, and the plot's data and LaTeX macro files are
now automatically verified.
- Complete metadata was added for the data underlying the plot.
- A generic function was added in 'initialize.mk' that will automatically
write project info and copyright in all plain-text outputs.
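One way to picture such a generic function is a helper that prints a commented preamble before the actual data. The sketch below is hypothetical: the function names and header fields are illustrative assumptions, not the function added to 'initialize.mk', and here the metadata is simply passed as arguments.

```shell
#!/bin/sh
# Hypothetical sketch: print a commented preamble with basic project
# information and a copyright line. The metadata is passed as arguments:
# $1 = title, $2 = project version (e.g. a commit hash),
# $3 = copyright holder, $4 = copyright year.
print_project_preamble() {
    printf '# %s\n' "$1"
    printf '# Created by project version %s\n' "$2"
    printf '#\n'
    printf '# Copyright (C) %s %s\n' "$4" "$3"
    printf '#\n'
}

# Write standard input to the file named in $1, preceded by the preamble.
# The remaining four arguments are forwarded to 'print_project_preamble'.
write_with_preamble() {
    out=$1
    shift
    { print_project_preamble "$@"; cat; } > "$out"
}
```

For example, `printf '1 2\n3 4\n' | write_with_preamble table.txt "My project" 1a2b3c "Jane Doe" 2020` writes the two data rows under the comment lines. Because every preamble line starts with '#', a verification step that ignores comments is not affected by it.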
|
|
I noticed that we hadn't included the publication of the workflow and the
advantage that Maneage provides in this regard. So it was added at the end
of the proof-of-concept section. However, it was necessary to summarize
some other parts to avoid increasing the word count.
|
|
These are some corrections that David sent me by email, which I am
committing here.
|
|
Antonio Diaz Diaz (author of the Lzip program/library) has had a very
supportive role in what became Maneage over the last 4 years. For example, I
really started to appreciate the value of simplicity and archivability
while reading Lzip's documentation.
Fortunately, he also read a recent version of the paper and was again very
supportive. Some of the minor points he raised had already been fixed, but
using 'supplier' instead of 'server' (in the Free Software criterion) was
new, so I implemented it here with this commit. With this, I am also
thanking him for all his wonderful support and encouragement in the last 4
years.
|
|
Boud's point about a "random reader" not being a good example case was
correct. But "user" also gives it a software perspective that is of course
not wrong; it can just be confusing. So I decided to change it to
"interested reader".
In the part about the C-library dependency of high-level software, Boud's
correction made me realize that it is very hard to convey what I wanted
to say (that separating errors due to the C-library implementation from
measurement errors will be easy, because they should be on very different
scales). So I rephrased it with a slightly better tone while conveying the
same point: that with Maneage we can now accurately measure the effect of
the C library.
|
|
Changes with this commit are mostly minor and obvious. Some worth
commenting on include:
* `technologies develop very fast` - As a general statement, this
is too jargony, since technology is much wider than just
`software`; `some technologies` makes it clear that we're referring
to the specific case of the previous sentence.
* `in a functional-like paradigm, enabling exact provenance` -
While `make` is not an imperative programming language, I don't
see how `make` is `like` a functional programming language.
Classifying it as a declarative and a dataflow programming
language and as a metaprogramming language would seem to go in
the right direction [1-3]. I also couldn't see how the language
type relates to tracking exact provenance.
But since we don't want to lengthen the text, my proposal is to
put `and efficient in managing exact provenance` without trying
to explain this in terms of a taxonomy of programming languages.
[1] https://en.wikipedia.org/wiki/Functional_programming
[2] https://en.wikipedia.org/wiki/Comparison_of_multi-paradigm_programming_languages
[3] https://en.wikipedia.org/wiki/Dataflow_programming
* `A random reader` - In the scientific programming context, `random`
has quite specific meanings which we are not using here; a `reader`
has not necessarily tried to reproduce the project. So I've proposed
`A user` here - with the idea that a `user` is more likely to be someone
who has done `./project configure && ./project make`.
* `studying this is another research project` - the present tense `is`
doesn't sound so good; I've put what seems to be about the shortest
natural equivalent.
PDF word count: 5856
|
|
The word "internally" was added to the part about core GNU tools accounting
for the differences between POSIX-compatible systems. One extra word was
also removed in the next sentence.
|
|
Hopefully, the text is more to the point with these few word corrections.
|
|
Konrad raised some very interesting points, in particular about the
limitations of POSIX as a fuzzy standard that does not guarantee
reproducibility. A relatively long paragraph was thus added in the
discussion to address this important point.
In order to fit it in, the paragraph on "unwanted competition" was removed
since the POSIX issue was much more relevant for a curious
reader. Throughout the text, some other parts were edited to decrease the
length of the paper while making it easier to read.
|
|
Some of the redundant sentences have been removed and some minor edits
made.
|
|
The changes in this commit are best shown with `git diff --word-diff`
or `git show --word-diff`. There are about half a dozen changes
of 1-2 words or a comma; the reasons should be obvious.
The sentence with "can not just" seems to be correct formally, but
"can not only" seems to me better to warn the reader that this
is a phrase of the form "can not only do X but can also do Y";
"can not just" sounds a bit like "You cannot just enter the room
without knocking" - it doesn't require a second part.
|
|
One major point is that, following Konrad's suggestion, the issue of not
being familiar with the Lisp/Scheme framework of GWL has been removed. We
do mention the main problem we have had with Guix, but also highlight
that their solution was one of the main inspirations for this work.
|
|
Until this commit, there was only a short description of me. With this
commit, I have added a short paragraph with my biography. I know we are
very restricted by the word limit, so I tried to keep it very short!
|
|
With this commit, I have corrected several minor typos.
|
|
With this commit, I made some minor changes in these sections. The main
changes are: defining the abbreviation `OS' for Operating System and using
only `OS' later on, and avoiding contractions like `isn't'.
|
|
Before this commit: Roberto's bio was about 120 words. With this
commit: it is now fewer than 100 words. A comment about
reproducibility has been added.
|
|
Publishing a paper on reproducible research without making it easy for
readers to read the references would defeat the point. Of course we have to
make some compromises with some journals' reluctance to shift towards the
free world, but to satisfy scientific ethics, we should at least provide
clickable URLs to the references, preferably to the arXiv version if
available [1], and also to the DOI, again, preferably to an open-access
version of the URL if available.
I was not able to get this fully done in the .bst file, so there's a
sed/tr hack applied to the .bbl file in `reproduce/analysis/make/paper.mk`
to tidy up commas and spaces.
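The exact 'sed'/'tr' expressions are not shown here, so the following is only a hypothetical illustration of that kind of in-place clean-up of a '.bbl' file. The function name 'tidy_bbl' is made up, and a real rule would need narrower patterns (for example, blindly collapsing repeated commas could also touch URLs).

```shell
#!/bin/sh
# Hypothetical illustration of a '.bbl' clean-up (NOT the expressions
# actually used in 'reproduce/analysis/make/paper.mk'):
#   tr -s ' '       squeeze runs of spaces into one space,
#   's/ ,/,/g'      remove a space before a comma,
#   's/,,*/,/g'     collapse repeated commas into a single comma.
tidy_bbl() {
    bbl=$1
    tr -s ' ' < "$bbl" \
        | sed -e 's/ ,/,/g' -e 's/,,*/,/g' > "$bbl.tmp" \
        && mv "$bbl.tmp" "$bbl"
}
```

For instance, a line like 'A. Author , ,  2020 ,  arXiv' becomes 'A. Author, 2020, arXiv'.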
This commit also reverts some of the hacks in the Akhlaghi IAU Symposium
`tex/src/references.tex` entry, to match the improved .bst file,
`tex/src/IEEEtran_openaccess.bst`, which is provided here under a different
name from the original in order to satisfy the LaTeX licence.
[1] https://cosmo.torun.pl/blog/arXiv_refs
|
|
David and Raul had both reported that because 'pdftotext' wasn't available
on their systems, the project failed (even though the PDF was built!). So
with this commit, we first check if the system has 'pdftotext' and call it
only if it is available.
Some minor edits were made, building upon Boud's previous commit.
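The availability check described above can be done with the POSIX 'command -v' built-in. The sketch below assumes, for illustration only, that 'pdftotext' is used to report the PDF word count; the function name 'report_pdf_word_count' is hypothetical. The important part is that a missing optional tool only prints a warning and does not return an error, so an already-built PDF is not treated as a failed build.

```shell
#!/bin/sh
# Hypothetical sketch: only run 'pdftotext' if the host provides it.
# Here it is assumed to be used for counting the words of the PDF.
report_pdf_word_count() {
    pdf=$1
    if command -v pdftotext > /dev/null 2>&1; then
        # '-' sends the extracted text to standard output; 'tr' removes
        # the padding that some 'wc' implementations add.
        words=$(pdftotext "$pdf" - | wc -w | tr -d ' ')
        printf 'PDF word count: %s\n' "$words"
    else
        printf "'pdftotext' not found; skipping the word count.\n" >&2
    fi
    return 0
}
```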
|
|
This commit provides mostly small changes. There didn't seem to be much
point in repeating the `lessons learned` jargon and claiming that we draw good
conclusions - insights - from our experience. Better to just state what
hypotheses we have generated from the experience rather than give the
misleading impression that our hypotheses are well-established facts. In
the comments, I put a suggested translation of what the `lessons learned`
jargon means. I seem to have first heard this term in the mainstream media
a few years after the US 2003 attack on Iraq, when a US military
representative stated that the US forces had "learned lessons" after having
started a war of aggression against Iraq.
|
|
This commit changes two lines.
(1) Keeping the exact quote with the clerk while having a sentence that
makes sense in plain English cannot be done, it seems to me, without
making the sentence a bit longer. Here's one option that seems about
the best we can do, even though it still sounds a bit funny, because
it's hard to write a future conditional with the present "can". Since
it's a quote, it will probably survive the proofreaders.
(2) Software is an uncountable noun [1], so we say "software is", like
"water is"; "used software" sounds odd; I added "is itself" to
emphasise that we're especially talking about the full chain of
software for running the project. This commit modifies the "When the
..." sentence so that it hopefully sounds better.
[1] https://en.wiktionary.org/wiki/software#Noun
|
|
To help show the simplicity of 'top-make.mk', it was included as a
listing. I also went over some of Boud's corrections and made small
edits. In particular:
- The '\label' and '\ref' to a section were removed. I did this after
inspecting some of their recent papers and noticing that they generally
have a simple flow, without such redirections.
- In the part about the RDA adoption grant, I moved the "from the
researcher perspective" to the end, because Austin+2017 is mainly
focused on data-center management, not the researcher's. They do touch
upon researcher solutions that can help database managers, but not
directly the researchers. In effect, with this grant they acknowledged
that our researcher-focused solution conforms to their criteria for
database management.
|
|
Possibly the least trivial edit in this commit is that the previous text
appeared to state that it's normal to find that a project prepared with
`maneage` may be ... unbuildable. Which would defeat our whole claim of
reproducibility! Obviously, `maneage` is still in a rapid development
stage and might still have significant, not-yet-detected bugs. But the
wording has to explain that this would constitute a bug in `Maneage` (in a
particular version of it), not an expected regular event. :) This commit
aims to fix that and other minor wording issues in IV.
PDF word count: 5855
|
|
This series of commits aims to edit sections II+III, but first implements
the changes from 7bf5fcd, apart from one that conflicts in the abstract:
this commit has ``Maneage'' without `(managing+lineage)` in the abstract.
From Mohammad: this commit has been rebased after several other parallel
branches, so some things may differ from the message.
|
|
Generally they were great, but after looking through them I thought a
handful of them slightly changed my original idea, so I am correcting them
here. Boud, if you feel the changes aren't good, let's talk about it and
find the best way forward ;-).
They are mostly clear from a '--word-diff'; here are just some notes on the
ones that have changed the meaning:
* On the "a clerk can do it" quotation, since it's so short, I think it's
better to keep its original form; otherwise a reader may think there
were paragraphs instead of the "to" and that we have changed their
intention.
* In the part where we are saying that the workflow can get "separated"
from the paper, I mostly meant to highlight that the data-centers and
journals (hosts) may diverge over decades, or one of them may go
bankrupt, etc., hence losing the connection. The issue of it
evolving can in theory be addressed through version control, so I think
this is a more fundamental problem.
* In the part about free software, in the list, the original point was
the free software that is used by the project, not the project itself.
After all, the project itself falls under the "Open Science" title
that is very fashionable these days, but my point here is aimed at those
people who claim to do "Open Science" with closed software (like
Microsoft Excel!).
|
|
This commit makes several small changes to Section III, some of
which are quite significant in terms of meaning.
It was difficult to improve the clarity without increasing
the word count. Now we're at 5901 words.
|
|
This commit implements quite a few minor changes in section II.
The aim of most is to clarify the meaning and remove ambiguity.
A few changes follow from the fact that the reader will normally assume
that successive sentences in a paragraph are closely related in terms
of logical flow. It is superfluous - and considered excessive -
to put too many "Therefore"'s and "Hence"'s in (at least) modern
astronomy style. These are supposed to be used when there is a
strong chain of reasoning.
One change was made in the Introduction, because if we're going to
use "solution(s)" throughout to mean "reproducible workflow
solution(s)", then we have to clearly define this as jargon for
this particular paper. It's probably preferable to RWS - reproducible
workflow solution - or RWI - reproducible workflow implementation.
But we can't just keep saying "solution" because that has many
different meanings in a scientific context.
PDF word count: 5880
|
|
This series of commits aims to edit sections II+III,
but first implements the changes from 7bf5fcd, apart
from one that conflicts in the abstract: this commit
has ``Maneage'' without `(managing+lineage)` in the
abstract.
|
|
This commit implements most of David's changes from c76727b, but excludes
some, such as the proposal to use 'which' in a restrictive clause in the
abstract. This is allowed, but the Fowler brothers' rule tends to be followed
in science writing:
https://ell.stackexchange.com/questions/5/is-there-any-difference-between-which-and-that
A few points on the abstract:
* an immediate solution = singular
* The "immediate, fast short-term" benefits sentence sounded like it was
redundantly superfluously repetitively repeating doubled-up
information. Hopefully this edit is better.
* in the %Conclusion, "solutions" is vague, like people who say
"technology" when they're only talking about software, so this edit
reminds the reader to make the sentence more self-contained and
understandable.
|
|
After looking at the PDFs of the papers linked in the previous commit and a
few 2020 papers, we noticed that the biography formats of the webpage and
the PDFs are different! So the biographies are now back to their old format
(which is how biographies are presented in the PDF).
A few other minor edits were made in the text.
|
|
It appears from looking at
https://ieeexplore.ieee.org/document/5725236/authors#authors
https://ieeexplore.ieee.org/document/7878935/authors#authors
that the affiliations section needs to start with a one-phrase
definition of the author's main affiliation. In 5725236,
the typesetters/proofreaders swapped van der Walt and Colbert,
so don't be confused by that. It shows that nobody proofread
properly.
With this commit, each author's institute (single hierarchical
level) is written as the first paragraph of the author's
affiliation section. Since 5725236 allows a very-well-known
acronym, I'm guessing that IAC can be defined for Mohammad
and then re-used for the others.
I've added a brief CV for myself. If necessary, we could compress
my main research together as "observational cosmology", but let's
see how we go in the word count.
I have not (yet) worked through the main text.
There is also one minor language fix - `Because is complete` was
incomplete.
PDF word count: 5873
|
|
After a day away from the first draft of this new version (commit
7b008dfbb9b2), I went through the text and made some general edits to make
its presentation and logic smoother.
|
|
Before this commit: several typos were present throughout the text. With
this commit, several typos have been corrected (types listed below) and my
bio has been added.
a) double words
b) general typos
c) commas after adverbs at the beginning of a sentence
d) contractions are removed, e.g., don't vs do not
e) three parenthetical sentences have been removed, since I think they
were out of context or unnecessary
f) etc.
|
|
Over time, some of the copyright license descriptions had mistakenly been
shortened to two paragraphs instead of the original three that are
recommended in the GPL. With this commit, they are corrected to follow
exactly the same three-paragraph format suggested by the GPL.
The following files also didn't have a copyright notice, so one was added
for them:
reproduce/software/make/README.md
reproduce/software/bibtex/healpix.tex
reproduce/analysis/config/delete-me-num.conf
reproduce/analysis/config/verify-outputs.conf
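To catch such files in the future, a quick check along the following lines could help. It is a hypothetical helper (the name 'files_missing_copyright' is made up, and it is not part of Maneage), and it simply looks for the literal string 'Copyright (C)', which is an assumption about how the notices are written.

```shell
#!/bin/sh
# Hypothetical helper: list (sorted) all regular files under the given
# directories that do NOT contain the string "Copyright (C)".
# A plain 'while read' loop is used because 'grep -L' is not in POSIX.
files_missing_copyright() {
    find "$@" -type f | while IFS= read -r f; do
        grep -q 'Copyright (C)' "$f" || printf '%s\n' "$f"
    done | sort
}
```

Running, for example, 'files_missing_copyright reproduce' from the top project directory prints the files that need attention.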
|
|
Since the DSJ editor decided that this paper doesn't fit into their scope,
we decided to submit it to IEEE's Computing in Science and Engineering
(CiSE). So with this commit, the text was rewritten to fit into their style
and word-count limitations.
|
|
The paper no longer uses LuaLaTeX, but plain LaTeX (which produces a DVI),
and it is so much faster! Initially I had used LuaLaTeX for special fonts to
resemble the CODATA Data Science Journal, but all that overhead is no
longer necessary. Therefore I also removed the MANY extra LaTeX packages we
were importing. The paper builds and is able to construct one of its images
(the git-branching figure) with only 7 packages beyond the minimal
TeX/LaTeX installation. Also in terms of processing it is so much faster.
The text is just temporary now, and mainly just a placeholder. With the
next commit, I'll fill it with proper text.
|
|
A few small conflicts showed up here and there. They are fixed with this
merge.
|
|
Respecting the difference between `that` and `which` is not strictly
required, but it helps clarify the difference in meaning,
which is important in science and software :).
This is best shown by an example:
* Maneage provides reproducibility, which is a good thing.
The sentence would make sense if we drop `, which is a good
thing.` The last part of the sentence is a comment rather than a
necessary part of the sentence.
* Maneage provides a quality of reproducibility that is missing
from other implementations.
The sentence would not quite make sense if we drop `that is ...`,
since we would not know what sort of quality is provided. The
fact that the quality is missing is key to the intended meaning
of the sentence.
|
|
It is also slightly shorter with this commit, without losing anything
substantial.
|
|
No need to invent a new word (archive-able) when an existing one
(archivable) does the job.
One issue that we have not included, and which perhaps we could discuss in
the paper (space permitting), is that this tool could bypass the use of
blockchains in this context.
|
|
As discussed by Boud in the previous commit, this is an important feature
that was lost in the new abstract. So I added it as a criterion.
|
|
Most changes are minor English tidying, e.g.:
* spelling: achieving
* archivable - https://en.wiktionary.org/wiki/archivable
* `i.e.` does not look good in an abstract;
* `when` didn't sound quite right;
Comment: we no longer state one of the most interesting aspects
of Maneage - producing the draft paper that is submittable for
peer review in a way that makes it natural for the authors to
achieve automatic consistency between the calculations/analysis
and the values in the paper. But this is hard to describe in a
compact way without disrupting the overall argument of the
abstract, so it's a bit of a pity, but people will learn about it
anyway from the body of the article (or from trying out the
package!). `Peer-review verification` does not directly state
producing a PDF.
Related to this absence of talking about reproducing the *paper*,
not just the calculations, I suggest dropping `, with snapshot
\projectversion` from the abstract initially sent to the journal
(they can't stop us updating it afterwards), because without the
context of explaining that the paper itself is produced from the
package, it's not clear what the snapshot means - a snapshot of
the abstract? In the `real` paper, it makes sense, because the
reader will have access to the rest of the paper.
|