Age | Commit message | Author | Lines |
|
Generally they were great, but after looking through them I thought a
handful of them slightly changed my original idea, so I am correcting them
here. Boud, if you feel the changes aren't good, let's talk about it and
find the best way forward ;-).
They are mostly clear from a '--word-diff', just some notes on the ones
that have changed the meaning:
* On the "a clerk can do it" quotation, since it's so short, I think it's
better to keep its original form, otherwise a reader may think there
were paragraphs instead of the "to" and we have changed their
intention.
* In the part where we are saying that the workflow can get "separated"
from the paper, I mostly meant to highlight that the data-centers and
journals (hosts) may diverge in decades, or one of them may go
bankrupt, and so on, hence losing the connection. The issue of it
evolving can in theory be addressed through version control, so I think
this is a more fundamental problem.
* In the part about free software, in the list, the original point was
the free software that is used by the project, not the project itself
(after all, the project itself falls under the "Open Science" title
that is very fashionable these days, but my point here is aimed at those
people who claim to do "Open Science" with closed software, like
Microsoft Excel!).
|
|
This commit makes several small changes to Section III, some of
which are quite significant in terms of meaning.
It was difficult to improve the clarity without extending
the word length. Now we're at 5901 words.
|
|
This commit implements quite a few minor changes in section II.
The aim of most is to clarify the meaning and remove ambiguity.
A few changes rely on the fact that the reader will normally assume that
successive sentences in a paragraph are closely related in terms
of logical flow. It is superfluous - and considered excessive -
to put too many "Therefore"'s and "Hence"'s in (at least) modern
astronomy style. These are supposed to be used when there is a
strong chain of reasoning.
One change is done in the Introduction, because if we're going to
use "solution(s)" throughout to mean "reproducible workflow
solution(s)", then we have to clearly define this as jargon for
this particular paper. It's probably preferable to RWS - reproducible
workflow solution - or RWI - reproducible workflow implementation.
But we can't just keep saying "solution" because that has many
different meanings in a scientific context.
Pdf word count = 5880
|
|
This series of commits aims to edit sections II+III,
but first implements the changes from 7bf5fcd, apart
from one that conflicts in the abstract: this commit
has ``Maneage'' without `(managing+lineage)` in the
abstract.
|
|
This commit implements most of David's changes from c76727b, but excluding
some, such as the proposal to use 'which' in a restrictive clause in the
abstract. This is allowed, but the Fowler brothers' rule tends to be followed
in science writing:
https://ell.stackexchange.com/questions/5/is-there-any-difference-between-which-and-that
A few points on the abstract:
* an immediate solution = singular
* The "immediate, fast short-term" benefits sentence sounded like it was
redundantly superfluously repetitively repeating doubled-up
information. Hopefully this edit is better.
* in the %Conclusion, "solutions" is vague, like people who say
"technology" when they're only talking about software, so this edit
reminds the reader what kind of solutions are meant, to make the
sentence more self-contained and understandable.
|
|
After a look at the PDFs of the linked papers of the previous commit and a
few 2020 papers, we noticed that the biography formats of the webpage and
PDFs are different! So it is now back to its old format (which is how
biographies are presented in the PDF).
A few other minor edits were made in the text.
|
|
It appears from looking at
https://ieeexplore.ieee.org/document/5725236/authors#authors
https://ieeexplore.ieee.org/document/7878935/authors#authors
that the affiliations section needs to start with a one-phrase
definition of the author's main affiliation. In 5725236,
the typesetters/proofreaders swapped van der Walt and Colbert,
so don't be confused by that. It shows that nobody proofread
properly.
With this commit, each author's institute (single hierarchical
level) is written as the first paragraph of the author's
affiliation section. Since 5725236 allows a very-well-known
acronym, I'm guessing that IAC can be defined for Mohammad
and then re-used for the others.
I've added a brief CV for me. If necessary, we could compress
my main research together as "observational cosmology", but let's
see how we go in the word count.
I have not (yet) worked through the main text.
There is also one minor language fix - `Because is complete` was
incomplete.
Pdf word count: 5873
|
|
After one day not looking at the first draft of this new version (commit
7b008dfbb9b2), I went through the text and made some general edits to make
its presentation and logic smoother.
|
|
Before this commit, several typos were present throughout the text. With
this commit several typos have been corrected (types listed below) and my
bio has been added.
a) double words
b) general typos
c) commas after adverbs at the beginning of a sentence
d) contractions are removed, e.g., don't vs do not
e) three sentences in parentheses have been removed since I think they
were out of context or unnecessary
f) etc
|
|
In time, some of the copyright license descriptions had been mistakenly
shortened to two paragraphs instead of the original three that are
recommended in the GPL. With this commit, they are corrected to be exactly
in the same three-paragraph format suggested by the GPL.
The following files also didn't have a copyright notice, so one was added
for them:
reproduce/software/make/README.md
reproduce/software/bibtex/healpix.tex
reproduce/analysis/config/delete-me-num.conf
reproduce/analysis/config/verify-outputs.conf
|
|
Following the fact that the DSJ editor decided that this paper doesn't fit
into their scope, we decided to submit it to IEEE's Computing in Science
and Engineering (CiSE). So with this commit the text was re-written to fit
into their style and word-count limitations.
|
|
The paper is no longer using LuaLaTeX, but raw LaTeX (that saves a DVI); it
is so much faster! Initially I had used LuaLaTeX to use special fonts to
resemble the CODATA Data Science Journal, but all that overhead is no
longer necessary. Therefore I also removed the MANY extra LaTeX packages we
were importing. The paper builds and is able to construct one of its images
(the git-branching figure) with only 7 packages beyond the minimal
TeX/LaTeX installation. Also in terms of processing it is so much faster.
The text is just temporary now, and mainly just a place holder. With the
next commit, I'll fill it with proper text.
|
|
A few small conflicts showed up here and there. They are fixed with this
merge.
|
|
The difference between `that` and `which` is not strictly
required, but it helps clarify the difference in meaning,
which is important in science and software :).
This is best shown by an example:
* Maneage provides reproducibility, which is a good thing.
The sentence would make sense if we drop `, which is a good
thing.` The last part of the sentence is a comment rather than a
necessary part of the sentence.
* Maneage provides a quality of reproducibility that is missing
from other implementations.
The sentence would not quite make sense if we drop `that is ...`,
since we would not know what sort of quality is provided. The
fact that the quality is missing is key to the intended meaning
of the sentence.
|
|
It is also slightly shorter with this commit, without losing anything
substantial.
|
|
No need to invent a new word (archive-able) when an existing one
(archivable) does the job.
One issue that we have not included and which perhaps we could discuss in
the paper (space permitting), is that this tool could bypass the use of
blockchains in this context.
|
|
As discussed by Boud in the previous commit, this is an important feature
that was lost in the new abstract. So I added it as a criterion.
|
|
Most are minor English tidying, e.g.
* spelling: achieving
* archivable - https://en.wiktionary.org/wiki/archivable
* `i.e.` does not look good in an abstract;
* `when` didn't sound quite right;
Comment: we no longer state one of the most interesting aspects
of Maneage - producing the draft paper that is submittable for
peer review in a way that makes it natural for the authors to
achieve automatic consistency between the calculations/analysis
and the values in the paper. But this is hard to describe in a
compact way without disrupting the overall argument of the
abstract, so it's a bit of a pity, but people will learn about it
anyway from the body of the article (or from trying out the
package!) `Peer-review verification` does not directly state
producing a pdf.
Related to this absence of talking about reproducing the *paper*,
not just the calculations, I suggest dropping `, with snapshot
\projectversion` from the abstract initially sent to the journal
(they can't stop us updating it afterwards), because without the
context of explaining that the paper itself is produced from the
package, it's not clear what the snapshot means - a snapshot of
the abstract? In the `real` paper, it makes sense, because the
reader will have access to the rest of the paper.
|
|
Boud's suggestions in the previous commit were great and really helped in
improving the tone of the abstract (and thus the whole paper shortly!),
better putting it in the big picture. I had forgotten to give the exact word
limit (which was 250), so Boud had set it to a very conservative value of
190. I added around 22 words to better highlight the points we want to
make, while still being below the limit.
|
|
To make this a research article, we either have to present it as a
theoretical advance, or as an empirical advance.
An empirical research result would be something like doing a survey of
users and getting statistics of their success/failure in using the system,
and of whether their experience is consistent with the claimed properties
and principles of Maneage (e.g. success/failure in creating paper.pdf as
expected? was the user's system POSIX? did the user do the install with
non-root privileges? was this a with-network or without-network ./project
configure?). This is doable, but would require a bit of extra work that we
are not necessarily motivated to do or have the time to do right now.
I think it's possible to present Maneage as a theoretical advance, but it
has to be worded properly. Maneage is a tool, but it's a tool that
satisfies what we can reasonably present as a unique theoretical proposal.
Here's my proposed rewrite. I've aimed at minimum word length. I've also
included (commented out) keywords for a structured research abstract -
these are just for us, as a guideline to improve the abstract.
I think "criteria" is safer than "standards". Whether a principle is good
or bad tends to lead to debate. Whether a criterion is satisfied or not is
a more objective question, independent of whether you agree with the
criterion or not.
In the rewrite below, we propose a theoretical standard and show that the
new standard can be satisfied. Maneage is *used as a tool* to prove that
the standard is not too difficult to achieve. Maneage is no longer the
subject of the paper. (That won't change the main body of the paper too
much, apart from compression, but the way it's presented will have to
change, under this proposal.)
The title would need to match this. E.g.
TITLE.1: Evidence that a higher standard of reproducibility criteria is
attainable
TITLE.2: Evidence that a rigorous standard of reproducibility criteria is
attainable
TITLE.3: Towards a more rigorous standard of reproducibility criteria
I would probably go for TITLE.3.
|
|
This abstract is a first step in order to put more focus on the research
aspects of Maneage.
|
|
Given the very strict limits of journals, we needed to remove these
sections and images. The removed images are: the
`figure-file-architecture', `figure-src-topmake' and
`figure-src-inputconf'. In total, with `wc' we now have 9019 words.
This will be further reduced when we remove all the technical parts of the
Maneage section, in short, we will only describe the generalities, not any
specific details.
|
|
They supported my visit and talk on Maneage at the Barcelona Super
Computing center. They have also offered to read the paper and are
providing comments.
Also, I noticed that in the author list, we had forgotten to put an `,' after
Boud's name. That is also corrected here.
|
|
Until now, we were using GitLab as the main Git repository of Maneage. But
today I finally set up our own Git repository under `git.maneage.org' and
enabled a CGit web interface for a simple and fast viewing of the commits
and changes.
Since this URL is under our own control, we can always ensure that it will
point to somewhere meaningful, on any server, so in the long run it's much
better than publishing the paper with an explicit reliance on `gitlab.com'.
|
|
Until now, the primary Maneage URLs were under GitLab, but since we now
have a dedicated URL and Git repository, it's better to transfer to this as
soon as possible. Therefore with this commit, throughout Maneage, any place
that Maneage was referenced through GitLab has been corrected.
Please correct your project's remote to point to the new repository at
`git.maneage.org/project.git', and please make sure it follows the
`maneage' branch. There is no more `master' branch on Maneage.
|
|
Reading over Boud's edits, I noticed a few other parts that I could
summarize more and corrected one or two other parts to fit the original
purpose of the sentence better.
|
|
Reduction by about 5 words.
Although it's true that the low-level tools - make, bash, gcc -
are still being actively developed, only expert users will tend
to notice the differences, and in this context, it's probably
more useful to point out that these are actively *maintained*.
(Comment: I felt that the first sentence in the Conclusion is
missing one of the obvious criteria for handling big data -
citizen control so that big data could hopefully become less
Orwellian than it is right now, with GAFAM having the main big
data databases that are used by AI researchers and will tend to
affect people's lives more than traditional "scientific"
databases. But there's no point adding this here, since the
criteria that tend to satisfy the scientific requirements
("principles") and citizens' rights tend to overlap to a fair
degree...)
|
|
Reduction of about 50 words.
There were a couple of expressions that look a bit like
some sort of software/research analysis jargon, such as
`Research Objects`, `Software Heritage`, `Machine actionable`.
Unless these are defined, capitalising them makes the reader
assume that there is some well-known formal meaning and that
s/he has to search for that him/herself. As lower case
expressions, the reader can guess some reasonable meanings
of these.
The word "embargo" was introduced for proposal 2) to handle
the third caveat.
|
|
[Compared to first submission to DSJ last week with 11436 words in raw PDF,
we have decreased the paper by ~1000 words to 10493 :-)]
As with the previous commits, the moment Boud changed the structure of
sentences, I was able to find the redundancies and remove them! This is a
fascinating feature of collaboration I had never felt before: it is so hard
to find redundancies in my own raw text, but even a minor correction by
someone else suddenly breaks my mental memories/barrier on the sentence,
allowing me to be more critical of it!
Anyway, besides such corrections, I fixed a few other things: 1) In the
DSJ's recently published papers, there is no `~' between "Figure" and its
number. 2) I noticed that in `tex/src/figure-src-inputconf.tex' I was
actually using manually input strings for the filename, checksum and size!
This was contrary to the whole philosophy of Maneage(!); I must have rushed
and forgotten! So LaTeX variables are now defined and used.
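
The idea of replacing hand-typed values by generated LaTeX variables can be
sketched roughly as below. This is only an illustration, not the code used
in Maneage: the macro prefix `demoinput', the helper names and the choice of
SHA-256 as the checksum are all assumptions made for this example.

```python
import hashlib
import os
import tempfile


def latex_macro(name, value):
    """Return one \\newcommand line defining macro `name` as `value`."""
    return "\\newcommand{\\" + name + "}{" + str(value) + "}"


def macros_for_file(path, prefix="demoinput"):
    """Build LaTeX macros for a file's name, size (bytes) and checksum.

    The prefix and the macro names are hypothetical; SHA-256 is an
    assumed choice of checksum. File names containing characters that
    are special in LaTeX (such as `_`) would need escaping, which this
    sketch does not do.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large input files are not loaded at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    lines = [
        latex_macro(prefix + "name", os.path.basename(path)),
        latex_macro(prefix + "size", os.path.getsize(path)),
        latex_macro(prefix + "checksum", digest.hexdigest()),
    ]
    return "\n".join(lines)


# Small demonstration on a temporary six-byte file.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo.txt")
    with open(demo_path, "wb") as demo:
        demo.write(b"hello\n")
    macros = macros_for_file(demo_path)

print(macros)
```

The figure source would then use the generated macros instead of typed-in
strings, so the values in the paper cannot drift away from the actual file.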
|
|
About 20 words fewer. The ArXiv URL is added - this adds no extra
length in words, and some readers will not be familiar with ArXiv
(although the COVID-19 pandemic has attracted attention to bioRxiv).
|
|
Increase by 5 words. We don't need to give a big warning here,
but "Permissions management" is meant to be a brief way of saying
that whether or not different users can really read/write/execute
in subdirectories will firstly depend on whether the user who
cloned Maneage has handled these permissions correctly and whether
s/he is able to allow others to edit in his/her subdirectories.
Comment: Users would have to check who else is logged in at the time,
who else is running jobs, and so on. On a supercomputer this might
make sense, to avoid unnecessary recompiles. Anyway, this edit
summary is not the place to discuss this...
|
|
Reduction by 15 words. "Branch" is fine as a verb, and "off"
is fine as a preposition; there's no need for a second preposition.
"We branched off the main forest path onto a smaller path".
|
|
Length reduction by about 15 words.
A semantically significant change is from `leading to more robust
scientific results` to `evolves in the case of exploratory
research papers, and better self-consistency in hypothesis
testing papers`.
I said this in a previous commit, but it can't hurt repeating:
In the covidian epoch (though not only), it is especially
important to distinguish Bayesian type exploratory research
(typical in astronomy or searching for a good COVID-19 treatment
or vaccine) from hypothesis testing (clinical testing in
double-blind random access trials with clinical trials methods
published on a public registry prior to the trials taking
place). In the latter case, you want your results to be analysed
consistently with the plan published before the trials even
begin, and ideally you want them to be published (or at least
posted on the trial registry website) even if your results are
insignificant, to avoid a publication bias in favour of
significant results. Test homeopathy against placebos in 1000
independent experiments, analyse them all the same way, and 2-3
experiments will be significant at the 3 sigma level...
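
As a rough check of the homeopathy example above, the expected number of
spurious detections can be estimated as follows. The sketch assumes Gaussian
statistics and a two-sided 3 sigma threshold (neither is stated explicitly
above), and the function names are only illustrative.

```python
import math


def two_sided_tail(n_sigma):
    """Probability that a Gaussian variable falls beyond +/- n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))


def expected_false_positives(n_experiments, n_sigma):
    """Expected number of null experiments that pass the threshold."""
    return n_experiments * two_sided_tail(n_sigma)


p = two_sided_tail(3.0)
expected = expected_false_positives(1000, 3.0)
# Chance that at least one of the 1000 null experiments looks significant.
at_least_one = 1.0 - (1.0 - p) ** 1000

print(f"two-sided 3 sigma tail probability: {p:.5f}")
print(f"expected spurious detections in 1000 experiments: {expected:.1f}")
print(f"probability of at least one spurious detection: {at_least_one:.2f}")
```

Under these assumptions the expected count is about 2.7, consistent with the
2-3 quoted above.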
|
|
Reduction by about 7 words.
I added "internet security" as an extra reason for having all the
downloads in a single file. Modularity and minimal complexity in
themselves generally contribute to internet security, but in this
case, it's obvious that having all the communication with the
outside world managed through a single file makes internet security
management much simpler.
I replaced the "fake URL" by the real one, because at least in the
present format, the URL fits in nicely. So both `paper.tex` and
`tex/src/figure-src-inputconf.tex` are modified in this commit.
|
|
Reduction by about 20 words - minor rewording.
|
|
Idafen has helped in testing Maneage a lot during the last year and has
provided very useful feedback and suggestions.
|
|
Regarding Docker Konrad pointed out that "Linux has an excellent track
record for stability. It's more likely that the Docker itself becomes
incompatible with older containers. Docker isn't developed for
reproducibility after all".
So I tried to modify that paragraph to include this important point too. In
the process, I also shrank it a little more (without losing anything
substantial), so it doesn't add to the paper's length.
|
|
After going through Boud's corrections, I thought it could be further
summarized without losing any major point.
|
|
Reduction by 4 words. Minor rewording; removal of "Note that"
and "simply" (the opposite of "complicatedly"). If a checksum
is simple for a given user, then s/he already knows that; if s/he
doesn't yet know what a checksum is, then stating that it's simple
doesn't help very much. :)
|
|
Reduction of about 15 words.
The phrase "which does not need it" is removed. On its own, this is
a claim, not an explanation. If the reader is wondering why `paper.tex`
is not a produced file, then stating that the file is not needed will
not help very much. Looking at the diagram will show that `paper.tex`
is the overall article template; and the diagram strongly suggests
that values from initialize.tex, ..., are passed into verify.tex,
and from there into project.tex, which goes into paper.tex.
The phrase "files, possibly in another subMakefile" should really
be something like "files, possibly created by another subMakefile".
But this would add more words, and given that the user has full control
to modify and adapt the overall scheme (including making a mess of it),
we can safely drop the info that the scheme can be made more complicated. :)
|
|
Only 3 words are reduced in this commit, but I think the
improvements are worth it.
"Note that" and "It is worth mentioning" are phrases still quite
often used by academics (even in astronomy) that can be politely
described as "pontification" or informally as "empty blabla";
these add no meaning except "I am teaching you something and I
expect you to pay attention to what I am saying". :) There are
also less polite descriptions.
|
|
Minor rewording of 4.3 Project analysis - introduction.
Reduction of about 40 words.
4.2 `parallel` quote: s/http:/https:/
|
|
Today Konrad made the following suggestions after reading through the paper
(created from Commit 1ac5c12). Thanks a lot Konrad ;-). I tried to address
them all in this commit. Afterwards, while looking over the corrected
parts, some minor edits occurred to me to remove redundant parts and add
extra points where it helps.
In particular to be able to print the International Phonetic Alphabet
(IPA), I had to include the LaTeX `TIPA' package, but it was interesting to
see that it was already available in the project as a dependency of another
package we loaded.
|
|
Boud previously pointed out that he couldn't find a reference to the
citation, so I added it as a link over "its FAQ" (since it's described in
its `doc/citation-notice-faq.txt' file). I also removed the first part of
the quote, which was not really necessary; the heart of the quote is the
latter part that still remains.
|
|
I tried to make it slightly shorter, but I felt that it is important to
keep the quote from GNU Parallel and in particular the financial aid it
asks for. It will help readers feel the gravity of the situation for this
software author. The precise citation of the quote was given in the long
version.
|
|
This reduces the length by about 70 words.
The biggest change is to remove what looks like a citation from
`parallel'. I couldn't find the citation in GNU parallel
20161222-1 (Debian/stretch), nor with search engines.
I don't think that the quote is really so useful (even assuming
it's a valid quote from somewhere): citation practices are a mix
between ethics, preparation to convince referees, citing those who
are already cited frequently, and the practicality of searching for
and verifying references against the information for which they
are used. Showing that Maneage makes citation not only easy, but
more or less automatic, bypasses some of the compromises between
practicality and ethics.
|
|
Minor rewording; a reduction of about 12 words.
|
|
Minor edits - a reduction of about 17 words.
|
|
This commit removes about 25 words from the 4.1 Maneage
orchestration, aka `make`, section.
|
|
This drops the word count in the introductory part of the Maneage
section by about 15 words.
|