Age | Commit message (Collapse) | Author | Lines |
|
Today, Richard Stallman sent a mail in 'info-gnu@gnu.org' (GNU's public
announcements mailing list) about proprietary obsolescence (or planned
obsolescence) [1]. After looking into it, I saw there is actually a
Wikipedia page for this concept [2]. Since it directly relates to our Free
software criteria, I thought it's good to use this technical term there.
[1] https://www.gnu.org/proprietary/proprietary-obsolescence.html
[2] https://en.wikipedia.org/wiki/Planned_obsolescence
|
|
I just remembered that in the paragraph where we compare with Jupyter, another
important point is that, based on the modularity principle, people can
choose their favorite text editor and aren't limited to one. I also tried
to remove redundant parts to avoid adding too many extra words.
|
|
Thanks a lot Boud for adding that script in your own project and linking it
here. Since the raw file (without the context of the whole project) is very
hard for users to understand, I switched the URL to the navigable URL, and
the link is now on the filename. It will always show the most recent
version of this script, not the particular snapshot of now. But in fact that
is better, since we can improve it over time. Maybe even
by the end of this paper's referee review we will be able to include it in
Maneage's core branch.
I also removed the link to this discussion at the first paragraph of
Section IV (proof of concept), since that is just the introduction and
going into this level of detail there could be confusing for the
readers. Having the name of the script in the proper place is more direct
and understandable for the readers.
Thanks again Boud for the nice work on this ;-).
|
|
This commit adds the SWH URL of the statistical verification
script to the paper and tidies up the corresponding answer in
'1-answer.txt'. The script file includes more extensive
documentation than the earlier 'make' version of the method.
|
|
While going through Mohammad-reza's recent two commits, I noticed that we
had missed an important discussion on modularity in this version of the
paper (discussing how file management should also be modular resulting in
cheaper archival, and thus better longevity), so a few sentences were added
under criteria 2 (Modularity).
Mohammad-reza's edits were also generally very good and helped clarify many
points. I only reset the part where we discuss the problems with POSIX, and
not being able to produce bitwise reproducible software (which systems like
Guix work very hard at, and thus need root permissions). I felt the edit
missed the main point here (that while bitwise reproducibility of the
software is good, it is not always necessary).
|
|
Before this commit, there were discussions in different sections related to
POSIX compliance and features. Since the relevant Completeness criterion has
been changed to execution within a Unix-like OS, such discussions had to be
modified as well.
With this commit, the parts that were related to condition (1) of the
Completeness criterion have been modified to be relevant to the new Unix-like
OS requirement. Also, a few spelling problems were fixed.
|
|
Before this commit, condition (1) for the Completeness criterion was
referring to POSIX compliance. POSIX is a very detailed, dynamic standard
that is continuously under revision, and not many operating systems
(GNU/Linux included) are completely/officially POSIX-compliant. Furthermore,
not all sections of the huge 4000-page standard are really important
specifically to the current Maneage functionality.
With this commit, condition (1) has been replaced by a looser condition of
execution within a Unix-like OS. Also, since the term "environment" might have
been mistaken for the term "Operating Environment", it was replaced by the
unambiguous term "environment variables" in conditions (3) and (5). Last
but not least, condition (2) was made stricter by adding ASCII
encoding as the condition for storing the plain text files.
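The ASCII condition on plain text files can be checked mechanically. The following is only a minimal sketch: 'is_ascii' is a hypothetical helper that is not part of Maneage, and it assumes a POSIX-like 'tr'.

```shell
# Hypothetical helper (not part of Maneage): succeed only if every byte
# of the given file is within the 7-bit ASCII range.
is_ascii() {
    # Delete all bytes from 0 to 127 (octal 000 to 177); if anything is
    # left over, the file contains non-ASCII bytes.
    [ -z "$(LC_ALL=C tr -d '\000-\177' < "$1")" ]
}
```

For example, 'is_ascii paper.tex || echo "non-ASCII bytes found"' would report a source file that violates the condition.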
TO-DO:
POSIX could contain valuable ideas regarding portability of programming
practices. These can be taken advantage of later in providing necessary and
sufficient conditions for project completeness. Another idea could be to
use the LFS construct (or something similar) as a sharp definition of what
we mean by a minimal Unix-like OS.
|
|
Some minor conflicts that came up during the merge were fixed.
|
|
Until now, Maneage only provided the commit hashes (of the project and
Maneage) as LaTeX macros to use in your paper. However, they are too
cryptic and not really human friendly (unless you have access to the Git
history on a computer).
With this commit, to make things easier for the readers, the dates of both
commits are also available as LaTeX macros for use in the paper. The date
of the Maneage commit is also included in the acknowledgements.
Also, the paragraph above the acknowledgements has been updated with a better
explanation of why adding this acknowledgement in science papers is
good/necessary.
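As a rough sketch of how a commit date can be turned into a LaTeX macro (this is not necessarily how Maneage does it): the helper names and the '\projectdate' macro name below are assumptions for illustration, and the use of the committer date (rather than the author date) is also an assumption.

```shell
# Print a LaTeX macro definition. Usage: latex_macro NAME VALUE
latex_macro() {
    printf '\\newcommand{\\%s}{%s}\n' "$1" "$2"
}

# Print the committer date (YYYY-MM-DD) of a Git reference (default: HEAD).
# This must be run inside a Git repository.
commit_date() {
    git show -s --format=%cd --date=short "${1:-HEAD}"
}
```

Inside a Git repository, 'latex_macro projectdate "$(commit_date)"' would then print a line that can be written into a TeX file loaded by the paper.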
|
|
This only concerns the TeX sources in the default branch. In case you don't
use them, there should only be a clean conflict in 'paper.tex' (that is
obvious and easy to fix). Conflicts may only happen in some of the
'tex/src/preamble-*.tex' files if you have actually changed them for your
project. But generally any conflict that does arise from this commit with
your project branch should be very clear and easy to fix and test.
In short, from now on things will be even easier: any LaTeX configuration
that you want to do for your project can be done in
'tex/src/preamble-project.tex', so you don't have to worry about any other
LaTeX preamble file. They are either templates (like the ones for PGFPlots
and BibLaTeX) or low-level things directly related to Maneage. Until now,
this distinction wasn't too clear.
Here is a summary of the improvements:
- Two new options to './project make': with '--highlight-new' and
'--highlight-notes' it is now possible to activate highlighting on the
command-line. Until now, there was a LaTeX macro for this at the start
of 'paper.tex' (\highlightchanges). But changing that line would change
the Git commit hash, making it hard for the readers to trust that this
is the same PDF. With these two new run-time options, the printed commit
hash will not change.
- paper.tex: the sentences are formatted as one sentence per line (and one
line per sentence). This helps in version controlling narrative and
following the changes per sentence. A description of this format (and
its advantages) is also included in the default text.
- The internal Maneage preambles have been modified:
- 'tex/src/preamble-header.tex' and 'tex/src/preamble-style.tex' have
been merged into one preamble file called
'tex/src/preamble-maneage-default-style.tex'. This helps a lot in
simply removing it when you use a journal style file for example.
- Things like the options to highlight parts of the text are now put in
a special 'tex/src/preamble-maneage.tex'. This helps highlight that
these are Maneage-specific features that are independent of the style
used in the paper.
- There is a new 'tex/src/preamble-project.tex' that is the place you
can add your project-specific customizations.
|
|
These can help a first-time reader of 'paper.tex'.
|
|
Until now, the Maneage-only features of LaTeX were mixed with
'tex/src/preamble-project.tex' (which is reserved for project-specific
things). But we want to move the highlighting features (that have started
here) into the core Maneage branch, so it's best for these Maneage-specific
features to be in a Maneage-specific preamble file.
With this commit, a new 'tex/src/preamble-maneage.tex' has been created for
this purpose and the highlighting modes have been put in there. In the
process, I noticed that 'tex/src/preamble-project.tex' doesn't have a
copyright! This has been corrected.
|
|
Roberto sent me his summarized CV, which is now included, and I also
removed the extra statements about non-degree things from Raul's and my own
biographies (like mentioning Gnuastro, and scientific interests). In
short, we are only mentioning degrees and positions. For Raul, I added his
M.Sc. institute.
|
|
After Mohammad-reza sent me his commit on an improved definition for
longevity, we had an in-depth discussion (through a video-conference) to
avoid complexities in the terminology, while staying on point and within
the word-count.
In this commit/merge, I am including the improved version of the definition
of longevity, and the newly added term "functionality" (instead of
"usability", which Mohammad-reza had originally objected to).
|
|
The paragraph was slightly shortened, while keeping the main points.
|
|
Before this commit, longevity was defined on the basis of the term
usability. Although the scope and context of the term were mentioned
right after its use, this could have caused confusion with the keyword
"usability" in the field of software engineering.
With this commit, the definition of longevity has been rephrased so that
it does not require "usability". Furthermore, since longevity
logically requires the availability of the machines and platforms at the
time of re-use, this has been explicitly mentioned in the definition.
|
|
Following Boud's great suggestion, I also summarized my CV to be less
than 40 words.
|
|
Following Boud's great suggestion, I also summarized my CV to be less than
40 words.
|
|
This commit provides shorter CVs for me (Boud) + David
in order to get closer to the 6500 word limit. Our CVs
are the least significant part of the paper.
|
|
The only issue that still remains is how to address statistical
reproducibility, and I am in touch with Boud to do this in the best way
possible (it has been highlighted with '#####'s in the answers).
|
|
There is an answer for all the referee points now. I also did some minor
edits in the paper. But we are still over the limit by around 250 words.
The only remaining point that is not yet addressed (and has '####' around
it) is the discussion on parallelization and its effect on reproducibility.
|
|
This commit is intended to be submittable quality.
Point 56 was removed, and the later points renumbered,
because it was a point where Reviewer 5 described what we
have done; it was not a criticism to respond to. :)
The current word count (without abstract and references)
is 6091.
|
|
Copyediting of points 16 to 32 (paper.tex +
peer-review/1-answer.txt) is done in this commit.
TODO list:
2. paper lacking focus
9. tidy up README-hacking.md for appearance on website
App B.G. similar to Figure ?? - ref missing
29. website: README-hacking.md and tutorial "on same page"
|
|
This commit updates "paper.tex" and "peer-review/1-answer.txt"
for the first 15 (out of 59!) reviewer points, excluding
points 2 (not yet done) and 9 (README-hacking.md needs
tidying).
A fix to "reproduce/analysis/make/paper.mk" for the
links in the appendices is also done in this commit (the same
algorithm as for paper.tex is added). The links in the appendices
are not (yet) clickable.
|
|
This commit tidies up minor aspects of the language in the text
marked by "\new", e.g. a "wokflow" would be fine for Chinese
cooking, but is a little off-topic for Maneage. :) The word count
is reduced by about 7 words.
I haven't yet got to the serious part: checking that we've responded
to the referees' points, and completing the responses which we
haven't yet done.
|
|
Raul's added point on the answer to the referee was very good, so I edited
it a little to be more clear (and removed his name).
Also, after looking in a few parts of the text, I fixed a few typos.
|
|
With this commit, I make several minor changes to the text of the final
paper. They are not important, just minor modifications like avoiding
contractions (don't -> do not, and so on).
|
|
A new directory has been added at the top of the project's source called
'peer-review'. The raw reviews of the paper by the editors and referees have
been added there as '1-review.txt'. All the main points raised by the
referees have been listed in a numbered list and addressed (mostly) in
'1-answers.txt'. The text of the paper now also includes all the
implemented answers to the various points.
|
|
Until now, the core Maneage 'paper.tex' had a '\highlightchanges' macro
that defines two LaTeX macros: '\new' and '\tonote'.
When '\highlightchanges' was defined, anything that was written within
'\new' became dark green (highlighting new things that have been
added). Also, anything that was written in '\tonote' was put within a '[]'
and became dark red (to show that there is a note here that should be
addressed later).
When '\highlightchanges' wasn't defined, anything within the '\new' element
would be black (like the rest of the text), and the things in '\tonote'
would not be shown at all.
Commenting the '\newcommand{\highlightchanges}{}' line within 'paper.tex'
(to toggle the modes above) would create a different Git hash and has to be
committed.
But this different commit hash could create a false sense in the reader
that other things have also been changed and the only way they could
confirm was to actually go and look into the project history (which they
will not usually have time to do, and thus won't be able to trust the two
modes of the text).
Also, the added highlights and the note highlights were bundled together
into one macro, so you couldn't only have one of them.
With this commit, the choice of highlighting either one of the two is now
done as two new run-time options to the './project' script (which are
passed to the Makefiles, and written into the 'project.tex' file which is
loaded into 'paper.tex'). In this way, we can generate two PDFs with the
same Git commit (project's state): one with the selected highlights and
another one without it.
This issue actually came up for me while implementing the changes here: we
need to submit one PDF to the journal/referees with highlights on the added
features. But we also need to submit another PDF to arXiv and Zenodo
without any highlights. If the PDFs have different commit hashes, the
referees may associate it with other changes in any part of the work. For
example https://oadoi.org/10.22541/au.159724632.29528907 that mentions
"Another version of the manuscript was published on arXiv: 2006.03018",
while the only difference was a few words in the abstract after the journal
complained about the abstract word-count of our first submission (where the
commit hashes matched with arXiv/Zenodo).
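The idea of moving the highlighting choice from the committed source to the command-line can be sketched as follows. This is only an illustration under stated assumptions, not the code of the './project' script: the function name and the '\highlightnew' and '\highlightnotes' macro names are hypothetical.

```shell
# Hypothetical sketch: scan the run-time options and print the LaTeX lines
# that would be written into 'project.tex' (a built file, not under
# version control, so the Git commit hash is unaffected).
highlight_macros() {
    new=0
    notes=0
    for arg in "$@"; do
        case "$arg" in
            --highlight-new)   new=1 ;;
            --highlight-notes) notes=1 ;;
        esac
    done
    if [ "$new" = 1 ]; then printf '\\newcommand{\\highlightnew}{}\n'; fi
    if [ "$notes" = 1 ]; then printf '\\newcommand{\\highlightnotes}{}\n'; fi
}
```

Since the two options are parsed independently, either highlight can be activated without the other, and running without any option prints nothing.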
|
|
With the optional appendices added recently to the paper, it was important
to go through them and make them fit better into the paper.
|
|
Given the referee reports, after discussing with the editors of CiSE, we
decided that it is important to include the complete appendix we had before
that included a thorough review of existing tools and methods. However, the
appendix will not be published in the paper (due to the strict word-count
limit). It will only be used in the arXiv/Zenodo versions of the paper.
This actually created a technical problem: we want the commit hash of the
project source to remain the same when the paper is built with an appendix
or without it.
To fix this problem the choice of including an appendix has gone into the
'project' script as a run-time option called '--no-appendix'. So by default
(when someone just runs './project make'), the PDF will have an appendix,
but when we want to submit to the journal, or when the appendix isn't
needed for a certain reason, we can use this new option. The appendix also
has its own separate bibliography.
Some other corrections made in this commit:
1. Some newly added references had an '_' in their source; they
were corrected in 'references.tex'.
2. I noticed that 'preamble-style.tex' is not actually used in this paper,
so it has been deleted.
|
|
Before this commit, there was no discussion of machine-related
specifications in the manuscript. This was needed because Mohammad Akhlaghi came
across a review of the article by Dylan Aïssi in which Dylan mentioned the
need for discussing CPU architecture dependence in pursuing a long-term
archivable workflow.
With this commit, the required discussion has been added in Sec. IV (POC:
Maneage), in the paragraph that explains how 'macro files build
the core skeleton of Maneage'. Furthermore, a few typos in different places
have been fixed and 'pre-make-build.sh' has been updated with the
latest fix in the Maneage core project.
|
|
This paper is generally about data analysis pipelines, so the abstract now
starts with "Analysis pipelines" instead of "Reproducible workflows". I
also noticed that the sentence was mistakenly broken into multiple lines.
|
|
Only two small conflicts came up:
* The addition of the hardware architecture macro in 'paper.tex' (which
was removed for now, but will be added as the referee has requested
within the text).
* The usage of "" around directory variables in 'paper.mk'.
|
|
I saw this link today in the news (to be implemented from November 1st,
2020), and because it is directly related to this work, I added it. Many
people assume that simply pushing a Docker image to DockerHub is enough to
preserve it, but ignore how much it costs to maintain the storage and
network capacity.
|
|
Until now, no machine-related specifications were being documented in the
workflow. This information can become helpful when others observe differences in
the outcome of the software or analysis segments of the workflow
(some software may behave differently depending on the host machine).
With this commit, the host machine's 'hardware class' and 'byte-order' are
collected and are now available as LaTeX macros for the authors to use in the
paper. Currently they are placed in the acknowledgments, right after
mentioning the Maneage commit.
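One portable way to collect these two values is sketched below. It is not necessarily the method used in Maneage; the function names are hypothetical and only the POSIX 'uname' and 'od' utilities are assumed.

```shell
# Hardware class as reported by the kernel (for example 'x86_64').
hardware_class() {
    uname -m
}

# Byte order: read the two bytes 01 00 as one unsigned 16-bit integer.
# A little-endian host sees 1, a big-endian host sees 256.
byte_order() {
    v=$(printf '\001\000' | od -An -d | tr -d '[:space:]')
    case "$v" in
        1)   echo "Little Endian" ;;
        256) echo "Big Endian" ;;
        *)   echo "unknown" ;;
    esac
}
```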
Furthermore, the project and configuration scripts are now capable of
dealing with input directory names that have SPACE (and other special
characters) by putting them inside double-quotes. However, having spaces
and metacharacters in the address of the build directory could cause
build/install failure for some software source files which are beyond the
control of Maneage. So we now check the user's given build directory
string, and if it contains any of '@', '#', '$', '%', '^', '&', '*', '(',
')', '+', ';', or ' ' (SPACE), the user is asked to provide a different
directory.
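A check like the one described above could look like this minimal sketch. The function name is hypothetical and the actual test in the configuration script may be written differently; only the list of rejected characters is taken from this commit message.

```shell
# Hypothetical sketch: succeed only if the given path contains none of the
# characters @ # $ % ^ & * ( ) + ; or SPACE.
builddir_is_safe() {
    # Remove every rejected character; if the result differs from the
    # input, at least one of them was present.
    stripped=$(printf '%s' "$1" | tr -d '@#$%^&*()+; ')
    [ "$stripped" = "$1" ]
}
```

A caller would then loop on something like 'until builddir_is_safe "$bdir"; do ...; done', asking for a new directory on each failure.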
|
|
Until now, the replicated plot had the width of the full page and the data
lineage graph was under it. Together they were covering more than half of
the height of the page! But the plot showing the number of papers with
tools really doesn't have too much detail, and all the space was being
wasted.
With this commit, the plot is now much narrower and the data lineage
graph has been fitted to the right of it.
|
|
To help in the documentation, the Git hash of the Maneage branch commit
that the project has most recently merged with (or branched from) is now
also provided as a LaTeX macro ('\maneageversion').
It is calculated in 'reproduce/analysis/make/initialize.mk' (in the recipe
to 'initialize.tex').
|
|
In the previous commit, the modified abstract of the acknowledgments only
included the URL of Maneage, but it's more formal to cite the Maneage paper;
the URL is already present in the paper.
|
|
Until now, the acknowledgment section didn't contain the new name of
Maneage and it also included an acknowledgment of Gnuastro (which is not
appropriate for a general project which may not use Gnuastro).
With this commit this is fixed.
|
|
This was pointed out by Mervyn O'Luing.
|
|
Mervyn had read the paper and provided some interesting thoughts that I
tried to implement. Mervyn's comments are shown below. I just haven't
addressed the last point yet, because I am afraid it may make the text too
long (we are already on the boundary of the word-limit). We have already
discussed that it is a good research topic, and have hopefully triggered
the curiosity of the readers to test it ;-).
-------------------
Page 2: Regarding Criterion 1: Completeness. A project must be self
contained? So this includes not requiring root or administrator
privileges. This suggests that the project is only made open after the
development has been completed?
Regarding Criterion 5: 'a clerk can do it' -- in the pc world that we live
in could this be taken as a disparaging comment?
Page 5: 'The C library is linked with all programs, and this dependence can
hypothetically hinder exact reproducibility of results, but we have not
encountered this so far.' - what do you think might happen if this does
affect reproducibility? Do you have a plan to deal with this? Or are you
going to wait until you hear of such cases as the number will probably be
small? Have you done probability analysis to show that the rates are likely
to be very small? Or should you have a disclaimer with maneage?
|
|
Until now, the Zenodo identifier was manually written in the paper. But now
we have the Zenodo DOI in 'metadata.conf', so it's much more robust to get
it from there (in case updated versions of the paper are published).
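Reading a value from a 'key = value' style configuration file can be sketched as below. The function is hypothetical, and the key name and DOI in the usage note are placeholders: the exact key used in 'metadata.conf' is not stated in this commit message.

```shell
# Hypothetical sketch: print the value of KEY from a 'key = value' file.
# Usage: conf_value KEY FILE
conf_value() {
    awk -v key="$1" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            eq = index($0, "=")              # position of the first "="
            if (eq == 0) next                # not a key = value line
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }' "$2"
}
```

With a line such as 'metadata-doi-zenodo = https://doi.org/10.5281/zenodo.0000000' in the file (an assumed key with a placeholder DOI), 'conf_value metadata-doi-zenodo metadata.conf' prints the DOI URL, which can then be written into a LaTeX macro instead of being typed by hand.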
|
|
I visited the AMIGA group in January this year and we had some very useful
discussion on Maneage.
|
|
After going through Terry's corrections, some things were clarified
further. Technically, I realized that many new-lines had been introduced and
corrected them. Also, I noticed that Roberto's biography, compared to
the others, had too many details unrelated to reproducibility, so I removed
the redundant parts for this paper.
|
|
Terry is an astronomer at IAC's Scientific Editorial Service and kindly
agreed to review this paper for us and actually pushed this commit. I am
just adding a commit message here.
|
|
Marios had read the first draft of the paper (Commit f990bba) and provided
valuable feedback (shown below) that ultimately helped in the current
version. But because of all the work that was necessary in those days, I
forgot to actually thank him in the acknowledgment, while I had implemented
most of his thoughts.
Following Marios' thoughts on the Git branching figure, with this commit, I
am also adding a few sentences at the end of the caption with a very rough
summary of Git.
I also changed the branch commit-colors to shades of brown (incrementally
becoming lighter as higher-level branches are shown) to avoid the confusion
with the blue and green signs within the schematic papers shown in the
figure.
Marios' comments (April 28th, 2020, on Commit f990bba)
------------------------------------------------------
I think the structure of the paper is more or less fine. There are two
places that I thought could be improved:
1) Section 3 (Principles) was somewhat confusing to me in the way that it
was structured. I think the main source of confusion is the mixing of what
Maneage is about and what other programs have done. I would suggest to
separate the two. I would have short intro for the section, similar to what
you have now. However, I would suggest to highlight the underlying goals
motivating the principles that follow: reproducibility, open science,
something else? Then I would go into the details of the seven principles.
Some of the principles are less clear to me than others. For example, why
is simplicity a guiding principle? Then some other principles appear to be
related, for example modularity, minimal complexity and scalability to my
eyes are not necessarily separate.
Finally, I would separate the comparison with other software and either
dedicate a section to that somewhere toward the end of the paper (perhaps a
subsection for section 5) or at least condense it and put it as a closing
paragraph for Section 3. As it is now I think it draws focus from Maneage
and also includes some repetitions.
2) Section 4 (Maneage) was at times confusing because it is written, I
think in part as a demonstration of Maneage (i.e., including examples that
showed how Maneage was used to write this or other papers) and a
manual/description of the software. I wonder whether these two aspects can
be more cleanly separated. Perhaps it would be possible to first have a
section 4 where each of the modules/units of Maneage are listed and
explained and then have the following section discuss a working example of
Maneage using this or another paper.
3) I found Figure 7 [the git branching figure] and its explanation not very
intuitive. This probably has to do with my zero knowledge of github and how
versioning there works, but perhaps the description can be a bit more "user
friendly" even for those who are not familiar with the tool.
4) I find Section 6 to be rather inconsequential. It does not add anything
and it more or less is just a summary of what was discussed. I would
personally remove it and include a very short summary of the
ideals/principles/goals of Maneage at the beginning of Section 5, before
the discussion.
|
|
The default 'paper.tex' starts by defining some macros and comments
describing them. Until now, the text was not too clear and could be
confusing for someone that is not at all familiar with Maneage.
With this commit, the comments have been edited to be more clear for a
first-time reader. For example they all start with FULL CAPS
summaries.
Two other small things were corrected in 'tex/src/preamble-necessary.tex':
- Until now 'project.tex' was included in this preamble. However, because
of its importance in Maneage, and prominent place in the demonstration
plot of the paper introducing Maneage, it is now included directly in
'paper.tex'. This also allows users to safely ignore/delete this
preamble file if their LaTeX style is different.
- I noticed that some macros for some astronomical software names from the
very first commits in Maneage were still present here! They are no
longer used, so they have been removed.
|
|
Until now, we were saying "POSIX is defined by the IEEE", but in issue #12,
Michael Crusoe pointed out that this is not accurate. It is actually
jointly developed and operated by the IEEE, The Open Group and ISO/IEC JTC
1/SC 22, which together form the Austin Group.
So the sentence was modified to say that the IEEE (potential publisher of
this paper) is part of the Austin Group that develops the POSIX standard.
Thanks a lot for bringing this up Michael.
|
|
Until now, we were using three EPS files (created from SVG) that were downloaded
from https://www.flaticon.com. Therefore it was necessary to acknowledge
the creators and put a link to the webpage. This consumed space in the
caption and decreased the originality of the plot.
Another problem was that the "collaboration" icon (with three people in it)
had arrows, and some of those arrows pointed downwards, creating ambiguity in
relation to the upward arrows under the commits.
With this commit, three alternative icons are added that I made from
scratch, using Inkscape. The collaboration icon now is two figures and two
speech-bubbles, without any arrows.
|