|
Until now, the Zenodo identifier for the project was for the second arXiv
release (after the first referee reports). However, since the paper was
published, it hasn't been updated on arXiv and it's necessary to make a
"final" arXiv publication.
With this commit, a new Zenodo DOI has been reserved for the third release
and is now being used.
|
|
Until now, Maneage had undergone some updates.
With this commit, those updates have been imported and the conflicts that
resulted were fixed. They were all cosmetic and had no effect on the
analysis. The most significant one was about the change in the format of
'INPUTS.conf'.
In the process, I also noticed that the IEEEtran LaTeX package is now
called 'ieeetran' (the 'tlmgr' of TeXLive 2022 was failing).
|
|
Until now, the './project make clean' command would only clean (remove) the
PDF file from the top source directory. However, if a user ran LaTeX
outside of Maneage, many extra LaTeX outputs (such as *.aux, *.log and
*.synctex files) would be produced in the top source directory. These files
can interfere with './project make'.
With this commit, when './project make clean' is run, any possibly existing
LaTeX temporary files will also be deleted from the top source directory.
This problem was first reported by Matin Torkian.
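The cleaning described above can be sketched roughly as follows. This is
only an illustration, not the rule that was actually added to Maneage: the
function name 'clean_latex_temporaries' and the exact list of suffixes are
assumptions, and the demonstration runs in a scratch directory.

```shell
# Hypothetical helper: delete LaTeX by-products from a source directory.
# The suffix list is an assumption; the real rule may cover other files.
clean_latex_temporaries() {
    srcdir="$1"
    for ext in aux log synctex synctex.gz out bbl blg; do
        # 'rm -f' stays quiet when no file with this suffix exists.
        rm -f "$srcdir"/*."$ext"
    done
}

# Demonstration in a scratch directory (not a real project).
demo=$(mktemp -d)
touch "$demo/paper.aux" "$demo/paper.log" "$demo/paper.synctex" "$demo/paper.tex"
clean_latex_temporaries "$demo"
remaining=$(ls "$demo")
echo "$remaining"    # prints: paper.tex
```

Only the temporary files are removed; the LaTeX source itself is untouched.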
|
|
Until now, one had to follow the instructions from [1] to prepare a
standard software tarball before merging with the low-level
tarballs-software repository [2]. The script only worked for '.tar.gz'
suffix and was only available as a comment on Savannah (in [1]).
With this commit, the script has been imported into Maneage as
'reproduce/software/shell/tarball-prepare.sh' to simplify future software
updates. It works with all supported '.tar.*' suffixes (of the upstream
tarball repository) and will convert the tarballs to Maneage's standard
format. Also, this script has a minimal argument parser and can skip the
tarballs that are already unpacked, allowing faster tests.
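As a rough illustration of the suffix handling only (this is not taken
from 'tarball-prepare.sh'), the sketch below maps any '.tar.*' file name to
a '.tar.lz' name. It assumes the standard format uses Lzip compression, and
the function name 'standard_tarball_name' and the file names are
hypothetical. The unpacking and repacking itself is not shown.

```shell
# Hypothetical helper: derive the standard output name from an upstream
# tarball name with any '.tar.*' suffix.
standard_tarball_name() {
    base=$(basename "$1")
    case "$base" in
        *.tar.*) echo "${base%.tar.*}.tar.lz" ;;
        *)       echo "unsupported suffix: $base" >&2; return 1 ;;
    esac
}

# Illustrative file names only.
name_gz=$(standard_tarball_name /downloads/git-2.36.0.tar.gz)
name_xz=$(standard_tarball_name emacs-28.1.tar.xz)
echo "$name_gz $name_xz"    # prints: git-2.36.0.tar.lz emacs-28.1.tar.lz
```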
This script was used to update the versions of:
Coreutils 9.0 --> 9.1
Git 2.34 --> 2.36
Emacs 27.2 --> 28.1
The main motive behind this update was Git, which announced a vulnerability
[3] and suggested updating to the latest version as soon as possible. More
detail is given in this GitHub blog post [4], but in summary, it was a
security issue on multi-user systems that has been found and fixed by the
Git developers. Since Maneage is often installed on such shared systems, it
was important to make this update. GNU Coreutils and GNU Emacs were also
updated because they are also commonly used.
The following improvements have also been made with this commit:
- .gitignore: ignore emacs auto-save files (that end with a '#')
- README-hacking.md: In the checklist for updating the Maneage branch, the
no-longer-necessary '--decorate' option of Git was removed from the
command to check the general branch history.
[1] https://savannah.nongnu.org/task/?15699
[2] https://git.maneage.org/tarballs-software.git/
[3] https://lore.kernel.org/git/xmqqv8veb5i6.fsf@gitster.g/
[4] https://github.blog/2022-04-12-git-security-vulnerability-announced/
|
|
Until now, the bibliography was only re-built when 'tex/src/references.tex'
was modified. This is useful in many regular cases because building the
bibliography can slow down the build and it is inefficient to build it on
every edit of the text of the paper. However, it can be inconvenient when a
change in the paper's bibliography is necessary without actually editing
'references.tex' (for example when you are removing a citation from the
text).
This happens because Make is only sensitive to file modification time. In
this case, Make does not see the need to create a new 'bib' file because
'tex/src/references.tex' has not changed; only 'paper.tex' has. Make is
totally 'blind' to the new 'citation' defined in 'paper.tex'.
As a workaround, until now users were forced to manually change the
'tex/src/references.tex' file modification date: either by altering the
content, or using the 'touch' command.
With this commit, the '--refresh-bib' option is added to the './project'
arguments to address this issue. It will just 'touch' the
'tex/src/references.tex' file before calling Make. In effect, this will
'force' Make to create the bibliography file, even if
'tex/src/references.tex' hasn't been updated.
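The effect of the 'touch' can be seen in a small sketch that runs in a
scratch directory. It does not call './project' or Make; it only mimics
Make's time-stamp comparison with the shell's '-nt' (newer than) test. The
file 'paper.bbl' is a hypothetical stand-in for the built bibliography.

```shell
# Scratch copy of the relevant files (not a real project).
work=$(mktemp -d)
mkdir -p "$work/tex/src"
refs="$work/tex/src/references.tex"
bib="$work/paper.bbl"    # hypothetical name for the built bibliography

# Start with a source that is older than the built file.
touch -t 202001010000 "$refs"
touch -t 202101010000 "$bib"

# Mimic Make's decision: rebuild only if the source is newer.
needs_rebuild() { [ "$refs" -nt "$bib" ]; }

if needs_rebuild; then before=yes; else before=no; fi

# What '--refresh-bib' is described to do before calling Make.
touch "$refs"

if needs_rebuild; then after=yes; else after=no; fi
echo "rebuild before touch: $before, after touch: $after"
# prints: rebuild before touch: no, after touch: yes
```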
|
|
SUMMARY: it is necessary to update your 'INPUTS.conf' and 'download.mk'.
Until now, adding an input file involved several steps that needed manual
(and inconvenient!) intervention: for every file, you needed to define four
variables in 'INPUTS.conf', and in 'reproduce/analysis/make/download.mk'
you had to use a shell 'if/elif/else' condition (complex for a large number
of files) to link the names of the input files to those variables. Besides
the inconvenience, this could cause bugs (typos!). Furthermore, a basic MD5
checksum was used for verifying the files.
With this commit, a new structure has been defined for 'INPUTS.conf' that
(thanks to some pretty useful GNU Make features) removes the need for
users to manually edit 'reproduce/analysis/make/download.mk', and reduces
the number of variables necessary for each file to three (from
four). Furthermore, we now use the SHA256 checksum for input data
validation.
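For readers unfamiliar with checksum validation, a minimal sketch is given
here. It assumes the 'sha256sum' program of GNU Coreutils is available; the
function name 'verify_sha256' is hypothetical and this is not the recipe
used in 'download.mk'. The expected value is the published SHA256 test
vector for the three-byte string 'abc'.

```shell
# Hypothetical helper: succeed only if the file has the expected checksum.
verify_sha256() {
    file="$1"
    expected="$2"
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Scratch input file containing exactly the three bytes 'abc'.
input=$(mktemp)
printf 'abc' > "$input"
expected=ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

if verify_sha256 "$input" "$expected"; then status=ok; else status=mismatch; fi
echo "$status"    # prints: ok
```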
Regarding the trick used in 'INPUTS.conf' (from the newly added description
in 'download.mk'): In GNU Make, '.VARIABLES' "... expands to a list of the
names of all global variables defined so far" (from the "Other Special
Variables" section of the GNU Make manual). Assuming that the pattern
'INPUT-%-sha256' is only used for input files, we find all the variables
that contain the input file names (the '%' is the filename). Finally, using
the pattern-substitution function ('patsubst'), we remove the fixed string
at the start and end of the variable name.
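The trick can be tried outside of Maneage with the sketch below, which
writes a tiny stand-alone Makefile and runs it (GNU Make is assumed to be
installed). Only the 'INPUT-%-sha256' pattern comes from the description
above: the checksum values are placeholders, 'xyz.txt' is a made-up file
name, and the 'sort' call is added here only to give a predictable order.
It is not a copy of 'download.mk'.

```shell
# Write a minimal Makefile into a scratch file.
mk=$(mktemp)
cat > "$mk" <<'EOF'
# Placeholder checksums; only the variable names matter for this sketch.
INPUT-abc.fits-sha256 = placeholder-checksum-1
INPUT-xyz.txt-sha256  = placeholder-checksum-2

# Keep the variable names matching the pattern, then strip the fixed
# strings at the start and end to recover the file names.
inputs := $(sort $(patsubst INPUT-%-sha256,%,$(filter INPUT-%-sha256,$(.VARIABLES))))

all: ; @echo $(inputs)
EOF

inputs=$(make -s -f "$mk")
echo "$inputs"    # prints: abc.fits xyz.txt
```

Adding a new 'INPUT-...-sha256' line is then enough for the file name to
appear in the list, with no further edits to the rule.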
Steps you need to take:
- INPUTS.conf: translate your old format to the new format (after
carefully reading the description in the comments at the start of the
file). After applying the new standards, you don't need to use the
variables of 'INPUTS.conf' directly in your Makefiles! For example if
one of your input datasets is called 'abc.fits', the checksum variable
will be 'INPUT-abc.fits-sha256' and in your high-level Makefiles, you
can simply set '$(indir)/abc.fits' as a prerequisite (like you probably
did already).
- reproduce/analysis/make/download.mk: for the definition and rule of
'inputdatasets', simply use the Maneage branch, and remove anything you
had added in your project.
In the process, I also noticed that 'README-hacking.md' still referred to
'master' as the main project branch, while we have used 'main' in the paper
(which is also the common convention with Git).
|
|
Until now, the definition of the prepare directory was wrong (not in the
'analysis' directory of the build directory). I noticed this after an
update of the Maneage branch of one project that requires the prepare step.
With this commit, this problem has been fixed.
|
|
Until now, the 'double dash' (i.e. \texttt{--}) in the default 'paper.tex'
would only print one (longer) dash in the output pdf.
With this commit, the double dashes are replaced with '-{}-' in the LaTeX
source as a workaround suggested by Stefan Kottwitz in [1].
[1] https://latex.org/forum/viewtopic.php?f=44&t=4670&start=0
|
|
This commit primarily affects the configuration step of Maneage'd projects,
and in particular, updates the versions of much of the software (see
P.S.). So it shouldn't affect your high-level analysis other than the
version bumps of the software you use (and the software's possibly
improved/changed behavior).
The following software (and thus their dependencies) couldn't be updated as
described below:
- Cryptography: isn't building because it depends on a new
setuptools-rust package that has problems
(https://savannah.nongnu.org/bugs/index.php?61731), so it has been
commented in 'versions.conf'.
- SecretStorage: because it depends on Cryptography.
- Keyring: because it depends on SecretStorage.
- Astroquery: because it depends on Keyring.
This is a "squashed" commit after rebasing a development branch of 60
commits corresponding to a roughly two-month time interval. The following
people contributed to this branch.
- Boudewijn Roukema added all the R software infrastructure and the R
packages, as well as greatly helping in fixing many bugs during the
update.
- Raul Infante-Sainz helped in testing and debugging the build.
- Pedram Ashofteh Ardakani found and fixed a bug.
- Zahra Sharbaf helped in testing and found several bugs.
A description of the most noteworthy points is given below.
- Software tarballs: all updated software now have a unified format
tarball (ustar; if not possible, pax) and unified compression (Lzip) in
Maneage's software repository in Zenodo
(https://doi.org/10.5281/zenodo.3883409). For more on this, see
https://savannah.nongnu.org/task/?15699 . This won't affect any extra
software you would like to add; you can use any format recognized by
GNU Tar, and all common compression algorithms. This new requirement is
only for software that get merged to the core Maneage branch.
- Metastore (and thus libbsd and libmd) moved to highlevel: Metastore
(and the packages it depends on) is a high-level product that is only
relevant during the project development (like Emacs!): when the user
wants the file meta data (like dates) to be unchanged after checking
out branches. So it should be considered a high-level software, not
basic. Metastore also usually causes many more headaches and error
messages, so personally, I have stopped using it! Instead I simply
merge my branches in a separate clone, then pull the merge commit: in
this way, the files of my project aren't re-written during the checkout
phase and therefore their dates are untouched (which can conflict with
Make's dates on configuration files).
- The un-official cloned version of Flex (2.6.4-91 until this commit) was
causing problems in the building of Netpbm, so with this commit, it has
been moved back to version 2.6.4.
- Netpbm's official page had version 10.73.38 as the latest stable
tarball that was just released in late 2021. But I couldn't find our
previously-used version 10.86.99 anywhere (to see when it was released
and why we used it! It's at least more than one year old!). So the
official stable version is being used now.
- Improved instructions in 'README.md' for building software environment
in a Docker container (while having project source and output data
products on the local system; including the usage of the host's
'/dev/shm' to speed up temporary operations).
- Until now, the convention in Maneage was to put eight SPACE characters
before the comment lines within recipes. This was done because by
default GNU Emacs (also many other editors) show a TAB as eight
characters. However, in other text editors, online browsers, or even
the Git diff, a TAB can correspond to a different number of
characters. In such cases, the Maneage recipes wouldn't look too
interesting (the comments and the recipe commands would show a
different indentation!).
With this commit, all the comment lines in the Makefiles within the
core Maneage branch have a hash ('#') as their first character and a
TAB as the second. This allows the comment lines in recipes to have the
same indentation as code, making the code much easier to read in a
general scenario including a 'git diff' (editor agnostic!).
P.S. List of updated software with their old and new versions
- Software with no version update is not mentioned.
- The old version of newly added software is shown with '--'.
Name (Basic)             Old version     New version
------------             -----------     -----------
Bzip2                    1.0.6           1.0.8
CURL                     7.71.1          7.79.1
Dash                     0.5.10.2        0.5.11.5
File                     5.39            5.41
Flock                    0.2.3           0.4.0
GNU Bash                 5.0.18          5.1.8
GNU Binutils             2.35            2.37
GNU Coreutils            8.32            9.0
GNU GCC                  10.2.0          11.2.0
GNU M4                   1.4.18          1.4.19
GNU Readline             8.0             8.1.1
GNU Tar                  1.32            1.34
GNU Texinfo              6.7             6.8
GNU diffutils            3.7             3.8
GNU findutils            4.7.0           4.8.0
GNU gmp                  6.2.0           6.2.1
GNU grep                 3.4             3.7
GNU gzip                 1.10            1.11
GNU libunistring         0.9.10          1.0
GNU mpc                  1.1.0           1.2.1
GNU mpfr                 4.0.2           4.1.0
GNU nano                 5.2             6.0
GNU ncurses              6.2             6.3
GNU wget                 1.20.3          1.21.2
Git                      2.28.0          2.34.0
Less                     563             590
Libxml2                  2.9.9           2.9.12
Lzip                     1.22-rc2        1.22
OpenSSL                  1.1.1a          3.0.0
Patchelf                 0.10            0.13
Perl                     5.32.0          5.34.0
Podlators                --              4.14
Name (Highlevel)         Old version     New version
----------------         -----------     -----------
Apachelog4cxx            0.10.0-603      0.12.1
Astrometry.net           0.80            0.85
Boost                    1.73.0          1.77.0
CFITSIO                  3.48            4.0.0
Cmake                    3.18.1          3.21.4
Eigen                    3.3.7           3.4.0
Expat                    2.2.9           2.4.1
FFTW                     3.3.8           3.3.10
Flex                     2.6.4-91        2.6.4
Fontconfig               2.13.1          2.13.94
Freetype                 2.10.2          2.11.0
GNU Astronomy Utilities  0.12            0.16.1-e0f1
GNU Autoconf             2.69.200-babc   2.71
GNU Automake             1.16.2          1.16.5
GNU Bison                3.7             3.8.2
GNU Emacs                27.1            27.2
GNU GDB                  9.2             11.1
GNU GSL                  2.6             2.7
GNU Help2man             1.47.11         1.48.5
Ghostscript              9.52            9.55.0
ICU                      --              70.1
ImageMagick              7.0.8-67        7.1.0-13
Libbsd                   0.10.0          0.11.3
Libffi                   3.2.1           3.4.2
Libgit2                  1.0.1           1.3.0
Libidn                   1.36            1.38
Libjpeg                  9b              9d
Libmd                    --              1.0.4
Libtiff                  4.0.10          4.3.0
Libx11                   1.6.9           1.7.2
Libxt                    1.2.0           1.2.1
Netpbm                   10.86.99        10.73.38
OpenBLAS                 0.3.10          0.3.18
OpenMPI                  4.0.4           4.1.1
Pixman                   0.38.0          0.40.0
Python                   3.8.5           3.10.0
R                        4.0.2           4.1.2
SWIG                     3.0.12          4.0.2
Util-linux               2.35            2.37.2
Util-macros              1.19.2          1.19.3
Valgrind                 3.15.0          3.18.1
WCSLIB                   7.3             7.7
Xcb-proto                1.14            1.14.1
Xorgproto                2020.1          2021.5
Name (Python)            Old version     New version
-------------            -----------     -----------
Astropy                  4.0             5.0
Beautifulsoup4           4.7.1           4.10.0
Beniget                  --              0.4.1
Cffi                     1.12.2          1.15.0
Cryptography             2.6.1           36.0.1
Cycler                   0.10.0          0.11.0
Cython                   0.29.21         0.29.24
Esutil                   0.6.4           0.6.9
Extension-helpers        --              0.1
Galsim                   2.2.1           2.3.3
Gast                     --              0.5.3
Jinja2                   --              3.0.3
MPI4py                   3.0.3           3.1.3
Markupsafe               --              2.0.1
Numpy                    1.19.1          1.21.3
Packaging                --              21.3
Pillow                   --              8.4.0
Ply                      --              3.11
Pyerfa                   --              2.0.0.1
Pyparsing                2.3.1           3.0.4
Pythran                  --              0.11.0
Scipy                    1.5.2           1.7.3
Setuptools               41.6.0          58.3.0
Six                      1.12.0          1.16.0
Uncertainties            3.1.2           3.1.6
Wheel                    --              0.37.0
Name (R)                 Old version     New version
--------                 -----------     -----------
Cli                      --              2.5.0
Colorspace               --              2.0-1
Cowplot                  --              1.1.1
Crayon                   --              1.4.1
Digest                   --              0.6.27
Ellipsis                 --              0.3.2
Fansi                    --              0.5.0
Farver                   --              2.1.0
Ggplot2                  --              3.3.4
Glue                     --              1.4.2
GridExtra                --              2.3
Gtable                   --              0.3.0
Isoband                  --              0.2.4
Labeling                 --              0.4.2
Lifecycle                --              1.0.0
Magrittr                 --              2.0.1
MASS                     --              7.3-54
Mgcv                     --              1.8-36
Munsell                  --              0.5.0
Pillar                   --              1.6.1
R-Pkgconfig              --              2.0.3
R6                       --              2.5.0
RColorBrewer             --              1.1-2
Rlang                    --              0.4.11
Scales                   --              1.1.1
Tibble                   --              3.1.2
Utf8                     --              1.2.1
Vctrs                    --              0.3.8
ViridisLite              --              0.4.0
Withr                    --              2.4.2
|
|
As part of Commit 87b510bc, an Emacs spell check was run on the
paper. However, during the process, the Jupyter add-on name 'nbextensions'
was mistakenly "corrected" to "extension's"! With this commit, it has been
restored to its correct name.
The commit message was edited to add more clarity/context. Also, Florian's
name has been added in the acknowledgments by Mohammad.
|
|
This commit provides a hack/correction to the unwrapped GCC source files
that sym-links the generic file 'libgcc/unwind-generic.h' to the two
directories in which a file includes "unwind.h" or <unwind.h>. The aim is
that the gcc compilation system uses this header file from the internal gcc
source files instead of searching for a system-level file 'unwind.h'.
This commit also unaliases two 'ls' commands in some build recipes of
'basic.mk' in case the host system (normally at user level) has aliased the
command to something like 'ls -F'. In the situation that sometimes occurs
of library files being given executable status, the '-F' decorative option
could lead to an asterisk being included in a string that is not expected
to contain asterisks. If the system shell does not contain the 'alias'
command at all, then a fallback of 'true' should provide safe
behaviour. The notation of the 'sed' command is also clarified.
This solves bug #61240: https://savannah.nongnu.org/bugs/index.php?61240
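The effect of the '-F' decoration, and the guarded 'unalias' call, can be
sketched as follows. The file name 'libdemo.so' is made up and the
demonstration runs in a scratch directory; the exact commands in
'basic.mk' may differ. Note that non-interactive shells usually do not
expand aliases at all, so here '-F' is passed explicitly to show its
effect.

```shell
# Scratch directory with a "library" that has executable permission.
demo=$(mktemp -d)
touch "$demo/libdemo.so"
chmod +x "$demo/libdemo.so"

# What an alias such as 'ls -F' would print: note the trailing asterisk.
decorated=$(ls -F "$demo")

# Remove any 'ls' alias. The '|| true' fallback keeps the exit status at
# zero when no such alias exists or 'unalias' is unavailable.
unalias ls 2> /dev/null || true
plain=$(ls "$demo")

echo "$decorated $plain"    # prints: libdemo.so* libdemo.so
```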
|
|
Until now, the 'RPATH' variable (specifying where to look for shared
libraries) wasn't being set in the 'libcrypto' library of OpenSSL (it was
only set for the 'libssl' library).
Also, Gettext used the host Emacs for some operations during installation
that could cause the following crash (because we are giving priority to
local libraries, which the host Emacs doesn't recognize):
emacs: /BDIR/libcrypto.so.1.1: version `OPENSSL_1_1_1b' not found
(required by /lib64/libk5crypto.so.3)
With this commit both these bugs have been fixed: 1) Patchelf is run on the
'libcrypto' library also and 2) we pass the '--without-emacs' configuration
option to the configure script of Gettext.
These bugs were found by Elham Saremi.
|
|
Antonio kindly proposed these corrections (mostly in Appendix A, but one
also at the start of Appendix B). They are fixed with this commit.
|
|
While looking at the affiliations, I noticed that "France" was missing in
my Lyon affiliation! Also, for both Boud and myself it was necessary to put
a '.' after 'Univ' because it's short for University and not a full word.
|
|
On systems that allow it (like GNU/Linux systems), Maneage will build the
necessary software in shared memory (a directory that is actually in the
RAM, not on an SSD/HDD; on GNU/Linux systems, it is '/dev/shm'). This
allows Maneage to operate faster and not harm the HDD/SSD with all the
temporary writing of many small files.
Until now, we would only check that this directory exists and that it has
enough space. However, some systems also set the 'noexec' flag on shared
memory for security reasons [1]. This causes Maneage to crash upon building
of the software in later phases.
With this commit, at the very start of the configuration step, and after
all other shared-memory checks are done, a dummy executable script file is
created there and its execution is tested. If it doesn't work, shared
memory will not be used at all.
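A minimal sketch of such a test is shown below. It follows the description
above (write a dummy script, try to run it, fall back if that fails), but
the function name 'shm_is_executable' and the script name are hypothetical
and the code is not copied from the configure script.

```shell
# Hypothetical helper: succeed only if a script placed in the given
# directory can actually be executed (fails under 'noexec').
shm_is_executable() {
    dir="$1"
    [ -d "$dir" ] || return 1
    script="$dir/exec-test-$$.sh"
    printf '#!/bin/sh\nexit 0\n' 2> /dev/null > "$script" || return 1
    chmod +x "$script"
    if "$script" 2> /dev/null; then result=0; else result=1; fi
    rm -f "$script"
    return $result
}

if shm_is_executable /dev/shm; then
    echo "shared memory can be used for building"
else
    echo "shared memory will not be used"
fi
```

The dummy script is removed in both cases, so the test leaves nothing
behind in the shared-memory directory.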
In the process, the steps dealing with the software building directory in
the configure script have been brought in one place and comments were added
to further clarify every step.
This commit was initially done by Boud Roukema and later edited by Mohammad
Akhlaghi.
[1] https://web.archive.org/web/20210624192819/https://serverfault.com/questions/72356/how-useful-is-mounting-tmp-noexec
|
|
While having a fast glance at Appendix A, I noticed two small parts that
could be improved by adding a 'from' and using 'Maneage' instead of "the
proposed solution". They are corrected with this commit.
|
|
I just(!) noticed that in the CiSE version of the paper, they replaced the
"Towards" (first word in the title) with "Toward" (removing the
's'). According to the thorough history provided by the Merriam-Webster
dictionary [1], the difference is mainly because of US/British English.
Also, they have slightly changed the capitalizations of the "long-term"
phrase, from "Long-term" that we had initially used to "Long-Term". I have
no particular opinion on this and accept their judgement.
To keep things in line with the published paper, I am correcting both these
issues in our version of the paper also (that will later go in arXiv).
[1] https://www.merriam-webster.com/words-at-play/toward-towards-usage
|
|
This commit changes the rather confused sentence ending "is, thus, not any
the less valuable as itself" to "often as valuable as the result
itself". This clarifies the intended meaning. The error was unfortunately
missed by the proofreaders of our article.
|
|
In the old versions of this paper, the two components of Figure 1 were
under each other, so we referred to them as "top" and "bottom"! However, we
later put them beside each other (by shrinking the data graph), so they
became "left" and "right".
I just noticed that within the main body of the text, in one place, we were
still mistakenly saying "bottom"! So with this commit, it has been changed
to "right".
Unfortunately this has gone into the final publication on CiSE, but it is
important to fix such minor issues anyway (the good thing with having a Git
history!). We also haven't yet put the final upload on arXiv.
|
|
In the discussion on criteria that Popper lacks, the last mentioned
criterion "including the narrative" is written in such a way that can
confuse readers into thinking that only a single criterion is
lacking. Hyphenating ('including-the-narrative') has been applied to make
the sentence less likely to be misunderstood.
The ending of the first paragraph in the "Generational gaps" item in
Appendix A.G ("... every few years is not practically possible.") sounds
like "not almost possible". So it can cause confusion. Endings that are
much clearer include:
* is impractical.
* is not possible in practice.
* is not practical.
* is not possible practically.
[meaning 2. is less likely in this case]
I've selected the first option, also replacing "they" by "scientists" to
avoid the misinterpretation that "programming languages ... have their own
science field to focus on".
This commit and the previous one were "amended" by Mohammad (compared to
the original commits that Boud had sent).
|
|
This commit does several small copyediting fixes in the body of the
appendices, which should improve their readability.
|
|
Based on a reasonable suggestion on ethical reasoning [1], this commit
replaces the git.sdf.org + archive.today pair of URLs in the footnote on
Github's unethical aspects, with a single archive.org URL, which contains
the original URL, making this sufficient for readers wishing to check
either the live or archived versions.
[1] https://social.privacytools.io/@resist1984/106403926114506533
https://social.privacytools.io/@resist1984/106403932399114639
|
|
This commit adds a few sentences in relation to the first known attempt to
store and make available git repository hosting ephemera (GHTorrent,
introduced to us by Roberto Di Cosmo). Since one of the two sponsors of
GHTorrent is Microsoft, both the ethics and practical aspects of this in
the context of reproducibility and scientific ethics as expressed by the
international scientific community are rather unclear, so a link to one of
the well-known lists of practical and ethical issues with Github is
included.
A minor fix is made in 'tex/src/appendix-existing-solutions.tex', since the
word 'data' is plural (singular is 'datum').
|
|
This is the version of the project that will be published in Computing in
Science and Engineering (CiSE), Volume 23, Issue 3, Pages 82--91.
|
|
After going through Boud's corrections and edits in the previous commit, I
thought some minor clarifications would be necessary, and they are
implemented in this commit.
Also, in preparation for submission to the journal, the top-level software
heritage ID has been corrected to the latest commit on Software Heritage.
|
|
This commit makes several copyediting changes to the appendices and to the
supplement.tex introduction to the appendices.
The unofficially increased ArXiv upload limit of 50 Mb comes from a tweet:
https://nitter.fdn.fr/arxiv/status/1286381643893268483 (archive:
https://archive.today/PdxhT) but is not listed on official ArXiv pages. So it
seems safer not to quote a value. The very old value was 0.5 Mb - out of
respect to people with low bandwidth, especially scientists in poor
countries. Tweets are generally not acceptable as "reliable sources" in
en.Wikipedia.
|
|
David suggested some minor edits that are now implemented (most
importantly that he would not like to be associated with an ORCID ID).
I also "saved" a new Zenodo DOI for the final submission of this paper to
Zenodo, but "after" obtaining the page number information and other minor
things.
|
|
Until now the appendix only touched upon the archival aspects of scholarly
research products (data, code, narrative). To help in clarity, the context
of this section has been improved, giving more explanations and examples.
|
|
After Boud posted a notice about Maneage in an online forum [1], Rémi
Rampin and Vicky Rampin (from the ReproZip project) replied with some notes
about our review of ReproZip in Appendix B. We are very grateful to both
Rémi and Vicky for looking into it and for their comments; their
contribution has been gratefully acknowledged with this commit.
The relevant comments are listed below and have been addressed in this
commit (see the 'diff' of this commit).
- [Rémi Rampin] ReproZip can capture the build step if you want it to,
it's just another command. So if you want to trace "make" and "pip
install" etc before tracing your actual experiment, you will have all
that build information.
- [Rémi Rampin] Bundle size is easily fixed by not putting terabyte-sized
data in the bundle, which is done by editing a simple configuration
file.
- [Vicky Rampin] Not all the files in the bundle are compiled/binary files
[in relation to the old sentence "ReproZip just copies the
binary/compiled files used in a project"].
[1] https://framapiaf.org/@boud/106296894758145705
|
|
This commit updates some of the publication data
in README-hacking.md : Peper+Roukema (2021) is now
published in MNRAS and Akhlaghi+ (2021) is published online and
very close to getting a conventional volume and page number. :)
See task
https://savannah.nongnu.org/task/?15736
for ideas of how to make a more systematic publication
list instead of one managed by prose text. There
are already too many non-automated places for publication
lists where we have to copy/paste our publication data again
and again and again and ...
This commit also adds the softwareheritage ID that we have in the
content of Akhlaghi+2021 (without the extra context, because as a
URL that's very long). There are plenty of arguments to be made
each way for different versions of the swh IDS. One advantage of
the 'rev' ID is that the hash is the original (full) git hash,
which is what I've done for the elaphrocentre and subpoisson
papers.
|
|
Once a year, the texlive update system becomes incompatible with the
version from the previous year. Since a texlive install failure is
considered non-fatal by 'high-level.mk', until now the user could miss
the printed message and mistakenly believe that the configure is valid.
This commit explicitly adds a 10-second delay that should be enough for a
user who does the 'configure --existing-conf' step alone to notice that
there is a TeX Live problem. It also adds the explicit instruction of how
to allow an update from an earlier year's texlive installer to the warning
message (by deleting '.build/software/tarballs/install-tl-unx.tar.gz'). I
had to rediscover this a few times for old Maneage installs.
Also, a few lines in 'reproduce/software/shell/configure.sh' were indented
with a TAB (that is not recommended because TAB is displayed with different
widths on different browsers). So while doing this commit, those TABs were
also converted to a space.
|
|
This commit contains minor fixes in Appendix B.
ReproZip: As Vicky Rampin points out [1], ReproZip typically also
includes non-binary files, so I removed "just" and improved
the wording.
Popper: the Popper URL that we gave is obsolete; at Wayback
Machine it redirects to getpopper.io [2], so I've updated this;
and I've fixed up the wording ('off of' only exists in US
English).
[1] https://octodon.social/@VickyRampin/106298214313216228
[2] https://web.archive.org/web/20210425223605/http://falsifiable.us/
|
|
This commit adds a few extremely brief and incomplete paragraphs
on archiving, including URLs, as what is now subsection D of
Appendix A.
|
|
A few days ago, CiSE gave us a proof of the edited text and formatted
PDF. After comparing the edited text with our text, I noticed some minor
editorial issues that have been corrected in this commit. The parts that
were wrong (or could be improved in the proof) have been listed and will be
submitted to the journal.
In particular, following the recommendation from the editor, the
biographies were extended with a full listing of each author's affiliation.
I also added our ORCID IDs in the biographies.
|
|
Until now, the paragraph implied that the 'n2t.net' link is the
only way to access SWHIDs. Also, context/content duality wasn't too clear
in the end where I had mentioned to click on the digital format SWHID.
With this commit, I tried to edit it and avoid these two sources of
confusion.
|
|
The most basic way to resolve a Software Heritage identifier (SWHID) is to
prefix it with 'https://archive.softwareheritage.org'. However, Roberto Di
Cosmo informed me that SWHIDs are also resolved by 'n2t.net' and
'identifiers.org'.
With this commit, on the first occurrence of an SWHID, I added some
explanation of how to resolve it by adding 'http://n2t.net' (since it was
the shorter option).
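The prefixing can be sketched as a small shell function. The function name
'resolve_swhid' is hypothetical, the SWHID below is an all-zero placeholder
(not a real identifier), and the 'https' scheme used for 'identifiers.org'
is an assumption.

```shell
# Hypothetical helper: build a resolvable URL from a SWHID by adding the
# prefix of one of the resolvers mentioned above.
resolve_swhid() {
    swhid="$1"
    resolver="${2:-archive}"
    case "$resolver" in
        archive)     echo "https://archive.softwareheritage.org/$swhid" ;;
        n2t)         echo "http://n2t.net/$swhid" ;;
        identifiers) echo "https://identifiers.org/$swhid" ;;  # scheme assumed
        *)           return 1 ;;
    esac
}

# Placeholder identifier, only to show the format.
id="swh:1:dir:0000000000000000000000000000000000000000"
url_archive=$(resolve_swhid "$id")
url_n2t=$(resolve_swhid "$id" n2t)
echo "$url_archive"
echo "$url_n2t"
```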
Some further minor edits were made:
- In the manuscript submission information, instead of "published on
IEEE", I wrote "first published online". The journal name is available
on the top of every page and doesn't include "IEEE", so this hopefully
avoids some confusion for people who don't know CiSE is published by
IEEE.
- The URL with the link to Ubuntu images was moved to footnotes to help
the readability and better typesetting of the paragraph. A minor edit
was then made in that paragraph to shrink the paragraph by two words
that had occupied a whole line in its end.
- The first comment line in the second listing (Git commands to start a
new branch from Maneage) was slightly edited to avoid the term 'main'
(which could be confused with the branch name after 'git checkout -b
main').
- In the acknowledgements, the paragraph on Maneage commit/branch
information was moved at the top so the people and institutions are
acknowledged immediately after each other.
- Some minor edits were made in the Spanish acknowledgements to fit with
new project names.
|
|
Until now, the SWHIDs were not accessible in the print version of the
paper: they were only hidden as hyper-links within the PDF for readers to
click on. This is not a robust way to use the fruits of Software Heritage
and was kindly highlighted by Roberto Di Cosmo (principal investigator of
Software Heritage) after a first look at the paper.
With this commit, following the recommendation of Roberto, all the URLs are
corrected to print the raw SWHID as a footnote (for example
'swh:1:dir:...', for directories, or 'swh:1:cnt:...', for
contents/files). The click-able link of the SWHID also contains the context
(for example "origin", etc).
In the process I noticed that the paper submission/acceptance info was not
filled and was also a footnote (which would not be seen if not cited). So
this information (received, accepted and published on IEEE) is now placed
just under the author list on the first page heading.
|
|
Until now, while the series of steps mentioned in 'README.md' were
complete, they had some implicit steps in them that made them a little hard
to run as a checklist (the commands to do some basic things weren't
included). Also, it recommended running a long 'docker run ...'
command, which wasn't too user friendly.
With this commit, the series of steps is now a complete checklist,
containing every step. Also, the checklist now recommends putting the long
'docker run' command inside a script called 'docker-run' that will also do
a 'sudo' internally (thus making things very easy for a first-time user).
Also, since the 'docker-run' script contains host OS-specific directory
names, it should not be under version control, so it has been added to the
'.gitignore' file in case users decide to keep this same name (which is
recommended).
|
|
The DOI of the paper has been minted by IEEE, so as a step to finalize this
paper, it has been added to the README.md and the header of all PDF
pages. Along with the DOI in the header, the arXiv and Zenodo links are
also added to the header (they are small, and won't disturb the reading).
|
|
Some minor conflicts (all expected from the commit messages in the Maneage
branch) occurred but were easily fixed.
|
|
Summary:
- Use the new name of this variable in your Makefiles.
- In 'metadata.conf', remove fixed URL prefixes for DOIs
('https://doi.org/') or arXiv ('https://arxiv.org/abs').
Until now, the Make variable that would print the general metadata (of the
whole project) into each to-be-published dataset was called
'print-copyright'! But it now does much more than simply printing the
copyright: it also prints a lot of metadata like the arXiv ID, Zenodo DOI,
etc. into plain-text outputs. The outdated name could thus be misleading
and cause confusion.
With this commit, the variable is therefore called
'print-general-metadata'. After merging your project with the Maneage
branch, please replace any usage of 'print-copyright' with
'print-general-metadata'.
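The replacement described above can be sketched with standard shell tools.
This is only an illustration under assumptions: the scratch directory, the
'demo.mk' file and its rule (including the '$(call ...)' usage) are
hypothetical, and in a real project you would point 'find' at your own
Makefiles instead.

```shell
# Hypothetical demo: work in a scratch directory so nothing real is touched.
demo_dir=$(mktemp -d)

# A made-up Makefile fragment that still uses the old variable name.
printf 'out.txt: in.txt\n\t$(call print-copyright, $@) > $@\n' \
       > "$demo_dir/demo.mk"

# Rename the variable in every '*.mk' file under the directory.
# 'sed -i.bak' is accepted by both GNU and BSD sed; it leaves '.bak' backups.
find "$demo_dir" -type f -name '*.mk' \
     -exec sed -i.bak 's/print-copyright/print-general-metadata/g' {} +

# Remove the backups once the result has been checked.
find "$demo_dir" -type f -name '*.mk.bak' -exec rm -f {} +

# Show the updated file.
cat "$demo_dir/demo.mk"
```

It may be safer to review the changes with 'git diff' before committing
them, since a plain text substitution will also touch comments that mention
the old name.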
Also with this commit, 'README-hacking.md' mentions 'metadata.conf' and
'print-general-metadata' in the "Publication checklist" section and reminds
you to keep the first up to date, and use the second in your
to-be-published datasets.
|
|
In the project's 'metadata.conf', we also have an option to store the
journal DOI of the project (that will later be printed in the output file
products). So now that the paper's DOI has been set by the journal, it was
time to add it in the project too.
While looking at the usage of the metadata, I noticed that the "Publication
checklist" of 'README-hacking.md' didn't talk about it. In fact, the part
about putting metadata went into a lot of detail without even mentioning
the generic 'print-general-metadata' variable (previously called
'print-copyright') that is created in 'initialize.mk'. So I removed those
extra points and just recommended using this variable for plain-text files
and putting similar info in other formats.
Some other minor changes were made:
- The metadata now doesn't need the fixed 'https://doi.org/' prefix (to
make it consistent with the arXiv identifier). Inside 'initialize.mk',
there are now two variables called 'doi-prefix-url' and
'arxiv-prefix-url' that contain the fixed prefix.
- The 'print-copyright' name was clearly outdated for all the extra
metadata that this variable created (including the copyright). So its
name was changed to 'print-general-metadata'.
The generic Maneage changes will be taken into Maneage after this (they
were tested here).
|
|
In the previous commit, I had forgotten to pass '-f' to 'git add'! This is
necessary because '.txt' files are set to be ignored in Git by default
(they are listed in '.gitignore').
With this commit, this file is now added into the project history.
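The behavior behind this fix can be reproduced in a scratch repository. The
sketch below is only an illustration: the repository location and the
'notice.txt' file name are hypothetical and are not the file that was
actually added in this project.

```shell
# Hypothetical demo in a throw-away repository.
repo=$(mktemp -d)
git -C "$repo" init -q

# Ignore all '.txt' files, then create one ('notice.txt' is a made-up name).
echo '*.txt' > "$repo/.gitignore"
echo 'example content' > "$repo/notice.txt"

# A plain 'git add' refuses an ignored path and exits with a non-zero status.
git -C "$repo" add notice.txt 2>/dev/null \
    || echo "plain 'git add' skipped the ignored file"

# '-f' (--force) stages the file despite the ignore rule.
git -C "$repo" add -f notice.txt

# List the staged/tracked files.
git -C "$repo" ls-files
```

Once a file has been force-added, Git keeps tracking it, so later changes
to it no longer need '-f'.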
|
|
The email notice of the final acceptance of this paper in CiSE has been
included in the project and the stylistic points that were raised by the
editor in chief (EiC) have also been implemented. The most important points
were:
- Citations should not be part of the sentence structure (they should be
treated as if they were footnotes), so things like "see \cite{...}" had
to be changed.
- Hyperlinks should be printed as footnotes (because the journal actually
gets printed).
Also, to avoid the second listing breaking between pages, it has been moved
to after the next paragraph.
|
|
Being immutable doesn't necessarily mean that something is always present,
so an "always present" was also added to the reason we recommend a Git
hash. The end of the sentence was also slightly shortened to allow the
extra few words.
The re-wording of the conclusion of Active papers was great! I just
changed the "likely" to "possible", because as Konrad mentioned in Commit
a63900bc5a8, he is now using Guix.
|
|
These are minor last minute copyedits for recently added text,
e.g. a git hash is not literally a timestamp.
|
|
Roberto has recently moved to a new position as professor in the
Universidad Internacional de La Rioja. With this commit, his short bio and
email address have thus been updated in the main paper to reflect this.
|
|
Until now, we were primarily linking people to the Gitlab fork of this
paper. However, since this paper is part of Maneage, its main repository is
on Maneage's own server at http://git.maneage.org/paper-concept.git
With this commit, therefore, all the gitlab.com URLs have been corrected to
our own Git server.
While looking into Git-related points, I also noticed that in the demo code
listing showing how to clone Maneage and start a new project, we were using
Git's old/deprecated 'master' name. Git (and almost all common
repositories) now use 'main' as the default branch name, so this has also
been corrected here.
|
|
I attended one of Peter Wittenburg's talks in the context of RDA on the
Canonical Workflow Frameworks for Research (CWFR). Afterwards I got in
touch with him about Maneage and this paper. He kindly read the paper and
was very supportive of it, with positive/encouraging feedback.
It was thanks to that discussion that I added CWFR in the discussion (in
the previous commit). But since that commit was focused on IAA's
suggestions, I am acknowledging Peter here.
|