author | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2021-04-09 01:08:31 +0100
---|---|---
committer | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2021-04-09 02:00:18 +0100
commit | a63900bc5a83052081e6ca6bcc0a2bb4ee5a860e |
tree | 15c7dcdff040b4c60110547de71d08ff0f5fadd0 /reproduce/src |
parent | 55d6570aecc5f442399262b7faa441d16ccd4556 |
Comments by Konrad Hinsen implemented
Konrad has kindly gone through the paper and the appendices and given very
good feedback, which is now being addressed in the paper (thanks a lot
Konrad!):
- IPOL now also allows Python code, so the relevant parts of the
description of IPOL have been updated. To address the dependency issue, I
also added a sentence noting that only certain dependencies (with
specific versions) are accepted.
- On ActivePapers (AP, which was written by Konrad), corrections were made
based on the following parts of his comments:
- "The fundamental issue with ActivePapers is its platform dependence on
either Java or Python, neither of which is attractive."
- "The one point which is overemphasized, in my opinion, is the necessity
to download large data files if some analysis script refers to it. That
is true in the current implementation (which I consider a research
prototype), but not a fundamental feature of the approach. Implementing
an on-demand download strategy is not particularly complicated, it just
needs to be done, and it wasn't a priority for my own use cases."
- "A historical anecdote: you mention that HDF View requires registering
for download. This is true today, but wasn't when I started
ActivePapers. Otherwise I'd never have built on HDF5. What happened is
that the HDF Group, formerly part of NCSA and thus a public research
infrastructure, was turned into a semi-commercial entity. They have
committed to keeping the core HDF5 library Open Source, but not any of
the tooling around it. Many users have moved away from HDF5 as a
consequence. The larger lesson is that Richard Stallman was right: if
software isn't GPLed, then you never know what will happen to it in the
future."
- On Guix, some further clarification was added to address Konrad's quote
below (with a link to the blog post mentioned there). In short, I
clarified that the extra step I meant is storing the Guix commit hash
alongside every respective high-level analysis change.
- "I also looked at the discussion of Nix and Guix, which is what I am
mainly using today. It is mostly correct as well, the one exception
being the claim that 'it is up to the user to ensure that their created
environment is recorded properly for reproducibility in the
future'. The environment is *recorded* in all detail,
automatically. What requires some effort is extracting a human-readable
description of that environment. For Guix, I have described how to do
this in a blog post
(https://guix.gnu.org/en/blog/2020/reproducible-computations-with-guix/),
and in less detail in a recent CiSE paper
(https://hal.archives-ouvertes.fr/hal-02877319). There should
definitely be a better user interface for this, but it's no more than a
user interface issue. What is pretty nice in Guix by now is the user
interface for re-creating an environment, using the "guix time-machine"
subcommand."
- The sentence on Software Heritage being based on Git was reworded in
response to this comment from Konrad: "The plural sounds quite
optimistic. As far as I
know, SWH is the only archive of its kind, and in view of the enormous
resources and long-time commitments it requires, I don't expect to see a
second one."
- When introducing hashes, Konrad suggested the following useful paper that
shows how they are used in content-based storage:
DOI:10.1109/MCSE.2019.2949441
- On Snakemake, Konrad had the following comment: "[A system call in Python
is] No slower than from bash, or even from any C code. Meaning no slower
than Make. It's the creation of a new process that takes most of the
time." So the point was shifted to the many levels of quoting needed for
calling external programs from Python, and to Snakemake being best suited
for Python-based projects.
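To make the Snakemake quoting point concrete, here is a minimal sketch
(assuming a POSIX shell and Python 3.8+ for `shlex.join`; the file name is
hypothetical, and a second Python interpreter stands in for an arbitrary
external program). Every call below creates a new process, which, as Konrad
notes, is where most of the time goes; what differs is how much quoting the
caller has to manage.

```python
import shlex
import subprocess
import sys

# Hypothetical argument containing a space and a single quote: the kind
# of value that must be quoted carefully when it passes through a shell.
fname = "my data's file.txt"

# A tiny external program (another Python interpreter) that prints its
# first argument, so we can see exactly what it received.
echo_prog = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# 1) Through the shell: the command is a single string, so every argument
#    must be quoted explicitly (shlex.join applies shlex.quote to each).
shell_cmd = shlex.join(echo_prog + [fname])
via_shell = subprocess.run(shell_cmd, shell=True, capture_output=True,
                           text=True, check=True).stdout.rstrip("\r\n")

# 2) Without the shell: arguments are passed verbatim as a list,
#    so no quoting is needed at all.
via_list = subprocess.run(echo_prog + [fname], capture_output=True,
                          text=True, check=True).stdout.rstrip("\r\n")

# 3) Naive string building with no quoting: the shell splits on spaces
#    and hits the unbalanced quote and parentheses, so the call fails.
naive = subprocess.run(" ".join(echo_prog + [fname]), shell=True,
                       capture_output=True, text=True)

print(via_shell == fname, via_list == fname, naive.returncode != 0)
```

Passing a list (option 2) sidesteps the shell's quoting rules entirely; the
string form is only needed when shell features such as pipes are wanted.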
In addition, some minor typos that I found during the process have also
been fixed.
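As a rough illustration of hash-based (content-based) storage, the topic of
the paper Konrad suggested above, here is a minimal Python sketch. It is not
taken from that paper or from Git: the `ContentStore` class is hypothetical
and SHA-256 is an arbitrary choice of hash function.

```python
import hashlib


class ContentStore:
    """Minimal in-memory content-addressed store (illustrative sketch)."""

    def __init__(self):
        self._blobs = {}  # maps hex digest -> stored bytes

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself, so identical data
        # always maps to the same key (automatic deduplication).
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hashing on read detects any corruption of the stored data.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored data does not match its hash")
        return data


store = ContentStore()
k1 = store.put(b"input-data-v1")
k2 = store.put(b"input-data-v1")  # same content -> same key
k3 = store.put(b"input-data-v2")  # changed content -> different key
print(k1 == k2, k1 == k3)  # prints: True False
```

Because keys come from the content, identical inputs are stored once and any
change to the data yields a new key, so a recorded hash pins down exactly
which version of the data was used.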
Diffstat (limited to 'reproduce/src')
0 files changed, 0 insertions, 0 deletions