| author | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2021-04-09 01:08:31 +0100 |
|---|---|---|
| committer | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2021-04-09 02:00:18 +0100 |
| commit | a63900bc5a83052081e6ca6bcc0a2bb4ee5a860e | |
| tree | 15c7dcdff040b4c60110547de71d08ff0f5fadd0 /reproduce/src | |
| parent | 55d6570aecc5f442399262b7faa441d16ccd4556 | |
Comments by Konrad Hinsen implemented
Konrad kindly went through the paper and the appendices and gave very good
feedback, which is now addressed in the paper (thanks a lot Konrad!):
- IPOL now also allows Python code, so the relevant parts of the
  description of IPOL have been updated. To address the dependency issue, I
  also added a sentence noting that only certain dependencies (with certain
  versions) are acceptable.
- On ActivePapers (AP, which is written by Konrad), corrections were made
  based on the following parts of his comments:
  - "The fundamental issue with ActivePapers is its platform dependence on
    either Java or Python, neither of which is attractive."
  - "The one point which is overemphasized, in my opinion, is the necessity
    to download large data files if some analysis script refers to it. That
    is true in the current implementation (which I consider a research
    prototype), but not a fundamental feature of the approach. Implementing
    an on-demand download strategy is not particularly complicated, it just
    needs to be done, and it wasn't a priority for my own use cases."
  - "A historical anecdote: you mention that HDF View requires registering
    for download. This is true today, but wasn't when I started
    ActivePapers. Otherwise I'd never have built on HDF5. What happened is
    that the HDF Group, formerly part of NCSA and thus a public research
    infrastructure, was turned into a semi-commercial entity.  They have
    committed to keeping the core HDF5 library Open Source, but not any of
    the tooling around it. Many users have moved away from HDF5 as a
    consequence. The larger lesson is that Richard Stallman was right: if
    software isn't GPLed, then you never know what will happen to it in the
    future."
- On Guix, some further clarification was added to address Konrad's quote
  below (with a link to the blog post mentioned there). In short, I
  clarified that the extra step I mean is storing the Guix commit hash
  together with each corresponding high-level analysis change.
  - "I also looked at the discussion of Nix and Guix, which is what I am
    mainly using today. It is mostly correct as well, the one exception
    being the claim that 'it is up to the user to ensure that their created
    environment is recorded properly for reproducibility in the
    future'. The environment is *recorded* in all detail,
    automatically. What requires some effort is extracting a human-readable
    description of that environment. For Guix, I have described how to do
    this in a blog post
    (https://guix.gnu.org/en/blog/2020/reproducible-computations-with-guix/),
    and in less detail in a recent CiSE paper
    (https://hal.archives-ouvertes.fr/hal-02877319). There should
    definitely be a better user interface for this, but it's no more than a
    user interface issue. What is pretty nice in Guix by now is the user
    interface for re-creating an environment, using the "guix time-machine"
    subcommand."
- The sentence on Software Heritage being based on Git was reworded to
  address this comment by Konrad: "The plural sounds quite optimistic. As far as I
  know, SWH is the only archive of its kind, and in view of the enormous
  resources and long-time commitments it requires, I don't expect to see a
  second one."
- When introducing hashes, Konrad suggested the following useful paper that
  shows how they are used in content-based storage:
  DOI:10.1109/MCSE.2019.2949441
- On Snakemake, Konrad had the following comment: "[A system call in Python
  is] No slower than from bash, or even from any C code. Meaning no slower
  than Make. It's the creation of a new process that takes most of the
  time." So the point was shifted to the heavy quoting needed when calling
  external programs, and to Snakemake being best suited for Python-based
  projects.
In addition, some minor typos that I found during the process were also
fixed.
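
The content-based storage mentioned in the item on hashes can be
illustrated with a minimal Python sketch. It is not taken from the paper or
from the suggested article: the `ContentStore` class and its `put`/`get`
methods are hypothetical names, the store is only an in-memory dictionary,
and SHA-256 is an assumed choice of hash function. The idea shown is that
the key of a stored object is derived from its bytes, so identical content
always maps to the same key and any change in content yields a different
key.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        # Maps the hex digest of the content to the content itself.
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is computed from the bytes, not chosen by the caller.
        key = hashlib.sha256(data).hexdigest()
        # Storing the same content twice does not create a second entry.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on retrieval to check that the content still matches its key.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its key")
        return data

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
key_a = store.put(b"analysis input, version 1\n")
key_b = store.put(b"analysis input, version 1\n")  # same bytes, same key
key_c = store.put(b"analysis input, version 2\n")  # changed bytes, new key
```

Because the key doubles as a checksum, a reader who has only the key can
verify that the retrieved bytes are the ones originally stored.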
Diffstat (limited to 'reproduce/src')
0 files changed, 0 insertions, 0 deletions
