path: root/tex/src/appendix-existing-solutions.tex
author Boud Roukema <boud@cosmo.torun.pl> 2021-06-08 19:18:19 +0200
committer Mohammad Akhlaghi <mohammad@akhlaghi.org> 2021-06-08 18:33:57 +0100
commit 6f7f00fb3fb4c14c85890bff6bd89485ccc1ff92
tree 5d0cc9429310f82011241d20c6656e5130ea1469 /tex/src/appendix-existing-solutions.tex
parent c1bc4ebfa064f61fb53bc44d221fc059b7e4abfb
Several minor edits, removed exact value of arXiv's size-limit
This commit makes several copyediting changes to the appendices and to the supplement.tex introduction to the appendices. ArXiv's unofficially increased upload limit of 50 Mb comes from a tweet: https://nitter.fdn.fr/arxiv/status/1286381643893268483 (archive: https://archive.today/PdxhT), but it is not listed on official ArXiv pages, so it seems safer not to quote a value. The very old value was 0.5 Mb, out of respect for people with low bandwidth, especially scientists in poor countries. Tweets are generally not acceptable as "reliable sources" in en.Wikipedia.
Diffstat (limited to 'tex/src/appendix-existing-solutions.tex')
-rw-r--r-- tex/src/appendix-existing-solutions.tex 16
1 files changed, 8 insertions, 8 deletions
diff --git a/tex/src/appendix-existing-solutions.tex b/tex/src/appendix-existing-solutions.tex
index 9710df8..58a3518 100644
--- a/tex/src/appendix-existing-solutions.tex
+++ b/tex/src/appendix-existing-solutions.tex
@@ -426,21 +426,21 @@ ReproZip\footnote{\inlinecode{\url{https://www.reprozip.org}}}\citeappendix{chir
The tracking is done at the kernel system-call level, so any file that is accessed during the running of the project is identified.
The tracked files can be packaged into a \inlinecode{.rpz} bundle that can then be unpacked into another system.
-ReproZip is therefore very good to take a ``snapshot'' of the running environment, at one moment into a single file.
+ReproZip is therefore very good for storing a ``snapshot'' of the running environment, at a single moment, into a single file.
However, the bundle can become very large when many/large datasets are involved, or if the software environment is complex (many dependencies).
-Furthermore, since the binary software libraries are directly copied, it can only be re-run on a systems with a similar CPU architecture.
-Furthermore, ReproZip copies all files used in a project, without a way of knowing how the software was built (its provenance).
+Furthermore, since the binary software libraries are directly copied, it can only be re-run on systems with a compatible CPU architecture.
+Another problem is that ReproZip copies all files used in a project, without (by default) a way of knowing how the software was built (its provenance).
-As mentioned in this paper, and also Oliveira et al. \citeappendix{oliveira18} the question of ``how'' the environment was built is critical for understanding the results; simply having the binaries cannot necessarily be useful in many contexts.
-It is possible to include the build instructions of the used software within the project to be ReproZip'd, but that will again simply bloat the bundle due to the many temporary files that are created during the build of a software, add complexity and slow down the project's running time.
+As mentioned in this paper, and also Oliveira et al. \citeappendix{oliveira18}, the question of ``how'' the environment was built is critical to understanding the results; having only the binaries is not useful in many contexts.
+It is possible to include the build instructions of the software used within the project to be ReproZip'd, but this risks bloating the bundle with the many temporary files that are created during the build of the software, adding complexity and slowing down the project's running time.
For the data, it is similarly not possible to extract which data server they came from.
Hence two projects that each use a 1-terabyte dataset will need a full copy of that same 1-terabyte file in their bundle, making long-term preservation extremely expensive.
Such files can be excluded from the bundle through modifications in the configuration file.
-But this will only add complexity since a higher-level script over ReproZip needs to be written to make sure that the data and bundle are used together or check the integrity of the data.
+However, this will add complexity if a higher-level script is written above ReproZip to make sure that the data and bundle are used together or to check the integrity of the data.
-Finally, because it is only a snapshot of one moment in a project's history, preserving the connection between the ReproZip'd bundles of various points in a project's history is not easily possible (for example when software or data are updated, or new analysis methods are used).
-In other words, a ReproZip user will have to define their own subjective archival method to preserve the various black-boxs of their project as it evolves, and tracking what has changed between them is not trivial.
+Finally, because it is only a snapshot of one moment in a project's history, preserving the connection between the ReproZip'd bundles of various points in a project's history is likely to be difficult (for example, when software or data are updated, or when analysis methods are modified).
+In other words, a ReproZip user will have to personally define an archival method to preserve the various black boxes of the project as it evolves, and tracking what has changed between the versions is not trivial.