author Boud Roukema <boud@cosmo.torun.pl> 2021-05-25 18:35:30 +0200
committer Boud Roukema <boud@cosmo.torun.pl> 2021-05-25 18:35:30 +0200
commit 6d83f32451f3765834e631a3ed55ff41d01ebca0 (patch)
tree fb3c2c904d488db3560a4c58d06343c4f8172c7b /tex
parent 6ab5696481366659f0120f05033b5ed7bbd944a8 (diff)
Brief notes on archiving as Appendix A.D
This commit adds a few extremely brief and incomplete paragraphs on archiving, including URLs, as what is now subsection D of Appendix A.
Diffstat (limited to 'tex')
-rw-r--r-- tex/src/appendix-existing-tools.tex 33
1 files changed, 33 insertions, 0 deletions
diff --git a/tex/src/appendix-existing-tools.tex b/tex/src/appendix-existing-tools.tex
index 3aba534..1062aba 100644
--- a/tex/src/appendix-existing-tools.tex
+++ b/tex/src/appendix-existing-tools.tex
@@ -324,6 +324,39 @@ The team can host the Git history on a web page and collaborate through that.
There are several Git hosting services for example \href{http://codeberg.org}{codeberg.org}, \href{http://gitlab.com}{gitlab.com}, \href{http://bitbucket.org}{bitbucket.org} or \href{http://github.com}{github.com} (among many others).
Storing the changes in binary files is also possible in Git, however it is most useful for human-readable plain-text sources.
+
+
+
+
+
+
+
+\subsection{Archiving}
+\label{appendix:archiving}
+
+Long-term, bytewise, checksummed archiving of software research projects is necessary for a project to be reproducible many decades later.
+The Wayback Machine\footnote{\inlinecode{\url{https://archive.org}}} and similar services such as Archive Today\footnote{\inlinecode{\url{https://archive.today}}} provide on-demand long-term archiving of web pages, which is a critically important service for preserving the history of the World Wide Web.
+However, research project software archiving requires the preservation of files and metadata about the files, not of web pages.
+This is commonly done in public research repositories such as Zenodo\footnote{\inlinecode{\url{https://zenodo.org}}}, which publishes md5sums of uploaded files, freezes them as a DOI-identified version of record, and provides convenient maintenance of metadata by the uploading user.
+Universities now regularly provide their own repositories,\footnote{E.g. \inlinecode{\url{https://repozytorium.umk.pl}}} many of which are registered with the \emph{Open Archives Initiative} that aims at repository interoperability.\footnote{\inlinecode{\url{https://www.openarchives.org/Register/BrowseSites}}}
+
+For preserving the full editing records of a software project, \emph{Software Heritage}\citeappendix{dicosmo18} is especially useful.
+Software Heritage allows a user to anonymously nominate the URL of a git (or cvs) commit history of any project and request that it be archived.
+The Software Heritage scripts (themselves free-licensed) download the repository and allow the repository as a whole or individual files to be accessed using a URI.
+
+The {\LaTeX} and figure source files for the final research paper itself are also best archived on a preprint server such as ArXiv\footnote{\inlinecode{\url{https://arXiv.org}}}, which pioneered the archiving of research papers.
+ArXiv recommends that the figures of a research paper are provided in postscript, a plain-text format, to maximise longevity, and (normally) provides the source package and both postscript and pdf formats of the paper by email and on the web.
+ArXiv provides long-term stable URIs, allowing versions, for each accepted research preprint.\footnote{\inlinecode{\url{https://arxiv.org/help/arxiv_identifier}}}
+
+An open question in archiving the full sequence of steps that go into a quantitative scientific research project is whether, and if so how, to preserve ``scholarly ephemera'' in scientific software development.
+This refers to discussion about the software such as reports on bugs or proposals of adding features, which are usually referred to as ``Issues'', and ``pull requests'', which propose that a change be ``pulled'' into the main branch of a software development repository by the core developers.
+These ephemera are not part of the git commit history of a software project, but add wider context and understanding beyond the commit history itself, and provide a record that could be used to allocate intellectual credit.
+For these reasons, the \emph{Investigating \& Archiving the Scholarly Git Experience} (IASGE) project proposes that the ephemera should be archived as well as the git repositories themselves.\footnote{\inlinecode{\href{https://investigating-archiving-git.gitlab.io/updates/define-scholarly-ephemera}{https://investigating-archiving-git.gitlab.io/updates/}}\\\inlinecode{\href{https://investigating-archiving-git.gitlab.io/updates/define-scholarly-ephemera}{define-scholarly-ephemera}}}
+
+
+
+
+
\subsection{Job management}
\label{appendix:jobmanagement}
Any analysis will involve more than one logical step.