Diffstat (limited to 'tex/src/appendix-existing-tools.tex')
-rw-r--r-- tex/src/appendix-existing-tools.tex | 73
1 file changed, 37 insertions(+), 36 deletions(-)
diff --git a/tex/src/appendix-existing-tools.tex b/tex/src/appendix-existing-tools.tex
index a773322..8ad97ef 100644
--- a/tex/src/appendix-existing-tools.tex
+++ b/tex/src/appendix-existing-tools.tex
@@ -50,9 +50,9 @@ Therefore, a process that is run inside a virtual machine can be much slower tha
An advantage of VMs is that they are a single file that can be copied from one computer to another, keeping the full environment within them if the format is recognized.
VMs are used by cloud service providers, enabling fully independent operating systems on their large servers where the customer can have root access.
-VMs were used in solutions like SHARE \citeappendix{vangorp11} (which was awarded second prize in the Elsevier Executable Paper Grand Challenge of 2011 \citeappendix{gabriel11}), or in suggested reproducible papers like \citeappendix{dolfi14}.
+VMs were used in solutions like SHARE\citeappendix{vangorp11} (which was awarded second prize in the Elsevier Executable Paper Grand Challenge of 2011\citeappendix{gabriel11}), or in some suggested reproducible papers\citeappendix{dolfi14}.
However, due to their very large size, these are expensive to maintain, thus leading SHARE to discontinue its services in 2019.
-The URL to the VM file \texttt{provenance\_machine.ova} that is mentioned in \citeappendix{dolfi14} is also not currently accessible (we suspect that this is due to size and archival costs).
+The URL to the VM file \texttt{provenance\_machine.ova} that is mentioned in Dolfi et al.\citeappendix{dolfi14} is also not currently accessible (we suspect that this is due to size and archival costs).
\subsubsection{Containers}
\label{appendix:containers}
@@ -74,7 +74,7 @@ We review some of the most common container solutions: Docker, Singularity, and
An important drawback of Docker for high-performance scientific needs is that it runs as a daemon (a program that is always running in the background) with root permissions.
This is a major security flaw that discourages many high-performance computing (HPC) facilities from providing it.
-\item {\bf\small Singularity:} Singularity \citeappendix{kurtzer17} is a single-image container (unlike Docker, which is composed of modular/independent images).
+\item {\bf\small Singularity:} Singularity\citeappendix{kurtzer17} is a single-image container (unlike Docker, which is composed of modular/independent images).
Although it needs root permissions to be installed on the system (once), it does not require root permissions every time it is run.
Its main program is also not a daemon, but a normal program that can be stopped.
These features make it much safer for HPC administrators to install compared to Docker.
@@ -87,20 +87,20 @@ We review some of the most common container solutions: Docker, Singularity, and
Generally, VMs or containers are good solutions for reproducibly running/repeating an analysis in the short term (a couple of years).
However, their focus is to store the already-built (binary, non-human readable) software environment.
Because of this, they will be large (many Gigabytes) and expensive to archive, download, or access.
-Recall the two examples above for VMs in Section \ref{appendix:virtualmachines}. But this is also valid for Docker images, as is clear from Dockerhub's recent decision to delete images of free accounts that have not been used for more than 6 months.
-Meng \& Thain \citeappendix{meng17} also give similar reasons on why Docker images were not suitable in their trials.
+Recall the two examples above for VMs in Section \ref{appendix:virtualmachines}. But this is also valid for Docker images, as is clear from Docker Hub's recent decision to switch to a new consumption-based payment model.
+Meng \& Thain\citeappendix{meng17} also give similar reasons why Docker images were not suitable in their trials.
On a more fundamental level, VMs or containers do not store \emph{how} the core environment was built.
This information is usually in a third-party repository, and not necessarily inside the container or VM file, making it hard (if not impossible) to track for future users.
-This is a major problem in relation to the proposed completeness criteria and is also highlighted as an issue in terms of long term reproducibility by \citeappendix{oliveira18}.
+This is a major problem in relation to the proposed completeness criteria and is also highlighted as an issue in terms of long term reproducibility by Oliveira et al.\citeappendix{oliveira18}.
-The example of \inlinecode{Dockerfile} of \cite{mesnard20} was previously mentioned in
+The example \inlinecode{Dockerfile} of Mesnard \& Barba\cite{mesnard20} was previously mentioned in
\ifdefined\separatesupplement
the main body of this paper, when discussing the criteria.
\else
Section \ref{criteria}.
\fi
-Another useful example is the \href{https://github.com/benmarwick/1989-excavation-report-Madjedbebe/blob/master/Dockerfile}{\inlinecode{Dockerfile}} of \citeappendix{clarkso15} (published in June 2015) which starts with \inlinecode{FROM rocker/verse:3.3.2}.
+Another useful example is the \inlinecode{Dockerfile}\footnote{\inlinecode{\href{https://github.com/benmarwick/1989-excavation-report-Madjedbebe/blob/master/Dockerfile}{https://github.com/benmarwick/1989-excavation-report-}\\\href{https://github.com/benmarwick/1989-excavation-report-Madjedbebe/blob/master/Dockerfile}{Madjedbebe/blob/master/Dockerfile}}} of Clarkson et al.\citeappendix{clarkso15} (published in June 2015) which starts with \inlinecode{FROM rocker/verse:3.3.2}.
When we tried to build it (November 2020), we noticed that the core downloaded image (\inlinecode{rocker/verse:3.3.2}, with image ``digest'' \inlinecode{sha256:c136fb0dbab...}) was created in October 2018 (long after the publication of that paper).
In principle, it is possible to investigate the difference between this new image and the old one that the authors used, but that would require a lot of effort and may not be possible when the changes are not available in a third-party public repository or not under version control.
In Docker, it is possible to retrieve the precise Docker image with its digest, for example, \inlinecode{FROM ubuntu:16.04@sha256:XXXXXXX} (where \inlinecode{XXXXXXX} is the digest, uniquely identifying the core image to be used), but we have not seen this often done in existing examples of ``reproducible'' \inlinecode{Dockerfiles}.
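+For illustration, a digest-pinned \inlinecode{Dockerfile} could start as in the following hedged sketch (the digest is left as a placeholder; the \inlinecode{docker inspect} query shown in the comment is the standard way to read a pulled image's digest):
+\begin{verbatim}
+# Query the digest of an already-pulled image:
+#   docker inspect --format='{{index .RepoDigests 0}}' ubuntu:16.04
+# Pin the base image by content digest, not by a mutable tag:
+FROM ubuntu:16.04@sha256:XXXXXXX
+\end{verbatim}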
@@ -111,7 +111,7 @@ ISO files are pre-built binary files with volumes of hundreds of megabytes and n
For example, the archives of Debian\footnote{\inlinecode{\url{https://cdimage.debian.org/mirror/cdimage/archive/}}} or Ubuntu\footnote{\inlinecode{\url{http://old-releases.ubuntu.com/releases}}} provide older ISO files.
The concept of containers (and the independent images that build them) can also be extended beyond just the software environment.
-For example, \citeappendix{lofstead19} propose a ``data pallet'' concept to containerize access to data and thus allow tracing data back to the application that produced them.
+For example, Lofstead et al.\citeappendix{lofstead19} propose a ``data pallet'' concept to containerize access to data and thus allow tracing data back to the application that produced them.
In summary, containers or VMs are just built products themselves.
If they are built properly (for example, building a Maneage'd project inside a Docker container), they can be useful for immediate usage and for quickly moving the project from one system to another.
@@ -145,7 +145,7 @@ Both are discussed in more detail below.
Package managers are the second component in any workflow that relies on containers or VMs for an independent environment, and the starting point in others that use the host's file system (as discussed above in Section \ref{appendix:independentenvironment}).
In this section, some common package managers are reviewed, in particular those that are most used by the reviewed reproducibility solutions of Appendix \ref{appendix:existingsolutions}.
-For a more comprehensive list of existing package managers, see \href{https://en.wikipedia.org/wiki/List_of_software_package_management_systems}{Wikipedia}.
+For a more comprehensive list of existing package managers, see Wikipedia\footnote{\inlinecode{\href{https://en.wikipedia.org/wiki/List\_of\_software\_package\_management\_systems}{https://en.wikipedia.org/wiki/List\_of\_software\_package\_}\\\href{https://en.wikipedia.org/wiki/List\_of\_software\_package\_management\_systems}{management\_systems}}}.
Note that we are not including package managers that are specific to one language, for example \inlinecode{pip} (for Python) or \inlinecode{tlmgr} (for \LaTeX).
\subsubsection{Operating system's package manager}
@@ -163,7 +163,7 @@ Requesting a special version of that special software does not fully address the
Hence a fixed version of the dependencies must also be specified.
In robust package managers like Debian's \inlinecode{apt} it is possible to fully control (and later reproduce) the built environment of a high-level software.
-Debian also archives all packaged high-level software in its Snapshot\footnote{\inlinecode{\url{https://snapshot.debian.org/}}} service since 2005 which can be used to build the higher-level software environment on an older OS \citeappendix{aissi20}.
+Debian has also archived all packaged high-level software in its Snapshot\footnote{\inlinecode{\url{https://snapshot.debian.org/}}} service since 2005, which can be used to build the higher-level software environment on an older OS\citeappendix{aissi20}.
Therefore it is indeed theoretically possible to reproduce the software environment only using archived operating systems and their own package managers, but unfortunately, we have not seen it practiced in (reproducible) scientific papers/projects.
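+For illustration, such a reproduction could look like the following hedged sketch (the snapshot timestamp, suite, and package version are illustrative placeholders, not tested values):
+\begin{verbatim}
+# /etc/apt/sources.list: point APT at a frozen snapshot
+# of the Debian archive (timestamp is a placeholder):
+deb https://snapshot.debian.org/archive/debian/20200101T000000Z/ buster main
+\end{verbatim}
+after which a fixed package version can be requested explicitly, for example \inlinecode{apt-get install gnuastro=0.11-1}.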
In summary, the host OS package managers are primarily meant for the low-level operating system components.
@@ -181,17 +181,17 @@ They can therefore only be run on GNU/Linux operating systems.
\subsubsection{Nix or GNU Guix}
\label{appendix:nixguix}
-Nix\footnote{\inlinecode{\url{https://nixos.org}}} \citeappendix{dolstra04} and GNU Guix\footnote{\inlinecode{\url{https://guix.gnu.org}}} \citeappendix{courtes15} are independent package managers that can be installed and used on GNU/Linux operating systems, and macOS (only for Nix, prior to macOS Catalina).
+Nix\footnote{\inlinecode{\url{https://nixos.org}}}\citeappendix{dolstra04} and GNU Guix\footnote{\inlinecode{\url{https://guix.gnu.org}}}\citeappendix{courtes15} are independent package managers that can be installed and used on GNU/Linux operating systems, and macOS (only for Nix, prior to macOS Catalina).
Both also have a fully functioning operating system based on their packages: NixOS and ``Guix System''.
GNU Guix is based on the same principles as Nix but implemented differently, so we focus the review here on Nix.
-The Nix approach to package management is unique in that it allows exact dependency tracking of all the dependencies, and allows for multiple versions of software, for more details see \citeappendix{dolstra04}.
+The Nix approach to package management is unique in that it allows exact tracking of all the dependencies and allows for multiple versions of software; for more details, see Dolstra et al.\citeappendix{dolstra04}.
In summary, a unique hash is created from all the components that go into the building of the package (including the instructions on how to build the software).
That hash is then prefixed to the software's installation directory.
-As an example from \citeappendix{dolstra04}: if a certain build of GNU C Library 2.3.2 has a hash of \inlinecode{8d013ea878d0}, then it is installed under \inlinecode{/nix/store/8d013ea878d0-glibc-2.3.2} and all software that is compiled with it (and thus need it to run) will link to this unique address.
+As an example from Dolstra et al.\citeappendix{dolstra04}: if a certain build of GNU C Library 2.3.2 has a hash of \inlinecode{8d013ea878d0}, then it is installed under \inlinecode{/nix/store/8d013ea878d0-glibc-2.3.2} and all software that is compiled with it (and thus needs it to run) will link to this unique address.
This allows for multiple versions of the software to co-exist on the system, while keeping an accurate dependency tree.
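+For illustration, a minimal Nix package definition could look like the following hedged sketch (assuming the standard \inlinecode{nixpkgs} helpers; the checksum is a placeholder):
+\begin{verbatim}
+# Every input below (source URL, checksum, build recipe)
+# enters the hash that prefixes the installation directory.
+with import <nixpkgs> {};
+stdenv.mkDerivation {
+  name = "hello-2.10";
+  src = fetchurl {
+    url = "mirror://gnu/hello/hello-2.10.tar.gz";
+    sha256 = "XXXXXXX";
+  };
+}
+\end{verbatim}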
-As mentioned in \citeappendix{courtes15}, one major caveat with using these package managers is that they require a daemon with root privileges (failing our completeness criteria).
+As mentioned in Court{\'e}s \& Wurmus\citeappendix{courtes15}, one major caveat with using these package managers is that they require a daemon with root privileges (failing our completeness criteria).
This is necessary ``to use the Linux kernel container facilities that allow it to isolate build processes and maximize build reproducibility''.
This is because the focus in Nix or Guix is to create bit-wise reproducible software binaries, which is necessary from the security or software-development perspectives.
However, in a non-computer-science analysis (for example in the natural sciences), the main aim is reproducible \emph{results}, which can also be created with the same software version even when the binaries are not bit-wise identical (for example when they are installed in other locations, because the installation location is hard-coded in the software binary, or for a different CPU architecture).
@@ -213,10 +213,10 @@ Conda is able to maintain an approximately independent environment on an operati
Conda tracks the dependencies of a package/environment through a YAML formatted file, where the necessary software and their acceptable versions are listed.
However, it is not possible to fix the versions of the dependencies through the YAML files alone.
This is thoroughly discussed under issue 787 (in May 2019) of \inlinecode{conda-forge}\footnote{\inlinecode{\url{https://github.com/conda-forge/conda-forge.github.io/issues/787}}}.
-In that discussion, the authors of \citeappendix{uhse19} report that the half-life of their environment (defined in a YAML file) is 3 months, and that at least one of their dependencies breaks shortly after this period.
-The main reply they got in the discussion is to build the Conda environment in a container, which is also the suggested solution by \citeappendix{gruning18}.
+In that GitHub discussion, Uhse et al.\citeappendix{uhse19} report that the half-life of their environment (defined in a YAML file) is 3 months, and that at least one of their dependencies breaks shortly after this period.
+The main reply they got in the discussion is to build the Conda environment in a container, which is also the suggested solution by Gr\"uning et al.\citeappendix{gruning18}.
However, as described in Appendix \ref{appendix:independentenvironment}, containers just hide the reproducibility problem; they do not fix it: containers are not static and need to evolve (i.e., get re-built) with the project.
-Given these limitations, \citeappendix{uhse19} are forced to host their conda-packaged software as tarballs on a separate repository.
+Given these limitations, Uhse et al.\citeappendix{uhse19} are forced to host their conda-packaged software as tarballs on a separate repository.
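+To make the YAML limitation above concrete, consider the following hedged sketch of an environment file (names and versions are illustrative):
+\begin{verbatim}
+# environment.yml: pins constrain the listed (direct)
+# dependencies only; transitive dependencies stay unpinned.
+name: paper-env
+channels:
+  - conda-forge
+dependencies:
+  - python=3.7
+  - numpy=1.18
+\end{verbatim}
+Even with such explicit pins, the packages that these depend on are resolved at build time and can therefore drift.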
Conda installs with a shell script that contains a binary blob (over 500 megabytes, embedded in the shell script).
This is the first major issue with Conda: from the shell script, it is not clear what is in this binary blob and what it does.
@@ -252,7 +252,7 @@ As reviewed above, the low-level dependence of Conda on the host operating syste
However, these same factors are major caveats in a scientific scenario, where long-term archivability, readability, or usability are important. % alternative to `archivability`?
\subsubsection{Spack}
-Spack is a package manager that is also influenced by Nix (similar to GNU Guix), see \citeappendix{gamblin15}.
+Spack\citeappendix{gamblin15} is a package manager that is also influenced by Nix (similar to GNU Guix).
But unlike Nix or GNU Guix, it does not aim for full, bit-wise reproducibility and can be built without root access in any generic location.
It relies on the host operating system for the C library.
@@ -302,7 +302,7 @@ In this way, later processing stages can make sure that they can safely be used,
Solutions to keep track of a project's history have existed since the early days of software engineering in the 1970s and they have constantly improved over the last decades.
Today the distributed model of ``version control'' is the most common, where the full history of the project is stored locally on different systems and can easily be integrated.
There are many existing version control solutions, for example, CVS, SVN, Mercurial, GNU Bazaar, or GNU Arch.
-However, currently, Git is by far the most commonly used in individual projects, such that Software Heritage \citeappendix{dicosmo18} (an archival system aiming for long term preservation of software) is also modeled on Git.
+However, currently, Git is by far the most commonly used in individual projects, such that Software Heritage\citeappendix{dicosmo18} (an archival system aiming for long term preservation of software) is also modeled on Git.
Git is also the foundation upon which this paper's proof of concept (Maneage) is built.
Hence we will just review Git here, but the general concept of version control is the same in all implementations.
@@ -316,7 +316,7 @@ the figure on Git in the main body of the paper).
Figure \ref{fig:branching}).
\fi
For example \inlinecode{f4953cc\-f1ca8a\-33616ad\-602ddf\-4cd189\-c2eff97b} is a commit identifier in the Git history of this project.
-Through the content-based storage concept, similar hash structures can be used to identify data \citeappendix{hinsen20}.
+Through the content-based storage concept, similar hash structures can be used to identify data\citeappendix{hinsen20}.
Git commits are commonly summarized by the checksum's first few characters, for example, \inlinecode{f4953cc} of the example above.
With Git, making parallel ``branches'' (in the project's history) is very easy and its distributed nature greatly helps in the parallel development of a project by a team.
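+For reference, the checksum and branching concepts above map to simple commands, as in this short sketch:
+\begin{verbatim}
+# Show the full checksum of the latest commit:
+git log -1 --format=%H
+# Create a parallel branch and switch to it:
+git checkout -b my-feature
+\end{verbatim}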
@@ -365,14 +365,14 @@ While it is not impossible, because of the high-level nature of scripts, it is n
\subsubsection{Make}
\label{appendix:make}
-Make was originally designed to address the problems mentioned above for scripts \citeappendix{feldman79}.
+Make was originally designed to address the problems mentioned above for scripts\citeappendix{feldman79}.
In particular, it was originally designed in the context of managing the compilation of software source code that is distributed in many files.
With Make, the source files of a program that have not been changed are not recompiled.
Moreover, when two source files do not depend on each other, and both need to be rebuilt, they can be built in parallel.
This was found to greatly help in debugging software projects, and in speeding up test builds, giving Make a core place in software development over the last 40 years.
The most common implementation of Make, since the early 1990s, is GNU Make.
-Make was also the framework used in the first attempts at reproducible scientific papers \cite{claerbout1992,schwab2000}.
+Make was also the framework used in the first attempts at reproducible scientific papers\cite{claerbout1992,schwab2000}.
Our proof-of-concept (Maneage) also uses Make to organize its workflow.
Here, we complement that section with more technical details on Make.
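+For reference, a minimal Makefile rule has the form shown in this hedged sketch (file names are illustrative; recipe lines must start with a TAB):
+\begin{verbatim}
+# target: prerequisites; 'paper.pdf' is rebuilt only when a
+# prerequisite changes; independent rules can run in parallel.
+paper.pdf: paper.tex plot.pdf
+	pdflatex paper.tex
+plot.pdf: plot.py data.csv
+	python plot.py data.csv plot.pdf
+\end{verbatim}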
@@ -391,7 +391,7 @@ Going deeper into the syntax of Make is beyond the scope of this paper, but we r
\subsubsection{Snakemake}
\label{appendix:snakemake}
Snakemake is a Python-based workflow management system, inspired by GNU Make (discussed above).
-It is aimed at reproducible and scalable data analysis \citeappendix{koster12}\footnote{\inlinecode{\url{https://snakemake.readthedocs.io/en/stable}}}.
+It is aimed at reproducible and scalable data analysis\citeappendix{koster12}\footnote{\inlinecode{\url{https://snakemake.readthedocs.io/en/stable}}}.
It defines its own language to implement the ``rule'' concept of Make within Python.
Technically, using complex shell scripts (to call software in other languages) in each step will involve a lot of quotations that make the code hard to read and maintain.
It is therefore most useful for Python-based projects.
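+For comparison with Make's rule syntax, a minimal Snakemake rule looks like the following hedged sketch (file names are illustrative):
+\begin{verbatim}
+# Snakefile: a rule declares inputs, outputs, and a recipe.
+rule plot:
+    input: "data.csv"
+    output: "plot.pdf"
+    shell: "python plot.py {input} {output}"
+\end{verbatim}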
@@ -424,10 +424,10 @@ The former will conflict with other system tools that assume \inlinecode{python}
This can also be problematic when a Python analysis library may require a Python version that conflicts with the running SCons.
\subsubsection{CGAT-core}
-CGAT-Core is a Python package for managing workflows, see \citeappendix{cribbs19}.
+CGAT-Core\citeappendix{cribbs19} is a Python package for managing workflows.
It wraps analysis steps in Python functions and uses Python decorators to track the dependencies between tasks.
-It is used in papers like \citeappendix{jones19}.
-However, as mentioned in \citeappendix{jones19} it is good for managing individual outputs (for example separate figures/tables in the paper, when they are fully created within Python).
+It is used in papers like Jones et al.\citeappendix{jones19}.
+However, as mentioned there, it is primarily good for managing individual outputs (for example separate figures/tables in the paper, when they are fully created within Python).
Because it is primarily designed for Python tasks, managing a full workflow (which includes many more components, written in other languages) is not trivial.
Another drawback of this workflow manager is that Python is a very high-level language, and future versions of the language may no longer be compatible with Python 3, in which CGAT-core is implemented (similar to how Python 2 programs are not compatible with Python 3).
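+To illustrate the decorator-based dependency idea mentioned above in generic terms, consider this toy sketch (plain Python, not CGAT-Core's actual API):
+\begin{verbatim}
+# Toy sketch: a decorator records which tasks a task follows.
+DEPENDENCIES = {}
+def follows(*parents):
+    def register(task):
+        DEPENDENCIES[task.__name__] = [p.__name__ for p in parents]
+        return task
+    return register
+
+def load(): ...
+
+@follows(load)         # 'plot' depends on 'load'
+def plot(): ...
+\end{verbatim}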
@@ -438,7 +438,7 @@ Hence in the GWL paradigm, software installation and usage does not have to be s
GWL has two high-level concepts called ``processes'' and ``workflows'', where the latter defines how multiple processes should be executed together.
\subsubsection{Nextflow (2013)}
-Nextflow\footnote{\inlinecode{\url{https://www.nextflow.io}}} \citeappendix{tommaso17} workflow language with a command-line interface that is written in Java.
+Nextflow\footnote{\inlinecode{\url{https://www.nextflow.io}}} is a workflow language\citeappendix{tommaso17} with a command-line interface that is written in Java.
\subsubsection{Generic workflow specifications (CWL and WDL)}
\label{appendix:genericworkflows}
@@ -455,7 +455,7 @@ Each software/tool/paradigm has its own learning curve, which is not easy for a
Most workflow management tools, and the reproducible workflow solutions that depend on them, are yet another language/paradigm that has to be mastered by researchers, and are thus a heavy burden.
Furthermore, as shown above (and below), high-level tools evolve very fast, causing disruptions in the reproducible framework.
-A good example is Popper \citeappendix{jimenez17} which initially organized its workflow through the HashiCorp configuration language (HCL) because it was the default in GitHub.
+A good example is Popper\citeappendix{jimenez17}, which initially organized its workflow through the HashiCorp configuration language (HCL) because it was the default in GitHub.
However, in September 2019, GitHub dropped HCL as its default configuration language, so Popper is now using its own custom YAML-based workflow language; see Appendix \ref{appendix:popper} for more on Popper.
@@ -492,7 +492,7 @@ In summary, IDEs are generally very specialized tools, for special projects and
\subsubsection{Jupyter}
\label{appendix:jupyter}
-Jupyter (initially IPython) \citeappendix{kluyver16} is an implementation of Literate Programming \citeappendix{knuth84}.
+Jupyter\citeappendix{kluyver16} (initially IPython) is an implementation of Literate Programming\citeappendix{knuth84}.
Jupyter's name is a combination of the three main languages it was designed for: Julia, Python, and R.
The main user interface is a web-based ``notebook'' that contains blobs of executable code and narrative.
Jupyter uses the custom built \inlinecode{.ipynb} format\footnote{\inlinecode{\url{https://nbformat.readthedocs.io/en/latest}}}.
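+For illustration, a heavily trimmed \inlinecode{.ipynb} file has the following JSON structure (a hedged sketch; real notebooks carry more metadata per cell):
+\begin{verbatim}
+{ "cells": [
+    { "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {},
+      "outputs": [],
+      "source": ["print('code next to narrative')"] }
+  ],
+  "metadata": {},
+  "nbformat": 4, "nbformat_minor": 5 }
+\end{verbatim}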
@@ -514,7 +514,7 @@ Both are critical for scientific processing, especially the latter: when a web b
This is further exacerbated by the fact that binary data (for example images) are not directly supported in JSON and have to be converted into a much less memory-efficient textual encoding.
Finally, Jupyter has an extremely complex dependency graph: on a clean Debian 10 system, Pip (a Python package manager that is necessary for installing Jupyter) required 19 dependencies to install, and installing Jupyter within Pip needed 41 dependencies.
-\citeappendix{hinsen15} reported such conflicts when building Jupyter into the Active Papers framework (see Appendix \ref{appendix:activepapers}).
+Hinsen\citeappendix{hinsen15} reported such conflicts when building Jupyter into the Active Papers framework (see Appendix \ref{appendix:activepapers}).
However, the dependencies above are only on the server-side.
Since Jupyter is a web-based system, it also requires many dependencies in the browser used for viewing/running it (for example special JavaScript or HTML5 features, which evolve very fast).
As discussed in Appendix \ref{appendix:highlevelinworkflow} having so many dependencies is a major caveat for any system regarding scientific/long-term reproducibility.
@@ -527,7 +527,7 @@ In summary, Jupyter is most useful in manual, interactive, and graphical operati
\subsection{Project management in high-level languages}
\label{appendix:highlevelinworkflow}
Currently, the most popular high-level data analysis language is Python.
-R is closely tracking it and has superseded Python in some fields, while Julia \citeappendix{bezanson17} is quickly gaining ground.
+R is closely tracking it and has superseded Python in some fields, while Julia\citeappendix{bezanson17} is quickly gaining ground.
These languages have themselves superseded previously popular languages for data analysis of the previous decades, for example, Java, Perl, or C++.
All are part of the C-family programming languages.
In many cases, this means that the languages' execution environments are themselves written in C, which is the language of modern operating systems.
@@ -542,7 +542,7 @@ Because of their nature, higher-level languages evolve very fast, creating incom
The most prominent example is the transition from Python 2 (released in 2000) to Python 3 (released in 2008).
Python 3 was incompatible with Python 2 and it was decided to abandon the former by 2015.
However, due to community pressure, this was delayed to January 1st, 2020.
-The end-of-life of Python 2 caused many problems for projects that had invested heavily in Python 2: all their previous work had to be translated, for example, see \citeappendix{jenness17} or Appendix \ref{appendix:sciunit}.
+The end-of-life of Python 2 caused many problems for projects that had invested heavily in Python 2: all their previous work had to be translated, for example, see Jenness\citeappendix{jenness17} or Appendix \ref{appendix:sciunit}.
Some projects could not make this investment and their developers decided to stop maintaining them, for example VisTrails (see Appendix \ref{appendix:vistrails}).
The problems were not just limited to translation.
@@ -565,10 +565,11 @@ The evolution of high-level languages is extremely fast, even within one version
For example, packages that are written in Python 3 often only work with a specific interval of Python 3 versions.
For instance, Snakemake and Occam can only be run on Python versions 3.4 and 3.5 or newer, respectively; see Appendices \ref{appendix:snakemake} and \ref{appendix:occam}.
This is not just limited to the core language; much faster changes occur in their higher-level libraries.
-For example version 1.9 of Numpy (Python's numerical analysis module) discontinued support for Numpy's predecessor (called Numeric), causing many problems for scientific users \citeappendix{hinsen15}.
+For example version 1.9 of Numpy (Python's numerical analysis module) discontinued support for Numpy's predecessor (called Numeric), causing many problems for scientific users\citeappendix{hinsen15}.
On the other hand, the dependency graph of tools written in high-level languages is often extremely complex.
-For example, see Figure 1 of \cite{alliez19}, it shows the dependencies and their inter-dependencies for Matplotlib (a popular plotting module in Python).
+For example, see Figure 1 of Alliez et al.\cite{alliez19}.
+It shows the dependencies and their inter-dependencies for Matplotlib (a popular plotting module in Python).
Acceptable version intervals between the dependencies will cause incompatibilities in a year or two, when a robust package manager is not used (see Appendix \ref{appendix:packagemanagement}).
Since a domain scientist does not always have the resources/knowledge to modify the conflicting part(s), many are forced to create complex environments with different versions of Python and pass the data between them (for example just to use the work of a previous PhD student in the team).
@@ -582,7 +583,7 @@ merely installing the Python installer (\inlinecode{pip}) on a Debian system (wi
As of this writing, the \inlinecode{pip3 install popper} and \inlinecode{pip2 install sciunit2} commands for installing each required 17 and 26 Python modules as dependencies, respectively.
It is impossible to run either of these solutions if there is a single conflict in this very complex dependency graph.
This problem actually occurred while we were testing Sciunit: even though it was installed, it could not run because of conflicts (its last commit was only 1.5 years old); for more, see Appendix \ref{appendix:sciunit}.
-\citeappendix{hinsen15} also report a similar problem when attempting to install Jupyter (see Appendix \ref{appendix:editors}).
+Hinsen\citeappendix{hinsen15} also reports a similar problem when attempting to install Jupyter (see Appendix \ref{appendix:editors}).
Of course, this also applies to tools that these systems use, for example Conda (which is also written in Python, see Appendix \ref{appendix:packagemanagement}).
\subsubsection{Generational gap}