Diffstat (limited to 'tex')
-rw-r--r-- tex/src/appendix-existing-tools.tex 16
1 files changed, 8 insertions, 8 deletions
diff --git a/tex/src/appendix-existing-tools.tex b/tex/src/appendix-existing-tools.tex
index f22b4ba..885de8b 100644
--- a/tex/src/appendix-existing-tools.tex
+++ b/tex/src/appendix-existing-tools.tex
@@ -396,7 +396,7 @@ For example, it is first necessary to download a dataset and do some preparation
Each one of these is a logically independent step, which needs to be run before/after the others in a specific order.
Hence job management is a critical component of a research project.
-There are many tools for managing the sequence of jobs, below we review the most common ones that are also used the existing reproducibility solutions of Appendix \ref{appendix:existingsolutions} and Maneage.
+There are many tools for managing the sequence of jobs, below we review the most common ones that are also used in the existing reproducibility solutions of Appendix \ref{appendix:existingsolutions} and Maneage.
\subsubsection{Manual operation with narrative}
\label{appendix:manual}
@@ -514,7 +514,7 @@ With these standards, ideally, translators can be written between the various wo
In conclusion, shell scripts and Make are very common and extensively used by users of Unix-based OSs (which are most commonly used for computations).
They have also existed for several decades and are robust and mature.
-Many researchers that use heavy computations are also already familiar with them and have already used them already (to different levels).
+Many researchers that use heavy computations are also already familiar with them and have used them (to different levels).
As we demonstrated above in this appendix, the list of necessary tools for the various stages of a research project (an independent environment, package managers, job organizers, analysis languages, writing formats, editors, etc) is already very large.
Each software/tool/paradigm has its own learning curve, which is not easy for a natural or social scientist for example (who need to put their primary focus on their own scientific domain).
Most workflow management tools and the reproducible workflow solutions that depend on them are, yet another language/paradigm that has to be mastered by researchers and thus a heavy burden.
@@ -536,9 +536,9 @@ Here we review some common methods that are currently used.
\subsubsection{Text editors}
The most basic way to edit text files is through simple text editors which just allow viewing and editing such files, for example, \inlinecode{gedit} on the GNOME graphic user interface.
-However, working with simple plain text editors like \inlinecode{gedit} can be very frustrating since its necessary to save the file, then go to a terminal emulator and execute the source files.
+However, working with simple plain text editors like \inlinecode{gedit} can be very frustrating since it is necessary to save the file, then go to a terminal emulator and execute the source files.
To solve this problem there are advanced text editors like GNU Emacs that allow direct execution of the script, or access to a terminal within the text editor.
-However, editors that can execute or debug the source (like GNU Emacs), just run external programs for these jobs (for example GNU GCC, or GNU GDB), just as if those programs was called from outside the editor.
+However, editors that can execute or debug the source (like GNU Emacs), just run external programs for these jobs (for example GNU GCC, or GNU GDB), just as if those programs were called from outside the editor.
With text editors, the final edited file is independent of the actual editor and can be further edited with another editor, or executed without it.
This is a very important feature and corresponds to the modularity criterion of this paper.
@@ -547,7 +547,7 @@ Another very important advantage of advanced text editors like GNU Emacs or Vi(m
This feature is critical when working on remote systems, in particular high performance computing (HPC) facilities that do not provide a graphic user interface.
Also, the commonly used minimalistic containers do not include a graphic user interface.
Hence by default all Maneage'd projects also build the simple GNU Nano plain-text editor as part of the project (to be able to edit the source directly within a minimal environment).
-Maneage can also also optinally build GNU Emacs or Vim, but its up to the user to build them (same as their high-level science software).
+Maneage can also optionally build GNU Emacs or Vim, but it is up to the user to build them (same as their high-level science software).
\subsubsection{Integrated Development Environments (IDEs)}
To facilitate the development of source code in special programming languages, IDEs add software building and running environments as well as debugging tools to a plain text editor.
@@ -637,10 +637,10 @@ For example, see Figure 1 of Alliez et al.\cite{alliez19}.
It shows the dependencies and their inter-dependencies for Matplotlib (a popular plotting module in Python).
Acceptable version intervals between the dependencies will cause incompatibilities in a year or two, when a robust package manager is not used (see Appendix \ref{appendix:packagemanagement}).
-Since a domain scientist does not always have the resources/knowledge to modify the conflicting part(s), many are forced to create complex environments with different versions of Python and pass the data between them (for example just to use the work of a previous PhD student in the team).
-This greatly increases the complexity of the project, even for the principal author.
+Since a domain scientist does not always have the resources/knowledge to modify the conflicting part(s), many are forced to create complex environments, with different versions of Python (sometimes on different computers), and pass the data between them (for example just to use the work of a previous PhD student in the team).
+This greatly increases the complexity/cost of the project, even for the principal author.
A well-designed reproducible workflow like Maneage that has no dependencies beyond a C compiler in a Unix-like operating system can account for this.
-However, when the actual workflow system (not the analysis software) is written in a high-level language like the examples above, this will cause problems.
+However, when the actual workflow system (not the analysis software) is written in a high-level language like the examples above, the complex dependencies of the workflow itself will inevitably cause bootstrapping problems in the future.
Another relevant example of the dependency hell is the following: installing the Python installer (\inlinecode{pip}) on a Debian system (with \inlinecode{apt install pip2} for Python 2 packages) required 32 other packages as dependencies.
\inlinecode{pip} is necessary to install Popper and Sciunit (Appendices \ref{appendix:popper} and \ref{appendix:sciunit}).