Diffstat (limited to 'tex/src/appendix-existing-solutions.tex')
 tex/src/appendix-existing-solutions.tex | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/tex/src/appendix-existing-solutions.tex b/tex/src/appendix-existing-solutions.tex
index 56056be..5166703 100644
--- a/tex/src/appendix-existing-solutions.tex
+++ b/tex/src/appendix-existing-solutions.tex
@@ -229,18 +229,17 @@ For example the very large cost of maintaining such a system, being based on a g
The IPOL journal\footnote{\inlinecode{\url{https://www.ipol.im}}} \citeappendix{limare11} (first published article in July 2010) publishes papers on image processing algorithms as well as the full code of the proposed algorithm.
An IPOL paper is a traditional research paper, but with a focus on implementation.
The published narrative description of the algorithm must be detailed enough that any specialist can implement it in their own programming language.
-The author's own implementation of the algorithm is also published with the paper (in C, C++, or MATLAB), the code must be commented well enough and link each part of it with the relevant part of the paper.
-The authors must also submit several example of datasets that show the applicability of their proposed algorithm.
+The author's own implementation of the algorithm is also published with the paper (in C, C++, or MATLAB/Octave, and recently Python); the code may only have a very limited set of external dependencies (with pre-defined versions), must be sufficiently commented, and each part of it must be linked to the relevant part of the paper.
+The authors must also submit several example datasets that show the applicability of their proposed algorithm.
The referee is expected to inspect the code and narrative, confirming that they match each other and the stated conclusions of the published paper.
After publication, each paper also has a ``demo'' button on its web page, allowing readers to try the algorithm through a web interface and even provide their own input.
IPOL has grown steadily over the last 10 years, publishing 23 research articles in 2019.
We encourage the reader to visit its web page and see some of its recent papers and their demos.
The reason it can be so thorough and complete is its very narrow scope (low-level image processing algorithms), where the published algorithms are highly atomic, not needing significant dependencies (beyond input/output of well-known formats), allowing the referees and readers to go deeply into each implemented algorithm.
-In fact, high-level languages like Perl, Python, or Java are not acceptable in IPOL precisely because of the additional complexities, such as the dependencies that they require.
However, many data-intensive projects involve dozens of high-level dependencies, with large and complex data formats and analysis; so while it is modular (a single module, doing a very specific thing), this solution is not scalable.
-Furthermore, by not publishing/archiving each paper's version control history or directly linking the analysis and produced paper, it fails criteria 6 and 7.
+Furthermore, by not publishing/archiving each paper's version-controlled history or directly linking the analysis and produced paper, it fails criteria 6 and 7.
Note that on the web page, it is possible to change parameters, but that will not affect the produced PDF.
A paper written in Maneage (the proof-of-concept solution presented in this paper) could be scrutinized at a similarly detailed level to IPOL, but for much more complex research scenarios, involving hundreds of dependencies and complex processing of the data.
@@ -265,29 +264,28 @@ It then uses selection and rejection algorithms to find the best components usin
Active Papers\footnote{\inlinecode{\url{http://www.activepapers.org}}} attempts to package the code and data of a project into one file (in HDF5 format).
It was initially written in Java because its compiled bytecode output is portable to any machine with a JVM \citeappendix{hinsen11}.
However, Java is not a commonly used platform today, so Active Papers was later re-implemented in Python \citeappendix{hinsen15}.
+Dependence on high-level platforms (Java or Python) is therefore a fundamental issue.
In the Python version, all processing steps and input data (or references to them) are stored in an HDF5 file.
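For illustration, the following minimal sketch shows the underlying concept of bundling code and data into a single HDF5 file; it uses the generic \inlinecode{h5py} library with hypothetical file and dataset names, not Active Papers' own API (whose calls differ):
\begin{verbatim}
import h5py  # generic HDF5 library; Active Papers' own API differs

# Store the input data and the analysis script side by side in
# one HDF5 file (file and dataset names are hypothetical).
with h5py.File("paper.h5", "w") as f:
    f.create_dataset("data/input", data=[1.0, 2.0, 3.0])
    f.create_dataset("code/analysis",
                     data="result = sum(d**2 for d in data)")

# A reader can later extract either component from the same file.
with h5py.File("paper.h5", "r") as f:
    data = f["data/input"][()]
    code = f["code/analysis"][()].decode()
\end{verbatim}
This sketch shows the convenience of a single self-contained file, but also why special tools are needed just to inspect its contents.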
%However, it can only account for pure-Python packages using the host operating system's Python modules \tonote{confirm this!}.
When the Python module contains a component written in other languages (mostly C or C++), it needs to be an external dependency to the Active Paper.
As mentioned in \citeappendix{hinsen15}, the fact that it relies on HDF5 is a caveat of Active Papers, because many tools are necessary to merely open it.
-Downloading the pre-built ``HDF View'' binaries (a GUI browser of HDF5 files that is provided by the HDF group) is not possible anonymously/automatically (login is required).
-Installing it using the Debian or Arch Linux package managers also failed due to dependencies in our trials.
-Furthermore, as a high-level data format HDF5 evolves very fast, for example HDF5 1.12.0 (February 29th, 2020) is not usable with older libraries provided by the HDF5 team. % maybe replace with: February 29\textsuperscript{th}, 2020?
+Downloading the pre-built ``HDF View'' binaries (a GUI browser of HDF5 files that is provided by the HDF group) is not possible anonymously/automatically: as of January 2021, login is required\footnote{\inlinecode{\url{https://www.hdfgroup.org/downloads/hdfview}}} (this was not the case when Active Papers moved to HDF5).
+% From K. Hinsen in a private email to M. Akhlaghi: This is true today, but wasn't when I started ActivePapers. Otherwise I'd never have built on HDF5.
+In our trials, installing HDF View via the Debian or Arch Linux package managers also failed due to dependency issues.
+Furthermore, like most high-level tools, the HDF5 library evolves very fast: its webpage (as of April 2021) says ``Applications that were created with earlier HDF5 releases may not compile with 1.12 by default''.
While data and code are indeed technically similar concepts \citeappendix{hinsen16}, they are used by humans differently.
The hand-written code of a large project involving terabytes of data can be as small as 100 kilobytes.
-When the two are bundled together, merely seeing one line of the code, requires downloading Terabytes volume that is not needed, this was also acknowledged in \citeappendix{hinsen15}.
+When the two are bundled together in one remote file, merely inspecting a single line of the code requires downloading terabytes of data that are not needed; this was also acknowledged in \citeappendix{hinsen15}.
It may also happen that the data are proprietary (for example medical patient data).
In such cases, the data must not be publicly released, but the methods that were applied to them can be.
-Furthermore, since all reading and writing is done in the HDF5 file, it can easily bloat the file to very large sizes due to temporary files.
+
+Furthermore, since all reading and writing is currently done in the HDF5 file, temporary files can easily bloat it to very large sizes.
These files can later be removed as part of the analysis, but this makes the code more complicated and hard to read/maintain.
For example, the Active Papers HDF5 file of \citeappendix[in \href{https://doi.org/10.5281/zenodo.2549987}{zenodo.2549987}]{kneller19} is 1.8 gigabytes.
-
-In many scenarios, peers just want to inspect the processing by reading the code and checking a very special part of it (one or two lines, just to see the option values to one step for example).
-They do not necessarily need to run it or obtain the output the datasets (which may be published elsewhere).
-Hence the extra volume for data and obscure HDF5 format that needs special tools for reading its plain-text internals is an issue.
-
+This is not a fundamental feature of the approach; it is just how this prototype was implemented, and it can be improved in the future.
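As a sketch of why temporary datasets bloat the file, again assuming the generic \inlinecode{h5py} library and hypothetical names: deleting a dataset only unlinks it inside the HDF5 file, and the space is not reclaimed without an external repacking step.
\begin{verbatim}
import os
import h5py
import numpy as np

# Write a large temporary dataset, then delete it (file and
# dataset names are hypothetical).
with h5py.File("paper.h5", "a") as f:
    f.create_dataset("tmp/scratch", data=np.zeros(10**7))
    del f["tmp/scratch"]  # unlinks the dataset only

# The ~80 MB of space is still allocated inside the file;
# reclaiming it needs an external pass, e.g. `h5repack'.
print(os.path.getsize("paper.h5"))
\end{verbatim}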