-rw-r--r--  README.md                                        38
-rw-r--r--  paper.tex                                        10
-rw-r--r--  reproduce/analysis/config/metadata-common.conf   16
-rw-r--r--  reproduce/analysis/config/verify-outputs.conf    11
-rw-r--r--  reproduce/analysis/make/demo-plot.mk             35
-rw-r--r--  reproduce/analysis/make/initialize.mk            37
-rw-r--r--  reproduce/analysis/make/verify.mk                 8
-rw-r--r--  tex/src/figure-data-lineage.tex                   2
-rw-r--r--  tex/src/figure-tools-per-year.tex                 4
9 files changed, 117 insertions, 44 deletions
diff --git a/README.md b/README.md
index 7216f1f..91d5527 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,13 @@
-Reproducible source for XXXXXXXXXXXXXXXXX
-=========================================
+Reproducible source for paper introducing Maneage (MANaging data linEAGE)
+-------------------------------------------------------------------------
Copyright (C) 2018-2020 Mohammad Akhlaghi <mohammad@akhlaghi.org>\
See the end of the file for license conditions.
-This is the reproducible project source for the paper titled "**XXX XXXXX
-XXXXXX**", by XXXXX XXXXXX, YYYYYY YYYYY and ZZZZZZ ZZZZZ that is published
-in XXXXX XXXXX.
+This is the reproducible project source for the paper titled "**Towards
+Long-term and Archivable Reproducibility**", by Mohammad Akhlaghi, Raúl
+Infante-Sainz, Boudewijn F. Roukema, David Valls-Gabaud, Roberto
+Baena-Gallé.
To reproduce the results and final paper, the only dependency is a minimal
Unix-based building environment including a C compiler (already available
@@ -18,8 +19,8 @@ button to download a compressed tarball of the project). If you have
received this source from arXiv, please see the respective section below.
```shell
-$ git clone XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-$ cd XXXXXXXXXXXXXXXXXX
+$ git clone https://gitlab.com/makhlaghi/maneage-paper
+$ cd maneage-paper
$ ./project configure
$ ./project make
```
@@ -34,8 +35,7 @@ https://maneage.org.
-Building the project
---------------------
+### Building the project
This project was designed to have as few dependencies as possible without
requiring root/administrator permissions.
@@ -52,14 +52,11 @@ requiring root/administrator permissions.
a directory given at configuration time), they will be
used. Otherwise, a downloader (`wget` or `curl`) will be necessary
to download any necessary tarball. The necessary tarballs are also
- collected in the archived project on Zenodo (link below) [[TO
- AUTHORS: UPLOAD THE SOFTWARE TARBALLS WITH YOUR DATA AND PROJECT
- SOURCE TO ZENODO OR OTHER SIMILAR SERVICES. THEN ADD THE DOI/LINK
- HERE.DON'T FORGET THAT THE SOFTWARE ARE A CRITICAL PART OF YOUR
- WORK.]]. Just unpack that tarball, and when `./project configure`
- asks for the "software tarball directory", give the address of the
- unpacked directory that has all the tarballs.
- https://doi.org/10.5281/zenodo.3408481
+ collected in the archived project on Zenodo (link below). Just
+ unpack that tarball, and when `./project configure` asks for the
+ "software tarball directory", give the address of the unpacked
+ directory that has all the tarballs.
+ https://doi.org/10.5281/zenodo.3872248
2. Configure the environment (top-level directories in particular) and
build all the necessary software for use in the next step. It is
@@ -86,8 +83,8 @@ requiring root/administrator permissions.
-Source from arXiv
------------------
+### Source from arXiv
+
If the paper is also published on arXiv, it is highly likely that the
authors also uploaded/published the full project there along with the LaTeX
sources. If you have downloaded (or plan to download) this source from
@@ -155,8 +152,7 @@ arXiv, some minor extra steps are necessary:
-Copyright information
----------------------
+### Copyright information
This file and `.file-metadata` (a binary file, used by Metastore to store
file dates when doing Git checkouts) are part of the reproducible project
diff --git a/paper.tex b/paper.tex
index 685849f..fff0e41 100644
--- a/paper.tex
+++ b/paper.tex
@@ -27,7 +27,7 @@
\input{tex/src/preamble-pgfplots.tex}
%% Title and author names.
-\title{Towards Long-term and Archivable Reproducibility}
+\title{\projecttitle}
\author{
Mohammad~Akhlaghi,
Ra\'ul Infante-Sainz,
@@ -70,9 +70,12 @@
%% CONCLUSION
We show that requiring longevity of a reproducible workflow solution is realistic, and discuss the benefits of the criteria for scientific progress, but also immediate benefits for short-term reproducibility.
This paper has itself been written in Maneage, with snapshot \projectversion.
+
+ \vspace{3mm}
+ \emph{Reproducible supplement} --- \href{https://doi.org/10.5281/zenodo.3872248}{\texttt{Zenodo.3872248}}.
\end{abstract}
-% Note that keywords are not normally used for peerreview papers.
+% Note that keywords are not normally used for peer-review papers.
\begin{IEEEkeywords}
Data Lineage, Provenance, Reproducibility, Scientific Pipelines, Workflows
\end{IEEEkeywords}
@@ -82,6 +85,8 @@ Data Lineage, Provenance, Reproducibility, Scientific Pipelines, Workflows
+
+
% For peer review papers, you can put extra information on the cover
% page as needed:
% \ifCLASSOPTIONpeerreview
@@ -293,6 +298,7 @@ Figure \ref{fig:datalineage} (bottom) is the data lineage graph that produced it
For example, \inlinecode{paper.pdf} depends on \inlinecode{project.tex} (in the build directory; generated automatically) and \inlinecode{paper.tex} (in the source directory; written manually).
The solid arrows and full-opacity built boxes correspond to this paper.
The dashed arrows and low-opacity built boxes show the scalability by adding hypothetical steps to the project.
+ The underlying data of the top plot is available at \href{https://zenodo.org/record/3872248/files/tools-per-year.txt}{zenodo.3872248/tools-per-year.txt}.
}
\end{figure*}
diff --git a/reproduce/analysis/config/metadata-common.conf b/reproduce/analysis/config/metadata-common.conf
new file mode 100644
index 0000000..7bc9fa5
--- /dev/null
+++ b/reproduce/analysis/config/metadata-common.conf
@@ -0,0 +1,16 @@
+# Metadata parameters that can be used in
+
+# Project information
+metadata-title = Towards Long-term and Archivable Reproducibility
+
+# DOIs and identifiers.
+metadata-arxiv =
+metadata-doi-zenodo = https://doi.org/10.5281/zenodo.3872248
+metadata-doi-journal =
+metadata-doi = $(metadata-doi-zenodo)
+metadata-git-repository = https://gitlab.com/makhlaghi/maneage-paper
+
+# Copyright and identifier.
+metadata-copyright-owner = Mohammad Akhlaghi <mohammad@akhlaghi.org>
+metadata-copyright = Creative Commons Attribution-ShareAlike (CC BY-SA)
+metadata-copyright-url = https://creativecommons.org/licenses/by-sa/4.0
diff --git a/reproduce/analysis/config/verify-outputs.conf b/reproduce/analysis/config/verify-outputs.conf
index e4ef479..c9287e8 100644
--- a/reproduce/analysis/config/verify-outputs.conf
+++ b/reproduce/analysis/config/verify-outputs.conf
@@ -1,2 +1,9 @@
-# To enable verification of output datasets set this variable to yes
-verify-outputs =
+# To enable verification of output datasets set this variable to 'yes'.
+#
+# Copyright (C) 2019-2020 Mohammad Akhlaghi <mohammad@akhlaghi.org>
+#
+# Copying and distribution of this file, with or without modification, are
+# permitted in any medium without royalty provided the copyright notice and
+# this notice are preserved. This file is offered as-is, without any
+# warranty.
+verify-outputs = yes
diff --git a/reproduce/analysis/make/demo-plot.mk b/reproduce/analysis/make/demo-plot.mk
index c14b83d..a149040 100644
--- a/reproduce/analysis/make/demo-plot.mk
+++ b/reproduce/analysis/make/demo-plot.mk
@@ -18,7 +18,7 @@
# Directory to host outputs
# -------------------------
-a2dir = $(texdir)/tools-per-year
+a2dir = $(texdir)/to-publish
$(a2dir):; mkdir $@
@@ -27,7 +27,7 @@ $(a2dir):; mkdir $@
# Table for Figure 1C of Menke+20
# -------------------------------
-a2mk20f1c = $(a2dir)/columns.txt
+a2mk20f1c = $(a2dir)/tools-per-year.txt
$(a2mk20f1c): $(mk20tab3) | $(a2dir)
# Remove the (possibly) produced figure that is created from this
@@ -35,12 +35,37 @@ $(a2mk20f1c): $(mk20tab3) | $(a2dir)
# multiple files with a fixed prefix.
rm -f $(tikzdir)/figure-tools-per-year*
+ # Write the column metadata in a temporary file name (appending
+ # '.tmp' to the actual target name). Once all steps are done, it is
+ # renamed to the final target. We do this because if there is an
+ # error in the middle, Make will not consider the job to be
+ # complete and will stop here.
+ echo "# Data of plot showing fraction of papers that mentioned software tools" > $@.tmp
+ echo "# per year to demonstrate the features of Maneage (MANaging data linEAGE)." >> $@.tmp
+ >> $@.tmp
+ echo "# Raw data taken from Menke+2020 (https://doi.org/10.1101/2020.01.15.908111)." \
+ >> $@.tmp
+ echo "# " >> $@.tmp
+ echo "# Column 1: YEAR [count, u16] Publication year of papers." \
+ >> $@.tmp
+ echo "# Column 2: WITH_TOOLS [frac, f32] Fraction of papers mentioning software tools." \
+ >> $@.tmp
+ echo "# Column 3: NUM_PAPERS [count, u32] Total number of papers studied in that year." \
+ >> $@.tmp
+ echo "# " >> $@.tmp
+ $(call print-copyright, $@.tmp)
+
+
# Find the maximum number of papers.
awk '!/^#/{all[$$1]+=$$2; id[$$1]+=$$3} \
END{ for(year in all) \
- print year, 100*id[year]/all[year], all[year] \
+ printf("%-7d%-10.3f%d\n", year, 100*id[year]/all[year], \
+ all[year]) \
}' $< \
- > $@
+ >> $@.tmp
+
+ # Write it into the final target
+ mv $@.tmp $@
@@ -50,7 +75,7 @@ $(a2mk20f1c): $(mk20tab3) | $(a2dir)
$(mtexdir)/demo-plot.tex: $(a2mk20f1c) $(pconfdir)/demo-year.conf
# Find the first year (first column of first row) of data.
- v=$$(awk 'NR==1{print $$1}' $(a2mk20f1c))
+ v=$$(awk '!/^#/ && c==0{c++; print $$1}' $(a2mk20f1c))
echo "\newcommand{\menkefirstyear}{$$v}" > $@
# Find the number of rows in the plotted table.
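Note on the `demo-plot.mk` change above: the recipe now writes the comment header and the data rows to `$@.tmp`, renames that file to the target only as the last step, and reads the first year from the first non-comment row instead of from line 1. The following is a minimal stand-alone sketch of that pattern in plain shell, not the project's recipe. The file names and input rows are made up for illustration, and the `sort -n` step is an addition of this sketch (the recipe's `awk` loop does not sort).

```shell
#!/bin/sh
# Hypothetical stand-alone sketch of the write-to-temporary-then-rename
# pattern; paths and input rows below are invented for illustration.
set -e

workdir=$(mktemp -d)
input="$workdir/input.txt"            # stand-in for $(mk20tab3)
target="$workdir/tools-per-year.txt"  # stand-in for $(a2mk20f1c)

# Made-up rows: year, papers studied, papers mentioning tools.
cat > "$input" <<'EOF'
# year all with_tools
1991 200 20
1991 200 30
1992 500 100
EOF

# Write the metadata header to a temporary name first.
echo "# Column 1: YEAR"        >  "$target.tmp"
echo "# Column 2: WITH_TOOLS"  >> "$target.tmp"
echo "# Column 3: NUM_PAPERS"  >> "$target.tmp"

# Append the per-year table. Sorting is an assumption of this sketch,
# added because awk's 'for (key in array)' order is unspecified.
awk '!/^#/{all[$1]+=$2; id[$1]+=$3}
     END{ for(year in all)
            printf("%-7d%-10.3f%d\n", year, 100*id[year]/all[year],
                   all[year])
        }' "$input" | sort -n >> "$target.tmp"

# Only now does the final target come into existence.
mv "$target.tmp" "$target"

# First year: first field of the first non-comment row.
first_year=$(awk '!/^#/ && c==0{c++; print $1}' "$target")
echo "$first_year"
```

If any step before the `mv` fails, only the `.tmp` file is left behind, so a later run does not mistake a half-written table for a finished target.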
diff --git a/reproduce/analysis/make/initialize.mk b/reproduce/analysis/make/initialize.mk
index fe9c103..b0701f4 100644
--- a/reproduce/analysis/make/initialize.mk
+++ b/reproduce/analysis/make/initialize.mk
@@ -213,8 +213,9 @@ $(lockdir): | $(BDIR); mkdir $@
# we want to ensure that the file is always built in every run: it contains
# the project version which may change between two separate runs, even when
# no file actually differs.
-packagebasename := $(shell if [ -d .git ]; then \
- echo paper-$$(git describe --dirty --always --long); else echo NOGIT; fi)
+project-commit-hash := $(shell if [ -d .git ]; then \
+ echo $$(git describe --dirty --always --long); else echo NOGIT; fi)
+packagebasename := paper-$(project-commit-hash)
packagecontents = $(texdir)/$(packagebasename)
.PHONY: all clean dist dist-zip distclean clean-mmap $(packagecontents) \
$(mtexdir)/initialize.tex
@@ -373,6 +374,31 @@ dist-zip: $(packagecontents)
+# Print Copyright statement
+# -------------------------
+#
+# This statement can be used in published datasets that are in plain-text
+# format. It assumes you have already put the data-specific statements in
+# its first argument, it will supplement them with general project links.
+print-copyright = \
+ echo "\# Project title: $(metadata-title)" >> $(1); \
+ echo "\# Git commit (that produced this dataset): $(packagebasename)" >> $(1); \
+ echo "\# Project's Git repository: $(metadata-git-repository)" >> $(1); \
+ if [ x$(metadata-arxiv) != x ]; then echo "\# arXiv:$(metadata-arxiv)" >> $(1); fi; \
+ if [ x$(metadata-doi-journal) != x ]; then \
+ echo "\# DOI (Journal): $(metadata-doi-journal)" >> $(1); fi; \
+ if [ x$(metadata-doi-zenodo) != x ]; then \
+ echo "\# DOI (Zenodo): $(metadata-doi-zenodo)" >> $(1); fi; \
+ echo "\#" >> $(1); \
+ echo "\# Copyright (C) $$(date +%Y) $(metadata-copyright-owner)" >> $(1); \
+ echo "\# Dataset is available under $(metadata-copyright)." >> $(1); \
+ echo "\# License URL: $(metadata-copyright-url)" >> $(1);
+
+
+
+
+
+
# Project initialization results
# ------------------------------
#
@@ -381,8 +407,5 @@ dist-zip: $(packagecontents)
# calculated everytime the project is run. So even though this file
# actually exists, it is also aded as a `.PHONY' target above.
$(mtexdir)/initialize.tex: | $(mtexdir)
-
- # Version of the project.
- @if [ -d .git ]; then v=$$(git describe --dirty --always --long);
- else v=NO-GIT; fi
- echo "\newcommand{\projectversion}{$$v}" > $@
+ echo "\newcommand{\projecttitle}{$(metadata-title)}" > $@
+ echo "\newcommand{\projectversion}{$(project-commit-hash)}" >> $@
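Note on the `print-copyright` definition above: it appends a metadata line only when the corresponding `metadata-*` variable is non-empty, using the `[ x$(var) != x ]` test. A rough shell-function analogue is sketched below. The function name `append_optional` and the variable names are hypothetical, and the value is quoted here (unlike in the Make variable) so that values containing spaces are tolerated.

```shell
#!/bin/sh
# Hypothetical analogue of the optional lines in 'print-copyright'.
set -e

# append_optional LABEL VALUE FILE
# Appends "# LABEL: VALUE" to FILE only when VALUE is non-empty.
append_optional() {
    if [ "x$2" != x ]; then
        echo "# $1: $2" >> "$3"
    fi
}

out=$(mktemp)
arxiv=""                                          # empty: line is skipped
zenodo="https://doi.org/10.5281/zenodo.3872248"   # value from the diff

append_optional "arXiv" "$arxiv" "$out"
append_optional "DOI (Zenodo)" "$zenodo" "$out"

cat "$out"
```

With these values only the Zenodo line is written, which mirrors how the empty `metadata-arxiv` and `metadata-doi-journal` settings in `metadata-common.conf` would leave their lines out of the published dataset header.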
diff --git a/reproduce/analysis/make/verify.mk b/reproduce/analysis/make/verify.mk
index 088b3b3..1573920 100644
--- a/reproduce/analysis/make/verify.mk
+++ b/reproduce/analysis/make/verify.mk
@@ -107,14 +107,14 @@ $(mtexdir)/verify.tex: $(foreach s, $(verify-dep), $(mtexdir)/$(s).tex)
# Verify the figure datasets.
$(call verify-txt-no-comments-leading-space, \
- $(delete-num), ad345e873e6af577f0e4e7c8942cdf08)
- $(call verify-txt-no-comments-leading-space, \
- $(delete-histogram), 12a81c4c8c5f552e5ed5686453587fe8)
+ $(a2mk20f1c), 76fc5b13495c4d8e8e6f8d440304cf69)
# Verify TeX macros (the values that go into the PDF text).
for m in $(verify-check); do
file=$(mtexdir)/$$m.tex
- if [ $$m == download ]; then s=XXXXX
+ if [ $$m == download ]; then s=64da83ee3bfaa236849927cdc001f5d3
+ elif [ $$m == format ]; then s=e04d95a539b5540c940bf48994d8d45f
+ elif [ $$m == demo-plot ]; then s=2504472bd2b3f60b5a26c5f2a3a67251
else echo; echo "'$$m' not recognized."; exit 1
fi
$(call verify-txt-no-comments-leading-space, $$file, $$s)
diff --git a/tex/src/figure-data-lineage.tex b/tex/src/figure-data-lineage.tex
index fcc52d9..21c84f3 100644
--- a/tex/src/figure-data-lineage.tex
+++ b/tex/src/figure-data-lineage.tex
@@ -177,7 +177,7 @@
\ifdefined\outtwob
\node (menkedemoyear) [node-nonterminal, at={(2.67cm,4.6cm)}] {demo-year.conf};
\node (a2tex-west) [node-point, at={(1.27cm,-0.8cm)}] {};
- \node (out2b) [node-terminal, at={(2.67cm,0.3cm)}] {columns.txt};
+ \node (out2b) [node-terminal, at={(2.67cm,0.3cm)}] {tools-per-\\year.txt};
\draw [->] (out2b) -- (a2tex);
\draw [->,rounded corners] (menkedemoyear.west) -| (a2tex-west) |- (a2tex);
\fi
diff --git a/tex/src/figure-tools-per-year.tex b/tex/src/figure-tools-per-year.tex
index 240ac27..e235424 100644
--- a/tex/src/figure-tools-per-year.tex
+++ b/tex/src/figure-tools-per-year.tex
@@ -14,7 +14,7 @@
%% Linear plot, showing the number of papers mentioning tools.
\addplot+ [mark=none, very thick, green!60!black]
- table {tex/build/tools-per-year/columns.txt};
+ table {tex/build/to-publish/tools-per-year.txt};
\end{axis}
%% Add the right-side Y axis.
@@ -29,6 +29,6 @@
max space between ticks=20,
]
\addplot+ [ybar, mark=none, fill=red!50!white, red, opacity=0.25]
- table [x index=0, y index=2] {tex/build/tools-per-year/columns.txt};
+ table [x index=0, y index=2] {tex/build/to-publish/tools-per-year.txt};
\end{axis}
\end{tikzpicture}