author     Mohammad Akhlaghi <mohammad@akhlaghi.org>  2021-04-17 03:35:49 +0100
committer  Mohammad Akhlaghi <mohammad@akhlaghi.org>  2021-04-17 03:35:49 +0100
commit     30bf4624adf40e9611ad8f6a0214e725b2ea88af
tree       82a52ef46e439a4632c8dbebc21f863453abce76
parent     566190ab49556c211c1ddf90d9ac2314c29d7800
Finally published journal DOI added

In the project's 'metadata.conf', we also have an option to store the
journal DOI of the project (that will later be printed in the output file
products). So now that the paper's DOI has been set by the journal, it was
time to add it in the project too.

While looking at the usage of the metadata, I noticed that the
"Publication checklist" of 'README-hacking.md' didn't talk about it. In
fact, the part about putting metadata went into a lot of detail without
even mentioning the generic 'print-general-metadata' variable (previously
called 'print-copyright') that is created in 'initialize.mk'. So I removed
those extra points and just recommended using this variable for plain-text
files and putting similar info in other formats.

Some other minor changes were made:

 - The metadata now doesn't need the fixed 'https://doi.org/' prefix (to
   make it consistent with the arXiv identifier). Inside 'initialize.mk',
   there are now two variables called 'doi-prefix-url' and
   'arxiv-prefix-url' that contain the fixed prefix.

 - The 'print-copyright' name was clearly outdated for all the extra
   metadata that this variable created (including the copyright). So its
   name was changed to 'print-general-metadata'.

The generic Maneage changes will be taken into Maneage after this (they
were tested here).
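A project that still stores its DOIs in `metadata.conf` with the full `https://doi.org/` prefix (the convention before this commit) would have to drop it, since `initialize.mk` now adds the prefix itself through `doi-prefix-url`. The sketch below is one possible way to do that migration, not part of the commit: it assumes the one-`key = value`-per-line layout of `metadata.conf` and only touches the `metadata-doi-*` lines. It works on a temporary copy so that it runs anywhere.

```shell
#!/bin/sh
# Sketch: strip the fixed "https://doi.org/" prefix from the DOI entries of a
# metadata.conf-style file, leaving only the bare identifier.
set -e

# Temporary stand-in for 'reproduce/analysis/config/metadata.conf'.
conf=$(mktemp)
cat > "$conf" <<'EOF'
metadata-arxiv       = 2006.03018
metadata-doi-zenodo  = https://doi.org/10.5281/zenodo.4291207
metadata-doi-journal =
EOF

# Only 'metadata-doi-*' lines are changed; the key and spacing are kept.
sed -E 's|^(metadata-doi-[a-z]+[[:space:]]*=[[:space:]]*)https://doi\.org/|\1|' \
    "$conf" > "$conf.new"
mv "$conf.new" "$conf"

cat "$conf"
```

Empty fields (like the journal DOI before acceptance) are left untouched, so the same command can be run at any stage of the project.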
-rw-r--r--  README-hacking.md                          125
-rw-r--r--  reproduce/analysis/config/metadata.conf      8
-rw-r--r--  reproduce/analysis/make/demo-plot.mk         2
-rw-r--r--  reproduce/analysis/make/initialize.mk       10
4 files changed, 72 insertions, 73 deletions
diff --git a/README-hacking.md b/README-hacking.md
index 475f2ca..92c878e 100644
--- a/README-hacking.md
+++ b/README-hacking.md
@@ -945,14 +945,14 @@ effectively no cost in keeping multiple redundancies on different servers,
just in case one (or more) of them are discontinued in the (near/far)
future.
- - **Reserve a DOI for your dataset**: There are multiple data servers that
- give this functionality, one of the most well known and (currently!)
- well-funded is [Zenodo](https://zenodo.org) so we'll focus on it
- here. Of course, you can use any other service that provides a similar
- functionality. Once you complete these steps, you can start using/citing
- your dataset's DOI in the source of your project to finalize the rest of
- the points. With Zenodo, you can even use the given identifier
- for things like downloading.
+ - **Reserve a DOI for your datasets**: There are multiple data servers
+ that give this functionality, one of the most well known and
+ (currently!) well-funded is [Zenodo](https://zenodo.org) so we'll focus
+ on it here. Of course, you can use any other service that provides a
+ similar functionality. Once you complete these steps, you can start
+ using/citing your dataset's DOI in the source of your project to
+ finalize the rest of the points. With Zenodo, you can even use the given
+ identifier for things like downloading.
* *Start new upload*: After you log in to Zenodo, you can start a new
upload by clicking on the "New Upload button".
@@ -961,16 +961,23 @@ future.
Identifier", click on the "Reserve DOI" button.
* *Fill basic info*: You need to at least fill in the "required fields"
- (marked with a red star). You will always be able to change any
- metadata (even after you "Publish"), so don't worry too much about
- values in the fields, at this phase, its just important that they
- are not empty.
+ (marked with a red star). You will always be able to change any
+ metadata (even after you "Publish"), so don't worry too much about
+ values in the fields; at this phase, it's just important that they are
+ not empty.
* *Save your project but do not yet publish*: Press the "Save" button
- (at the top or bottom of the page). Do not yet press "Publish"
- though, since that would make the project public, and freeze the DOI
- with any possible file you may have uploaded already. We will get to
- the publication phase in the next steps.
+ (at the top or bottom of the page). Do not yet press "Publish" though,
+ since that would make the project public, and freeze the DOI with any
+ possible file you may have uploaded already. We will get to the
+ publication phase in the next steps.
+
+ - **Record the metadata**: Maneage comes with a file to store all the
+ project's metadata: `reproduce/analysis/config/metadata.conf`. Open this
+ file and store all the information that you currently have: for example
+ the Zenodo DOI, project's Git repository, Copyright owner and license of
+ the data after it becomes public. Keep the empty fields in mind and,
+ after obtaining their values, don't forget to fill them in.
- **Request archival on SoftwareHeritage**: [Software
Heritage](https://archive.softwareheritage.org/save/) is an online
@@ -989,7 +996,7 @@ future.
- **Zenodo/SoftwareHeritage links in paper**: put links to the Zenodo-DOI
(and SoftwareHeritage source when you make it public) in your
- paper. Somewhere close the start, maybe under the keywords/abstract,
+ paper. Somewhere close to the start, maybe under the keywords/abstract,
highlighting that they are supplements for reproducibility. These help
readers easily access these resources for supplementary material
directly from your PDF paper (sources on SoftwareHeritage and
@@ -1013,14 +1020,14 @@ future.
(for example with columns separated by white-space characters) or in
the more formal [Comma-separated
values](https://en.wikipedia.org/wiki/Comma-separated_values) or CSV,
- format). In the former case, its best to set the suffixes to `.txt`
- (because most browsers/OSs will automatically know they are plain-text
- and open them without needing any other software. If you have other
- types of data (for example images, or very large tables with millions
- of rows/columns that can be inconvenient in plain-text), feel free to
- use custom binary formats, but later, in the description of your
- project on the server, add a note, explaining what software they
- should use to open them.
+ format). Generally, it's best to set the suffixes to `.txt` (because
+ most browsers/OSs will automatically know they are plain-text and open
+ them without needing any other software). If you have other types of
+ data (for example images, or very large tables with millions of
+ rows/columns that can be inconvenient in plain-text), feel free to use
+ custom binary formats, but later, in the description of your project
+ on the server, add a note explaining what software they should use to
+ open them.
* *Descriptive names*: In some papers there are many files and having
cryptic names will only confuse your readers (actually, yourself in
@@ -1033,45 +1040,21 @@ future.
to rename everything related to each figure (which is very frustrating
and prone to errors).
- * *Good metadata*: Raw data are not too useful merely as a series of
+ * *Good metadata*: Raw data are not too useful merely as a series of raw
numbers! So don't forget to have **good metadata in every file**. If
its a plain-text file, usually lines starting with a `#` are
ignored. So in the command that generates each dataset, add some extra
- information about the dataset as lines starting with `#`. A minimal
- set of recommended metadata are listed below. Feel free to add
- more. You can use a configuration file to keep this information in one
- place and automatically include them in all your output files.
-
- * *Project Title and authors*: This is very important to give a
- general perspective of the figure.
-
- * *Links to project*: For example Zenodo-DOI, Journal-DOI (after it is
- accepted), SoftwareHeritage page, arXiv-ID (or any other pre-print
- server) and ofcourse, your Git repository.
-
- * *Commit hash* of the project that produced the dataset. This
- directly links the dataset to a particular point in your project's
- history. It is stored in the `$(project-commit-hash)` variable that
- is defined in `initialize.mk`. So you can use it anywhere in your
- project.
-
- * *Same commit hashes*: each dataset may have been created at
- different phases of your project's history. If you simply upload the
- produced datasets, they may therefore have different commits on
- them. To avoid confusing your readers (and your self in the future),
- it is best that they all have the same commit hash (which will also
- be the commit hash printed in the paper). So upon publication, we
- recommend deleting all of them and running `./project make` to build
- them all with the same commit hash.
-
- * *Copyright as metadata*: people need to know if they can "use" the
- dataset (i.e., modify it), or possibly re-distribute it and their
- derived products. They also need to know how they can contact the
- creator of the datset (who is usually also the copyright owner). So
- as another metadata element, also add your name and email-address
- (or the name of the person and email of the person who was in charge
- of that part of the project), and the copyright license name and
- standard link to the fully copyright license.
+ information (the more the better!) about the dataset as lines starting
+ with `#`. Based on `reproduce/analysis/config/metadata.conf`, in
+ `initialize.mk`, Maneage will produce a default set of basic
+ information for plain-text data and will put it in the
+ `$(print-general-metadata)` variable. It is thus recommended to print
+ this variable into your plain-text file before printing the actual
+ data (so it shows on top of the file). If you are publishing your data
+ in binary formats, please add all the metadata you see in
+ `$(print-general-metadata)` into each dataset file (for example
+ keywords in the FITS format). If there are many files, it's easy to
+ define a tiny shell-script to do the job on each dataset.
- **Link to figure datasets in caption**: all the datasets that go into
the plots should be uploaded directly to Zenodo so they can be
@@ -1201,9 +1184,12 @@ future.
initial/final submission to your desired journal. But we'll just add the
necessary points for arXiv submission here:
- * *Necessary links in comments*: put a link to your project's Git
- repository, Zenodo-DOI (this is not your paper's DOI, its the
- data/resources DOI), and/or SoftwareHeritage link in the comments.
+ * *Necessary links in comments*: put a link to your project's Git
+ repository, Zenodo-DOI (this is not your paper's DOI, it's the
+ data/resources DOI), and/or SoftwareHeritage link in the comments.
+
+ - *Update `metadata.conf`*: Once you have your final arXiv ID (formatted
+ as: `1234.56789`) put it in `reproduce/analysis/config/metadata.conf`.
- **Submission to a journal**: different journals accept submissions in
different formats, some accept LaTeX, some only want a PDF, or etc. It
@@ -1221,6 +1207,17 @@ future.
the DOI (so you don't need to upload a new version if you just want to
update the metadata).
+ - **After acceptance (before publication)**: Congratulations on the
+ acceptance! The main science content of your paper can't be changed any
+ more, but the paper will now go to the publication editor (for language
+ and style). Your approval of the final proof is necessary before the
+ paper is finally published. Some journals associate your paper's DOI
+ during this process. So before approving the final proof do these steps:
+
+ * Add the Journal DOI in `reproduce/analysis/config/metadata.conf`,
+ and re-build your final data products, so this important metadata is
+ added.
+
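The checklist recommends printing `$(print-general-metadata)` at the top of each plain-text dataset. When many existing files need the same header, a small loop can apply it after the fact. This is a minimal sketch under stated assumptions, not part of Maneage: it assumes the header was already written once to a file (the name `metadata-header.txt` is hypothetical; in a Maneage recipe it could be produced with `$(call print-general-metadata, <file>)`), that the datasets are the `.txt` files of a single directory, and that the header's first line starts with `# Project title:`.

```shell
#!/bin/sh
# Sketch: prepend one shared metadata header to every plain-text dataset in a
# directory. All names are hypothetical; the files are created only so that
# the example runs on its own.
set -e

dir=$(mktemp -d)
header="$dir/metadata-header.txt"

# Stand-in for the header that 'print-general-metadata' would write.
printf '# Project title: Example project\n#\n' > "$header"

# Two toy datasets (no metadata yet).
printf '2019 12\n2020 15\n' > "$dir/papers-per-year.txt"
printf '1 0.5\n2 0.7\n'     > "$dir/ratio.txt"

for f in "$dir"/*.txt; do
    # Do not prepend the header to itself.
    if [ "$f" = "$header" ]; then continue; fi

    # Skip files that already start with the header, so re-running is safe.
    if head -n 1 "$f" | grep -q '^# Project title:'; then continue; fi

    # Header first, then the data: write a temporary file, then rename it.
    cat "$header" "$f" > "$f.tmp"
    mv "$f.tmp" "$f"
done

cat "$dir/ratio.txt"
```

Writing to `$f.tmp` and renaming afterwards avoids leaving a half-written dataset if the loop is interrupted, similar to the `$@.tmp` pattern used in `demo-plot.mk`.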
diff --git a/reproduce/analysis/config/metadata.conf b/reproduce/analysis/config/metadata.conf
index f77ec70..470e429 100644
--- a/reproduce/analysis/config/metadata.conf
+++ b/reproduce/analysis/config/metadata.conf
@@ -13,10 +13,10 @@
metadata-title = Towards Long-term and Archivable Reproducibility
# DOIs and identifiers.
-metadata-arxiv = 2006.03018
-metadata-doi-zenodo = https://doi.org/10.5281/zenodo.4291207
-metadata-doi-journal =
-metadata-doi = $(metadata-doi-zenodo)
+metadata-arxiv = 2006.03018
+metadata-doi-zenodo = 10.5281/zenodo.4291207
+metadata-doi-journal = 10.1109/MCSE.2021.3072860
+metadata-doi = $(metadata-doi-journal)
metadata-git-repository = http://git.maneage.org/paper-concept.git
# DATA Copyright owner and license information.
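With bare identifiers in `metadata.conf`, a leftover `https://doi.org/` prefix or a mistyped arXiv ID would produce broken links once the prefix URLs are added. The check below is a hedged sketch of how one could catch that before publication; it is not part of Maneage. It assumes one `key = value` per line as shown above, treats a DOI as `10.<digits>/<suffix>`, accepts new-style arXiv IDs (four digits, a dot, four or five digits, optional `vN` version), and skips empty values and references such as `$(metadata-doi-journal)`.

```shell
#!/bin/sh
# Sketch: sanity-check the identifier fields of a metadata.conf-style file.
# Prints a message and returns non-zero for any value with an unexpected form.

check_ids() {
    status=0
    while IFS= read -r line; do
        # Only look at 'metadata-arxiv' and 'metadata-doi*' entries.
        case $line in
            metadata-arxiv*|metadata-doi*) ;;
            *) continue ;;
        esac

        # Split "key = value", removing the surrounding white space.
        key=$(printf '%s\n' "$line" | sed -E 's/[[:space:]]*=.*//')
        val=$(printf '%s\n' "$line" | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//')

        # Empty (not known yet) or a reference to another variable: skip.
        case $val in
            ''|'$('*) continue ;;
        esac

        case $key in
            metadata-arxiv) re='^[0-9]{4}\.[0-9]{4,5}(v[0-9]+)?$' ;;    # e.g. 2006.03018
            *)              re='^10\.[0-9]+(\.[0-9]+)*/[^[:space:]]+$' ;;  # bare DOI
        esac

        if ! printf '%s\n' "$val" | grep -Eq "$re"; then
            echo "unexpected value for $key: $val" >&2
            status=1
        fi
    done < "$1"
    return "$status"
}

# Example input, with the values from this commit.
conf=$(mktemp)
cat > "$conf" <<'EOF'
metadata-arxiv           = 2006.03018
metadata-doi-zenodo      = 10.5281/zenodo.4291207
metadata-doi-journal     = 10.1109/MCSE.2021.3072860
metadata-doi             = $(metadata-doi-journal)
EOF

check_ids "$conf" && echo "identifiers look fine"
```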
diff --git a/reproduce/analysis/make/demo-plot.mk b/reproduce/analysis/make/demo-plot.mk
index df7fabc..53e1918 100644
--- a/reproduce/analysis/make/demo-plot.mk
+++ b/reproduce/analysis/make/demo-plot.mk
@@ -45,7 +45,7 @@ $(a2mk20f1c): $(mk20tab3) | $(tex-publish-dir)
echo "# Column 3: NUM_PAPERS [count, u32] Total number of papers studied in that year." \
>> $@.tmp
echo "# " >> $@.tmp
- $(call print-copyright, $@.tmp)
+ $(call print-general-metadata, $@.tmp)
# Find the maximum number of papers.
diff --git a/reproduce/analysis/make/initialize.mk b/reproduce/analysis/make/initialize.mk
index 168010f..8af3199 100644
--- a/reproduce/analysis/make/initialize.mk
+++ b/reproduce/analysis/make/initialize.mk
@@ -463,16 +463,18 @@ $(data-publish-dir):; mkdir $@
# This statement can be used in published datasets that are in plain-text
# format. It assumes you have already put the data-specific statements in
# its first argument, it will supplement them with general project links.
-print-copyright = \
+doi-prefix-url = https://doi.org
+arxiv-prefix-url = https://arxiv.org/abs
+print-general-metadata = \
echo "\# Project title: $(metadata-title)" >> $(1); \
echo "\# Git commit (that produced this dataset): $(project-commit-hash)" >> $(1); \
echo "\# Project's Git repository: $(metadata-git-repository)" >> $(1); \
if [ x$(metadata-arxiv) != x ]; then \
- echo "\# Pre-print server: https://arxiv.org/abs/$(metadata-arxiv)" >> $(1); fi; \
+ echo "\# Pre-print: $(arxiv-prefix-url)/$(metadata-arxiv)" >> $(1); fi; \
if [ x$(metadata-doi-journal) != x ]; then \
- echo "\# DOI (Journal): $(metadata-doi-journal)" >> $(1); fi; \
+ echo "\# DOI (Journal): $(doi-prefix-url)/$(metadata-doi-journal)" >> $(1); fi; \
if [ x$(metadata-doi-zenodo) != x ]; then \
- echo "\# DOI (Zenodo): $(metadata-doi-zenodo)" >> $(1); fi; \
+ echo "\# DOI (Zenodo): $(doi-prefix-url)/$(metadata-doi-zenodo)" >> $(1); fi; \
echo "\#" >> $(1); \
echo "\# Copyright (C) $$(date +%Y) $(metadata-copyright-owner)" >> $(1); \
echo "\# Dataset is available under $(metadata-copyright)." >> $(1); \
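To see what the renamed function writes, the following sketch approximates, in plain shell, the lines of `print-general-metadata` shown above. It is not the Make definition: it uses shell variables (with underscores) instead of Make variables, copies the title, repository and identifier values from this commit's `metadata.conf`, and uses placeholders (marked in comments) for the commit hash, copyright owner and license. Since `arxiv-prefix-url` already ends in `/abs`, the sketch appends only the identifier to it.

```shell
#!/bin/sh
# Sketch: a shell approximation of the 'print-general-metadata' Make function,
# appending general project metadata as '#' comment lines to the file "$1".

doi_prefix_url=https://doi.org
arxiv_prefix_url=https://arxiv.org/abs

metadata_title="Towards Long-term and Archivable Reproducibility"
metadata_git_repository=http://git.maneage.org/paper-concept.git
metadata_arxiv=2006.03018
metadata_doi_zenodo=10.5281/zenodo.4291207
metadata_doi_journal=10.1109/MCSE.2021.3072860
project_commit_hash=0123abc          # placeholder, not a real commit
metadata_copyright_owner="Jane Doe"  # placeholder
metadata_copyright="CC-BY 4.0"       # placeholder

print_general_metadata() {
    {
        echo "# Project title: $metadata_title"
        echo "# Git commit (that produced this dataset): $project_commit_hash"
        echo "# Project's Git repository: $metadata_git_repository"
        # Optional identifiers: only printed when they are set.
        if [ -n "$metadata_arxiv" ]; then
            echo "# Pre-print: $arxiv_prefix_url/$metadata_arxiv"; fi
        if [ -n "$metadata_doi_journal" ]; then
            echo "# DOI (Journal): $doi_prefix_url/$metadata_doi_journal"; fi
        if [ -n "$metadata_doi_zenodo" ]; then
            echo "# DOI (Zenodo): $doi_prefix_url/$metadata_doi_zenodo"; fi
        echo "#"
        echo "# Copyright (C) $(date +%Y) $metadata_copyright_owner"
        echo "# Dataset is available under $metadata_copyright."
    } >> "$1"
}

out=$(mktemp)
print_general_metadata "$out"
cat "$out"
```

With these values the pre-print line reads `# Pre-print: https://arxiv.org/abs/2006.03018`. Because the function appends (`>>`), data-specific comment lines (like the column descriptions in `demo-plot.mk`) can be written to the same file before it is called.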