author | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2020-06-09 02:36:34 +0100 |
---|---|---|
committer | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2020-06-09 03:39:46 +0100 |
commit | 4646fa400796ac6b46e07429c43537475d953dff (patch) | |
tree | 2509a15e6d084940e0442dc6be7826412662588c | |
parent | 77aa9c1ebc26ae88b37f3127a73fd0c8973378d3 (diff) | |
parent | 623ae15c95bb8575b111709705c29b10fcf7c12b (diff) |
Imported Maneage, minor conflicts fixed, a bug found and fixed
Some minor conflicts came up in 'initialize.mk' and 'verify.mk'. For the
former, I chose the version on Maneage; for the latter, I kept the 'master'
version of this project's checksums, but took the Maneage version for the
rest of the improvements there (like printing the verified files as LaTeX
comments in 'verify.tex').
While testing the conflicts, I noticed a bug (in the LaTeX macro for the
number of years in the Menke+20 paper) in the previous build, thanks to the
verification step :-)! Fortunately it wasn't actually printed in the PDF,
so a normal reader wouldn't have noticed it.
The bug was caused by the recently added meta-data/commented lines in the
'tools-per-year.txt' file: when calculating the number of years studied in
that paper, we were simply counting all the lines and we had forgotten to
correct this after adding comments. As a result, the unused LaTeX macro
file was saying that they had studied 47 years instead of the real 31
years! This element was actually used in the very first (+40 page!) draft
of the paper that was summarized to fit into the journal limits.
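The difference between the two counting approaches can be sketched as
below. This is only an illustration: the table is a made-up stand-in for
'tools-per-year.txt' (its column names and values are assumptions), while
the AWK program is the one used for the fix in 'demo-plot.mk'.

```shell
# Hypothetical stand-in for 'tools-per-year.txt': two commented metadata
# lines followed by three data rows (all values are made up).
table=$(mktemp)
printf '%s\n' \
    '# Column 1: YEAR' \
    '# Column 2: NUM_PAPERS' \
    '1991 10' \
    '1992 12' \
    '1993 15' > "$table"

# Old approach: count every line, so the comments are included.
all_lines=$(wc -l < "$table")

# Fixed approach: only count the lines that do not start with '#'.
data_lines=$(awk '!/^#/{c++} END{print c}' "$table")

echo "all lines: $all_lines; data lines: $data_lines"
rm -f "$table"
```

For this sample table the first count includes the two comment lines (5
lines in total), while the second only counts the 3 data rows.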
-rw-r--r-- | README-hacking.md | 368 |
-rwxr-xr-x | project | 3 |
-rw-r--r-- | reproduce/analysis/config/metadata-common.conf | 16 |
-rw-r--r-- | reproduce/analysis/config/metadata.conf | 25 |
-rw-r--r-- | reproduce/analysis/make/demo-plot.mk | 2 |
-rw-r--r-- | reproduce/analysis/make/initialize.mk | 115 |
-rw-r--r-- | reproduce/analysis/make/verify.mk | 41 |
-rwxr-xr-x | reproduce/software/shell/configure.sh | 15 |
8 files changed, 507 insertions, 78 deletions
diff --git a/README-hacking.md b/README-hacking.md index 0189e70..554ba6b 100644 --- a/README-hacking.md +++ b/README-hacking.md @@ -39,10 +39,11 @@ then discussed to help you navigate the files and their contents. This is followed by a checklist for the easy/fast customization of Maneage to your exciting research. We continue with some tips and guidelines on how to manage or extend your project as it grows based on our experiences with it -so far. The main body concludes with a description of possible future -improvements that are planned for Maneage (but not yet implemented). As -discussed above, we end with a short introduction on the necessity of -reproducible science in the appendix. +so far. There is also a publication checklist, describing the recommended +steps to publish your data/code. The main body concludes with a description +of possible future improvements that are planned for Maneage (but not yet +implemented). As discussed above, we end with a short introduction on the +necessity of reproducible science in the appendix. Please don't forget to share your thoughts, suggestions and criticisms. Maintaining and designing Maneage is itself a separate project, @@ -177,6 +178,12 @@ with (earlier versions of) Maneage. Previously it was simply called details may be different in them. The more recent ones can be used as a good working example. + - Akhlaghi et al. ([2020](https://arxiv.org/abs/2006.03018), + arXiv:2006.03018): The project's version controlled source is [on + Gitlab](https://gitlab.com/makhlaghi/maneage-paper), necessary software, + outputs and backup of history is available in + [zenodo.3872248](https://doi.org/10.5281/zenodo.3872248). + - Infante-Sainz et al. ([2020](https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.5317I), MNRAS, 491, 5317): The version controlled project source is available @@ -605,16 +612,18 @@ First custom commit git push origin maneage # Push 'maneage' branch to 'origin' (no tracking). ``` - 5. 
**Title**, **short description** and **author**: The title and basic - information of your project's output PDF paper should be added in + 5. **Title**, **short description** and **author**: You can start adding + your name (with your possible coauthors) and tentative abstract in `paper.tex`. You should see the relevant place in the preamble (prior - to `\begin{document}`. After you are done, run the `./project make` - command again to see your changes in the final PDF, and make sure that - your changes don't cause a crash in LaTeX. Of course, if you use a - different LaTeX package/style for managing the title and authors (in - particular a specific journal's style), please feel free to use it - your own methods after finishing this checklist and doing your first - commit. + to `\begin{document}`). Just note that some core project metadata like + the project title are actually set in + `reproduce/analysis/config/metadata.conf`. So set your project title + in there. After you are done, run the `./project make` command again + to see your changes in the final PDF and make sure that your changes + don't cause a crash in LaTeX. Of course, if you use a different LaTeX + package/style for managing the title and authors (in particular a + specific journal's style), please feel free to use your own methods + after finishing this checklist and doing your first commit. 6. **Delete dummy parts**: Maneage contains some parts that are only for the initial/test run, mainly as a demonstration of important steps, @@ -756,7 +765,17 @@ First custom commit $ git push # Push your commit to your remote. ``` - 11. **Start your exciting research**: You are now ready to add flesh and + 11. **Read the publication checklist**: The publication checklist below is + very similar to this one, but for the final phase of your project. For + now, you don't have to do any of its steps, but reading it will give + you good insight into the later stages of your project. 
If you already + know how you want to publish your project, you can implement many of + those steps from the start and during the actual project (in + particular how to organize your data files that go into the plots). + Making it much easier to complete that checklist when you are ready + for submission. + + 12. **Start your exciting research**: You are now ready to add flesh and blood to this raw skeleton by further modifying and adding your exciting research steps. You can use the "published works" section in the introduction (above) as some fully working models to learn @@ -885,6 +904,278 @@ Other basic customizations + + +Publication checklist +===================== + +Once your project is complete and you are ready to submit/publish the +project, we recommend the following steps to ensure the maximum FAIRness of +all your hard work (Findability, Accessibility, Interoperability, and +Reusability). This list may seem long, and may take a day or so to +complete, but please consider the fact that you have spent months/years on +your project, so it is a very small step in your over-all project! Most of +it is about organizing things that you can do during your project. So its +good to have a look at these from the start of your project. + +As you will notice, when you complete this checklist, your projects source +will be present in multiple places: Zenodo, SoftwareHeritage, arXiv, your +own Git repositories. This is a major advantage of Maneaged(!) projects: +because their source is very small (a few hundred kilobytes), there is +effectively no cost in keeping multiple redundancies on different servers, +just in case one (or more) of them are discontinued in the (near/far) +future. + + - **Reserve a DOI for your dataset**: There are multiple data servers that + give this functionality, one of the most well known and (currently!) + well-funded is [Zenodo](https://zenodo.org) so we'll focus on it + here. 
Ofcourse, you can use any other service that provides a similar + functionality. Once you complete these steps, you can start using/citing + your dataset's DOI in the source of your project to finalize the rest of + the points. Note that with Zenodo, you can even use the given identifier + for things like downloading. + + * *Start new upload*: After you log-in to Zenodo, you can start a new + upload by clicking on the "New Upload button". + + * *Reserve DOI*: Under the "Basic information" --> "Digital Object + Identifier", click on the "Reserve DOI" button. + + * *Fill basic info*: You need to atleast fill the "required fields" + (marked with a red star). + + * *Save your upload*: You should now be able to press the "Save" button + (at the top or bottom of the page) to finalize this step. + + - **Request archival on SoftwareHeritage**: [Software + Heritage](https://archive.softwareheritage.org/save/) is an online + project to archive source code and their development histories. It + provides wonderful features for archiving source code (not data!) and + also for citing special parts of a project's source in any point of its + history. So it blends elegantly with the purpose of Maneage. Once you + make your project's Git repository publicly accessible (no login + required to clone it), you can request that SoftwareHeritage archives + it. Its good if you do this as soon as you make your Git repository + public. When you are ready, just register your repository's address (the + same one you give to `git clone`) to in [SoftwareHeritage's save + form](https://archive.softwareheritage.org/save). + + - **Run a spell-check on `paper.tex`**: we all forget ;-)! + + - **Zenodo/SoftwareHeritage links in paper**: put links to the Zenodo-DOI + (and SoftwareHeritage source when you make it public) in your + paper. Somewhere close the start, maybe under the keywords/abstract, + highlighting that they are supplements for reproducibility. 
These help + readers easily access these resources for supplementary material + directly from your PDF paper (sources on SoftwareHeritage and + data/software on Zenodo). These links are more trusted/reliable in terms + of longevity than Git repositories or private webpages. + + - **Identify and properly format output data**: If you have a plot, figure + or table in your paper, you need to verify that data later and publish + that data with the paper (see the steps below for both). But before + going to those steps, its good if you polish your datasets with the + recommendations below: + + * *Keep published data in a special place*: it helps if you keep the + to-be-published data files in a special sub-directory under your build + directory. In this way, irrespective of which subMakefile builds a + published dataset, they won't be lost/scatterred in the middle of all + the project's intermediate-built files. + + * *In plain-text*: If the data are in tabular form (for example the X + and Y values in your plots), store them as a simple plain-text file + (for example with columns separated by white-space characters or in + the more formal [Comma-separated + values](https://en.wikipedia.org/wiki/Comma-separated_values), or CSV, + format). If you have other types of data (for example images, or very + large tables with millions of rows/columns that can be inconvenient in + plain-text), feel free to use custom binary formats, but later, in the + description of your project on the server, tell people what software + they should use to open them. + + * *Descriptive names*: In some papers there are many files and having + cryptic names will only confuse your readers (actually, yourself in + two years!). So set the names of the files to be as descriptive as + possible, so simply by reading the name of the file, someone who has + read the paper will understand what figure it corresponds to. In + particular, don't set names like `figure-3.txt`! 
In a few months you + will forget the order of the figures! Even worse, after the referee + report, you may need to re-arrange some figures and you will be forced + to rename everything related to each figure (which is very frustrating + and prone to errors). + + * *Good metadata*: Raw data are not too useful merely as a series of + numbers! So don't forget to have **good metadata in every file**. If + it's a plain-text file, usually lines starting with a `#` are + ignored. So in the command that generates each dataset, add some extra + information about the dataset as lines starting with `#`. A minimal + set of recommended metadata is listed below. Feel free to add + more. You can use a configuration file to keep this information in one + place and automatically include it in all your output files. + + * *Project Title and authors*: This is very important to give a + general perspective of the figure. + + * *Links to project*: For example Zenodo-DOI, Journal-DOI (after it is + accepted), SoftwareHeritage page, arXiv-ID (or any other pre-print + server) and of course, your Git repository. + + * *Commit hash* of the project that produced the dataset. This + directly links the dataset to a particular point in your project's + history. It is stored in the `$(project-commit-hash)` variable that + is defined in `initialize.mk`. So you can use it anywhere in your + project. + + * *Copyright as metadata*: people need to know if they can use the + dataset (i.e., modify it), or possibly re-distribute it and their + derived products. They also need to know how they can contact the + creator of the dataset (who is usually also the copyright owner). So + as another metadata element, also add your name and email-address + (or the name and email of the person who was in charge + of that part of the project), and the copyright license name and + standard link to the full copyright license. 
+ - **Link to figure datasets in caption**: all the datasets that go into + the plots should be uploaded directly to Zenodo so they can be + viewed/downloaded with a simple link in the caption. For example see the + last sentence of the caption of Figure 1 in + [arXiv:2006.03018](https://arxiv.org/pdf/2006.03018.pdf), it points to + [the data](https://zenodo.org/record/3872248/files/tools-per-year.txt) + that was used to create that figure's top plot. As you see, this will + allow your paper's readers (again, most probably your future-self!) to + directly access the numbers of each visualization (plot/figure) with a + simple click in a trusted server. This also shows the major advantage of + having your data as simple plain-text where possible, as described + above. To help you keep all your to-be-visualized datasets in a single + place, Maneage has the two `tex-publish-dir` and `data-publish-dir` + directories that are defined in `reproduce/analysis/make/initialize.mk`, + see the comments above their definition for more. + + - **Verification step**: It is very important to automatically verify the + outputs of your project. Recall from the customization checklist (above) + that you can activate verification by setting the `verify-outputs` + variable to `yes` in `reproduce/analysis/config/verify-outputs.conf`. So + please activate it and look into the `reproduce/analysis/make/verify.mk` + to add the necessary steps to automatically verify your outputs. *Tip*: + you don't have to generate the checksums manually, just give a wrong + value (for example `XXXX`) so Maneage crashes! In the error message it + will then print the actual and expected checksums and you can take the + value from there. Outputs that must be verified can be listed as: + + * *subMakefile LaTeX macro files*: these LaTeX macros put numbers into + the text. You don't want your readers (actually: yourself in two + years!) 
to have to painfully find and check, by eye, all those tiny + numbers buried deep in the ocean of words! + + * *Final data files* (for tables, figures, or plots, or as data + release). These are the same files described above. If you have + followed the guidelines above and stored them as plain-text with + comments on top, you can use the provided function + `verify-txt-no-comments-leading-space` which takes the filename and + checksum as arguments to avoid the commented lines (which may change) + and only verify the data. If your data are in other formats, be sure + to verify them without metadata that may change (like date and etc). + + - **Fill `README.md`**: The `README.md` is *the first place* your readers + are going to look into. It already has a default text with place-holders + in the form of `XXXXXX`. Please go through it and replace the + place-holders with the relevant information/links or feel free to + add/remove anything else. Just don't forget to tell your readers in + `README.md` that they can learn about this system in the + `README-hacking.md` file (ideally close to the top, like it is now). + + - **Confirm if your project builds from scratch**: Before publishing + anything, you should see if your project can indeed reproduce itself! + So, go to a temporary directory, clone your project from its repository + and try configuring and building it from scratch in a new-temporary + build-directory. It is important to ignore the directory you developed + your project on (source and build): you may have files there that you + forgot to import into Git or depended on in the build (it + happens!). Ideally, it would be good to try it on a different computer. + + - **Confirm if `./project make dist` works**: The special target `dist` + tells the project to build a tarball that is ready to compile the LaTeX + PDF without having to do the analysis and build software. This is very + useful for servers like arXiv, or some journals. 
This tarball is also + one of the deliverables you want to publish on Zenodo. Once the tarball + is created, copy it to a temporary directory outside of Maneage, unpack + it and run `make` (completely ignoring Maneage's `./project` script). If + you plan to submit your paper to arXiv, the best test is to actually + start a test submission on arXiv to upload the tarball there to see if + it can build your PDF. Once it works, you can delete that temporary + submission for now. Afterwards, try configuring and building it with the + tarball by running its `./project` (from scratch and without the Git + history!). If there is a problem in any of these tests, you can modify + what goes into this tarball in `reproduce/analysis/make/initialize.mk`: + go through the steps and add the necessary components until the checks + pass. + + - **Upload all deliverables to Zenodo**: With the datasets ready, you can + now upload the following deliverables to Zenodo. Except for the data + files, put the Git hash of your Maneaged project at the moment of + publication in the filename of other uploaded files. The output files + shouldn't have a hash in their names because their URL (that goes in the + caption of the figures/tables) must be known prior to a commit; a hash + in the name would create a cyclic dependency! Ideally the hash should be placed just + before the final suffix, for example `paper-XXXXXXX.pdf` (where + `XXXXXXX` is the Git hash). This will clearly identify the point in + history that your file was created. + + * **`paper-XXXXXXX.pdf`**: you shouldn't just upload data to the data + server; also upload your paper's PDF so it's there with the other raw + formats. It will greatly help yourself and others. Most datacenters + (like Zenodo) actually also have a PDF viewer that will load + automatically before the list of data files. For example see + [zenodo.3408481](https://doi.org/10.5281/zenodo.3408481). 
+ + * **`project-XXXXXXX.tar.gz`**: Or the output of `make dist` as + described above. 
+ + * **`project-git.bundle`** This is the full Git history of the project + in one file (which you can actually clone from later!). Its + necessary to publish this with your dataset too because Git + repositories make no promise on longevity. The way to "bundle" a Git + history is described below, in summary, its this command: + ```shell + $ git bundle create my-project-git.bundle --all + ``` + + * **`software-XXXXXXX.tar.gz`**: This is effectively a copy of all the + software source code tarballs in your project's + `.build/software/tarballs`. It is necessary to upload these with + your project to avoid relying on third party servers. In the future + any one of those servers may go down and if so, your project won't + be buildable. You can generate this tarball easily with `make + dist-software`. + + * All the figure (and other) output datasets of the project. Don't + rename these files, let them have the same descriptive name + mentioned above. Also recall that a link to all these files is also + in the caption of the respective figure. + + - **Upload to [arXiv](https://arxiv.org)**: or to any other pre-print + server (if you want to). Of course, you can also do this after the + initial/final submission to your desired journal. But we'll just add the + necessary points for arXiv submission here: + + * *Necessary links in comments*: put a link to your project's Git + repository, Zenodo-DOI (this is not your paper's DOI, its the + data/resources DOI), and/or SoftwareHeritage link in the comments. + + - **Submission to a journal**: different journals accept submissions in + different formats, some accept LaTeX, some only want a PDF, or etc. It + would be good if you highlight in the cover-letter that your work is + reproducible and provide the Zenodo and Software Heritage links (if they + are public). If not, you can mention that everything is ready for such a + submission after acceptance. 
+ + + + + + + Tips for designing your project =============================== @@ -1139,14 +1430,30 @@ for the benefit of others. pull new work that is done in Maneage. If the changes are useful for your work, you can merge them with your project to benefit from them. Just pay **very close attention** to resolving possible - **conflicts** which might happen in the merge (updated settings that - you have customized in Maneage). + **conflicts** which might happen in the merge. In particular the + "semantic conflicts" that don't show up in Git, but can potentially + break your project, for example updates to software versions, or to + internal Maneage structure. Hence read the commit messages of `git + log` carefully to **see what has changed**. The best way to check is + to first complete the steps below, then build your project from + scratch (from `./project configure` in a new build-directory). ```shell - # Go to the 'maneage' branch and import/inspect updates. + # Go to the 'maneage' branch and import updates. $ git checkout maneage $ git pull # Get recent work in Maneage - $ git log XXXXXX..XXXXXX --reverse # Inspect new work (replace XXXXXXs with hashs mentioned in output of previous command). + + # Read all the commit messages of the newly imported + # features/changes. In particular pay close attention to the ones + # starting with 'IMPORTANT': these may cause a crash in your + # project (changing something fundamental in Maneage). + # + # Replace the XXXXXXX..YYYYYYY with hashs mentioned close to start + # of the 'git pull' command outputs. + $ git log XXXXXXX..YYYYYYY --reverse + + # Have a look at the commits in the 'maneage' branch in relation + # with your project. $ git log --oneline --graph --decorate --all # General view of branches. # Go to your 'master' branch and import all the updates into @@ -1163,18 +1470,33 @@ for the benefit of others. 
# If any files have conflicts, open a text editor and correct the # conflict (placed in between '<<<<<<<', '=======' and '>>>>>>>'. - # When such conflicts are remoted, the file will be automatically - # removed from the "Unmerged paths" + # Once all conflicts in a file are remoted, the file will be + # automatically removed from the "Unmerged paths", so run this + # command after correcting the conflicts of each file just to make + # sure things are clean. git status - # TIP: If you want the changes in one file to be only from branch - # ('maneage' or 'master'), you can use this command: - # $ git checkout <BRANCH-NAME> -- <FILENAME> + # TIP: If you want the changes in one file to be only from a + # special branch ('maneage' or 'master', completely ignoring + # changes in the other), use this command: + # $ git checkout <BRANCH-NAME> -- <FILENAME> # When there are no more "Unmerged paths", you can commit the # merge. In the commit message, Explain any conflicts that you # fixed. git commit + + # Do a clean build of your project (to check for "Semanic + # conflicts" (not detected as a conflict by Git, but may cause a + # crash in your project). You can backup your build directory + # before running the 'distclean' target. + + # Any error in the build will be due to changes in Maneage, so look + # closely at the commits (especially the + + ./project make distclean # will DELETE ALL your build-directory!! + ./project configure -e + ./project make ``` - *Adding Maneage to a fork of your project*: As you and your colleagues @@ -90,7 +90,8 @@ Project 'make' special features. ./project make dist Produce a LaTeX-ready-to-build distribution tarball ('tar.gz') of the project. This is ready to be uploaded to servers like 'arXiv.org'. - ./project make dist-zip Similar to 'dist', but compress with '.zip'. + ./project make dist-lzip Similar to 'dist', but compress to '.tar.lz'. + ./project make dist-zip Similar to 'dist', but compress to '.zip'. 
With the options below you can modify the default behavior. Configure options: diff --git a/reproduce/analysis/config/metadata-common.conf b/reproduce/analysis/config/metadata-common.conf deleted file mode 100644 index 7bc9fa5..0000000 --- a/reproduce/analysis/config/metadata-common.conf +++ /dev/null @@ -1,16 +0,0 @@ -# Metadata parameters that can be used in - -# Project information -metadata-title = Towards Long-term and Archivable Reproducibility - -# DOIs and identifiers. -metadata-arxiv = -metadata-doi-zenodo = https://doi.org/10.5281/zenodo.3872248 -metadata-doi-journal = -metadata-doi = $(metadata-doi-zenodo) -metadata-git-repository = https://gitlab.com/makhlaghi/maneage-paper - -# Copyright and identifier. -metadata-copyright-owner = Mohammad Akhlaghi <mohammad@akhlaghi.org> -metadata-copyright = Creative Commons Attribution-ShareAlike (CC BY-SA) -metadata-copyright-url = https://creativecommons.org/licenses/by-sa/4.0 diff --git a/reproduce/analysis/config/metadata.conf b/reproduce/analysis/config/metadata.conf new file mode 100644 index 0000000..cddc33f --- /dev/null +++ b/reproduce/analysis/config/metadata.conf @@ -0,0 +1,25 @@ +# Project meta-data that can be used in a project's output datasets and +# final paper. Please set the values here and use them in your analysis or +# paper, don't repeat them +# +# Copyright (C) 2020 Mohammad Akhlaghi <mohammad@akhlaghi.org> +# +# Copying and distribution of this file, with or without modification, are +# permitted in any medium without royalty provided the copyright notice and +# this notice are preserved. This file is offered as-is, without any +# warranty. + +# Project information +metadata-title = Towards Long-term and Archivable Reproducibility + +# DOIs and identifiers. 
+metadata-arxiv = 2006.03018 +metadata-doi-zenodo = https://doi.org/10.5281/zenodo.3872248 +metadata-doi-journal = +metadata-doi = $(metadata-doi-zenodo) +metadata-git-repository = https://gitlab.com/makhlaghi/maneage-paper + +# DATA Copyright owner and license information. +metadata-copyright-owner = Mohammad Akhlaghi <mohammad@akhlaghi.org> +metadata-copyright = Creative Commons Attribution-ShareAlike (CC BY-SA) +metadata-copyright-url = https://creativecommons.org/licenses/by-sa/4.0 diff --git a/reproduce/analysis/make/demo-plot.mk b/reproduce/analysis/make/demo-plot.mk index a149040..5ddb3d7 100644 --- a/reproduce/analysis/make/demo-plot.mk +++ b/reproduce/analysis/make/demo-plot.mk @@ -79,7 +79,7 @@ $(mtexdir)/demo-plot.tex: $(a2mk20f1c) $(pconfdir)/demo-year.conf echo "\newcommand{\menkefirstyear}{$$v}" > $@ # Find the number of rows in the plotted table. - v=$$(cat $(a2mk20f1c) | wc -l) + v=$$(awk '!/^#/{c++} END{print c}' $(a2mk20f1c)) echo "\newcommand{\menkenumyears}{$$v}" >> $@ # Find the number of papers in 1996. diff --git a/reproduce/analysis/make/initialize.mk b/reproduce/analysis/make/initialize.mk index 450b673..489f9e3 100644 --- a/reproduce/analysis/make/initialize.mk +++ b/reproduce/analysis/make/initialize.mk @@ -203,6 +203,16 @@ $(lockdir): | $(BDIR); mkdir $@ +# Version and distribution tarball definitions +project-commit-hash := $(shell if [ -d .git ]; then \ + echo $$(git describe --dirty --always --long); else echo NOGIT; fi) +project-package-name := maneaged-$(project-commit-hash) +project-package-contents = $(texdir)/$(project-package-name) + + + + + # High-level Makefile management # ------------------------------ # @@ -213,12 +223,8 @@ $(lockdir): | $(BDIR); mkdir $@ # we want to ensure that the file is always built in every run: it contains # the project version which may change between two separate runs, even when # no file actually differs. 
-project-commit-hash := $(shell if [ -d .git ]; then \ - echo $$(git describe --dirty --always --long); else echo NOGIT; fi) -packagebasename := paper-$(project-commit-hash) -packagecontents = $(texdir)/$(packagebasename) -.PHONY: all clean dist dist-zip distclean clean-mmap $(packagecontents) \ - $(mtexdir)/initialize.tex +.PHONY: all clean dist dist-zip dist-lzip distclean clean-mmap \ + $(project-package-contents) $(mtexdir)/initialize.tex # --------- Delete for no Gnuastro --------- clean-mmap:; rm -f reproduce/config/gnuastro/mmap* @@ -262,11 +268,11 @@ distclean: clean # that is ready for building the final PDF with LaTeX. This is useful for # collaborators who only want to contribute to the text of your project, # without having to worry about the technicalities of the analysis. -$(packagecontents): paper.pdf | $(texdir) +$(project-package-contents): paper.pdf | $(texdir) # Set up the output directory, delete it if it exists and remake it # to fill with new contents. - dir=$(texdir)/$(packagebasename) + dir=$@ rm -rf $$dir mkdir $$dir @@ -304,10 +310,7 @@ $(packagecontents): paper.pdf | $(texdir) cp -r tex/img $$dir/tex/img cp tex/tikz/*.eps $$dir/tex/tikz cp -r reproduce/* $$dir/reproduce - for d in $$(find tex/build/ -mindepth 1 -maxdepth 1 -type d \ - ! -name $(packagebasename)); do - cp -r $$d $$dir/tex/build - done + cp -r tex/build/!($(project-package-name)) $$dir/tex/build # Clean up un-necessary/local files: 1) the $(texdir)/build* # directories (when building in a group structure, there will be @@ -346,32 +349,88 @@ $(packagecontents): paper.pdf | $(texdir) # Clean temporary (currently those ending in `~') files. cd $(texdir) - find $(packagebasename) -name \*~ -delete - find $(packagebasename) -name \*.swp -delete + find $(project-package-name) -name \*~ -delete + find $(project-package-name) -name \*.swp -delete # PROJECT SPECIFIC # ---------------- # Put any project specific distribution steps here. # ---------------- -# Package into `.tar.gz'. 
-dist: $(packagecontents) +# Package into `.tar.gz' or '.tar.lz'. +dist dist-lzip: $(project-package-contents) curdir=$$(pwd) cd $(texdir) - tar -cf $(packagebasename).tar $(packagebasename) - gzip -f --best $(packagebasename).tar - rm -rf $(packagebasename) + tar -cf $(project-package-name).tar $(project-package-name) + if [ $@ = dist ]; then + suffix=gz + gzip -f --best $(project-package-name).tar + elif [ $@ = dist-lzip ]; then + suffix=lz + lzip -f --best $(project-package-name).tar + fi + rm -rf $(project-package-name) cd $$curdir - mv $(texdir)/$(packagebasename).tar.gz ./ + mv $(texdir)/$(project-package-name).tar.$$suffix ./ # Package into `.zip'. -dist-zip: $(packagecontents) +dist-zip: $(project-package-contents) curdir=$$(pwd) cd $(texdir) - zip -q -r $(packagebasename).zip $(packagebasename) - rm -rf $(packagebasename) + zip -q -r $(project-package-name).zip $(project-package-name) + rm -rf $(project-package-name) + cd $$curdir + mv $(texdir)/$(project-package-name).zip ./ + +# Package the software tarballs. +dist-software: + curdir=$$(pwd) + cd $(BDIR) + if [ -d .git ]; then + dirname="software-$$(git describe --dirty --always --long)" + else + dirname="software-NOGIT"; + fi + mkdir $$dirname + cp -L software/tarballs/* $$dirname/ + tar -cf $$dirname.tar $$dirname + gzip -f --best $$dirname.tar + rm -rf $$dirname cd $$curdir - mv $(texdir)/$(packagebasename).zip ./ + mv $(BDIR)/$$dirname.tar.gz ./ + + + + + +# Directory containing to-be-published datasets +# --------------------------------------------- +# +# It's good practice (so you don't forget in the last moment!) to have all +# the plot/figure/table data that you ultimately want to publish in a +# single directory. +# +# There are two types of to-publish data in the project. +# +# 1. 
Those data that also go into LaTeX (for example to give to LateX's +# PGFPlots package to create the plot internally) should be under the +# '$(BDIR)/tex' directory (because other LaTeX producers may also need +# it for example when using './project make dist'). The contents of +# this directory are directly taken into the tarball. +# +# 2. The data that aren't included directly in the LaTeX run of the paper, +# can be seen as supplements. A good place to keep them is under your +# build-directory. +# +# RECOMMENDATION: don't put the figure/plot/table number in the names of +# your to-be-published datasets! Given them a descriptive/short name that +# would be clear to anyone who has read the paper. Later, in the caption +# (or paper's tex/appendix), you will put links to the dataset on servers +# like Zenodo (see the "Publication checklist" in 'README-hacking.md'). +tex-publish-dir = $(texdir)/to-publish +data-publish-dir = $(BDIR)/data-to-publish +$(tex-publish-dir):; mkdir $@ +$(data-publish-dir):; mkdir $@ @@ -385,9 +444,10 @@ dist-zip: $(packagecontents) # its first argument, it will supplement them with general project links. 
 print-copyright = \
 	echo "\# Project title: $(metadata-title)" >> $(1); \
-	echo "\# Git commit (that produced this dataset): $(packagebasename)" >> $(1); \
+	echo "\# Git commit (that produced this dataset): $(project-commit-hash)" >> $(1); \
 	echo "\# Project's Git repository: $(metadata-git-repository)" >> $(1); \
-	if [ x$(metadata-arxiv) != x ]; then echo "\# arXiv:$(metadata-arxiv)" >> $(1); fi; \
+	if [ x$(metadata-arxiv) != x ]; then \
+	  echo "\# Pre-print server: https://arxiv.org/abs/$(metadata-arxiv)" >> $(1); fi; \
 	if [ x$(metadata-doi-journal) != x ]; then \
 	  echo "\# DOI (Journal): $(metadata-doi-journal)" >> $(1); fi; \
 	if [ x$(metadata-doi-zenodo) != x ]; then \
@@ -401,7 +461,6 @@ print-copyright = \



-
 # Project initialization results
 # ------------------------------
 #
@@ -410,5 +469,7 @@ print-copyright = \
 # calculated everytime the project is run. So even though this file
 # actually exists, it is also aded as a `.PHONY' target above.
 $(mtexdir)/initialize.tex: | $(mtexdir)
+
+	# Version and title of project.
 	echo "\newcommand{\projecttitle}{$(metadata-title)}" > $@
 	echo "\newcommand{\projectversion}{$(project-commit-hash)}" >> $@
diff --git a/reproduce/analysis/make/verify.mk b/reproduce/analysis/make/verify.mk
index fb8afc0..dd224d6 100644
--- a/reproduce/analysis/make/verify.mk
+++ b/reproduce/analysis/make/verify.mk
@@ -40,22 +40,34 @@ verify-print-tips = \
   echo "the following project source file:"; \
   echo " reproduce/analysis/make/verify.mk"

-verify-txt-no-comments-leading-space = \
+# Removes following components of a plain-text file, calculates checksum
+# and compares with given checksum:
+#   - All commented lines (starting with '#') are removed.
+#   - All empty lines are removed.
+#   - All space-characters in remaining lines are removed (so the width of
+#     the printed columns won't invalidate the verification).
+#
+# It takes three arguments:
+#   - First argument: Full address of file to check.
+#   - Second argument: Expected checksum of the file to check.
+#   - File name to write result.
+verify-txt-no-comments-no-space = \
   infile=$(strip $(1)); \
   inchecksum=$(strip $(2)); \
+  innobdir=$$(echo $$infile | sed -e's|$(BDIR)/||g'); \
   if ! [ -f "$$infile" ]; then \
     $(call verify-print-error-start); \
     echo "The following file (that should be verified) doesn't exist:"; \
     echo " $$infile"; \
     echo; exit 1; \
   fi; \
-  checksum=$$(sed -e 's/^[[:space:]]*//g' \
+  checksum=$$(sed -e 's/[[:space:]][[:space:]]*//g' \
                   -e 's/\#.*$$//' \
                   -e '/^$$/d' $$infile \
-             | md5sum \
-             | awk '{print $$1}'); \
+                 | md5sum \
+                 | awk '{print $$1}'); \
   if [ x"$$inchecksum" = x"$$checksum" ]; then \
-    echo "Verified: $$infile"; \
+    echo "%% (VERIFIED) $$checksum $$innobdir" >> $(3); \
   else \
     $(call verify-print-error-start); \
     $(call verify-print-tips); \
@@ -105,6 +117,15 @@ $(mtexdir)/verify.tex: $(foreach s, $(verify-dep), $(mtexdir)/$(s).tex)
 	# Make sure that verification is actually requested.
 	if [ x"$(verify-outputs)" = xyes ]; then

+	  # Make sure the temporary output doesn't exist (because we want
+	  # to append to it). We are making a temporary output target so if
+	  # there is a crash in the middle, Make will not continue. If we
+	  # write in the final target progressively, the file will exist,
+	  # and its date will be more recent than all prerequisites, so
+	  # next time the project is run, Make will continue and ignore the
+	  # rest of the checks.
+	  rm -f $@.tmp
+
 	  # Verify the figure datasets.
 	  $(call verify-txt-no-comments-leading-space, \
 	         $(a2mk20f1c), 76fc5b13495c4d8e8e6f8d440304cf69)
@@ -114,14 +135,14 @@ $(mtexdir)/verify.tex: $(foreach s, $(verify-dep), $(mtexdir)/$(s).tex)
 	    file=$(mtexdir)/$$m.tex
 	    if   [ $$m == download  ]; then s=64da83ee3bfaa236849927cdc001f5d3
 	    elif [ $$m == format    ]; then s=e04d95a539b5540c940bf48994d8d45f
-	    elif [ $$m == demo-plot ]; then s=2504472bd2b3f60b5a26c5f2a3a67251
+	    elif [ $$m == demo-plot ]; then s=48bffe6cf8db790c63a33302d20db77f
 	    else echo; echo "'$$m' not recognized."; exit 1
 	    fi
-	    $(call verify-txt-no-comments-leading-space, $$file, $$s)
+	    $(call verify-txt-no-comments-no-space, $$file, $$s, $@.tmp)
 	  done

-	  # Make an empty final target.
-	  echo "%% Project outputs are verified." > $@
+	  # Move temporary file to final target.
+	  mv $@.tmp $@
 	else
-	  echo "%% Project outputs NOT VERIFIED!!!" > $@
+	  echo "% Verification was DISABLED!" > $@
 	fi
diff --git a/reproduce/software/shell/configure.sh b/reproduce/software/shell/configure.sh
index 1c7e60d..3b3c38f 100755
--- a/reproduce/software/shell/configure.sh
+++ b/reproduce/software/shell/configure.sh
@@ -1058,6 +1058,21 @@ fi



+# If 'tex/build' and 'tex/tikz' are symbolic links then 'rm -f' will delete
+# them and we can continue. However, when the project is being built from
+# the tarball, these two are not symbolic links but actual directories with
+# the necessary built-components to build the PDF in them. In this case,
+# because 'tex/build' is a directory, 'rm -f' will fail, so we'll just
+# rename the two directories (as backup) and let the project build the
+# proper symbolic links afterwards.
+if rm -f tex/build; then
+    rm -f tex/tikz
+else
+    mv tex/tikz tex/tikz-from-tarball
+    mv tex/build tex/build-from-tarball
+fi
+
+
 # Set the symbolic links for easy access to the top project build
 # directories. Note that these are put in each user's source/cloned
 # directory, not in the build directory (which can be shared between many |
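
Note on the 'configure.sh' hunk above: it depends on 'rm -f' succeeding on a symbolic link but exiting with a non-zero status when the operand is a real directory ('-f' only silences the error for missing files, and without '-r' a directory is never removed). The following is a minimal sketch for checking that behaviour in a scratch directory. The names 'scratch', 'target', 'link', 'realdir', 'link_result' and 'dir_result' are hypothetical and are not part of the project.

```shell
#!/bin/sh
# Hypothetical scratch area; nothing here touches the project tree.
scratch=$(mktemp -d)
mkdir "$scratch/target" "$scratch/realdir"
ln -s "$scratch/target" "$scratch/link"

# A symbolic link is unlinked like a regular file, so 'rm -f' succeeds
# (the directory it points to is left untouched).
if rm -f "$scratch/link" 2>/dev/null; then
    link_result=removed
else
    link_result=kept
fi

# Without '-r', 'rm' refuses to delete a directory and exits non-zero,
# which is the branch that triggers the 'mv' backup in the hunk above.
if rm -f "$scratch/realdir" 2>/dev/null; then
    dir_result=removed
else
    dir_result=kept
fi

echo "symlink: $link_result, directory: $dir_result"

# Clean up the scratch area.
rm -rf "$scratch"
```

With common 'rm' implementations this reports the symlink as removed and the directory as kept, matching the two branches of the 'if rm -f tex/build' test.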