From 623ae15c95bb8575b111709705c29b10fcf7c12b Mon Sep 17 00:00:00 2001
From: Mohammad Akhlaghi
Date: Tue, 2 Jun 2020 03:45:46 +0100
Subject: IMPORTANT: Added publication checklist, improved relevant infrastructure

Possible semantic conflicts (that may not show up as Git conflicts but may cause a crash in your project after the merge):

1) The project title (and other basic metadata) should be set in 'reproduce/analysis/config/metadata.conf'. Please include this file in your merge (if it is ignored because of '.gitattributes'!).

2) Consider importing the changes in 'initialize.mk' and 'verify.mk' (if you have added all analysis Makefiles to the '.gitattributes' file, thus not merging any change in them with your branch). For example with this command:

    git diff master...maneage -- reproduce/analysis/make/initialize.mk

3) The old 'verify-txt-no-comments-leading-space' function has been replaced by 'verify-txt-no-comments-no-space'. The new function will also remove all white-space characters between the columns (not just white space characters at the start of the line). Thus the resulting check won't involve spacing between columns.

A common set of steps is always necessary to prepare a project for publication. Until now, we would simply look at previous submissions and try to follow them, but that was prone to errors and could cause confusion. The internal infrastructure also didn't have some useful features to make good publication possible. Now that the submission of a paper fully devoted to the founding criteria of Maneage is complete (arXiv:2006.03018), it was time to formalize the necessary steps for easier submission of a project using Maneage and implement some low-level features that can make things easier.

With this commit a first draft of the publication checklist has been added to 'README-hacking.md'; it was tested in the submission of arXiv:2006.03018 and zenodo.3872248.
To help guide users on implementing the good practices for output datasets, the outputs of the default project shown in the paper now use the new features. After reading the checklist, please inspect these.

Some other relevant changes in this commit:

- The publication involves a copy of the necessary software tarballs. Hence a new target ('dist-software') was also added to package all the project's software tarballs in one tarball for easy distribution.

- A new 'dist-lzip' target has been defined for those who want to distribute an Lzip-compressed tarball.

- The '\includetikz' LaTeX macro now has a second argument to allow configuring the '\includegraphics' call when the plot should not be built, but just imported.
---
 README-hacking.md | 346 ++++++++++++++++++++-
 README.md | 148 +++++----
 paper.tex | 45 +--
 project | 3 +-
 reproduce/analysis/config/delete-me-num.conf | 9 -
 .../analysis/config/delete-me-squared-num.conf | 9 +
 reproduce/analysis/config/metadata.conf | 25 ++
 reproduce/analysis/make/delete-me.mk | 98 ++++--
 reproduce/analysis/make/initialize.mk | 135 ++++++--
 reproduce/analysis/make/verify.mk | 49 ++-
 reproduce/software/shell/configure.sh | 17 +-
 tex/src/delete-me-demo.tex | 51 ---
 tex/src/delete-me-image-histogram.tex | 51 +++
 tex/src/delete-me-squared.tex | 32 ++
 tex/src/delete-me.tex | 32 --
 tex/src/preamble-pgfplots.tex | 25 +-
 16 files changed, 815 insertions(+), 260 deletions(-)
 delete mode 100644 reproduce/analysis/config/delete-me-num.conf
 create mode 100644 reproduce/analysis/config/delete-me-squared-num.conf
 create mode 100644 reproduce/analysis/config/metadata.conf
 delete mode 100644 tex/src/delete-me-demo.tex
 create mode 100644 tex/src/delete-me-image-histogram.tex
 create mode 100644 tex/src/delete-me-squared.tex
 delete mode 100644 tex/src/delete-me.tex
diff --git a/README-hacking.md b/README-hacking.md
index 902e544..554ba6b 100644
--- a/README-hacking.md
+++ b/README-hacking.md
@@ -39,10 +39,11 @@ then discussed to help you navigate the
files and their contents. This is followed by a checklist for the easy/fast customization of Maneage to your exciting research. We continue with some tips and guidelines on how to manage or extend your project as it grows based on our experiences with it -so far. The main body concludes with a description of possible future -improvements that are planned for Maneage (but not yet implemented). As -discussed above, we end with a short introduction on the necessity of -reproducible science in the appendix. +so far. There is also a publication checklist, describing the recommended +steps to publish your data/code. The main body concludes with a description +of possible future improvements that are planned for Maneage (but not yet +implemented). As discussed above, we end with a short introduction on the +necessity of reproducible science in the appendix. Please don't forget to share your thoughts, suggestions and criticisms. Maintaining and designing Maneage is itself a separate project, @@ -177,6 +178,12 @@ with (earlier versions of) Maneage. Previously it was simply called details may be different in them. The more recent ones can be used as a good working example. + - Akhlaghi et al. ([2020](https://arxiv.org/abs/2006.03018), + arXiv:2006.03018): The project's version controlled source is [on + Gitlab](https://gitlab.com/makhlaghi/maneage-paper), necessary software, + outputs and backup of history is available in + [zenodo.3872248](https://doi.org/10.5281/zenodo.3872248). + - Infante-Sainz et al. ([2020](https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.5317I), MNRAS, 491, 5317): The version controlled project source is available @@ -605,16 +612,18 @@ First custom commit git push origin maneage # Push 'maneage' branch to 'origin' (no tracking). ``` - 5. **Title**, **short description** and **author**: The title and basic - information of your project's output PDF paper should be added in + 5. 
**Title**, **short description** and **author**: You can start adding + your name (with your possible coauthors) and tentative abstract in `paper.tex`. You should see the relevant place in the preamble (prior - to `\begin{document}`. After you are done, run the `./project make` - command again to see your changes in the final PDF, and make sure that - your changes don't cause a crash in LaTeX. Of course, if you use a - different LaTeX package/style for managing the title and authors (in - particular a specific journal's style), please feel free to use it - your own methods after finishing this checklist and doing your first - commit. + to `\begin{document}`). Just note that some core project metadata, like + the project title, are actually set in + `reproduce/analysis/config/metadata.conf`. So set your project title + in there. After you are done, run the `./project make` command again + to see your changes in the final PDF and make sure that your changes + don't cause a crash in LaTeX. Of course, if you use a different LaTeX + package/style for managing the title and authors (in particular a + specific journal's style), please feel free to use your own methods + after finishing this checklist and doing your first commit. 6. **Delete dummy parts**: Maneage contains some parts that are only for the initial/test run, mainly as a demonstration of important steps, @@ -756,7 +765,17 @@ First custom commit $ git push # Push your commit to your remote. ``` - 11. **Start your exciting research**: You are now ready to add flesh and + 11. **Read the publication checklist**: The publication checklist below is + very similar to this one, but for the final phase of your project. For + now, you don't have to do any of its steps, but reading it will give + you good insight into the later stages of your project. 
If you already + know how you want to publish your project, you can implement many of + those steps from the start and during the actual project (in + particular how to organize your data files that go into the plots), + making it much easier to complete that checklist when you are ready + for submission. + + 12. **Start your exciting research**: You are now ready to add flesh and blood to this raw skeleton by further modifying and adding your exciting research steps. You can use the "published works" section in the introduction (above) as some fully working models to learn @@ -885,6 +904,278 @@ Other basic customizations + + +Publication checklist +===================== + +Once your project is complete and you are ready to submit/publish the +project, we recommend the following steps to ensure the maximum FAIRness of +all your hard work (Findability, Accessibility, Interoperability, and +Reusability). This list may seem long, and may take a day or so to +complete, but please consider the fact that you have spent months/years on +your project, so it is a very small step in your overall project! Most of +it is about organizing things that you can do during your project. So it's +good to have a look at these from the start of your project. + +As you will notice, when you complete this checklist, your project's source +will be present in multiple places: Zenodo, SoftwareHeritage, arXiv, your +own Git repositories. This is a major advantage of Maneaged(!) projects: +because their source is very small (a few hundred kilobytes), there is +effectively no cost in keeping multiple redundancies on different servers, +just in case one (or more) of them is discontinued in the (near/far) +future. + + - **Reserve a DOI for your dataset**: There are multiple data servers that + give this functionality; one of the most well known and (currently!) + well-funded is [Zenodo](https://zenodo.org), so we'll focus on it + here. 
Of course, you can use any other service that provides a similar + functionality. Once you complete these steps, you can start using/citing + your dataset's DOI in the source of your project to finalize the rest of + the points. Note that with Zenodo, you can even use the given identifier + for things like downloading. + + * *Start new upload*: After you log in to Zenodo, you can start a new + upload by clicking on the "New Upload" button. + + * *Reserve DOI*: Under the "Basic information" --> "Digital Object + Identifier", click on the "Reserve DOI" button. + + * *Fill basic info*: You need to at least fill the "required fields" + (marked with a red star). + + * *Save your upload*: You should now be able to press the "Save" button + (at the top or bottom of the page) to finalize this step. + + - **Request archival on SoftwareHeritage**: [Software + Heritage](https://archive.softwareheritage.org/save/) is an online + project to archive source code and its development history. It + provides wonderful features for archiving source code (not data!) and + also for citing special parts of a project's source at any point of its + history. So it blends elegantly with the purpose of Maneage. Once you + make your project's Git repository publicly accessible (no login + required to clone it), you can request that SoftwareHeritage archives + it. It's good if you do this as soon as you make your Git repository + public. When you are ready, just register your repository's address (the + same one you give to `git clone`) in [SoftwareHeritage's save + form](https://archive.softwareheritage.org/save). + + - **Run a spell-check on `paper.tex`**: we all forget ;-)! + + - **Zenodo/SoftwareHeritage links in paper**: put links to the Zenodo-DOI + (and SoftwareHeritage source when you make it public) in your + paper: somewhere close to the start, maybe under the keywords/abstract, + highlighting that they are supplements for reproducibility. 
These help + readers easily access these resources for supplementary material + directly from your PDF paper (sources on SoftwareHeritage and + data/software on Zenodo). These links are more trusted/reliable in terms + of longevity than Git repositories or private webpages. + + - **Identify and properly format output data**: If you have a plot, figure + or table in your paper, you need to verify that data later and publish + that data with the paper (see the steps below for both). But before + going to those steps, it's good if you polish your datasets with the + recommendations below: + + * *Keep published data in a special place*: it helps if you keep the + to-be-published data files in a special sub-directory under your build + directory. In this way, irrespective of which subMakefile builds a + published dataset, they won't be lost/scattered in the middle of all + the project's intermediate-built files. + + * *In plain-text*: If the data are in tabular form (for example the X + and Y values in your plots), store them as a simple plain-text file + (for example with columns separated by white-space characters or in + the more formal [Comma-separated + values](https://en.wikipedia.org/wiki/Comma-separated_values), or CSV, + format). If you have other types of data (for example images, or very + large tables with millions of rows/columns that can be inconvenient in + plain-text), feel free to use custom binary formats, but later, in the + description of your project on the server, tell people what software + they should use to open them. + + * *Descriptive names*: In some papers there are many files and having + cryptic names will only confuse your readers (actually, yourself in + two years!). So set the names of the files to be as descriptive as + possible, so simply by reading the name of the file, someone who has + read the paper will understand what figure it corresponds to. In + particular, don't set names like `figure-3.txt`! 
In a few months you + will forget the order of the figures! Even worse, after the referee + report, you may need to re-arrange some figures and you will be forced + to rename everything related to each figure (which is very frustrating + and prone to errors). + + * *Good metadata*: Raw data are not too useful merely as a series of + numbers! So don't forget to have **good metadata in every file**. If + it's a plain-text file, usually lines starting with a `#` are + ignored. So in the command that generates each dataset, add some extra + information about the dataset as lines starting with `#`. A minimal + set of recommended metadata is listed below. Feel free to add + more. You can use a configuration file to keep this information in one + place and automatically include it in all your output files. + + * *Project Title and authors*: This is very important to give a + general perspective of the figure. + + * *Links to project*: For example Zenodo-DOI, Journal-DOI (after it is + accepted), SoftwareHeritage page, arXiv-ID (or any other pre-print + server) and of course, your Git repository. + + * *Commit hash* of the project that produced the dataset. This + directly links the dataset to a particular point in your project's + history. It is stored in the `$(project-commit-hash)` variable that + is defined in `initialize.mk`. So you can use it anywhere in your + project. + + * *Copyright as metadata*: people need to know if they can use the + dataset (i.e., modify it), or possibly re-distribute it and their + derived products. They also need to know how they can contact the + creator of the dataset (who is usually also the copyright owner). So + as another metadata element, also add your name and email-address + (or the name and email of the person who was in charge + of that part of the project), and the copyright license name and + standard link to the full copyright license. 
+ + - **Link to figure datasets in caption**: all the datasets that go into + the plots should be uploaded directly to Zenodo so they can be + viewed/downloaded with a simple link in the caption. For example see the + last sentence of the caption of Figure 1 in + [arXiv:2006.03018](https://arxiv.org/pdf/2006.03018.pdf): it points to + [the data](https://zenodo.org/record/3872248/files/tools-per-year.txt) + that was used to create that figure's top plot. As you see, this will + allow your paper's readers (again, most probably your future-self!) to + directly access the numbers of each visualization (plot/figure) with a + simple click on a trusted server. This also shows the major advantage of + having your data as simple plain-text where possible, as described + above. To help you keep all your to-be-visualized datasets in a single + place, Maneage has the two `tex-publish-dir` and `data-publish-dir` + directories that are defined in `reproduce/analysis/make/initialize.mk`; + see the comments above their definition for more. + + - **Verification step**: It is very important to automatically verify the + outputs of your project. Recall from the customization checklist (above) + that you can activate verification by setting the `verify-outputs` + variable to `yes` in `reproduce/analysis/config/verify-outputs.conf`. So + please activate it and look into the `reproduce/analysis/make/verify.mk` + to add the necessary steps to automatically verify your outputs. *Tip*: + you don't have to generate the checksums manually: just give a wrong + value (for example `XXXX`) so Maneage crashes! In the error message it + will then print the actual and expected checksums and you can take the + value from there. Outputs that must be verified can be listed as: + + * *subMakefile LaTeX macro files*: these LaTeX macros put numbers into + the text. You don't want your readers (actually: yourself in two + years!) 
to have to painfully find and check, by eye, all those tiny + numbers buried deep in the ocean of words! + + * *Final data files* (for tables, figures, or plots, or as data + release). These are the same files described above. If you have + followed the guidelines above and stored them as plain-text with + comments on top, you can use the provided function + `verify-txt-no-comments-no-space`, which takes the filename and + checksum as arguments. It ignores the commented lines (which may change) + and all white space, so only the data are verified. If your data are in other formats, be sure + to verify them without metadata that may change (like the date). + + - **Fill `README.md`**: The `README.md` is *the first place* your readers + are going to look into. It already has a default text with place-holders + in the form of `XXXXXX`. Please go through it and replace the + place-holders with the relevant information/links or feel free to + add/remove anything else. Just don't forget to tell your readers in + `README.md` that they can learn about this system in the + `README-hacking.md` file (ideally close to the top, like it is now). + + - **Confirm if your project builds from scratch**: Before publishing + anything, you should see if your project can indeed reproduce itself! + So, go to a temporary directory, clone your project from its repository + and try configuring and building it from scratch in a new-temporary + build-directory. It is important to ignore the directory you developed + your project on (source and build): you may have files there that you + forgot to import into Git or depended on in the build (it + happens!). Ideally, it would be good to try it on a different computer. + + - **Confirm if `./project make dist` works**: The special target `dist` + tells the project to build a tarball that is ready to compile the LaTeX + PDF without having to do the analysis and build software. This is very + useful for servers like arXiv, or some journals. 
This tarball is also + one of the deliverables you want to publish on Zenodo. Once the tarball + is created, copy it to a temporary directory outside of Maneage, unpack + it and run `make` (completely ignoring Maneage's `./project` script). If + you plan to submit your paper to arXiv, the best test is to actually + start a test submission on arXiv to upload the tarball there to see if + it can build your PDF. Once it works, you can delete that temporary + submission for now. Afterwards, try configuring and building the project from the + tarball by running its `./project` (from scratch and without the Git + history!). If there is a problem in any of these tests, you can modify + what goes into this tarball in `reproduce/analysis/make/initialize.mk`: + go through the steps and add the necessary components until the checks + pass. + + - **Upload all deliverables to Zenodo**: With the datasets ready, you can + now upload the following deliverables to Zenodo. Except for the data + files, put the Git hash of your Maneaged project at the moment of + publication in the filename of other uploaded files. The output files + shouldn't have a hash in their names because their URL (that goes in the + caption of the figures/tables) should be known prior to a commit; + a hash in the name would create a cyclic dependency! Ideally the hash should be placed just + before the final suffix, for example `paper-XXXXXXX.pdf` (where + `XXXXXXX` is the Git hash). This will clearly identify the point in + history that your file was created. + + * **`paper-XXXXXXX.pdf`**: you shouldn't just upload data to the data + server; also upload your paper's PDF so it's there with the other raw + formats. It will greatly help you and others. Most datacenters + (like Zenodo) actually also have a PDF viewer that will load + automatically before the list of data files. For example see + [zenodo.3408481](https://doi.org/10.5281/zenodo.3408481). 
+ + * **`project-XXXXXXX.tar.gz`**: This is the output of `./project make dist`, as + described above. 
+ + * **`project-git.bundle`**: This is the full Git history of the project + in one file (which you can actually clone from later!). It's + necessary to publish this with your dataset too because Git + repositories make no promise on longevity. The way to "bundle" a Git + history is described below; in summary, it's this command: + ```shell + $ git bundle create my-project-git.bundle --all + ``` + + * **`software-XXXXXXX.tar.gz`**: This is effectively a copy of all the + software source code tarballs in your project's + `.build/software/tarballs`. It is necessary to upload these with + your project to avoid relying on third party servers. In the future + any one of those servers may go down and if so, your project won't + be buildable. You can generate this tarball easily with `make + dist-software`. + + * All the figure (and other) output datasets of the project. Don't + rename these files: let them have the same descriptive name + mentioned above. Also recall that a link to all these files is also + in the caption of the respective figure. + + - **Upload to [arXiv](https://arxiv.org)**: or to any other pre-print + server (if you want to). Of course, you can also do this after the + initial/final submission to your desired journal. But we'll just add the + necessary points for arXiv submission here: + + * *Necessary links in comments*: put a link to your project's Git + repository, Zenodo-DOI (this is not your paper's DOI, it's the + data/resources DOI), and/or SoftwareHeritage link in the comments. + + - **Submission to a journal**: different journals accept submissions in + different formats: some accept LaTeX, some only want a PDF, etc. It + would be good if you highlight in the cover-letter that your work is + reproducible and provide the Zenodo and Software Heritage links (if they + are public). If not, you can mention that everything is ready for such a + submission after acceptance. 
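The "plain-text", "good metadata" and "verification" points of the checklist above can be sketched in shell as follows. This is only a minimal illustration under stated assumptions, not the implementation in `reproduce/analysis/make/verify.mk`: the file name, the metadata values and the `data_checksum` helper are hypothetical, and `sha256sum` (GNU Coreutils) is assumed to be available.

```shell
# Hypothetical output file name, only for this illustration.
out=demo-squared-values.txt

# Write the metadata as comment lines, followed by the data columns.
# The XXXXXX values are place-holders, to be filled by the project.
{
  echo "# Project title: XXXXXX"
  echo "# Commit hash: XXXXXXX"
  echo "# Copyright: XXXXXX (license name and link)"
  echo "# Column 1: X"
  echo "# Column 2: X squared"
  printf '%s %s\n' 1 1 2 4 3 9
} > "$out"

# Hypothetical helper: checksum only the data, by deleting the
# comment lines and then all white-space characters.
data_checksum() {
  sed -e '/^#/d' "$1" | tr -d '[:space:]' | sha256sum | cut -d' ' -f1
}

# Print the checksum that would be recorded for later verification.
data_checksum "$out"
```

With this approach, changing the metadata lines (for example a new commit hash) or the spacing between columns leaves the checksum unchanged, while any change in the numbers themselves changes it.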
+ + + + + + + Tips for designing your project =============================== @@ -1148,10 +1439,21 @@ for the benefit of others. scratch (from `./project configure` in a new build-directory). ```shell - # Go to the 'maneage' branch and import/inspect updates. + # Go to the 'maneage' branch and import updates. $ git checkout maneage $ git pull # Get recent work in Maneage - $ git log XXXXXX..XXXXXX --reverse # Inspect new work (replace XXXXXXs with hashs mentioned in output of previous command). + + # Read all the commit messages of the newly imported + # features/changes. In particular pay close attention to the ones + # starting with 'IMPORTANT': these may cause a crash in your + # project (changing something fundamental in Maneage). + # + # Replace the XXXXXXX..YYYYYYY with the hashes mentioned close to the start + # of the 'git pull' command's output. + $ git log XXXXXXX..YYYYYYY --reverse + + # Have a look at the commits in the 'maneage' branch in relation + # to your project. $ git log --oneline --graph --decorate --all # General view of branches. # Go to your 'master' branch and import all the updates into @@ -1183,6 +1485,18 @@ for the benefit of others. # merge. In the commit message, Explain any conflicts that you # fixed. git commit + + # Do a clean build of your project to check for "semantic + # conflicts" (not detected as a conflict by Git, but may cause a + # crash in your project). You can back up your build directory + # before running the 'distclean' target. + + # Any error in the build will be due to changes in Maneage, so look + # closely at the commits (especially the + + ./project make distclean # will DELETE ALL your build-directory!! 
+ ./project configure -e + ./project make ``` - *Adding Maneage to a fork of your project*: As you and your colleagues diff --git a/README.md b/README.md index 7216f1f..137f94a 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ Reproducible source for XXXXXXXXXXXXXXXXX -========================================= +------------------------------------------------------------------------- Copyright (C) 2018-2020 Mohammad Akhlaghi \ See the end of the file for license conditions. @@ -9,13 +9,15 @@ XXXXXX**", by XXXXX XXXXXX, YYYYYY YYYYY and ZZZZZZ ZZZZZ that is published in XXXXX XXXXX. To reproduce the results and final paper, the only dependency is a minimal -Unix-based building environment including a C compiler (already available -on your system if you have ever built and installed a software from source) -and a downloader (Wget or cURL). Note that **Git is not mandatory**: if you -don't have Git to run the first command below, go to the URL given in the -command on your browser, and download the project's source (there is a -button to download a compressed tarball of the project). If you have -received this source from arXiv, please see the respective section below. +Unix-based building environment including a C and C++ compiler (already +available on your system if you have ever built and installed a software +from source) and a downloader (Wget or cURL). Note that **Git is not +mandatory**: if you don't have Git to run the first command below, go to +the URL given in the command on your browser, and download the project's +source (there is a button to download a compressed tarball of the +project). If you have received this source from arXiv or Zenodo (without +any `.git` directory inside), please see the "Building project tarball" +section below. 
```shell $ git clone XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX @@ -24,18 +26,15 @@ $ ./project configure $ ./project make ``` -To learn more about the purpose, principles and technicalities of this -reproducible paper, please see `README-hacking.md`. For a general -introduction to reproducible science as implemented in this project -(through Maneage), please see Maneage project's webpage at -https://maneage.org. +This paper is made reproducible using Maneage (MANaging data linEAGE). To +learn more about its purpose, principles and technicalities, please see +`README-hacking.md`, or the Maneage webpage at https://maneage.org. -Building the project --------------------- +### Building the project This project was designed to have as few dependencies as possible without requiring root/administrator permissions. @@ -52,14 +51,15 @@ requiring root/administrator permissions. a directory given at configuration time), they will be used. Otherwise, a downloader (`wget` or `curl`) will be necessary to download any necessary tarball. The necessary tarballs are also - collected in the archived project on Zenodo (link below) [[TO - AUTHORS: UPLOAD THE SOFTWARE TARBALLS WITH YOUR DATA AND PROJECT - SOURCE TO ZENODO OR OTHER SIMILAR SERVICES. THEN ADD THE DOI/LINK - HERE.DON'T FORGET THAT THE SOFTWARE ARE A CRITICAL PART OF YOUR - WORK.]]. Just unpack that tarball, and when `./project configure` - asks for the "software tarball directory", give the address of the - unpacked directory that has all the tarballs. - https://doi.org/10.5281/zenodo.3408481 + collected in the archived project on + [https://doi.org/10.5281/zenodo.XXXXXXX](XXXXXXX). Just unpack that + tarball and you should see all the tarballs of this project's + software. When `./project configure` asks for the "software tarball + directory", give the address of the unpacked directory that has all + the tarballs. 
[[TO AUTHORS: UPLOAD THE SOFTWARE TARBALLS WITH YOUR + DATA AND PROJECT SOURCE TO ZENODO OR OTHER SIMILAR SERVICES. THEN + ADD THE DOI/LINK HERE. DON'T FORGET THAT THE SOFTWARE IS A + CRITICAL PART OF YOUR WORK'S REPRODUCIBILITY.]] 2. Configure the environment (top-level directories in particular) and build all the necessary software for use in the next step. It is @@ -86,15 +86,33 @@ requiring root/administrator permissions. -Source from arXiv ------------------ + + + + + +### Building project tarball (possibly from arXiv) + If the paper is also published on arXiv, it is highly likely that the authors also uploaded/published the full project there along with the LaTeX sources. If you have downloaded (or plan to download) this source from -arXiv, some minor extra steps are necessary: +arXiv, some minor extra steps are necessary as listed below. This is +because this tarball is mainly tailored to automatic creation of the final +PDF without using Maneage (only calling LaTeX, not using the './project' +command)! + +You can directly run 'latex' on this directory and the paper will be built +with no analysis (all necessary built products are already included in the +tarball). One important feature of the tarball is that it has an extra +`Makefile` to allow easy building of the PDF paper without worrying about +the exact LaTeX and bibliography software commands. -1. If the arXiv code for the paper is 1234.56789, then the downloaded - source will be called `1234.56789` (no special identification + + +#### Only building PDF using tarball (no analysis) + +1. If you got the tarball from arXiv and the arXiv code for the paper is + 1234.56789, then the downloaded source will be called `1234.56789` (no suffix). However, it is actually a `.tar.gz` file. So take these steps to unpack it to see its contents. @@ -106,15 +124,53 @@ arXiv, some minor extra steps are necessary: $ tar xf ../$arxiv.tar.gz ``` -2. 
arXiv removes the executable flag from the files (for its own - security). So before following the standard procedure of projects - described in the sections above, its necessary to set the executable - flag of the main project management file with this command: +2. No matter how you got the tarball, if you just want to build the PDF + paper, simply run the command below. Note that this won't actually + install any software or do any analysis: it will just use your host + operating system (assuming you already have a LaTeX installation and all + the necessary LaTeX packages) to build the PDF using the already-present + plot data. + + ```shell + $ make # Build PDF in tarball without doing analysis + ``` + +3. If you want to re-build the figures from scratch, you need to make the + following corrections to the paper's main LaTeX source (`paper.tex`): + uncomment (remove the starting `%`) the line containing + `\newcommand{\makepdf}{}`; see the comments above it for more. + + + +#### Building full project from tarball (custom software and analysis) + +As described above, the tarball is mainly geared to only building the final +PDF. A few small tweaks are necessary to build the full project from +scratch (download necessary software and data, build them and run the +analysis and finally create the final paper). + +1. If you got the tarball from arXiv, before following the standard + procedure of projects described at the top of this file (using the + `./project` script), it's necessary to set its executable flag because + arXiv removes the executable flag from the files (for its own security). ```shell $ chmod +x project ``` +2. Make the following changes in two of the LaTeX files so LaTeX attempts + to build the figures from scratch (when the tarball was made, it was + configured to avoid building the figures, just using the ones that came + with the tarball). 
+ + - `paper.tex`: uncomment (remove the starting `%`) the line + containing `\newcommand{\makepdf}{}`, see the comments above it for + more. + + - `tex/src/preamble-pgfplots.tex`: set the `tikzsetexternalprefix` + variable value to `tikz/`, so it looks like this: + `\tikzsetexternalprefix{tikz/}`. + 3. Remove extra files. In order to make sure arXiv can build the paper (resolve conflicts due to different versions of LaTeX packages), it is sometimes necessary to copy raw LaTeX package files in the tarball @@ -125,38 +181,14 @@ arXiv, some minor extra steps are necessary: ```shell $ ls - COPYING paper.tex project README-hacking.md README.md reproduce tex - ``` - -4. To build the figures from scratch, you need to make the following - corrections to the LaTeX source files below. - - 4.1: `paper.tex`: uncomment (remove the starting `%`) of the line - containing `\newcommand{\makepdf}{}`. See the comments above it - for more information. - - 4.2: `tex/src/preamble-pgfplots.tex`: set the `tikzsetexternalprefix` - variable value to `tikz/`, so it looks like this: - `\tikzsetexternalprefix{tikz/}`. - -5. In order to let arXiv build the LaTeX paper without bothering to run the - analysis pipeline it was necessary to create and fill the two - `tex/build` and `tex/tikz` subdirectories. But to do a clean build of - the project, it is necessary for these to be symbolic links to the build - directory. 
So when you are first configuring the project, run it with - `--clean-texdir` (only once is enough, they will be deleted permanently - after that), for example: - - ```shell - $ ./project configure --clean-texdir + COPYING paper.tex project README-hacking.md README.md reproduce/ tex/ ``` -Copyright information ---------------------- +### Copyright information This file and `.file-metadata` (a binary file, used by Metastore to store file dates when doing Git checkouts) are part of the reproducible project diff --git a/paper.tex b/paper.tex index 967728f..0993e73 100644 --- a/paper.tex +++ b/paper.tex @@ -31,7 +31,14 @@ %% LaTeX style file, you will probably not need to set them, and can also %% replace this "Title and author information" section with the journal's %% preferred format. -\title{\large \uppercase{The paper's title goes here}} +% +%% NOTE ON TITLE: The title of the project should also be printed as +%% metadata in all output files. So it is defined with other core project +%% metadata in 'reproduce/analysis/config/metadata.conf'. That value is +%% then written in the '\projecttitle' LaTeX macro and directly used +%% here. So please set your project's title in that Makefile with other +%% basic information. +\title{\large \uppercase{\projecttitle}} \author[1]{Your name} \author[2]{Coauthor one} \author[1,3]{Coauthor two} @@ -103,29 +110,29 @@ or in this way, will let you focus clearly on your science and not have to worry about fixing this or that number/name in the text. -Figure \ref{delete-me} shows a simple plot as a demonstration of creating +Figure \ref{squared} shows a simple plot as a demonstration of creating plots within \LaTeX{} (using the {\small PGFP}lots package). The minimum value in this distribution is $\deletememin$, and $\deletememax$ is the maximum. 
Take a look into the \LaTeX{} source and you'll see these numbers are actually macros that were calculated from the same dataset (they will change if the dataset, or function that produced it, changes). -The individual {\small PDF} file of Figure \ref{delete-me} is available -under the \texttt{tex/tikz/} directory of your build directory. You can use -this PDF file in other contexts (for example in slides showing your -progress or after publishing the work). If you want to directly use the -{\small PDF} file in the figure without having to let {\small T}i{\small - KZ} decide if it should be remade or not, you can also comment the -\texttt{makepdf} macro at the top of this \LaTeX{} source file. +The individual {\small PDF} file of Figure \ref{squared} is available under +the \texttt{tex/tikz/} directory of your build directory. You can use this +PDF file in other contexts (for example in slides showing your progress or +after publishing the work). If you want to directly use the {\small PDF} +file in the figure without having to let {\small T}i{\small KZ} decide if +it should be remade or not, you can also comment the \texttt{makepdf} macro +at the top of this \LaTeX{} source file. \begin{figure}[t] - \includetikz{delete-me} + \includetikz{delete-me-squared}{width=\linewidth} - \captionof{figure}{\label{delete-me} A very basic $X^2$ plot for + \captionof{figure}{\label{squared} A very basic $X^2$ plot for demonstration.} \end{figure} -Figure \ref{delete-me-demo} is another demonstration of showing images +Figure \ref{image-histogram} is another demonstration of showing images (datasets) using PGFPlots. It shows a small crop of an image from the Wide-Field Planetary Camera 2, on board the Hubble Space Telescope from 1993 to 2009. This cropped image is one of the sample FITS files from the @@ -134,7 +141,7 @@ webpage\footnote{\url{https://fits.gsfc.nasa.gov/fits_samples.html}}. 
Just as another basic reporting of measurements on this dataset within the paper without using numbers in the \LaTeX{} source, the mean is $\deletemewfpctwomean$ and the median is $\deletemewfpctwomedian$. The -skewness in the histogram of Figure \ref{delete-me-demo}(b) explains this +skewness in the histogram of Figure \ref{image-histogram}(b) explains this difference between the mean and median. The dataset was prepared for demonstration here with Gnuastro's \textsf{Convert\-Type} program and the histogram and basic statstics were generated with Gnuastro's @@ -172,13 +179,13 @@ new co-authors (who don't want to be distracted by these issues in their first time reading). \begin{figure}[t] - \includetikz{delete-me-demo} + \includetikz{delete-me-image-histogram}{width=\linewidth} - \captionof{figure}{\label{delete-me-demo} (a) An example image of the - Wide-Field Planetary Camera 2, on board the Hubble Space Telescope from - 1993 to 2009. This is one of the sample images from the FITS standard - webpage, kept as examples for this file format. (b) Histogram of pixel - values in (a).} + \captionof{figure}{\label{image-histogram} (a) An example image + of the Wide-Field Planetary Camera 2, on board the Hubble Space + Telescope from 1993 to 2009. This is one of the sample images from the + FITS standard webpage, kept as examples for this file format. (b) + Histogram of pixel values in (a).} \end{figure} diff --git a/project b/project index 6877b5a..d14e3b0 100755 --- a/project +++ b/project @@ -90,7 +90,8 @@ Project 'make' special features. ./project make dist Produce a LaTeX-ready-to-build distribution tarball ('tar.gz') of the project. This is ready to be uploaded to servers like 'arXiv.org'. - ./project make dist-zip Similar to 'dist', but compress with '.zip'. + ./project make dist-lzip Similar to 'dist', but compress to '.tar.lz'. + ./project make dist-zip Similar to 'dist', but compress to '.zip'. With the options below you can modify the default behavior. 
Configure options: diff --git a/reproduce/analysis/config/delete-me-num.conf b/reproduce/analysis/config/delete-me-num.conf deleted file mode 100644 index a0260b8..0000000 --- a/reproduce/analysis/config/delete-me-num.conf +++ /dev/null @@ -1,9 +0,0 @@ -# Number of samples in the demonstration analysis (to be deleted). -# -# Copyright (C) 2019-2020 Mohammad Akhlaghi -# -# Copying and distribution of this file, with or without modification, are -# permitted in any medium without royalty provided the copyright notice and -# this notice are preserved. This file is offered as-is, without any -# warranty. -delete-me-num = 50 diff --git a/reproduce/analysis/config/delete-me-squared-num.conf b/reproduce/analysis/config/delete-me-squared-num.conf new file mode 100644 index 0000000..c86f841 --- /dev/null +++ b/reproduce/analysis/config/delete-me-squared-num.conf @@ -0,0 +1,9 @@ +# Number of samples in the demonstration analysis (to be deleted). +# +# Copyright (C) 2019-2020 Mohammad Akhlaghi +# +# Copying and distribution of this file, with or without modification, are +# permitted in any medium without royalty provided the copyright notice and +# this notice are preserved. This file is offered as-is, without any +# warranty. +delete-me-squared-num = 50 diff --git a/reproduce/analysis/config/metadata.conf b/reproduce/analysis/config/metadata.conf new file mode 100644 index 0000000..533d927 --- /dev/null +++ b/reproduce/analysis/config/metadata.conf @@ -0,0 +1,25 @@ +# Project meta-data that can be used in a project's output datasets and +# final paper. Please set the values here and use them in your analysis or +# paper, don't repeat them +# +# Copyright (C) 2020 Mohammad Akhlaghi +# +# Copying and distribution of this file, with or without modification, are +# permitted in any medium without royalty provided the copyright notice and +# this notice are preserved. This file is offered as-is, without any +# warranty. 
+ +# Project information +metadata-title = The project title goes here + +# DOIs and identifiers. +metadata-arxiv = +metadata-doi-zenodo = +metadata-doi-journal = +metadata-doi = $(metadata-doi-journal) +metadata-git-repository = http://git.maneage.org/project.git + +# DATA Copyright owner and license information. +metadata-copyright-owner = Mohammad Akhlaghi +metadata-copyright = Creative Commons Attribution-ShareAlike (CC BY-SA) +metadata-copyright-url = https://creativecommons.org/licenses/by-sa/4.0 diff --git a/reproduce/analysis/make/delete-me.mk b/reproduce/analysis/make/delete-me.mk index fa16102..f45f9ea 100644 --- a/reproduce/analysis/make/delete-me.mk +++ b/reproduce/analysis/make/delete-me.mk @@ -22,18 +22,40 @@ # Dummy dataset # ------------- # -# We will use AWK to generate a table showing X and X^2 and draw its plot. -delete-numdir = $(texdir)/delete-me-num -delete-num = $(delete-numdir)/data.txt -$(delete-numdir): | $(texdir); mkdir $@ -$(delete-num): $(pconfdir)/delete-me-num.conf | $(delete-numdir) +# Just as a demonstration(!): we will use AWK to generate a table showing X +# and X^2 and draw its plot. +# +# Note that this dataset is directly read by LaTeX to generate a plot, so +# we need to put it in the $(tex-publish-dir) directory. +dm-squared = $(tex-publish-dir)/squared.txt +$(dm-squared): $(pconfdir)/delete-me-squared-num.conf | $(tex-publish-dir) # When the plotted values are re-made, it is necessary to also - # delete the TiKZ externalized files so the plot is also re-made. - rm -f $(tikzdir)/delete-me.pdf + # delete the TiKZ externalized files so the plot is also re-made by + # PGFPlots. + rm -f $(tikzdir)/delete-me-squared.pdf + + # Write the column metadata in a temporary file name (appending + # '.tmp' to the actual target name). Once all steps are done, it is + # renamed to the final target. We do this because if there is an + # error in the middle, Make will not consider the job to be + # complete and will stop here. 
+ echo "# Data for demonstration plot of default Maneage (MANaging data linEAGE)." > $@.tmp + echo "# It is a simple plot, showing the power of two: y=x^2! " >> $@.tmp + echo "# " >> $@.tmp + echo "# Column 1: X [arbitrary, f32] The horizontal axis numbers." \ + >> $@.tmp + echo "# Column 2: X_POW2 [arbitrary, f32] The horizontal axis to the power of two." \ + >> $@.tmp + echo "# " >> $@.tmp + $(call print-copyright, $@.tmp) # Generate the table of random values. - awk 'BEGIN {for(i=1;i<=$(delete-me-num);i+=0.5) print i, i*i; }' > $@ + awk 'BEGIN {for(i=1;i<=$(delete-me-squared-num);i+=0.5) \ + printf("%-8.1f%.2f\n", i, i*i); }' >> $@.tmp + + # Write it into the final target + mv $@.tmp $@ @@ -44,14 +66,14 @@ $(delete-num): $(pconfdir)/delete-me-num.conf | $(delete-numdir) # # For an example image, we'll make a PDF copy of the WFPC II image to # display in the paper. -delete-demodir = $(texdir)/delete-me-demo -$(delete-demodir): | $(texdir); mkdir $@ -delete-pdf = $(delete-demodir)/wfpc2.pdf -$(delete-pdf): $(delete-demodir)/%.pdf: $(indir)/%.fits | $(delete-demodir) +dm-histdir = $(texdir)/image-histogram +$(dm-histdir): | $(texdir); mkdir $@ +dm-img-pdf = $(dm-histdir)/wfpc2.pdf +$(dm-img-pdf): $(dm-histdir)/%.pdf: $(indir)/%.fits | $(dm-histdir) # When the plotted values are re-made, it is necessary to also # delete the TiKZ externalized files so the plot is also re-made. - rm -f $(tikzdir)/delete-me-wfpc2.pdf + rm -f $(tikzdir)/delete-me-image-histogram.pdf # Convert the dataset to a PDF. astconvertt --colormap=gray --fluxhigh=4 $< -h0 -o$@ @@ -63,17 +85,35 @@ $(delete-pdf): $(delete-demodir)/%.pdf: $(indir)/%.fits | $(delete-demodir) # Histogram of WFPC2 image # ------------------------ # -# For an example plot, we'll show the pixel value histogram also. 
-delete-histogram = $(delete-demodir)/wfpc2-hist.txt -$(delete-histogram): $(delete-demodir)/%-hist.txt: $(indir)/%.fits \ - | $(delete-demodir) +# For an example plot, we'll show the pixel value histogram also. IMPORTANT +# NOTE: because this histogram contains data that is included in a plot, we +# should publish it, so it will go into the $(tex-publish-dir). +dm-img-histogram = $(tex-publish-dir)/wfpc2-histogram.txt +$(dm-img-histogram): $(tex-publish-dir)/%-histogram.txt: $(indir)/%.fits \ + | $(tex-publish-dir) # When the plotted values are re-made, it is necessary to also # delete the TiKZ externalized files so the plot is also re-made. - rm -f $(tikzdir)/delete-me-wfpc2.pdf + rm -f $(tikzdir)/delete-me-image-histogram.pdf + + # Generate the pixel value histogram. + aststatistics --lessthan=5 $< -h0 --histogram -o$@.data + + # Put a two-line description of the dataset, copy the column + # metadata from '$@.data', and add copyright. + echo "# Histogram of example image to demonstrate Maneage (MANaging data linEAGE)." \ + > $@.tmp + echo "# Example image URL: $(WFPC2URL)/$(WFPC2IMAGE)" >> $@.tmp + echo "# " >> $@.tmp + awk '/^# Column .:/' $@.data >> $@.tmp + echo "# " >> $@.tmp + $(call print-copyright, $@.tmp) - # Generate the pixel value distribution - aststatistics --lessthan=5 $< -h0 --histogram -o$@ + # Add the column numbers in a formatted manner, rename it to the + # output and clean up. + awk '!/^#/{printf("%-15.4f%d\n", $$1, $$2)}' $@.data >> $@.tmp + mv $@.tmp $@ + rm $@.data @@ -84,9 +124,9 @@ $(delete-histogram): $(delete-demodir)/%-hist.txt: $(indir)/%.fits \ # # This is just as a demonstration on how to get analysic configuration # parameters from variables defined in `reproduce/analysis/config/'. 
-delete-stats = $(delete-demodir)/wfpc2-stats.txt -$(delete-stats): $(delete-demodir)/%-stats.txt: $(indir)/%.fits \ - | $(delete-demodir) +dm-img-stats = $(dm-histdir)/wfpc2-stats.txt +$(dm-img-stats): $(dm-histdir)/%-stats.txt: $(indir)/%.fits \ + | $(dm-histdir) aststatistics $< -h0 --mean --median > $@ @@ -100,11 +140,11 @@ $(delete-stats): $(delete-demodir)/%-stats.txt: $(indir)/%.fits \ # # NOTE: In LaTeX you cannot use any non-alphabetic character in a variable # name. -$(mtexdir)/delete-me.tex: $(delete-num) $(delete-pdf) $(delete-histogram) \ - $(delete-stats) +$(mtexdir)/delete-me.tex: $(dm-squared) $(dm-img-pdf) $(dm-img-histogram) \ + $(dm-img-stats) # Write the number of random values used. - echo "\newcommand{\deletemenum}{$(delete-me-num)}" > $@ + echo "\newcommand{\deletemenum}{$(delete-me-squared-num)}" > $@ # Note that since Make variables start with a `$(', if you want to # use `$' within the shell (not Make), you have to quote any @@ -116,14 +156,14 @@ $(mtexdir)/delete-me.tex: $(delete-num) $(delete-pdf) $(delete-histogram) \ # macro definition. mm=$$(awk 'BEGIN{min=99999; max=-min} !/^#/{if($$2>max) max=$$2; if($$2> $@ v=$$(echo "$$mm" | awk '{printf "%.3f", $$2}'); echo "\newcommand{\deletememax}{$$v}" >> $@ # Write the statistics of the WFPC2 image as a macro. 
- mean=$$(awk '{printf("%.2f", $$1)}' $(delete-stats)) + mean=$$(awk '{printf("%.2f", $$1)}' $(dm-img-stats)) echo "\newcommand{\deletemewfpctwomean}{$$mean}" >> $@ - median=$$(awk '{printf("%.2f", $$2)}' $(delete-stats)) + median=$$(awk '{printf("%.2f", $$2)}' $(dm-img-stats)) echo "\newcommand{\deletemewfpctwomedian}{$$median}" >> $@ diff --git a/reproduce/analysis/make/initialize.mk b/reproduce/analysis/make/initialize.mk index 4e317bb..19447a6 100644 --- a/reproduce/analysis/make/initialize.mk +++ b/reproduce/analysis/make/initialize.mk @@ -202,6 +202,16 @@ $(lockdir): | $(BDIR); mkdir $@ +# Version and distribution tarball definitions +project-commit-hash := $(shell if [ -d .git ]; then \ + echo $$(git describe --dirty --always --long); else echo NOGIT; fi) +project-package-name := maneaged-$(project-commit-hash) +project-package-contents = $(texdir)/$(project-package-name) + + + + + # High-level Makefile management # ------------------------------ # @@ -212,11 +222,8 @@ $(lockdir): | $(BDIR); mkdir $@ # we want to ensure that the file is always built in every run: it contains # the project version which may change between two separate runs, even when # no file actually differs. -packagebasename := $(shell if [ -d .git ]; then \ - echo paper-$$(git describe --dirty --always --long); else echo NOGIT; fi) -packagecontents = $(texdir)/$(packagebasename) -.PHONY: all clean dist dist-zip distclean clean-mmap $(packagecontents) \ - $(mtexdir)/initialize.tex +.PHONY: all clean dist dist-zip dist-lzip distclean clean-mmap \ + $(project-package-contents) $(mtexdir)/initialize.tex # --------- Delete for no Gnuastro --------- clean-mmap:; rm -f reproduce/config/gnuastro/mmap* @@ -260,11 +267,11 @@ distclean: clean # that is ready for building the final PDF with LaTeX. This is useful for # collaborators who only want to contribute to the text of your project, # without having to worry about the technicalities of the analysis. 
-$(packagecontents): paper.pdf | $(texdir) +$(project-package-contents): paper.pdf | $(texdir) # Set up the output directory, delete it if it exists and remake it # to fill with new contents. - dir=$(texdir)/$(packagebasename) + dir=$@ rm -rf $$dir mkdir $$dir @@ -298,7 +305,7 @@ $(packagecontents): paper.pdf | $(texdir) cp -r tex/src $$dir/tex/src cp tex/tikz/*.pdf $$dir/tex/tikz cp -r reproduce/* $$dir/reproduce - cp -r tex/build/!(paper-v*) $$dir/tex/build + cp -r tex/build/!($(project-package-name)) $$dir/tex/build # Clean up un-necessary/local files: 1) the $(texdir)/build* # directories (when building in a group structure, there will be @@ -337,32 +344,113 @@ $(packagecontents): paper.pdf | $(texdir) # Clean temporary (currently those ending in `~') files. cd $(texdir) - find $(packagebasename) -name \*~ -delete - find $(packagebasename) -name \*.swp -delete + find $(project-package-name) -name \*~ -delete + find $(project-package-name) -name \*.swp -delete # PROJECT SPECIFIC # ---------------- # Put any project specific distribution steps here. # ---------------- -# Package into `.tar.gz'. -dist: $(packagecontents) +# Package into `.tar.gz' or '.tar.lz'. +dist dist-lzip: $(project-package-contents) curdir=$$(pwd) cd $(texdir) - tar -cf $(packagebasename).tar $(packagebasename) - gzip -f --best $(packagebasename).tar - rm -rf $(packagebasename) + tar -cf $(project-package-name).tar $(project-package-name) + if [ $@ = dist ]; then + suffix=gz + gzip -f --best $(project-package-name).tar + elif [ $@ = dist-lzip ]; then + suffix=lz + lzip -f --best $(project-package-name).tar + fi + rm -rf $(project-package-name) cd $$curdir - mv $(texdir)/$(packagebasename).tar.gz ./ + mv $(texdir)/$(project-package-name).tar.$$suffix ./ # Package into `.zip'. 
-dist-zip: $(packagecontents) +dist-zip: $(project-package-contents) curdir=$$(pwd) cd $(texdir) - zip -q -r $(packagebasename).zip $(packagebasename) - rm -rf $(packagebasename) + zip -q -r $(project-package-name).zip $(project-package-name) + rm -rf $(project-package-name) + cd $$curdir + mv $(texdir)/$(project-package-name).zip ./ + +# Package the software tarballs. +dist-software: + curdir=$$(pwd) + cd $(BDIR) + if [ -d .git ]; then + dirname="software-$$(git describe --dirty --always --long)" + else + dirname="software-NOGIT"; + fi + mkdir $$dirname + cp -L software/tarballs/* $$dirname/ + tar -cf $$dirname.tar $$dirname + gzip -f --best $$dirname.tar + rm -rf $$dirname cd $$curdir - mv $(texdir)/$(packagebasename).zip ./ + mv $(BDIR)/$$dirname.tar.gz ./ + + + + + +# Directory containing to-be-published datasets +# --------------------------------------------- +# +# It's good practice (so you don't forget in the last moment!) to have all +# the plot/figure/table data that you ultimately want to publish in a +# single directory. +# +# There are two types of to-publish data in the project. +# +# 1. Those data that also go into LaTeX (for example to give to LaTeX's +# PGFPlots package to create the plot internally) should be under the +# '$(BDIR)/tex' directory (because other LaTeX producers may also need +# it for example when using './project make dist'). The contents of +# this directory are directly taken into the tarball. +# +# 2. The data that aren't included directly in the LaTeX run of the paper, +# can be seen as supplements. A good place to keep them is under your +# build-directory. +# +# RECOMMENDATION: don't put the figure/plot/table number in the names of +# your to-be-published datasets! Give them a descriptive/short name that +# would be clear to anyone who has read the paper. Later, in the caption +# (or paper's tex/appendix), you will put links to the dataset on servers +# like Zenodo (see the "Publication checklist" in 'README-hacking.md'). 
+tex-publish-dir = $(texdir)/to-publish +data-publish-dir = $(BDIR)/data-to-publish +$(tex-publish-dir):; mkdir $@ +$(data-publish-dir):; mkdir $@ + + + + + +# Print Copyright statement +# ------------------------- +# +# This statement can be used in published datasets that are in plain-text +# format. It assumes you have already put the data-specific statements in +# its first argument, it will supplement them with general project links. +print-copyright = \ + echo "\# Project title: $(metadata-title)" >> $(1); \ + echo "\# Git commit (that produced this dataset): $(project-commit-hash)" >> $(1); \ + echo "\# Project's Git repository: $(metadata-git-repository)" >> $(1); \ + if [ x$(metadata-arxiv) != x ]; then \ + echo "\# Pre-print server: arXiv:$(metadata-arxiv)" >> $(1); fi; \ + if [ x$(metadata-doi-journal) != x ]; then \ + echo "\# DOI (Journal): $(metadata-doi-journal)" >> $(1); fi; \ + if [ x$(metadata-doi-zenodo) != x ]; then \ + echo "\# DOI (Zenodo): $(metadata-doi-zenodo)" >> $(1); fi; \ + echo "\#" >> $(1); \ + echo "\# Copyright (C) $$(date +%Y) $(metadata-copyright-owner)" >> $(1); \ + echo "\# Dataset is available under $(metadata-copyright)." >> $(1); \ + echo "\# License URL: $(metadata-copyright-url)" >> $(1); @@ -377,7 +465,6 @@ dist-zip: $(packagecontents) # actually exists, it is also aded as a `.PHONY' target above. $(mtexdir)/initialize.tex: | $(mtexdir) - # Version of the project. - @if [ -d .git ]; then v=$$(git describe --dirty --always --long); - else v=NO-GIT; fi - echo "\newcommand{\projectversion}{$$v}" > $@ + # Version and title of project. 
+ echo "\newcommand{\projecttitle}{$(metadata-title)}" > $@ + echo "\newcommand{\projectversion}{$(project-commit-hash)}" >> $@ diff --git a/reproduce/analysis/make/verify.mk b/reproduce/analysis/make/verify.mk index 43d1472..67b3fea 100644 --- a/reproduce/analysis/make/verify.mk +++ b/reproduce/analysis/make/verify.mk @@ -40,22 +40,34 @@ verify-print-tips = \ echo "the following project source file:"; \ echo " reproduce/analysis/make/verify.mk" -verify-txt-no-comments-leading-space = \ +# Removes following components of a plain-text file, calculates checksum +# and compares with given checksum: +# - All commented lines (starting with '#') are removed. +# - All empty lines are removed. +# - All space-characters in remaining lines are removed (so the width of +# the printed columns won't invalidate the verification). +# +# It takes three arguments: +# - First argument: Full address of file to check. +# - Second argument: Expected checksum of the file to check. +# - File name to write result. +verify-txt-no-comments-no-space = \ infile=$(strip $(1)); \ inchecksum=$(strip $(2)); \ + innobdir=$$(echo $$infile | sed -e's|$(BDIR)/||g'); \ if ! [ -f "$$infile" ]; then \ $(call verify-print-error-start); \ echo "The following file (that should be verified) doesn't exist:"; \ echo " $$infile"; \ echo; exit 1; \ fi; \ - checksum=$$(sed -e 's/^[[:space:]]*//g' \ + checksum=$$(sed -e 's/[[:space:]][[:space:]]*//g' \ -e 's/\#.*$$//' \ -e '/^$$/d' $$infile \ - | md5sum \ - | awk '{print $$1}'); \ + | md5sum \ + | awk '{print $$1}'); \ if [ x"$$inchecksum" = x"$$checksum" ]; then \ - echo "Verified: $$infile"; \ + echo "%% (VERIFIED) $$checksum $$innobdir" >> $(3); \ else \ $(call verify-print-error-start); \ $(call verify-print-tips); \ @@ -105,11 +117,20 @@ $(mtexdir)/verify.tex: $(foreach s, $(verify-dep), $(mtexdir)/$(s).tex) # Make sure that verification is actually requested. 
if [ x"$(verify-outputs)" = xyes ]; then + # Make sure the temporary output doesn't exist (because we want + # to append to it). We are making a temporary output target so if + # there is a crash in the middle, Make will not continue. If we + # write in the final target progressively, the file will exist, + # and its date will be more recent than all prerequisites, so + # next time the project is run, Make will continue and ignore the + # rest of the checks. + rm -f $@.tmp + # Verify the figure datasets. - $(call verify-txt-no-comments-leading-space, \ - $(delete-num), ad345e873e6af577f0e4e7c8942cdf08) - $(call verify-txt-no-comments-leading-space, \ - $(delete-histogram), 12a81c4c8c5f552e5ed5686453587fe8) + $(call verify-txt-no-comments-no-space, \ + $(dm-squared), 6b6d3b0f9c351de53606507b59bca5d1, $@.tmp) + $(call verify-txt-no-comments-no-space, \ + $(dm-img-histogram), b1f9c413f915a1ad96078fee8767b16c, $@.tmp) # Verify TeX macros (the values that go into the PDF text). for m in $(verify-check); do @@ -118,9 +139,11 @@ $(mtexdir)/verify.tex: $(foreach s, $(verify-dep), $(mtexdir)/$(s).tex) elif [ $$m == delete-me ]; then s=711e2f7fa1f16ecbeeb3df6bcb4ec705 else echo; echo "'$$m' not recognized."; exit 1 fi - $(call verify-txt-no-comments-leading-space, $$file, $$s) + $(call verify-txt-no-comments-no-space, $$file, $$s, $@.tmp) done - fi - # Make an empty final target. - touch $@ + # Move temporary file to final target. + mv $@.tmp $@ + else + echo "% Verification was DISABLED!" > $@ + fi diff --git a/reproduce/software/shell/configure.sh b/reproduce/software/shell/configure.sh index 5cf813b..789ddd5 100755 --- a/reproduce/software/shell/configure.sh +++ b/reproduce/software/shell/configure.sh @@ -1046,6 +1046,21 @@ tikzdir=$texbdir/tikz if ! [ -d $tikzdir ]; then mkdir $tikzdir; fi +# If 'tex/build' and 'tex/tikz' are symbolic links then 'rm -f' will delete +# them and we can continue. 
However, when the project is being built from +# the tarball, these two are not symbolic links but actual directories with +# the necessary built-components to build the PDF in them. In this case, +# because 'tex/build' is a directory, 'rm -f' will fail, so we'll just +# rename the two directories (as backup) and let the project build the +# proper symbolic links afterwards. +if rm -f tex/build; then + rm -f tex/tikz +else + mv tex/tikz tex/tikz-from-tarball + mv tex/build tex/build-from-tarball +fi + + # Set the symbolic links for easy access to the top project build # directories. Note that these are put in each user's source/cloned # directory, not in the build directory (which can be shared between many @@ -1054,7 +1069,7 @@ if ! [ -d $tikzdir ]; then mkdir $tikzdir; fi # Note: if we don't delete them first, it can happen that an extra link # will be created in each directory that points to its parent. So to be # safe, we are deleting all the links on each re-configure of the project. -rm -f .build .local tex/build tex/tikz .gnuastro +rm -f .build .local .gnuastro ln -s $bdir .build ln -s $instdir .local ln -s $texdir tex/build diff --git a/tex/src/delete-me-demo.tex b/tex/src/delete-me-demo.tex deleted file mode 100644 index 1fde25d..0000000 --- a/tex/src/delete-me-demo.tex +++ /dev/null @@ -1,51 +0,0 @@ -%% Plot the demonstration image and its histogram. -% -%% Copyright (C) 2019-2020 Mohammad Akhlaghi -% -%% This file is free software: you can redistribute it and/or modify it -%% under the terms of the GNU General Public License as published by the -%% Free Software Foundation, either version 3 of the License, or (at your -%% option) any later version. -% -%% This file is distributed in the hope that it will be useful, but WITHOUT -%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or -%% FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License -%% for more details. 
-% -%% You should have received a copy of the GNU General Public License along -%% with this file. If not, see . - -\begin{tikzpicture} - - %% The displayed WFPC2 image. - \node[anchor=south west] (img) at (0,0) - {\includegraphics[width=0.5\linewidth] - {tex/build/delete-me-demo/wfpc2.pdf}}; - - %% Its label - \node[anchor=south west] at (0.45\linewidth,0.45\linewidth) - {\textcolor{white}{a}}; - - %% This histogram. - \begin{axis}[at={(0.52\linewidth,0.1\linewidth)}, - no markers, - axis on top, - xmode=normal, - ymode=normal, - yticklabels={}, - scale only axis, - xlabel=Pixel value, - width=0.5\linewidth, - height=0.412\linewidth, - enlarge y limits=false, - enlarge x limits=false, - ] - \addplot [const plot mark mid, fill=red] - table [x index=0, y index=1] - {tex/build/delete-me-demo/wfpc2-hist.txt} - \closedcycle; - \end{axis} - - %% The histogram's label - \node[anchor=south west] at (0.95\linewidth,0.45\linewidth) {b}; -\end{tikzpicture} diff --git a/tex/src/delete-me-image-histogram.tex b/tex/src/delete-me-image-histogram.tex new file mode 100644 index 0000000..8d62892 --- /dev/null +++ b/tex/src/delete-me-image-histogram.tex @@ -0,0 +1,51 @@ +%% Plot the demonstration image and its histogram. +% +%% Copyright (C) 2019-2020 Mohammad Akhlaghi +% +%% This file is free software: you can redistribute it and/or modify it +%% under the terms of the GNU General Public License as published by the +%% Free Software Foundation, either version 3 of the License, or (at your +%% option) any later version. +% +%% This file is distributed in the hope that it will be useful, but WITHOUT +%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +%% FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +%% for more details. +% +%% You should have received a copy of the GNU General Public License along +%% with this file. If not, see . + +\begin{tikzpicture} + + %% The displayed WFPC2 image. 
+ \node[anchor=south west] (img) at (0,0) + {\includegraphics[width=0.5\linewidth] + {tex/build/image-histogram/wfpc2.pdf}}; + + %% Its label + \node[anchor=south west] at (0.45\linewidth,0.45\linewidth) + {\textcolor{white}{a}}; + + %% This histogram. + \begin{axis}[at={(0.52\linewidth,0.1\linewidth)}, + no markers, + axis on top, + xmode=normal, + ymode=normal, + yticklabels={}, + scale only axis, + xlabel=Pixel value, + width=0.5\linewidth, + height=0.412\linewidth, + enlarge y limits=false, + enlarge x limits=false, + ] + \addplot [const plot mark mid, fill=red] + table [x index=0, y index=1] + {tex/build/to-publish/wfpc2-histogram.txt} + \closedcycle; + \end{axis} + + %% The histogram's label + \node[anchor=south west] at (0.95\linewidth,0.45\linewidth) {b}; +\end{tikzpicture} diff --git a/tex/src/delete-me-squared.tex b/tex/src/delete-me-squared.tex new file mode 100644 index 0000000..c0cc609 --- /dev/null +++ b/tex/src/delete-me-squared.tex @@ -0,0 +1,32 @@ +%% PGFPlots code to plot a random set of numbers as demo +%% +%% Copyright (C) 2019-2020 Mohammad Akhlaghi +% +%% This file is free software: you can redistribute it and/or modify it +%% under the terms of the GNU General Public License as published by the +%% Free Software Foundation, either version 3 of the License, or (at your +%% option) any later version. +% +%% This file is distributed in the hope that it will be useful, but WITHOUT +%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +%% FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +%% for more details. +% +%% You should have received a copy of the GNU General Public License along +%% with this file. If not, see . + +\begin{tikzpicture} + + %% Settings of the plotted axis + \begin{axis}[ + width=\linewidth, + xlabel=$X$, + ylabel=$X^2$, + ] + + %% A particular plot. 
+    \addplot+[scatter, only marks]
+    table {tex/build/to-publish/squared.txt};
+
+  \end{axis}
+\end{tikzpicture}
diff --git a/tex/src/delete-me.tex b/tex/src/delete-me.tex
deleted file mode 100644
index e264854..0000000
--- a/tex/src/delete-me.tex
+++ /dev/null
@@ -1,32 +0,0 @@
-%% PGFPlots code to plot a random set of numbers as demo
-%%
-%% Copyright (C) 2019-2020 Mohammad Akhlaghi
-%
-%% This file is free software: you can redistribute it and/or modify it
-%% under the terms of the GNU General Public License as published by the
-%% Free Software Foundation, either version 3 of the License, or (at your
-%% option) any later version.
-%
-%% This file is distributed in the hope that it will be useful, but WITHOUT
-%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-%% FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
-%% for more details.
-%
-%% You should have received a copy of the GNU General Public License along
-%% with this file. If not, see .
-
-\begin{tikzpicture}
-
-  %% Settings of the plotted axis
-  \begin{axis}[
-      width=\linewidth,
-      xlabel=$X$,
-      ylabel=$X^2$,
-    ]
-
-    %% A particular plot.
-    \addplot+[scatter, only marks]
-    table {tex/build/delete-me-num/data.txt};
-
-  \end{axis}
-\end{tikzpicture}
diff --git a/tex/src/preamble-pgfplots.tex b/tex/src/preamble-pgfplots.tex
index af6cb8d..1d57daf 100644
--- a/tex/src/preamble-pgfplots.tex
+++ b/tex/src/preamble-pgfplots.tex
@@ -77,17 +77,28 @@
-%% The following rule will cause the name of the files keeping a figure's
-%% external PDF to be set based on the file that the TiKZ commands are
-%% from. Without this, TiKZ will use numbers based on the order of
-%% figures. These numbers can be hard to manage and they will also depend
-%% on order in the final PDF, so it will be very buggy to manage them.
-\newcommand{\includetikz}[1]{%
+%% The '\includetikz' can be used to either build the figures using
+%% PGFPlots (when '\makepdf' is defined), or use an existing file (when
+%% '\makepdf' isn't defined). When making the PDF, it will set the output
+%% figure name to be the same as the 'tex/src/XXXX.tex' file that contains
+%% the PGFPlots source of the figure. In this way, when using the PDF, it
+%% will also have the same name, thus allowing the figures to easily change
+%% their place relative to others: figure ordering won't be a problem. This
+%% is a problem by default because if an explicit name isn't set at the
+%% start, tikz will make images based on their order in the paper.
+%
+%% This function takes two arguments:
+%%   1) The base-name of the LaTeX file with the 'tikzpicture'
+%%      environment. As mentioned above, this will also be the name of
+%%      the produced figure.
+%%   2) The settings to use with 'includegraphics' when an already-built
+%%      file should be used.
+\newcommand{\includetikz}[2]{%
   \ifdefined\makepdf%
     \tikzsetnextfilename{#1}%
     \input{tex/src/#1.tex}%
   \else
-    \includegraphics[width=\linewidth]{tex/tikz/#1.pdf}
+    \includegraphics[#2]{tex/tikz/#1.pdf}
   \fi
 }
-- 
cgit v1.2.1
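
As an illustration of the two-argument '\includetikz' introduced in the last hunk, the sketch below shows how a caller might pass the '\includegraphics' settings explicitly. The macro body and the figure base-names ('delete-me-squared', 'delete-me-image-histogram') come from this patch. The surrounding document skeleton, the 'prefix=tex/tikz/' externalization setting, the way '\makepdf' is defined, and the option values in the second argument are assumptions made only for this example. The patch does not show how the project actually configures them.

```latex
\documentclass{article}
\usepackage{graphicx}
\usepackage{pgfplots}

%% Assumption: the 'external' library provides '\tikzsetnextfilename', and
%% the externalized PDFs are written under 'tex/tikz/' (hypothetical prefix).
\usetikzlibrary{external}
\tikzexternalize[prefix=tex/tikz/]

%% Assumption: uncomment to build figures from 'tex/src/XXXX.tex' with
%% PGFPlots. When left undefined, the already-built PDFs are imported.
%\newcommand{\makepdf}{}

%% Macro body as added by this patch.
\newcommand{\includetikz}[2]{%
  \ifdefined\makepdf%
    \tikzsetnextfilename{#1}%
    \input{tex/src/#1.tex}%
  \else
    \includegraphics[#2]{tex/tikz/#1.pdf}
  \fi
}

\begin{document}

\begin{figure}
  %% Second argument is only used when the pre-built PDF is imported.
  \includetikz{delete-me-squared}{width=\linewidth}
\end{figure}

\begin{figure}
  %% Hypothetical alternative settings for a narrower imported figure.
  \includetikz{delete-me-image-histogram}{width=0.8\linewidth}
\end{figure}

\end{document}
```

With '\makepdf' undefined, this compiles only if 'tex/tikz/delete-me-squared.pdf' and 'tex/tikz/delete-me-image-histogram.pdf' already exist. Callers written for the old one-argument form would need a second argument added, for example '{width=\linewidth}' to reproduce the previously hard-coded behaviour.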