Diffstat (limited to 'README-hacking.md')
-rw-r--r-- | README-hacking.md | 630 |
1 files changed, 381 insertions, 249 deletions
diff --git a/README-hacking.md b/README-hacking.md index 475f2ca..fa14795 100644 --- a/README-hacking.md +++ b/README-hacking.md @@ -1,8 +1,8 @@ Maneage: managing data lineage ============================== -Copyright (C) 2018-2021 Mohammad Akhlaghi <mohammad@akhlaghi.org>\ -Copyright (C) 2020-2021 Raul Infante-Sainz <infantesainz@gmail.com>\ +Copyright (C) 2018-2025 Mohammad Akhlaghi <mohammad@akhlaghi.org>\ +Copyright (C) 2020-2025 Raul Infante-Sainz <infantesainz@gmail.com>\ See the end of the file for license conditions. Maneage is a **fully working template** for doing reproducible research (or @@ -180,29 +180,101 @@ evolving rapidly, so some details will differ between the different versions. The more recent papers will tend to be the most useful as good working examples. - - Peper & Roukema ([2020](https://arxiv.org/abs/2010.03742), - arXiv:2010.03742): The live version of the controlled source is [at - Codeberg](https://codeberg.org/boud/elaphrocentre); the main input - dataset, a software snapshot, the software tarballs, the project - outputs and editing history are available at - [zenodo.4062461](https://zenodo.org/record/4062461); and the + - Saremi et + al. [2025](https://ui.adsabs.harvard.edu/abs/2025arXiv250802780S), + Astronomy and Astrophysics (accepted): The project's version controlled + source is on [Gitlab](https://gitlab.com/nasim-projects/pipeline), + necessary software, outputs and backup of history are available at + [zenodo.16152699](https://doi.org/10.5281/zenodo.16152699); and the + archived git history is available at + [swh:1:dir:b3657cfb6053fd976695bd63c15cb99e5095648a](https://archive.softwareheritage.org/swh:1:dir:b3657cfb6053fd976695bd63c15cb99e5095648a;origin=https://gitlab.com/nasim-projects/pipeline;visit=swh:1:snp:ab7c6f0b9999f42d77154103c1bc082fa23b325c;anchor=swh:1:rev:afeb282c01983cba2a11eb4b2f25d5a40d35c164). 
+ + - Eskandarlou & Akhlaghi + [2024](https://ui.adsabs.harvard.edu/abs/2024RNAAS...8..168E), Research + Notes in American Astronomical Society (RNAAS), Volume 8, Issue 6, + id.168. The project's version controlled source is on + [Codeberg](https://codeberg.org/gnuastro/papers) (the `polar-plot` + branch). Necessary software, outputs and backup of history are available + at [zenodo.11403643](https://doi.org/10.5281/zenodo.11403643); and the archived git history is available at - [swh:1:dir:c4770e81288f340083dd8aa9fe017103c4eaf476](https://archive.softwareheritage.org/swh:1:dir:c4770e81288f340083dd8aa9fe017103c4eaf476). + [swh:1:dir:4e09bf85f9f87336fa55920bf67e7bcf6d58bbd5](https://archive.softwareheritage.org/swh:1:dir:4e09bf85f9f87336fa55920bf67e7bcf6d58bbd5;origin=https://codeberg.org/gnuastro/papers;visit=swh:1:snp:557ee1a90de465659659ecc46df0c5ce29d0bb61;anchor=swh:1:rev:375e12e52080006be6a28e10980e79ef54d13d1d). - - Roukema ([2020](https://arxiv.org/abs/2007.11779), - arXiv:2007.11779): The live version of the controlled source is [at - Codeberg](https://codeberg.org/boud/subpoisson); the main input - dataset, a software snapshot, the software tarballs, the project - outputs and editing history are available at - [zenodo.3951152](https://zenodo.org/record/3951152); and the + - Infante-Sainz et + al. [2024](https://ui.adsabs.harvard.edu/abs/2024RNAAS...8...22I), + Research Notes in American Astronomical Society (RNAAS), Volume 8, Issue + 1, id.22. The project's version controlled source is on + [Codeberg](https://codeberg.org/gnuastro/papers) (the `radial-profile` + branch). Necessary software, outputs and backup of history are available + at [zenodo.10124582](https://doi.org/10.5281/zenodo.10124582); and the archived git history is available at - [swh:1:dir:fcc9d6b111e319e51af88502fe6b233dc78d5166](https://archive.softwareheritage.org/swh:1:dir:fcc9d6b111e319e51af88502fe6b233dc78d5166). 
+ [swh:1:dir:d5029e066916cb64f0d95d20eb88294acc78b2b1](https://archive.softwareheritage.org/swh:1:dir:d5029e066916cb64f0d95d20eb88294acc78b2b1;origin=https://codeberg.org/gnuastro/papers;visit=swh:1:snp:b065324c2ef3b48bc26e8f30e48102a1abd2052f;anchor=swh:1:rev:61764447b16da44538e5ddbf7fb69937ba138e81). + + - Infante-Sainz & Akhlaghi + [2024](https://ui.adsabs.harvard.edu/abs/2024RNAAS...8...10I), Research + Notes in American Astronomical Society (RNAAS), Volume 8, Issue 1, + id.10. The project's version controlled source is on + [Codeberg](https://codeberg.org/gnuastro/papers) (the `color-faint-gray` + branch). Necessary software, outputs and backup of history are available + at [zenodo.10058165](https://doi.org/10.5281/zenodo.10058165); and the + archived git history is available at + [swh:1:dir:1064a48d4bb58d6684c3df33c6633a04d4141d2d](https://archive.softwareheritage.org/swh:1:dir:1064a48d4bb58d6684c3df33c6633a04d4141d2d;origin=https://codeberg.org/gnuastro/papers;visit=swh:1:snp:a083ff647c571f895d1ccc9f7432fa1b9a1d03a8;anchor=swh:1:rev:ff77b619daa50b05ddd83206d979d1f8a53d040b). + + - Eskandarlou et + al. [2023](https://ui.adsabs.harvard.edu/abs/2023RNAAS...7..269E), + Research Notes in American Astronomical Society (RNAAS), Volume 7, Issue + 12, id.269. The project's version controlled source is on + [Codeberg](https://codeberg.org/gnuastro/papers) (the `zeropoint` + branch). Necessary software, outputs and backup of history are available + at [zenodo.10256845](https://doi.org/10.5281/zenodo.10256845); and the + archived git history is available at + [swh:1:dir:8b2d1f63be96de3de03aa3e2bb68fa7fa52df56f](https://archive.softwareheritage.org/swh:1:dir:8b2d1f63be96de3de03aa3e2bb68fa7fa52df56f;origin=https://codeberg.org/gnuastro/papers;visit=swh:1:snp:e37e226bab517eef24d854467682b2fcf5d7dc32;anchor=swh:1:rev:ea682783d83707c0e1d114a5de74a100be9f545d). 
+ + - Akhlaghi [2023](https://ui.adsabs.harvard.edu/abs/2023RNAAS...7..211A), + Research Notes in American Astronomical Society (RNAAS), Volume 7, Issue + 10, id.211. The project's version controlled source is on + [Codeberg](https://codeberg.org/gnuastro/papers) (the + `pointing-simulate` branch). + + - Borkowska & Roukema + ([2022](https://ui.adsabs.harvard.edu/abs/2022CQGra..39u5007B), + Classical and Quantum Gravity, arXiv:2112.14174): The live version of + the controlled source is [at + Codeberg](https://codeberg.org/boud/gevcurvtest); the main input + dataset, a software snapshot, the software tarballs, the project outputs + and editing history are available at + [zenodo.5806027](https://doi.org/10.5281/zenodo.5806027); and the + archived git history is available at + [swh:1:rev:54398b720ddbac269ede30bf1e27fe27f07567f7](https://archive.softwareheritage.org/browse/revision/54398b720ddbac269ede30bf1e27fe27f07567f7). - - Akhlaghi et al. ([2020](https://arxiv.org/abs/2006.03018), - arXiv:2006.03018): The project's version controlled source is [on + - Peper & Roukema + ([2021](https://ui.adsabs.harvard.edu/abs/2021MNRAS.505.1223P), MNRAS, + 505, 1223, DOI:10.1093/mnras/stab1342, arXiv:2010.03742): The live + version of the controlled source is [at + Codeberg](https://codeberg.org/boud/elaphrocentre); the main input + dataset, a software snapshot, the software tarballs, the project outputs + and editing history are available at + [zenodo.4699702](https://zenodo.org/record/4699702); and the archived + git history is available at + [swh:1:rev:a029edd32d5cd41dbdac145189d9b1a08421114e](https://archive.softwareheritage.org/swh:1:rev:a029edd32d5cd41dbdac145189d9b1a08421114e). 
+ + - Roukema ([2021](https://ui.adsabs.harvard.edu/abs/2021PeerJ...911856R), + PeerJ, 9:e11856, arXiv:2007.11779): The live version of the controlled + source is [at Codeberg](https://codeberg.org/boud/subpoisson); the main + input dataset, a software snapshot, the software tarballs, the project + outputs and editing history are available at + [zenodo.4765705](https://zenodo.org/record/4765705); and the archived + git history is available at + [swh:1:rev:72242ca8eade9659031ea00394a30e0cc5cc1c37](https://archive.softwareheritage.org/swh:1:rev:72242ca8eade9659031ea00394a30e0cc5cc1c37). + + - Akhlaghi et + al. ([2021](https://ui.adsabs.harvard.edu/abs/2021CSE....23c..82A), + CiSE, 23(3), 82 DOI:10.1109/MCSE.2021.3072860 arXiv:2006.03018): The + project's version controlled source is [on Gitlab](https://gitlab.com/makhlaghi/maneage-paper), necessary software, - outputs and backup of history is available in - [zenodo.3872248](https://doi.org/10.5281/zenodo.3872248). + outputs and backup of history are available at + [zenodo.3872248](https://doi.org/10.5281/zenodo.3872248); and the + archived git history is available at + [swh:1:dir:45a9e282a86145fe9babef529c8fce52ffe8d717](https://archive.softwareheritage.org/swh:1:dir:45a9e282a86145fe9babef529c8fce52ffe8d717). - Infante-Sainz et al. ([2020](https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.5317I), @@ -212,8 +284,8 @@ working examples. [zenodo.3524937](https://zenodo.org/record/3524937). - Akhlaghi ([2019](https://arxiv.org/abs/1909.11230), IAU Symposium - 355). The version controlled project source is available [on - GitLab](https://gitlab.com/makhlaghi/iau-symposium-355) and is also + 355). The version controlled project source is available + [on GitLab](https://gitlab.com/makhlaghi/iau-symposium-355) and is also archived on Zenodo with all the necessary software tarballs: [zenodo.3408481](https://doi.org/10.5281/zenodo.3408481). 
@@ -553,7 +625,7 @@ First custom commit the default `origin` remote server to specify that this is Maneage's remote server. This will allow you to use the conventional `origin` name for your own project as shown in the next steps. Second, you will - create and go into the conventional `master` branch to start + create and go into the conventional `main` branch to start committing in your project later. ```shell @@ -561,11 +633,51 @@ First custom commit $ mv project my-project # Change the name to your project's name. $ cd my-project # Go into the cloned directory. $ git remote rename origin origin-maneage # Rename current/only remote to "origin-maneage". - $ git checkout -b master # Create and enter your own "master" branch. + $ git checkout -b main # Create and enter your own "main" branch. $ pwd # Just to confirm where you are. ``` - 2. **Prepare to build project**: The `./project configure` command of the + 2. The final job of Maneage is to create your paper's PDF. By default it + uses a custom LaTeX style that resembles that of the Astrophysical + Journal (because the precursor of Maneage was for [Akhlaghi & Ichikawa + 2015](https://ui.adsabs.harvard.edu/abs/2015ApJS..220....1A)). The + journal you plan to submit your paper to will have its own separate + style. So it is best that you start your project by writing in the + desired style. We have already customized Maneage for the official + styles of some journals. To find them, run `git branch -r | grep + journal`. If your planned journal is one of them, you can take the + following steps to start your project based on that journal's style. If + it is not in these, you can ignore this step for now and customize the + style later (you can model based on these branches). In the commands + below, we'll assume you want to prepare for the Astronomy and + Astrophysics journal (A&A). 
+ + ```shell + $ git checkout -b journal origin-maneage/journal-a-and-a + $ git log -1 --oneline | awk '{print $1}' # To keep the commit hash + $ git rebase -i main # See description below + ``` + + In the first text editor that opens after the last command, change all + (except the first) `pick`s into `squash`, then save the change and + close the editor. In case there is no conflict, the second editor will + be pre-filled with all the commit messages in that branch. You do not + need those, so you can delete everything and write a commit message + like the following: `A&A journal (commit XXXXX of Maneage's + journal-a-and-a branch)`. Just replace the `XXXXX` with the output of + the second command above. The commit hash is important to be stored + here since it allows you to later check if any updates have been made + in that branch in the future. After completing the git rebase operation + (last command above), run the following commands below to put the new + commit in your `main` branch (and continue working based on that). + + ```shell + $ git checkout main + $ git merge journal + $ git branch -D journal + ``` + + 3. **Prepare to build project**: The `./project configure` command of the next step will build the different software packages within the "build" directory (that you will specify). Nothing else on your system will be touched. However, since it takes long, it is useful to see @@ -585,7 +697,7 @@ First custom commit $ ./project --check-config ``` - 3. **Test Maneage**: Before making any changes, it is important to test it + 4. **Test Maneage**: Before making any changes, it is important to test it and see if everything works properly with the commands below. If there is any problem in the `./project configure` or `./project make` steps, please contact us to fix the problem before continuing. Since the @@ -603,7 +715,7 @@ First custom commit # Open 'paper.pdf' and see if everything is ok. ``` - 4. **Setup the remote**: You can use any [hosting + 5. 
**Setup the remote**: You can use any [hosting facility](https://en.wikipedia.org/wiki/Comparison_of_source_code_hosting_facilities) that supports Git to keep an online copy of your project's version controlled history. We recommend [GitLab](https://gitlab.com) because @@ -616,7 +728,7 @@ First custom commit a new project which is bad in this scenario, and will not allow you to push to it). It will give you a URL (usually starting with `git@` and ending in `.git`), put this URL in place of `XXXXXXXXXX` in the first - command below. With the second command, "push" your `master` branch to + command below. With the second command, "push" your `main` branch to your `origin` remote, and (with the `--set-upstream` option) set them to track/follow each other. However, the `maneage` branch is currently tracking/following your `origin-maneage` remote (automatically set @@ -627,15 +739,15 @@ First custom commit ```shell git remote add origin XXXXXXXXXX # Newly created repo is now called 'origin'. - git push --set-upstream origin master # Push 'master' branch to 'origin' (with tracking). + git push --set-upstream origin main # Push 'main' branch to 'origin' (with tracking). git push origin maneage # Push 'maneage' branch to 'origin' (no tracking). ``` - 5. **Title**, **short description** and **author**: You can start adding + 6. **Title**, **short description** and **author**: You can start adding your name (with your possible coauthors) and tentative abstract in `paper.tex`. You should see the relevant place in the preamble (prior to `\begin{document}`. Just note that some core project metadata like - the project tile are actually set in + the project title are actually set in `reproduce/analysis/config/metadata.conf`. So set your project title in there. 
After you are done, run the `./project make` command again to see your changes in the final PDF and make sure that your changes @@ -644,7 +756,7 @@ First custom commit specific journal's style), please feel free to use it your own methods after finishing this checklist and doing your first commit. - 6. **Delete dummy parts**: Maneage contains some parts that are only for + 7. **Delete dummy parts**: Maneage contains some parts that are only for the initial/test run, mainly as a demonstration of important steps, which you can use as a reference to use in your own project. But they not for any real analysis, so you should remove these parts as @@ -664,14 +776,22 @@ First custom commit - `reproduce/analysis/make/top-make.mk`: Delete the `delete-me` line in the `makesrc` definition. Just make sure there is no empty line - between the `download \` and `verify \` lines (they should be + between the `initialize \` and `verify \` lines (they should be directly under each other). - - `reproduce/analysis/make/verify.mk`: In the final recipe, under the - commented line `Verify TeX macros`, remove the full line that - contains `delete-me`, and set the value of `s` in the line for - `download` to `XXXXX` (any temporary string, you'll fix it in the - end of your project, when its complete). + - `reproduce/analysis/make/initialize.mk`: in the very end of this + file, you will see a set of lines between `delete the lines below + this` and `delete the lines above this`. Delete this whole group of + lines (including the two instruction lines). + + - `reproduce/analysis/config/verify-outputs.conf`: Disable + verification of outputs by changing the `yes` (the value of + `verify-outputs`) to `no`. Later, when you are ready to submit your + paper, or publish the dataset, activate verification and make the + proper corrections in this file (described under the "Other basic + customizations" section below). 
This is a critical step and only + takes a few minutes when your project is finished. So DON'T FORGET + to activate it in the end. - Delete all `delete-me*` files in the following directories: @@ -681,14 +801,6 @@ First custom commit $ rm reproduce/analysis/config/delete-me* ``` - - Disable verification of outputs by removing the `yes` from - `reproduce/analysis/config/verify-outputs.conf`. Later, when you are - ready to submit your paper, or publish the dataset, activate - verification and make the proper corrections in this file (described - under the "Other basic customizations" section below). This is a - critical step and only takes a few minutes when your project is - finished. So DON'T FORGET to activate it in the end. - - Re-make the project (after a cleaning) to see if you haven't introduced any errors. @@ -697,9 +809,9 @@ First custom commit $ ./project make ``` - 7. **Ignore changes in some Maneage files**: One of the main advantages of + 8. **Ignore changes in some Maneage files**: One of the main advantages of Maneage is that you can later update your infra-structure by merging - your `master` branch with the `maneage` branch. This is good for many + your `main` branch with the `maneage` branch. This is good for many low-level features that you will likely never modify yourself. But it is not desired for some files like `paper.tex` (you don't want changes in Maneage's default `paper.tex` to cause conflicts with all the text @@ -729,7 +841,7 @@ First custom commit $ git add .gitattributes ``` - 8. **Copyright and License notice**: It is necessary that _all_ the + 9. **Copyright and License notice**: It is necessary that _all_ the "copyright-able" files in your project (those larger than 10 lines) have a copyright and license notice. Please take a moment to look at several existing files to see a few examples. 
The copyright notice is @@ -743,15 +855,15 @@ First custom commit add a copyright notice in your name under the existing one(s), like the line with capital letters below. To start with, add this line with your name and email address to `paper.tex`, - `tex/src/preamble-header.tex`, `reproduce/analysis/make/top-make.mk`, + `tex/src/preamble-project.tex`, `reproduce/analysis/make/top-make.mk`, and generally, all the files you modified in the previous step. ``` - Copyright (C) 2018-2021 Existing Name <existing@email.address> - Copyright (C) 2021 YOUR NAME <YOUR@EMAIL.ADDRESS> + Copyright (C) 2018-2025 Existing Name <existing@email.address> + Copyright (C) 2025-2025 YOUR NAME <YOUR@EMAIL.ADDRESS> ``` - 9. **Configure Git for fist time**: If this is the first time you are + 10. **Configure Git for first time**: If this is the first time you are running Git on this system, then you have to configure it with some basic information in order to have essential information in the commit messages (ignore this step if you have already done it). Git will @@ -765,8 +877,8 @@ First custom commit $ git config --global core.editor nano ``` - 10. **Your first commit**: You have already made some small and basic - changes in the steps above and you are in your project's `master` + 11. **Your first commit**: You have already made some small and basic + changes in the steps above and you are in your project's `main` branch. So, you can officially make your first commit in your project's history and push it. But before that, you need to make sure that there are no problems in the project. This is a good habit to @@ -784,7 +896,7 @@ First custom commit $ git push # Push your commit to your remote. ``` - 11. **Read the publication checklist**: The publication checklist below is + 12. **Read the publication checklist**: The publication checklist below is very similar to this one, but for the final phase of your project. 
For now, you don't have to do any of its steps, but reading it will give you good insight into the later stages of your project. If you already @@ -794,7 +906,7 @@ First custom commit Making it much easier to complete that checklist when you are ready for submission. - 12. **Start your exciting research**: You are now ready to add flesh and + 13. **Start your exciting research**: You are now ready to add flesh and blood to this raw skeleton by further modifying and adding your exciting research steps. You can use the "published works" section in the introduction (above) as some fully working models to learn @@ -823,24 +935,12 @@ Other basic customizations Gnuastro, go through the analysis steps in `reproduce/analysis` and remove all its use cases (clearly marked). - - **Input dataset**: The input datasets are managed through the - `reproduce/analysis/config/INPUTS.conf` file. It is best to gather all - the information regarding all the input datasets into this one central - file. To ensure that the proper dataset is being downloaded and used - by the project, it is also recommended get an [MD5 - checksum](https://en.wikipedia.org/wiki/MD5) of the file and include - that in `INPUTS.conf` so the project can check it automatically. The - preparation/downloading of the input datasets is done in - `reproduce/analysis/make/download.mk`. Have a look there to see how - these values are to be used. This information about the input datasets - is also used in the initial `configure` script (to inform the users), - so also modify that file. You can find all occurrences of the demo - dataset with the command below and replace it with your input's - dataset. - - ```shell - $ grep -ir wfpc2 ./* - ``` + - **Input datasets**: The input datasets are managed through the + `reproduce/analysis/config/INPUTS.conf` file. 
It is best to gather the + following information regarding all the input datasets into this one + central file: 1) the SHA256 checksum of the file, 2) the URL where the + file can be downloaded online. Please read the comments at the start + of `reproduce/analysis/config/INPUTS.conf` carefully. - **`README.md`**: Correct all the `XXXXX` place holders (name of your project, your own name, address of your project's online/remote @@ -925,6 +1025,123 @@ Other basic customizations +Upgrading the Maneage branch of your project +============================================ + +In time, Maneage is going to become more and more mature and robust (thanks +to your feedback and the feedback of other users). Bugs will be fixed and +new/improved features will be added. So every once in a while, you can run +the commands below to pull new work that is done in Maneage. If the changes +are useful for your work, you can merge them with your project to benefit +from them. + +0. Before going into the technicalities of upgrading your project's +Maneage, it may happen that you don't have the `maneage` branch and +`origin-maneage` remote any more! This can happen when you clone your own +project on a different system, or a colleague clones it to collaborate with +you: the clone won't have the `origin-maneage` remote that you started the +project with. If this is the case, you can add the `origin-maneage` remote +and define the `maneage` branch from it using the steps below: + +```shell +$ git remote add origin-maneage https://git.maneage.org/project.git +$ git fetch origin-maneage maneage +$ git checkout -b maneage --track origin-maneage/maneage +``` + +1. Pull the latest changes on the `maneage` branch and read the commit +message (full description) of all new changes with the commands below. Just +be sure that you have already committed any changes in your branch +(otherwise the checkout command will fail). + +```shell +# Go to the 'maneage' branch and import updates. 
+$ git checkout maneage +$ git pull # Get recent work in Maneage +``` + +2. Read all the commit messages of the newly imported features/changes. In +particular pay close attention to the ones starting with 'IMPORTANT': these +may cause a crash in your project (changing something fundamental in +Maneage). Replace the `XXXXXXX..YYYYYYY` with the hashes mentioned close to the +start of the `git pull` command's output (previous step). + +```shell +$ git log XXXXXXX..YYYYYYY --reverse +``` + +3. If you find the updates useful, go to your project's `main` branch and +import all the updates into it with the commands below. Don't worry about +the printed outputs (in particular the `CONFLICT`s), we'll clean them up in +the next step. + +```shell +$ git checkout main +$ git merge maneage + +# Ignore conflicting Maneage files that you had previously deleted +# in the customization checklist (mostly demonstration files). +$ git status # Just for a check +$ git status --porcelain | awk '/^DU/{system("git rm "$NF)}' +$ git status # Just for a check +``` + +4. Files with conflicts will be visible (from the output of the last command +above) with the classification `both modified:`. Open one of these files +with your favorite text editor and correct the conflict (placed in between +`<<<<<<<`, `=======` and `>>>>>>>`). Once all conflicts in a file are +removed, the file will be automatically removed from the "Unmerged paths" +of `git status`. So run `git status` after correcting the conflicts of each +file just to make sure things are clean. TIP: If you want the changes in +one file to be only from a special branch (`maneage` or `main`, completely +ignoring changes in the other), use this command: + +```shell +$ git checkout <BRANCH-NAME> -- <FILENAME> +``` + +5. Once all the Git conflicts are fixed, it is important to make sure that +"semantic conflicts" (that don't show up in Git, but can potentially break +your project) are also fixed. 
For example updates to software versions +(their behavior may have changed), or to internal Maneage structure. Hence +read the commit messages of `git log` carefully to see what has changed. In +case you see a commit with `IMPORTANT` in its title, the best way is to +delete your build directory and let the software and project be executed +from scratch. + +```shell +$ ./project make distclean # will DELETE ALL your build-directory!! +$ ./project configure -e +$ ./project make +``` + +6. Once your final product is created at the end of the previous step and +its contents are what you expect, you are ready to commit the merge. In the +commit message, explain any conflicts that you fixed. + +```shell +$ git add -u +$ git commit +``` + +7. When everything is OK, before continuing with your project's work, don't +forget to push both your `main` branch and your updated `maneage` branch to +your remote server. + +```shell +$ git push +$ git push origin maneage +``` + + + + + + + + + + Publication checklist ===================== @@ -945,14 +1162,14 @@ effectively no cost in keeping multiple redundancies on different servers, just in case one (or more) of them are discontinued in the (near/far) future. - - **Reserve a DOI for your dataset**: There are multiple data servers that - give this functionality, one of the most well known and (currently!) - well-funded is [Zenodo](https://zenodo.org) so we'll focus on it - here. Of course, you can use any other service that provides a similar - functionality. Once you complete these steps, you can start using/citing - your dataset's DOI in the source of your project to finalize the rest of - the points. With Zenodo, you can even use the given identifier - for things like downloading. + - **Reserve a DOI for your datasets**: There are multiple data servers + that give this functionality, one of the most well known and + (currently!) well-funded is [Zenodo](https://zenodo.org) so we'll focus + on it here. 
Of course, you can use any other service that provides a + similar functionality. Once you complete these steps, you can start + using/citing your dataset's DOI in the source of your project to + finalize the rest of the points. With Zenodo, you can even use the given + identifier for things like downloading. * *Start new upload*: After you log in to Zenodo, you can start a new upload by clicking on the "New Upload button". @@ -961,16 +1178,23 @@ future. Identifier", click on the "Reserve DOI" button. * *Fill basic info*: You need to at least fill in the "required fields" - (marked with a red star). You will always be able to change any - metadata (even after you "Publish"), so don't worry too much about - values in the fields, at this phase, its just important that they - are not empty. + (marked with a red star). You will always be able to change any + metadata (even after you "Publish"), so don't worry too much about + values in the fields, at this phase, its just important that they are + not empty. * *Save your project but do not yet publish*: Press the "Save" button - (at the top or bottom of the page). Do not yet press "Publish" - though, since that would make the project public, and freeze the DOI - with any possible file you may have uploaded already. We will get to - the publication phase in the next steps. + (at the top or bottom of the page). Do not yet press "Publish" though, + since that would make the project public, and freeze the DOI with any + possible file you may have uploaded already. We will get to the + publication phase in the next steps. + + - **Record the metadata**: Maneage comes with a file to store all the + project's metadata: `reproduce/analysis/config/metadata.conf`. Open this + file and store all the information that you currently have: for example + the Zenodo DOI, project's Git repository, Copyright owner and license of + the data after it becomes public. 
Keep the empty fields in mind and + after obtaining them, don't forget to fill them up. - **Request archival on SoftwareHeritage**: [Software Heritage](https://archive.softwareheritage.org/save/) is an online @@ -989,7 +1213,7 @@ future. - **Zenodo/SoftwareHeritage links in paper**: put links to the Zenodo-DOI (and SoftwareHeritage source when you make it public) in your - paper. Somewhere close the start, maybe under the keywords/abstract, + paper. Somewhere close to the start, maybe under the keywords/abstract, highlighting that they are supplements for reproducibility. These help readers easily access these resources for supplementary material directly from your PDF paper (sources on SoftwareHeritage and @@ -1013,14 +1237,14 @@ future. (for example with columns separated by white-space characters) or in the more formal [Comma-separated values](https://en.wikipedia.org/wiki/Comma-separated_values) or CSV, - format). In the former case, its best to set the suffixes to `.txt` - (because most browsers/OSs will automatically know they are plain-text - and open them without needing any other software. If you have other - types of data (for example images, or very large tables with millions - of rows/columns that can be inconvenient in plain-text), feel free to - use custom binary formats, but later, in the description of your - project on the server, add a note, explaining what software they - should use to open them. + format). Generally, its best to set the suffixes to `.txt` (because + most browsers/OSs will automatically know they are plain-text and open + them without needing any other software). If you have other types of + data (for example images, or very large tables with millions of + rows/columns that can be inconvenient in plain-text), feel free to use + custom binary formats, but later, in the description of your project + on the server, add a note, explaining what software they should use to + open them. 
* *Descriptive names*: In some papers there are many files and having cryptic names will only confuse your readers (actually, yourself in @@ -1033,58 +1257,35 @@ future. to rename everything related to each figure (which is very frustrating and prone to errors). - * *Good metadata*: Raw data are not too useful merely as a series of + * *Good metadata*: Raw data are not too useful merely as a series of raw numbers! So don't forget to have **good metadata in every file**. If its a plain-text file, usually lines starting with a `#` are ignored. So in the command that generates each dataset, add some extra - information about the dataset as lines starting with `#`. A minimal - set of recommended metadata are listed below. Feel free to add - more. You can use a configuration file to keep this information in one - place and automatically include them in all your output files. - - * *Project Title and authors*: This is very important to give a - general perspective of the figure. - - * *Links to project*: For example Zenodo-DOI, Journal-DOI (after it is - accepted), SoftwareHeritage page, arXiv-ID (or any other pre-print - server) and ofcourse, your Git repository. - - * *Commit hash* of the project that produced the dataset. This - directly links the dataset to a particular point in your project's - history. It is stored in the `$(project-commit-hash)` variable that - is defined in `initialize.mk`. So you can use it anywhere in your - project. - - * *Same commit hashes*: each dataset may have been created at - different phases of your project's history. If you simply upload the - produced datasets, they may therefore have different commits on - them. To avoid confusing your readers (and your self in the future), - it is best that they all have the same commit hash (which will also - be the commit hash printed in the paper). So upon publication, we - recommend deleting all of them and running `./project make` to build - them all with the same commit hash. 
- - * *Copyright as metadata*: people need to know if they can "use" the - dataset (i.e., modify it), or possibly re-distribute it and their - derived products. They also need to know how they can contact the - creator of the datset (who is usually also the copyright owner). So - as another metadata element, also add your name and email-address - (or the name of the person and email of the person who was in charge - of that part of the project), and the copyright license name and - standard link to the fully copyright license. + information (the more the better!) about the dataset as lines starting + with `#`. Based on `reproduce/analysis/config/metadata.conf`, in + `initialize.mk`, Maneage will produce a default set of basic + information for plain-text data and will put it in the + `$(print-general-metadata)` variable. It is thus recommended to print + this variable into your plain-text file before printing the actual + data (so it shows on top of the file). For a real-world example, see + its usage in `reproduce/analysis/make/delete-me.mk` (in the `maneage` + branch). If you are publishing your data in binary formats, please add + all the metadata you see in `$(print-general-metadata)` into each + dataset file (for example keywords in the FITS format). If there are + many files, its easy to define a tiny shell-script to do the job on + each dataset. - **Link to figure datasets in caption**: all the datasets that go into the plots should be uploaded directly to Zenodo so they can be viewed/downloaded with a simple link in the caption. For example see the last sentence of the caption of Figure 1 in - [arXiv:2006.03018v1](https://arxiv.org/pdf/2006.03018v1.pdf), it points - to [the - data](https://zenodo.org/record/3872248/files/tools-per-year.txt) that - was used to create that figure's top plot. As you see, this will allow - your paper's readers (again, most probably your future-self!) 
to - directly access the numbers of each visualization (plot/figure) with a - simple click in a trusted server. This also shows the major advantage of - having your data as simple plain-text where possible, as described + [arXiv:2006.03018](https://arxiv.org/pdf/2006.03018.pdf), it points to + [the data](https://zenodo.org/record/3872248/files/tools-per-year.txt) + that was used to create that figure's left-side plot. As you see, this + will allow your paper's readers (again, most probably your future-self!) + to directly access the numbers of each visualization (plot/figure) with + a simple click in a trusted server. This also shows the major advantage + of having your data as simple plain-text where possible, as described above. To help you keep all your to-be-visualized datasets in a single place, Maneage has the two `tex-publish-dir` and `data-publish-dir` directories that are defined in `reproduce/analysis/make/initialize.mk`, @@ -1127,7 +1328,7 @@ future. - **Confirm if your project builds from scratch**: Before publishing anything, you should see if your project can indeed reproduce itself! You may be mistakenly using temporarily created files that aren't built - when teh project is built from scratch (this happens a lot and is very + when the project is built from scratch (this happens a lot and is very dangerous for the integrity of your project!). So, go to a temporary directory, clone your project from its repository and try configuring and building it from scratch in a new-temporary build-directory. It is @@ -1188,8 +1389,8 @@ future. `.build/software/tarballs`. It is necessary to upload these with your project to avoid relying on third party servers. In the future any one of those servers may go down and if so, your project won't - be buildable. You can generate this tarball easily with `make - dist-software`. + be buildable. You can generate this tarball easily with `./project + make dist-software`. 
* All the figure (and other) output datasets of the project. Don't rename these files, let them have the same descriptive name @@ -1201,9 +1402,12 @@ future. initial/final submission to your desired journal. But we'll just add the necessary points for arXiv submission here: - * *Necessary links in comments*: put a link to your project's Git - repository, Zenodo-DOI (this is not your paper's DOI, its the - data/resources DOI), and/or SoftwareHeritage link in the comments. + * *Necessary links in comments*: put a link to your project's Git + repository, Zenodo-DOI (this is not your paper's DOI, its the + data/resources DOI), and/or SoftwareHeritage link in the comments. + + - *Update `metadata.conf`*: Once you have your final arXiv ID (formatted + as: `1234.56789`) put it in `reproduce/analysis/config/metadata.conf`. - **Submission to a journal**: different journals accept submissions in different formats, some accept LaTeX, some only want a PDF, or etc. It @@ -1221,6 +1425,33 @@ future. the DOI (so you don't need to upload a new version if you just want to update the metadata). + - **After acceptance (before publication)**: Congratulations on the + acceptance! The main science content of your paper can't be changed any + more, but the paper will now go to the publication editor (for language + and style). Your approval of the final proof is necessary before the + paper is finally published. Use this period to finalize the final + metadata of your project: the journal's DOI. Some journals associate + your paper's DOI during this process. So before approving the final + proof do these steps: + + * Add the Journal DOI in `reproduce/analysis/config/metadata.conf`, + and re-build your final data products, so this important metadata is + added. + + * Once you get the final proof, and if everything is OK for you, + implement all the good language corrections/edits they have made + inside your own copy here and commit it into your project.
This will + be the final commit of your project before publication. + + * Submit your final project as a new version to Zenodo (and + arXiv). The Zenodo one is most important because your plots will + link to it and you want the commit hash in the data files that + readers will get from Zenodo to be the same hash as the paper. + + * Tell the journal's publication editor to correct the hash and Zenodo + ID in your final proof confirmation (so the links point to the + correct place). Recall that on every new version upload in Zenodo, + you get a new DOI (or Zenodo ID). @@ -1472,105 +1703,6 @@ for the benefit of others. history of your project under version control. So try to make commits regularly (after any meaningful change/step/result). - - *Keep Maneage up-to-date*: In time, Maneage is going to become more - and more mature and robust (thanks to your feedback and the feedback - of other users). Bugs will be fixed and new/improved features will be - added. So every once and a while, you can run the commands below to - pull new work that is done in Maneage. If the changes are useful for - your work, you can merge them with your project to benefit from - them. Just pay **very close attention** to resolving possible - **conflicts** which might happen in the merge. In particular the - "semantic conflicts" that don't show up in Git, but can potentially - break your project, for example updates to software versions, or to - internal Maneage structure. Hence read the commit messages of `git - log` carefully to **see what has changed**. The best way to check is - to first complete the steps below, then build your project from - scratch (from `./project configure` in a new build-directory). - - ```shell - # Go to the 'maneage' branch and import updates. - $ git checkout maneage - $ git pull # Get recent work in Maneage - - # Read all the commit messages of the newly imported - # features/changes. 
In particular pay close attention to the ones - # starting with 'IMPORTANT': these may cause a crash in your - # project (changing something fundamental in Maneage). - # - # Replace the XXXXXXX..YYYYYYY with hashs mentioned close to start - # of the 'git pull' command outputs. - $ git log XXXXXXX..YYYYYYY --reverse - - # Have a look at the commits in the 'maneage' branch in relation - # with your project. - $ git log --oneline --graph --decorate --all # General view of branches. - - # Go to your 'master' branch and import all the updates into - # 'master', don't worry about the printed outputs (in particular - # the 'CONFLICT's), we'll clean them up in the next step. - $ git checkout master - $ git merge maneage - - # Ignore conflicting Maneage files that you had previously deleted - # in the customization checklist (mostly demonstration files). - $ git status # Just for a check - $ git status --porcelain | awk '/^DU/{system("git rm "$NF)}' - $ git status # Just for a check - - # If any files have conflicts, open a text editor and correct the - # conflict (placed in between '<<<<<<<', '=======' and '>>>>>>>'. - # Once all conflicts in a file are remoted, the file will be - # automatically removed from the "Unmerged paths", so run this - # command after correcting the conflicts of each file just to make - # sure things are clean. - git status - - # TIP: If you want the changes in one file to be only from a - # special branch ('maneage' or 'master', completely ignoring - # changes in the other), use this command: - # $ git checkout <BRANCH-NAME> -- <FILENAME> - - # When there are no more "Unmerged paths", you can commit the - # merge. In the commit message, Explain any conflicts that you - # fixed. - git commit - - # Do a clean build of your project (to check for "Semanic - # conflicts" (not detected as a conflict by Git, but may cause a - # crash in your project). You can backup your build directory - # before running the 'distclean' target. 
- # - # Any error in the build will be due to low-level changes in - # Maneage, so look closely at the commit messages in the Maneage - # branch and especially those where the title starts with - # 'IMPORTANT'. - ./project make distclean # will DELETE ALL your build-directory!! - ./project configure -e - ./project make - - # When everything is OK, before continuing with your project's - # work, don't forget to push both your 'master' branch and your - # updated 'maneage' branch to your remote server. - git push - git push origin maneage - ``` - - - *Adding Maneage to a fork of your project*: As you and your colleagues - continue your project, it will be necessary to have separate - forks/clones of it. But when you clone your own project on a - different system, or a colleague clones it to collaborate with you, - the clone won't have the `origin-maneage` remote that you started the - project with. As shown in the previous item above, you need this - remote to be able to pull recent updates from Maneage. The steps - below will setup the `origin-maneage` remote, and a local `maneage` - branch to track it, on the new clone. - - ```shell - $ git remote add origin-maneage https://git.maneage.org/project.git - $ git fetch origin-maneage - $ git checkout -b maneage --track origin-maneage/maneage - ``` - - *Commit message*: The commit message is a very important and useful aspect of version control. To make the commit message useful for others (or yourself, one year later), it is good to follow a