Diffstat (limited to 'README-hacking.md')
-rw-r--r-- | README-hacking.md | 360
1 files changed, 188 insertions, 172 deletions
diff --git a/README-hacking.md b/README-hacking.md index f490625..ad44d3c 100644 --- a/README-hacking.md +++ b/README-hacking.md @@ -1,8 +1,8 @@ Maneage: managing data lineage ============================== -Copyright (C) 2018-2021 Mohammad Akhlaghi <mohammad@akhlaghi.org>\ -Copyright (C) 2020-2021 Raul Infante-Sainz <infantesainz@gmail.com>\ +Copyright (C) 2018-2025 Mohammad Akhlaghi <mohammad@akhlaghi.org>\ +Copyright (C) 2020-2025 Raul Infante-Sainz <infantesainz@gmail.com>\ See the end of the file for license conditions. Maneage is a **fully working template** for doing reproducible research (or @@ -180,29 +180,40 @@ evolving rapidly, so some details will differ between the different versions. The more recent papers will tend to be the most useful as good working examples. - - Peper & Roukema ([2021](https://ui.adsabs.harvard.edu/abs/2021MNRAS.tmp.1317P), - MNRAS, 505, 1223, DOI:10.1093/mnras/stab1342, arXiv:2010.03742): - The live version of the controlled source is - [at Codeberg](https://codeberg.org/boud/elaphrocentre); the main input - dataset, a software snapshot, the software tarballs, the project - outputs and editing history are available at - [zenodo.4699702](https://zenodo.org/record/4699702); and the - archived git history is available at + - Borkowska & Roukema + ([2022](https://ui.adsabs.harvard.edu/abs/2021arXiv211214174B), MNRAS + Submitted, arXiv:2112.14174): The live version of the controlled source + is [at Codeberg](https://codeberg.org/boud/gevcurvtest); the main input + dataset, a software snapshot, the software tarballs, the project outputs + and editing history are available at + [zenodo.5806027](https://doi.org/10.5281/zenodo.5806027); and the + archived git history is available at [swh:1:rev:54398b720ddbac269ede30bf1e27fe27f07567f7](https://archive.softwareheritage.org/browse/revision/54398b720ddbac269ede30bf1e27fe27f07567f7). 
+ + - Peper & Roukema + ([2021](https://ui.adsabs.harvard.edu/abs/2021MNRAS.505.1223P), MNRAS, + 505, 1223, DOI:10.1093/mnras/stab1342, arXiv:2010.03742): The live + version of the controlled source is [at + Codeberg](https://codeberg.org/boud/elaphrocentre); the main input + dataset, a software snapshot, the software tarballs, the project outputs + and editing history are available at + [zenodo.4699702](https://zenodo.org/record/4699702); and the archived + git history is available at [swh:1:rev:a029edd32d5cd41dbdac145189d9b1a08421114e](https://archive.softwareheritage.org/swh:1:rev:a029edd32d5cd41dbdac145189d9b1a08421114e). - - Roukema ([2021](https://ui.adsabs.harvard.edu/abs/2020arXiv200711779R), - arXiv:2007.11779): The live version of the controlled source is - [at Codeberg](https://codeberg.org/boud/subpoisson); the main input - dataset, a software snapshot, the software tarballs, the project + - Roukema ([2021](https://ui.adsabs.harvard.edu/abs/2021PeerJ...911856R), + PeerJ, 9:e11856, arXiv:2007.11779): The live version of the controlled + source is [at Codeberg](https://codeberg.org/boud/subpoisson); the main + input dataset, a software snapshot, the software tarballs, the project outputs and editing history are available at - [zenodo.4765705](https://zenodo.org/record/4765705); and the - archived git history is available at + [zenodo.4765705](https://zenodo.org/record/4765705); and the archived + git history is available at [swh:1:rev:72242ca8eade9659031ea00394a30e0cc5cc1c37](https://archive.softwareheritage.org/swh:1:rev:72242ca8eade9659031ea00394a30e0cc5cc1c37). - - Akhlaghi et al. ([2021](https://arxiv.org/abs/2006.03018), - CiSE, in press, DOI:10.1109/MCSE.2021.3072860 - arXiv:2006.03018): The project's version controlled source is - [on Gitlab](https://gitlab.com/makhlaghi/maneage-paper), necessary software, + - Akhlaghi et + al. 
([2021](https://ui.adsabs.harvard.edu/abs/2021CSE....23c..82A), + CiSE, 23(3), 82 DOI:10.1109/MCSE.2021.3072860 arXiv:2006.03018): The + project's version controlled source is [on + Gitlab](https://gitlab.com/makhlaghi/maneage-paper), necessary software, outputs and backup of history are available at [zenodo.3872248](https://doi.org/10.5281/zenodo.3872248); and the archived git history is available at @@ -557,7 +568,7 @@ First custom commit the default `origin` remote server to specify that this is Maneage's remote server. This will allow you to use the conventional `origin` name for your own project as shown in the next steps. Second, you will - create and go into the conventional `master` branch to start + create and go into the conventional `main` branch to start committing in your project later. ```shell @@ -565,7 +576,7 @@ First custom commit $ mv project my-project # Change the name to your project's name. $ cd my-project # Go into the cloned directory. $ git remote rename origin origin-maneage # Rename current/only remote to "origin-maneage". - $ git checkout -b master # Create and enter your own "master" branch. + $ git checkout -b main # Create and enter your own "main" branch. $ pwd # Just to confirm where you are. ``` @@ -620,7 +631,7 @@ First custom commit a new project which is bad in this scenario, and will not allow you to push to it). It will give you a URL (usually starting with `git@` and ending in `.git`), put this URL in place of `XXXXXXXXXX` in the first - command below. With the second command, "push" your `master` branch to + command below. With the second command, "push" your `main` branch to your `origin` remote, and (with the `--set-upstream` option) set them to track/follow each other. However, the `maneage` branch is currently tracking/following your `origin-maneage` remote (automatically set @@ -631,7 +642,7 @@ First custom commit ```shell git remote add origin XXXXXXXXXX # Newly created repo is now called 'origin'. 
- git push --set-upstream origin master # Push 'master' branch to 'origin' (with tracking). + git push --set-upstream origin main # Push 'main' branch to 'origin' (with tracking). git push origin maneage # Push 'maneage' branch to 'origin' (no tracking). ``` @@ -639,7 +650,7 @@ First custom commit your name (with your possible coauthors) and tentative abstract in `paper.tex`. You should see the relevant place in the preamble (prior to `\begin{document}`. Just note that some core project metadata like - the project tile are actually set in + the project title are actually set in `reproduce/analysis/config/metadata.conf`. So set your project title in there. After you are done, run the `./project make` command again to see your changes in the final PDF and make sure that your changes @@ -668,14 +679,22 @@ First custom commit - `reproduce/analysis/make/top-make.mk`: Delete the `delete-me` line in the `makesrc` definition. Just make sure there is no empty line - between the `download \` and `verify \` lines (they should be + between the `initialize \` and `verify \` lines (they should be directly under each other). - - `reproduce/analysis/make/verify.mk`: In the final recipe, under the - commented line `Verify TeX macros`, remove the full line that - contains `delete-me`, and set the value of `s` in the line for - `download` to `XXXXX` (any temporary string, you'll fix it in the - end of your project, when its complete). + - `reproduce/analysis/make/initialize.mk`: in the very end of this + file, you will see a set of lines between `delete the lines below + this` and `delete the lines above this`. Delete this whole group of + lines (including the two instruction lines). + + - `reproduce/analysis/config/verify-outputs.conf`: Disable + verification of outputs by changing the `yes` (the value of + `verify-outputs`) to `no`. 
Later, when you are ready to submit your + paper, or publish the dataset, activate verification and make the + proper corrections in this file (described under the "Other basic + customizations" section below). This is a critical step and only + takes a few minutes when your project is finished. So DON'T FORGET + to activate it in the end. - Delete all `delete-me*` files in the following directories: @@ -685,14 +704,6 @@ First custom commit $ rm reproduce/analysis/config/delete-me* ``` - - Disable verification of outputs by removing the `yes` from - `reproduce/analysis/config/verify-outputs.conf`. Later, when you are - ready to submit your paper, or publish the dataset, activate - verification and make the proper corrections in this file (described - under the "Other basic customizations" section below). This is a - critical step and only takes a few minutes when your project is - finished. So DON'T FORGET to activate it in the end. - - Re-make the project (after a cleaning) to see if you haven't introduced any errors. @@ -703,7 +714,7 @@ First custom commit 7. **Ignore changes in some Maneage files**: One of the main advantages of Maneage is that you can later update your infra-structure by merging - your `master` branch with the `maneage` branch. This is good for many + your `main` branch with the `maneage` branch. This is good for many low-level features that you will likely never modify yourself. But it is not desired for some files like `paper.tex` (you don't want changes in Maneage's default `paper.tex` to cause conflicts with all the text @@ -747,12 +758,12 @@ First custom commit add a copyright notice in your name under the existing one(s), like the line with capital letters below. 
To start with, add this line with your name and email address to `paper.tex`, - `tex/src/preamble-header.tex`, `reproduce/analysis/make/top-make.mk`, + `tex/src/preamble-project.tex`, `reproduce/analysis/make/top-make.mk`, and generally, all the files you modified in the previous step. ``` - Copyright (C) 2018-2021 Existing Name <existing@email.address> - Copyright (C) 2021 YOUR NAME <YOUR@EMAIL.ADDRESS> + Copyright (C) 2018-2025 Existing Name <existing@email.address> + Copyright (C) 2025-2025 YOUR NAME <YOUR@EMAIL.ADDRESS> ``` 9. **Configure Git for fist time**: If this is the first time you are @@ -770,7 +781,7 @@ First custom commit ``` 10. **Your first commit**: You have already made some small and basic - changes in the steps above and you are in your project's `master` + changes in the steps above and you are in your project's `main` branch. So, you can officially make your first commit in your project's history and push it. But before that, you need to make sure that there are no problems in the project. This is a good habit to @@ -827,24 +838,12 @@ Other basic customizations Gnuastro, go through the analysis steps in `reproduce/analysis` and remove all its use cases (clearly marked). - - **Input dataset**: The input datasets are managed through the - `reproduce/analysis/config/INPUTS.conf` file. It is best to gather all - the information regarding all the input datasets into this one central - file. To ensure that the proper dataset is being downloaded and used - by the project, it is also recommended get an [MD5 - checksum](https://en.wikipedia.org/wiki/MD5) of the file and include - that in `INPUTS.conf` so the project can check it automatically. The - preparation/downloading of the input datasets is done in - `reproduce/analysis/make/download.mk`. Have a look there to see how - these values are to be used. This information about the input datasets - is also used in the initial `configure` script (to inform the users), - so also modify that file. 
You can find all occurrences of the demo - dataset with the command below and replace it with your input's - dataset. - - ```shell - $ grep -ir wfpc2 ./* - ``` + - **Input datasets**: The input datasets are managed through the + `reproduce/analysis/config/INPUTS.conf` file. It is best to gather the + following information regarding all the input datasets into this one + central file: 1) the SHA256 checksum of the file, 2) the URL where the + file can be downloaded online. Please read the comments at the start + of `reproduce/analysis/config/INPUTS.conf` carefully. - **`README.md`**: Correct all the `XXXXX` place holders (name of your project, your own name, address of your project's online/remote @@ -929,6 +928,123 @@ Other basic customizations +Upgrading the Maneage branch of your project +============================================ + +In time, Maneage is going to become more and more mature and robust (thanks +to your feedback and the feedback of other users). Bugs will be fixed and +new/improved features will be added. So every once and a while, you can run +the commands below to pull new work that is done in Maneage. If the changes +are useful for your work, you can merge them with your project to benefit +from them. + +0. Before going into the technicalities of upgrading your project's +Maneage, it may happen that you don't have the `maneage` branch and +`origin-maneage` remote any more! This can happen when you clone your own +project on a different system, or a colleague clones it to collaborate with +you: the clone won't have the `origin-maneage` remote that you started the +project with. If this is the case, you can add the `origin-maneage` remote +and define the `maneage` branch from it using the steps below: + +```shell +$ git remote add origin-maneage https://git.maneage.org/project.git +$ git fetch origin-maneage maneage +$ git checkout -b maneage --track origin-maneage/maneage +``` + +1. 
Pull the latest changes on the `maneage` branch and read the commit +message (full description) of all new changes with the commands below. Just +be sure that you have already committed any changes in your branch +(otherwise the checkout command will fail). + +```shell +# Go to the 'maneage' branch and import updates. +$ git checkout maneage +$ git pull # Get recent work in Maneage +``` + +2. Read all the commit messages of the newly imported features/changes. In +particular pay close attention to the ones starting with 'IMPORTANT': these +may cause a crash in your project (changing something fundamental in +Maneage). Replace the `XXXXXXX..YYYYYYY` with hashs mentioned close to +start of the `git pull` command outputs (prevoius step). + +```shell +$ git log XXXXXXX..YYYYYYY --reverse +``` + +3. If you find the updates useful, go to your project's `main` branch and +import all the updates into it with the commands below. Don't worry about +the printed outputs (in particular the `CONFLICT`s), we'll clean them up in +the next step. + +```shell +$ git checkout main +$ git merge maneage + +# Ignore conflicting Maneage files that you had previously deleted +# in the customization checklist (mostly demonstration files). +$ git status # Just for a check +$ git status --porcelain | awk '/^DU/{system("git rm "$NF)}' +$ git status # Just for a check +``` + +4. Files with conflicts will be visible from the output of the last command +above) with the classification `both modified:`. Open one of these files +with your favorite text editor and correct the conflict (placed in between +`<<<<<<<`, `=======` and `>>>>>>>`). Once all conflicts in a file are +remoted, the file will be automatically removed from the "Unmerged paths" +of `git status`. So run `git status` after correcting the conflicts of each +file just to make sure things are clean. 
TIP: If you want the changes in +one file to be only from a special branch (`maneage` or `main`, completely +ignoring changes in the other), use this command: + +```shell +$ git checkout <BRANCH-NAME> -- <FILENAME> +``` + +5. Once all the Git conflicts are fixed, it is important to make sure that +"semantic conflicts" (that don't show up in Git, but can potentially break +your project) are also fixed. For example updates to software versions +(their behavior may have changed), or to internal Maneage structure. Hence +read the commit messages of `git log` carefully to see what has changed. In +case you see a commit with `IMPORTANT` in its title, the best way is to +delete your build directory and let the software and project be executed +from scratch. + +```shell +$ ./project make distclean # will DELETE ALL your build-directory!! +$ ./project configure -e +$ ./project make +``` + +6. Once your final product is created at the end of the previous step and +its contents are what you expect, you are ready to commit the merge. In the +commit message, Explain any conflicts that you fixed. + +```shell +$ git add -u +$ git commit +``` + +7. When everything is OK, before continuing with your project's work, don't +forget to push both your `main` branch and your updated `maneage` branch to +your remote server. + +```shell +$ git push +$ git push origin maneage +``` + + + + + + + + + + Publication checklist ===================== @@ -1066,14 +1182,13 @@ future. the plots should be uploaded directly to Zenodo so they can be viewed/downloaded with a simple link in the caption. For example see the last sentence of the caption of Figure 1 in - [arXiv:2006.03018v1](https://arxiv.org/pdf/2006.03018v1.pdf), it points - to [the - data](https://zenodo.org/record/3872248/files/tools-per-year.txt) that - was used to create that figure's top plot. As you see, this will allow - your paper's readers (again, most probably your future-self!) 
to - directly access the numbers of each visualization (plot/figure) with a - simple click in a trusted server. This also shows the major advantage of - having your data as simple plain-text where possible, as described + [arXiv:2006.03018](https://arxiv.org/pdf/2006.03018.pdf), it points to + [the data](https://zenodo.org/record/3872248/files/tools-per-year.txt) + that was used to create that figure's left-side plot. As you see, this + will allow your paper's readers (again, most probably your future-self!) + to directly access the numbers of each visualization (plot/figure) with + a simple click in a trusted server. This also shows the major advantage + of having your data as simple plain-text where possible, as described above. To help you keep all your to-be-visualized datasets in a single place, Maneage has the two `tex-publish-dir` and `data-publish-dir` directories that are defined in `reproduce/analysis/make/initialize.mk`, @@ -1116,7 +1231,7 @@ future. - **Confirm if your project builds from scratch**: Before publishing anything, you should see if your project can indeed reproduce itself! You may be mistakenly using temporarily created files that aren't built - when teh project is built from scratch (this happens a lot and is very + when the project is built from scratch (this happens a lot and is very dangerous for the integrity of your project!). So, go to a temporary directory, clone your project from its repository and try configuring and building it from scratch in a new-temporary build-directory. It is @@ -1177,8 +1292,8 @@ future. `.build/software/tarballs`. It is necessary to upload these with your project to avoid relying on third party servers. In the future any one of those servers may go down and if so, your project won't - be buildable. You can generate this tarball easily with `make - dist-software`. + be buildable. You can generate this tarball easily with `./project + make dist-software`. 
* All the figure (and other) output datasets of the project. Don't rename these files, let them have the same descriptive name @@ -1491,105 +1606,6 @@ for the benefit of others. history of your project under version control. So try to make commits regularly (after any meaningful change/step/result). - - *Keep Maneage up-to-date*: In time, Maneage is going to become more - and more mature and robust (thanks to your feedback and the feedback - of other users). Bugs will be fixed and new/improved features will be - added. So every once and a while, you can run the commands below to - pull new work that is done in Maneage. If the changes are useful for - your work, you can merge them with your project to benefit from - them. Just pay **very close attention** to resolving possible - **conflicts** which might happen in the merge. In particular the - "semantic conflicts" that don't show up in Git, but can potentially - break your project, for example updates to software versions, or to - internal Maneage structure. Hence read the commit messages of `git - log` carefully to **see what has changed**. The best way to check is - to first complete the steps below, then build your project from - scratch (from `./project configure` in a new build-directory). - - ```shell - # Go to the 'maneage' branch and import updates. - $ git checkout maneage - $ git pull # Get recent work in Maneage - - # Read all the commit messages of the newly imported - # features/changes. In particular pay close attention to the ones - # starting with 'IMPORTANT': these may cause a crash in your - # project (changing something fundamental in Maneage). - # - # Replace the XXXXXXX..YYYYYYY with hashs mentioned close to start - # of the 'git pull' command outputs. - $ git log XXXXXXX..YYYYYYY --reverse - - # Have a look at the commits in the 'maneage' branch in relation - # with your project. - $ git log --oneline --graph --decorate --all # General view of branches. 
- - # Go to your 'master' branch and import all the updates into - # 'master', don't worry about the printed outputs (in particular - # the 'CONFLICT's), we'll clean them up in the next step. - $ git checkout master - $ git merge maneage - - # Ignore conflicting Maneage files that you had previously deleted - # in the customization checklist (mostly demonstration files). - $ git status # Just for a check - $ git status --porcelain | awk '/^DU/{system("git rm "$NF)}' - $ git status # Just for a check - - # If any files have conflicts, open a text editor and correct the - # conflict (placed in between '<<<<<<<', '=======' and '>>>>>>>'. - # Once all conflicts in a file are remoted, the file will be - # automatically removed from the "Unmerged paths", so run this - # command after correcting the conflicts of each file just to make - # sure things are clean. - git status - - # TIP: If you want the changes in one file to be only from a - # special branch ('maneage' or 'master', completely ignoring - # changes in the other), use this command: - # $ git checkout <BRANCH-NAME> -- <FILENAME> - - # When there are no more "Unmerged paths", you can commit the - # merge. In the commit message, Explain any conflicts that you - # fixed. - git commit - - # Do a clean build of your project (to check for "Semanic - # conflicts" (not detected as a conflict by Git, but may cause a - # crash in your project). You can backup your build directory - # before running the 'distclean' target. - # - # Any error in the build will be due to low-level changes in - # Maneage, so look closely at the commit messages in the Maneage - # branch and especially those where the title starts with - # 'IMPORTANT'. - ./project make distclean # will DELETE ALL your build-directory!! - ./project configure -e - ./project make - - # When everything is OK, before continuing with your project's - # work, don't forget to push both your 'master' branch and your - # updated 'maneage' branch to your remote server. 
- git push - git push origin maneage - ``` - - - *Adding Maneage to a fork of your project*: As you and your colleagues - continue your project, it will be necessary to have separate - forks/clones of it. But when you clone your own project on a - different system, or a colleague clones it to collaborate with you, - the clone won't have the `origin-maneage` remote that you started the - project with. As shown in the previous item above, you need this - remote to be able to pull recent updates from Maneage. The steps - below will setup the `origin-maneage` remote, and a local `maneage` - branch to track it, on the new clone. - - ```shell - $ git remote add origin-maneage https://git.maneage.org/project.git - $ git fetch origin-maneage - $ git checkout -b maneage --track origin-maneage/maneage - ``` - - *Commit message*: The commit message is a very important and useful aspect of version control. To make the commit message useful for others (or yourself, one year later), it is good to follow a
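In step 3 of the new "Upgrading the Maneage branch of your project" section, the merge is cleaned up with `git status --porcelain | awk '/^DU/{system("git rm "$NF)}'`. In Git's short status format, the two-letter code `DU` marks an unmerged path that is "deleted by us." During this merge, that means a file you deleted on `main` (for example a `delete-me*` demo file) that was modified on `maneage`.

The sketch below splits that one-liner into a review step and a removal step. It assumes a hypothetical helper named `list_deleted_by_us`, which is not part of Maneage. The helper only prints the matching paths, so you can review the list before anything is removed.

```shell
# Hypothetical helper (not part of Maneage): read the output of
# 'git status --porcelain' on standard input and print the paths whose
# two-letter status is 'DU' ("deleted by us": removed in the branch you
# are on, e.g. 'main', but modified in the branch being merged, e.g.
# 'maneage').
list_deleted_by_us() {
    # Porcelain (v1) lines have the form "XY PATH": two status letters,
    # one space, then the path, which therefore starts at character 4.
    awk 'substr($0, 1, 2) == "DU" { print substr($0, 4) }'
}

# Dry run during an unfinished merge: only print the candidate files.
#   git status --porcelain | list_deleted_by_us
#
# After reviewing the list, remove exactly those files from the merge:
#   git status --porcelain | list_deleted_by_us | while IFS= read -r f; do
#       git rm -- "$f"
#   done
```

Separating listing from removal is a design choice. A file you deleted long ago may have received an important fix on `maneage`, and the dry run lets you notice this before `git rm` discards it.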