Diffstat (limited to 'README-hacking.md')
-rw-r--r-- README-hacking.md 156
1 files changed, 77 insertions, 79 deletions
diff --git a/README-hacking.md b/README-hacking.md
index f490625..8897333 100644
--- a/README-hacking.md
+++ b/README-hacking.md
@@ -1,8 +1,8 @@
Maneage: managing data lineage
==============================
-Copyright (C) 2018-2021 Mohammad Akhlaghi <mohammad@akhlaghi.org>\
-Copyright (C) 2020-2021 Raul Infante-Sainz <infantesainz@gmail.com>\
+Copyright (C) 2018-2023 Mohammad Akhlaghi <mohammad@akhlaghi.org>\
+Copyright (C) 2020-2023 Raul Infante-Sainz <infantesainz@gmail.com>\
See the end of the file for license conditions.
Maneage is a **fully working template** for doing reproducible research (or
@@ -180,29 +180,40 @@ evolving rapidly, so some details will differ between the different
versions. The more recent papers will tend to be the most useful as good
working examples.
- - Peper & Roukema ([2021](https://ui.adsabs.harvard.edu/abs/2021MNRAS.tmp.1317P),
- MNRAS, 505, 1223, DOI:10.1093/mnras/stab1342, arXiv:2010.03742):
- The live version of the controlled source is
- [at Codeberg](https://codeberg.org/boud/elaphrocentre); the main input
- dataset, a software snapshot, the software tarballs, the project
- outputs and editing history are available at
- [zenodo.4699702](https://zenodo.org/record/4699702); and the
- archived git history is available at
+ - Borkowska & Roukema
+ ([2022](https://ui.adsabs.harvard.edu/abs/2021arXiv211214174B), MNRAS
+ Submitted, arXiv:2112.14174): The live version of the controlled source
+ is [at Codeberg](https://codeberg.org/boud/gevcurvtest); the main input
+ dataset, a software snapshot, the software tarballs, the project outputs
+ and editing history are available at
+ [zenodo.5806027](https://doi.org/10.5281/zenodo.5806027); and the
+ archived git history is available at [swh:1:rev:54398b720ddbac269ede30bf1e27fe27f07567f7](https://archive.softwareheritage.org/browse/revision/54398b720ddbac269ede30bf1e27fe27f07567f7).
+
+ - Peper & Roukema
+ ([2021](https://ui.adsabs.harvard.edu/abs/2021MNRAS.505.1223P), MNRAS,
+ 505, 1223, DOI:10.1093/mnras/stab1342, arXiv:2010.03742): The live
+ version of the controlled source is [at
+ Codeberg](https://codeberg.org/boud/elaphrocentre); the main input
+ dataset, a software snapshot, the software tarballs, the project outputs
+ and editing history are available at
+ [zenodo.4699702](https://zenodo.org/record/4699702); and the archived
+ git history is available at
[swh:1:rev:a029edd32d5cd41dbdac145189d9b1a08421114e](https://archive.softwareheritage.org/swh:1:rev:a029edd32d5cd41dbdac145189d9b1a08421114e).
- - Roukema ([2021](https://ui.adsabs.harvard.edu/abs/2020arXiv200711779R),
- arXiv:2007.11779): The live version of the controlled source is
- [at Codeberg](https://codeberg.org/boud/subpoisson); the main input
- dataset, a software snapshot, the software tarballs, the project
+ - Roukema ([2021](https://ui.adsabs.harvard.edu/abs/2021PeerJ...911856R),
+ PeerJ, 9:e11856, arXiv:2007.11779): The live version of the controlled
+ source is [at Codeberg](https://codeberg.org/boud/subpoisson); the main
+ input dataset, a software snapshot, the software tarballs, the project
outputs and editing history are available at
- [zenodo.4765705](https://zenodo.org/record/4765705); and the
- archived git history is available at
+ [zenodo.4765705](https://zenodo.org/record/4765705); and the archived
+ git history is available at
[swh:1:rev:72242ca8eade9659031ea00394a30e0cc5cc1c37](https://archive.softwareheritage.org/swh:1:rev:72242ca8eade9659031ea00394a30e0cc5cc1c37).
- - Akhlaghi et al. ([2021](https://arxiv.org/abs/2006.03018),
- CiSE, in press, DOI:10.1109/MCSE.2021.3072860
- arXiv:2006.03018): The project's version controlled source is
- [on Gitlab](https://gitlab.com/makhlaghi/maneage-paper), necessary software,
+ - Akhlaghi et
+ al. ([2021](https://ui.adsabs.harvard.edu/abs/2021CSE....23c..82A),
+ CiSE, 23(3), 82, DOI:10.1109/MCSE.2021.3072860, arXiv:2006.03018): The
+ project's version controlled source is [on
+ Gitlab](https://gitlab.com/makhlaghi/maneage-paper), necessary software,
outputs and backup of history are available at
[zenodo.3872248](https://doi.org/10.5281/zenodo.3872248); and the
archived git history is available at
@@ -557,7 +568,7 @@ First custom commit
the default `origin` remote server to specify that this is Maneage's
remote server. This will allow you to use the conventional `origin`
name for your own project as shown in the next steps. Second, you will
- create and go into the conventional `master` branch to start
+ create and go into the conventional `main` branch to start
committing in your project later.
```shell
@@ -565,7 +576,7 @@ First custom commit
$ mv project my-project # Change the name to your project's name.
$ cd my-project # Go into the cloned directory.
$ git remote rename origin origin-maneage # Rename current/only remote to "origin-maneage".
- $ git checkout -b master # Create and enter your own "master" branch.
+ $ git checkout -b main # Create and enter your own "main" branch.
$ pwd # Just to confirm where you are.
```
@@ -620,7 +631,7 @@ First custom commit
a new project which is bad in this scenario, and will not allow you to
push to it). It will give you a URL (usually starting with `git@` and
ending in `.git`), put this URL in place of `XXXXXXXXXX` in the first
- command below. With the second command, "push" your `master` branch to
+ command below. With the second command, "push" your `main` branch to
your `origin` remote, and (with the `--set-upstream` option) set them
to track/follow each other. However, the `maneage` branch is currently
tracking/following your `origin-maneage` remote (automatically set
@@ -631,7 +642,7 @@ First custom commit
```shell
git remote add origin XXXXXXXXXX # Newly created repo is now called 'origin'.
- git push --set-upstream origin master # Push 'master' branch to 'origin' (with tracking).
+ git push --set-upstream origin main # Push 'main' branch to 'origin' (with tracking).
git push origin maneage # Push 'maneage' branch to 'origin' (no tracking).
```
@@ -639,7 +650,7 @@ First custom commit
your name (with your possible coauthors) and tentative abstract in
`paper.tex`. You should see the relevant place in the preamble (prior
to `\begin{document}`). Just note that some core project metadata like
- the project tile are actually set in
+ the project title are actually set in
`reproduce/analysis/config/metadata.conf`. So set your project title
in there. After you are done, run the `./project make` command again
to see your changes in the final PDF and make sure that your changes
@@ -668,14 +679,22 @@ First custom commit
- `reproduce/analysis/make/top-make.mk`: Delete the `delete-me` line
in the `makesrc` definition. Just make sure there is no empty line
- between the `download \` and `verify \` lines (they should be
+ between the `initialize \` and `verify \` lines (they should be
directly under each other).
- - `reproduce/analysis/make/verify.mk`: In the final recipe, under the
- commented line `Verify TeX macros`, remove the full line that
- contains `delete-me`, and set the value of `s` in the line for
- `download` to `XXXXX` (any temporary string, you'll fix it in the
- end of your project, when its complete).
+ - `reproduce/analysis/make/initialize.mk`: At the very end of this
+ file, you will see a set of lines between `delete the lines below
+ this` and `delete the lines above this`. Delete this whole group of
+ lines (including the two instruction lines).
+
+ - `reproduce/analysis/config/verify-outputs.conf`: Disable
+ verification of outputs by changing the `yes` (the value of
+ `verify-outputs`) to `no`. Later, when you are ready to submit your
+ paper, or publish the dataset, activate verification and make the
+ proper corrections in this file (described under the "Other basic
+ customizations" section below). This is a critical step and only
+ takes a few minutes when your project is finished. So DON'T FORGET
+ to activate it at the end.
- Delete all `delete-me*` files in the following directories:
@@ -685,14 +704,6 @@ First custom commit
$ rm reproduce/analysis/config/delete-me*
```
- - Disable verification of outputs by removing the `yes` from
- `reproduce/analysis/config/verify-outputs.conf`. Later, when you are
- ready to submit your paper, or publish the dataset, activate
- verification and make the proper corrections in this file (described
- under the "Other basic customizations" section below). This is a
- critical step and only takes a few minutes when your project is
- finished. So DON'T FORGET to activate it in the end.
-
- Re-make the project (after a cleaning) to see if you haven't
introduced any errors.
@@ -703,7 +714,7 @@ First custom commit
7. **Ignore changes in some Maneage files**: One of the main advantages of
Maneage is that you can later update your infra-structure by merging
- your `master` branch with the `maneage` branch. This is good for many
+ your `main` branch with the `maneage` branch. This is good for many
low-level features that you will likely never modify yourself. But it
is not desired for some files like `paper.tex` (you don't want changes
in Maneage's default `paper.tex` to cause conflicts with all the text
@@ -747,12 +758,12 @@ First custom commit
add a copyright notice in your name under the existing one(s), like
the line with capital letters below. To start with, add this line with
your name and email address to `paper.tex`,
- `tex/src/preamble-header.tex`, `reproduce/analysis/make/top-make.mk`,
+ `tex/src/preamble-project.tex`, `reproduce/analysis/make/top-make.mk`,
and generally, all the files you modified in the previous step.
```
- Copyright (C) 2018-2021 Existing Name <existing@email.address>
- Copyright (C) 2021 YOUR NAME <YOUR@EMAIL.ADDRESS>
+ Copyright (C) 2018-2023 Existing Name <existing@email.address>
+ Copyright (C) 2023 YOUR NAME <YOUR@EMAIL.ADDRESS>
```
9. **Configure Git for first time**: If this is the first time you are
@@ -770,7 +781,7 @@ First custom commit
```
10. **Your first commit**: You have already made some small and basic
- changes in the steps above and you are in your project's `master`
+ changes in the steps above and you are in your project's `main`
branch. So, you can officially make your first commit in your
project's history and push it. But before that, you need to make sure
that there are no problems in the project. This is a good habit to
@@ -827,24 +838,12 @@ Other basic customizations
Gnuastro, go through the analysis steps in `reproduce/analysis` and
remove all its use cases (clearly marked).
- - **Input dataset**: The input datasets are managed through the
- `reproduce/analysis/config/INPUTS.conf` file. It is best to gather all
- the information regarding all the input datasets into this one central
- file. To ensure that the proper dataset is being downloaded and used
- by the project, it is also recommended get an [MD5
- checksum](https://en.wikipedia.org/wiki/MD5) of the file and include
- that in `INPUTS.conf` so the project can check it automatically. The
- preparation/downloading of the input datasets is done in
- `reproduce/analysis/make/download.mk`. Have a look there to see how
- these values are to be used. This information about the input datasets
- is also used in the initial `configure` script (to inform the users),
- so also modify that file. You can find all occurrences of the demo
- dataset with the command below and replace it with your input's
- dataset.
-
- ```shell
- $ grep -ir wfpc2 ./*
- ```
+ - **Input datasets**: The input datasets are managed through the
+ `reproduce/analysis/config/INPUTS.conf` file. It is best to gather the
+ following information regarding all the input datasets into this one
+ central file: 1) the SHA256 checksum of the file, 2) the URL where the
+ file can be downloaded online. Please read the comments at the start
+ of `reproduce/analysis/config/INPUTS.conf` carefully.
- **`README.md`**: Correct all the `XXXXX` place holders (name of your
project, your own name, address of your project's online/remote
@@ -1066,14 +1065,13 @@ future.
the plots should be uploaded directly to Zenodo so they can be
viewed/downloaded with a simple link in the caption. For example see the
last sentence of the caption of Figure 1 in
- [arXiv:2006.03018v1](https://arxiv.org/pdf/2006.03018v1.pdf), it points
- to [the
- data](https://zenodo.org/record/3872248/files/tools-per-year.txt) that
- was used to create that figure's top plot. As you see, this will allow
- your paper's readers (again, most probably your future-self!) to
- directly access the numbers of each visualization (plot/figure) with a
- simple click in a trusted server. This also shows the major advantage of
- having your data as simple plain-text where possible, as described
+ [arXiv:2006.03018](https://arxiv.org/pdf/2006.03018.pdf), it points to
+ [the data](https://zenodo.org/record/3872248/files/tools-per-year.txt)
+ that was used to create that figure's left-side plot. As you see, this
+ will allow your paper's readers (again, most probably your future-self!)
+ to directly access the numbers of each visualization (plot/figure) with
+ a simple click in a trusted server. This also shows the major advantage
+ of having your data as simple plain-text where possible, as described
above. To help you keep all your to-be-visualized datasets in a single
place, Maneage has the two `tex-publish-dir` and `data-publish-dir`
directories that are defined in `reproduce/analysis/make/initialize.mk`,
@@ -1116,7 +1114,7 @@ future.
- **Confirm if your project builds from scratch**: Before publishing
anything, you should see if your project can indeed reproduce itself!
You may be mistakenly using temporarily created files that aren't built
- when teh project is built from scratch (this happens a lot and is very
+ when the project is built from scratch (this happens a lot and is very
dangerous for the integrity of your project!). So, go to a temporary
directory, clone your project from its repository and try configuring
and building it from scratch in a new-temporary build-directory. It is
@@ -1177,8 +1175,8 @@ future.
`.build/software/tarballs`. It is necessary to upload these with
your project to avoid relying on third party servers. In the future
any one of those servers may go down and if so, your project won't
- be buildable. You can generate this tarball easily with `make
- dist-software`.
+ be buildable. You can generate this tarball easily with `./project
+ make dist-software`.
* All the figure (and other) output datasets of the project. Don't
rename these files, let them have the same descriptive name
@@ -1522,12 +1520,12 @@ for the benefit of others.
# Have a look at the commits in the 'maneage' branch in relation
# with your project.
- $ git log --oneline --graph --decorate --all # General view of branches.
+ $ git log --oneline --graph --all # General view of branches.
- # Go to your 'master' branch and import all the updates into
- # 'master', don't worry about the printed outputs (in particular
+ # Go to your 'main' branch and import all the updates into
+ # 'main', don't worry about the printed outputs (in particular
# the 'CONFLICT's), we'll clean them up in the next step.
- $ git checkout master
+ $ git checkout main
$ git merge maneage
# Ignore conflicting Maneage files that you had previously deleted
@@ -1545,7 +1543,7 @@ for the benefit of others.
git status
# TIP: If you want the changes in one file to be only from a
- # special branch ('maneage' or 'master', completely ignoring
+ # special branch ('maneage' or 'main', completely ignoring
# changes in the other), use this command:
# $ git checkout <BRANCH-NAME> -- <FILENAME>
@@ -1568,7 +1566,7 @@ for the benefit of others.
./project make
# When everything is OK, before continuing with your project's
- # work, don't forget to push both your 'master' branch and your
+ # work, don't forget to push both your 'main' branch and your
# updated 'maneage' branch to your remote server.
git push
git push origin maneage
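
The remote-rename and branch-creation steps that this change updates (from `master` to `main`) can be tried safely in a throw-away directory before touching a real project. The sketch below is not part of the commit: the `upstream` repository is a hypothetical local stand-in for the Maneage repository, and the `Demo` identity and commit message are placeholders chosen only for illustration.

```shell
#!/bin/sh
# Minimal sketch: rehearse the "origin-maneage" + "main" setup in a temporary directory.
set -e

tmp=$(mktemp -d)

# Hypothetical stand-in for the Maneage repository: one empty commit on a
# branch called 'maneage'.
git init --quiet "$tmp/upstream"
git -C "$tmp/upstream" symbolic-ref HEAD refs/heads/maneage
git -C "$tmp/upstream" -c user.name="Demo" -c user.email="demo@example.org" \
    commit --quiet --allow-empty -m "Demo commit"

# Clone it under the project's own name and go into the clone.
git clone --quiet "$tmp/upstream" "$tmp/my-project"
cd "$tmp/my-project"

# Same two commands as in the README: rename the only remote, then create
# and enter the project's own 'main' branch.
git remote rename origin origin-maneage
git checkout --quiet -b main

# Report the resulting state.
remotes=$(git remote)
branch=$(git rev-parse --abbrev-ref HEAD)
echo "remote: $remotes"
echo "branch: $branch"
```

After the script finishes, the clone has a single remote named `origin-maneage` and the `main` branch checked out, which leaves the conventional `origin` name free for the project's own remote.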