author | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2020-11-26 03:45:54 +0000 |
---|---|---|
committer | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2020-11-26 03:45:54 +0000 |
commit | 6b87843fc38c1646615ab0342a703f7ab3caf1cb (patch) | |
tree | ea11daebe93d93f7e549fe9e3404248d850026f7 /about-customize.html | |
parent | 56779683e1abd996f50d1e66055f4f5540a7d61c (diff) |
The long about.html is now broken up into smaller pages
The "About" page ('about.html') was effectively a full copy of
Maneage's 'README-hacking.md', so it was very long. To help in
readability it has now been broken down into smaller pages (one for
each section).
Also the indentation of Make recipes was corrected, both in the about
pages, and also in the tutorial.
Diffstat (limited to 'about-customize.html')
-rw-r--r-- | about-customize.html | 444 |
1 files changed, 444 insertions, 0 deletions
diff --git a/about-customize.html b/about-customize.html new file mode 100644 index 0000000..6f66dc2 --- /dev/null +++ b/about-customize.html @@ -0,0 +1,444 @@ +<!DOCTYPE html> +<!-- Copyright notes are just below the head and before body --> + + <html lang="en-US"> + + <!-- HTML Header --> + <head> + <!-- Title of the page. --> + <title>Maneage -- Managing data lineage</title> + + <!-- Enable UTF-8 encoding to easily use non-ASCII characters --> + <meta charset="UTF-8"> + <meta http-equiv="Content-type" content="text/html; charset=UTF-8"> + + <!-- Put logo beside the address bar --> + <link rel="shortcut icon" href="./img/favicon.svg" /> + + <!-- The viewport meta tag is placed mainly for mobile browsers + that are pre-configured in different ways (for example setting + different widths for the page than the actual width of the device, + or zooming to different values). Without this the CSS media + solutions might not work properly on all mobile browsers.--> + <meta name="viewport" + content="width=device-width, initial-scale=1"> + + <!-- Basic styles --> + <link rel="stylesheet" href="css/base.css" /> + </head> + + <!-- + Webpage of Maneage: a framework for managing data lineage + + Copyright (C) 2020, Pedram Ashofteh Ardakani <pedramardakani@pm.me> + Copyright (C) 2020, Mohammad Akhlaghi <mohammad@akhlaghi.org> + + This file is part of Maneage. Maneage is free software: you can + redistribute it and/or modify it under the terms of the GNU General + Public License as published by the Free Software Foundation, either + version 3 of the License, or (at your option) any later version. + + Maneage is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. See + <http://www.gnu.org/licenses/>. --> + + <!-- Start the main body. 
--> + <body> + <div id="container"> + <header role="banner"> + <!-- global navigation --> + <nav role="navigation" id="nav-hamburger-wrapper"> + <input type="checkbox" id="nav-hamburger-input"/> + <label for="nav-hamburger-input">|||</label> + <div id="nav-hamburger-items" class="button"> + <a href="index.html">Home</a> + <a href="about.html">About</a> + <a href="http://git.maneage.org/project.git/">Git</a> + <a href="tutorial.html">Tutorial</a> + </div> + </nav> + </header> + <div class="banner"> + <div> + <a href="index.html"><img src="img/maneage-logo.svg" /></a> + </div> + <div> + <h1>Maneage</h1><h2>Managing Data Lineage</h2> + <p>Copyright © 2018-2020 Mohammad Akhlaghi <a href="mailto:mohammad@akhlaghi.org">mohammad@akhlaghi.org</a><br /> + Copyright © 2020 Raul Infante-Sainz <a href="mailto:infantesainz@gmail.com">infantesainz@gmail.com</a><br /> + <a href="#page-footer">License Conditions</a></p> + </div> + </div> + + + + + <hr /> + <p align="right">Next: <a href="about-tips.html">Tips for designing your project</a>, Previous: <a href="about-architecture.html">Maneage architecture</a>, Up: <a href="about.html">About</a> </p> + + <h1>Customization checklist</h1> + + <p>Take the following steps to fully customize Maneage for your research + project. After finishing the list, be sure to run <code>./project configure</code> and + <code>./project make</code> to see if everything works correctly. If you notice anything + missing or any incorrect part (probably a change that has not been + explained here), please let us know so we can correct it.</p> + + <p>As described above, the concept of reproducibility (during a project) + heavily relies on <a href="https://en.wikipedia.org/wiki/Version_control">version + control</a>. Currently Maneage + uses Git as its main version control system. 
If you are not already + familiar with Git, please read the first three chapters of the <a href="https://git-scm.com/book/en/v2">ProGit + book</a> which provides a wonderful practical + understanding of the basics. You can read later chapters as you get more + advanced in later stages of your work.</p> + + <h2>First custom commit</h2> + + <ol> + <li><p><strong>Get this repository and its history</strong> (if you don't already have it): + Arguably the easiest way to start is to clone Maneage and prepare for + your customizations as shown below. After cloning, first rename + the default <code>origin</code> remote server to specify that this is Maneage's + remote server. This will allow you to use the conventional <code>origin</code> + name for your own project as shown in the next steps. Second, you will + create and go into the conventional <code>master</code> branch to start + committing in your project later.</p> + + <pre><code>git clone https://git.maneage.org/project.git <span class="comment"># Clone/copy the project and its history.</span> +mv project my-project <span class="comment"># Change the name to your project's name.</span> +cd my-project <span class="comment"># Go into the cloned directory.</span> +git remote rename origin origin-maneage <span class="comment"># Rename current/only remote to "origin-maneage".</span> +git checkout -b master <span class="comment"># Create and enter your own "master" branch.</span> +pwd <span class="comment"># Just to confirm where you are.</span></code></pre></li> + <li><p><strong>Prepare to build project</strong>: The <code>./project configure</code> command of the + next step will build the different software packages within the + "build" directory (that you will specify). Nothing else on your system + will be touched. However, since it takes a long time, it is useful to see + what is being built at every instant (it's almost impossible to tell + from the torrent of commands that are produced!). 
So open another + terminal on your desktop and navigate to the same project directory + that you cloned (output of last command above). Then run the following + command. Once every second, this command will just print the date + (possibly followed by a non-existent directory notice). But as soon as + the next step starts building software, you'll see the names of + software get printed as they are being built. Once any software is + installed in the project build directory it will be removed from this list. Again, + don't worry, nothing will be installed outside the build directory.</p> + + <pre><code><span class="comment"># On another terminal (go to top project source directory, last command above)</span> +./project --check-config</code></pre></li> + <li><p><strong>Test Maneage</strong>: Before making any changes, it is important to test it + and see if everything works properly with the commands below. If there + is any problem in the <code>./project configure</code> or <code>./project make</code> steps, + please contact us to fix the problem before continuing. Since the + building of dependencies in configuration can take a long time, you can take + the next few steps (editing the files) while it's working (they don't + affect the configuration). After <code>./project make</code> is finished, open + <code>paper.pdf</code>. If it looks fine, you are ready to start customizing + Maneage for your project. 
But before that, clean all the extra Maneage + outputs with <code>./project make clean</code>.</p> + + <pre><code>./project configure <span class="comment"># Build the project's software environment (can take an hour or so).</span> +./project make <span class="comment"># Do the processing and build paper (just a simple demo).</span> +<span class="comment"># Open 'paper.pdf' and see if everything is ok.</span></code></pre></li> + <li><p><strong>Setup the remote</strong>: You can use any <a href="https://en.wikipedia.org/wiki/Comparison_of_source_code_hosting_facilities">hosting + facility</a> + that supports Git to keep an online copy of your project's version + controlled history. We recommend <a href="https://gitlab.com">GitLab</a> because + it is <a href="https://www.gnu.org/software/repo-criteria-evaluation.html">more ethical (although not + perfect)</a>, + and later you can also host GitLab on your own server. Anyway, create + an account in your favorite hosting facility (if you don't already + have one), and define a new project there. Please make sure <em>the newly + created project is empty</em> (some services ask to include a <code>README</code> in + a new project which is bad in this scenario, and will not allow you to + push to it). It will give you a URL (usually starting with <code>git@</code> and + ending in <code>.git</code>); put this URL in place of <code>XXXXXXXXXX</code> in the first + command below. With the second command, "push" your <code>master</code> branch to + your <code>origin</code> remote, and (with the <code>--set-upstream</code> option) set them + to track/follow each other. However, the <code>maneage</code> branch is currently + tracking/following your <code>origin-maneage</code> remote (automatically set + when you cloned Maneage). So when pushing the <code>maneage</code> branch to your + <code>origin</code> remote, you <em>shouldn't</em> use <code>--set-upstream</code>. 
With the last + command, you can actually check this (which local and remote branches + are tracking each other).</p> + + <pre><code>git remote add origin XXXXXXXXXX <span class="comment"># Newly created repo is now called 'origin'.</span> +git push --set-upstream origin master <span class="comment"># Push 'master' branch to 'origin' (with tracking).</span> +git push origin maneage <span class="comment"># Push 'maneage' branch to 'origin' (no tracking).</span></code></pre></li> + <li><p><strong>Title</strong>, <strong>short description</strong> and <strong>author</strong>: The title and basic + information of your project's output PDF paper should be added in + <code>paper.tex</code>. You should see the relevant place in the preamble (prior + to <code>\begin{document}</code>). After you are done, run the <code>./project make</code> + command again to see your changes in the final PDF, and make sure that + your changes don't cause a crash in LaTeX. Of course, if you use a + different LaTeX package/style for managing the title and authors (in + particular a specific journal's style), please feel free to use + your own methods after finishing this checklist and doing your first + commit.</p></li> + <li><p><strong>Delete dummy parts</strong>: Maneage contains some parts that are only for + the initial/test run, mainly as a demonstration of important steps, + which you can use as a reference in your own project. But they are + not for any real analysis, so you should remove these parts as + described below:</p> + + <ul> + <li><p><code>paper.tex</code>: 1) Delete the text of the abstract (from + <code>\includeabstract{</code> to <code>\vspace{0.25cm}</code>) and write your own (a + single sentence can be enough now, you can complete it later). 2) + Add some keywords under it in the keywords part. 3) Delete + everything between <code>%% Start of main body.</code> and <code>%% End of main + body.</code>. 
4) Remove the notice in the "Acknowledgments" section (in + <code>\new{}</code>) and acknowledge your funding sources (this can also be + done later). Just don't delete the existing acknowledgment + statement: Maneage is possible thanks to funding from several + grants. Since Maneage is being used in your work, it is necessary to + acknowledge them in your work also.</p></li> + <li><p><code>reproduce/analysis/make/top-make.mk</code>: Delete the <code>delete-me</code> line + in the <code>makesrc</code> definition. Just make sure there is no empty line + between the <code>download \</code> and <code>verify \</code> lines (they should be + directly under each other).</p></li> + <li><p><code>reproduce/analysis/make/verify.mk</code>: In the final recipe, under the + commented line <code>Verify TeX macros</code>, remove the full line that + contains <code>delete-me</code>, and set the value of <code>s</code> in the line for + <code>download</code> to <code>XXXXX</code> (any temporary string, you'll fix it at the + end of your project, when it's complete).</p></li> + <li><p>Delete all <code>delete-me*</code> files in the following directories:</p> + <pre><code>rm tex/src/delete-me* +rm reproduce/analysis/make/delete-me* +rm reproduce/analysis/config/delete-me*</code></pre></li> + <li><p>Disable verification of outputs by removing the <code>yes</code> from + <code>reproduce/analysis/config/verify-outputs.conf</code>. Later, when you are + ready to submit your paper, or publish the dataset, activate + verification and make the proper corrections in this file (described + under the "Other basic customizations" section below). This is a + critical step and only takes a few minutes when your project is + finished. 
So DON'T FORGET to activate it at the end.</p></li> + <li><p>Re-make the project (after a cleaning) to make sure you haven't + introduced any errors.</p> + + <pre><code>./project make clean +./project make</code></pre></li> + </ul></li> + <li><p><strong>Don't merge some files in future updates</strong>: As described below, you + can later update your infrastructure (for example to fix bugs) by + merging your <code>master</code> branch with <code>maneage</code>. For files that you have + created in your own branch, there will be no problem. However if you + modify an existing Maneage file for your project, next time it's + updated on <code>maneage</code> you'll have an annoying conflict. The commands + below show how to fix this future problem. With them, you can + configure Git to ignore the changes in <code>maneage</code> for some of the files + you have already edited and deleted above (and will edit below). Note + that only the first <code>echo</code> command has a <code>></code> (to write over the file), + the rest are <code>>></code> (to append to it). If you want to prevent any other + set of files from being imported from Maneage into your project's branch, + you can follow a similar strategy. We recommend only doing it when you + encounter the same conflict in more than one merge and there is no + other change in that file. 
Also, don't add core Maneage Makefiles, + otherwise Maneage can break on the next run.</p> + + <pre><code>echo "paper.tex merge=ours" > .gitattributes +echo "tex/src/delete-me.mk merge=ours" >> .gitattributes +echo "tex/src/delete-me-demo.mk merge=ours" >> .gitattributes +echo "reproduce/analysis/make/delete-me.mk merge=ours" >> .gitattributes +echo "reproduce/software/config/TARGETS.conf merge=ours" >> .gitattributes +echo "reproduce/analysis/config/delete-me-num.conf merge=ours" >> .gitattributes +git add .gitattributes</code></pre></li> + <li><p><strong>Copyright and License notice</strong>: It is necessary that <em>all</em> the + "copyright-able" files in your project (those larger than 10 lines) + have a copyright and license notice. Please take a moment to look at + several existing files to see a few examples. The copyright notice is + usually close to the start of the file, it is the line starting with + <code>Copyright (C)</code> and containing a year and the author's name (like the + examples below). The License notice is a short description of the + copyright license, usually one or two paragraphs with a URL to the + full license. Don't forget to add these <em>two</em> notices to <em>any new + file</em> you add in your project (you can just copy-and-paste). When you + modify an existing Maneage file (which already has the notices), just + add a copyright notice in your name under the existing one(s), like + the line with capital letters below. 
To start with, add this line with + your name and email address to <code>paper.tex</code>, + <code>tex/src/preamble-header.tex</code>, <code>reproduce/analysis/make/top-make.mk</code>, + and generally, all the files you modified in the previous step.</p> + + <pre><code>Copyright (C) 2018-2020 Existing Name <existing@email.address> +Copyright (C) 2020 YOUR NAME <YOUR@EMAIL.ADDRESS></code></pre></li> + <li><p><strong>Configure Git for the first time</strong>: If this is the first time you are + running Git on this system, then you have to configure it with some + basic information in order to have essential information in the commit + messages (ignore this step if you have already done it). Git will + include your name and e-mail address information in each commit. You + can also specify your favorite text editor for making the commit + (<code>emacs</code>, <code>vim</code>, <code>nano</code>, etc.).</p> + + <pre><code>git config --global user.name "YourName YourSurname" +git config --global user.email your-email@example.com +git config --global core.editor nano</code></pre></li> + <li><p><strong>Your first commit</strong>: You have already made some small and basic + changes in the steps above and you are in your project's <code>master</code> + branch. So, you can officially make your first commit in your + project's history and push it. But before that, you need to make sure + that there are no problems in the project. 
It is a good habit to + always re-build the system before a commit to be sure it works as + expected.</p> + + <pre><code>git status <span class="comment"># See which files you have changed.</span> +git diff <span class="comment"># Check the lines you have added/changed.</span> +./project make <span class="comment"># Make sure everything builds successfully.</span> +git add -u <span class="comment"># Put all tracked changes in staging area.</span> +git status <span class="comment"># Make sure everything is fine.</span> +git diff --cached <span class="comment"># Confirm all the changes that will be committed.</span> +git commit <span class="comment"># Your first commit: put a good description!</span> +git push <span class="comment"># Push your commit to your remote.</span></code></pre></li> + <li><p><strong>Start your exciting research</strong>: You are now ready to add flesh and + blood to this raw skeleton by further modifying and adding your + exciting research steps. You can use the "published works" section in + the introduction (above) as some fully working models to learn + from. Also, don't hesitate to contact us if you have any + questions.</p></li> + </ol> + + <h2>Other basic customizations</h2> + + <ul> + <li><p><strong>High-level software</strong>: Maneage installs all the software that your + project needs. You can specify which software your project needs in + <code>reproduce/software/config/TARGETS.conf</code>. The necessary software is + classified into two classes: 1) programs or libraries (usually written + in C/C++) which are run directly by the operating system. 2) Python + modules/libraries that are run within Python. By default + <code>TARGETS.conf</code> only has GNU Astronomy Utilities (Gnuastro) as one + scientific program and Astropy as one scientific Python module. Both + have many dependencies which will be installed into your project + during the configuration step. 
To see a list of software that is + currently ready to be built in Maneage, see + <code>reproduce/software/config/versions.conf</code> (which has their versions + also). The comments in <code>TARGETS.conf</code> describe how to use the software + name from <code>versions.conf</code>. Currently the raw pipeline just uses + Gnuastro to make the demonstration plots. Therefore if you don't need + Gnuastro, go through the analysis steps in <code>reproduce/analysis</code> and + remove all its use cases (clearly marked).</p></li> + <li><p><strong>Input dataset</strong>: The input datasets are managed through the + <code>reproduce/analysis/config/INPUTS.conf</code> file. It is best to gather all + the information regarding all the input datasets into this one central + file. To ensure that the proper dataset is being downloaded and used + by the project, it is also recommended to get an <a href="https://en.wikipedia.org/wiki/MD5">MD5 + checksum</a> of the file and include + that in <code>INPUTS.conf</code> so the project can check it automatically. The + preparation/downloading of the input datasets is done in + <code>reproduce/analysis/make/download.mk</code>. Have a look there to see how + these values are to be used. This information about the input datasets + is also used in the initial <code>configure</code> script (to inform the users), + so also modify that file. You can find all occurrences of the demo + dataset with the command below and replace them with your input + dataset.</p> + + <pre><code>grep -ir wfpc2 ./*</code></pre></li> + <li><p><strong><code>README.md</code></strong>: Correct all the <code>XXXXX</code> placeholders (name of your + project, your own name, address of your project's online/remote + repository, link to download dependencies, etc.). Generally, read + over the text and update it where necessary to fit your project. 
Don't + forget that this is the first file that is displayed on your online + repository and also your colleagues will first be drawn to read this + file. Therefore, make it as easy as possible for them to start + with. Also check and update this file one last time when you are ready + to publish your project's paper/source.</p></li> + <li><p><strong>Verify outputs</strong>: During the initial customization checklist, you + disabled verification. This is natural because during the project you + need to make changes all the time and it's a waste of time to enable + verification every time. But at significant moments of the project + (for example before submission to a journal, or publication) it is + necessary. When you activate verification, before building the paper, + all the specified datasets will be compared with their respective + checksum and if any file's checksum is different from the one recorded + in the project, it will stop and print the problematic file and its + expected and calculated checksums. First set the value of + <code>verify-outputs</code> variable in + <code>reproduce/analysis/config/verify-outputs.conf</code> to <code>yes</code>. Then go to + <code>reproduce/analysis/make/verify.mk</code>. The verification of all the files + is only done in one recipe. First the files that go into the + plots/figures are checked, then the LaTeX macros. Validation of the + former (inputs to plots/figures) should be done manually. If it's the + first time you are doing this, you can see two examples of the dummy + steps (with <code>delete-me</code>, you can use them if you like). These two + examples should be removed before you can run the project. For the + latter, you just have to update the checksums. The important thing to + consider is that a simple checksum can be problematic because some + file generators print their run-time date in the file (for example as + commented lines in a text table). 
When checking text files, this + Makefile already has this function: + <code>verify-txt-no-comments-leading-space</code>. As the name suggests, it will + remove comment lines and empty lines before calculating the MD5 + checksum. For FITS formats (common in astronomy), fortunately there is + a <code>DATASUM</code> definition which will return the checksum independent of + the headers. You can use the provided function(s), or define one for + your special formats.</p></li> + <li><p><strong>Feedback</strong>: As you use Maneage you will notice many things that if + implemented from the start would have been very useful for your + work. This can be in the actual scripting and architecture of Maneage, + or useful implementation and usage tips, like those below. In any + case, please share your thoughts and suggestions with us, so we can + add them here for everyone's benefit.</p></li> + <li><p><strong>Re-preparation</strong>: Automatic preparation is only run in the first run + of the project on a system; to re-do the preparation you have to use + the option below. Here is the reason for this: when it's necessary, the + preparation process can be slow and will unnecessarily slow down the + whole project while the project is under development (focus is on the + analysis that is done after preparation). Because of this, preparation + will be done automatically for the first time that the project is run + (when <code>.build/software/preparation-done.mk</code> doesn't exist). After the + preparation process completes once, future runs of <code>./project make</code> + will not do the preparation process anymore (will not call + <code>top-prepare.mk</code>). They will only call <code>top-make.mk</code> for the + analysis. 
To manually invoke the preparation process after the first + attempt, the <code>./project make</code> script should be run with the + <code>--prepare-redo</code> option, or you can delete the special file above.</p> + + <pre><code>./project make --prepare-redo</code></pre></li> + <li><p><strong>Pre-publication: add notice on reproducibility</strong>: Add a notice + somewhere prominent in the first page within your paper, informing the + reader that your research is fully reproducible. For example in the + end of the abstract, or under the keywords with a title like + "reproducible paper". This will encourage them to publish their own + works in this manner and will also help spread the word.</p></li> + </ul> + + <p align="right">Next: <a href="about-tips.html">Tips for designing your project</a>, Previous: <a href="about-architecture.html">Maneage architecture</a>, Up: <a href="about.html">About</a> </p> + + + + + + <footer role="contentinfo" id="page-footer"> + <h2>Copyright information</h2> + + <p>This file is part of Maneage's core: <a href="https://git.maneage.org/project.git">https://git.maneage.org/project.git</a></p> + + <p>Maneage is free software: you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation, either version 3 of the License, or (at your option) + any later version.</p> + + <p>Maneage is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for more + details.</p> + + <p>You should have received a copy of the GNU General Public License along + with Maneage. 
If not, see <a href="https://www.gnu.org/licenses/">https://www.gnu.org/licenses/</a>.</p> + <ul> + <li><p>Maneage is currently based in the Instituto de Astrofísica de Canarias (IAC).</p></li> + <li><p>Address: IAC, Calle Vía Láctea, s/n, E38205 - La Laguna (Tenerife), Spain.</p></li> + <!-- The people page will be added later + <li><p>People [page will be added later]</p></li> + --> + <li><p>Contact: with <a href="https://savannah.nongnu.org/support/?func=additem&group=reproduce">this form.</a></p></li> + <li><p>Copyright © 2020 Maneage volunteers</p></li> + <li><p>All logos are copyrighted by the respective institutions</p></li> + </ul> + </footer> + </div> + </body> |