Diffstat (limited to 'about-customize.html')
-rw-r--r-- | about-customize.html | 444 |
1 files changed, 444 insertions, 0 deletions
diff --git a/about-customize.html b/about-customize.html new file mode 100644 index 0000000..6f66dc2 --- /dev/null +++ b/about-customize.html @@ -0,0 +1,444 @@ +<!DOCTYPE html> +<!-- Copyright notes are just below the head and before body --> + + <html lang="en-US"> + + <!-- HTML Header --> + <head> + <!-- Title of the page. --> + <title>Maneage -- Managing data lineage</title> + + <!-- Enable UTF-8 encoding to easily use non-ASCII characters --> + <meta charset="UTF-8"> + <meta http-equiv="Content-type" content="text/html; charset=UTF-8"> + + <!-- Put logo beside the address bar --> + <link rel="shortcut icon" href="./img/favicon.svg" /> + + <!-- The viewport meta tag is placed mainly for mobile browsers + that are pre-configured in different ways (for example setting + different widths for the page than the actual width of the device, + or zooming to different values). Without this the CSS media + solutions might not work properly on all mobile browsers.--> + <meta name="viewport" + content="width=device-width, initial-scale=1"> + + <!-- Basic styles --> + <link rel="stylesheet" href="css/base.css" /> + </head> + + <!-- + Webpage of Maneage: a framework for managing data lineage + + Copyright (C) 2020, Pedram Ashofteh Ardakani <pedramardakani@pm.me> + Copyright (C) 2020, Mohammad Akhlaghi <mohammad@akhlaghi.org> + + This file is part of Maneage. Maneage is free software: you can + redistribute it and/or modify it under the terms of the GNU General + Public License as published by the Free Software Foundation, either + version 3 of the License, or (at your option) any later version. + + Maneage is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. See + <http://www.gnu.org/licenses/>. --> + + <!-- Start the main body. 
--> + <body> + <div id="container"> + <header role="banner"> + <!-- global navigation --> + <nav role="navigation" id="nav-hamburger-wrapper"> + <input type="checkbox" id="nav-hamburger-input"/> + <label for="nav-hamburger-input">|||</label> + <div id="nav-hamburger-items" class="button"> + <a href="index.html">Home</a> + <a href="about.html">About</a> + <a href="http://git.maneage.org/project.git/">Git</a> + <a href="tutorial.html">Tutorial</a> + </div> + </nav> + </header> + <div class="banner"> + <div> + <a href="index.html"><img src="img/maneage-logo.svg" /></a> + </div> + <div> + <h1>Maneage</h1><h2>Managing Data Lineage</h2> + <p>Copyright © 2018-2020 Mohammad Akhlaghi <a href="mailto:mohammad@akhlaghi.org">mohammad@akhlaghi.org</a><br /> + Copyright © 2020 Raul Infante-Sainz <a href="mailto:infantesainz@gmail.com">infantesainz@gmail.com</a><br /> + <a href="#page-footer">License Conditions</a></p> + </div> + </div> + + + + + <hr /> + <p align="right">Next: <a href="about-tips.html">Tips for designing your project</a>, Previous: <a href="about-architecture.html">Maneage architecture</a>, Up: <a href="about.html">About</a> </p> + + <h1>Customization checklist</h1> + + <p>Take the following steps to fully customize Maneage for your research + project. After finishing the list, be sure to run <code>./project configure</code> and + <code>./project make</code> to see if everything works correctly. If you notice anything + missing or any incorrect part (probably a change that has not been + explained here), please let us know so we can correct it.</p> + + <p>As described above, the concept of reproducibility (during a project) + heavily relies on <a href="https://en.wikipedia.org/wiki/Version_control">version + control</a>. Currently Maneage + uses Git as its main version control system. 
If you are not already + familiar with Git, please read the first three chapters of the <a href="https://git-scm.com/book/en/v2">ProGit + book</a>, which provides a wonderful practical + understanding of the basics. You can read later chapters as you get more + advanced in later stages of your work.</p> + + <h2>First custom commit</h2> + + <ol> + <li><p><strong>Get this repository and its history</strong> (if you don't already have it): + Arguably the easiest way to start is to clone Maneage and prepare for + your customizations as shown below. After cloning, you first rename + the default <code>origin</code> remote server to specify that this is Maneage's + remote server. This will allow you to use the conventional <code>origin</code> + name for your own project as shown in the next steps. Second, you will + create and go into the conventional <code>master</code> branch to start + committing in your project later.</p> + + <pre><code>git clone https://git.maneage.org/project.git <span class="comment"># Clone/copy the project and its history.</span> +mv project my-project <span class="comment"># Change the name to your project's name.</span> +cd my-project <span class="comment"># Go into the cloned directory.</span> +git remote rename origin origin-maneage <span class="comment"># Rename current/only remote to "origin-maneage".</span> +git checkout -b master <span class="comment"># Create and enter your own "master" branch.</span> +pwd <span class="comment"># Just to confirm where you are.</span></code></pre></li> + <li><p><strong>Prepare to build project</strong>: The <code>./project configure</code> command of the + next step will build the different software packages within the + "build" directory (that you will specify). Nothing else on your system + will be touched. However, since this takes a long time, it is useful to see + what is being built at every instant (it's almost impossible to tell + from the torrent of commands that are produced!). 
So open another + terminal on your desktop and navigate to the same project directory + that you cloned (output of last command above). Then run the following + command. Once every second, this command will just print the date + (possibly followed by a non-existent directory notice). But as soon as + the next step starts building software, you'll see the names of + software get printed as they are being built. Once any software is + installed in the project build directory, its name will disappear from the output. Again, + don't worry, nothing will be installed outside the build directory.</p> + + <pre><code><span class="comment"># On another terminal (go to top project source directory, last command above)</span> +./project --check-config</code></pre></li> + <li><p><strong>Test Maneage</strong>: Before making any changes, it is important to test it + and see if everything works properly with the commands below. If there + is any problem in the <code>./project configure</code> or <code>./project make</code> steps, + please contact us to fix the problem before continuing. Since the + building of dependencies in configuration can take a long time, you can take + the next few steps (editing the files) while it's working (they don't + affect the configuration). After <code>./project make</code> is finished, open + <code>paper.pdf</code>. If it looks fine, you are ready to start customizing + Maneage for your project. 
But before that, clean all the extra Maneage + outputs with <code>./project make clean</code>.</p> + + <pre><code>./project configure <span class="comment"># Build the project's software environment (can take an hour or so).</span> +./project make <span class="comment"># Do the processing and build paper (just a simple demo).</span> +<span class="comment"># Open 'paper.pdf' and see if everything is ok.</span></code></pre></li> + <li><p><strong>Set up the remote</strong>: You can use any <a href="https://en.wikipedia.org/wiki/Comparison_of_source_code_hosting_facilities">hosting + facility</a> + that supports Git to keep an online copy of your project's version + controlled history. We recommend <a href="https://gitlab.com">GitLab</a> because + it is <a href="https://www.gnu.org/software/repo-criteria-evaluation.html">more ethical (although not + perfect)</a>, + and later you can also host GitLab on your own server. Anyway, create + an account in your favorite hosting facility (if you don't already + have one), and define a new project there. Please make sure <em>the newly + created project is empty</em> (some services ask to include a <code>README</code> in + a new project, which is bad in this scenario and will not allow you to + push to it). It will give you a URL (usually starting with <code>git@</code> and + ending in <code>.git</code>); put this URL in place of <code>XXXXXXXXXX</code> in the first + command below. With the second command, "push" your <code>master</code> branch to + your <code>origin</code> remote, and (with the <code>--set-upstream</code> option) set them + to track/follow each other. However, the <code>maneage</code> branch is currently + tracking/following your <code>origin-maneage</code> remote (automatically set + when you cloned Maneage). So when pushing the <code>maneage</code> branch to your + <code>origin</code> remote, you <em>shouldn't</em> use <code>--set-upstream</code>. 
Afterwards, you + can check this (which local and remote branches are tracking each + other) with <code>git branch -vv</code>.</p> + + <pre><code>git remote add origin XXXXXXXXXX <span class="comment"># Newly created repo is now called 'origin'.</span> +git push --set-upstream origin master <span class="comment"># Push 'master' branch to 'origin' (with tracking).</span> +git push origin maneage <span class="comment"># Push 'maneage' branch to 'origin' (no tracking).</span></code></pre></li> + <li><p><strong>Title</strong>, <strong>short description</strong> and <strong>author</strong>: The title and basic + information of your project's output PDF paper should be added in + <code>paper.tex</code>. You should see the relevant place in the preamble (prior + to <code>\begin{document}</code>). After you are done, run the <code>./project make</code> + command again to see your changes in the final PDF, and make sure that + your changes don't cause a crash in LaTeX. Of course, if you use a + different LaTeX package/style for managing the title and authors (in + particular a specific journal's style), please feel free to use + your own methods after finishing this checklist and doing your first + commit.</p></li> + <li><p><strong>Delete dummy parts</strong>: Maneage contains some parts that are only for + the initial/test run, mainly as a demonstration of important steps, + which you can use as a reference in your own project. But they + are not for any real analysis, so you should remove these parts as + described below:</p> + + <ul> + <li><p><code>paper.tex</code>: 1) Delete the text of the abstract (from + <code>\includeabstract{</code> to <code>\vspace{0.25cm}</code>) and write your own (a + single sentence can be enough now, you can complete it later). 2) + Add some keywords under it in the keywords part. 3) Delete + everything between <code>%% Start of main body.</code> and <code>%% End of main + body.</code>. 
4) Remove the notice in the "Acknowledgments" section (in + <code>\new{}</code>) and acknowledge your funding sources (this can also be + done later). Just don't delete the existing acknowledgment + statement: Maneage is possible thanks to funding from several + grants. Since Maneage is being used in your work, it is necessary to + acknowledge them in your work also.</p></li> + <li><p><code>reproduce/analysis/make/top-make.mk</code>: Delete the <code>delete-me</code> line + in the <code>makesrc</code> definition. Just make sure there is no empty line + between the <code>download \</code> and <code>verify \</code> lines (they should be + directly under each other).</p></li> + <li><p><code>reproduce/analysis/make/verify.mk</code>: In the final recipe, under the + commented line <code>Verify TeX macros</code>, remove the full line that + contains <code>delete-me</code>, and set the value of <code>s</code> in the line for + <code>download</code> to <code>XXXXX</code> (any temporary string, you'll fix it at the + end of your project, when it's complete).</p></li> + <li><p>Delete all <code>delete-me*</code> files in the following directories:</p> + <pre><code>rm tex/src/delete-me* +rm reproduce/analysis/make/delete-me* +rm reproduce/analysis/config/delete-me*</code></pre></li> + <li><p>Disable verification of outputs by removing the <code>yes</code> from + <code>reproduce/analysis/config/verify-outputs.conf</code>. Later, when you are + ready to submit your paper, or publish the dataset, activate + verification and make the proper corrections in this file (described + under the "Other basic customizations" section below). This is a + critical step and only takes a few minutes when your project is + finished. 
So DON'T FORGET to activate it in the end.</p></li> + <li><p>Re-make the project (after a cleaning) to make sure you haven't + introduced any errors.</p> + + <pre><code>./project make clean +./project make</code></pre></li> + </ul></li> + <li><p><strong>Don't merge some files in future updates</strong>: As described below, you + can later update your infrastructure (for example to fix bugs) by + merging your <code>master</code> branch with <code>maneage</code>. For files that you have + created in your own branch, there will be no problem. However, if you + modify an existing Maneage file for your project, the next time it's + updated on <code>maneage</code> you'll have an annoying conflict. The commands + below show how to fix this future problem. With them, you can + configure Git to ignore the changes in <code>maneage</code> for some of the files + you have already edited and deleted above (and will edit below). Note + that only the first <code>echo</code> command has a <code>></code> (to write over the file), + the rest are <code>>></code> (to append to it). If you want to prevent any other + set of files from being imported from Maneage into your project's branch, + you can follow a similar strategy. We recommend only doing it when you + encounter the same conflict in more than one merge and there is no + other change in that file. 
Also, don't add core Maneage Makefiles, + otherwise Maneage can break on the next run.</p> + + <pre><code>echo "paper.tex merge=ours" > .gitattributes +echo "tex/src/delete-me.mk merge=ours" >> .gitattributes +echo "tex/src/delete-me-demo.mk merge=ours" >> .gitattributes +echo "reproduce/analysis/make/delete-me.mk merge=ours" >> .gitattributes +echo "reproduce/software/config/TARGETS.conf merge=ours" >> .gitattributes +echo "reproduce/analysis/config/delete-me-num.conf merge=ours" >> .gitattributes +git add .gitattributes</code></pre></li> + <li><p><strong>Copyright and License notice</strong>: It is necessary that <em>all</em> the + "copyright-able" files in your project (those larger than 10 lines) + have a copyright and license notice. Please take a moment to look at + several existing files to see a few examples. The copyright notice is + usually close to the start of the file; it is the line starting with + <code>Copyright (C)</code> and containing a year and the author's name (like the + examples below). The license notice is a short description of the + copyright license, usually one or two paragraphs with a URL to the + full license. Don't forget to add these <em>two</em> notices to <em>any new + file</em> you add in your project (you can just copy-and-paste). When you + modify an existing Maneage file (which already has the notices), just + add a copyright notice in your name under the existing one(s), like + the line with capital letters below. 
To start with, add this line with + your name and email address to <code>paper.tex</code>, + <code>tex/src/preamble-header.tex</code>, <code>reproduce/analysis/make/top-make.mk</code>, + and generally, all the files you modified in the previous steps.</p> + + <pre><code>Copyright (C) 2018-2020 Existing Name <existing@email.address> +Copyright (C) 2020 YOUR NAME <YOUR@EMAIL.ADDRESS></code></pre></li> + <li><p><strong>Configure Git for the first time</strong>: If this is the first time you are + running Git on this system, then you have to configure it with some + basic information in order to have essential information in the commit + messages (ignore this step if you have already done it). Git will + include your name and e-mail address information in each commit. You + can also specify your favorite text editor for making the commit + (<code>emacs</code>, <code>vim</code>, <code>nano</code>, etc.).</p> + + <pre><code>git config --global user.name "YourName YourSurname" +git config --global user.email your-email@example.com +git config --global core.editor nano</code></pre></li> + <li><p><strong>Your first commit</strong>: You have already made some small and basic + changes in the steps above and you are in your project's <code>master</code> + branch. So, you can officially make your first commit in your + project's history and push it. But before that, you need to make sure + that there are no problems in the project. 
It is a good habit to + always re-build the system before a commit to be sure it works as + expected.</p> + + <pre><code>git status <span class="comment"># See which files you have changed.</span> +git diff <span class="comment"># Check the lines you have added/changed.</span> +./project make <span class="comment"># Make sure everything builds successfully.</span> +git add -u <span class="comment"># Put all tracked changes in staging area.</span> +git status <span class="comment"># Make sure everything is fine.</span> +git diff --cached <span class="comment"># Confirm all the changes that will be committed.</span> +git commit <span class="comment"># Your first commit: put a good description!</span> +git push <span class="comment"># Push your commit to your remote.</span></code></pre></li> + <li><p><strong>Start your exciting research</strong>: You are now ready to add flesh and + blood to this raw skeleton by further modifying and adding your + exciting research steps. You can use the "published works" section in + the introduction (above) as some fully working models to learn + from. Also, don't hesitate to contact us if you have any + questions.</p></li> + </ol> + + <h2>Other basic customizations</h2> + + <ul> + <li><p><strong>High-level software</strong>: Maneage installs all the software that your + project needs. You can specify which software your project needs in + <code>reproduce/software/config/TARGETS.conf</code>. The necessary software is + classified into two classes: 1) programs or libraries (usually written + in C/C++) which are run directly by the operating system. 2) Python + modules/libraries that are run within Python. By default + <code>TARGETS.conf</code> only has GNU Astronomy Utilities (Gnuastro) as one + scientific program and Astropy as one scientific Python module. Both + have many dependencies which will be installed into your project + during the configuration step. 
To see a list of software that is + currently ready to be built in Maneage, see + <code>reproduce/software/config/versions.conf</code> (which also lists their + versions). The comments in <code>TARGETS.conf</code> describe how to use the software + name from <code>versions.conf</code>. Currently the raw pipeline just uses + Gnuastro to make the demonstration plots. Therefore if you don't need + Gnuastro, go through the analysis steps in <code>reproduce/analysis</code> and + remove all its use cases (clearly marked).</p></li> + <li><p><strong>Input dataset</strong>: The input datasets are managed through the + <code>reproduce/analysis/config/INPUTS.conf</code> file. It is best to gather all + the information regarding all the input datasets into this one central + file. To ensure that the proper dataset is being downloaded and used + by the project, it is also recommended to get an <a href="https://en.wikipedia.org/wiki/MD5">MD5 + checksum</a> of the file and include + that in <code>INPUTS.conf</code> so the project can check it automatically. The + preparation/downloading of the input datasets is done in + <code>reproduce/analysis/make/download.mk</code>. Have a look there to see how + these values are to be used. This information about the input datasets + is also used in the initial <code>configure</code> script (to inform the users), + so also modify that file. You can find all occurrences of the demo + dataset with the command below and replace it with your input + dataset.</p> + + <pre><code>grep -ir wfpc2 ./*</code></pre></li> + <li><p><strong><code>README.md</code></strong>: Correct all the <code>XXXXX</code> placeholders (name of your + project, your own name, address of your project's online/remote + repository, link to download dependencies, etc.). Generally, read + over the text and update it where necessary to fit your project. 
Don't + forget that this is the first file that is displayed on your online + repository and the one your colleagues will first be drawn to read. + Therefore, make it as easy as possible for them to start + with. Also check and update this file one last time when you are ready + to publish your project's paper/source.</p></li> + <li><p><strong>Verify outputs</strong>: During the initial customization checklist, you + disabled verification. This is natural because during the project you + need to make changes all the time and it's a waste of time to enable + verification every time. But at significant moments of the project + (for example before submission to a journal, or publication) it is + necessary. When you activate verification, before building the paper, + all the specified datasets will be compared with their respective + checksum and if any file's checksum is different from the one recorded + in the project, it will stop and print the problematic file and its + expected and calculated checksums. First set the value of the + <code>verify-outputs</code> variable in + <code>reproduce/analysis/config/verify-outputs.conf</code> to <code>yes</code>. Then go to + <code>reproduce/analysis/make/verify.mk</code>. The verification of all the files + is only done in one recipe. First the files that go into the + plots/figures are checked, then the LaTeX macros. Validation of the + former (inputs to plots/figures) should be done manually. If it's the + first time you are doing this, you can see two examples of the dummy + steps (with <code>delete-me</code>, you can use them if you like). These two + examples should be removed before you can run the project. For the + latter, you just have to update the checksums. The important thing to + consider is that a simple checksum can be problematic because some + file generators print their run-time date in the file (for example as + commented lines in a text table). 
When checking text files, this + Makefile already has this function: + <code>verify-txt-no-comments-leading-space</code>. As the name suggests, it will + remove comment lines and empty lines before calculating the MD5 + checksum. For FITS formats (common in astronomy), fortunately there is + a <code>DATASUM</code> definition which will return the checksum independent of + the headers. You can use the provided function(s), or define one for + your special formats.</p></li> + <li><p><strong>Feedback</strong>: As you use Maneage you will notice many things that, if + implemented from the start, would have been very useful for your + work. This can be in the actual scripting and architecture of Maneage, + or useful implementation and usage tips, like those below. In any + case, please share your thoughts and suggestions with us, so we can + add them here for everyone's benefit.</p></li> + <li><p><strong>Re-preparation</strong>: Automatic preparation is only run in the first run + of the project on a system; to re-do the preparation you have to use + the option below. Here is the reason for this: when it's necessary, the + preparation process can be slow and will unnecessarily slow down the + whole project while the project is under development (focus is on the + analysis that is done after preparation). Because of this, preparation + will be done automatically for the first time that the project is run + (when <code>.build/software/preparation-done.mk</code> doesn't exist). After the + preparation process completes once, future runs of <code>./project make</code> + will not do the preparation process anymore (will not call + <code>top-prepare.mk</code>). They will only call <code>top-make.mk</code> for the + analysis. 
To manually invoke the preparation process after the first + attempt, the <code>./project make</code> script should be run with the + <code>--prepare-redo</code> option, or you can delete the special file above.</p> + + <pre><code>./project make --prepare-redo</code></pre></li> + <li><p><strong>Pre-publication: add notice on reproducibility</strong>: Add a notice + somewhere prominent in the first page within your paper, informing the + reader that your research is fully reproducible. For example at the + end of the abstract, or under the keywords with a title like + "reproducible paper". This will encourage them to publish their own + works in this manner and will also help spread the word.</p></li> + </ul> + + <p align="right">Next: <a href="about-tips.html">Tips for designing your project</a>, Previous: <a href="about-architecture.html">Maneage architecture</a>, Up: <a href="about.html">About</a> </p> + + + + + + <footer role="contentinfo" id="page-footer"> + <h2>Copyright information</h2> + + <p>This file is part of Maneage's core: <a href="https://git.maneage.org/project.git">https://git.maneage.org/project.git</a></p> + + <p>Maneage is free software: you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation, either version 3 of the License, or (at your option) + any later version.</p> + + <p>Maneage is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for more + details.</p> + + <p>You should have received a copy of the GNU General Public License along + with Maneage. 
If not, see <a href="https://www.gnu.org/licenses/">https://www.gnu.org/licenses/</a>.</p> + <ul> + <li><p>Maneage is currently based in the Instituto de Astrofísica de Canarias (IAC).</p></li> + <li><p>Address: IAC, Calle Vía Láctea, s/n, E38205 - La Laguna (Tenerife), Spain.</p></li> + <!-- The people page will be added later + <li><p>People [page will be added later]</p></li> + --> + <li><p>Contact: with <a href="https://savannah.nongnu.org/support/?func=additem&group=reproduce">this form.</a></p></li> + <li><p>Copyright © 2020 Maneage volunteers</p></li> + <li><p>All logos are copyrighted by the respective institutions</p></li> + </ul> + </footer> + </div> + </body> |