From 2c9e797a73fc5f6e2cfa5562ce0772497a6650a5 Mon Sep 17 00:00:00 2001
From: Pedram Ashofteh Ardakani
Date: Wed, 29 Apr 2020 16:41:38 +0430
Subject: tutorial: Initial tutorial conversion

https://gitlab.com/infantesainz/reproduce-raulfork/-/blob/tutorial/README-tutorial.md
---
 tutorial.html | 792 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 792 insertions(+)
 create mode 100644 tutorial.html

diff --git a/tutorial.html b/tutorial.html
new file mode 100644
index 0000000..0060235
--- /dev/null
+++ b/tutorial.html
@@ -0,0 +1,792 @@

Maneage tutorial

Copyright (C) 2020 Raul Infante-Sainz infantesainz@gmail.com
Copyright (C) 2020 Mohammad Akhlaghi mohammad@akhlaghi.org
See the end of the file for license conditions.

This document is a tutorial describing how Maneage (management + lineage) works in practice. We highly recommend reading README-hacking.md first, to get a clear idea of what this project is about. This tutorial assumes that you already have the project set up and working properly. To get there, read and follow all the steps described from the section Customization checklist up to, and including, the section Title, short description and author.

By following this tutorial, you will build, step by step, a fully reproducible paper describing a small research example. The example is very simple: it analyses a dataset with two columns (time and population). The analysis is just a linear fit of the data, whose results are then written in a small paragraph of the final paper.

In the following, the tutorial assumes you have three different directories, which you set up during the configure step:

IMPORTANT NOTE: all command lines in this tutorial assume that you are in the project-directory.

In short: this hands-on tutorial will guide you through a simple research example to show the workflow in Maneage. It describes, step by step, how to download a small file containing data, analyse the data (by making a linear fit), and finally write a small paragraph with the fitting parameters into the final paper. The download and the analysis will both be done in a single Makefile.

Installing available software: Matplotlib

If all the steps above have been done successfully, you are ready to start including your own analysis scripts. But before that, let's install the Matplotlib Python package, which will be used later to plot the linear-fit figure. This package also serves as an example of how to install programs that are already available in Maneage. Just open the Makefile reproduce/software/config/installation/TARGETS.mk and add the word matplotlib to the top-level-python line:

```shell
# Python libraries/modules.
top-level-python = astropy matplotlib
```

After that, run the configure step again with the option -e to continue +using the same configuration options given before (input and build +directories). Also, run the prepare and make steps:

```shell
$ ./project configure -e
$ ./project prepare
$ ./project make

# Open 'paper.pdf' and see if everything is fine. Note that Matplotlib now
# appears in the software appendix at the end of the document.
```

Once you have verified that Matplotlib has been properly installed and appears in the final paper.pdf, you are ready to make the first commit of the project. With the following commands, you will see which files have been modified and what the modifications are, stage them, and make the commit. During the commit, Git will open your text editor for writing the commit message. Keep in mind that all committed changes are preserved in the history of your project, so it is good practice to take some time to properly describe what has been done, changed or added. Finally, as this is the very first commit of the project, tag it as the zero-th version.

```shell
$ git status        # See which files have been changed.
$ git diff          # See the lines you have modified.
$ git add -u        # Put all tracked changes in the staging area.
$ git status        # Make sure everything is fine.
$ git commit        # Your first commit, add a nice description.
$ git tag -a v0.0   # Tag this as the zero-th version of your project.
```

Now, have a look at the Git history of the project. Note that the local master branch is one commit ahead of the remote origin/master branch. After that, push your first commit and its tag to your remote repository with the following commands. Since you set up your master branch to track origin/master, you can simply use git push.

```shell
$ git log --oneline --decorate --all --graph   # Have a look at the Git history.
$ git push                                     # Push the commit to the remote/origin.
$ git push --tags                              # Push all tags to the remote/origin.
```

Now it is time to start including your own scripts to download and analyse the data. Bear in mind that the goal of this tutorial is to give a general view of the workflow in Maneage. Therefore, only a few basic concepts about Make and how it is used in this project will be given. Maneage is much more powerful, and much more can be done than what this tutorial shows. So, carefully read all the documentation and the comments already available in each file, be creative, and experiment with your own research.

In the following, the tutorial focuses on downloading the data, analysing it, and finally writing the results into the final paper. As a consequence, many things already in the project are not necessary here: for example, the text of the final paper that is already written in the paper.tex file, or some Makefiles that download images from the Hubble Space Telescope and analyse them. In your own research, all of this would be removed. In this tutorial it is kept, because we will only show how to do a simple analysis and include a small paragraph with the result of the linear fit.

In short: in this section you have learnt how to install software that is already available in Maneage. In this particular case, you installed Matplotlib.

Including a Python script for the analysis

You are going to use a small Python script to analyse the data. This Python script will be invoked from a Makefile that will be set up later. For now, just create the Python script and put it in an appropriate location. All analysis scripts are kept in reproduce/analysis, in a subdirectory named after their file type. For example, Makefiles are saved in the make directory, and Bash scripts in the bash directory. Since there is no python directory yet, create it with the following command:

```shell
$ mkdir reproduce/analysis/python
```

After that, you need the Python script itself. The code is very simple: it takes the name of an input file containing two columns (year and population), the name of the output file in which the parameters of the linear fit will be saved, and the name of the figure showing the original data and the fitted curve. Paste the following Python script into a new file named linear-fit.py in the directory created in the previous step (reproduce/analysis/python).

```python
# Make a linear fit of an input data set
#
# This Python script makes a linear fit of a dataset consisting of time and
# population. It generates a figure in which the original data and the
# fitted curve are plotted. Finally, it saves the fitting parameters.
#
# Original author:
#     Copyright (C) 2020, Raul Infante-Sainz infantesainz@gmail.com
# Contributing author(s):
#     Copyright (C) YEAR, YourName YourSurname.
#
# This Python script is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.
#
# This Python script is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details. See http://www.gnu.org/licenses/.



# Necessary packages
import sys
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Fitting function (linear fit)
def func(x, a, b):
    return a * x + b

# Define input and output arguments
ifile = sys.argv[1]  # Input file
ofile = sys.argv[2]  # Output file
ofig = sys.argv[3]   # Output figure

# Read the data from the input file.
data = np.loadtxt(ifile)

# Time and population:
# time ---------- x
# population ---- y
x = data[:, 0]
y = data[:, 1]

# Make the linear fit
params, pcov = curve_fit(func, x, y)

# Make and save the figure
plt.clf()
plt.figure()

plt.plot(x, y, 'bo', label="Original data")
plt.plot(x, func(x, *params), 'r-', label="Fitted curve")

plt.title('Population along time')
plt.xlabel('Time (year)')
plt.ylabel('Population (million people)')
plt.legend()
plt.grid()

plt.savefig(ofig, format='PDF', bbox_inches='tight')

# Save the fitting parameters (a on the first line, b on the second)
np.savetxt(ofile, params, fmt='%.3f')
```

Have a look at this Python script. At the very beginning, it has a block of commented lines with a descriptive title, a short paragraph describing the script, and the copyright with the contact information. It is very important for every file to have this kind of metadata. Below these lines, there is the source code itself.
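In linear-fit.py, curve_fit performs a general least-squares fit. For the straight line func(x, a, b) = a * x + b, the least-squares solution also has a simple closed form, which may help to understand what the script computes. The following is only a minimal pure-Python sketch, not part of the tutorial (the function name linear_fit is hypothetical); for well-behaved data its result should agree with curve_fit's params up to numerical precision:

```python
def linear_fit(xs, ys):
    """Closed-form least-squares fit of y = a*x + b.

    Returns (a, b): the slope a minimises the sum of squared residuals,
    and the intercept b makes the line pass through (mean x, mean y).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up points lying exactly on y = 2x + 1 (not the tutorial's data).
print(linear_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```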

As you can see, this Python script (linear-fit.py) is designed to be invoked from the command line in the following way:

```shell
$ python /path/to/linear-fit.py /path/to/input.dat /path/to/output.dat /path/to/figure.pdf
```

/path/to/input.dat is the input data file, /path/to/output.dat is the +output data file (with the fitted parameters), and /path/to/figure.pdf is +the plotted figure.

You will do this invocation inside a Make rule (to be set up later). Now that you have included this Python script, make a commit to save this work. The first command lists the modified files; the new, still untracked linear-fit.py will also appear there. The second command shows the changes in already-tracked files (git diff does not show untracked files, so to review the new script itself, use git diff --cached after adding it). Correct, add and modify whatever you want in order to include more information, add comments, or clarify any step. After that, add the file and commit the work. Finally, push the commit to the remote/origin.

```shell
$ git status                                        # See which files you have changed.
$ git diff                                          # See the lines you have added/changed.
$ git add reproduce/analysis/python/linear-fit.py   # Put the new script in the staging area.
$ git commit                                        # Commit, add a nice description.
$ git push                                          # Push the commit to the remote/origin.
```

Check that everything is fine by having a look at the Git history of the project. Note that the master branch has advanced by one commit, while the template branch stays behind.

```shell
$ git log --oneline --decorate --all --graph   # See the Git history.
```

In short: in this section you have included a Python script that will +be used for making the linear fitting.

Downloading data

As mentioned before, many things are already included in the project. One of them is a dedicated Makefile for managing all the necessary downloads of input data (reproduce/analysis/make/download.mk). By appropriately modifying this file, you would be able to download the data you need. However, to keep this tutorial as simple as possible, we will describe how to download the data more explicitly.

The data needed for this tutorial consist of a simple plain-text file with two columns: time (year) and population (in millions of people). These data correspond to Spain, and they can be downloaded from this URL: http://akhlaghi.org/data/template-tutorial/ESP.dat. But don't download the file with your browser: you have to do it within Maneage!
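In Maneage, this download is done by a Make rule that calls wget, as in the Makefile that follows. Purely to illustrate what such a rule does, here is a minimal Python sketch (the function download_once is hypothetical and not part of Maneage): it fetches a URL into a target file only if that file does not exist yet, which is how Make treats a target whose only prerequisite is order-only. Writing to a temporary name and renaming at the end is an extra precaution of this sketch, not of the tutorial's rule, so that an interrupted download never leaves a partial file under the final name:

```python
import shutil
import urllib.request
from pathlib import Path


def download_once(url, target):
    """Download `url` into `target` unless `target` already exists.

    Returns True if a download happened, False if the file was already there.
    """
    target = Path(target)
    if target.exists():
        return False  # Like Make: an existing target with no newer prerequisite is kept.
    # Plays the role of the order-only '| $(indir)': the directory must exist.
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)  # Only a complete file ever gets the final name.
    return True


# Example (assumes an 'inputs' directory may be created in the current directory):
# download_once("http://akhlaghi.org/data/template-tutorial/ESP.dat", "inputs/ESP.dat")
```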

Let's create a Makefile for downloading the data. Later, you will also include the analysis in this same Makefile. Save it in the dedicated directory (reproduce/analysis/make) with the name getdata-analysis.mk, and paste the following code into it.

```make
# Download data for the tutorial
#
# In this Makefile, data for the tutorial is downloaded.
#
# Copyright (C) 2020 Raul Infante-Sainz infantesainz@gmail.com
# Copyright (C) YYYY Your Name your-email@example.xxx
#
# This Makefile is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.
#
# This Makefile is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details. See http://www.gnu.org/licenses/.





# Download data for the tutorial
# ------------------------------
#
pop-data = $(indir)/ESP.dat
$(pop-data): | $(indir)
	wget http://akhlaghi.org/data/template-tutorial/ESP.dat -O $@





# Final TeX macro
# ---------------
#
# It is very important to mention the address where the data were
# downloaded in the final report.
$(mtexdir)/getdata-analysis.tex: $(pop-data) | $(mtexdir)
	echo "\newcommand{\popurl}{http://akhlaghi.org/data/template-tutorial}" > $@
```

Have a look at this Makefile and its different parts. The first line is a descriptive title, followed by a short description, the copyright lines (where you should add your name and contact email), and finally the license. Please take your time to add all relevant information to each Makefile you modify. As you can see, these lines start with # because they are comments.

After that information, there are five blank lines to separate the different parts. Then comes the Make rule that downloads the data. Remember the general structure of a Make rule:

```make
TARGETS: PREREQUISITES
	RECIPE
```

A rule states how to construct the TARGETS from the PREREQUISITES by following the RECIPE. Note that the whitespace at the beginning of the RECIPE is not a series of spaces but a single TAB character. Keep this in mind if you copy and paste the code.
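How Make decides whether to run a RECIPE can be summarised as follows: the TARGET is rebuilt if it does not exist, or if any normal PREREQUISITE is newer than it. Prerequisites listed after a | (order-only prerequisites, such as $(indir) in this tutorial's rules) only have to exist; their timestamps never trigger a rebuild. Below is a simplified Python model of that GNU Make behaviour, for illustration only: the function needs_rebuild is hypothetical, and it ignores the fact that Make first builds the prerequisites themselves.

```python
import os


def needs_rebuild(target, prerequisites=(), order_only=()):
    """Simplified model of GNU Make's decision to run a rule's recipe.

    - The recipe runs if `target` does not exist.
    - It also runs if any normal prerequisite is missing or newer than `target`.
    - Order-only prerequisites (after '|') are assumed to have been made
      already; their timestamps are deliberately never looked at.
    """
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    for prereq in prerequisites:
        if not os.path.exists(prereq) or os.path.getmtime(prereq) > target_time:
            return True
    return False
```

For example, with the rule $(param-file): $(indir)/ESP.dat | $(odir), touching ESP.dat makes ESP.txt out of date, while adding files to the fit-parameters directory (which changes its timestamp) does not.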

Now you can see this structure in our particular case:

```make
$(pop-data): | $(indir)
	wget http://akhlaghi.org/data/template-tutorial/ESP.dat -O $@
```

Here we have:

With this, you have included the rule that downloads the data. To finish, you have to specify the final purpose of this Makefile: downloading that data! This is done by setting $(pop-data) as a prerequisite of the final rule. Remember that each Makefile builds a final target with the same name as the Makefile, but with the .tex extension. These final targets are TeX macro files, in which relevant information to be included in the final paper is saved. Here, you are saving the URL.

```make
$(mtexdir)/getdata-analysis.tex: $(pop-data) | $(mtexdir)
	echo "\newcommand{\popurl}{http://akhlaghi.org/data/template-tutorial}" > $@
```

In this final rule we have:

Only one step remains before the data can be downloaded. You have to add the name of this Makefile (without the .mk extension) to the reproduce/analysis/make/top-make.mk Makefile, which defines the Makefiles that are executed. You should end up with:

```make
makesrc = initialize \
          download \
          getdata-analysis \
          delete-me \
          paper
```
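Each name in makesrc must match a Makefile in reproduce/analysis/make, and, as described above, each of these Makefiles builds a TeX macro file with the same name under build-directory/tex/macros. A typo in makesrc (for example getdata-analyse instead of getdata-analysis) therefore refers to a Makefile that does not exist. The helper below is hypothetical and not part of Maneage; it is a small Python sketch, assuming this naming convention, that shows the mapping and catches such mismatches:

```python
from pathlib import Path


def makesrc_targets(makesrc, make_dir, build_dir):
    """Map each makesrc name to its expected final TeX macro file.

    Assumes the convention described in the tutorial: NAME in makesrc
    corresponds to MAKE_DIR/NAME.mk, whose final target is
    BUILD_DIR/tex/macros/NAME.tex. Also returns the names that have
    no matching Makefile.
    """
    targets = {}
    missing = []
    for name in makesrc:
        if not (Path(make_dir) / f"{name}.mk").is_file():
            missing.append(name)
        targets[name] = Path(build_dir) / "tex" / "macros" / f"{name}.tex"
    return targets, missing


# Example, run from the project-directory ("build-directory" is a placeholder):
# targets, missing = makesrc_targets(
#     ["initialize", "download", "getdata-analysis", "delete-me", "paper"],
#     "reproduce/analysis/make", "build-directory")
# print(missing)  # names without a .mk file; ideally an empty list
```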

As always, carefully read all the comments and information to know what is going on. Also, add your own comments and information, so that each step is clear and explained in enough detail. If everything is fine, the project is now ready to download the data in the make step. Try it!

```shell
$ ./project make
```

Hopefully, it will download the file and save it into the folder called inputs under the build-directory. Check that it is there. Also have a look at the TeX macro file, to see that the new command has been included; it is in the build directory: build-directory/tex/macros/getdata-analysis.tex.
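You may also want to make sure that the downloaded file has the layout linear-fit.py expects: rows of two whitespace-separated numeric columns (year and population), which is what np.loadtxt and data[:, 0], data[:, 1] in the script rely on. A minimal Python sketch of such a check follows; read_two_columns is a hypothetical helper, and the check itself is not part of the tutorial:

```python
def read_two_columns(path):
    """Return (xs, ys) from a text file with exactly two numeric columns.

    Mirrors numpy.loadtxt's defaults: columns are separated by whitespace,
    text after '#' is a comment, and empty lines are skipped.
    Raises ValueError on rows that do not have exactly two values.
    """
    xs, ys = [], []
    with open(path) as f:
        for number, line in enumerate(f, start=1):
            fields = line.split("#", 1)[0].split()
            if not fields:
                continue
            if len(fields) != 2:
                raise ValueError(f"{path}:{number}: expected 2 columns, got {len(fields)}")
            xs.append(float(fields[0]))
            ys.append(float(fields[1]))
    return xs, ys


# Example, run from the project-directory ("build-directory" is a placeholder):
# years, population = read_two_columns("build-directory/inputs/ESP.dat")
```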

Now that all of these changes have been included and work fine, it is time to check everything, little by little, and make a commit to save this work. Remember to write a good commit title and a clear commit message describing what you have done and why. Then, push the commit to the remote/origin.

Congratulations! You have included your first Makefile, and the data are now ready to be analysed!

In short, to download the data you did the following:

Adding the analysis rule

Up to this point, you have included the Python script that will do the linear fit and the rule for downloading the data. Now, you need to construct the Make rule that invokes this Python script to do the analysis. This rule will be put in the same Makefile you already created for downloading the data. But before this, define the directory in which the target is going to be saved:

```make
odir = $(BDIR)/fit-parameters
```

This is a folder under the build-directory called fit-parameters. After +that, define the target: a plain text file in which the linear fit +parameters are saved (by the Python script). Put it into the previously +defined directory. As the data is from Spain, name it ESP.txt.

```make
param-file = $(odir)/ESP.txt
```

Now, include a rule to construct the output directory odir. This is +necessary because this directory is needed for saving the file ESP.txt.

```make
$(odir):
	mkdir $@
```

With all the previous definitions, now it is possible to set the rule for +making the analysis:

```make
$(param-file): $(indir)/ESP.dat | $(odir)
	python reproduce/analysis/python/linear-fit.py $< $@ $(odir)/ESP.pdf
```

In this rule you have:

Finally, to indicate that you want to obtain the target you have just included ($(param-file)), add it as a prerequisite of the final target $(mtexdir)/getdata-analysis.tex. So, in the last rule (which creates the TeX macro file), remove $(pop-data) and put $(param-file) instead. By doing this, you are telling Make that you want to obtain the file in which the fitted parameters are saved. Inside the rule, define two shell variables (a and b) holding the fitted parameters read from the prerequisite. The parameter a is on the first row:

```make
a=$$(awk 'NR==1{print $$1}' $<)
```

Similarly, for obtaining the parameter b (which is in the second row):

```make
b=$$(awk 'NR==2{print $$1}' $<)
```
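Why $$ instead of $? Make expands every $ in a recipe line before handing it to the shell: $$ becomes a literal $, $< and $@ become the first prerequisite and the target, $(name) is a Make variable, and any other single character after $ (such as the 1 in awk's $1) is also treated as a Make variable, normally undefined and thus empty. The Python function below is a deliberately simplified, hypothetical model of that expansion (it ignores nested references and Make functions), only meant to show what the shell finally receives:

```python
def make_expand(text, variables):
    """Simplified model of how GNU Make expands '$' in a recipe line.

    Handles '$$', '$(name)', '${name}' and single-character references such
    as '$<', '$@' or '$1'. Undefined variables expand to the empty string,
    as in Make. Nested references and functions like $(shell ...) are not
    handled.
    """
    out = []
    i = 0
    while i < len(text):
        char = text[i]
        if char != "$" or i + 1 == len(text):
            out.append(char)
            i += 1
            continue
        following = text[i + 1]
        if following == "$":              # '$$' -> literal '$'
            out.append("$")
            i += 2
        elif following in "({":           # '$(name)' or '${name}'
            close = ")" if following == "(" else "}"
            end = text.index(close, i + 2)
            out.append(variables.get(text[i + 2:end], ""))
            i = end + 1
        else:                             # '$<', '$@', '$1', ...
            out.append(variables.get(following, ""))
            i += 2
    return "".join(out)


print(make_expand("a=$$(awk 'NR==1{print $$1}' $<)", {"<": "ESP.txt"}))
# a=$(awk 'NR==1{print $1}' ESP.txt)
```

With a single $1, the same line would reach the shell as awk 'NR==1{print }', which only happens to work here because each row of ESP.txt holds a single value.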

Then specify the new TeX commands for these two parameters, writing them as was done before for the URL:

```make
echo "\newcommand{\afitparam}{$$a}" >> $@
echo "\newcommand{\bfitparam}{$$b}" >> $@
```

So, at the end you will have the final rule like this:

```make
$(mtexdir)/getdata-analysis.tex: $(param-file) | $(mtexdir)
	echo "\newcommand{\popurl}{http://akhlaghi.org/data/template-tutorial}" > $@

	a=$$(awk 'NR==1{print $$1}' $<)
	b=$$(awk 'NR==2{print $$1}' $<)

	echo "\newcommand{\afitparam}{$$a}" >> $@
	echo "\newcommand{\bfitparam}{$$b}" >> $@
```

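Important notes: inside a Make rule, you have to write $$ to pass a single $ to the shell, because Make expands every $ itself before running the recipe (this also applies to awk's $1, hence $$1). Also note that, after the first line, you have to use >> instead of >: a single > creates the target anew (erasing its previous contents) each time you write something into it, while >> only appends the line to the end of the existing file. Finally, this recipe relies on all of its lines running in the same shell (GNU Make's .ONESHELL mode, which the tutorial assumes the project enables), so that the variables a and b set on one line are still defined on the following lines; otherwise each line would run in a separate shell and $$a and $$b would be empty.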
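The steps of the final rule (read a from the first row and b from the second row of ESP.txt, write the URL with >, then append the two parameters with >>) can be mirrored in Python, which makes the difference between creating and appending explicit: opening a file in "w" mode truncates it like >, while "a" mode appends like >>. This is only an illustrative sketch with a hypothetical function name; in the tutorial, the Make rule does this work:

```python
from pathlib import Path

POP_URL = "http://akhlaghi.org/data/template-tutorial"


def write_macros(param_file, macro_file, url=POP_URL):
    """Write the tutorial's three TeX macros from the fitted-parameter file."""
    rows = Path(param_file).read_text().splitlines()
    a = rows[0].split()[0]  # like: awk 'NR==1{print $1}'
    b = rows[1].split()[0]  # like: awk 'NR==2{print $1}'

    # "w" creates or truncates the file, like '>' in the shell.
    with open(macro_file, "w") as out:
        out.write("\\newcommand{\\popurl}{" + url + "}\n")

    # "a" appends to the end of the existing file, like '>>'.
    with open(macro_file, "a") as out:
        out.write("\\newcommand{\\afitparam}{" + a + "}\n")
        out.write("\\newcommand{\\bfitparam}{" + b + "}\n")
```

Running it twice still leaves exactly three lines in the macro file, because the first write starts from an empty file; had every write used "a" (or >> in the rule), each run would keep adding lines.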

With all the above modifications, you are ready to obtain the fitting +parameters. If you add the necessary comments and information, the final +Makefile would look similar to:

```make
# Download data and linear fitting for the tutorial
#
# In this Makefile, data for the tutorial is downloaded. Then, a Python
# script is used to make a linear fit. Finally, the fitted parameters as
# well as the URL are saved into a TeX macro file.
#
# Copyright (C) 2020 Raul Infante-Sainz infantesainz@gmail.com
# Copyright (C) YYYY Your Name your-email@example.xxx
#
# This Makefile is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.
#
# This Makefile is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details. See http://www.gnu.org/licenses/.





# Download data for the tutorial
# ------------------------------
#
# The input file is defined and downloaded using the following rule
pop-data = $(indir)/ESP.dat
$(pop-data): | $(indir)
	# Use wget to download the data
	wget http://akhlaghi.org/data/template-tutorial/ESP.dat -O $@





# Output directory
# ----------------
#
# Small rule for constructing the output directory
odir = $(BDIR)/fit-parameters
$(odir):
	# Build the output directory
	mkdir $@





# Linear fitting of the data
# --------------------------
#
# The output file is defined in the output directory. The fitted
# parameters will be saved into this file by the Python script.
param-file = $(odir)/ESP.txt
$(param-file): $(indir)/ESP.dat | $(odir)
	# Invoke Python to run the script with the input data
	python reproduce/analysis/python/linear-fit.py $< $@ $(odir)/ESP.pdf





# TeX macros final target
# -----------------------
#
# This is how we write the necessary parameters in the final PDF. In this
# rule, new TeX macros are defined for the URL and the fitted parameters.
$(mtexdir)/getdata-analysis.tex: $(param-file) | $(mtexdir)
	# Write the URL into the target
	echo "\newcommand{\popurl}{http://akhlaghi.org/data/template-tutorial}" > $@

	# Read the fitted parameters and save them into the target
	a=$$(awk 'NR==1{print $$1}' $<)
	b=$$(awk 'NR==2{print $$1}' $<)

	echo "\newcommand{\afitparam}{$$a}" >> $@
	echo "\newcommand{\bfitparam}{$$b}" >> $@
```

Have a look at this Makefile and note that it contains exactly what has been described above. Take your time to add useful comments and to modify whatever you think is necessary. If everything is fine, the project is now ready to download the data and make the linear fit. Try it!

```shell
$ ./project make
```

Hopefully, you now have the fitted parameters in the build-directory/fit-parameters/ESP.txt file, and the figure (ESP.pdf) in the same directory. Do not pay too much attention to the quality of the fit; it is just an example. Also, check that the TeX macro file has been created successfully by having a look at build-directory/tex/macros/getdata-analysis.tex. Finally, now that you have ensured that everything is fine, make a commit to keep the work safe. In the next step, you will see how to include these results in the final paper.

In short: with the work included in this section, the project is able to +download and make the linear fitting of the data. The result is the fitted +parameters that are also saved in a TeX macro, and the figure showing the +data with the fitted curve.

Editing the final paper

With all the previous work, the project is able to download the file containing the data (two columns: year and population of Spain) and analyse it by making a linear fit (y=ax+b). The result is a TeX macro file containing the URL of the data and the linear fit parameters (a and b). Now it is time to add a small paragraph to the paper, just to illustrate how to write the relevant parameters from the analysis.

First, make a copy of the current paper.pdf document in the project-directory. This paper is an example that Maneage constructs by default. Now, you will modify it by adding a small paragraph that includes the fitting parameters and the URL. So, open project-directory/paper.tex and add the following paragraph at the very beginning of the abstract.

```latex
By following the steps described in the tutorial, I have been able to obtain this reproducible paper!
The project is very simple: it consists of downloading a file (from \popurl) and making a simple linear fit using a Python script.
The linear fit is $y=ax+b$, with the following parameters: $a=\afitparam$ and $b=\bfitparam$.
```

As you can see, the TeX macros defined before in the Makefile (\popurl, \afitparam and \bfitparam) are now used in the paper. If you run the make step again ($ ./project make), the paper will be re-compiled including this paragraph. Check that this is true, and compare it with the previous version of the paper. Congratulations! You have completed this tutorial, and you are now able to use Maneage to carry out your exciting research in a reproducible way!

Copyright information

This file is part of the reproducible paper template
   http://savannah.nongnu.org/projects/reproduce

This template is free software: you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the Free +Software Foundation, either version 3 of the License, or (at your option) +any later version.

This template is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +more details.

You should have received a copy of the GNU General Public License along +with Template. If not, see https://www.gnu.org/licenses/.
