Take the following steps to fully customize Maneage for your research
project. After finishing the list, be sure to run ./project configure
and ./project make to see if everything works correctly. If you notice
anything missing or incorrect (probably a change that has not been
explained here), please let us know so we can correct it.
As described above, the concept of reproducibility (during a project) heavily relies on version control. Currently Maneage uses Git as its main version control system. If you are not already familiar with Git, please read the first three chapters of the ProGit book which provides a wonderful practical understanding of the basics. You can read later chapters as you get more advanced in later stages of your work.
Get this repository and its history (if you don't already have it):
Arguably the easiest way to start is to clone Maneage and prepare for
your customizations as shown below. After cloning, you first rename
the default origin remote server to specify that this is Maneage's
remote server. This will allow you to use the conventional origin
name for your own project as shown in the next steps. Second, you
create and go into the conventional master branch to start committing
in your project later.
git clone https://git.maneage.org/project.git # Clone/copy the project and its history.
mv project my-project # Change the name to your project's name.
cd my-project # Go into the cloned directory.
git remote rename origin origin-maneage # Rename current/only remote to "origin-maneage".
git checkout -b master # Create and enter your own "master" branch.
pwd # Just to confirm where you are.
Prepare to build project: The ./project configure command of the next
step will build the different software packages within the "build"
directory (that you will specify). Nothing else on your system will be
touched. However, since it takes a long time, it is useful to see what
is being built at every instant (it's almost impossible to tell from
the torrent of commands that are produced!). So open another terminal
on your desktop and navigate to the same project directory that you
cloned (output of last command above). Then run the following command.
Once every second, this command will just print the date (possibly
followed by a non-existent directory notice). But as soon as the next
step starts building software, you'll see the names of software get
printed as they are being built. Once any software is installed in the
project build directory, its name will disappear from this output.
Again, don't worry, nothing will be installed outside the build
directory.
# On another terminal (go to top project source directory, last command above)
./project --check-config
Test Maneage: Before making any changes, it is important to test
Maneage and see if everything works properly with the commands below.
If there is any problem in the ./project configure or ./project make
steps, please contact us to fix the problem before continuing. Since
building the dependencies during configuration can take a long time,
you can take the next few steps (editing the files) while it's working
(they don't affect the configuration). After ./project make is
finished, open paper.pdf. If it looks fine, you are ready to start
customizing Maneage for your project. But before that, clean all the
extra Maneage outputs with ./project make clean.
./project configure # Build the project's software environment (can take an hour or so).
./project make # Do the processing and build paper (just a simple demo).
# Open 'paper.pdf' and see if everything is ok.
./project make clean # Delete the demo outputs before you start customizing.
Set up the remote: You can use any hosting facility that supports Git
to keep an online copy of your project's version controlled history.
We recommend GitLab because it is more ethical (although not perfect),
and later you can also host GitLab on your own server. Anyway, create
an account in your favorite hosting facility (if you don't already
have one), and define a new project there. Please make sure the newly
created project is empty (some services offer to include a README in a
new project, which is bad in this scenario and will not allow you to
push to it). It will give you a URL (usually starting with git@ and
ending in .git); put this URL in place of XXXXXXXXXX in the first
command below. With the second command, "push" your master branch to
your origin remote, and (with the --set-upstream option) set them to
track/follow each other. However, the maneage branch is currently
tracking/following your origin-maneage remote (automatically set when
you cloned Maneage). So when pushing the maneage branch to your origin
remote, you shouldn't use --set-upstream. You can check which local
and remote branches are tracking each other with git branch -vv.
git remote add origin XXXXXXXXXX # Newly created repo is now called 'origin'.
git push --set-upstream origin master # Push 'master' branch to 'origin' (with tracking).
git push origin maneage # Push 'maneage' branch to 'origin' (no tracking).
git branch -vv # Check which local and remote branches track each other.
Title, short description and author: The title and basic information
of your project's output PDF paper should be added in paper.tex. You
should see the relevant place in the preamble (prior to
\begin{document}). After you are done, run the ./project make command
again to see your changes in the final PDF, and make sure that your
changes don't cause a crash in LaTeX. Of course, if you use a
different LaTeX package/style for managing the title and authors (in
particular a specific journal's style), please feel free to use your
own methods after finishing this checklist and doing your first
commit.
Delete dummy parts: Maneage contains some parts that are only for the initial/test run, mainly as a demonstration of important steps, which you can use as a reference in your own project. But they are not for any real analysis, so you should remove these parts as described below:
paper.tex: 1) Delete the text of the abstract (from \includeabstract{
to \vspace{0.25cm}) and write your own (a single sentence can be
enough now, you can complete it later). 2) Add some keywords under it
in the keywords part. 3) Delete everything between
%% Start of main body. and %% End of main body. 4) Remove the notice
in the "Acknowledgments" section (in \new{}) and acknowledge your
funding sources (this can also be done later). Just don't delete the
existing acknowledgment statement: Maneage is possible thanks to
funding from several grants. Since Maneage is being used in your work,
it is necessary to acknowledge them in your work also.
reproduce/analysis/make/top-make.mk: Delete the delete-me line in the
makesrc definition. Just make sure there is no empty line between the
download \ and verify \ lines (they should be directly under each
other).
reproduce/analysis/make/verify.mk: In the final recipe, under the
commented line Verify TeX macros, remove the full line that contains
delete-me, and set the value of s in the line for download to XXXXX
(any temporary string; you'll fix it at the end of your project, when
it is complete).
Delete all delete-me* files in the following directories:
rm tex/src/delete-me*
rm reproduce/analysis/make/delete-me*
rm reproduce/analysis/config/delete-me*
Disable verification of outputs by removing the yes from
reproduce/analysis/config/verify-outputs.conf. Later, when you are
ready to submit your paper, or publish the dataset, activate
verification and make the proper corrections in this file (described
under the "Other basic customizations" section below). This is a
critical step and only takes a few minutes when your project is
finished. So DON'T FORGET to activate it in the end.
Re-make the project (after a cleaning) to make sure you haven't introduced any errors.
./project make clean
./project make
Don't merge some files in future updates: As described below, you can
later update your infrastructure (for example to fix bugs) by merging
your master branch with maneage. For files that you have created in
your own branch, there will be no problem. However, if you modify an
existing Maneage file for your project, the next time it is updated on
maneage you'll have an annoying conflict. The commands below show how
to avoid this future problem. With them, you can configure Git to
ignore the changes in maneage for some of the files you have already
edited and deleted above (and will edit below). Note that only the
first echo command has a > (to write over the file), the rest are >>
(to append to it). If you want to prevent any other set of files from
being imported from Maneage into your project's branch, you can follow
a similar strategy. We recommend only doing it when you encounter the
same conflict in more than one merge and there is no other change in
that file. Also, don't add core Maneage Makefiles, otherwise Maneage
can break on the next run.
echo "paper.tex merge=ours" > .gitattributes
echo "tex/src/delete-me.mk merge=ours" >> .gitattributes
echo "tex/src/delete-me-demo.mk merge=ours" >> .gitattributes
echo "reproduce/analysis/make/delete-me.mk merge=ours" >> .gitattributes
echo "reproduce/software/config/TARGETS.conf merge=ours" >> .gitattributes
echo "reproduce/analysis/config/delete-me-num.conf merge=ours" >> .gitattributes
git add .gitattributes
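The effect of a merge=ours attribute can be tried out safely in a throw-away repository before relying on it. The sketch below is only an illustration under stated assumptions: the branch name upstream-demo and the file contents are hypothetical stand-ins (not part of Maneage), and it assumes, as Git's attributes documentation describes for custom merge drivers, that a driver named ours has to be defined (here with git config merge.ours.driver true) for the attribute to take effect.

```shell
# Sketch: try 'merge=ours' in a temporary repository (nothing here touches your project).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"            # Local, demo-only identity.
git config user.email demo@example.com
git config merge.ours.driver true           # Assumption: define the "ours" merge driver.

echo "upstream text" > paper.tex            # Hypothetical upstream file.
git add paper.tex
git commit -q -m "Upstream version"
git branch upstream-demo                    # Hypothetical stand-in for the 'maneage' branch.

echo "paper.tex merge=ours" > .gitattributes
echo "my project text" > paper.tex          # Your customization of the same file.
git add .gitattributes paper.tex
git commit -q -m "My customization"

git checkout -q upstream-demo               # Simulate a later upstream update.
echo "upstream change" > paper.tex
git commit -q -a -m "Upstream update"

git checkout -q -                           # Back to the project branch.
git merge -q -m "Merge upstream" upstream-demo
result=$(cat paper.tex)                     # Content kept after the merge.
echo "$result"
```

If the driver is defined, the merge should finish without a conflict and paper.tex should keep the project's own version.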
Copyright and License notice: It is necessary that all the
"copyright-able" files in your project (those larger than 10 lines)
have a copyright and license notice. Please take a moment to look at
several existing files to see a few examples. The copyright notice is
usually close to the start of the file: it is the line starting with
Copyright (C) and containing a year and the author's name (like the
examples below). The license notice is a short description of the
copyright license, usually one or two paragraphs with a URL to the
full license. Don't forget to add these two notices to any new file
you add in your project (you can just copy-and-paste). When you modify
an existing Maneage file (which already has the notices), just add a
copyright notice in your name under the existing one(s), like the line
with capital letters below. To start with, add this line with your
name and email address to paper.tex, tex/src/preamble-header.tex,
reproduce/analysis/make/top-make.mk, and generally, all the files you
modified in the previous step.
Copyright (C) 2018-2020 Existing Name <existing@email.address>
Copyright (C) 2020 YOUR NAME <YOUR@EMAIL.ADDRESS>
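To find files that still lack a notice, a small loop like the one below may help. It is only a sketch: the three files it creates in a temporary directory are hypothetical, the 10-line threshold is taken from the text above, and in a real project you would loop over your own source files instead.

```shell
# Sketch: list files longer than 10 lines that contain no "Copyright (C)" line.
dir=$(mktemp -d)

# Hypothetical test files: only the first one carries a notice.
{ echo "# Copyright (C) 2020 YOUR NAME <YOUR@EMAIL.ADDRESS>"; seq 1 12; } > "$dir/with-notice.mk"
seq 1 12 > "$dir/without-notice.mk"
seq 1 3 > "$dir/short.conf"                 # Too short to need a notice.

missing=""
for f in "$dir"/*; do
    lines=$(($(wc -l < "$f")))              # Arithmetic expansion strips any padding.
    if [ "$lines" -gt 10 ] && ! grep -q "Copyright (C)" "$f"; then
        missing="$missing $(basename "$f")"
    fi
done
echo "Files missing a copyright notice:$missing"
```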
Configure Git for first time: If this is the first time you are
running Git on this system, then you have to configure it with some
basic information in order to have essential information in the commit
messages (ignore this step if you have already done it). Git will
include your name and e-mail address in each commit. You can also
specify your favorite text editor for writing the commit message
(emacs, vim, nano, etc.).
git config --global user.name "YourName YourSurname"
git config --global user.email your-email@example.com
git config --global core.editor nano
Your first commit: You have already made some small and basic changes
in the steps above and you are in your project's master branch. So,
you can officially make your first commit in your project's history
and push it. But before that, you need to make sure that there are no
problems in the project. It is a good habit to always re-build the
system before a commit to be sure it works as expected.
git status # See which files you have changed.
git diff # Check the lines you have added/changed.
./project make # Make sure everything builds successfully.
git add -u # Put all tracked changes in staging area.
git status # Make sure everything is fine.
git diff --cached # Confirm all the changes that will be committed.
git commit # Your first commit: put a good description!
git push # Push your commit to your remote.
Start your exciting research: You are now ready to add flesh and blood to this raw skeleton by further modifying and adding your exciting research steps. You can use the "published works" section in the introduction (above) as some fully working models to learn from. Also, don't hesitate to contact us if you have any questions.
High-level software: Maneage installs all the software that your
project needs. You can specify which software your project needs in
reproduce/software/config/TARGETS.conf. The necessary software is
classified into two classes: 1) programs or libraries (usually written
in C/C++) which are run directly by the operating system. 2) Python
modules/libraries that are run within Python. By default TARGETS.conf
only has GNU Astronomy Utilities (Gnuastro) as one scientific program
and Astropy as one scientific Python module. Both have many
dependencies which will be installed into your project during the
configuration step. To see a list of software that is currently ready
to be built in Maneage, see reproduce/software/config/versions.conf
(which also has their versions); the comments in TARGETS.conf describe
how to use the software name from versions.conf. Currently the raw
pipeline just uses Gnuastro to make the demonstration plots. Therefore
if you don't need Gnuastro, go through the analysis steps in
reproduce/analysis and remove all its use cases (clearly marked).
Input dataset: The input datasets are managed through the
reproduce/analysis/config/INPUTS.conf file. It is best to gather all
the information regarding all the input datasets into this one central
file. To ensure that the proper dataset is being downloaded and used
by the project, it is also recommended to get an MD5 checksum of the
file and include that in INPUTS.conf so the project can check it
automatically. The preparation/downloading of the input datasets is
done in reproduce/analysis/make/download.mk. Have a look there to see
how these values are to be used. This information about the input
datasets is also used in the initial configure script (to inform the
users), so also modify that file. You can find all occurrences of the
demo dataset with the command below and replace them with your own
input dataset.
grep -ir wfpc2 ./*
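As a rough illustration of the checksum step, the sketch below computes an MD5 checksum and compares it against a recorded value. It assumes GNU md5sum is available (on some systems the command has a different name), and the temporary file and the check_md5 helper are hypothetical; they are not part of Maneage, which does its own check in download.mk.

```shell
# Sketch: record an MD5 checksum and later compare a file against it.
input=$(mktemp)                              # Hypothetical stand-in for an input dataset.
printf 'hello\n' > "$input"

# The value you would copy into INPUTS.conf (first column of md5sum's output).
expected=$(md5sum "$input" | awk '{print $1}')

check_md5 () {
    # $1: file to check, $2: expected checksum.
    calculated=$(md5sum "$1" | awk '{print $1}')
    if [ "$calculated" = "$2" ]; then echo "ok"; else echo "MISMATCH"; fi
}

status_before=$(check_md5 "$input" "$expected")
printf 'corrupted\n' >> "$input"             # Simulate a changed or damaged download.
status_after=$(check_md5 "$input" "$expected")
echo "before: $status_before, after: $status_after"
```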
README.md: Correct all the XXXXX placeholders (name of your project,
your own name, address of your project's online/remote repository,
link to download dependencies, etc.). Generally, read over the text
and update it where necessary to fit your project. Don't forget that
this is the first file that is displayed on your online repository,
and it is also the first file your colleagues will be drawn to read.
Therefore, make it as easy as possible for them to get started. Also
check and update this file one last time when you are ready to publish
your project's paper/source.
Verify outputs: During the initial customization checklist, you
disabled verification. This is natural because during the project you
need to make changes all the time and it's a waste of time to update
the verification every time. But at significant moments of the project
(for example before submission to a journal, or publication) it is
necessary. When you activate verification, before building the paper,
all the specified datasets will be compared with their respective
checksums, and if any file's checksum is different from the one
recorded in the project, it will stop and print the problematic file
and its expected and calculated checksums. First set the value of the
verify-outputs variable in
reproduce/analysis/config/verify-outputs.conf to yes. Then go to
reproduce/analysis/make/verify.mk. The verification of all the files
is done in only one recipe. First the files that go into the
plots/figures are checked, then the LaTeX macros. Validation of the
former (inputs to plots/figures) should be done manually. If it's the
first time you are doing this, you can see two examples of the dummy
steps (with delete-me; you can use them if you like). These two
examples should be removed before you can run the project. For the
latter, you just have to update the checksums. The important thing to
consider is that a simple checksum can be problematic because some
file generators print their run-time date in the file (for example as
commented lines in a text table). For checking text files, this
Makefile already has this function:
verify-txt-no-comments-leading-space. As the name suggests, it will
remove comment lines and empty lines before calculating the MD5
checksum. For FITS formats (common in astronomy), fortunately there is
a DATASUM definition which will return the checksum independent of the
headers. You can use the provided function(s), or define one for your
special formats.
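The idea behind stripping comments before the checksum can be sketched as follows. This is not the implementation of verify-txt-no-comments-leading-space in verify.mk; it is a minimal stand-alone illustration, assuming GNU md5sum and two hypothetical text tables that differ only in a date comment and in the position of an empty line.

```shell
# Sketch: checksum a text table after removing comment lines and empty lines.
dir=$(mktemp -d)
printf '# Created on 2020-01-01\n\n1 2.5\n2 3.5\n' > "$dir/a.txt"
printf '# Created on 2021-06-15\n1 2.5\n\n2 3.5\n' > "$dir/b.txt"

strip_sum () {
    # Remove leading white space, then comment lines, then empty lines.
    sed -e 's/^[[:space:]]*//' -e '/^#/d' -e '/^$/d' "$1" \
        | md5sum | awk '{print $1}'
}

raw_a=$(md5sum < "$dir/a.txt" | awk '{print $1}')   # Plain checksums differ ...
raw_b=$(md5sum < "$dir/b.txt" | awk '{print $1}')
sum_a=$(strip_sum "$dir/a.txt")                     # ... stripped checksums agree.
sum_b=$(strip_sum "$dir/b.txt")
echo "raw: $raw_a $raw_b"
echo "stripped: $sum_a $sum_b"
```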
Feedback: As you use Maneage you will notice many things that if implemented from the start would have been very useful for your work. This can be in the actual scripting and architecture of Maneage, or useful implementation and usage tips, like those below. In any case, please share your thoughts and suggestions with us, so we can add them here for everyone's benefit.
Re-preparation: Automatic preparation is only run in the first run of
the project on a system; to re-do the preparation you have to use the
option below. Here is the reason for this: when it is necessary, the
preparation process can be slow and will unnecessarily slow down the
whole project while the project is under development (when the focus
is on the analysis that is done after preparation). Because of this,
preparation will be done automatically only the first time that the
project is run (when .build/software/preparation-done.mk doesn't
exist). After the preparation process completes once, future runs of
./project make will not do the preparation process anymore (will not
call top-prepare.mk). They will only call top-make.mk for the
analysis. To manually invoke the preparation process after the first
attempt, the ./project make script should be run with the
--prepare-redo option, or you can delete the special file above.
./project make --prepare-redo
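The sentinel-file logic described above can be pictured with the following sketch. It is purely illustrative and is not the actual ./project script: the run_make function, its "redo" argument and the temporary build directory are hypothetical, and the echo lines only stand in for the calls to top-prepare.mk and top-make.mk.

```shell
# Illustrative sketch of "prepare only once, unless asked to redo".
build=$(mktemp -d)                           # Hypothetical stand-in for the build directory.
mkdir -p "$build/software"
sentinel="$build/software/preparation-done.mk"

run_make () {
    # $1 is "redo" to mimic '--prepare-redo'; anything else is a normal run.
    if [ "$1" = "redo" ] || [ ! -f "$sentinel" ]; then
        echo "preparation"                   # Stands in for calling top-prepare.mk.
        touch "$sentinel"                    # Record that preparation completed.
    fi
    echo "analysis"                          # Stands in for calling top-make.mk.
}

first=$(run_make normal)                     # First run: preparation, then analysis.
second=$(run_make normal)                    # Later runs: analysis only.
third=$(run_make redo)                       # Forced: preparation runs again.
printf '%s\n---\n%s\n---\n%s\n' "$first" "$second" "$third"
```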
Pre-publication: add notice on reproducibility: Add a notice somewhere prominent on the first page of your paper, informing the reader that your research is fully reproducible. For example at the end of the abstract, or under the keywords with a title like "reproducible paper". This will encourage readers to publish their own work in this manner and will also help spread the word.