Age | Commit message (Collapse) | Author | Lines |
|
When building Maneage inside a Docker container, in the end the users want
to extract the final outputs from the container into their host operating
system to inspect them more comfortably. So with this commit, a short
explanation has been added on how to do this.
We also noticed that it is much better if the 'Dockerfile' is stored and
run in an empty directory; otherwise, Docker will start parsing the full
directory and its subdirectories as the Docker image's environment.
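As a rough illustration of the extraction step (not necessarily the exact commands given in the README), the outputs can be copied out of a container created from the built image with `docker create', `docker cp' and `docker rm'. The function name, the image name and the paths are hypothetical, and the `DOCKER' variable is only an assumption added here to allow a dry run.

```shell
# Sketch only: the image name and paths given to this function are
# hypothetical. Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

extract_outputs () {
    image="$1"    # name of the built image
    src="$2"      # path of the outputs inside the container
    dest="$3"     # target path on the host

    # Create (but do not start) a container from the image.
    cid=$($DOCKER create "$image") || return 1

    # Copy the outputs from the container to the host.
    $DOCKER cp "$cid:$src" "$dest" || return 1

    # Remove the temporary container.
    $DOCKER rm "$cid" > /dev/null
}
```

A call would look like `extract_outputs IMAGE /path/in/container ./outputs', where all three arguments depend on your own Dockerfile.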
|
|
Docker is a "container" technology that allows an almost independent
operating system to run on the host. It is useful when the host OS doesn't
support some features or has internal problems (for example its C library
or C compiler have problems). Fortunately a Maneaged project can easily be
built within a Docker image and a minimal image operating system.
With this commit, a section has been added to 'README.md' to describe this
process. Each step of the Dockerfile is explained, to help users that may
not be too familiar with Docker, or help Docker users who are not familiar
with Maneage.
|
|
Possible semantic conflicts (that may not show up as Git conflicts but may
cause a crash in your project after the merge):
1) The project title (and other basic metadata) should be set in
'reproduce/analysis/conf/metadata.conf'. Please include this file in
your merge (if it is ignored because of '.gitattributes'!).
2) Consider importing the changes in 'initialize.mk' and 'verify.mk' if
you have added all analysis Makefiles to the '.gitattributes' file
(thus not merging any change in them with your branch). For example
with this command:
git diff master...maneage -- reproduce/analysis/make/initialize.mk
3) The old 'verify-txt-no-comments-leading-space' function has been
replaced by 'verify-txt-no-comments-no-space'. The new function will
also remove all white-space characters between the columns (not just
white space characters at the start of the line). Thus the resulting
check won't involve spacing between columns.
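To illustrate point 3, here is a minimal sketch in plain shell of a check that ignores comments and all white space. It is not the Make function itself: the function name and the use of `cksum' are assumptions made only for this example.

```shell
# Print a checksum of a plain-text table after removing comments, all
# white-space characters and any lines that end up empty. Hypothetical
# helper: the real check lives in a Make function and may differ.
checksum_no_comments_no_space () {
    sed -e 's/#.*$//' \
        -e 's/[[:space:]]//g' \
        -e '/^$/d' "$1" \
        | cksum \
        | awk '{print $1}'
}
```

With this kind of check, two tables that only differ in their comments or in the spacing between their columns give the same checksum.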
A common set of steps is always necessary to prepare a project for
publication. Until now, we would simply look at previous submissions and
try to follow them, but that was prone to errors and could cause
confusion. The internal infrastructure also didn't have some useful
features to make good publication possible. Now that the submission of a
paper fully devoted to the founding criteria of Maneage is complete
(arXiv:2006.03018), it was time to formalize the necessary steps for easier
submission of a project using Maneage and implement some low-level features
that can make things easier.
With this commit a first draft of the publication checklist has been added
to 'README-hacking.md'. It was tested in the submission of arXiv:2006.03018
and zenodo.3872248. To help guide users on implementing the good practices
for output datasets, the outputs of the default project shown in the paper
now use the new features. After reading the checklist, please inspect
these.
Some other relevant changes in this commit:
- The publication involves a copy of the necessary software
tarballs. Hence a new target ('dist-software') was also added to
package all the project's software tarballs in one tarball for easy
distribution.
- A new 'dist-lzip' target has been defined for those who want to
distribute an Lzip-compressed tarball.
- The '\includetikz' LaTeX macro now has a second argument to allow
configuring the '\includegraphics' call when the plot should not be
built, but just imported.
|
|
In time, some of the copyright license descriptions had been mistakenly
shortened to two paragraphs instead of the original three that are
recommended in the GPL. With this commit, they are corrected to be exactly
in the same three-paragraph format suggested by the GPL.
The following files also didn't have a copyright notice, so one was added
for them:
reproduce/software/make/README.md
reproduce/software/bibtex/healpix.tex
reproduce/analysis/config/delete-me-num.conf
reproduce/analysis/config/verify-outputs.conf
|
|
Until now, the primary Maneage URLs were under GitLab, but since we now
have a dedicated URL and Git repository, it's better to transfer to this as
soon as possible. Therefore with this commit, throughout Maneage, any place
that Maneage was referenced through GitLab has been corrected.
Please correct your project's remote to point to the new repository at
`git.maneage.org/project.git', and please make sure it follows the
`maneage' branch. There is no more `master' branch on Maneage.
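The remote update described above could be sketched as follows. The function name is hypothetical, the remote name is whatever your project already uses, and the `https://' scheme in front of the address is an assumption (the message above only gives the host and path).

```shell
# Point an existing remote of the repository in directory $1 to the new
# Maneage repository. $2 is the name of that remote in your project.
# The 'https://' scheme is an assumption.
update_maneage_remote () {
    repo="$1"
    remote="$2"
    git -C "$repo" remote set-url "$remote" https://git.maneage.org/project.git
}
```

After this, fetching from that remote and tracking its `maneage' branch is done with the usual Git commands of your own workflow.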
|
|
Until now, throughout Maneage we were using the old name of "Reproducible
Paper Template". But we have finally decided to use Maneage, so to avoid
confusion, the name has been corrected in `README-hacking.md' and also in
the copyright notices.
Note also that in `README-hacking.md', the main Maneage branch is now
called `maneage', and the main Git remote has been changed to
`https://gitlab.com/maneage/project' (this is a new GitLab Group that I
have set up for all Maneage-related projects). In this repository there is
only one `maneage' branch to avoid complications with the `master' branch
of the projects using Maneage later.
|
|
Until now, the main commands to run the project were these: `./project
configure' (to build the software), `./project prepare' (to possibly
arrange input datasets and build special configuration Makefiles) and
finally `./project make' to run the project.
The main logic behind the "prepare" phase (`top-prepare.mk') is to build
configuration files that can be fed into the "make" step and optimize its
operation. This helps, for example, when the number of inputs needed for
the majority of the analysis is smaller than the total number of
inputs. With "prepare" (when necessary), you go through the raw inputs and
select the ones that are necessary for the rest of the project. The output
of `top-prepare.mk' is a configuration file (a Make variable) that keeps
their IDs (numbers, names, etc.). That configuration file would then be used
in the `top-make.mk' to identify the lower level targets and allow optimal
project organization and management.
But the last two are both part of the analysis, and while they indeed need
different calls to Make to be executed, many projects don't actually need a
preparation phase: ultimately, it's an implementation choice by the project
developers and doesn't concern the project users (or the developers when
they are running it).
To avoid confusing the users, or simply annoying them when a project doesn't
need it, with this commit, the top-level `top-prepare.mk' and `top-make.mk'
Makefiles are called with the single `./project make' command and
`./project prepare' has been dropped. I noticed this while writing the
paper on this system.
|
|
Now that it's 2020, it's necessary to include this year in the copyright
statements.
|
|
In many real-world scenarios, `./project make' can really benefit from
having some basic information about the data before being run, for example
when querying a server. If we know how many datasets were downloaded and
their general properties, it can greatly optimize the process when we are
designing the solution to be run in `./project make'.
Therefore with this commit, a new phase has been added to the template's
design: `./project prepare'. In the raw template this is empty, because the
simple analysis done in the template doesn't warrant it. But everything is
ready for projects using the template to add preparation phases prior to
the analysis.
|
|
Until now, when the project's source was downloaded from something like
arXiv, in `README.md', we were instructing them to set the executable flags
of all the files that need it. But except for `./project', the reader
shouldn't have to worry about the project internals! Once it's executable,
`./project' can easily fix the executable flags of all the files that need
it automatically.
With this commit, in `README.md', we just instruct the reader to set the
executable flag of `./project' and any other file that needs an executable
flag is given one at the start of the set of commands for `./project
configure'. In customized projects, if an author needs executable flags on
any other files, they can easily add it there without involving the user.
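The idea can be sketched as a small shell function like the one below. This is only an illustration: the function name is hypothetical and the actual list of files is the one kept in the project's own scripts.

```shell
# Give an executable flag to every listed file that exists. Files that
# are missing are silently skipped.
set_exec_flags () {
    for f in "$@"; do
        if [ -f "$f" ]; then
            chmod +x "$f"
        fi
    done
}
```

An author who needs the flag on another file would only have to add that file to the list given to such a function, so the reader still only deals with `./project'.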
|
|
Konrad Hinsen pointed out that this part was missing from the instructions
in `README.md' after cloning. So it is added.
|
|
The two modifications to the LaTeX source of an arXiv-downloaded source
weren't rendered properly on GitLab, so they are corrected to be on the
same line and not have a separate code-block.
|
|
Until now, we were assuming that the users would just clone the project in
Git. But after submitting arXiv:1909.11230, and trying to build directly
from the arXiv source, I noticed several problems that wouldn't allow users
to build it automatically. So I tried the build step by step and was able
to find a fix for the several issues that came up.
The scripting parts of the fix were primarily related to the fact that the
unpacked arXiv tarball isn't under version control, so some checks had to
be put there. Also, we wanted to make it easy to remove the extra files, so
an extra `--clean-texdit' option was added to `./project'.
Finally, some manual corrections were necessary (prior to running
`./project'), which are now described in `README.md'. Most of the later
steps can be automated and we should do it later; I just don't have enough
time now.
|
|
Until now customizing it was a little more detailed, for example the
copyright statement wasn't generic and was about "this template". So the
user would have to correct it.
With this commit, the copyright statement just says "this project", so it
can apply to the raw template and also any customization of it. Also, some
minor edits were made in the various parts of the text to make it more
clear.
|
|
The Copyright year is now on a separate line (by adding a backslash), and
the `file-metadata' is now enclosed in two "`" characters to show
differently after rendering.
|
|
Until now, to work on a project, it was necessary to `./configure' it and
build the software. Then we had to run `.local/bin/make' to run the project
and do the analysis every time. If the project was a shared project between
many users on a large server, it was necessary to call the `./for-group'
script.
This way of managing the project had a major problem: since the user
directly called the lower-level `./configure' or `.local/bin/make' it was
not possible to provide high-level control (for example limiting the
environment variables). This was especially noticed recently with a bug
that was related to environment variables (bug #56682).
With this commit, this problem is solved using a single script called
`project' in the top directory. To configure and build the project, users
can now run these commands:
$ ./project configure
$ ./project make
To work on the project with other users in a group these commands can be
used:
$ ./project configure --group=GROUPNAME
$ ./project make --group=GROUPNAME
The old options to both configure and make the project are still valid. Run
`./project --help' to see a list. For example:
$ ./project configure -e --host-cc
$ ./project make -j8
The old `configure' script has been moved to
`reproduce/software/bash/configure.sh' and is called by the new `./project'
script. The `./project' script now just manages the options, then passes
control to the `configure.sh' script. For the "make" step, it also reads
the options, then calls Make. So at the lower level nothing has
changed. Only the `./project' script is now the single/direct user
interface of the project.
On a parallel note: as part of bug #56682, we also found out that on some
macOS systems, the `DYLD_LIBRARY_PATH' environment variable has to be set
to blank. This is no problem because RPATH is automatically set in macOS
and the executables and libraries contain the absolute address of the
libraries they should link with. But having `DYLD_LIBRARY_PATH' can
conflict with some low-level system libraries and cause very hard to debug
linking errors (like that reported in the bug report).
This fixes bug #56682.
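The kind of high-level control that a single wrapper makes possible can be sketched with `env -i', which starts a command with an empty environment. This is an assumption-laden illustration, not the way `./project' is actually written: the function name and the choice of which variables to keep are hypothetical.

```shell
# Run a command in a limited environment: only PATH is inherited and
# DYLD_LIBRARY_PATH is explicitly set to blank.
run_in_limited_env () {
    env -i PATH="$PATH" DYLD_LIBRARY_PATH="" "$@"
}
```

Because the user never calls the lower-level scripts directly, such a restriction only has to be written once, in the wrapper.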
|
|
All occurrences of "pipeline" have been changed to "project" or "template"
within the text (comments and READMEs) of the template. The
main template branch is now also named `template'.
This was all because `pipeline' is too generic and couldn't distinguish
between the base and the customized project.
|
|
Since `.file-metadata' is a binary file, we can't include a copyright
inside of it so we have to use `README.md' to mention its copyright and
license notice. However, this was not done clearly and is now corrected.
|
|
Until now, the files that people were meant to change didn't have a
proper copyright notice (for example `Copyright (C) YOUR NAME.'). This was
wrong because the license does not convey copyright ownership. So the name
of the file's original author must always be included, and people who
modify it add their own notice (for their own copyright-able modifications).
With this commit, the file's original author (and email) are added to the
copyright notice and when more than one person modified a file, both names
have their individual copyright notice.
Based on this, the description for adding a copyright notice in
`README-hacking.md' has also been modified.
|
|
Since `.file-metadata' is a binary file and we couldn't put a copyright
notice within it, it has been mentioned in `README.md' to have the same
copyright.
Also, the copyright modification step in `README-hacking.md' was brought to
a later step to be more clear that it should always be done (on new files
or files that are changed).
|
|
Until now, for short files, we only had a license notice, not an actual
copyright notice. With this commit, a copyright notice has also been
added. We use this new command to find these files, suggested by
`ineiev@gnu.org'.
|
|
In order to be more clear, a copyright statement was added to all the LaTeX
and README files.
|
|
To be more generic and recognizable, the `README-pipeline.md' file was
renamed to `README-hacking.md'. In essence, it is just that: to hack the
existing pipeline for your own project. We follow a similar naming
convention in many GNU software packages.
|
|
Until now, there was no reference to `README-pipeline.md' within the
`README.md' file. Since `README.md' is the first file that someone reads
and the basic purpose and structure of the pipeline is described in
`README-pipeline.md', it was necessary to bring it up there.
|
|
To be more clear, a link to this pipeline's dependency repository
has been added to `README.md'.
|
|
The README.md file was updated to reflect recent changes in the pipeline
(especially regarding the downloader).
|
|
A spellcheck was run on the two README files.
|
|
The note to the pipeline designers was corrected to display properly on
GitLab.
|
|
A placeholder link is now used in `README.md' to encourage the pipeline
designers to keep a backup of all the dependencies they use.
|
|
Until now, we were advising the users to rename the two README files
after cloning the project. This was because online Git browsers usually
display the `README.md' file, so we wanted the description of the pipeline
to be visible in the pipeline, and later when a project adopts it, they can
have their own `README.md'. But the problem is that any change in
`README.md' will later cause conflicts with a project's `README.md'. So we
are now using the same naming convention as the papers that use the
pipeline.
|
|
In the checklist, we are now defining the remote host of the repository at
an early stage. This is because we will need it in the `README.md' file
(which now has a placeholder `XXXXXXX' instead of a valid URL).
|
|
Until now, in the instructions, we were suggesting to run
`./.local/bin/make', but the `./' part is extra: this is already a
directory and so the shell will be able to find it. So to make things more
clear and easy to read/write, we removed the `./' part from the calls to
our custom Make installation.
|
|
When you point to this project, the `README.md' file is the default file
that opens on GitLab and other online git repositories. Since a
reproduction pipeline project is different from the actual pipeline, it's
best for the default text that opens to describe the paper, not the
pipeline. The old `README.md' is also kept, but it's now called
`README-pipeline.md'.
|
|
In the previous commit, we were recommending to fetch the work from this
pipeline. But since we have a separate `pipeline' branch, we can simply
check out that branch and pull all the recent changes. So with this
commit, the steps to get recent updates to the pipeline are updated.
|
|
Since working on the pipeline will evolve along with the projects that use
it, it can be useful for projects to fetch updates in the pipeline. So the
checklist in `README.md' has been updated to explain how to do this cleanly.
|
|
Until now, because we didn't build the dependencies internally, it was
important for the pipeline to be usable with any version of Make. But
because of the new installation of dependencies (including GNU Make), that
is no longer the case. So we can safely use GNU Make and this needs to be
mentioned in `README.md'.
|
|
GNU Coreutils are basic programs that can help in the configuration of
higher-level programs. Because of that, it was a dependency of almost all
software built in `dependencies.mk'. To make things more clear, easier to
read and faster (when building in parallel), the building of Coreutils is
now moved to the `dependencies-basic.mk' rules. There, it is built
along-side Bash. Since `dependencies-basic.mk' is run and completed before
`dependencies.mk', with this, we can be sure that Coreutils is present by
the time we want to build the higher-level programs.
Also, Zlib is now added as a dependency of Git (it is necessary for
its build).
|
|
After going through the checklist for starting a new project based on the
pipeline, I noticed some parts that could be modified to be more
clear. They are now applied.
|
|
Until now, we were using a customized `tar.lz' tarball for Gzip. But on
systems that don't have GNU Tar, this will cause a problem (non-GNU Tar
doesn't recognize `.tar.lz'). So to keep things simple, we are using the
customized gzip in `tar.gz' format. After the internal build of GNU Tar and
Lzip, the default method of unpacking (`tar xf XXXXX.tar.XX') will work
nicely on all the standard compression algorithms and we don't have to
modify our commands based on the algorithm (nice feature of GNU Tar).
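The point about not naming the compression algorithm can be sketched as below. The function name is hypothetical, and the sketch assumes a Tar implementation that detects the compression on its own (as GNU Tar does); for `.tar.lz' the Lzip program also has to be available.

```shell
# Unpack any supported tarball into a directory with the same command,
# letting Tar detect the compression from the file itself.
unpack () {
    tarball="$1"
    outdir="$2"
    mkdir -p "$outdir" && tar xf "$tarball" -C "$outdir"
}
```

The same call is used whether the file given to it is a `.tar.gz' or any other format that the installed Tar can detect.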
|
|
The two README files have been updated to explain the new feature of
downloading and building dependencies.
|
|
All the software used is now acknowledged in the template paper along with
its versions. This section is also mentioned in the check list, so users
don't delete it by mistake.
|
|
The version of all programs is now checked in
`reproduce/make/src/initialize.mk' and the pipeline won't complete if any
of the program versions change from those listed in
`reproduce/config/pipeline/dependency-versions.mk'.
Since the pipeline is systematically checking all program versions, we
don't need Gnuastro's `--onlyversion' option any more. So it (and all
references to it) has been removed.
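The logic of such a version check can be sketched in plain shell as below. This is not the rule in `initialize.mk': the function name and its three arguments are hypothetical and only show the comparison that stops the run.

```shell
# Fail (with a message on standard error) if the version that was found
# differs from the expected one. Arguments: program name, expected
# version, found version.
check_version () {
    name="$1"
    expected="$2"
    found="$3"
    if [ "$found" != "$expected" ]; then
        echo "$name: expected version $expected, but found $found" >&2
        return 1
    fi
}
```

Called once per program with the expected version taken from the versions file, a non-zero return from any call is enough to stop the run before the analysis starts.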
|
|
To enable easy/proper reproduction of results, all the high-level
dependencies are now built within the pipeline and installed in a fixed
directory that is added to the PATH of the Makefile. This includes GNU Bash
and GNU Make, which are then used to run the pipeline.
The `./configure' script will first build Bash and Make within itself, then
it will build
All the dependencies are also built to be static. So after they are built,
changing the system's low-level libraries (like the C library) won't change
the tarballs.
Currently the C library and C compiler aren't built within the pipeline,
but we'll hopefully add them to the build process also.
With this change, we now have full control of the shell and Make that will
be used in the pipeline, so we can safely remove some of the generalities
we had before.
|
|
After a full trial of the checklist, some further minor edits were made to
make it more clear.
|
|
Some minor changes were made to be more clear.
|
|
A few minor points were corrected in README.md.
|
|
A step was added close to the top of the checklist to remind people to
check the pipeline before making any changes. Also, the `--origin' option
was moved out of the `git clone' command into a separate command to rename
the origin branch. This helps in readability.
|
|
Until now, in the check list of `README.md', we were recommending to delete
the history of the pipeline and start your own history from that. But this
prevents users of the pipeline from keeping it up to date with new features
that are added to it.
With this commit, the main branch is now called `pipeline' (to allow users
to use `master' for their own research) and in the clone command, the
pipeline's remote is now called `pipeline-origin' (to allow the user to use
`origin' for their own remote).
|
|
While trying the checklist, I noticed that I had forgotten to add my name
after the copyright year and that `reproduce/src/make/paper.mk' still had
my own name on it. The copyright notice also said `script' instead of
`Makefile', which is now corrected.
|
|
While testing the reproduction pipeline on a small project, I noticed some
parts of the checklist that were either repetitive or needed to be
corrected. This is done with this commit.
|