Age | Commit message (Collapse) | Author | Lines |
|
SUMMARY: just house-cleaning, no need to do anything major in your
branch. Just update the copyright years in files that you have added.
Until now, the latest copyright years of the whole Maneage source code was
2022! As of this commit, we have already moved to 2023 for 5 months!
Furthermore, there were a few other minor issues that needed correction:
- The URL to download input datasets wasn't quoted in 'initialize.mk' or
the download script! As a result, when the input URL had characters that
are meaningful to the shell (like '&'), the download command would not
work.
- The only program that had 'make check' in the 'basic.mk' programs was
MPFR. At that stage, we still haven't built our own compiler at this
stage, this is not accurate.
- The 'pyerfa' and 'extension-helpers' packages in Python need
'setuptools_scm' on some systems. But until now, it was not in the list
of their prerequisites.
With this commit, all the issues above have been corrected.
|
|
SUMMARY: Nothing special is necessary for your existing projects. This
commit just addds two new features (read the commit description for more):
1. To provide a user and password to servers that need authentication
before they allow downloading of proprietary data,
2. To use the FITS Standard's DATASUM for file verification (for cases
where the file is not static on the server, and is generated upon
receiving your download request).
Until now, Maneage didn't have any infrastructure for databases that
require authentication (through a user or password, when calling
'wget'). Furthermore, when the downloaded file is automatically generated
by the server upon request, the server usually adds metadata (like file
date, or query number and etc) in the header. Therefore the simple SHA256
checksum of the file would differ on every download! This made it very hard
to verify if the data (not headers) are unchanged.
With this commit, both these problems have been addressed:
- Server authentication: the 'reproduce/software/config/LOCAL.conf' now
contains three new variables for this purpose. With them, you can give
your username and password, along with the authentication method of the
server. The comments on top of these three variables give a full
description of their usage.
- Verifying only the data in a file (ignoring the headers): The
'reproduce/analysis/config/INPUTS.conf' now accepts two new optional
variables for each input file using the FITS standard's DATASUM
convention: 'INPUT-%-fitsdatasum' and 'INPUT-%-fitshdu'. If the SHA256
isn't specified for a file, Maneage will use these to verify the
file. With the latter, you specify the HDU of the data you want to
verify and with the former you give the DATASUM value for that HDU. As
the name suggests, this is only valid for FITS files. If we find other
formats that support a similar behavior, we can add this feature for
those formats also. This is also thoroughly discussed in the comments of
'reproduce/analysis/config/INPUTS.conf'.
This commit was done with the help of Pedram Ashofte Ardakani, Sepideh
Eskandarlou and Mohammadreza Khellat.
|
|
Until now, when some configuration files were generated in the preparation
phase, only the auto-generated (in the preparation phase) Makefiles (with
the suffix '*.mk') were always included in 'initialize.mk'. As a
consequence, if there were any configuration files (with suffix '*.conf'),
they would not be automatically added, and it was necessary to manually
include them.
Since auto-generated configuration files are also one common output of the
preparation phase of a project, it is better to include them automatically.
With this commit, the '*.conf' configuration files generated in the
preparation phase are added by 'initialize.mk' automatically (if
necessary).
In the process, the comments in the final target of
'reproduce/analysis/make/prepare.mk' were updated to be more clear.
|
|
SUMMARY: no special action should be necessary; but its an important update
in low-level Maneage infra-structure (related with downloading and setting
input checksums).
Until now, we had a separate 'download.mk' as one of the default
sub-Makefiles that should have been loaded in all the 'top-*.mk' files
after 'initialize.mk'. This was due to historic reasons: until Commit
91799fe4b6d, we had to manually make some changes in 'download.mk' for
every input file we defined in 'INPUTS.mk' (which was very inconvenient,
and not easily possible for a large number of files!). But since Commit
91799fe4b6d, those manual changes are no longer necessary, and a normal
user will hardly ever need to touch the contents of 'download.mk' (which
also had one effective rule).
Furthermore, based on shared projects with Zohre Ghaffari and Sepideh
Eskandarlou (which involved a large number of large files), we recognized
that it is very inconvenient to download a file once, update its checksum,
and re-run Maneage (so the validation works). A robust solution was
necesary to let project authors download the data and automatically update
the checksum.
With this commit, to help in high-level project management in Maneage, the
single, and generic rule of 'download.mk' has been moved to
'initialize.mk', enabling us to fully remove this extra sub-Makefile from
Maneage's source.
Furthermore, with this commit, a usable solution to the automatic updating
of the checksum has also been implemented (which has been described in the
comments of 'INPUTS.conf'): the users can now set the checksum to
'--auto-replace--'. In this case, the download rule (now in
'initialize.mk') will automatically update that line of 'INPUTS.conf' and
add the checksum instead.
After './project make' is complete, when the user runs 'git diff', they can
see all the updated checksums in the source of their project and commit the
updated 'INPUTS.conf' into the source so this will not be necessary later.
Two other smaller issues have also been addressed in this commit:
- There was an extra ',' in the call to 'filter-out' when we defined
'prepare-dep' in 'reproduce/analysis/make/prepare.mk'. This would cause
a crash (with Make complaining that there is no rule for target
'initialize.mk,': notice the extra ','). With this commit, that extra
',' has been removed and the problem was solved.
- The build recipe of Imfit (in 'reproduce/software/make/high-level.mk'),
had two SPACE characters after '--no-openmp' which would make the
reading hard. They have been updated to one SPACE.
|
|
Until now, there were several portability issues in Maneage:
1. Maneage would crash on older operating systems (checked on Debian 6),
where Wget didn't have the '--no-use-server-timestamps'.
2. On a Linux kernel 2.6.32 (of the same Debian 6 above) some features in
'util-linux' (like 'swapon' or 'libmount') wouldn't build and wouldn't
let 'util-linux' complete. These features need root permissions to be
useful, so the wouldn't be used in Maneage any way! But they wouldn't
let Maneage get built
3. The './project shell' command would still read the host's '~/.bashrc',
letting the host environment leak-in to Maneage's interactive shell.
4. The building of Flex 2.64 wouldn't complete due to a segmentation
fault an Ubuntu, but NetPBM (which depends on Flex) would crash with a
wrong usage of 'yyunput'. This had actually caused a non-update to
Flex in a previous Maneage software update.
5. The update Astrometry.net would assume SExtractor's executable name is
'source-extractor'; causing a crash in usage. This forced the users to
manually create a 'source-extractor' symbolic link in the '.local/bin'
directory.
6. The 'reproduce/software/shell/tarball-prepare.sh' script (that is used
for making Maneage-standard tarballs) wouldn't accept option values
with an '=' between the option name and value! It also didnt' print
sufficiently informative messages and errors (for example it would say
"skipping ..." (making the user think there is a problem!), but it was
actually that the file already existed!
7. The 'reproduce/analysis/make/prepare.mk' and
'reproduce/analysis/make/verify.mk' Makefiles that needed to reject
some of the 'makesrc' sub-Makefiles would simply substitute their names
with nothing. But this would cause problems when the name is part of
the name of another sub-Makefile.
8. On the Debian 6 system mentioned above the raw 'df' command's output
wasn't in the expected format; so Maneage would fail to properly detect
the free space in the disk.
With these commit, all the issues above have been solved: for 1, A check
has been added to avoid using that option. For 2, those 'util-linux'
features have been disabled. For 3, the '--norc' and '--noprofile' options
have beed added to the call to Bash. For 4, see below. For 5, the symbolic
link is now automatically made with SExtractor. For 6, the option reading
components of that script have been fully re-written and more robust sanity
checks are also added, with more informative warnings. For 7, the 'subst'
function of Make was replaced with 'filter-out' and this fixed the
problem. For 8, 'df' is called with the '-P' option so it has a unified
format in all versions.
For 4, the versions of 'flex' and 'netpbm' have been updated. Since they
were the dependency of 'astrometrynet', that has also been updated. In the
process, we discovered that 'lzip' has a new version which claims to be
faster, so that is also updated.
lzip 1.22 --> 1.23
astrometrynet 0.85 --> 0.89
flex 2.6.4 --> 2.6.4-410-74a89fd
netpbm 10.73.39 --> 10.73.39
NetPBM needed some manual manipulation in its source (to remove the extra
line), so the necessary steps have been added to its build recipe in
'reproduce/software/make/high-level.mk'.
|
|
Until now, the '$(project-commit-hash)' Make variable of 'initialize.mk'
simply called 'git' to find the commit hash. However, due to one of the
recent software updates, we noticed that this command is no longer working
(and the project commit hash wasn't getting printed in the PDF)! The
problem was that Maneage's Git, couldn't find the 'libiconv' library that
it was built with.
With this commit, the '$(shell' command that calls Git, first exports
'LD_LIBRARY_PATH' to Maneage's software build directory. As a result, the
Git command can work and will report the commit as a LaTeX macro to be used
in the paper. To avoid relying on PATH outside of Make recipes, we now also
directly call the Git executable with Maneage.
Some other minor issues have been found and fixed in this commit:
- README-hacking.md: some minor edits and typo corrections.
- initialize.mk: the '$(curdir)' variable is now used in several places
that we were calling 'pwd'.
- versions.conf: 'xlsxio-version' now included with other programs. Until
now it was commented because GCC 11.1.0 had issues with it. However, GCC
11.2.0 doesn't have a problem any more, so it has been returned to the
list of all high-level programs.
- xorg.mk: used same format to comment recipe lines as the other Makefiles
(a '#' followed by a TAB).
- preamble-pgfplots.tex: lines to comment for building an EPS figure with
PGFPlots have been re-formatted to be more human-readable.
|
|
Until now, the './project make clean' command would only clean (remove) the
PDF file from the top source directory. However, if a user would run LaTeX
outside of Maneage, many extra latex output such as *.aux, *.log, *.synctex
and etc would be produced in the top source directory. These files can
interfere with './project make'.
With this commit, when './project make clean' is run, any possibly existing
LaTeX temporary files will also be deleted from the top source directory.
This problem was first reported by Matin Torkian.
|
|
SUMMARY: it is necessary to update your 'INPUTS.conf' and 'download.mk'.
Until now, adding an input file involved several steps that needed manual
(and inconvenient!) intervention: for every file, you needed to define four
variables in 'INPUTS.conf', and in 'reproduce/analysis/make/download.mk'
you had to use a (complex for large number of files) shell 'if/elif/else'
condition to link the names of the input files to those variables. Besides
inconvenience, this could cause bugs (typos!). Furthermore, a basic MD5
checksum was used for verifying the files.
With this commit, a new structure has been defined for 'INPUTS.conf' that
(thanks to some pretty useful GNU Make features), removes the need for
users to manually edit 'reproduce/analysis/make/download.mk', and reduces
the number of variables necessary for each file to three (from
four). Furthermore, we now use the SHA256 checksum for input data
validation.
Regarding the trick used in 'INPUTS.conf' (form the newly added description
in 'download.mk'): In GNU Make, '.VARIABLES' "... expands to a list of the
names of all global variables defined so far" (from the "Other Special
Variables" section of the GNU Make manual). Assuming that the pattern
'INPUT-%-sha256' is only used for input files, we find all the variables
that contain the input file names (the '%' is the filename). Finally, using
the pattern-substitution function ('patsubst'), we remove the fixed string
at the start and end of the variable name.
Steps you need to take:
- INPUTS.conf: translate your old format to the new format (after
carefully reading the description in the comments at the start of the
file). After applying the new standards, you don't need to use the
variables of 'INPUTS.conf' directly in your Makefiles! For example if
one of your input datasets is called 'abc.fits', the checksum variable
will be 'INPUT-abc.fits-sha256' and in your high-level Makefiles, you
can simply set '$(indir)/abc.fits' as a prerequisite (like you probably
did already).
- reproduce/analysis/make/download.mk: for the definition and rule of
'inputdatasets', simply use the Maneage branch, and remove anything you
had added in your project.
In the process, I also noticed that 'README-hacking.md' still referred to
'master' as the main project branch, while we have used 'main' in the paper
(and is the common convention with Git).
|
|
Until now, the definition of the prepare directory was wrong (not in the
'analysis' directory of the build directory). I noticed this after an
update of the Maneage branch of one project that requires the prepare step.
With this commit, this problem has been fixed.
|
|
This commit primarily affects the configuration step of Maneage'd projects,
and in particular, updated versions of the many of the software (see
P.S.). So it shouldn't affect your high-level analysis other than the
version bumps of the software you use (and the software's possibly
improve/changed behavior).
The following software (and thus their dependencies) couldn't be updated as
described below:
- Cryptography: isn't building because it depends on a new
setuptools-rust package that has problems
(https://savannah.nongnu.org/bugs/index.php?61731), so it has been
commented in 'versions.conf'.
- SecretStorage: because it depends on Cryptography.
- Keyring: because it depends on SecretStorage.
- Astroquery: because it depends on Keyring.
This is a "squashed" commit after rebasing a development branch of 60
commits corresponding to a roughly two-month time interval. The following
people contributed to this branch.
- Boudewijn Roukema added all the R software infrastructure and the R
packages, as well as greatly helping in fixing many bugs during the
update.
- Raul Infante-Sainz helped in testing and debugging the build.
- Pedram Ashofteh Ardakani found and fixed a bug.
- Zahra Sharbaf helped in testing and found several bugs.
Below a description of the most noteworthy points is given.
- Software tarballs: all updated software now have a unified format
tarball (ustar; if not possible, pax) and unified compression (Lzip) in
Maneage's software repository in Zenodo
(https://doi.org/10.5281/zenodo.3883409). For more on this See
https://savannah.nongnu.org/task/?15699 . This won't affect any extra
software you would like to add; you can use any format recognized by
GNU Tar, and all common compression algorithms. This new requirement is
only for software that get merged to the core Maneage branch.
- Metastore (and thus libbsd and libmd) moved to highlevel: Metastore
(and the packages it depends on) is a high-level product that is only
relevant during the project development (like Emacs!): when the user
wants the file meta data (like dates) to be unchanged after checking
out branches. So it should be considered a high-level software, not
basic. Metastore also usually causes many more headaches and error
messages, so personally, I have stopped using it! Instead I simply
merge my branches in a separate clone, then pull the merge commit: in
this way, the files of my project aren't re-written during the checkout
phase and therefore their dates are untouched (which can conflict with
Make's dates on configuration files).
- The un-official cloned version of Flex (2.6.4-91 until this commit) was
causing problems in the building of Netpbm, so with this commit, it has
been moved back to version 2.6.4.
- Netpbm's official page had version 10.73.38 as the latest stable
tarball that was just released in late 2021. But I couldn't find our
previously-used version 10.86.99 anywhere (to see when it was released
and why we used it! Its at last more than one year old!). So the
official stable version is being used now.
- Improved instructions in 'README.md' for building software environment
in a Docker container (while having project source and output data
products on the local system; including the usage of the host's
'/dev/shm' to speed up temporary operations).
- Until now, the convention in Maneage was to put eight SPACE characters
before the comment lines within recipes. This was done because by
default GNU Emacs (also many other editors) show a TAB as eight
characters. However, in other text editors, online browsers, or even
the Git diff, a TAB can correspond to a different number of
characters. In such cases, the Maneage recipes wouldn't look too
interesting (the comments and the recipe commands would show a
different indentation!).
With this commit, all the comment lines in the Makefiles within the
core Maneage branch have a hash ('#') as their first character and a
TAB as the second. This allows the comment lines in recipes to have the
same indentation as code; making the code much more easier to read in a
general scenario including a 'git diff' (editor agnostic!).
P.S. List of updated software with their old and new versions
- Software with no version update are not mentioned.
- The old version of newly added software are shown with '--'.
Name (Basic) Old version New version
------------ ----------- -----------
Bzip2 1.0.6 1.0.8
CURL 7.71.1 7.79.1
Dash 0.5.10.2 0.5.11.5
File 5.39 5.41
Flock 0.2.3 0.4.0
GNU Bash 5.0.18 5.1.8
GNU Binutils 2.35 2.37
GNU Coreutils 8.32 9.0
GNU GCC 10.2.0 11.2.0
GNU M4 1.4.18 1.4.19
GNU Readline 8.0 8.1.1
GNU Tar 1.32 1.34
GNU Texinfo 6.7 6.8
GNU diffutils 3.7 3.8
GNU findutils 4.7.0 4.8.0
GNU gmp 6.2.0 6.2.1
GNU grep 3.4 3.7
GNU gzip 1.10 1.11
GNU libunistring 0.9.10 1.0
GNU mpc 1.1.0 1.2.1
GNU mpfr 4.0.2 4.1.0
GNU nano 5.2 6.0
GNU ncurses 6.2 6.3
GNU wget 1.20.3 1.21.2
Git 2.28.0 2.34.0
Less 563 590
Libxml2 2.9.9 2.9.12
Lzip 1.22-rc2 1.22
OpenSLL 1.1.1a 3.0.0
Patchelf 0.10 0.13
Perl 5.32.0 5.34.0
Podlators -- 4.14
Name (Highlevel) Old version New version
---------------- ----------- -----------
Apachelog4cxx 0.10.0-603 0.12.1
Astrometry.net 0.80 0.85
Boost 1.73.0 1.77.0
CFITSIO 3.48 4.0.0
Cmake 3.18.1 3.21.4
Eigen 3.3.7 3.4.0
Expat 2.2.9 2.4.1
FFTW 3.3.8 3.3.10
Flex 2.6.4-91 2.6.4
Fontconfig 2.13.1 2.13.94
Freetype 2.10.2 2.11.0
GNU Astronomy Utilities 0.12 0.16.1-e0f1
GNU Autoconf 2.69.200-babc 2.71
GNU Automake 1.16.2 1.16.5
GNU Bison 3.7 3.8.2
GNU Emacs 27.1 27.2
GNU GDB 9.2 11.1
GNU GSL 2.6 2.7
GNU Help2man 1.47.11 1.48.5
Ghostscript 9.52 9.55.0
ICU -- 70.1
ImageMagick 7.0.8-67 7.1.0-13
Libbsd 0.10.0 0.11.3
Libffi 3.2.1 3.4.2
Libgit2 1.0.1 1.3.0
Libidn 1.36 1.38
Libjpeg 9b 9d
Libmd -- 1.0.4
Libtiff 4.0.10 4.3.0
Libx11 1.6.9 1.7.2
Libxt 1.2.0 1.2.1
Netpbm 10.86.99 10.73.38
OpenBLAS 0.3.10 0.3.18
OpenMPI 4.0.4 4.1.1
Pixman 0.38.0 0.40.0
Python 3.8.5 3.10.0
R 4.0.2 4.1.2
SWIG 3.0.12 4.0.2
Util-linux 2.35 2.37.2
Util-macros 1.19.2 1.19.3
Valgrind 3.15.0 3.18.1
WCSLIB 7.3 7.7
Xcb-proto 1.14 1.14.1
Xorgproto 2020.1 2021.5
Name (Python) Old version New version
------------- ----------- -----------
Astropy 4.0 5.0
Beautifulsoup4 4.7.1 4.10.0
Beniget -- 0.4.1
Cffi 1.12.2 1.15.0
Cryptography 2.6.1 36.0.1
Cycler 0.10.0 0.11.0+}
Cython 0.29.21 0.29.24
Esutil 0.6.4 0.6.9
Extension-helpers -- 0.1
Galsim 2.2.1 2.3.3
Gast -- 0.5.3
Jinja2 -- 3.0.3
MPI4py 3.0.3 3.1.3
Markupsafe -- 2.0.1
Numpy 1.19.1 1.21.3
Packaging -- 21.3
Pillow -- 8.4.0
Ply -- 3.11
Pyerfa -- 2.0.0.1
Pyparsing 2.3.1 3.0.4
Pythran -- 0.11.0
Scipy 1.5.2 1.7.3
Setuptools 41.6.0 58.3.0
Six 1.12.0 1.16.0
Uncertainties 3.1.2 3.1.6
Wheel -- 0.37.0
Name (R) Old version New version
-------- ----------- -----------
Cli -- 2.5.0
Colorspace -- 2.0-1
Cowplot -- 1.1.1
Crayon -- 1.4.1
Digest -- 0.6.27
Ellipsis -- 0.3.2
Fansi -- 0.5.0
Farver -- 2.1.0
Ggplot2 -- 3.3.4
Glue -- 1.4.2
GridExtra -- 2.3
Gtable -- 0.3.0
Isoband -- 0.2.4
Labeling -- 0.4.2
Lifecycle -- 1.0.0
Magrittr -- 2.0.1
MASS -- 7.3-54
Mgcv -- 1.8-36
Munsell -- 0.5.0
Pillar -- 1.6.1
R-Pkgconfig -- 2.0.3
R6 -- 2.5.0
RColorBrewer -- 1.1-2
Rlang -- 0.4.11
Scales -- 1.1.1
Tibble -- 3.1.2
Utf8 -- 1.2.1
Vctrs -- 0.3.8
ViridisLite -- 0.4.0
Withr -- 2.4.2
|
|
Summary:
- Use the new name of this variable in your Makefiles.
- In 'metadata.conf', remove fixed URL prefixes for DOIs
('https://doi.org/') or arXiv ('https://arxiv.org/abs').
Until now, the Make variable that would print the general metadata (of
whole project) into each to-be-published dataset was called
'print-copyright'! But it now does much more than simply printing the
copyright, it will also print a lot of metadata like arXiv ID, Zenodo DOI
and etc into plain-text outputs. The out-dated name could thus be
misleading and cause confusions.
With this commit, the variable is therefore called
'print-general-metadata'. After merging your project with the Maneage
branch, please replace any usage of 'print-copyright' to
'print-general-metadata'.
Also with this commit, 'README-hacking.md' mentions 'metadata.conf' and
'print-general-metadata' in the "Publication checklist" section and reminds
you to keep the first up to date, and use the second in your
to-be-published datasets.
|
|
When built in 'group' mode, the write permissions of all created files will
be activated for a certain group of users in the host operating system. The
user specifies the name of the group with the '--group' option at configure
time. At the very start, the './project' script checks to see if the given
group name actually exists or not (to avoid hard-to-debug errors popping up
later).
Until now, the checking 'sg' command (that was used to build the project
with group-writable permissions) would always fail due to the excessive
number of redirections. Therefore, it would always print the error message
and abort.
With this commit, the output of 'sg' is no longer re-directed (which also
helps users in debuggin). If the group does actually exist, it will just
print a small statement saying so, and if it fails, the error message is
printed. This fixed the problem, allowing maneage to be built in
group-mode.
I also noticed that the variable name keeping the group name
('reproducible_paper_group_name') used the old name for the project (which
was "Reproducible paper template"! So it has been changed/corrected to
'maneage_group_name'.
|
|
In the previous commit, some Gnuastro-specific initializations were
removed but a few more cases remained that are removed with this
commit.
|
|
Until now, important LaTeX packages like 'caption' (for managing figure
captions), 'hyperref' (for managing links) and 'xcolor' (for managing
colors) were being loaded inside the optional
'tex/src/preamble-maneagge-defualt-style.tex' file. We recommend to remove
this file from loading when you use custom journal sytels. However, these
packages will often be necessary after loading special journal styles also.
With this commit, these packages are now loaded into LaTeX as part of the
'tex/src/preamble-project.tex' file. This file is in charge of LaTeX
settings that are custom to the project and independent of its style.
Several other small corrections are made with this commit:
- I noticed that './project make texclean' crashes if no PDF exists in the
working directory! So a '-f' was added to the 'rm' command of the
'texclean' rule.
- As part of the LaTeX Hyperref, we can set general metadata or properties
for the PDF (that aren't written into the printable PDF, but into the
file metadata). They can be viewed in many PDF viewers as PDF
properties. Until now, we were only using the '\projecttitle' macro here
to write the paper's title. However, thanks to the recently added
'reproduce/analysis/config/metadata.conf', we now have a lot of useful
information that can also go here. So the 'metadata-copyright-owner' is
now used to define the PDF author, and the project's
'metadata-git-repository' and commit hash are written into the PDF
subject. But to import these, it was necessary to define them as LaTeX
macros, hence the addition of these macros in 'initialize.mk'.
- Some extra packages that aren't necessary to build the default PDF were
removed in 'preamble-project.tex'.
|
|
Until now, when you ran './project make dist', first it would delete the
temporary files (like files ending in '~' or '.swp' created by some
editors), then it had a place to add project-specific operations for the
distribution.
However, in the process of cleaning the temporary files, it would 'cd' into
the directory that would later be packaged. So project-specific operations
would first have to 'cd' back into the top source directory. This was prone
to hard-to-find bugs.
With this commit, to avoid the problem the project-specific operations are
now placed before the cleaning phase. This is also technically good because
in the project-specific operations there may also be temporary files that
shouldn't go into the distribution tarball.
|
|
Until now, the build directory contained a 'software/' directory (that
hosted all the built software), a 'tex/' subdirectory for the final
building of the paper, and many other directories containing
intermediate/final data of the specific project. But this mixing of built
software and data is against our modularity and minimal complexity
principles: built software and built data are separate things and keeping
them separate will enable many optimizations.
With this commit, the build directory of the core Maneage branch will only
contain two sub-directories: 'software/' and 'analysis/'. The 'software/'
directory has the same contents as before and is not touched in this
commit. However, the 'analysis/' directory is new and everything created in
the './project make' phase of the project will be created inside of this
directory.
To facilitate easy access to these top-level built directories, two new
variables are defined at the top of 'initialize.mk': 'badir', which is
short for "built-analysis directory" and 'bsdir', which is short for
"built-software directory".
HOW TO IMPLEMENT THIS CHANGE IN YOUR PROJECT. It is easy: simply replace
all occurances of '$(BDIR)' in your project's subMakefiles (except the ones
below) to '$(badir)'. To confirm if everything is fine before building your
project from scratch after merging, you can run the following command to
see where 'BDIR' is used and confirm the only remaning cases.
$ grep -r BDIR reproduce/analysis/*
--> make/verify.mk: innobdir=$$(echo $$infile | sed -e's|$(BDIR)/||g'); \
--> make/initialize.mk:badir=$(BDIR)/analysis
--> make/initialize.mk:bsdir=$(BDIR)/software
--> make/initialize.mk: $$sys_rm -rf $(BDIR)
--> make/top-prepare.mk:all: $(BDIR)/software/preparation-done.mk
'BDIR' should only be present in lines of the files above. If you see
'$(BDIR)' used anywhere else, simply change it to '$(badir)'. Ofcourse, if
your project assumes BDIR in other contexts, feel free to keep it, it will
not conflict. If anything un-expected happens, please post a comment on the
link below (you need to be registered on Savannah to post a comment):
https://savannah.nongnu.org/task/?15855
One consequence of this change is that the 'analysis/' subdirectory can be
optionally mounted on a separate partition. The need for this actually came
up for some new users of Maneage in a Docker image. Docker can fix
portability problems on systems that we haven't yet supported (even
Windows!), or had a chance to fix low-level issues on. However, Docker
doesn't have a GUI interface. So to see the built PDF or intermediate data,
it was necessary to copy the built data to the host system after every
change, which is annoying during working on a project. It would also need
two copies of the source: one in the host, one in the container. All these
frustrations can be fixed with this new feature.
To describe this scenario, README.md now has a new section titled "Only
software environment in the Docker image". It explains step-by-step how you
can make a Docker image to only host the built software environment. While
your project's source, software tarballs and 'BDIR/analysis' directories
are on your host operating system. It has been tested before this commit
and works very nicely.
|
|
Until now there was only a 'clean' (to delete all files created during the
'make' phase) and the 'distclean' (to delete all files during configuration
and make). But sometimes we don't want to delete all the files created
during the full 'make' phase, we only want to delete the files that were
created by LaTeX for building the paper.
Witht this commit, a new target has been added for this job. You can now
run the following command for this job:
./project make texclean
Only the files in '$(BDIR)/tex/build' will be deleted (and the 'tikz'
directory under that location is recreated, ready for a future build).
|
|
Having entered 2021, it was necessary to update the copyright years at the
top of the source files. We recommend that you do this for all your
project-specific source files also.
|
|
Until now, there was no warning when the 'maneage' branch didn't exist in
the Git history. This can happen when you forget to push the 'maneage'
branch to a remote for your project, and you later clone your project from
that remote (for example on another computer). We use the 'maneage' branch
to report the latest commit hash and date in the final paper (which can
greatly help future readers). Since we check the 'maneage' branch on every
run of './project make' (in 'initialize.mk') this would result in a printed
statement like this:
fatal: Not a valid object name maneage
Also until now, the description of what to do when TeXLive wasn't installed
properly wasn't complete: it didn't mention that it is necessary to delete
the TeXLive target files. This could confuse users (they would re-run
'./project configure -e', but with no effect).
With this commit, for the 'maneage' branch issue a complete warning will be
printed. Telling the user what to do to get the 'maneage' branch (and thus
fix this warning). Also, the LaTeX macros that go in the paper are now red
when the 'maneage' branch doesn't exist, telling the user to see the
printed warning (thus encouraging the user to get the branch). For the
TeXLive issue, the necessary commands to run are now also printed in the
warning.
|
|
Until now, Maneage only provided the commit hashes (of the project and
Maneage) as LaTeX macros to use in your paper. However, they are too
cryptic and not really human friendly (unless you have access to the Git
history on a computer).
With this commit, to make things easier for the readers, the date of both
commits are also available as LaTeX macros for use in the paper. The date
of the Maneage commit is also included in the acknowledgements.
Also, the paragraph above the acknowledgements has been updated with better
explanation on why adding this acknowledgement in the science papers is
good/necessary.
|
|
This only concerns the TeX sources in the default branch. In case you don't
use them, there should only be a clean conflict in 'paper.tex' (that is
obvious and easy to fix). Conflicts may only happen in some of the
'tex/src/preamble-*.tex' files if you have actually changed them for your
project. But generally any conflict that does arise by this commit with
your project branch should be very clear and easy to fix and test.
In short, from now on things will even be easier: any LaTeX configuration
that you want to do for your project can be done in
'tex/src/preamble-project.tex', so you don't have to worry about any other
LaTeX preamble file. They are either templates (like the ones for PGFPlots
and BibLaTeX) or low-level things directly related to Maneage. Until now,
this distinction wasn't too clear.
Here is a summary of the improvements:
- Two new options to './project make': with '--highlight-new' and
'--highlight-notes' it is now possible to activate highlighting on the
command-line. Until now, there was a LaTeX macro for this at the start
of 'paper.tex' (\highlightchanges). But changing that line would change
the Git commit hash, making it hard for the readers to trust that this
is the same PDF. With these two new run-time options, the printed commit
hash will not changed.
- paper.tex: the sentences are formatted as one sentence per line (and one
line per sentence). This helps in version controlling narrative and
following the changes per sentence. A description of this format (and
its advantages) is also included in the default text.
- The internal Maneage preambles have been modified:
- 'tex/src/preamble-header.tex' and 'tex/src/preamble-style.tex' have
been merged into one preamble file called
'tex/src/preamble-maneage-default-style.tex'. This helps a lot in
simply removing it when you use a journal style file for example.
- Things like the options to highlight parts of the text are now put in
a special 'tex/src/preamble-maneage.tex'. This helps highlight that
these are Maneage-specific features that are independent of the style
used in the paper.
- There is a new 'tex/src/preamble-project.tex' that is the place you
can add your project-specific customizations.
|
|
Until now, when the 'pdf-build-final' configuration variable (defined in
'reproduce/analysis/config/pdf-build.conf') was given any string a PDF
would be built. This was very confusing, because people could put a 'no'
and the PDF would still be built!
With this commit, only when this variable has a value of 'yes' will the PDF
be built. If given any other string (or no string at all), it will not
produce a PDF.
This issue was reported by Zahra Sharbaf.
|
|
The LaTeX macro files for these two subMakefiles are created on every run
of './project make'. So their commands are also printed every time and
hardly ever will a normal user want to modify or change these.
So to avoid populating the standard output of a Maneaged project with all
these extra lines every time (possibly getting mixed with the important
analysis or LaTeX outputs), an '@' has been placed at the start of the
recipes. With an '@' at the start of the recipe, Make is instructed to not
print the commands it wants to run in the standard output.
|
|
Until now, the core Maneage branch included some configuration files for
Gnuastro's programs. This was actually a remnant of the distant past when
Maneage didn't actually build its own software and we had to rely on the
host's software versions. This file contained the configuration files
specific to Gnuastro for this project and also had a feature to avoid
checking the host's own configuration files.
However, we now build all our software ourselves with fixed configuration
files (for the version that is being installed and its version is
stored). So those extra configuration files were just extra and caused
confusion and problems in some scenarios. With this commit, those extra
files are now removed.
Also, two small issues are also addressed in parallel with this commit:
- When running './project make clean', the 'hardware-parameters.tex' macro
file (which is created by './project configure' is not deleted.
- The project title is now written into the default output's PDF's
properties (through 'hypersetup' in 'tex/src/preamble-header.tex')
through the LaTeX macro.
All these issues were found and fixed with the help of Samane Raji.
|
|
Until now, no machine-related specifications were being documented in the
workflow. This information can become helpful when observing differences in
the outcome of both software and analysis segments of the workflow by
others (some software may behave differently based on host machine).
With this commit, the host machine's 'hardware class' and 'byte-order' are
collected and now available as LaTeX macros for the authors to use in the
paper. Currently it is placed in the acknowledgments, right after
mentioning the Maneage commit.
Furthermore, the project and configuration scripts are now capable of
dealing with input directory names that have SPACE (and other special
characters) by putting them inside double-quotes. However, having spaces
and metacharacters in the address of the build directory could cause
build/install failure for some software source files which are beyond the
control of Maneage. So we now check the user's given build directory
string, and if the string has any '@', '#', '$', '%', '^', '&', '*', '(',
')', '+', ';', and ' ' (SPACE), it will ask the user to provide a different
directory.
|
|
One of the LaTeX macros reported by 'initialize.mk' is the git commit hash
of the most recent 'maneage' branch that the project has been branched
from. However, not all projects will retain the maneage reference. This can
happen for example when people don't push the 'maneage' reference to their
repository and then clone from their own repository to a second
computer. Therefore, until now, in such situations, Maneage would break
with an error.
With this commit, in such scenarios, a place holder string is used instead,
clearly highlighting that there is no 'maneage' reference.
|
|
There are many different directory trees involved in Maneage system: the
top directory, the 'reproduce/' directory and its sub-directories,
'.build/' (that point to a user-defined build area), and a possibly
user-defined input directory. Until now, in the case of a download checksum
failure, it was not immediately obvious [1] to the user *where* the file
with a failed checksum is.
To clarify to the user *where* the suspicious file is now located, this
commit adds a line to 'reproduce/analysis/make/download.mk' to print out
this full path location: '$$unchecked' along with the expected and
calculated checksums.
[1] Euphemism for me spending lots of time debugging and being confused.
|
|
This commit clarifies the initial usage of Zenodo for reserving a Zenodo
identifier and starting an 'unpublished' upload. Some other minor wording
changes are done here.
|
|
Until this commit, the '$(project-package-contents)' rules in
'reproduce/analysis/make/initialize.mk' included a line to provide all
contents, recursively, of the directory 'reproduce/' in the package for
further distribution.
This could potentially lead to the distribution of private working files
that are used during development and not intended for general distribution.
With this commit, only those files in 'reproduce/' and 'tex/src' that are
under version control are copied to the temporary directory (that is later
used for creating an archive). With this change, the archiving commands
actually became more clean (we don't have to manually remove 'LOCAL.conf'
or other temporary files). Extensive comments have also been added above
each step to clarify each step's purpose and method.
|
|
Until now the './project make dist' command implicitly assumed that the
'tex/tikz' directory always contains PDF files (because of the 'cp
tex/tikz/*.pdf $$dir/tex/tikz' line). This was annoying for projects that
don't use TiKZ or PGFPlots to generate their plots, and they had to
manually comment this line.
With this commit a check has been placed to see if any PDF files exist in
there at all. If there aren't PDF files, the 'cp' command above is ignored.
|
|
Until now, when the bibliography file ('paper.bbl') had a LaTeX-related
error (for example the journal name was a LaTeX macro that isn't defined),
the first 'pdflatex' command that is run before 'biber' would crash, not
allowing the project to reach 'biber'. So the user would have to manually
remove 'paper.bbl' before running './project make'.
With this commit, we remove any possibly existing 'paper.bbl' file before
rebuilding it. Generally, this also helps in keeping things clean during
the generation of the new bibliography.
This bug was found by Mahdieh Nabavi.
|
|
To help in the documentation, the Git hash of the Maneage branch commit
that the project has most recently merged with (or branched from) is now
also provided as a LaTeX macro ('\maneageversion').
It is calculated in 'reproduce/analysis/make/initialize.mk' (in the recipe
to 'initialize.tex').
|
|
Until now, the dataset's configuration names had a 'WFPC2' prefix. But this
very alien to anyone that is not familiar with the history of the Hubble
Space Telescope (the camera is no longer used! Its just used here since its
one of the standard FITS files from the FITS standard webpage).
With this commit the variable names have been modified to be more readable
and clear (having a 'DEMO-' prefix). Also the comments of 'INPUTS.conf'
(describing the purpose of each variable) were edited and made more clear.
|
|
Until now, when the user wanted to complete remove all built files
(including software), the './project make distclean' command would fail if
the git hooks weren't installed. They are present when the project's
configuration has been successfully finished, but this bug can happen when
trying to re-do an incomplete build.
With this commit, this is fixed by adding an '-f' has been added before the
'rm' command for the Git hooks.
|
|
Until this commit, the file `BDIR/software/preparation-done.mk' were not
removed when cleaning the project with `./project make clean'. This file
is generated in the preparation of the data during the analysis step.
However, the cleaning is expected to remove anything generated in the
analysis process! Step by step, with the commands:
./project make ---> Will make the preparation and analysis
./project make clean ---> Will remove all analysis outputs (but
not `preparation-done.mk')
./project make ---> Won't do the preparation, only analysis!
However, in the last step it should do the preparation again, because
the input data could have change for any reason. With this commit, the
file `BDIR/software/preparation-done.mk' is removed when cleaning the
project, and consequently, in the analysis step the input data is
prepared.
|
|
The 'pdflatex' program is used to build the default Maneage-branch paper.
But since the default paper uses PGFPlots to build the figures within LaTeX
as an external PDF, PGFPlots requires 'pdflatex' to be called with the
'-shell-escape' option. Generally, this option can be considered as a
security risk (in particular when 'pdflatex' is being run by an external
LaTeX file: a malicious LaTeX writer may embed commands in the LaTeX source
that will be executed on the host if this option is present).
This is not too serious of an issue in Maneage, because when someone runs
Maneage, they intentionally let it run many on their system. Hence if
someone wants to exploit a host system, they can add the necessary commands
long before 'pdflatex' is run. After all, all commands in Maneage are run
with the calling user's permissions, hence they have access to many parts
of the user's accounts. If someone is worried about security on a
non-trusted Maneage project they should act the same as they do with any
software: define a new user for it, and call it with that user (as a
weak-level security), or run it in a virtual machine or container.
However, since this option has been explicity mentioned as a security risk
before, it helps if we have a comment explaining its usage in 'paper.mk'.
With this commit, the concerned user will read a brief explanation and can
read the brief discussion at [1] and possibly re-open the discussion or
propose ways of mitigating the security risk(s).
[1] https://savannah.nongnu.org/task/?15694
|
|
When publishing a project, it is necessary to also publish the source code
of all necessary software of the project. We had recently added a new
'./project make' target called 'dist-software' for this job, but had
forgotten to add it in the output of './project --help'! There was also a
small bug inside of it that didn't allow the successful copying of the
created tarball to the top project directory.
With this commit, an explanation for this target has been added in the
output of './project --help' and that bug has been fixed.
|
|
Summary of possible semantic conflicts
1. The recipe to download input datasets has been modified. You have to
re-set the old 'origname' variable to 'localname' (to avoid confusion)
and the default dataset URL should now be complete (including the
actual filename). See the newly added descriptions in 'INPUTS.conf' for
more on this.
Until now, when the dataset was already present on the host system, a link
couldn't be made to it, causing the project to crash in the checksum
phase. This has been fixed with properly naming the main variable as
'localname' to avoid the confusion that caused it.
Some other problems have been fixed in this recipe in the meantime:
- When the checksum is different, the expected and calculated checksums
are printed.
- In the default paper, we now print the full URL of the dataset, not just
the server, so the checksum of the 'download.tex' step has been updated.
|
|
Until now, in the 'print-copyright' function of 'initialize.mk' (that
prints a fixed set of common meta necessary in plain-text files), we were
simply printing this line:
# Pre-print server: arXiv:1234.56789
But given that all the other elements are click-able URLs, it now prints:
# Pre-print server: https://arxiv.org/abs/1234.56789
|
|
Possible semantic conflicts (that may not show up as Git conflicts but may
cause a crash in your project after the merge):
1) The project title (and other basic metadata) should be set in
'reproduce/analysis/conf/metadata.conf'. Please include this file in
your merge (if it is ignored because of '.gitattributes'!).
2) Consider importing the changes in 'initialize.mk' and 'verify.mk' (if
you have added all analysis Makefiles to the '.gitattributes' file
(thus not merging any change in them with your branch). For example
with this command:
git diff master...maneage -- reproduce/analysis/make/initialize.mk
3) The old 'verify-txt-no-comments-leading-space' function has been
replaced by 'verify-txt-no-comments-no-space'. The new function will
also remove all white-space characters between the columns (not just
white space characters at the start of the line). Thus the resulting
check won't involve spacing between columns.
A common set of steps are always necessary to prepare a project for
publication. Until now, we would simply look at previous submissions and
try to follow them, but that was prone to errors and could cause
confusion. The internal infrastructure also didn't have some useful
features to make good publication possible. Now that the submission of a
paper fully devoted to the founding criteria of Maneage is complete
(arXiv:2006.03018), it was time to formalize the necessary steps for easier
submission of a project using Maneage and implement some low-level features
that can make things easier.
With this commit a first draft of the publication checklist has been added
to 'README-hacking.md', it was tested in the submission of arXiv:2006.03018
and zenodo.3872248. To help guide users on implementing the good practices
for output datasets, the outputs of the default project shown in the paper
now use the new features). After reading the checklist, please inspect
these.
Some other relevant changes in this commit:
- The publication involves a copy of the necessary software
tarballs. Hence a new target ('dist-software') was also added to
package all the project's software tarballs in one tarball for easy
distribution.
- A new 'dist-lzip' target has been defined for those who want to
distribute an Lzip-compressed tarball.
- The '\includetikz' LaTeX macro now has a second argument to allow
configuring the '\includegraphics' call when the plot should not be
built, but just imported.
|
|
In time, some of the copyright license description had been mistakenly
shortened to two paragraphs instead of the original three that is
recommended in the GPL. With this commit, they are corrected to be exactly
in the same three paragraph format suggested by GPL.
The following files also didn't have a copyright notice, so one was added
for them:
reproduce/software/make/README.md
reproduce/software/bibtex/healpix.tex
reproduce/analysis/config/delete-me-num.conf
reproduce/analysis/config/verify-outputs.conf
|
|
Recently (in Commit 8eb0892e) the Gnuastro configuration files moved under
"reproduce/analysis/config/gnuastro" directory (before that they were in
`reproduce/software/config/gnuastro)'. But this hadn't been reflected in it
the variable that defines this directory in `initialize.mk'.
With this commit, the address of the Gnuastro configuration files directory
is corrected, allowing Gnuastro programs to operate properly when it is
used.
|
|
Until now, throughout Maneage we were using the old name of "Reproducible
Paper Template". But we have finally decided to use Maneage, so to avoid
confusion, the name has been corrected in `README-hacking.md' and also in
the copyright notices.
Note also that in `README-hacking.md', the main Maneage branch is now
called `maneage', and the main Git remote has been changed to
`https://gitlab.com/maneage/project' (this is a new GitLab Group that I
have setup for all Maneage-related projects). In this repository there is
only one `maneage' branch to avoid complications with the `master' branch
of the projects using Maneage later.
|
|
Until now the software configuration parameters were defined under the
`reproduce/software/config/installation/' directory. This was because the
configuration parameters of analysis software (for example Gnuastro's
configurations) were placed under there too. But this was terribly
confusing, because the run-time options of programs falls under the
"analysis" phase of the project.
With this commit, the Gnuastro configuration files have been moved under
the new `reproduce/analysis/config/gnuastro' directory and the software
configuration files are directly under `reproduce/software/config'. A clean
build was done with this change and it didn't crash, but it may cause
crashes in derived projects, so after merging with Maneage, please
re-configure your project to see if anything has been missed. Please let us
know if there is a problem.
|
|
It is this time of year again: TeXLive has transitioned to its 2020 release
and the year is imprinted into the installation directory of TeXLive. Until
now, we have had to manually change this year and it caused complications
and was very annoying.
With this commit, the explicit year has been removed from TeXLive's
installation and we now simply put a `maneage' instead of the year. I tried
this on another system and it worked nicely. Until the time that we can
fully install LaTeX packages from source tarballs, this is the best thing
we could do for now.
|
|
Until now, the preparation phase was always executed before the final build
phase when running `./project make'. But when it becomes necessary, project
preparation can be slow and will un-necessarily slow down the project while
the project is growing (focus is on the analysis that is done after
preparation).
With this commit, preparation will be done automatically the first time
that the project is run (`.build/software/preparation-done.mk' doesn't
exist). However, after preperation is complete once, future runs of
`./project make' won't do preparation any more (by calling
`top-prepare.mk'). They will directly call `top-make.mk' for the analysis.
To manually invoke preparation after the first attempt, the `./project
make' script should be run with the new `--prepare-redo' option.
Also, since the preparation phase is now automatically done before the
analysis phase, the long notice that describes running `./project make' at
the end of the preparation phase has been removed in `top-prepare.mk'. It
now just prints a short line, saying the preparation has been complete.
Finally, when the project has not been run with the proper group
configuration, it ends with an `exit 1' so the main `./project' script
doesn't proceed any further.
|
|
Until now, the final preparation target of the preparation phase depended
on all the `$(makesrc)' files. This caused a problem because we were
telling it to also depend on `prepare.tex' (which is the same file that is
being built).
With this commit, we are applying the same solution we have already done in
`paper.mk' (for `paper.tex'): we are removing `prepare' from the list of
prerequisites.
This bug was found by Zahra Sharbaf.
|
|
Until now the shell scripts in the software building phase were in the
`reproduce/software/bash' directory. But given our recent change to a
POSIX-only start, the `configure.sh' shell script (which is the main
component of this directory) is no longer written with Bash.
With this commit, to fix that problem, that directory's name has been
changed to `reproduce/software/shell'.
|
|
Until now, the project would first ask for the basic directories, then it
would start testing the compiler. But that was problematic because the
build directory can come from a previous setting (with `./project configure
-e'). Also, it could confuse users to first ask for details, then suddently
tell them that you don't have a working C library! We also need to store
the CPATH variable in the `LOCAL.conf' because in some cases, the compiler
won't work without it.
With this commit, the compiler checking has been moved at the start of the
configure script. Instead of putting the test program in the build
directory, we now make a temporary hidden directory in the source directory
and delete that directory as soon as the tests are done.
In the process, I also noticed that the copyright year of the two hidden
files weren't updated and corrected them.
|
|
Until now, the configuration Makefiles (in
`reproduce/software/config/installation' and `reproduce/analysis/config')
had a `.mk' suffix, similar to the workhorse Makefiles. Although they are
indeed Makefiles, but given their nature (to only keep configuration
parameters), it is confusing (especially to early users) for them to also
have a `.mk' (similar to the analysis or software building Makefiles).
To address this issue, with this commit, all the configuration Makefiles
(in those directories) are now given a `.conf' suffix. This is also assumed
for all the files that are loaded.
The configuration (software building) and running of the template have been
checked with this change from scratch, but please report any error that may
not have been noticed.
THIS IS AN IMPORTANT CHANGE AND WILL CAUSE CRASHES OR UNEXPECTED BEHAVIORS
FOR PROJECTS THAT HAVE BRANCHED FROM THIS TEMPLATE. PLEASE CORRECT THE
SUFFIX OF ALL YOUR PROJECT'S CONFIGURATION MAKEFILES (IN THE DIRECTORIES
ABOVE), OTHERWISE THEY AREN'T AUTOMATICALLY LOADED ANYMORE.
|