Age | Commit message (Collapse) | Author | Lines |
|
TeXLive recently transitioned from its 2019 version to its 2020 version
thanks to Elham Saremi's trial of the this project. The fact that
traditionally Maneage installs all TeXLive packages in a per-year directory
is very annoying and required an update in the core Maneage system every
year. So I suddently recognized that we can fix this by setting a different
name for the directory holding the release year. This has been implemented
with this commit.
I have also done this change in the main Maneage branch for other projects
to also benefit from this correction.
|
|
The contents until two commits ago when I started to summarize the paper
are now in a new and shorter format: previously the discussion started on
page 25, but now it starts on page 17. It is still a little longer than
8000 words, but not as significantly as before. I will add the discussion
and also try to summarize it futher before submission.
|
|
A few minor conflicts occurred and were fixed.
|
|
With this commit a description of these two important parts have been added
to the project, along with several figures showing various parts of the
files that are discussed. I also done some other restructuring of the
figures and files to make things fit better into the the description of the
paper.
|
|
Until now, I was mistakenly multiplying the fraction of papers in that
journal. This is corrected with this commit.
|
|
Until now, there was no explanation on an actual analysis phase, therefore
with this commit an example scenario with a readable Makefile is included.
The Data lineage graph was also simplified to both be more readable, and
also to correspond to this new explanation and subMakefile.
Some random edits/typos were also corrected and some references added for
discussion.
|
|
Until now, the preparation phase was always executed before the final build
phase when running `./project make'. But when it becomes necessary, project
preparation can be slow and will un-necessarily slow down the project while
the project is growing (focus is on the analysis that is done after
preparation).
With this commit, preparation will be done automatically the first time
that the project is run (`.build/software/preparation-done.mk' doesn't
exist). However, after preperation is complete once, future runs of
`./project make' won't do preparation any more (by calling
`top-prepare.mk'). They will directly call `top-make.mk' for the analysis.
To manually invoke preparation after the first attempt, the `./project
make' script should be run with the new `--prepare-redo' option.
Also, since the preparation phase is now automatically done before the
analysis phase, the long notice that describes running `./project make' at
the end of the preparation phase has been removed in `top-prepare.mk'. It
now just prints a short line, saying the preparation has been complete.
Finally, when the project has not been run with the proper group
configuration, it ends with an `exit 1' so the main `./project' script
doesn't proceed any further.
|
|
Until now, the final preparation target of the preparation phase depended
on all the `$(makesrc)' files. This caused a problem because we were
telling it to also depend on `prepare.tex' (which is the same file that is
being built).
With this commit, we are applying the same solution we have already done in
`paper.mk' (for `paper.tex'): we are removing `prepare' from the list of
prerequisites.
This bug was found by Zahra Sharbaf.
|
|
This was done just to get going with describing the analysis process.
|
|
The main problems with this dataset was the names of the journals (which
sometimes have single quotes or apostrophes in them that is really annoying
for SED)! But ultimately, for the simple study we want to do here, the
journal names are irrelevant, so in the end I just ignored the names. Later
we can set an identifier for the journals if necessary.
But now we have the basic information in a way that is usable in a plot to
show in this paper.
|
|
Until now, the main download script could only check one server for the
given URL. However, ultimately the actual server that a file is downloaded
from is irrelevant for this project: we actually check its
checksum. Especially in the case of software (which are distributed over
many servers), this can usually be very annoying: the servers may not
properly communicate with the running system and even the 10 trials won't
be enough.
With this commit, the download script
`reproduce/analysis/bash/download-multi-try' can take a new optional
argument (a 5th argument). It assumes this argument is a space-separated
list of server(s) to use as backup for the original URL. When downloading
from the original URL fails, it will look into this list and try
downloading the same file from each given server.
|
|
Until now the shell scripts in the software building phase were in the
`reproduce/software/bash' directory. But given our recent change to a
POSIX-only start, the `configure.sh' shell script (which is the main
component of this directory) is no longer written with Bash.
With this commit, to fix that problem, that directory's name has been
changed to `reproduce/software/shell'.
|
|
Until now, the project would first ask for the basic directories, then it
would start testing the compiler. But that was problematic because the
build directory can come from a previous setting (with `./project configure
-e'). Also, it could confuse users to first ask for details, then suddently
tell them that you don't have a working C library! We also need to store
the CPATH variable in the `LOCAL.conf' because in some cases, the compiler
won't work without it.
With this commit, the compiler checking has been moved at the start of the
configure script. Instead of putting the test program in the build
directory, we now make a temporary hidden directory in the source directory
and delete that directory as soon as the tests are done.
In the process, I also noticed that the copyright year of the two hidden
files weren't updated and corrected them.
|
|
Until now, the configuration Makefiles (in
`reproduce/software/config/installation' and `reproduce/analysis/config')
had a `.mk' suffix, similar to the workhorse Makefiles. Although they are
indeed Makefiles, but given their nature (to only keep configuration
parameters), it is confusing (especially to early users) for them to also
have a `.mk' (similar to the analysis or software building Makefiles).
To address this issue, with this commit, all the configuration Makefiles
(in those directories) are now given a `.conf' suffix. This is also assumed
for all the files that are loaded.
The configuration (software building) and running of the template have been
checked with this change from scratch, but please report any error that may
not have been noticed.
THIS IS AN IMPORTANT CHANGE AND WILL CAUSE CRASHES OR UNEXPECTED BEHAVIORS
FOR PROJECTS THAT HAVE BRANCHED FROM THIS TEMPLATE. PLEASE CORRECT THE
SUFFIX OF ALL YOUR PROJECT'S CONFIGURATION MAKEFILES (IN THE DIRECTORIES
ABOVE), OTHERWISE THEY AREN'T AUTOMATICALLY LOADED ANYMORE.
|
|
Until now, I was writing the paper without the template. But we will soon
be adding a tutorial to the template, and I thought it will be good to have
an example demonstration here too. So I just brought the hole project into
the template structure, allowing us to add the template analysis later when
its ready, and also allowing us to easily reproduce this paper ofcourse
(without having to worry about the host's TeXLive installation.
|
|
The unnecessary parts were removed and the project now runs.
|
|
The checklist descriptions were slightly edited to be more clear. Also,
while following them, I noticed that while removing the "delete-me" parts
on `verify.mk', would cause an error: the `if [ $$m == delete-me ];'
statement we were saying to delete cause an error because `elif' was the
first statement Bash would see. So with this commit, the `download'
conditional (which isn't instructed to be deleted) was set to be the top
(with an `if') and the `delete-me' conditional now has an `elif'.
|
|
Until now, if the file to be verified didn't exist, a different checksum
would be generated, and it would stop, but it wasn't immediately clear if
the differing checksum is because the file doesn't exist at all!
With this commit, before calculating the checksum, we first make sure if
the file exists. If it doesn't exist an explicit error is printed and thus
will help the project editor to find the cause of the problem.
|
|
Until now, the only verification that the template provided was the
published PDF. Users had to manually compare the published and generated
PDFs (numbers, plots, tables) and see if they obtained the same
result. However, this type of manual verification is not good and is prone
to frustration and missing important differences.
With this commit, a new Makefile has been added in the analysis steps:
`verify.mk'. It provides facilities to easily verify the results that go
into the paper. For example tables that go into making the paper's plots,
or the LaTeX macros that blend into the text. See the updated parts in
`README-hacking.md` for a more complete explanation.
This completes task #15497.
|
|
Now that its 2020, its necessary to include this year in the copyright
statements.
|
|
Until now, when an input dataset already exists in `INDIR', the template
would just make a symbolic link to it in the build directory. However, in
many cases, the files in INDIR will actually be links to other locations on
the filesystem and some programs have problems following too many links.
With this commit, the template is now using the `readlink' program (part of
GNU Coreutils) to follow a possible link and point the link in the build
directory directly to an actual non-link file.
|
|
Some minor corrections were made in the template:
- When making the distribution, `.swp' files (created by Vim) are also
removed.
- Autoconf is set as a prerequisite of Automake
I was also trying to add the Apache log4cxx, but its default 0.10.0 tarball
needs some patches, so I have just left it half done until someone actually
needs it and we apply the patch.
|
|
The steps to package the project have been made slightly more clear and
also the temporary directory that is created for packaging is deleted after
the tarball is made.
|
|
Until now, when you ran `make clean', all the directories under
`$(BDIR)/tex/' would be deleted except for `macros' and `build'. This was
good for the single-user mode. But in group mode, this would delete the
user-specific TeX build directory because its called `build-USER', not
`build'.
With this commit, to fix the problem, we define the new `texbtopdir' and
based on the group condition, and use that to specify which directory to
not delete.
|
|
Until now, this script would always only work with a file-lock. But in some
scenarios, we might want to download in parallel. For example when the
system has multiple ports to the internet.
With this commit, we have added this feature: when the lockfile name is
`nolock', it won't lock and will download in parallel.
|
|
Since ImageMagick can take long to build, we are now building it in
parallel. Also, the part where we replace an `_' with `\_' in the software
version at the end of the configure script was removed. It is more
clear/readable that the actual rule that includes such a name deals with
the underline (as is the case for `sip_tpv' which already dealt with it).
Finally, I noticed that the checks at the start of `top-prepare' were
missing new-lines. I had forgot that the Make single-shell variable isn't
activated in this stage yet.
|
|
A special directory is now defined in `initialize.mk' that can be used in
both the preparation and build phases. Also, the contents of prepared
results can now be conditionally read during `./project make'.
|
|
In many real-world scenarios, `./project make' can really benefit from
having some basic information about the data before being run. For example
when quering a server. If we know how many datasets were downloaded and
their general properties, it can greatly optmize the process when we are
designing the solution to be run in `./project make'.
Therefore with this commit, a new phase has been added to the template's
design: `./project prepare'. In the raw template this is empty, because the
simple analysis done in the template doesn't warrant it. But everything is
ready for projects using the template to add preparation phases prior to
the analysis.
|
|
Until now, we were assuming that the users would just clone the project in
Git. But after submitting arXiv:1909.11230, and trying to build directly
from the arXiv source, I noticed several problems that wouldn't allow users
to build it automatically. So I tried the build step by step and was able
to find a fix for the several issues that came up.
The scripting parts of the fix were primarily related to the fact that the
unpacked arXiv tarball isn't under version control, so some checks had to
be put there. Also, we wanted to make it easy to remove the extra files, so
an extra `--clean-texdit' option was added to `./project'.
Finally, some manual corrections were necessary (prior to running
`./project', which are now described in `README.md'. Most of the later
steps can be automated and we should do it later, I just don't have enough
time now.
|
|
Until now, the pipeline was instructed to only ignore the current temporary
project distribution directory. So if there were directories from previous
builds, they were wrongly included in the current tarball.
With this commit, we don't just ignore the directory of the current
distribution, but generally, all directories starting with `paper-v*'.
|
|
Until now, when the commit was tagged, `git describe' would just print the
tag and no longer the commit checksum. This is bad because the checksum is
a much more robust way to confirm the point in history.
With this commit the `--long' option has been added to `git describe' to
fix this issue. From now on, when we are on a tag, it will print the tag
followed by a `-0-' and the first characters of the checksum.
|
|
`./project make dist' will package all the LaTeX-specific files (and
analysis source files) into one `tar.gz' file that is ready to upload to
servers like arXiv. However, it wasn't updated for some time, so running it
would complain about not having a `configure' script in the top of the
project.
With this commit, it now works with the new file-structure of the project
and also copies all the BibLaTeX source files and `paper.bbl' into the top
tarball directory, which allows arXiv to build the paper as intended.
The output of `./project make dist' has been uploaded and tested on arXiv
and it is built by arXiv perfectly.
Also, a short description of all the special `make' targets was added to
the output of `./project --help'.
|
|
Until now, OpenMPI would complain about not having `ssh' or `rsh' as a
remote shell feature. However, such features should not be necessary in a
reproducible scenario and they also have major security issues.
With this commit, we are now using OpenMPI's `OMPI_MCA_plm_rsh_agent'
environment variable to disable any remote shell dependency for it (as
suggested by Boud). Therefore, any dependency for OpenSSH has been
removed. But I thought to keep the build instructions incase it may be
useful under some un-foreseen scenario. However, to discourage people from
building it, a notice was added ontop of the build instructions.
This bug was found, tested and solved thanks to Roberto Baena Gallé and
Boud Roukema.
This fixes bug #56724.
|
|
Until now, when you needed to completely clean a project (with `./project
make distclean') the Git hooks that are installed during configure time
would cause problems when committing (the `pre-commit' hook in particular
won't allow you to commit anything!).
With this commit, before deleting the software, the template first removes
these Git hooks.
|
|
Until now the only way to define the environment of the Make recipes was
through the exported Make variables (mostly in `initialize.mk' for the
analysis steps for example). However, there is only so much you can do with
environment variables! In some situations you want slightly more
complicated environment control, like setting an alias or running of
scripts (things that are commonly done in the `~/.bashrc' file of users to
configure their interactive, non-login shells).
With this commit, a `reproduce/software/bash/bashrc.sh' has been defined
for this job (which is currently empty!). Every major Make step of the
project adds this file as the `BASH_ENV' environment variable, so the shell
that is created to execute a recipe first executes this file, then the
recipe. Each top-level Makefile also defines a `PROJECT_STATUS' environment
variable that enables users to limit their envirnoment setup based on the
condition it is being setup (in particular in the early phase of
`basic.mk', where the user can't make any assumption about the programs and
has to write a portable shell script).
|
|
We recently moved the system's `rm' program absolute address to a shell
variable that is found during the `./project' script. But I had forgot to
account for the difference between the Make and Bash variable naming
differences. I had also forgot to add a value to the HOME variable.
With this commit both are corrected: the system's `rm' path is now called
`sys_rm' and the HOME variable is set.
|
|
Until now, to work on a project, it was necessary to `./configure' it and
build the software. Then we had to run `.local/bin/make' to run the project
and do the analysis every time. If the project was a shared project between
many users on a large server, it was necessary to call the `./for-group'
script.
This way of managing the project had a major problem: since the user
directly called the lower-level `./configure' or `.local/bin/make' it was
not possible to provide high-level control (for example limiting the
environment variables). This was especially noticed recently with a bug
that was related to environment variables (bug #56682).
With this commit, this problem is solved using a single script called
`project' in the top directory. To configure and build the project, users
can now run these commands:
$ ./project configure
$ ./project make
To work on the project with other users in a group these commands can be
used:
$ ./project configure --group=GROUPNAME
$ ./project make --group=GROUPNAME
The old options to both configure and make the project are still valid. Run
`./project --help' to see a list. For example:
$ ./project configure -e --host-cc
$ ./project make -j8
The old `configure' script has been moved to
`reproduce/software/bash/configure.sh' and is called by the new `./project'
script. The `./project' script now just manages the options, then passes
control to the `configure.sh' script. For the "make" step, it also reads
the options, then calls Make. So in the lower-level nothing has
changed. Only the `./project' script is now the single/direct user
interface of the project.
On a parallel note: as part of bug #56682, we also found out that on some
macOS systems, the `DYLD_LIBRARY_PATH' environment variable has to be set
to blank. This is no problem because RPATH is automatically set in macOS
and the executables and libraries contain the absolute address of the
libraries they should link with. But having `DYLD_LIBRARY_PATH' can
conflict with some low-level system libraries and cause very hard to debug
linking errors (like that reported in the bug report).
This fixes bug #56682.
|
|
Until now we were only setting the `LD_LIBRARY_PATH' environment variable
for GNU/Linux systems. But macOS systems use the `DYLD_LIBRARY_PATH'.
With this commit, for better control over the environment, we are also
fixing `DYLD_LIBRARY_PATH' in all the places that we are setting the
general environment variables.
|
|
While reviewing Prasenjit's commits, I noticed that we had forgot to add
the citation for TIDES, also to make things clear, the program/library
build rules are now sorted alphabetically.
Finally, I noticed that after building the TiKZ PDF figures, it is crashing
(like on Prasenjit's computer). After looking around, I noticed its because
we were setting the of the `TEXINPUTS' environment variable to be the
installed TeX Live directory (which was ultimately redundant because by
default TeX will look into where it was installed). The important thing is
just that we remove any possible value the host system has, not to set new
directories.
|
|
TIDES is an ODE integrator with multiple-precision arithmetic.
|
|
Several corrections were necessary in the basic build: 1) the
version of GCC on some systems includes an `_' which would cause
a crash when building the PDF. 2) libcharset had to be manually
added to the Git build.
|
|
Until now, for some reason, the permission flags of `top.mk' were 755 (good
for an executable), not 644 (which is what they should be for a plain text
file that is run by another program).
This is corrected with this commit.
|
|
There weren't any conflicts in this merge.
|
|
With this commit, ImageMagick software has been added into the project.
This software is useful to deal and treat images from the command line.
Since it is widely used and a lot of other programs rely on it, it is
worth to have it into the project.
|
|
Until now, the `tex/build' symbolic link was put in the clone/source tree
when the build-directory's `tex' directory was being built. Thanks to
Roberto Baena, we just found a bug because of this behavior: when a second
group member is trying to build the pipeline, since the build directory's
`tex' directory already exists, no `tex/build' will be put in their
clone/source directory. As a result, the PDF building will crash.
To fix this (and keep things organized), the two `tex/build' and `tex/tikz'
links (to the build directory) are now built in the configure step while it
is building all the top-level directories. They are no longer built within
the Makefiles.
Also, a comment was added on top of every directory built during the
configuration phase to be clear.
This fixes bug #56362.
|
|
Until now, the `download-multi-try' script assumed GNU Bash features (when
comparing the number of attempts at downloading), but it didn't explicitly
ask the operating system to be run with Bash. As a result, when weaker
shells were used (like the default Debian minimalist `dash' shell), the `>'
("larger than" operator in a math context) is interpreted a redirection and
two extra files are created: `1' and `maxcount'!
With this commit, we now start this script with `/bin/bash'. Ofcourse, this
will assume that the host has GNU Bash installed, but we are also making
this assumption in the configure script. So atleast for now, Bash (any
version) is a critical dependency of this template anyway.
|
|
When we need to quote the new-line character we end the line with a
backslash (`\'). Until now, our convention has been to put all such
backslashes under each other to help in visual inspection.
But this causes a lot of confusion in version control: if only one line's
length is larger, the whole block will be marked as changed and thus makes
it hard to visually see the actual change. It also makes debuging the code
(adding some temporary lines) hard.
With this commit, I went through all the files and tried to fix all such
cases so only a single white space character is between the last command
character and the backslash. Where there was an empty line (ending with a
backslash, to help in visually separating the code into blocks), I put the
backslash right under the previous line's.
This completes task #15259.
|
|
In a few cases, `reproduce/analysis/make/initialize.mk' still assumed the
old architecture. With this commit, they have been corrected.
|
|
Until now, there were erros in the citation of Astrometry-net and Scamp
papers.
With this commit, we fix these problems. The Swarp bibtex has also been
modify to follow the stetic of the citation style we have right now in
the project.
We also added the `dependency-bib.tex' as a prerequisite of `paper.bbl'.
|
|
Until now, the software building and analysis steps of the pipeline were
intertwined. However, these steps (of how to build a software, and how to
use it) are logically completely independent.
Therefore with this commit, the pipeline now has a new architecture
(particularly in the `reproduce' directory) to emphasize this distinction:
The `reproduce' directory now has the two `software' and `analysis'
subdirectories and the respective parts of the previous architecture have
been broken up between these two based on their function. There is also no
more `src' directory. The `config' directory for software and analysis is
now mixed with the language-specific directories.
Also, some of the software versions were also updated after some checks
with their webpages.
This new architecture will allow much more focused work on each part of the
pipeline (to install the software and to run them for an analysis).
|