Age | Commit message (Collapse) | Author | Lines |
|
As described in Maneage's commit 2bd2e2f18 (which I found while testing
this project), the existing download recipe had problems when using a local
copy of the input dataset. It was first fixed here, then implemented there.
Also, to clarify things for a new user, some long comments were added at
the top of 'INPUTS.conf' to describe each of the variables, that comment
has also been put here (and is also in commit 2bd2e2f18 of Maneage).
|
|
There were two small warnings that are removed with this commit:
- In the end, when we print the number of words in the PDF, we hadn't
accounted for the fact that 'paper.pdf' doesn't always exist (for
example when './project make clean' is run). So a check was added to
only print the number of words when a PDF exists.
- I noticed that the '$(texdir)/to-publish' directory was being built both
in 'initialize.mk' and in 'demo-plot.mk'. So the one in 'demo-plot.mk'
has been removed.
|
|
Some minor conflicts came up in 'initialize.mk' and 'verify.mk'. For the
former, I chose the version on Maneage, for the latter, I kept the 'master'
version on the checksums of this project, but kept the Maneage version for
the rest of the improvements there (like printing the verified files as
LaTeX comments in 'verify.tex'.
While testing the conflicts, I noticed a bug (in the LaTeX macro for the
number of years in the Menke+20 paper) in the previous build, thanks to the
verification step :-)! Fortunately it wasn't actually printed in the PDF,
so a normal reader won't recognize.
The bug was caused by the recently added meta-data/commented lines in the
'tools-per-year.txt' file: when calculating the number of years studied in
that paper, we were simply counting all the lines and we had forgot to
correct this after adding comments. As a result, the un-used LaTeX macro
file was saying that they have studied 47 years instead of the real 31
years! This element was actually used in the very first (+40 page!) draft
of the paper that was summarized to fit into the journal limits.
|
|
Possible semantic conflicts (that may not show up as Git conflicts but may
cause a crash in your project after the merge):
1) The project title (and other basic metadata) should be set in
'reproduce/analysis/conf/metadata.conf'. Please include this file in
your merge (if it is ignored because of '.gitattributes'!).
2) Consider importing the changes in 'initialize.mk' and 'verify.mk' (if
you have added all analysis Makefiles to the '.gitattributes' file
(thus not merging any change in them with your branch). For example
with this command:
git diff master...maneage -- reproduce/analysis/make/initialize.mk
3) The old 'verify-txt-no-comments-leading-space' function has been
replaced by 'verify-txt-no-comments-no-space'. The new function will
also remove all white-space characters between the columns (not just
white space characters at the start of the line). Thus the resulting
check won't involve spacing between columns.
A common set of steps are always necessary to prepare a project for
publication. Until now, we would simply look at previous submissions and
try to follow them, but that was prone to errors and could cause
confusion. The internal infrastructure also didn't have some useful
features to make good publication possible. Now that the submission of a
paper fully devoted to the founding criteria of Maneage is complete
(arXiv:2006.03018), it was time to formalize the necessary steps for easier
submission of a project using Maneage and implement some low-level features
that can make things easier.
With this commit a first draft of the publication checklist has been added
to 'README-hacking.md', it was tested in the submission of arXiv:2006.03018
and zenodo.3872248. To help guide users on implementing the good practices
for output datasets, the outputs of the default project shown in the paper
now use the new features). After reading the checklist, please inspect
these.
Some other relevant changes in this commit:
- The publication involves a copy of the necessary software
tarballs. Hence a new target ('dist-software') was also added to
package all the project's software tarballs in one tarball for easy
distribution.
- A new 'dist-lzip' target has been defined for those who want to
distribute an Lzip-compressed tarball.
- The '\includetikz' LaTeX macro now has a second argument to allow
configuring the '\includegraphics' call when the plot should not be
built, but just imported.
|
|
This paper doesn't use pdflatex or biblatex, so it was necessary to make
some small corrections in the make-dist rule of initialize.mk. Also, while
testing the upload on arXiv, I noticed that it complains about an empty
'verify.tex' file, so that is also corrected.
|
|
All the steps following the to-be-added (in 'README-hacking.md')
publication checklist prior to the final check from new clone have been
added:
- 'README.md' file has been set.
- "Reproducible supplement" was added just above the keywords, pointing to
Zenodo.
- A link to the to-be-uploaded data underlying the plot was added in the
caption of the tools-per-year plot.
- A new meta-data configuration file was added to store basic project
metadata to be used throughout the project. This will later be taken
into Maneage. For examle the project title is now stored here and
written into the paper's LaTeX source and output datasets automatically.
- Verification was activated and plot's data and LaTeX macro files are now
automatically verified.
- A complete metadata was added for the data underlying the plot.
- A generic function was added in 'initialize.mk' that will automatically
write project info and copyright in all plain-text outputs.
|
|
The minor conflict was with 'reproduce/software/make/high-level.mk', and in
particular because we implemented the fix to Maneage's Task #15664 in this
project first. After it was moved to the main Maneage branch some minor
stylistic corrections were done to it, thus causing the conflict. To
resolve the conflict, I simply imported the full Maneage version of the
file with this command:
git checkout maneage -- reproduce/software/make/high-level.mk
The other conflicts were due to the deleted files (that were resolved as
described in 'README-hacking.md') and the LaTeX files that I had told
'.gitattributes' to ignore from the Maneage branch.
|
|
Publishing a paper on reproducible research without making it easy for
readers to read the references would defeat the point. Of course we have to
make some compromises with some journals' reluctance to shift towards the
free world, but to satisfy scientific ethics, we should at least provide
clickable URLs to the references, preferably to the ArXiv version if
available [1], and also to the DOI, again, preferably to an open-access
version of the URL if available.
I was not able to fully get this done in the .bst file, so there's an
sed/tr hack done to the .bbl file in `reproduce/analysis/make/paper.mk` to
tidy up commas and spaces.
This commit also reverts some of the hacks in the Akhlaghi IAU Symposium
`tex/src/references.tex` entry, to match the improved .bst file,
`tex/src/IEEEtran_openaccess.bst`, provided here with a different name to
the original, in order to satisfy the LaTeX licence.
[1] https://cosmo.torun.pl/blog/arXiv_refs
|
|
In time, some of the copyright license description had been mistakenly
shortened to two paragraphs instead of the original three that is
recommended in the GPL. With this commit, they are corrected to be exactly
in the same three paragraph format suggested by GPL.
The following files also didn't have a copyright notice, so one was added
for them:
reproduce/software/make/README.md
reproduce/software/bibtex/healpix.tex
reproduce/analysis/config/delete-me-num.conf
reproduce/analysis/config/verify-outputs.conf
|
|
Following the fact that the DSJ editor decided that this paper doesn't fit
into their scope, we decided to submit it to IEEE's Computing in Science
and Engineering (CiSE). So with this commit the text was re-written to fit
into their style and word-count limitations.
|
|
The paper is no longer using LuaLaTeX, but raw LaTeX (that saves a DVI), it
is so much faster! Initially I had used LuaLaTeX to use special fonts to
resemble the CODATA Data Science Journal, but all that overhead is no
longer necessary. Therefore I also removed the MANY extra LaTeX packages we
were importing. The paper builds and is able to construct one of its images
(the git-branching figure) with only 7 packages beyond the minimal
TeX/LaTeX installation. Also in terms of processing it is so much faster.
The text is just temporary now, and mainly just a place holder. With the
next commit, I'll fill it with proper text.
|
|
A few small conflicts showed up here and there. They are fixed with this
merge.
|
|
Recently (in Commit 8eb0892e) the Gnuastro configuration files moved under
"reproduce/analysis/config/gnuastro" directory (before that they were in
`reproduce/software/config/gnuastro)'. But this hadn't been reflected in it
the variable that defines this directory in `initialize.mk'.
With this commit, the address of the Gnuastro configuration files directory
is corrected, allowing Gnuastro programs to operate properly when it is
used.
|
|
|
|
[Compared to first submission to DSJ last week with 11436 words in raw PDF,
we have decreased the paper by ~1000 words to 10493 :-)]
As with the previous commits, the moment Boud changed the structure of
sentences, I was able to find the redundancies and remove them! This is a
fascinating feature of collaboration I had never felt before: it is so hard
to find redundancies in my own raw text, but even a minor correction by
someone else suddeny breaks my mental memories/barrier on the sentence,
allowing me to be more critical to it!
Anyway, besides such corrections, I fixed a few other things: 1) In the
DSJ's recently published papers, ther is no `~' between "Figure" and its
number. 2) I noticed that in `tex/src/figure-src-inputconf.tex' I was
actually using manually input strings for the filename, checksum and size!
This was contrary to the whole philosophy of Maneage(!), I must have rushed
and forgot! So LaTeX variables are now defined and used.
|
|
Until now, throughout Maneage we were using the old name of "Reproducible
Paper Template". But we have finally decided to use Maneage, so to avoid
confusion, the name has been corrected in `README-hacking.md' and also in
the copyright notices.
Note also that in `README-hacking.md', the main Maneage branch is now
called `maneage', and the main Git remote has been changed to
`https://gitlab.com/maneage/project' (this is a new GitLab Group that I
have setup for all Maneage-related projects). In this repository there is
only one `maneage' branch to avoid complications with the `master' branch
of the projects using Maneage later.
|
|
A few minor conflicts came up that were easily fixed.
|
|
Until now the software configuration parameters were defined under the
`reproduce/software/config/installation/' directory. This was because the
configuration parameters of analysis software (for example Gnuastro's
configurations) were placed under there too. But this was terribly
confusing, because the run-time options of programs falls under the
"analysis" phase of the project.
With this commit, the Gnuastro configuration files have been moved under
the new `reproduce/analysis/config/gnuastro' directory and the software
configuration files are directly under `reproduce/software/config'. A clean
build was done with this change and it didn't crash, but it may cause
crashes in derived projects, so after merging with Maneage, please
re-configure your project to see if anything has been missed. Please let us
know if there is a problem.
|
|
Since the journal doesn't accept supplementary files during initial
submission, I have put this link on the PDF for the referee and editors to
access if they want.
Also the `tex/img' file was added to the distribution tarball.
|
|
I was using some special Bash feature before to ignore the distribution
directory itself when copying the files, but that had some problems, so I
just used a simple for loop over a `find' command to ignore it. Also, for
now, we don't need BibLaTeX sources in the project (that is primarily for
arXiv), so to help the referee see a more cleaner contents of this
supplement file.
|
|
TeXLive recently transitioned from its 2019 version to its 2020 version
thanks to Elham Saremi's trial of the this project. The fact that
traditionally Maneage installs all TeXLive packages in a per-year directory
is very annoying and required an update in the core Maneage system every
year. So I suddently recognized that we can fix this by setting a different
name for the directory holding the release year. This has been implemented
with this commit.
I have also done this change in the main Maneage branch for other projects
to also benefit from this correction.
|
|
It is this time of year again: TeXLive has transitioned to its 2020 release
and the year is imprinted into the installation directory of TeXLive. Until
now, we have had to manually change this year and it caused complications
and was very annoying.
With this commit, the explicit year has been removed from TeXLive's
installation and we now simply put a `maneage' instead of the year. I tried
this on another system and it worked nicely. Until the time that we can
fully install LaTeX packages from source tarballs, this is the best thing
we could do for now.
|
|
The contents until two commits ago when I started to summarize the paper
are now in a new and shorter format: previously the discussion started on
page 25, but now it starts on page 17. It is still a little longer than
8000 words, but not as significantly as before. I will add the discussion
and also try to summarize it futher before submission.
|
|
A few minor conflicts occurred and were fixed.
|
|
With this commit a description of these two important parts have been added
to the project, along with several figures showing various parts of the
files that are discussed. I also done some other restructuring of the
figures and files to make things fit better into the the description of the
paper.
|
|
Until now, I was mistakenly multiplying the fraction of papers in that
journal. This is corrected with this commit.
|
|
Until now, there was no explanation on an actual analysis phase, therefore
with this commit an example scenario with a readable Makefile is included.
The Data lineage graph was also simplified to both be more readable, and
also to correspond to this new explanation and subMakefile.
Some random edits/typos were also corrected and some references added for
discussion.
|
|
Until now, the preparation phase was always executed before the final build
phase when running `./project make'. But when it becomes necessary, project
preparation can be slow and will un-necessarily slow down the project while
the project is growing (focus is on the analysis that is done after
preparation).
With this commit, preparation will be done automatically the first time
that the project is run (`.build/software/preparation-done.mk' doesn't
exist). However, after preperation is complete once, future runs of
`./project make' won't do preparation any more (by calling
`top-prepare.mk'). They will directly call `top-make.mk' for the analysis.
To manually invoke preparation after the first attempt, the `./project
make' script should be run with the new `--prepare-redo' option.
Also, since the preparation phase is now automatically done before the
analysis phase, the long notice that describes running `./project make' at
the end of the preparation phase has been removed in `top-prepare.mk'. It
now just prints a short line, saying the preparation has been complete.
Finally, when the project has not been run with the proper group
configuration, it ends with an `exit 1' so the main `./project' script
doesn't proceed any further.
|
|
Until now, the final preparation target of the preparation phase depended
on all the `$(makesrc)' files. This caused a problem because we were
telling it to also depend on `prepare.tex' (which is the same file that is
being built).
With this commit, we are applying the same solution we have already done in
`paper.mk' (for `paper.tex'): we are removing `prepare' from the list of
prerequisites.
This bug was found by Zahra Sharbaf.
|
|
This was done just to get going with describing the analysis process.
|
|
The main problems with this dataset was the names of the journals (which
sometimes have single quotes or apostrophes in them that is really annoying
for SED)! But ultimately, for the simple study we want to do here, the
journal names are irrelevant, so in the end I just ignored the names. Later
we can set an identifier for the journals if necessary.
But now we have the basic information in a way that is usable in a plot to
show in this paper.
|
|
Until now the shell scripts in the software building phase were in the
`reproduce/software/bash' directory. But given our recent change to a
POSIX-only start, the `configure.sh' shell script (which is the main
component of this directory) is no longer written with Bash.
With this commit, to fix that problem, that directory's name has been
changed to `reproduce/software/shell'.
|
|
Until now, the project would first ask for the basic directories, then it
would start testing the compiler. But that was problematic because the
build directory can come from a previous setting (with `./project configure
-e'). Also, it could confuse users to first ask for details, then suddently
tell them that you don't have a working C library! We also need to store
the CPATH variable in the `LOCAL.conf' because in some cases, the compiler
won't work without it.
With this commit, the compiler checking has been moved at the start of the
configure script. Instead of putting the test program in the build
directory, we now make a temporary hidden directory in the source directory
and delete that directory as soon as the tests are done.
In the process, I also noticed that the copyright year of the two hidden
files weren't updated and corrected them.
|
|
Until now, the configuration Makefiles (in
`reproduce/software/config/installation' and `reproduce/analysis/config')
had a `.mk' suffix, similar to the workhorse Makefiles. Although they are
indeed Makefiles, but given their nature (to only keep configuration
parameters), it is confusing (especially to early users) for them to also
have a `.mk' (similar to the analysis or software building Makefiles).
To address this issue, with this commit, all the configuration Makefiles
(in those directories) are now given a `.conf' suffix. This is also assumed
for all the files that are loaded.
The configuration (software building) and running of the template have been
checked with this change from scratch, but please report any error that may
not have been noticed.
THIS IS AN IMPORTANT CHANGE AND WILL CAUSE CRASHES OR UNEXPECTED BEHAVIORS
FOR PROJECTS THAT HAVE BRANCHED FROM THIS TEMPLATE. PLEASE CORRECT THE
SUFFIX OF ALL YOUR PROJECT'S CONFIGURATION MAKEFILES (IN THE DIRECTORIES
ABOVE), OTHERWISE THEY AREN'T AUTOMATICALLY LOADED ANYMORE.
|
|
Until now, I was writing the paper without the template. But we will soon
be adding a tutorial to the template, and I thought it will be good to have
an example demonstration here too. So I just brought the hole project into
the template structure, allowing us to add the template analysis later when
its ready, and also allowing us to easily reproduce this paper ofcourse
(without having to worry about the host's TeXLive installation.
|
|
The unnecessary parts were removed and the project now runs.
|
|
The checklist descriptions were slightly edited to be more clear. Also,
while following them, I noticed that while removing the "delete-me" parts
on `verify.mk', would cause an error: the `if [ $$m == delete-me ];'
statement we were saying to delete cause an error because `elif' was the
first statement Bash would see. So with this commit, the `download'
conditional (which isn't instructed to be deleted) was set to be the top
(with an `if') and the `delete-me' conditional now has an `elif'.
|
|
Until now, if the file to be verified didn't exist, a different checksum
would be generated, and it would stop, but it wasn't immediately clear if
the differing checksum is because the file doesn't exist at all!
With this commit, before calculating the checksum, we first make sure if
the file exists. If it doesn't exist an explicit error is printed and thus
will help the project editor to find the cause of the problem.
|
|
Until now, the only verification that the template provided was the
published PDF. Users had to manually compare the published and generated
PDFs (numbers, plots, tables) and see if they obtained the same
result. However, this type of manual verification is not good and is prone
to frustration and missing important differences.
With this commit, a new Makefile has been added in the analysis steps:
`verify.mk'. It provides facilities to easily verify the results that go
into the paper. For example tables that go into making the paper's plots,
or the LaTeX macros that blend into the text. See the updated parts in
`README-hacking.md` for a more complete explanation.
This completes task #15497.
|
|
Now that its 2020, its necessary to include this year in the copyright
statements.
|
|
Until now, when an input dataset already exists in `INDIR', the template
would just make a symbolic link to it in the build directory. However, in
many cases, the files in INDIR will actually be links to other locations on
the filesystem and some programs have problems following too many links.
With this commit, the template is now using the `readlink' program (part of
GNU Coreutils) to follow a possible link and point the link in the build
directory directly to an actual non-link file.
|
|
Some minor corrections were made in the template:
- When making the distribution, `.swp' files (created by Vim) are also
removed.
- Autoconf is set as a prerequisite of Automake
I was also trying to add the Apache log4cxx, but its default 0.10.0 tarball
needs some patches, so I have just left it half done until someone actually
needs it and we apply the patch.
|
|
The steps to package the project have been made slightly more clear and
also the temporary directory that is created for packaging is deleted after
the tarball is made.
|
|
Until now, when you ran `make clean', all the directories under
`$(BDIR)/tex/' would be deleted except for `macros' and `build'. This was
good for the single-user mode. But in group mode, this would delete the
user-specific TeX build directory because its called `build-USER', not
`build'.
With this commit, to fix the problem, we define the new `texbtopdir' and
based on the group condition, and use that to specify which directory to
not delete.
|
|
Since ImageMagick can take long to build, we are now building it in
parallel. Also, the part where we replace an `_' with `\_' in the software
version at the end of the configure script was removed. It is more
clear/readable that the actual rule that includes such a name deals with
the underline (as is the case for `sip_tpv' which already dealt with it).
Finally, I noticed that the checks at the start of `top-prepare' were
missing new-lines. I had forgot that the Make single-shell variable isn't
activated in this stage yet.
|
|
A special directory is now defined in `initialize.mk' that can be used in
both the preparation and build phases. Also, the contents of prepared
results can now be conditionally read during `./project make'.
|
|
In many real-world scenarios, `./project make' can really benefit from
having some basic information about the data before being run. For example
when quering a server. If we know how many datasets were downloaded and
their general properties, it can greatly optmize the process when we are
designing the solution to be run in `./project make'.
Therefore with this commit, a new phase has been added to the template's
design: `./project prepare'. In the raw template this is empty, because the
simple analysis done in the template doesn't warrant it. But everything is
ready for projects using the template to add preparation phases prior to
the analysis.
|
|
Until now, we were assuming that the users would just clone the project in
Git. But after submitting arXiv:1909.11230, and trying to build directly
from the arXiv source, I noticed several problems that wouldn't allow users
to build it automatically. So I tried the build step by step and was able
to find a fix for the several issues that came up.
The scripting parts of the fix were primarily related to the fact that the
unpacked arXiv tarball isn't under version control, so some checks had to
be put there. Also, we wanted to make it easy to remove the extra files, so
an extra `--clean-texdit' option was added to `./project'.
Finally, some manual corrections were necessary (prior to running
`./project', which are now described in `README.md'. Most of the later
steps can be automated and we should do it later, I just don't have enough
time now.
|
|
Until now, the pipeline was instructed to only ignore the current temporary
project distribution directory. So if there were directories from previous
builds, they were wrongly included in the current tarball.
With this commit, we don't just ignore the directory of the current
distribution, but generally, all directories starting with `paper-v*'.
|
|
Until now, when the commit was tagged, `git describe' would just print the
tag and no longer the commit checksum. This is bad because the checksum is
a much more robust way to confirm the point in history.
With this commit the `--long' option has been added to `git describe' to
fix this issue. From now on, when we are on a tag, it will print the tag
followed by a `-0-' and the first characters of the checksum.
|