|
Some very minor conflicts came up and were easily corrected. They were
mostly in parts that are also shared with the demonstration in the core
Maneage branch.
|
|
The '.bbl' suffix in the comment of one call to LaTeX was incorrectly
written as '.bb'.
|
|
One of the LaTeX macros reported by 'initialize.mk' is the Git commit hash
of the most recent 'maneage' branch commit that the project has merged
with or branched from. However, not all projects will retain the 'maneage'
reference. This can happen, for example, when people don't push the
'maneage' reference to their repository and then clone from their own
repository to a second computer. Until now, in such situations, Maneage
would stop with an error.
With this commit, in such scenarios, a placeholder string is used instead,
clearly highlighting that there is no 'maneage' reference.
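A minimal shell sketch of this fallback follows. It is not the actual recipe in 'initialize.mk': the function name and the 'NO-MANEAGE-REF' placeholder string are hypothetical, and it assumes the reference is simply called 'maneage'.

```shell
# Sketch only: print the '\maneageversion' LaTeX macro, or a placeholder
# when no 'maneage' reference exists in this clone. NO-MANEAGE-REF is a
# hypothetical placeholder string.
maneage_version_macro() {
    # 'rev-parse --verify --quiet' fails silently on a missing reference;
    # it also fails outside a Git repository (or when Git is missing).
    if commit=$(git rev-parse --verify --quiet --short maneage 2>/dev/null) &&
       [ -n "$commit" ]; then
        printf '\\newcommand{\\maneageversion}{%s}\n' "$commit"
    else
        printf '\\newcommand{\\maneageversion}{NO-MANEAGE-REF}\n'
    fi
}
```

Redirecting its output into the generated 'initialize.tex' would make the macro available to the paper in both cases.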
|
|
There are many different directory trees involved in the Maneage system:
the top directory, the 'reproduce/' directory and its sub-directories,
'.build/' (which points to a user-defined build area), and a possibly
user-defined input directory. Until now, in the case of a download checksum
failure, it was not immediately obvious [1] to the user *where* the file
with the failed checksum was.
To show the user *where* the suspicious file is located, this commit adds a
line to 'reproduce/analysis/make/download.mk' that prints its full path
('$$unchecked') along with the expected and calculated checksums.
[1] Euphemism for me spending lots of time debugging and being confused.
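The reporting described above can be sketched as a small shell function. This assumes SHA-256 checksums (computed with coreutils' 'sha256sum') and a hypothetical function name; it is not the code in 'download.mk'.

```shell
# Sketch: compare a file's SHA-256 checksum with the expected value and,
# on a mismatch, report the file's full path along with both checksums.
# SHA-256 and the function name are assumptions for illustration.
check_input_checksum() {
    unchecked=$1   # full path to the downloaded (or linked) file
    expected=$2    # checksum given in the configuration file
    calculated=$(sha256sum "$unchecked" | awk '{print $1}')
    if [ "$calculated" = "$expected" ]; then
        return 0
    fi
    echo "ERROR: checksum mismatch for input file" >&2
    echo "  File:       $unchecked" >&2
    echo "  Expected:   $expected" >&2
    echo "  Calculated: $calculated" >&2
    return 1
}
```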
|
|
This commit clarifies the initial usage of Zenodo for reserving a Zenodo
identifier and starting an 'unpublished' upload. Some other minor wording
changes are also made.
|
|
Until this commit, the '$(project-package-contents)' rules in
'reproduce/analysis/make/initialize.mk' included a line that copied all
contents of the 'reproduce/' directory, recursively, into the package for
further distribution.
This could lead to the distribution of private working files that are used
during development and not intended for general distribution.
With this commit, only the files in 'reproduce/' and 'tex/src' that are
under version control are copied to the temporary directory (that is later
used for creating an archive). With this change, the archiving commands
also became cleaner (we no longer have to manually remove 'LOCAL.conf' or
other temporary files). Extensive comments have also been added above each
step to clarify its purpose and method.
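The idea of copying only version-controlled files can be sketched as a helper that reads a list of paths (for example, the output of 'git ls-files reproduce tex/src') on standard input. The function name is hypothetical and this is not the actual rule in 'initialize.mk'.

```shell
# Sketch: copy only the paths listed on standard input into a packaging
# directory, keeping the directory structure. Untracked files such as
# 'LOCAL.conf' are never listed by 'git ls-files', so they are not copied.
copy_listed_files() {
    dest=$1
    while IFS= read -r f; do
        mkdir -p "$dest/$(dirname "$f")"
        cp -p "$f" "$dest/$f"
    done
}
```

Typical usage from the project's top directory would be: git ls-files reproduce tex/src | copy_listed_files "$dir"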
|
|
Until now the './project make dist' command implicitly assumed that the
'tex/tikz' directory always contains PDF files (because of the 'cp
tex/tikz/*.pdf $$dir/tex/tikz' line). This was annoying for projects that
don't use TikZ or PGFPlots to generate their plots: they had to manually
comment out this line.
With this commit, a check has been added to see if any PDF files exist
there at all. If there are no PDF files, the 'cp' command above is skipped.
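One portable way to express such a check is sketched below (hypothetical function name, not the exact Maneage recipe). When no file matches, the shell leaves the glob unexpanded, so testing each match with '[ -e ]' safely skips the copy.

```shell
# Sketch: copy TikZ-generated PDFs only if at least one exists. With no
# match, the glob stays literal, '[ -e ]' fails and nothing is copied.
copy_tikz_pdfs() {
    srcdir=$1; destdir=$2
    for pdf in "$srcdir"/*.pdf; do
        [ -e "$pdf" ] || continue
        mkdir -p "$destdir"
        cp "$pdf" "$destdir"/
    done
    return 0
}
```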
|
|
Until now, when the bibliography file ('paper.bbl') had a LaTeX-related
error (for example the journal name was a LaTeX macro that isn't defined),
the first 'pdflatex' command that is run before 'biber' would crash, not
allowing the project to reach 'biber'. So the user would have to manually
remove 'paper.bbl' before running './project make'.
With this commit, we remove any possibly existing 'paper.bbl' file before
rebuilding it. Generally, this also helps in keeping things clean during
the generation of the new bibliography.
This bug was found by Mahdieh Nabavi.
|
|
To help in the documentation, the Git hash of the Maneage branch commit
that the project has most recently merged with (or branched from) is now
also provided as a LaTeX macro ('\maneageversion').
It is calculated in 'reproduce/analysis/make/initialize.mk' (in the recipe
to 'initialize.tex').
|
|
Until now, the dataset's configuration names had a 'WFPC2' prefix. But this
is very alien to anyone who is not familiar with the history of the Hubble
Space Telescope (the camera is no longer in use! Its image is only used
here since it's one of the standard FITS files from the FITS standard
webpage).
With this commit the variable names have been modified to be more readable
and clear (having a 'DEMO-' prefix). Also the comments of 'INPUTS.conf'
(describing the purpose of each variable) were edited and made more clear.
|
|
In 'README.md' I tried to explain a little better that TeXLive will only
install its necessary packages, not the full TeXLive library! Also in
paper.mk, I slightly improved the comments with very minor edits.
Both these parts are slated to go into the core Maneage branch, so it's
important to maintain them here for now.
|
|
Until now, when the user wanted to completely remove all built files
(including software), the './project make distclean' command would fail if
the Git hooks weren't installed. They are only present once the project's
configuration has finished successfully, so this bug can happen when
trying to re-do an incomplete build.
With this commit, this is fixed by adding '-f' to the 'rm' command for the
Git hooks.
This commit was also done in the core Maneage branch.
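The effect of '-f' can be illustrated with a small sketch. The hook file names ('pre-commit' and 'post-checkout') are an assumption here, as is the function name.

```shell
# Sketch: remove the project's Git hooks during 'distclean'. The hook
# names are assumed. With '-f', 'rm' succeeds even when a hook was never
# installed (e.g. after an incomplete configuration).
remove_git_hooks() {
    topdir=$1
    rm -f "$topdir/.git/hooks/pre-commit" "$topdir/.git/hooks/post-checkout"
}
```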
|
|
Until now, when the user wanted to completely remove all built files
(including software), the './project make distclean' command would fail if
the Git hooks weren't installed. They are only present once the project's
configuration has finished successfully, so this bug can happen when
trying to re-do an incomplete build.
With this commit, this is fixed by adding '-f' to the 'rm' command for the
Git hooks.
|
|
Until now, the Zenodo identifier was manually written in the paper. But now
we have the Zenodo DOI in 'metadata.conf', so it's much more robust to get
it from there (in case updated versions of the paper are published).
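A sketch of reading a value from a Make-style 'key = value' configuration file and turning it into a LaTeX macro is shown below. The key 'metadata-doi-zenodo' and the macro '\projectzenodoid' are hypothetical names, not necessarily what this project uses.

```shell
# Sketch: print the value of 'key = value' from a Make-style config file.
# Comment lines are skipped; surrounding blanks are trimmed.
conf_value() {
    awk -v k="$1" '
        /^[ \t]*#/ { next }
        index($0, "=") > 0 {
            name = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", name)
            if (name == k) {
                val = substr($0, index($0, "=") + 1)
                sub(/^[ \t]+/, "", val)
                sub(/[ \t]+$/, "", val)
                print val
                exit
            }
        }' "$2"
}

# Hypothetical key and macro names, for illustration only.
zenodo_doi_macro() {
    printf '\\newcommand{\\projectzenodoid}{%s}\n' \
           "$(conf_value metadata-doi-zenodo "$1")"
}
```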
|
|
Only two conflicts came up in the newly added comments of 'paper.mk' in the
Maneage branch. It happened because in this project we don't use
'pdflatex', but 'latex' alone.
|
|
Until this commit, the file `BDIR/software/preparation-done.mk' was not
removed when cleaning the project with `./project make clean'. This file
is generated in the preparation of the data during the analysis step.
However, cleaning is expected to remove anything generated in the
analysis process! Step by step, with the commands:
    ./project make         ---> Will do the preparation and analysis.
    ./project make clean   ---> Will remove all analysis outputs (but
                                not `preparation-done.mk').
    ./project make         ---> Won't do the preparation, only analysis!
However, in the last step it should do the preparation again, because
the input data could have changed for any reason. With this commit, the
file `BDIR/software/preparation-done.mk' is removed when cleaning the
project, and consequently, the input data are prepared again in the next
analysis step.
|
|
The 'pdflatex' program is used to build the default Maneage-branch paper.
But since the default paper uses PGFPlots to build the figures within LaTeX
as an external PDF, PGFPlots requires 'pdflatex' to be called with the
'-shell-escape' option. Generally, this option can be considered as a
security risk (in particular when 'pdflatex' is run on an external
LaTeX file: a malicious LaTeX writer may embed commands in the LaTeX source
that will be executed on the host if this option is present).
This is not too serious an issue in Maneage, because when someone runs
Maneage, they intentionally let it run many programs on their system. Hence
if someone wants to exploit a host system, they can add the necessary
commands long before 'pdflatex' is run. After all, all commands in Maneage
are run with the calling user's permissions, hence they have access to many
parts of the user's account. If someone is worried about security on a
non-trusted Maneage project, they should act the same as they do with any
software: define a new user for it and call it with that user (as a
weak level of security), or run it in a virtual machine or container.
However, since this option has been explicitly mentioned as a security risk
before, it helps to have a comment explaining its usage in 'paper.mk'.
With this commit, a concerned user can read a brief explanation there,
follow the discussion at [1], and possibly re-open the discussion or
propose ways of mitigating the security risk(s).
[1] https://savannah.nongnu.org/task/?15694
|
|
When publishing a project, it is necessary to also publish the source code
of all necessary software of the project. We had recently added a new
'./project make' target called 'dist-software' for this job, but had
forgotten to add it in the output of './project --help'! There was also a
small bug in it that prevented the created tarball from being copied to
the top project directory.
With this commit, an explanation for this target has been added in the
output of './project --help' and that bug has been fixed.
|
|
As described in Maneage's commit 2bd2e2f18 (which I found while testing
this project), the existing download recipe had problems when using a local
copy of the input dataset. It was first fixed here, then implemented there.
Also, to clarify things for a new user, some long comments were added at
the top of 'INPUTS.conf' to describe each of the variables; that comment
has also been put here (it is also in commit 2bd2e2f18 of Maneage).
|
|
Summary of possible semantic conflicts
1. The recipe to download input datasets has been modified. You have to
rename the old 'origname' variable to 'localname' (to avoid confusion)
and the default dataset URL should now be complete (including the
actual filename). See the newly added descriptions in 'INPUTS.conf' for
more on this.
Until now, when the dataset was already present on the host system, a link
couldn't be made to it, causing the project to crash in the checksum
phase. This has been fixed by properly naming the main variable
'localname' to avoid the confusion that caused it.
Some other problems have been fixed in this recipe in the meantime:
- When the checksum is different, the expected and calculated checksums
are printed.
- In the default paper, we now print the full URL of the dataset, not just
the server, so the checksum of the 'download.tex' step has been updated.
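The link-or-download logic can be sketched as follows. The variable roles follow the description above ('localname' and a complete URL), but the function itself is hypothetical and 'curl' is an assumed downloader; Maneage may use another tool.

```shell
# Sketch: if the file already exists in a local input directory, link to
# it; otherwise download it from the complete URL (including file name).
# Assumes 'indir' is an absolute path so the symbolic link stays valid.
fetch_input() {
    localname=$1   # name of the file in the build directory
    url=$2         # complete URL, including the file name
    indir=$3       # optional local directory that may already hold it
    outdir=$4      # directory where the input should appear
    mkdir -p "$outdir"
    if [ -n "$indir" ] && [ -f "$indir/$localname" ]; then
        ln -sf "$indir/$localname" "$outdir/$localname"
    else
        curl --fail --location --silent --show-error \
             --output "$outdir/$localname" "$url"
    fi
}
```

In both cases the checksum is then computed on "$outdir/$localname", so a local copy and a fresh download are verified the same way.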
|
|
Until now, in the 'print-copyright' function of 'initialize.mk' (that
prints a fixed set of common metadata that is necessary in plain-text
files), we were simply printing this line:
# Pre-print server: arXiv:1234.56789
But given that all the other elements are clickable URLs, it now prints:
# Pre-print server: https://arxiv.org/abs/1234.56789
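A minimal sketch of a helper in the spirit of 'print-copyright' is shown below; the arguments and the exact set of lines are illustrative, not the function in 'initialize.mk'.

```shell
# Sketch: append project metadata as '#' comment lines to a plain-text
# output, turning the arXiv identifier into a clickable URL.
print_copyright() {
    outfile=$1; title=$2; arxiv_id=$3
    {
        echo "# Project title: $title"
        echo "# Pre-print server: https://arxiv.org/abs/$arxiv_id"
    } >> "$outfile"
}
```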
|
|
There were two small warnings that are removed with this commit:
- At the end, when we print the number of words in the PDF, we hadn't
accounted for the fact that 'paper.pdf' doesn't always exist (for
example when './project make clean' is run). So a check was added to
only print the number of words when a PDF exists.
- I noticed that the '$(texdir)/to-publish' directory was being built both
in 'initialize.mk' and in 'demo-plot.mk'. So the one in 'demo-plot.mk'
has been removed.
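The first fix can be sketched as a guard around the word count. Using Poppler's 'pdftotext' for text extraction is an assumption here, as is the function name.

```shell
# Sketch: report the word count only when the PDF exists (it won't, for
# example, after './project make clean'). 'pdftotext FILE -' writes the
# extracted text to standard output.
print_word_count() {
    pdf=$1
    if [ -f "$pdf" ]; then
        words=$(pdftotext "$pdf" - | wc -w | tr -d ' ')
        echo "Number of words in full PDF: $words"
    fi
}
```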
|
|
Some minor conflicts came up in 'initialize.mk' and 'verify.mk'. For the
former, I chose the version on Maneage; for the latter, I kept the 'master'
version of this project's checksums, but kept the Maneage version for
the rest of the improvements there (like printing the verified files as
LaTeX comments in 'verify.tex').
While testing the conflicts, I noticed a bug (in the LaTeX macro for the
number of years in the Menke+20 paper) in the previous build, thanks to the
verification step :-)! Fortunately it wasn't actually printed in the PDF,
so a normal reader wouldn't notice.
The bug was caused by the recently added metadata/commented lines in the
'tools-per-year.txt' file: when calculating the number of years studied in
that paper, we were simply counting all the lines and had forgotten to
correct this after adding comments. As a result, the unused LaTeX macro
file was saying that they had studied 47 years instead of the real 31
years! This element was actually used in the very first (40+ page!) draft
of the paper that was summarized to fit into the journal limits.
|
|
Possible semantic conflicts (that may not show up as Git conflicts but may
cause a crash in your project after the merge):
1) The project title (and other basic metadata) should be set in
'reproduce/analysis/conf/metadata.conf'. Please include this file in
your merge (if it is ignored because of '.gitattributes'!).
2) Consider importing the changes in 'initialize.mk' and 'verify.mk' if
you have added all analysis Makefiles to the '.gitattributes' file
(thus not merging any change in them with your branch). For example,
with this command:
git diff master...maneage -- reproduce/analysis/make/initialize.mk
3) The old 'verify-txt-no-comments-leading-space' function has been
replaced by 'verify-txt-no-comments-no-space'. The new function will
also remove all white-space characters between the columns (not just
white space characters at the start of the line). Thus the resulting
check won't involve spacing between columns.
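A comment- and whitespace-insensitive checksum in the spirit of 'verify-txt-no-comments-no-space' can be sketched as below. This is not the function in 'verify.mk', and the use of 'md5sum' is an assumption.

```shell
# Sketch: checksum a text file ignoring comments and all white space.
# Comments are stripped first, then every white-space character, and
# finally the lines that became empty are deleted.
txt_checksum_no_comments_no_space() {
    sed -e 's/#.*$//' \
        -e 's/[[:space:]]//g' \
        -e '/^$/d' "$1" \
        | md5sum | awk '{print $1}'
}
```

The trade-off is that rows like '1 23' and '12 3' give the same checksum, since the column boundaries are no longer part of the check.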
A common set of steps is always necessary to prepare a project for
publication. Until now, we would simply look at previous submissions and
try to follow them, but that was prone to errors and could cause
confusion. The internal infrastructure also didn't have some useful
features to make good publication possible. Now that the submission of a
paper fully devoted to the founding criteria of Maneage is complete
(arXiv:2006.03018), it was time to formalize the necessary steps for easier
submission of a project using Maneage and implement some low-level features
that can make things easier.
With this commit, a first draft of the publication checklist has been
added to 'README-hacking.md'; it was tested in the submission of
arXiv:2006.03018 and zenodo.3872248. To help guide users in implementing
good practices for output datasets, the outputs of the default project
shown in the paper now use the new features. After reading the checklist,
please inspect these.
Some other relevant changes in this commit:
- The publication involves a copy of the necessary software
tarballs. Hence a new target ('dist-software') was also added to
package all the project's software tarballs in one tarball for easy
distribution.
- A new 'dist-lzip' target has been defined for those who want to
distribute an Lzip-compressed tarball.
- The '\includetikz' LaTeX macro now has a second argument to allow
configuring the '\includegraphics' call when the plot should not be
built, but just imported.
|
|
This paper doesn't use pdflatex or biblatex, so it was necessary to make
some small corrections in the make-dist rule of initialize.mk. Also, while
testing the upload on arXiv, I noticed that it complains about an empty
'verify.tex' file, so that is also corrected.
|
|
All the steps of the publication checklist (to be added in
'README-hacking.md') that come before the final check from a new clone
have been done:
- 'README.md' file has been set.
- "Reproducible supplement" was added just above the keywords, pointing to
Zenodo.
- A link to the to-be-uploaded data underlying the plot was added in the
caption of the tools-per-year plot.
- A new metadata configuration file was added to store basic project
metadata to be used throughout the project. This will later be taken
into Maneage. For example, the project title is now stored here and
written into the paper's LaTeX source and output datasets automatically.
- Verification was activated and the plot's data and LaTeX macro files are
now automatically verified.
- Complete metadata was added for the data underlying the plot.
- A generic function was added in 'initialize.mk' that will automatically
write project info and copyright in all plain-text outputs.
|
|
The minor conflict was with 'reproduce/software/make/high-level.mk', and in
particular because we implemented the fix to Maneage's Task #15664 in this
project first. After it was moved to the main Maneage branch some minor
stylistic corrections were done to it, thus causing the conflict. To
resolve the conflict, I simply imported the full Maneage version of the
file with this command:
git checkout maneage -- reproduce/software/make/high-level.mk
The other conflicts were due to the deleted files (that were resolved as
described in 'README-hacking.md') and the LaTeX files that I had told
'.gitattributes' to ignore from the Maneage branch.
|
|
Publishing a paper on reproducible research without making it easy for
readers to access the references would defeat the point. Of course we have to
make some compromises with some journals' reluctance to shift towards the
free world, but to satisfy scientific ethics, we should at least provide
clickable URLs to the references, preferably to the ArXiv version if
available [1], and also to the DOI, again, preferably to an open-access
version of the URL if available.
I was not able to fully get this done in the .bst file, so there's a
sed/tr hack applied to the .bbl file in `reproduce/analysis/make/paper.mk` to
tidy up commas and spaces.
This commit also reverts some of the hacks in the Akhlaghi IAU Symposium
`tex/src/references.tex` entry, to match the improved .bst file,
`tex/src/IEEEtran_openaccess.bst`, provided here with a different name to
the original, in order to satisfy the LaTeX licence.
[1] https://cosmo.torun.pl/blog/arXiv_refs
|
|
Over time, some of the copyright license descriptions had been mistakenly
shortened to two paragraphs instead of the original three that are
recommended by the GPL. With this commit, they are corrected to be exactly
in the same three-paragraph format suggested by the GPL.
The following files also didn't have a copyright notice, so one was added
for them:
reproduce/software/make/README.md
reproduce/software/bibtex/healpix.tex
reproduce/analysis/config/delete-me-num.conf
reproduce/analysis/config/verify-outputs.conf
|
|
Since the DSJ editor decided that this paper doesn't fit
into their scope, we decided to submit it to IEEE's Computing in Science
and Engineering (CiSE). So with this commit the text was re-written to fit
into their style and word-count limitations.
|
|
The paper is no longer using LuaLaTeX, but raw LaTeX (that saves a DVI), so
it is much faster! Initially I had used LuaLaTeX for special fonts that
resemble the CODATA Data Science Journal, but all that overhead is no
longer necessary. Therefore I also removed the MANY extra LaTeX packages we
were importing. The paper builds and is able to construct one of its images
(the git-branching figure) with only 7 packages beyond the minimal
TeX/LaTeX installation, which also makes processing much faster.
The text is just temporary now, and mainly just a placeholder. With the
next commit, I'll fill it with proper text.
|
|
A few small conflicts showed up here and there. They are fixed with this
merge.
|
|
Recently (in commit 8eb0892e) the Gnuastro configuration files moved under
the `reproduce/analysis/config/gnuastro' directory (before that they were
in `reproduce/software/config/gnuastro'). But this hadn't been reflected in
the variable that defines this directory in `initialize.mk'.
With this commit, the address of the Gnuastro configuration files directory
is corrected, allowing Gnuastro programs to operate properly when they are
used.
|
|
|
|
[Compared to first submission to DSJ last week with 11436 words in raw PDF,
we have decreased the paper by ~1000 words to 10493 :-)]
As with the previous commits, the moment Boud changed the structure of
sentences, I was able to find the redundancies and remove them! This is a
fascinating feature of collaboration I had never felt before: it is so hard
to find redundancies in my own raw text, but even a minor correction by
someone else suddenly breaks my mental barrier on the sentence,
allowing me to be more critical of it!
Anyway, besides such corrections, I fixed a few other things: 1) In the
DSJ's recently published papers, there is no `~' between "Figure" and its
number. 2) I noticed that in `tex/src/figure-src-inputconf.tex' I was
actually using manually input strings for the filename, checksum and size!
This was contrary to the whole philosophy of Maneage(!), I must have rushed
and forgot! So LaTeX variables are now defined and used.
|
|
Until now, throughout Maneage we were using the old name of "Reproducible
Paper Template". But we have finally decided to use Maneage, so to avoid
confusion, the name has been corrected in `README-hacking.md' and also in
the copyright notices.
Note also that in `README-hacking.md', the main Maneage branch is now
called `maneage', and the main Git remote has been changed to
`https://gitlab.com/maneage/project' (this is a new GitLab Group that I
have set up for all Maneage-related projects). In this repository there is
only one `maneage' branch to avoid complications with the `master' branch
of the projects using Maneage later.
|
|
A few minor conflicts came up that were easily fixed.
|
|
Until now the software configuration parameters were defined under the
`reproduce/software/config/installation/' directory. This was because the
configuration parameters of analysis software (for example Gnuastro's
configurations) were placed there too. But this was terribly
confusing, because the run-time options of programs fall under the
"analysis" phase of the project.
With this commit, the Gnuastro configuration files have been moved under
the new `reproduce/analysis/config/gnuastro' directory and the software
configuration files are directly under `reproduce/software/config'. A clean
build was done with this change and it didn't crash, but it may cause
crashes in derived projects, so after merging with Maneage, please
re-configure your project to see if anything has been missed. Please let us
know if there is a problem.
|
|
Since the journal doesn't accept supplementary files during initial
submission, I have put this link on the PDF for the referee and editors to
access if they want.
Also the `tex/img' file was added to the distribution tarball.
|
|
I was using a special Bash feature before to ignore the distribution
directory itself when copying the files, but that had some problems, so I
just used a simple for loop over a `find' command to ignore it. Also, for
now, we don't need the BibLaTeX sources in the project (they are primarily
for arXiv), so they are left out to help the referee see cleaner contents
in this supplement file.
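A sketch of copying every top-level entry except the distribution directory itself is shown below. It uses a 'while read' loop over 'find' (rather than a 'for' loop) so that names with spaces survive; the function name is hypothetical, 'dist' must be a plain directory name in the top directory, and '-mindepth'/'-maxdepth' are GNU/BSD find extensions.

```shell
# Sketch: copy all top-level entries into the distribution directory,
# skipping that directory itself. Assumes names without newlines.
copy_into_dist() {
    dist=$1
    mkdir -p "$dist"
    find . -mindepth 1 -maxdepth 1 ! -name "$dist" |
        while IFS= read -r f; do
            cp -pR "$f" "$dist"/
        done
}
```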
|
|
TeXLive recently transitioned from its 2019 version to its 2020 version,
as we found in Elham Saremi's trial of this project. The fact that
traditionally Maneage installs all TeXLive packages in a per-year directory
is very annoying and required an update in the core Maneage system every
year. So I suddenly realized that we can fix this by using a fixed name for
the directory that normally holds the release year. This has been
implemented with this commit.
I have also done this change in the main Maneage branch for other projects
to also benefit from this correction.
|
|
It is this time of year again: TeXLive has transitioned to its 2020 release
and the year is imprinted into the installation directory of TeXLive. Until
now, we have had to manually change this year and it caused complications
and was very annoying.
With this commit, the explicit year has been removed from TeXLive's
installation directory and we now simply use `maneage' instead of the year. I tried
this on another system and it worked nicely. Until the time that we can
fully install LaTeX packages from source tarballs, this is the best thing
we could do for now.
|
|
The contents up to two commits ago (when I started to summarize the paper)
are now in a new and shorter format: previously the discussion started on
page 25, but now it starts on page 17. It is still a little longer than
8000 words, but not as significantly as before. I will add the discussion
and also try to summarize it further before submission.
|
|
A few minor conflicts occurred and were fixed.
|
|
With this commit, a description of these two important parts has been
added to the project, along with several figures showing various parts of
the files that are discussed. I have also done some other restructuring of
the figures and files to make things fit better into the description of
the paper.
|
|
Until now, I was mistakenly multiplying the fraction of papers in that
journal. This is corrected with this commit.
|
|
Until now, there was no explanation of an actual analysis phase, therefore
with this commit an example scenario with a readable Makefile is included.
The data lineage graph was also simplified to be both more readable and
to correspond to this new explanation and subMakefile.
Some random edits/typos were also corrected and some references added for
discussion.
|
|
Until now, the preparation phase was always executed before the final build
phase when running `./project make'. But when it is necessary, project
preparation can be slow and will unnecessarily slow down the project while
it is growing (the focus is on the analysis that is done after
preparation).
With this commit, preparation will be done automatically the first time
that the project is run (when `.build/software/preparation-done.mk' doesn't
exist). However, after preparation is complete once, future runs of
`./project make' won't do preparation any more (by calling
`top-prepare.mk'). They will directly call `top-make.mk' for the analysis.
To manually invoke preparation after the first attempt, `./project make'
should be run with the new `--prepare-redo' option.
Also, since the preparation phase is now automatically done before the
analysis phase, the long notice that describes running `./project make' at
the end of the preparation phase has been removed from `top-prepare.mk'. It
now just prints a short line, saying the preparation is complete.
Finally, when the project has not been run with the proper group
configuration, it ends with an `exit 1' so the main `./project' script
doesn't proceed any further.
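The decision described above can be sketched as a small shell function. It only prints the Make invocations it would run; the function and its arguments are illustrative, not the actual './project' script.

```shell
# Sketch: run preparation only on the first run (no marker file yet) or
# when '--prepare-redo' was requested; the analysis always runs.
run_project_make() {
    bdir=$1           # build directory (e.g. '.build')
    prepare_redo=$2   # "yes" if '--prepare-redo' was given
    if [ "$prepare_redo" = yes ] ||
       [ ! -f "$bdir/software/preparation-done.mk" ]; then
        echo "make -f top-prepare.mk"
    fi
    echo "make -f top-make.mk"
}
```

Deleting the marker file (as './project make clean' does after a later fix) has the same effect as '--prepare-redo' on the next run.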
|
|
Until now, the final preparation target of the preparation phase depended
on all the `$(makesrc)' files. This caused a problem because we were
telling it to also depend on `prepare.tex' (which is the same file that is
being built).
With this commit, we are applying the same solution we have already done in
`paper.mk' (for `paper.tex'): we are removing `prepare' from the list of
prerequisites.
This bug was found by Zahra Sharbaf.
|
|
This was done just to get going with describing the analysis process.
|