aboutsummaryrefslogtreecommitdiff
path: root/reproduce/analysis/make
AgeCommit message (Collapse)AuthorLines
2020-06-10Corrected bug in using local copy of input datasetMohammad Akhlaghi-11/+15
As described in Maneage's commit 2bd2e2f18 (which I found while testing this project), the existing download recipe had problems when using a local copy of the input dataset. It was first fixed here, then implemented there. Also, to clarify things for a new user, some long comments were added at the top of 'INPUTS.conf' to describe each of the variables, that comment has also been put here (and is also in commit 2bd2e2f18 of Maneage).
2020-06-09Two minor corrections to avoid warnings in make and make cleanMohammad Akhlaghi-10/+2
There were two small warnings that are removed with this commit: - In the end, when we print the number of words in the PDF, we hadn't accounted for the fact that 'paper.pdf' doesn't always exist (for example when './project make clean' is run). So a check was added to only print the number of words when a PDF exists. - I noticed that the '$(texdir)/to-publish' directory was being built both in 'initialize.mk' and in 'demo-plot.mk'. So the one in 'demo-plot.mk' has been removed.
2020-06-09Imported Maneage, minor conflicts fixed, a bug found and fixedMohammad Akhlaghi-38/+120
Some minor conflicts came up in 'initialize.mk' and 'verify.mk'. For the former, I chose the version on Maneage, for the latter, I kept the 'master' version on the checksums of this project, but kept the Maneage version for the rest of the improvements there (like printing the verified files as LaTeX comments in 'verify.tex'. While testing the conflicts, I noticed a bug (in the LaTeX macro for the number of years in the Menke+20 paper) in the previous build, thanks to the verification step :-)! Fortunately it wasn't actually printed in the PDF, so a normal reader won't recognize. The bug was caused by the recently added meta-data/commented lines in the 'tools-per-year.txt' file: when calculating the number of years studied in that paper, we were simply counting all the lines and we had forgot to correct this after adding comments. As a result, the un-used LaTeX macro file was saying that they have studied 47 years instead of the real 31 years! This element was actually used in the very first (+40 page!) draft of the paper that was summarized to fit into the journal limits.
2020-06-06IMPORTANT: Added publication checklist, improved relevant infrastructureMohammad Akhlaghi-66/+216
Possible semantic conflicts (that may not show up as Git conflicts but may cause a crash in your project after the merge): 1) The project title (and other basic metadata) should be set in 'reproduce/analysis/conf/metadata.conf'. Please include this file in your merge (if it is ignored because of '.gitattributes'!). 2) Consider importing the changes in 'initialize.mk' and 'verify.mk' (if you have added all analysis Makefiles to the '.gitattributes' file (thus not merging any change in them with your branch). For example with this command: git diff master...maneage -- reproduce/analysis/make/initialize.mk 3) The old 'verify-txt-no-comments-leading-space' function has been replaced by 'verify-txt-no-comments-no-space'. The new function will also remove all white-space characters between the columns (not just white space characters at the start of the line). Thus the resulting check won't involve spacing between columns. A common set of steps are always necessary to prepare a project for publication. Until now, we would simply look at previous submissions and try to follow them, but that was prone to errors and could cause confusion. The internal infrastructure also didn't have some useful features to make good publication possible. Now that the submission of a paper fully devoted to the founding criteria of Maneage is complete (arXiv:2006.03018), it was time to formalize the necessary steps for easier submission of a project using Maneage and implement some low-level features that can make things easier. With this commit a first draft of the publication checklist has been added to 'README-hacking.md', it was tested in the submission of arXiv:2006.03018 and zenodo.3872248. To help guide users on implementing the good practices for output datasets, the outputs of the default project shown in the paper now use the new features). After reading the checklist, please inspect these. Some other relevant changes in this commit: - The publication involves a copy of the necessary software tarballs. Hence a new target ('dist-software') was also added to package all the project's software tarballs in one tarball for easy distribution. - A new 'dist-lzip' target has been defined for those who want to distribute an Lzip-compressed tarball. - The '\includetikz' LaTeX macro now has a second argument to allow configuring the '\includegraphics' call when the plot should not be built, but just imported.
2020-06-04Minor improvements in the make dist command for this paperMohammad Akhlaghi-7/+12
This paper doesn't use pdflatex or biblatex, so it was necessary to make some small corrections in the make-dist rule of initialize.mk. Also, while testing the upload on arXiv, I noticed that it complains about an empty 'verify.tex' file, so that is also corrected.
2020-06-04Verification activated, README added, Proper metadata in plot dataMohammad Akhlaghi-16/+64
All the steps following the to-be-added (in 'README-hacking.md') publication checklist prior to the final check from new clone have been added: - 'README.md' file has been set. - "Reproducible supplement" was added just above the keywords, pointing to Zenodo. - A link to the to-be-uploaded data underlying the plot was added in the caption of the tools-per-year plot. - A new meta-data configuration file was added to store basic project metadata to be used throughout the project. This will later be taken into Maneage. For examle the project title is now stored here and written into the paper's LaTeX source and output datasets automatically. - Verification was activated and plot's data and LaTeX macro files are now automatically verified. - A complete metadata was added for the data underlying the plot. - A generic function was added in 'initialize.mk' that will automatically write project info and copyright in all plain-text outputs.
2020-06-03Imported recent updated in Maneage, minor conflict fixedMohammad Akhlaghi-57/+78
The minor conflict was with 'reproduce/software/make/high-level.mk', and in particular because we implemented the fix to Maneage's Task #15664 in this project first. After it was moved to the main Maneage branch some minor stylistic corrections were done to it, thus causing the conflict. To resolve the conflict, I simply imported the full Maneage version of the file with this command: git checkout maneage -- reproduce/software/make/high-level.mk The other conflicts were due to the deleted files (that were resolved as described in 'README-hacking.md') and the LaTeX files that I had told '.gitattributes' to ignore from the Maneage branch.
2020-05-29Reproducible research based on open-access papersBoud Roukema-0/+17
Publishing a paper on reproducible research without making it easy for readers to read the references would defeat the point. Of course we have to make some compromises with some journals' reluctance to shift towards the free world, but to satisfy scientific ethics, we should at least provide clickable URLs to the references, preferably to the ArXiv version if available [1], and also to the DOI, again, preferably to an open-access version of the URL if available. I was not able to fully get this done in the .bst file, so there's an sed/tr hack done to the .bbl file in `reproduce/analysis/make/paper.mk` to tidy up commas and spaces. This commit also reverts some of the hacks in the Akhlaghi IAU Symposium `tex/src/references.tex` entry, to match the improved .bst file, `tex/src/IEEEtran_openaccess.bst`, provided here with a different name to the original, in order to satisfy the LaTeX licence. [1] https://cosmo.torun.pl/blog/arXiv_refs
2020-05-22Corrected copyright notices to fit GPL suggested formatMohammad Akhlaghi-65/+89
In time, some of the copyright license description had been mistakenly shortened to two paragraphs instead of the original three that is recommended in the GPL. With this commit, they are corrected to be exactly in the same three paragraph format suggested by GPL. The following files also didn't have a copyright notice, so one was added for them: reproduce/software/make/README.md reproduce/software/bibtex/healpix.tex reproduce/analysis/config/delete-me-num.conf reproduce/analysis/config/verify-outputs.conf
2020-05-22Re-write of the paper to fit in ~6000 words and IEEE formatMohammad Akhlaghi-6/+11
Following the fact that the DSJ editor decided that this paper doesn't fit into their scope, we decided to submit it to IEEE's Computing in Science and Engineering (CiSE). So with this commit the text was re-written to fit into their style and word-count limitations.
2020-05-02First implementation of style in IEEEtran styleMohammad Akhlaghi-7/+14
The paper is no longer using LuaLaTeX, but raw LaTeX (that saves a DVI), it is so much faster! Initially I had used LuaLaTeX to use special fonts to resemble the CODATA Data Science Journal, but all that overhead is no longer necessary. Therefore I also removed the MANY extra LaTeX packages we were importing. The paper builds and is able to construct one of its images (the git-branching figure) with only 7 packages beyond the minimal TeX/LaTeX installation. Also in terms of processing it is so much faster. The text is just temporary now, and mainly just a place holder. With the next commit, I'll fill it with proper text.
2020-05-01Imported recent changes in Maneage, minor conflicts fixedMohammad Akhlaghi-73/+58
A few small conflicts showed up here and there. They are fixed with this merge.
2020-04-26Corrected Gnuastro configuration directory in initialize.mkZahra Sharbaf-1/+1
Recently (in Commit 8eb0892e) the Gnuastro configuration files moved under "reproduce/analysis/config/gnuastro" directory (before that they were in `reproduce/software/config/gnuastro)'. But this hadn't been reflected in it the variable that defines this directory in `initialize.mk'. With this commit, the address of the Gnuastro configuration files directory is corrected, allowing Gnuastro programs to operate properly when it is used.
2020-04-24TypoBoud Roukema-1/+1
2020-04-23Further edits to summarize the parts corrected by BoudMohammad Akhlaghi-1/+4
[Compared to first submission to DSJ last week with 11436 words in raw PDF, we have decreased the paper by ~1000 words to 10493 :-)] As with the previous commits, the moment Boud changed the structure of sentences, I was able to find the redundancies and remove them! This is a fascinating feature of collaboration I had never felt before: it is so hard to find redundancies in my own raw text, but even a minor correction by someone else suddeny breaks my mental memories/barrier on the sentence, allowing me to be more critical to it! Anyway, besides such corrections, I fixed a few other things: 1) In the DSJ's recently published papers, ther is no `~' between "Figure" and its number. 2) I noticed that in `tex/src/figure-src-inputconf.tex' I was actually using manually input strings for the filename, checksum and size! This was contrary to the whole philosophy of Maneage(!), I must have rushed and forgot! So LaTeX variables are now defined and used.
2020-04-20Maneage instead of Template in README-hacking.md and copyright noticesMohammad Akhlaghi-83/+65
Until now, throughout Maneage we were using the old name of "Reproducible Paper Template". But we have finally decided to use Maneage, so to avoid confusion, the name has been corrected in `README-hacking.md' and also in the copyright notices. Note also that in `README-hacking.md', the main Maneage branch is now called `maneage', and the main Git remote has been changed to `https://gitlab.com/maneage/project' (this is a new GitLab Group that I have setup for all Maneage-related projects). In this repository there is only one `maneage' branch to avoid complications with the `master' branch of the projects using Maneage later.
2020-04-17Imported recent work in Maneage, minor conflicts fixedMohammad Akhlaghi-4/+4
A few minor conflicts came up that were easily fixed.
2020-04-17IMPORTANT: software config directly under reproduce/software/configMohammad Akhlaghi-4/+4
Until now the software configuration parameters were defined under the `reproduce/software/config/installation/' directory. This was because the configuration parameters of analysis software (for example Gnuastro's configurations) were placed under there too. But this was terribly confusing, because the run-time options of programs falls under the "analysis" phase of the project. With this commit, the Gnuastro configuration files have been moved under the new `reproduce/analysis/config/gnuastro' directory and the software configuration files are directly under `reproduce/software/config'. A clean build was done with this change and it didn't crash, but it may cause crashes in derived projects, so after merging with Maneage, please re-configure your project to see if anything has been missed. Please let us know if there is a problem.
2020-04-14Added note with link to paper's distribution tarballMohammad Akhlaghi-0/+1
Since the journal doesn't accept supplementary files during initial submission, I have put this link on the PDF for the referee and editors to access if they want. Also the `tex/img' file was added to the distribution tarball.
2020-04-14Corrected package distribution step and not including BibLaTeX packagesMohammad Akhlaghi-3/+6
I was using some special Bash feature before to ignore the distribution directory itself when copying the files, but that had some problems, so I just used a simple for loop over a `find' command to ignore it. Also, for now, we don't need BibLaTeX sources in the project (that is primarily for arXiv), so to help the referee see a more cleaner contents of this supplement file.
2020-04-13Installation year removed from TeXLive installationMohammad Akhlaghi-1/+1
TeXLive recently transitioned from its 2019 version to its 2020 version thanks to Elham Saremi's trial of the this project. The fact that traditionally Maneage installs all TeXLive packages in a per-year directory is very annoying and required an update in the core Maneage system every year. So I suddently recognized that we can fix this by setting a different name for the directory holding the release year. This has been implemented with this commit. I have also done this change in the main Maneage branch for other projects to also benefit from this correction.
2020-04-13Configure (TeXLive): Year of distribution no longer in directoryMohammad Akhlaghi-1/+1
It is this time of year again: TeXLive has transitioned to its 2020 release and the year is imprinted into the installation directory of TeXLive. Until now, we have had to manually change this year and it caused complications and was very annoying. With this commit, the explicit year has been removed from TeXLive's installation and we now simply put a `maneage' instead of the year. I tried this on another system and it worked nicely. Until the time that we can fully install LaTeX packages from source tarballs, this is the best thing we could do for now.
2020-04-13Full existing contents summaried, only discussion to goMohammad Akhlaghi-0/+8
The contents until two commits ago when I started to summarize the paper are now in a new and shorter format: previously the discussion started on page 25, but now it starts on page 17. It is still a little longer than 8000 words, but not as significantly as before. I will add the discussion and also try to summarize it futher before submission.
2020-04-02Imported recent work on Maneage, minor conflicts fixedMohammad Akhlaghi-31/+30
A few minor conflicts occurred and were fixed.
2020-03-23Analysis and configuration file sections completeMohammad Akhlaghi-18/+71
With this commit a description of these two important parts have been added to the project, along with several figures showing various parts of the files that are discussed. I also done some other restructuring of the figures and files to make things fit better into the the description of the paper.
2020-03-08Menke+20 example: properly count number of papers with softwareMohammad Akhlaghi-4/+15
Until now, I was mistakenly multiplying the fraction of papers in that journal. This is corrected with this commit.
2020-03-02Described the first analysis phase with a demo subMakefileMohammad Akhlaghi-10/+10
Until now, there was no explanation on an actual analysis phase, therefore with this commit an example scenario with a readable Makefile is included. The Data lineage graph was also simplified to both be more readable, and also to correspond to this new explanation and subMakefile. Some random edits/typos were also corrected and some references added for discussion.
2020-02-29IMPORTANT: re-preparation can only be done with --prepare-redoMohammad Akhlaghi-13/+2
Until now, the preparation phase was always executed before the final build phase when running `./project make'. But when it becomes necessary, project preparation can be slow and will un-necessarily slow down the project while the project is growing (focus is on the analysis that is done after preparation). With this commit, preparation will be done automatically the first time that the project is run (`.build/software/preparation-done.mk' doesn't exist). However, after preperation is complete once, future runs of `./project make' won't do preparation any more (by calling `top-prepare.mk'). They will directly call `top-make.mk' for the analysis. To manually invoke preparation after the first attempt, the `./project make' script should be run with the new `--prepare-redo' option. Also, since the preparation phase is now automatically done before the analysis phase, the long notice that describes running `./project make' at the end of the preparation phase has been removed in `top-prepare.mk'. It now just prints a short line, saying the preparation has been complete. Finally, when the project has not been run with the proper group configuration, it ends with an `exit 1' so the main `./project' script doesn't proceed any further.
2020-02-20Preparation phase: prepare.tex not needed to finish preparationMohammad Akhlaghi-1/+3
Until now, the final preparation target of the preparation phase depended on all the `$(makesrc)' files. This caused a problem because we were telling it to also depend on `prepare.tex' (which is the same file that is being built). With this commit, we are applying the same solution we have already done in `paper.mk' (for `paper.tex'): we are removing `prepare' from the list of prerequisites. This bug was found by Zahra Sharbaf.
2020-02-16Two values from the input dataset are now written into the paperMohammad Akhlaghi-2/+15
This was done just to get going with describing the analysis process.
2020-02-16Menke+2020 data is now imported and ready for later steps in plain textMohammad Akhlaghi-6/+72
The main problems with this dataset was the names of the journals (which sometimes have single quotes or apostrophes in them that is really annoying for SED)! But ultimately, for the simple study we want to do here, the journal names are irrelevant, so in the end I just ignored the names. Later we can set an identifier for the journals if necessary. But now we have the basic information in a way that is usable in a plot to show in this paper.
2020-02-01IMPORTANT: reproduce/software/bash renamed to reproduce/software/shellMohammad Akhlaghi-2/+2
Until now the shell scripts in the software building phase were in the `reproduce/software/bash' directory. But given our recent change to a POSIX-only start, the `configure.sh' shell script (which is the main component of this directory) is no longer written with Bash. With this commit, to fix that problem, that directory's name has been changed to `reproduce/software/shell'.
2020-01-31Configure step: compiler checks done before basic settingsMohammad Akhlaghi-0/+6
Until now, the project would first ask for the basic directories, then it would start testing the compiler. But that was problematic because the build directory can come from a previous setting (with `./project configure -e'). Also, it could confuse users to first ask for details, then suddently tell them that you don't have a working C library! We also need to store the CPATH variable in the `LOCAL.conf' because in some cases, the compiler won't work without it. With this commit, the compiler checking has been moved at the start of the configure script. Instead of putting the test program in the build directory, we now make a temporary hidden directory in the source directory and delete that directory as soon as the tests are done. In the process, I also noticed that the copyright year of the two hidden files weren't updated and corrected them.
2020-01-20IMPORTANT!!! Configuration Makefiles now have a .conf suffixMohammad Akhlaghi-18/+20
Until now, the configuration Makefiles (in `reproduce/software/config/installation' and `reproduce/analysis/config') had a `.mk' suffix, similar to the workhorse Makefiles. Although they are indeed Makefiles, but given their nature (to only keep configuration parameters), it is confusing (especially to early users) for them to also have a `.mk' (similar to the analysis or software building Makefiles). To address this issue, with this commit, all the configuration Makefiles (in those directories) are now given a `.conf' suffix. This is also assumed for all the files that are loaded. The configuration (software building) and running of the template have been checked with this change from scratch, but please report any error that may not have been noticed. THIS IS AN IMPORTANT CHANGE AND WILL CAUSE CRASHES OR UNEXPECTED BEHAVIORS FOR PROJECTS THAT HAVE BRANCHED FROM THIS TEMPLATE. PLEASE CORRECT THE SUFFIX OF ALL YOUR PROJECT'S CONFIGURATION MAKEFILES (IN THE DIRECTORIES ABOVE), OTHERWISE THEY AREN'T AUTOMATICALLY LOADED ANYMORE.
2020-01-18Raw draft (until now as a separate repository) importedMohammad Akhlaghi-3/+4
Until now, I was writing the paper without the template. But we will soon be adding a tutorial to the template, and I thought it will be good to have an example demonstration here too. So I just brought the hole project into the template structure, allowing us to add the template analysis later when its ready, and also allowing us to easily reproduce this paper ofcourse (without having to worry about the host's TeXLive installation.
2020-01-18First set of customizations doneMohammad Akhlaghi-132/+1
The unnecessary parts were removed and the project now runs.
2020-01-18README-hacking.md: edits and corrections for easier customizationMohammad Akhlaghi-2/+2
The checklist descriptions were slightly edited to be more clear. Also, while following them, I noticed that while removing the "delete-me" parts on `verify.mk', would cause an error: the `if [ $$m == delete-me ];' statement we were saying to delete cause an error because `elif' was the first statement Bash would see. So with this commit, the `download' conditional (which isn't instructed to be deleted) was set to be the top (with an `if') and the `delete-me' conditional now has an `elif'.
2020-01-01Verification function checks if file existsMohammad Akhlaghi-3/+13
Until now, if the file to be verified didn't exist, a different checksum would be generated, and it would stop, but it wasn't immediately clear if the differing checksum is because the file doesn't exist at all! With this commit, before calculating the checksum, we first make sure if the file exists. If it doesn't exist an explicit error is printed and thus will help the project editor to find the cause of the problem.
2020-01-01Verification of output values and data added within templateMohammad Akhlaghi-22/+143
Until now, the only verification that the template provided was the published PDF. Users had to manually compare the published and generated PDFs (numbers, plots, tables) and see if they obtained the same result. However, this type of manual verification is not good and is prone to frustration and missing important differences. With this commit, a new Makefile has been added in the analysis steps: `verify.mk'. It provides facilities to easily verify the results that go into the paper. For example tables that go into making the paper's plots, or the LaTeX macros that blend into the text. See the updated parts in `README-hacking.md` for a more complete explanation. This completes task #15497.
2020-01-01Copyright statements updated to include 2020Mohammad Akhlaghi-7/+7
Now that its 2020, its necessary to include this year in the copyright statements.
2019-11-29Download links directly to actual file if it exists in INDIRMohammad Akhlaghi-2/+8
Until now, when an input dataset already exists in `INDIR', the template would just make a symbolic link to it in the build directory. However, in many cases, the files in INDIR will actually be links to other locations on the filesystem and some programs have problems following too many links. With this commit, the template is now using the `readlink' program (part of GNU Coreutils) to follow a possible link and point the link in the build directory directly to an actual non-link file.
2019-10-31Minor corrections in distribution and autoconf prerequisite of automakeMohammad Akhlaghi-0/+1
Some minor corrections were made in the template: - When making the distribution, `.swp' files (created by Vim) are also removed. - Autoconf is set as a prerequisite of Automake I was also trying to add the Apache log4cxx, but its default 0.10.0 tarball needs some patches, so I have just left it half done until someone actually needs it and we apply the patch.
2019-10-19Minor improvments in packaging of project with make distMohammad Akhlaghi-12/+14
The steps to package the project have been made slightly more clear and also the temporary directory that is created for packaging is deleted after the tarball is made.
2019-10-11Properly working make clean when in group modeMohammad Akhlaghi-3/+4
Until now, when you ran `make clean', all the directories under `$(BDIR)/tex/' would be deleted except for `macros' and `build'. This was good for the single-user mode. But in group mode, this would delete the user-specific TeX build directory because its called `build-USER', not `build'. With this commit, to fix the problem, we define the new `texbtopdir' and based on the group condition, and use that to specify which directory to not delete.
2019-10-01Minor corrections in configure and prepare phaseMohammad Akhlaghi-10/+10
Since ImageMagick can take long to build, we are now building it in parallel. Also, the part where we replace an `_' with `\_' in the software version at the end of the configure script was removed. It is more clear/readable that the actual rule that includes such a name deals with the underline (as is the case for `sip_tpv' which already dealt with it). Finally, I noticed that the checks at the start of `top-prepare' were missing new-lines. I had forgot that the Make single-shell variable isn't activated in this stage yet.
2019-10-01Infrastructure to keep preparation resultsMohammad Akhlaghi-17/+50
A special directory is now defined in `initialize.mk' that can be used in both the preparation and build phases. Also, the contents of prepared results can now be conditionally read during `./project make'.
2019-10-01Preparation phase added before final buildingMohammad Akhlaghi-0/+126
In many real-world scenarios, `./project make' can really benefit from having some basic information about the data before being run. For example when quering a server. If we know how many datasets were downloaded and their general properties, it can greatly optmize the process when we are designing the solution to be run in `./project make'. Therefore with this commit, a new phase has been added to the template's design: `./project prepare'. In the raw template this is empty, because the simple analysis done in the template doesn't warrant it. But everything is ready for projects using the template to add preparation phases prior to the analysis.
2019-09-26Working project when downloaded from arXivMohammad Akhlaghi-2/+4
Until now, we were assuming that the users would just clone the project in Git. But after submitting arXiv:1909.11230, and trying to build directly from the arXiv source, I noticed several problems that wouldn't allow users to build it automatically. So I tried the build step by step and was able to find a fix for the several issues that came up. The scripting parts of the fix were primarily related to the fact that the unpacked arXiv tarball isn't under version control, so some checks had to be put there. Also, we wanted to make it easy to remove the extra files, so an extra `--clean-texdit' option was added to `./project'. Finally, some manual corrections were necessary (prior to running `./project', which are now described in `README.md'. Most of the later steps can be automated and we should do it later, I just don't have enough time now.
2019-09-25Won't copy previous distribution builds in new distributionMohammad Akhlaghi-1/+1
Until now, the pipeline was instructed to only ignore the current temporary project distribution directory. So if there were directories from previous builds, they were wrongly included in the current tarball. With this commit, we don't just ignore the directory of the current distribution, but generally, all directories starting with `paper-v*'.
2019-09-16Git checksum printed even when on a tagMohammad Akhlaghi-2/+2
Until now, when the commit was tagged, `git describe' would just print the tag and no longer the commit checksum. This is bad because the checksum is a much more robust way to confirm the point in history. With this commit the `--long' option has been added to `git describe' to fix this issue. From now on, when we are on a tag, it will print the tag followed by a `-0-' and the first characters of the checksum.