<feed xmlns='http://www.w3.org/2005/Atom'>
<title>paper-concept.git/reproduce/src/bash, branch maneage</title>
<subtitle>Paper (Towards Long-term and Archivable Reproducibility)</subtitle>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/'/>
<entry>
<title>New architecture to separate software-building and analysis steps</title>
<updated>2019-04-15T01:24:09+00:00</updated>
<author>
<name>Mohammad Akhlaghi</name>
<email>mohammad@akhlaghi.org</email>
</author>
<published>2019-04-15T00:47:58+00:00</published>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/commit/?id=313b936b502d22b6a2ff43f560dee0bb51fd01d0'/>
<id>313b936b502d22b6a2ff43f560dee0bb51fd01d0</id>
<content type='text'>
Until now, the software building and analysis steps of the pipeline were
intertwined. However, these steps (how to build software, and how to use
it) are logically completely independent.

Therefore with this commit, the pipeline now has a new architecture
(particularly in the `reproduce' directory) to emphasize this distinction:
The `reproduce' directory now has the two `software' and `analysis'
subdirectories and the respective parts of the previous architecture have
been broken up between these two based on their function. There is also no
more `src' directory. The `config' directory for software and analysis is
now mixed with the language-specific directories.

Some of the software versions were also updated after checking their
webpages.

This new architecture will allow much more focused work on each part of the
pipeline (installing the software and running it for an analysis).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now, the software building and analysis steps of the pipeline were
intertwined. However, these steps (how to build software, and how to use
it) are logically completely independent.

Therefore with this commit, the pipeline now has a new architecture
(particularly in the `reproduce' directory) to emphasize this distinction:
The `reproduce' directory now has the two `software' and `analysis'
subdirectories and the respective parts of the previous architecture have
been broken up between these two based on their function. There is also no
more `src' directory. The `config' directory for software and analysis is
now mixed with the language-specific directories.

Some of the software versions were also updated after checking their
webpages.

This new architecture will allow much more focused work on each part of the
pipeline (installing the software and running it for an analysis).
</pre>
</div>
</content>
</entry>
<entry>
<title>Replaced all occurrences of pipeline in text</title>
<updated>2019-04-14T16:49:55+00:00</updated>
<author>
<name>Mohammad Akhlaghi</name>
<email>mohammad@akhlaghi.org</email>
</author>
<published>2019-04-14T16:48:40+00:00</published>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/commit/?id=4722ea598edd6b630227404c48c1c09ac527e9b8'/>
<id>4722ea598edd6b630227404c48c1c09ac527e9b8</id>
<content type='text'>
All occurrences of "pipeline" have been changed to "project" or "template"
within the text (comments and READMEs) of the template. The main template
branch is now also named `template'.

This was all because `pipeline' is too generic and couldn't distinguish
between the base template and a customized project.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All occurrences of "pipeline" have been changed to "project" or "template"
within the text (comments and READMEs) of the template. The main template
branch is now also named `template'.

This was all because `pipeline' is too generic and couldn't distinguish
between the base template and a customized project.
</pre>
</div>
</content>
</entry>
<entry>
<title>Corrected copyright notices and info about adding copyright info</title>
<updated>2019-04-13T22:43:07+00:00</updated>
<author>
<name>Mohammad Akhlaghi</name>
<email>mohammad@akhlaghi.org</email>
</author>
<published>2019-04-13T22:43:07+00:00</published>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/commit/?id=0cbd2243458611caa2a3564b577987531bcd6934'/>
<id>0cbd2243458611caa2a3564b577987531bcd6934</id>
<content type='text'>
Until now, the files that people were meant to change didn't have a
proper copyright notice (for example `Copyright (C) YOUR NAME.'). This was
wrong because the license does not convey copyright ownership. So the name
of the file's original author must always be included, and people who
modify the file (adding their own copyrightable modifications) must add
their own names.

With this commit, the file's original author (and email) are added to the
copyright notice, and when more than one person has modified a file, each
name has its own copyright notice.

Based on this, the description for adding a copyright notice in
`README-hacking.md' has also been modified.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now, the files that people were meant to change didn't have a
proper copyright notice (for example `Copyright (C) YOUR NAME.'). This was
wrong because the license does not convey copyright ownership. So the name
of the file's original author must always be included, and people who
modify the file (adding their own copyrightable modifications) must add
their own names.

With this commit, the file's original author (and email) are added to the
copyright notice, and when more than one person has modified a file, each
name has its own copyright notice.

Based on this, the description for adding a copyright notice in
`README-hacking.md' has also been modified.
</pre>
</div>
</content>
</entry>
<entry>
<title>Copyright notice added to all files missing one</title>
<updated>2019-04-06T23:09:14+00:00</updated>
<author>
<name>Mohammad Akhlaghi</name>
<email>mohammad@akhlaghi.org</email>
</author>
<published>2019-04-06T23:09:14+00:00</published>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/commit/?id=234d6a6e8a4f73ddea627dd4fd78dfb4a91d5a83'/>
<id>234d6a6e8a4f73ddea627dd4fd78dfb4a91d5a83</id>
<content type='text'>
Until now, for short files, we only had a license notice, not an actual
copyright notice. With this commit, a copyright notice has also been
added. We use this new command to find these files, suggested by
`ineiev@gnu.org'.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now, for short files, we only had a license notice, not an actual
copyright notice. With this commit, a copyright notice has also been
added. We use this new command to find these files, suggested by
`ineiev@gnu.org'.
</pre>
</div>
</content>
</entry>
<entry>
<title>Copyright notice added to remaining files</title>
<updated>2019-04-02T11:34:00+00:00</updated>
<author>
<name>Mohammad Akhlaghi</name>
<email>mohammad@akhlaghi.org</email>
</author>
<published>2019-04-02T11:34:00+00:00</published>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/commit/?id=5d56820e0ab1fc147b45728c6ac89c4ac0b90e54'/>
<id>5d56820e0ab1fc147b45728c6ac89c4ac0b90e54</id>
<content type='text'>
After a systematic search for files without a copyright notice, a few more
were found. So a notice was added to them.

I used this Bash command to find the files:

for f in $(find ./ -type f); do \
  if [[ $f != *.git* ]]; then \
    n=$(grep -i copyright $f | wc -l); \
    echo "$n $f"; \
  fi; \
done | awk '$1==0'
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After a systematic search for files without a copyright notice, a few more
were found. So a notice was added to them.

I used this Bash command to find the files:

for f in $(find ./ -type f); do \
  if [[ $f != *.git* ]]; then \
    n=$(grep -i copyright $f | wc -l); \
    echo "$n $f"; \
  fi; \
done | awk '$1==0'
</pre>
</div>
</content>
</entry>
<entry>
<title>flock is now built in configure, to allow serial downloads</title>
<updated>2019-03-28T09:27:41+00:00</updated>
<author>
<name>Mohammad Akhlaghi</name>
<email>mohammad@akhlaghi.org</email>
</author>
<published>2019-03-27T19:53:18+00:00</published>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/commit/?id=98d31767a965ec75f4920b666f236cbb6baa91ab'/>
<id>98d31767a965ec75f4920b666f236cbb6baa91ab</id>
<content type='text'>
Until now, we were using `flock' (file-lock) for downloading the input
datasets in series. But we couldn't do this when downloading the software
tarballs because `flock' wasn't yet available. Generally, unlike
processing, downloading is much better done in series than in parallel.

To also enable serial downloads of the software, with this commit we are
installing `flock' in the configure script (not in a Makefile). As a
result, besides `flock', we can also benefit from the other good features
of the `reproduce/src/bash/download-multi-try' script (for example
attempting the download again after some time).

Some GNU mirrors may have problems at the time of download, so with this
commit, we are using the main GNU FTP server for GNU programs.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now, we were using `flock' (file-lock) for downloading the input
datasets in series. But we couldn't do this when downloading the software
tarballs because `flock' wasn't yet available. Generally, unlike
processing, downloading is much better done in series than in parallel.

To also enable serial downloads of the software, with this commit we are
installing `flock' in the configure script (not in a Makefile). As a
result, besides `flock', we can also benefit from the other good features
of the `reproduce/src/bash/download-multi-try' script (for example
attempting the download again after some time).

Some GNU mirrors may have problems at the time of download, so with this
commit, we are using the main GNU FTP server for GNU programs.
</pre>
</div>
</content>
</entry>
<entry>
<title>Git hooks for metastore check for the existence of metastore</title>
<updated>2019-02-28T12:32:19+00:00</updated>
<author>
<name>Raul Infante-Sainz</name>
<email>infantesainz@gmail.com</email>
</author>
<published>2019-02-28T12:32:19+00:00</published>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/commit/?id=62cb377a954921ef7940059e6dfb8521f9698c32'/>
<id>62cb377a954921ef7940059e6dfb8521f9698c32</id>
<content type='text'>
Until now, once the Git hooks had been installed (after the
installation of Metastore), if Metastore didn't exist (for example after
manually deleting the build directory for a re-build with the same
configuration as before), we couldn't run `git commit', and `git checkout'
would print an ugly warning.

With this commit, the two Git hooks check for the existence of Metastore
and if it doesn't exist, they won't do anything.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now, once the Git hooks had been installed (after the
installation of Metastore), if Metastore didn't exist (for example after
manually deleting the build directory for a re-build with the same
configuration as before), we couldn't run `git commit', and `git checkout'
would print an ugly warning.

With this commit, the two Git hooks check for the existence of Metastore
and if it doesn't exist, they won't do anything.
</pre>
</div>
</content>
</entry>
<entry>
<title>Minor correction in description of downloading wrapper</title>
<updated>2019-02-06T18:25:56+00:00</updated>
<author>
<name>Mohammad Akhlaghi</name>
<email>mohammad@akhlaghi.org</email>
</author>
<published>2019-02-06T18:23:33+00:00</published>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/commit/?id=b506248839c0c1f63f51bcb0ff6a586426d722f4'/>
<id>b506248839c0c1f63f51bcb0ff6a586426d722f4</id>
<content type='text'>
In the usage example of the wrapper script, I had just written
`./download-multi-try', but this script is meant to be run from the top of
the project directory. This could cause confusion.

So the example script now starts with `/path/to/download-multi-try'.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the usage example of the wrapper script, I had just written
`./download-multi-try', but this script is meant to be run from the top of
the project directory. This could cause confusion.

So the example script now starts with `/path/to/download-multi-try'.
</pre>
</div>
</content>
</entry>
<entry>
<title>Removed .sh suffix in download wrapper script</title>
<updated>2019-02-06T18:21:52+00:00</updated>
<author>
<name>Mohammad Akhlaghi</name>
<email>mohammad@akhlaghi.org</email>
</author>
<published>2019-02-06T18:16:41+00:00</published>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/commit/?id=340a7ece34013345e1520dcac38218b7d41c7c26'/>
<id>340a7ece34013345e1520dcac38218b7d41c7c26</id>
<content type='text'>
We don't have a `.sh' suffix in the other scripts of `reproduce/src/bash',
so it was also removed from this script.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We don't have a `.sh' suffix in the other scripts of `reproduce/src/bash',
so it was also removed from this script.
</pre>
</div>
</content>
</entry>
<entry>
<title>Wrapper script for multiple attempts at downloading inputs</title>
<updated>2019-02-06T18:08:19+00:00</updated>
<author>
<name>Mohammad Akhlaghi</name>
<email>mohammad@akhlaghi.org</email>
</author>
<published>2019-02-06T18:08:19+00:00</published>
<link rel='alternate' type='text/html' href='http://git.maneage.org/paper-concept.git/commit/?id=1c508e636b90ae170213ccf71771711156dd8f52'/>
<id>1c508e636b90ae170213ccf71771711156dd8f52</id>
<content type='text'>
Until now, downloading was treated like any other operation in the
Makefile: if it crashed, the pipeline would crash. But network errors
aren't like processing errors: attempting to download a second time will
probably not crash (network relays are very complex and not reproducible,
and packets get lost all the time)!

This is usually not felt when downloading one or two files, but when
downloading many thousands of files, it will happen every once in a while,
and it's a real waste of time until you check and just press enter again!

With this commit we have the `reproduce/src/bash/download-multi-try.sh'
script in the pipeline, which will repeat the download several times (with
increasing time intervals) before crashing, and thus fix the problem.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now, downloading was treated like any other operation in the
Makefile: if it crashed, the pipeline would crash. But network errors
aren't like processing errors: attempting to download a second time will
probably not crash (network relays are very complex and not reproducible,
and packets get lost all the time)!

This is usually not felt when downloading one or two files, but when
downloading many thousands of files, it will happen every once in a while,
and it's a real waste of time until you check and just press enter again!

With this commit we have the `reproduce/src/bash/download-multi-try.sh'
script in the pipeline, which will repeat the download several times (with
increasing time intervals) before crashing, and thus fix the problem.
</pre>
</div>
</content>
</entry>
</feed>
