paper-concept.git - Paper (Towards Long-term and Archivable Reproducibility)

Age	Commit message	Author	Lines
2020-05-01	Abstract: three minor language edits	Boud Roukema	-4/+4
	The difference between `that` and `which` is not strictly required, but it helps clarify the difference in meaning, which is important in science and software :). This is best shown by an example:
	* Maneage provides reproducibility, which is a good thing. The sentence would make sense if we drop `, which is a good thing.` The last part of the sentence is a comment rather than a necessary part of the sentence.
	* Maneage provides a quality of reproducibility that is missing from other implementations. The sentence would not quite make sense if we drop `that is ...`, since we would not know what sort of quality is provided. The fact that the quality is missing is key to the intended meaning of the sentence.
2020-05-01	Merged David's suggestions, further edited to be more clear	Mohammad Akhlaghi	-7/+5
	It is also slightly shorter with this commit, without losing anything substantial.
2020-05-01	Minor edits in abstract	David Valls-Gabaud	-7/+7
	No need to invent a new word (archive-able) when an existing one (archivable) does the job. One issue that we have not included and which perhaps we could discuss in the paper (space permitting), is that this tool could bypass the use of blockchains in this context.
2020-05-01	Minor edits in abstract, link between analysis and narrative added	Mohammad Akhlaghi	-3/+3
	As discussed by Boud in the previous commit, this is an important feature that was lost in the new abstract. So I added it as a criterion.
2020-05-01	Several minor edits to the title + abstract	Boud Roukema	-12/+13
	Most are minor English tidying, e.g.
	* spelling: achieving
	* archivable - https://en.wiktionary.org/wiki/archivable
	* `i.e.` does not look good in an abstract;
	* `when` didn't sound quite right;
	Comment: we no longer state one of the most interesting aspects of Maneage - producing the draft paper that is submittable for peer review in a way that makes it natural for the authors to achieve automatic consistency between the calculations/analysis and the values in the paper. But this is hard to describe in a compact way without disrupting the overall argument of the abstract, so it's a bit of a pity, but people will learn about it anyway from the body of the article (or from trying out the package!) `Peer-review verification` does not directly state producing a pdf.
	Related to this absence of talking about reproducing the paper, not just the calculations, I suggest dropping `, with snapshot \projectversion` from the abstract initially sent to the journal (they can't stop us updating it afterwards), because without the context of explaining that the paper itself is produced from the package, it's not clear what the snapshot means - a snapshot of the abstract? In the `real` paper, it makes sense, because the reader will have access to the rest of the paper.
2020-05-01	Edited abstract for more clarity, still in the 250 word limit	Mohammad Akhlaghi	-28/+15
	Boud's suggestions in the previous commit were great and really helped in improving the tone of the abstract (and thus the whole paper shortly!), better putting it in the big picture. I had forgotten to give the exact word limit (which was 250), so Boud had set it to a very conservative value of 190. I added around 22 words to better highlight the points we want to make, while still being below the limit.
2020-05-01	Abstract re-organized to be more research-oriented	Boud Roukema	-7/+28
	To make this a research article, we either have to present it as a theoretical advance, or as an empirical advance. An empirical research result would be something like doing a survey of users and getting statistics of their success/failure in using the system, and of whether their experience is consistent with the claimed properties and principles of Maneage (e.g. success/failure in creating paper.pdf as expected? was the user's system POSIX? did the user do the install with non-root privileges? was this a with-network or without-network ./project configure?). This is doable, but would require a bit of extra work that we are not necessarily motivated to do or have the time to do right now. I think it's possible to present Maneage as a theoretical advance, but it has to be worded properly. Maneage is a tool, but it's a tool that satisfies what we can reasonably present as a unique theoretical proposal. Here's my proposed rewrite. I've aimed at minimum word length. I've also included (commented out) keywords for a structured research abstract - these are just for us, as a guideline to improve the abstract. I think "criteria" is safer than "standards". Whether a principle is good or bad tends to lead to debate. Whether a criterion is satisfied or not is a more objective question, independent of whether you agree with the criterion or not. In the rewrite below, we propose a theoretical standard and show that the new standard can be satisfied. Maneage is used as a tool to prove that the standard is not too difficult to achieve. Maneage is no longer the subject of the paper. (That won't change the main body of the paper too much, apart from compression, but the way it's presented will have to change, under this proposal.) The title would need to match this. E.g.
	TITLE.1: Evidence that a higher standard of reproducibility criteria is attainable
	TITLE.2: Evidence that a rigorous standard of reproducibility criteria is attainable
	TITLE.3: Towards a more rigorous standard of reproducibility criteria
	I would probably go for TITLE.3.
2020-05-01	Abstract re-written to better highlight the uniqueness of Maneage	Mohammad Akhlaghi	-9/+9
	This abstract is a first step in order to put more focus on the research aspects of Maneage.
2020-05-01	Removed Definition and Summary sections and low-level figures	Mohammad Akhlaghi	-129/+19
	Given the very strict length limits of journals, we needed to remove these sections and images. The removed images are: the `figure-file-architecture', `figure-src-topmake' and `figure-src-inputconf'. In total, with `wc' we now have 9019 words. This will be further reduced when we remove all the technical parts of the Maneage section; in short, we will only describe the generalities, not any specific details.
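The word counts quoted in these commits come from `wc`. As a rough sketch, assuming whitespace-delimited counting as done by `wc -w`, the same measure can be approximated in Python (the function name `count_words` is hypothetical, not part of Maneage):

```python
def count_words(text: str) -> int:
    """Approximate `wc -w`: count maximal runs of non-whitespace characters."""
    # str.split() with no argument splits on any run of whitespace and
    # discards empty strings, which mirrors how `wc -w` delimits words.
    return len(text.split())


if __name__ == "__main__":
    sample = "Towards Long-term and Archivable Reproducibility\n"
    print(count_words(sample))  # prints 5
```

The number obtained depends on what is fed in: counting the LaTeX source also counts markup, while counting text extracted from the PDF does not.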
2020-04-27	Thanked Fabrizio, Tamara and Nadia for their support	Mohammad Akhlaghi	-1/+4
	They supported my visit and talk on Maneage at the Barcelona Supercomputing Center. They have also offered to read the paper and are providing comments. Also, I noticed that in the author list we had forgotten to put a `,' after Boud's name. That is also corrected here.
2020-04-25	Demonstration cloning URL set to https://git.maneage.org/project.git	Mohammad Akhlaghi	-2/+2
	Until now, we were using GitLab as the main Git repository of Maneage. But today I finally set up our own Git repository under `git.maneage.org' and enabled a CGit web interface for simple and fast viewing of the commits and changes. Since this URL is under our own control, we can always ensure that it points somewhere meaningful, on any server, so in the long run it's much better than publishing the paper with an explicit reliance on `gitlab.com'.
2020-04-25	IMPORTANT: Primary Maneage repositories are now under maneage.org	Mohammad Akhlaghi	-15/+11
	Until now, the primary Maneage URLs were under GitLab, but since we now have a dedicated URL and Git repository, it's better to transfer to this as soon as possible. Therefore with this commit, throughout Maneage, any place that Maneage was referenced through GitLab has been corrected. Please correct your project's remote to point to the new repository at `git.maneage.org/project.git', and please make sure it follows the `maneage' branch. There is no more `master' branch on Maneage.
2020-04-23	Minor edits on Boud's great corrections	Mohammad Akhlaghi	-13/+12
	Reading over Boud's edits, I noticed a few other parts that I could summarize more and corrected one or two other parts to fit the original purpose of the sentence better.
2020-04-23	Conclusion	Boud Roukema	-9/+9
	Reduction by about 5 words. Although it's true that the low-level tools - make, bash, gcc - are still being actively developed, only expert users will tend to notice the differences, and in this context, it's probably more useful to point out that these are actively maintained. (Comment: I felt that the first sentence in the Conclusion is missing one of the obvious criteria for handling big data - citizen control so that big data could hopefully become less Orwellian than it is right now, with GAFAM having the main big data databases that are used by AI researchers and will tend to affect people's lives more than traditional "scientific" databases. But there's no point adding this here, since the criteria that tend to satisfy the scientific requirements ("principles") and citizens' rights tend to overlap to a fair degree...)
2020-04-23	Discussion/caveats section.	Boud Roukema	-27/+27
	Reduction of about 50 words. There were a couple of expressions that look a bit like some sort of software/research analysis jargon, such as `Research Objects`, `Software Heritage`, `Machine actionable`. Unless these are defined, capitalising them makes the reader assume that there is some well-known formal meaning and that s/he has to search for that him/herself. As lower case expressions, the reader can guess some reasonable meanings of these. The word "embargo" was introduced for proposal 2) to handle the third caveat.
2020-04-23	Further edits to summarize the parts corrected by Boud	Mohammad Akhlaghi	-43/+39
	[Compared to first submission to DSJ last week with 11436 words in raw PDF, we have decreased the paper by ~1000 words to 10493 :-)] As with the previous commits, the moment Boud changed the structure of sentences, I was able to find the redundancies and remove them! This is a fascinating feature of collaboration I had never felt before: it is so hard to find redundancies in my own raw text, but even a minor correction by someone else suddenly breaks my mental memories/barrier on the sentence, allowing me to be more critical of it! Anyway, besides such corrections, I fixed a few other things: 1) In the DSJ's recently published papers, there is no `~' between "Figure" and its number. 2) I noticed that in `tex/src/figure-src-inputconf.tex' I was actually using manually input strings for the filename, checksum and size! This was contrary to the whole philosophy of Maneage(!), I must have rushed and forgotten! So LaTeX variables are now defined and used.
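The fix in item 2) replaces hand-typed values with LaTeX variables. A minimal sketch of the idea, assuming SHA-256 as the checksum and using hypothetical macro names (`\inputfilename`, `\inputchecksum`, `\inputsize`), is to compute the values from the file itself and emit `\newcommand` lines for the paper to `\input`; this is an illustration, not Maneage's own implementation:

```python
import hashlib
import os


def macros_for(name: str, data: bytes) -> str:
    """Build LaTeX definitions for a file's name, checksum and size.

    The macro names are hypothetical; LaTeX command names may only contain
    letters. Special characters in `name` (e.g. `_`) are NOT escaped here.
    """
    checksum = hashlib.sha256(data).hexdigest()
    lines = [
        r"\newcommand{\inputfilename}{" + name + "}",
        r"\newcommand{\inputchecksum}{" + checksum + "}",
        r"\newcommand{\inputsize}{" + str(len(data)) + "}",
    ]
    return "\n".join(lines) + "\n"


def file_macros(path: str) -> str:
    """Read a file and return its macro definitions (values never typed by hand)."""
    with open(path, "rb") as f:
        data = f.read()
    return macros_for(os.path.basename(path), data)
```

Because the definitions are regenerated from the file, any change to the input automatically propagates to the text, which is the consistency the commit restores.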
2020-04-23	4.6 Project analysis - publication	Boud Roukema	-14/+12
	About 20 words less. The arXiv URL is added - this adds no extra length in words, and some readers will not be familiar with arXiv (although the COVID-19 pandemic has attracted attention to bioRxiv).
2020-04-23	4.5 Project analysis - multi-user	Boud Roukema	-4/+4
	Increase by 5 words. We don't need to give a big warning here, but "Permissions management" is meant to be a brief way of saying that whether or not different users can really read/write/execute in subdirectories will firstly depend on whether the user who cloned Maneage has handled these permissions correctly and whether s/he is able to allow others to edit in his/her subdirectories. Comment: Users would have to check who else is logged in at the time, who else is running jobs, and so on. On a supercomputer this might make sense, to avoid unnecessary recompiles. Anyway, this edit summary is not the place to discuss this...
2020-04-23	4.4 Project analysis - git branches	Boud Roukema	-12/+12
	Reduction by 15 words. "Branch" is fine as a verb, and "off" is fine as a preposition; there's no need for a second preposition. "We branched off the main forest path onto a smaller path".
2020-04-23	4.3.6 Project analysis - configure files	Boud Roukema	-13/+14
	Length reduction by about 15 words. A semantically significant change is from `leading to more robust scientific results` to `evolves in the case of exploratory research papers, and better self-consistency in hypothesis testing papers`. I said this in a previous commit, but it can't hurt repeating: In the covidian epoch (though not only), it is especially important to distinguish Bayesian type exploratory research (typical in astronomy or searching for a good COVID-19 treatment or vaccine) from hypothesis testing (clinical testing in double-blind randomised trials with clinical trials methods published on a public registry prior to the trials taking place). In the latter case, you want your results to be analysed consistently with the plan published before the trials even begin, and ideally you want them to be published (or at least posted on the trial registry website) even if your results are insignificant, to avoid a publication bias in favour of significant results. Test homeopathy against placebos in 1000 independent experiments, analyse them all the same way, and 2-3 experiments will be significant at the 3 sigma level...
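The "2-3 experiments at 3 sigma" figure follows from simple arithmetic. A sketch, assuming independent experiments, a normally distributed test statistic, a two-sided threshold, and no real effect in any experiment: the chance of a single null result exceeding 3 sigma is erfc(3/sqrt(2)), about 0.0027, so 1000 experiments yield about 2.7 false positives on average.

```python
import math


def two_sided_p(sigma: float) -> float:
    """Probability that a standard normal deviate lies beyond +/- sigma."""
    # P(|Z| > s) = erfc(s / sqrt(2)) for Z ~ N(0, 1).
    return math.erfc(sigma / math.sqrt(2.0))


def expected_false_positives(n_experiments: int, sigma: float) -> float:
    """Expected number of 'significant' results when every null hypothesis is true."""
    return n_experiments * two_sided_p(sigma)


if __name__ == "__main__":
    print(f"{two_sided_p(3.0):.4f}")                       # prints 0.0027
    print(f"{expected_false_positives(1000, 3.0):.1f}")    # prints 2.7
```

With a one-sided threshold the expected count halves to about 1.35.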
2020-04-23	4.3.5 Project analysis - downloads	Boud Roukema	-5/+4
	Reduction by about 7 words. I added "internet security" as an extra reason for having all the downloads in a single file. Modularity and minimal complexity in themselves generally contribute to internet security, but in this case, it's obvious that having all the communication with the outside world managed through a single file makes internet security management much simpler. I replaced the "fake URL" by the real one, because at least in the present format, the URL fits in nicely. So both `paper.tex` and `tex/src/figure-src-inputconf.tex` are modified in this commit.
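The security argument is that one entry point for all network access gives one place to audit. A minimal sketch of that architecture in Python, with a hypothetical host allowlist and function names (the project's real download rules are not reproduced here):

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Hypothetical allowlist: the only hosts this sketch may contact.
ALLOWED_HOSTS = {"git.maneage.org"}


def check_url(url: str) -> None:
    """Reject any URL that is not HTTPS or whose host is not explicitly allowed."""
    parts = urlparse(url)
    if parts.scheme != "https":
        raise ValueError(f"refusing non-HTTPS URL: {url}")
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"host not in allowlist: {parts.hostname}")


def download(url: str, dest: str) -> None:
    """Single entry point for all network access in this hypothetical project."""
    check_url(url)
    urlretrieve(url, dest)
```

Since every other part of the code calls `download`, tightening the policy (adding checksums, proxies, or logging) needs a change in one place only.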
2020-04-23	4.3.4 Project analysis - the analysis itself	Boud Roukema	-14/+14
	Reduction by about 20 words - minor rewording.
2020-04-22	Acknowledged the help of Idafen in Maneage	Mohammad Akhlaghi	-0/+1
	Idafen has helped in testing Maneage a lot during the last year and has provided very useful feedback and suggestions.
2020-04-22	Applied futher comments by Konrad	Mohammad Akhlaghi	-7/+8
	Regarding Docker, Konrad pointed out that "Linux has an excellent track record for stability. It's more likely that the Docker itself becomes incompatible with older containers. Docker isn't developed for reproducibility after all". So I tried to modify that paragraph to include this important point too. In the process, I also shrank it a little more (without losing anything substantial), so it doesn't add to the paper's length.
2020-04-22	Minor edits to summarize section on project.tex and verify.tex	Mohammad Akhlaghi	-15/+13
	After going through Boud's corrections, I thought it could be further summarized without losing any major point.
2020-04-22	4.3.3 Project analysis - verification	Boud Roukema	-6/+6
	Reduction by 4 words. Minor rewording; removal of "Note that" and "simply" (the opposite of "complicatedly"). If a checksum is simple for a given user, then s/he already knows that; if s/he doesn't yet know what a checksum is, then stating that it's simple doesn't help very much. :)
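Checksum verification amounts to recomputing a digest of each output and comparing it with a digest recorded earlier. A minimal sketch, assuming SHA-256 (Maneage's own verification step may use a different tool or algorithm; the function names are hypothetical):

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large outputs need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str) -> bool:
    """True only if the file's digest equals the digest recorded earlier."""
    return sha256_of(path) == expected.strip().lower()
```

Any change to the file, even a single byte, changes the digest, so a mismatch flags that a result is no longer the one that was verified.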
2020-04-22	4.3.2 Project analysis - values within text	Boud Roukema	-11/+11
	Reduction of about 15 words. The phrase "which does not need it" is removed. On its own, this is a claim, not an explanation. If the reader is wondering why `paper.tex` is not a produced file, then stating that the file is not needed will not help very much. Looking at the diagram will show that `paper.tex` is the overall article template; and the diagram strongly suggests that values from initialize.tex, ..., are passed into verify.tex, and from there into project.tex, which goes into paper.tex. The phrase "files, possibly in another subMakefile" should really be something like "files, possibly created by another subMakefile". But this would add more words, and given that the user has full control to modify and adapt the overall scheme (including making a mess of it), we can safely drop the info that the scheme can be made more complicated. :)
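The flow described above (values from initialize.tex passing into verify.tex, then project.tex, then the hand-written paper.tex template) is a dependency graph. A sketch using Python's `graphlib` (Python 3.9+) that models only the files named here, not Maneage's actual Makefiles; the other inputs elided by "..." are deliberately left out:

```python
from graphlib import TopologicalSorter

# Each key depends on the files in its set. Only files named in the commit
# message are included; the other inputs to verify.tex are not listed there.
DEPENDS_ON = {
    "verify.tex": {"initialize.tex"},
    "project.tex": {"verify.tex"},
    "paper.tex": {"project.tex"},
}


def build_order(graph):
    """Return an order in which every file comes after all of its prerequisites."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(build_order(DEPENDS_ON))
```

For this linear chain the order is unique: initialize.tex, verify.tex, project.tex, paper.tex; `make` resolves the same kind of graph when deciding what to rebuild.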
2020-04-22	4.3.1 Project analysis - paper.pdf	Boud Roukema	-2/+2
	Only 3 words are reduced in this commit, but I think the improvements are worth it. "Note that" and "It is worth mentioning" are phrases still quite often used by academics (even in astronomy) that can be politely described as "pontification" or informally as "empty blabla"; these add no meaning except "I am teaching you something and I expect you to pay attention to what I am saying". :) There are also less polite descriptions.
2020-04-22	4.3 Project analysis intro	Boud Roukema	-17/+17
	Minor rewording of 4.3 Project analysis - introduction. Reduction of about 40 words. 4.2 `parallel` quote: s/http:/https:/
2020-04-22	Implemented Konrad's suggestions, minor edits here and there	Mohammad Akhlaghi	-65/+64
	Today Konrad made the following suggestions after reading through the paper (created from Commit 1ac5c12). Thanks a lot Konrad ;-). I tried to address them all in this commit. Afterwards, while looking over the corrected parts, some minor edits occurred to me to remove redundant parts and add extra points where it helps. In particular, to be able to print the International Phonetic Alphabet (IPA), I had to include the LaTeX `TIPA' package, but it was interesting to see that it was already available in the project as a dependency of another package we loaded.
2020-04-20	Added link to citation from GNU Parallel, slightly summarized it	Mohammad Akhlaghi	-1/+1
	Boud previously pointed out that he couldn't find a reference for the citation, so I added it as a link over "its FAQ" (since it's described in its `doc/citation-notice-faq.txt' file). I also removed the first part of the quote, which was not really necessary; the heart of the quote is the latter part that still remains.
2020-04-20	Minor edits on Boud's corrections to merge	Mohammad Akhlaghi	-33/+31
	I tried to make it slightly shorter, but I felt that it is important to keep the quote from GNU Parallel and in particular the financial aid it asks for. It will help readers feel the gravity of the situation for this software author. The precise citation of the quote was given in the long version.
2020-04-20	Minor copyedits - 4.2.2 software citation	Boud Roukema	-14/+12
	This reduces the length by about 70 words. The biggest change is to remove what looks like a citation from `parallel'. I couldn't find the citation in GNU parallel 20161222-1 (Debian/stretch), nor with search engines. I don't think that the quote is really so useful (even assuming it's a valid quote from somewhere): citation practices are a mix between ethics, preparation to convince referees, citing those who are already cited frequently, and the practicality of searching for and verifying references against the information for which they are used. Showing that Maneage makes citation not only easy, but more or less automatic, bypasses some of the compromises between practicality and ethics.
2020-04-20	Minor copyedits - 4.2.1 source verification	Boud Roukema	-8/+8
	Minor rewording; a reduction of about 12 words.
2020-04-20	Minor copyedits - 4.2 intro configuration	Boud Roukema	-9/+9
	Minor edits - reduces about 17 words.
2020-04-20	Minor copyedits - 4.1 Maneage orchestration	Boud Roukema	-13/+13
	This commit reduces about 25 words from the 4.1 Maneage orchestration, aka `make`, section.
2020-04-20	Minor edits to 4 Maneage intro	Boud Roukema	-11/+11
	This drops the word count in the introductory part of the Maneage section by about 15 words.
2020-04-20	Clarification on free software complementing reproducibility	Mohammad Akhlaghi	-1/+1
	Thanks to Boud's corrections, I see that the sentence can be confusing and not convey the point I wanted to make properly, so I am clarifying it here. The main point is that this principle complements the definition of reproducibility, not the other principles.
2020-04-20	minor language edits	Boud Roukema	-4/+4
	These tiny language edits add 1 word in length.
2020-04-20	Boud moved to third author, Lyon affiliation for Mohammad, minor edits	Mohammad Akhlaghi	-16/+17
	Boud has contributed a lot to Maneage over the last few years and with the last few commits he also contributed significantly to this paper, so I am moving him to third author. Thanks to Boud, I also remembered that even though I did the most important parts of Maneage in Lyon, I hadn't added it as an affiliation for myself, so I added it. Maneage became a separate project in Lyon. Finally, I tried to decrease the length of the acknowledgments by adding some abbreviations that were shared between various parts.
2020-04-20	boud authorship/affil/acknowl	Boud Roukema	-3/+10
	Unfortunately, adding in my name/affiliations/acknowledgments adds about 90 words to the text. We don't really know if these are counted by the editor in the 8000-word limit. I changed `funded' to `funded/supported'. I only get funding from one out of the three sources I acknowledge, but it's important to acknowledge all three.
2020-04-20	Minor edits in the text	Mohammad Akhlaghi	-4/+4
	While looking over the PDF, a few small edits were made to be more clear.
2020-04-19	Imported the recent parallel works on the principles section	Mohammad Akhlaghi	-65/+64
	The conflict was only on the list of existing tools and that was easily corrected.
2020-04-19	Further summarized the principles section	Mohammad Akhlaghi	-50/+49
	Following Boud's great corrections, I was able to further summarize this section, decreasing roughly 150 more words from this section.
2020-04-19	List of existing tools made cleaner in LaTeX source	Mohammad Akhlaghi	-1/+11
	Until now the list of existing tools was written in one line, which made it hard to read and follow, especially since we added links. It is now expanded to one line per item, which makes no difference in the final PDF.
2020-04-19	Principles - P7 FOSS	Boud Roukema	-5/+5
	Reduction by 15 words.
2020-04-19	Principles - P6 Scalability	Boud Roukema	-2/+2
	Reduction by 7 words. For a regular GNU/Linux or other unix-like system user, the bit about ISO C compilers even existing for Microsoft systems more or less says "despite there being no point ever trying to do science on a Microsoft system, you could hypothetically compile and run any ISO C program on it". Interesting, but not directly of interest to this user, who is unlikely to actually want to do it. A Microsoft user who thinks that s/he can do science on a Microsoft system will typically think "Microsoft is good, so of course I can run anything I want on it". So the message here could more likely be seen as provocative rather than useful, since this user is unaware of the fundamental problems of Microsoft as an authoritarian, manipulative, centralised organisation providing bad software. So either way, the parenthesis about Microsoft can be safely removed given the space constraints.
2020-04-19	Principles - P5 History and temporal provenance	Boud Roukema	-3/+3
	Reduction by 5 words. The term "exploratory research" is intended in the specific sense listed at en.Wikipedia: https://en.wikipedia.org/wiki/Exploratory_research to distinguish it from hypothesis testing. The final phases of clinical (medical) research, for example, to test whether a candidate SARS-CoV-2 vaccine is (i) effective and (ii) safe in homo sapiens, cannot accept the exploratory methods that are acceptable in astronomy, or in other exploratory research (which is acceptable in the early stages of medical research). Clinical trial registration is aimed at preventing scientists from modifying their methods in a given project: https://en.wikipedia.org/wiki/Clinical_trial_registration
2020-04-19	Principles - P4 verifiable inputs and outputs	Boud Roukema	-1/+1
	One superfluous word was removed.
2020-04-19	Principles - P3 minimal complexity	Boud Roukema	-5/+5
	Minor wording changes - reduction by 10 words.