path: root/peer-review/1-answer.txt
1.  [EiC] Some reviewers request additions, and overview of other
    tools.

ANSWER: Indeed, there is already a large body of work on the various issues
touched upon in this paper. Before submitting the paper, we had already
done a very comprehensive review of these tools (as you may notice from the
Git repository[1]). However, the CiSE Author Information explicitly states:
"The introduction should provide a modicum of background in one or two
paragraphs, but should not attempt to give a literature review". This is
also the practice in previously published CiSE papers and is in line with
the very limited word count and the maximum of 12 bibliography references.

We were nevertheless eager to publish that extensive review (which took a
lot of time; most of the tools were actually run and tested). We therefore
discussed this privately with the editors and agreed on the following
solution: the extended review is included as appendices in the arXiv[2] and
Zenodo[3] pre-prints of this paper, and those publicly available appendices
are mentioned in the submitted paper for interested readers to follow up.

[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/tex/src/paper-long.tex#L1579
[2] https://arxiv.org/abs/2006.03018
[3] https://doi.org/10.5281/zenodo.3872247

------------------------------





2.  [Associate Editor] There are general concerns about the paper
    lacking focus

ANSWER:

------------------------------





3.  [Associate Editor] Some terminology is not well-defined
    (e.g. longevity).

ANSWER: It is now clearly defined in the first paragraph of Section II.
With this definition, the main argument of the paper is much clearer; thank
you (and the referees) for highlighting this.

------------------------------





4.  [Associate Editor] The discussion of tools could benefit from some
    categorization to characterize their longevity.

ANSWER: The longevity of each of the general tools reviewed in Section II
is now mentioned immediately after the tool (highlighted in green).

------------------------------





5.  [Associate Editor] Background and related efforts need significant
    improvement. (See below.)

ANSWER: This has been done, as mentioned in (1).

------------------------------





6.  [Associate Editor] There is consistency among the reviews that
    related work is particularly lacking.

ANSWER: This has been done, as mentioned in (1).

------------------------------





7.  [Associate Editor] The current work needs to do a better job of
    explaining how it deals with the nagging problem of running on CPU
    vs. different architectures.

ANSWER: The CPU architecture of the running system is now reported in the
"Acknowledgments" section, and a description of the problem and its
solution in Maneage has been added to the "Proof of concept: Maneage"
section.

------------------------------





8.  [Associate Editor] At least one review commented on the need to
    include a discussion of continuous integration (CI) and its
    potential to help identify problems running on different
    architectures. Is CI employed in any way in the work presented in
    this article?

ANSWER: CI has been added in the discussion as one solution to find
breaking points in operating system updates and new/different
architectures. For the core Maneage branch, we have defined task #15741 [1]
to add CI on many architectures in the near future.

[1] http://savannah.nongnu.org/task/?15741

------------------------------





9.  [Associate Editor] The presentation of the Maneage tool is both
    lacking in clarity and consistency with the public
    information/documentation about the tool. While our review focus
    is on the article, it is important that readers not be confused
    when they visit your site to use your tools.

###########################
ANSWER [NOT COMPLETE]: We should separate the various sections of the
README-hacking.md webpage into smaller pages that can be entered.
###########################

------------------------------





10. [Associate Editor] A significant question raised by one review is
    how this work compares to "executable" papers and Jupyter
    notebooks.  Does this work embody similar/same design principles
    or expand upon the established alternatives? In any event, a
    discussion of this should be included in background/motivation and
    related work to help readers understand the clear need for a new
    approach, if this is being presented as new/novel.

ANSWER: Thank you for highlighting this important point. We saw that it is
necessary to contrast Maneage, our proof-of-concept demonstration, more
directly with these alternatives. Two paragraphs have been added in
Sections II and IV for this.

------------------------------





11. [Reviewer 1] Adding an explicit list of contributions would make
    it easier to the reader to appreciate these. These are not
    mentioned/cited and are highly relevant to this paper (in no
    particular order):
     1.  Git flows, both in general and in particular for research.
     2.  Provenance work, in general and with git in particular
     3.  Reprozip: https://www.reprozip.org/
     4.  OCCAM: https://occam.cs.pitt.edu/
     5.  Popper: http://getpopper.io/
     6.  Whole Tale: https://wholetale.org/
     7.  Snakemake: https://github.com/snakemake/snakemake
     8.  CWL https://www.commonwl.org/ and WDL https://openwdl.org/
     9.  Nextflow: https://www.nextflow.io/
     10. Sumatra: https://pythonhosted.org/Sumatra/
     11. Podman: https://podman.io
     12. AppImage (https://appimage.org/)
     13. Flatpack (https://flatpak.org/)
     14. Snap (https://snapcraft.io/)
     15. nbdev https://github.com/fastai/nbdev and jupytext
     16. Bazel: https://bazel.build/
     17. Debian reproducible builds: https://wiki.debian.org/ReproducibleBuilds

ANSWER:

1.  In Section IV, we have added that "Generally, any git flow (branching
    strategies) can be used by the high-level project authors or future
    readers."
2.  We have mentioned research objects as one mode of provenance tracking.
    The body of related provenance work that has already been done, and
    that can be exploited using these criteria and our proof of concept,
    is indeed very large. However, the 6250-word limit is very tight, and
    adding more on this would force us to remove more directly relevant
    points. Hopefully this can be the subject of a follow-up paper.
3.  A review of ReproZip is in Appendix B.
4.  A review of Occam is in Appendix B.
5.  A review of Popper is in Appendix B.
6.  A review of Whole tale is in Appendix B.
7.  A review of Snakemake is in Appendix A.
8.  CWL and WDL are described in Appendix A (job management).
9.  Nextflow is described in Appendix A (job management).
10. Sumatra is described in Appendix B.
11. Podman is mentioned in Appendix A (containers).
12. AppImage is mentioned in Appendix A (package management).
13. Flatpak is mentioned in Appendix A (package management).
14. nbdev and jupytext are high-level tools to generate documentation and
    package custom code in Conda or PyPI. High-level package managers like
    Conda and PyPI have already been thoroughly reviewed in Appendix A for
    their longevity issues, so we feel there is no need to include these.
15. Bazel has been mentioned in Appendix A (job management).
16. Debian's reproducible builds effort is only for ensuring that software
    packaged for Debian is bitwise reproducible. As mentioned in the
    discussion of this paper, bitwise reproducibility of the software
    itself is not the issue in the context discussed here; the
    reproducibility of the software's relevant output data is the main
    issue.


------------------------------





12. [Reviewer 1] Existing guidelines similar to the proposed "Criteria
    for longevity". Many articles of these in the form "10 simple
    rules for X", for example (not exhaustive list):
     * https://doi.org/10.1371/journal.pcbi.1003285
     * https://arxiv.org/abs/1810.08055
     * https://osf.io/fsd7t/
     * A model project for reproducible papers: https://arxiv.org/abs/1401.2000
     * Executable/reproducible paper articles and original concepts

ANSWER: Thank you for highlighting these points. Appendix B starts with a
subsection titled "Suggested rules, checklists or criteria" that reviews
existing criteria, including the sources proposed here (and others).

arXiv:1401.2000 has been added in Appendix A as an example paper using
virtual machines. We thank the referee for bringing up this paper, because
the link to the VM provided in the paper no longer works (the file has been
removed from the server). Together with SHARE, it therefore very nicely
highlights our main issue with binary containers and VMs: their lack of
longevity.

------------------------------





13. [Reviewer 1] Several claims in the manuscript are not properly
    justified, neither in the text nor via citation. Examples (not
    exhaustive list):
     1. "it is possible to precisely identify the Docker “images” that
        are imported with their checksums, but that is rarely practiced
        in most solutions that we have surveyed [which ones?]"
     2. "Other OSes [which ones?] have similar issues because pre-built
        binary files are large and expensive to maintain and archive."
     3. "Researchers using free software tools have also already had
        some exposure to it"
     4. "A popular framework typically falls out of fashion and
        requires significant resources to translate or rewrite every
        few years."

ANSWER: They have been clarified in the highlighted parts of the text:

1. Many examples are given throughout the newly added appendices. To avoid
   confusion in the main body of the paper, we have removed the "we have
   surveyed" part; the text already mentions that a large survey of
   existing methods/solutions is given in the appendices.

2. Due to the thorough discussion of this issue in the appendices, with
   precise examples, this line has been removed to allow space for the
   other points raised by the referees. The main point (the high cost of
   keeping binaries) is already abundantly clear.

   On a similar topic, Docker Hub's recent announcement that inactive
   images (older than 6 months) will be deleted has also been added. The
   announcement URL is below (it was too long to include in the paper; if
   IEEE has a special short-URL format, we can add it):
   https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates

3. A short statement has been added, reminding readers that almost all free
   software projects are built with Make (note that CMake is just a
   high-level wrapper over Make: it ultimately produces a 'Makefile').

4. The example of Python 2 has been added.
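To illustrate why Make familiarity is so widespread (point 3 above), here is a minimal, self-contained sketch of the Make model; the file names and the 'sort' recipe are purely illustrative, not taken from the paper's actual workflow:

```shell
# Generate a two-line input file and a Makefile declaring one output
# (results.txt), its input (data.txt), and the recipe that produces it
# (printf embeds the tab character that Make recipes require).
printf 'b\na\n' > data.txt
printf 'results.txt: data.txt\n\tsort data.txt > results.txt\n' > Makefile

# Make only re-runs the recipe when data.txt is newer than results.txt.
make -s results.txt
cat results.txt
```

This declarative "output: inputs + recipe" model is exactly what scales up to a full reproducible paper: each file is built only when its inputs change.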


------------------------------





14. [Reviewer 1] As mentioned in the discussion by the authors, not
    even Bash, Git or Make is reproducible, thus not even Maneage can
    address the longevity requirements. One possible alternative is
    the use of CI to ensure that papers are re-executable (several
    papers have been written on this topic). Note that CI is
    well-established technology (e.g. Jenkins is almost 10 years old).

ANSWER: Thank you for raising this issue. We had initially planned to add
this issue also, but like many discussion points, we were forced to remove
it before the first submission due to the very tight word-count limit. We
have now added a sentence on CI in the discussion.

On the initial note: indeed, the executable files of Bash, Git or Make are
not bitwise reproducible/identical across different systems. However, as
mentioned in the discussion, we are concerned with the _output_ of the
software's executable file, _after_ the execution of its job. We (or any
user of Bash) are not interested in the executable file itself; the
reproducibility of the binary file only becomes important if a bug is found
(very rare for common usage in such core OS software). Hence, even though
the compiled binary files of specific versions of Git, Bash or Make will
not be bitwise identical on different systems, their outputs are exactly
reproducible: 'git describe' or Bash's 'for' loop will produce the same
output on GNU/Linux, macOS or FreeBSD (which produce bitwise different
executables).
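A concrete (purely illustrative) demonstration of this distinction: the loop below produces byte-identical output on any POSIX shell, even though the shell binaries themselves differ bitwise between operating systems.

```shell
# The *output* of this loop is what a project's results depend on, and it
# is exactly reproducible across GNU/Linux, macOS and FreeBSD builds of
# the shell that runs it.
for i in 1 2 3; do
    printf 'step %d\n' "$i"
done
```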

------------------------------





15. [Reviewer 1] Criterion has been proposed previously. Maneage itself
    provides little novelty (see comments below).

ANSWER: The previously suggested criteria that were mentioned are reviewed
in the newly added Appendix B, and the novelty/necessity of the proposed
criteria is shown by comparison there.

------------------------------





16. [Reviewer 2] Authors should add indication that using good practices it
    is possible to use Docker or VM to obtain identical OS usable for
    reproducible research.

ANSWER: In the submitted version we had stated that "Ideally, it is
possible to precisely identify the Docker “images” that are imported with
their checksums ...". To be clearer and more direct, it has been edited to
explicitly say "... to recreate an identical OS image later".
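For instance, the good practice referred to above can be sketched as follows (the digest is a placeholder, not a real value): a tag like 'ubuntu:20.04' can silently change over time, while a sha256 digest always resolves to the exact same image, for as long as that image remains hosted.

```dockerfile
# Pin the base image by content hash instead of by mutable tag.
# '<digest>' is a placeholder for a real 64-character sha256 value.
FROM ubuntu@sha256:<digest>
```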

------------------------------





17. [Reviewer 2] The CPU architecture of the platform used to run the
    workflow is not discussed in the manuscript. Authors should probably
    take into account the architecture used in their workflow or at least
    report it.

ANSWER: Thank you very much for raising this important point. We hadn't
seen other reproducibility papers mention it, and had missed it ourselves.
In the acknowledgments (where we also mention the commit hashes) we now
explicitly mention the exact CPU architecture used to build this paper:
"This project was built on an x86_64 machine with Little Endian byte-order
and address sizes 39 bits physical, 48 bits virtual." We mention the
byte-order because we have already seen cases where the architecture is the
same, but programs fail because of the byte-order.

Generally, Maneage will now extract this information from the running
system during its configuration phase and provide the users with three
different LaTeX macros that they can use anywhere in their paper.
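As an aside, the kind of information those macros carry can be queried with standard commands on a GNU/Linux system (this is only a sketch; Maneage's configuration phase may implement it differently, e.g. byte-order can additionally be read from 'lscpu' where available):

```shell
# CPU architecture and word size of the running system.
uname -m            # architecture, e.g. x86_64
getconf LONG_BIT    # word size, e.g. 64
```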

------------------------------





18. [Reviewer 2] I don’t understand the "no dependency beyond
    POSIX". Authors should more explained what they mean by this sentence.

ANSWER: This has been clarified with the short extra statement "a minimal
Unix-like standard that is shared between many operating systems". We would
have liked to explain this more, but the word-limit is very constraining.

------------------------------





19. [Reviewer 2] Unfortunately, sometime we need proprietary or specialized
    software to read raw data... For example in genetics, micro-array raw
    data are stored in binary proprietary formats. To convert this data
    into a plain text format, we need the proprietary software provided
    with the measurement tool.

ANSWER: Thank you very much for this good point. A description of a
possible solution has been added after criterion 8.

------------------------------





20. [Reviewer 2] I was not able to properly set up a project with
    Maneage. The configuration step failed during the download of tools
    used in the workflow. This is probably due to a firewall/antivirus
    restriction out of my control. How frequent this failure happen to
    users?

ANSWER: Thank you for mentioning this. This has been fixed by archiving all
Maneage'd software on Zenodo (https://doi.org/10.5281/zenodo.3883409) and
also downloading from there.

Until recently, we would directly access each software's own webpage to
download its source files, and this caused many problems like the one you
describe. In other cases, a software's webpage would be temporarily
unavailable (for maintenance reasons), which was very frustrating because
it prevented us from building new projects.

Since all of this software is free, we are allowed to redistribute it, and
Zenodo is designed for long-term archival of academic artifacts, so we
decided that a software source code repository on Zenodo would be the most
reliable solution. At configure time, Maneage now accesses Zenodo's DOI,
resolves the most recent URL, and automatically downloads any necessary
software source code from there.

Generally, we also keep all software in a Git repository on our own
webpage: http://git.maneage.org/tarballs-software.git/tree. Maneage users
can also specify their own custom URLs for downloading software, which are
given higher priority than Zenodo (useful when a custom software package is
downloaded and built in a project branch, not the core 'maneage' branch).
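The download strategy described above can be sketched as follows (the function and file names are illustrative, not Maneage's actual code; local files stand in for the custom URL and the Zenodo mirror):

```shell
# Try each source in order and keep the first one that works; in Maneage
# the last entry would be the long-term Zenodo archive.
fetch() {
    out=$1; shift
    for src in "$@"; do
        if cp "$src" "$out" 2>/dev/null; then
            echo "fetched from $src"
            return 0
        fi
    done
    echo "all sources failed" >&2
    return 1
}

printf 'tarball bytes\n' > zenodo-mirror.tar   # stand-in for the Zenodo copy
fetch software.tar custom-url.tar zenodo-mirror.tar
```

Here 'custom-url.tar' does not exist, so the fallback ('zenodo-mirror.tar') is used, mirroring how a missing upstream webpage no longer blocks a build.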

------------------------------





21. [Reviewer 2] The time to configure a new project is quite long because
    everything needs to be compiled. Authors should compare the time
    required to set up a project Maneage versus time used by other
    workflows to give an indication to the readers.

ANSWER: Thank you for raising this point. It takes about 1.5 hours to
configure the default Maneage branch on an 8-core CPU (more than half of
this time is devoted to building GCC on GNU/Linux operating systems;
building GCC can optionally be disabled with the '--host-cc' option to
significantly speed up the build when the host's GCC is similar).
Furthermore, Maneage can be built within a Docker container.

Generally, a paragraph has been added in Section IV on this issue (the
build time and building within a Docker container). We have also defined
task #15818 [1] to have our own core Docker image that is ready to build a
Maneaged project and will be adding it shortly.

[1] https://savannah.nongnu.org/task/index.php?15818

------------------------------





22. [Reviewer 3] Authors should define their use of the term [Replicability
    or Reproducibility] briefly for their readers.

ANSWER: "Reproducibility" has been defined along with "Longevity" and
"usage" at the start of Section II.

------------------------------





23. [Reviewer 3] The introduction is consistent with the proposal of the
    article, but deals with the tools separately, many of which can be used
    together to minimize some of the problems presented. The use of
    Ansible, Helm, among others, also helps in minimizing problems.

ANSWER: Ansible and Helm are primarily designed for distributed computing.
For example, Helm is just a high-level package manager for a Kubernetes
cluster that is based on containers. A review of them could be added in the
appendices, but we feel they may not be very relevant for this paper.

------------------------------





24. [Reviewer 3] When the authors use the Python example, I believe it is
    interesting to point out that today version 2 has been discontinued by
    the maintaining community, which creates another problem within the
    perspective of the article.

ANSWER: Thank you very much for highlighting this. The point had previously
been omitted for the sake of length; it has now been fitted into the
introduction.

------------------------------





25. [Reviewer 3] Regarding the use of VM's and containers, I believe that
    the discussion presented by THAIN et al., 2015 is interesting to
    increase essential points of the current work.

ANSWER: Thank you very much for pointing out the works by Thain. We
couldn't find any first-author papers from 2015, but found Meng & Thain
(https://doi.org/10.1016/j.procs.2017.05.116), which has a related
discussion of why they did not use Docker containers in their work. That
paper is now cited in the discussion of containers in Appendix A.

------------------------------





26. [Reviewer 3] About the Singularity, the description article was missing
    (Kurtzer GM, Sochat V, Bauer MW, 2017).

ANSWER: Thank you for the reference, we could not put it in the main body
of the paper (like many others) due to the strict bibliography limit of 12,
but it has been cited in Appendix A (where we discuss Singularity).

------------------------------





27. [Reviewer 3] I also believe that a reference to FAIR is interesting
    (WILKINSON et al., 2016).

ANSWER: The FAIR principles have been mentioned in the main body of the
paper, but unfortunately we had to remove the citation from the main paper
(like many others) to stay within the maximum limit of 12 references. We
have cited it in Appendix B.

------------------------------





28. [Reviewer 3] In my opinion, the paragraph on IPOL seems to be out of
    context with the previous ones. This issue of end-to-end
    reproducibility of a publication could be better explored, which would
    further enrich the tool presented.

#####################################
ANSWER:
#####################################

------------------------------





29. [Reviewer 3] On the project website, I suggest that the information
    contained in README-hacking be presented on the same page as the
    Tutorial. A topic breakdown is interesting, as the markdown reading may
    be too long to find information.

#####################################
ANSWER:
#####################################

------------------------------





31. [Reviewer 3] The tool is suitable for Unix users, keeping users away
    from Microsoft environments.

ANSWER: The issue of building on Windows has been discussed in Section IV,
either using Docker (or VMs) or using the Windows Subsystem for Linux.

------------------------------




32. [Reviewer 3] Important references are missing; more references are
    needed

ANSWER: Two comprehensive appendices have been added to address this issue.

------------------------------





33. [Reviewer 4] Revisit the criteria, show how you have come to decide on
    them, give some examples of why they are important, and address
    potential missing criteria.

for example the referee already points to "how code is written" as a
criteria (for example for threading or floating point errors), or
"performance".

#################################
ANSWER:
#################################

------------------------------





34. [Reviewer 4] Clarify the discussion of challenges to adoption and make
    it clearer which tradeoffs are important to practitioners.

##########################
ANSWER:
##########################

------------------------------





35. [Reviewer 4] Be clearer about which sorts of research workflow are best
    suited to this approach.

################################
ANSWER:
################################

------------------------------





36. [Reviewer 4] There is also the challenge of mathematical
    reproducibility, particularly of the handling of floating point number,
    which might occur because of the way the code is written, and the
    hardware architecture (including if code is optimised / parallelised).

################################
ANSWER:
################################

------------------------------





37. [Reviewer 4] Performance ... is never mentioned

################################
ANSWER:
################################

------------------------------

38. [Reviewer 4] Tradeoff, which might affect Criterion 3 is time to result,
    people use popular frameworks because it is easier to use them.

################################
ANSWER:
################################

------------------------------





39. [Reviewer 4] I would liked to have seen explanation of how these
    challenges to adoption were identified: was this anecdotal, through
    surveys? participant observation?

ANSWER: The results mentioned here are based on private discussions after
holding multiple seminars and webinars with RDA's support, as well as a
workshop that was planned for non-astronomers. We had even invited (with
RDA funding) early-career researchers to attend that workshop; however, it
was cancelled due to the pandemic, and we communicated privately
afterwards.

We would very much like to elaborate on this experience of training new
researchers with these tools. However, as with many of the cases above, the
very strict word-limit doesn't allow us to elaborate beyond what is already
there.

------------------------------





40. [Reviewer 4] Potentially an interesting sidebar to investigate how
    LaTeX/TeX has ensured its longevity!

##############################
ANSWER:
##############################

------------------------------





41. [Reviewer 4] The title is not specific enough - it should refer to the
    reproducibility of workflows/projects.

##############################
ANSWER:
##############################

------------------------------





42. [Reviewer 4] Whilst the thesis stated is valid, it may not be useful to
    practitioners of computation science and engineering as it stands.

ANSWER: We would appreciate it if you could clarify this point a little
more. We have shown how Maneage has already been used in many research
projects (also outside of observational astronomy, which is the first
author's main background). It is precisely aimed at computational science
and engineering problems where the _publication_ of the human-readable
workflow source is also important.

------------------------------





43. [Reviewer 4] Longevity is not defined.

ANSWER: It has been defined now at the start of Section II.

------------------------------





44. [Reviewer 4] Whilst various tools are discussed and discarded, no
    attempt is made to categorise the magnitude of longevity for which they
    are relevant. For instance, environment isolators are regarded by the
    software preservation community as adequate for timescale of the order
    of years, but may not be suitable for the timescale of decades where
    porting and emulation are used.

ANSWER: Statements on quantifying their longevity have been added in
Section II. For example in the case of Docker images: "their longevity is
determined by the host kernel, usually a decade", for Python packages:
"Python installation with a usual longevity of a few years", for Nix/Guix:
"with considerably better longevity; same as supported CPU architectures."

------------------------------





45. [Reviewer 4] The title of this section "Commonly used tools and their
    longevity" is confusing - do you mean the longevity of the tools or the
    longevity of the workflows that can be produced using these tools?
    What happens if you use a combination of all four categories of tools?

##########################
ANSWER:
##########################

------------------------------





46. [Reviewer 4] It wasn't clear to me if code was being run to generate
    the results and figures in a LaTeX paper that is part of a project in
    Maneage. It appears to be suggested this is the case, but Figure 1
    doesn't show how this works - it just has the LaTeX files, the data
    files and the Makefiles. Is it being suggested that LaTeX itself is the
    programming language, using its macro functionality?

ANSWER: Thank you for highlighting this point of confusion. The caption of
Figure 1 has been edited to hopefully clarify the point. In short, each
arrow represents the operation of software on its inputs (the file the
arrow originates from) to generate its outputs (the file it points to). In
the case of generating 'paper.pdf' from its three dependencies
('references.tex', 'paper.tex' and 'project.tex'), yes, LaTeX is used. But
other steps use other tools. For example, as you can see in [1], the main
step of the arrow connecting 'table-3.txt' to 'tools-per-year.txt' is an
AWK command (there are also a few 'echo' commands for metadata and
copyright in the output plain-text file [2]).

[1] https://gitlab.com/makhlaghi/maneage-paper/-/blob/master/reproduce/analysis/make/demo-plot.mk#L51
[2] https://zenodo.org/record/3911395/files/tools-per-year.txt
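For illustration, the kind of recipe step behind such an arrow can be
sketched in plain shell (the file contents and directory here are
hypothetical toy data, not the actual recipe in demo-plot.mk):

```shell
#!/bin/sh
# Hypothetical sketch of one "arrow" of Figure 1: AWK transforms an
# input plain-text table into an output table, while 'echo' commands
# prepend metadata to the output file. File names follow the example
# above; the contents are toy values for illustration only.
set -e
mkdir -p /tmp/maneage-awk-demo && cd /tmp/maneage-awk-demo

# A toy input table: column 1 is the year, column 2 a count.
printf '1991 10\n1992 15\n1993 25\n' > table-3.txt

# The recipe step: metadata header, then the AWK transformation.
{
  echo "# Tools per year (illustrative metadata/copyright header)."
  awk '{print $1, $2}' table-3.txt
} > tools-per-year.txt
```

The output 'tools-per-year.txt' is itself plain text, so the next arrow
in the graph (the plotting step) can consume it directly.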

------------------------------





47. [Reviewer 4] I was a bit confused on how collaboration is handled as
    well - this appears to be using the Git branching model, and the
    suggestion that Maneage is keeping track of all components from all
    projects - but what happens if you are working with collaborators that
    are using their own Maneage instance?

ANSWER: Indeed, Maneage operates based on the Git branching model. As
mentioned in the text, Maneage is itself a Git branch. People create their
own branch from the 'maneage' branch and start customizing it for their
particular project in their own repository. They can also use any
Git-based collaboration model to work together on a project that is not
yet finished.

Figure 2 in fact explicitly shows such a case: the main project leader is
committing on the "project" branch, while a collaborator creates a separate
branch over commit '01dd812', makes a couple of commits ('f69e1f4' and
'716b56b'), and finally asks the project leader to merge them into the
project. This generalizes to any Git-based collaboration model.
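As a minimal illustration of this model (the branch names 'project' and
'collab', the file names, and the commit messages below are hypothetical,
not taken from an actual Maneage project):

```shell
#!/bin/sh
# Sketch of the Git collaboration model described above: a core
# 'maneage' branch, a 'project' branch derived from it, and a
# collaborator's branch that is merged back by the project leader.
set -e
rm -rf /tmp/maneage-git-demo && mkdir /tmp/maneage-git-demo
cd /tmp/maneage-git-demo
git init -q
git config user.email demo@example.com
git config user.name Demo

git checkout -q -b maneage            # the core Maneage branch
echo base > README
git add README && git commit -qm "Maneage core"

git checkout -q -b project            # project leader's own branch
echo analysis > analysis.mk
git add analysis.mk && git commit -qm "Project analysis"

git checkout -q -b collab project     # collaborator branches off
echo fix > fix.mk
git add fix.mk && git commit -qm "Collaborator's commits"

git checkout -q project               # leader merges the contribution
git merge -q --no-ff -m "Merge collaborator" collab
```

Updates from the core 'maneage' branch can likewise be merged into
'project' at any time, which is how derived projects stay current.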

------------------------------





48. [Reviewer 4] I would also liked to have seen a comparison between this
    approach and other "executable" paper approaches e.g. Jupyter
    notebooks, compared on completeness, time taken to write a "paper",
    ease of depositing in a repository, and ease of use by another
    researcher.

#######################
ANSWER:
#######################

------------------------------





49. [Reviewer 4] The weakest aspect is the assumption that research can be
    easily compartmentalized into simple and complete packages. Given that
    so much of research involves collaboration and interaction, this is not
    sufficiently addressed. In particular, the challenge of
    interdisciplinary work, where there may not be common languages to
    describe concepts and there may be different common workflow practices
    will be a barrier to wider adoption of the primary thesis and criteria.

ANSWER: Maneage was designed precisely to address the problem of
publishing and collaborating on complete workflows. Hopefully, together
with the clarification to point 47 above, this is now clear.

------------------------------





50. [Reviewer 5] Major figures currently working in this exact field do not
    have their work acknowledged in this work.

ANSWER: This was due to the strict word limit and the CiSE publication
policy (no literature review, and a limit of only 12 citations). We had
indeed done a comprehensive literature review, and the editors kindly
agreed to let us publish that review as appendices to the main paper on
arXiv and Zenodo.

------------------------------





51. [Reviewer 5] The popper convention: Making reproducible systems
    evaluation practical ... and the later revision that uses GitHub
    Actions, is largely the same as this work.

ANSWER: This work and the proposed criteria are very different from
Popper. A review of Popper has been given in Appendix B.

------------------------------





52. [Reviewer 5] The lack of attention to virtual machines and containers
    is highly problematic. While a reader cannot rely on DockerHub or a
    generic OS version label for a VM or container, these are some of the
    most promising tools for offering true reproducibility.

ANSWER: Containers and VMs are now more thoroughly discussed in the main
body and extensively reviewed in Appendix A (the appendices are now
available in the arXiv and Zenodo versions of this paper). As discussed
there (with many cited examples), containers and VMs are only useful when
they are themselves reproducible (for example, running the Dockerfile this
year and next year should give the same internal environment). However, we
show that this is not the case in most solutions (a more comprehensive
review would require its own paper).

However, with complete/robust environment builders like Maneage, Nix or
GNU Guix, the analysis environment within a container can be exactly
reproduced later. Even so, due to their binary nature and large storage
volume, containers and VMs are not trustworthy sources for the long term
(archiving them is expensive). We give several examples in the paper of
projects that relied on VMs in 2011 and 2014 and are no longer active, and
note that even DockerHub will be deleting containers of free accounts that
have not been used for more than 6 months (due to the large storage
costs).

Raul: it would be interesting to mention here that Maneage has the
criterion of "Minimal complexity". This means that even if, for any
reason, the project cannot be run in the future, the content, analysis
scripts, etc. remain accessible to the interested reader (because
everything is in plain text). So the project is transparent in any case,
and the interested reader can follow the analysis and study the decisions
behind each step (why and how the analysis was done).

------------------------------





53. [Reviewer 5] On the data side, containers have the promise to manage
    data sets and workflows completely [Lofstead J, Baker J, Younge A. Data
    pallets: containerizing storage for reproducibility and
    traceability. InInternational Conference on High Performance Computing
    2019 Jun 16 (pp. 36-45). Springer, Cham.] Taufer has picked up this
    work and has graduated a MS student working on this topic with a
    published thesis. See also Jimenez's P-RECS workshop at HPDC for
    additional work highly relevant to this paper.

ANSWER: Thank you for the interesting paper by Lofstead+2019 on Data
pallets. We have cited it in Appendix A as examples of how generic the
concept of containers is.

The topic of linking data to analysis is also a core result of the
criteria presented here, and is briefly discussed in the paper. There are
indeed many very interesting works on this topic, but the CiSE format is
very short (a maximum of ~6000 words with 12 references), so we do not
have the space to go into this any further. It is indeed a very
interesting aspect for follow-up studies, especially as the usage of
Maneage increases and we have more example workflows by users to study
the linkage of data to analysis.

------------------------------





54. [Reviewer 5] Some other systems that do similar things include:
    reprozip, occam, whole tale, snakemake.

ANSWER: All these tools have been reviewed in the newly added appendices.

------------------------------





55. [Reviewer 5] the paper needs to include the context of the current
    community development level to be a complete research paper. A revision
    that includes evaluation of (using the criteria) and comparison with
    the suggested systems and a related work section that seriously
    evaluates the work of the recommended authors, among others, would make
    this paper worthy for publication.

ANSWER: A thorough review of current low-level tools and high-level
reproducible workflow management systems has been added in the extended
appendices.

------------------------------





56. [Reviewer 5] Offers criteria any system that offers reproducibility
   should have.

ANSWER:

------------------------------





57. [Reviewer 5] Yet another example of a reproducible workflows project.

ANSWER: As the newly added thorough comparisons with existing systems
show, this set of criteria and the proof-of-concept offer uniquely new
features. As another referee summarized: "This manuscript describes a new
reproducible workflow which doesn't require another new trendy high-level
software. The proposed workflow is only based on low-level tools already
widely known."

The fact that we do not define yet another workflow language and
framework, and instead base the whole workflow on time-tested solutions in
a framework that costs only ~100 kB to archive (in contrast to multi-GB
containers or VMs), is new.

------------------------------





58. [Reviewer 5] There are numerous examples, mostly domain specific, and
    this one is not the most advanced general solution.

ANSWER: As the comparisons in the appendices and the clarifications above
show, many features of the proposed criteria and proof of concept are new.

------------------------------





59. [Reviewer 5] Lack of context in the field missing very relevant work
    that eliminates much, if not all, of the novelty of this work.

ANSWER: The newly added appendices thoroughly describe the context and
previous work that has been done in this field.

------------------------------