author: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2018-03-09 12:39:34 +0100
committer: Mohammad Akhlaghi <mohammad@akhlaghi.org> 2018-03-09 12:39:34 +0100
commit: ddd6dfd504bf2ce1bc6a4f6c3445daf2ad3a2a06
tree: 87ae74bc6ca665df8c9a7195e7ddd45a842b0d6e
parent: a5b6cd1173a3e0203467a6dec306d65c11da78e8
Added tip for pipeline outputs
While doing my own project (which has grown to a processing time of about half an hour), I felt that it would be very convenient to also have a record of the outputs at major points. But we don't want to bloat the pipeline by committing PDF files or large datasets that get fully changed and are just by-products. So it occurred to me to have a separate pipeline only for outputs, and after trying it out, it indeed seems to be a good solution.
-rw-r--r-- README.md 25
1 files changed, 22 insertions, 3 deletions
diff --git a/README.md b/README.md
index 9df72a7..ee60b4a 100644
--- a/README.md
+++ b/README.md
@@ -549,8 +549,8 @@ been explained here), please let us know to correct it.
-Tips on expanding this template (designing your pipeline)
-=========================================================
+Usage tips: designing your pipeline/workflow
+============================================
The following is a list of design points, tips, or recommendations that
have been learned after some experience with this pipeline. Please don't
@@ -716,7 +716,7 @@ us. In this way, we can add it here for the benefit of others.
- *Keep your input data*: The input data is also critical to the
pipeline, so like the above for software, make sure you have a backup
- of them
+ of them.
- **Version control**: It is important (and extremely useful) to have the
history of your pipeline under version control. So try to make commits
@@ -739,6 +739,25 @@ us. In this way, we can add it here for the benefit of others.
results to your colleagues, you can tag the commit as `v2`. Afterwards
when you submit to a paper, it can be tagged `v3` and so on.
+ - *Pipeline outputs*: During your research, it is possible to checkout a
+ specific commit and reproduce its results. However, the processing
+ can be time consuming. Therefore, it is useful to also keep track of
+ the final outputs of your pipeline (at minimum, the paper's PDF) in
+ important points of history. However, keeping a snapshot of these
+ (most probably large volume) outputs in the main history of the
+ pipeline can unreasonably bloat it. It is thus recommended to make a
+ separate Git repo to keep those files and keep this pipeline's volume
+ as small as possible. For example if your main pipeline is called
+ `my-exciting-project`, the name of the outputs pipeline can be
+ `my-exciting-project-outputs`. This enables easy sharing of the
+ output files with your co-authors (with necessary permissions) and
+ not having to bloat your email archive with extra attachments (you
+ can just share the link to the online repo in your
+ communications). After the research is published, you can also
+ release the outputs pipeline, or you can just delete it if it is too
+ large or un-necessary (it was just for convenience, and fully
+ reproducible after all).
+
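
The "Pipeline outputs" tip added above could be tried out with something like the following. This is only a minimal sketch and not part of the commit: the temporary directory, the `paper.pdf` placeholder, the author identity and the idea of reusing the pipeline's `v1` tag in the outputs repository are all assumptions made for illustration. Only the two repository names come from the tip itself.

```shell
#!/bin/sh
# Sketch: keep pipeline outputs in their own Git repository, separate
# from the pipeline's history. Everything happens in a temporary
# directory, so no existing repository is touched.
set -e

work=$(mktemp -d)
project="$work/my-exciting-project"          # name taken from the tip
outputs="$work/my-exciting-project-outputs"  # name taken from the tip

# Stand-in for the main pipeline repository (assumed content).
git init -q "$project"
git -C "$project" config user.name "Example Author"
git -C "$project" config user.email "author@example.org"
echo "pipeline sources" > "$project/README.md"
git -C "$project" add README.md
git -C "$project" commit -q -m "Initial pipeline"
git -C "$project" tag v1

# Stand-in for the final output (a real run would produce the PDF).
echo "placeholder for the paper's PDF" > "$work/paper.pdf"

# The separate outputs repository: only by-products are stored here.
git init -q "$outputs"
git -C "$outputs" config user.name "Example Author"
git -C "$outputs" config user.email "author@example.org"
cp "$work/paper.pdf" "$outputs/paper.pdf"

# Record which pipeline commit produced these outputs (assumed
# convention: mention the short hash in the message, reuse the tag).
commit=$(git -C "$project" rev-parse --short HEAD)
git -C "$outputs" add paper.pdf
git -C "$outputs" commit -q -m "Outputs of pipeline commit $commit (v1)"
git -C "$outputs" tag v1
```

Recording the pipeline's commit hash in the outputs commit message is one possible way to keep the two histories linked. The main repository stays small, and the outputs repository can later be shared, released or deleted on its own.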