author | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2018-03-09 12:39:34 +0100 |
---|---|---|
committer | Mohammad Akhlaghi <mohammad@akhlaghi.org> | 2018-03-09 12:39:34 +0100 |
commit | ddd6dfd504bf2ce1bc6a4f6c3445daf2ad3a2a06 | |
tree | 87ae74bc6ca665df8c9a7195e7ddd45a842b0d6e | |
parent | a5b6cd1173a3e0203467a6dec306d65c11da78e8 |
Added tip for pipeline outputs
While doing my own project (which has grown to a processing time of about
half an hour), I felt that it would be very convenient to also have a record
of the outputs at major points. But we don't want to bloat the pipeline by
committing PDF files or large datasets that get fully changed and are just
by-products. So it occurred to me to have a separate pipeline only for
outputs, and after trying it out, it indeed seems to be a good solution.
-rw-r--r-- | README.md | 25 |
1 files changed, 22 insertions, 3 deletions
@@ -549,8 +549,8 @@ been explained here), please let us know to correct it.

-Tips on expanding this template (designing your pipeline)
-=========================================================
+Usage tips: designing your pipeline/workflow
+============================================

 The following is a list of design points, tips, or recommendations that
 have been learned after some experience with this pipeline. Please don't
@@ -716,7 +716,7 @@ us. In this way, we can add it here for the benefit of others.

 - *Keep your input data*: The input data is also critical to the
 pipeline, so like the above for software, make sure you have a backup
- of them
+ of them.

 - **Version control**: It is important (and extremely useful) to have the
 history of your pipeline under version control. So try to make commits
@@ -739,6 +739,25 @@ us. In this way, we can add it here for the benefit of others.
 results to your colleagues, you can tag the commit as `v2`. Afterwards
 when you submit to a paper, it can be tagged `v3` and so on.

+ - *Pipeline outputs*: During your research, it is possible to checkout a
+ specific commit and reproduce its results. However, the processing
+ can be time consuming. Therefore, it is useful to also keep track of
+ the final outputs of your pipeline (at minimum, the paper's PDF) in
+ important points of history. However, keeping a snapshot of these
+ (most probably large volume) outputs in the main history of the
+ pipeline can unreasonably bloat it. It is thus recommended to make a
+ separate Git repo to keep those files and keep this pipeline's volume
+ as small as possible. For example if your main pipeline is called
+ `my-exciting-project`, the name of the outputs pipeline can be
+ `my-exciting-project-outputs`. This enables easy sharing of the
+ output files with your co-authors (with necessary permissions) and
+ not having to bloat your email archive with extra attachments (you
+ can just share the link to the online repo in your
+ communications). After the research is published, you can also
+ release the outputs pipeline, or you can just delete it if it is too
+ large or un-necessary (it was just for convenience, and fully
+ reproducible after all).
+
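
The *Pipeline outputs* tip added in this commit could be automated roughly as follows. This is only a minimal sketch and not part of the commit or the template: the function names (`outputs_repo_name`, `snapshot_outputs`, `commit_snapshot`) and the commit-message wording are hypothetical, and it assumes the outputs repository directory has already been initialized with `git init`.

```python
import shutil
import subprocess
from pathlib import Path


def outputs_repo_name(pipeline_name: str) -> str:
    """Follow the naming example in the tip: append '-outputs'."""
    return f"{pipeline_name}-outputs"


def snapshot_outputs(files, outputs_repo: Path, pipeline_commit: str) -> str:
    """Copy final outputs (e.g. the paper's PDF) into the outputs repo.

    Returns a commit message that records which commit of the main
    pipeline produced these files (hypothetical wording).
    """
    outputs_repo.mkdir(parents=True, exist_ok=True)
    for f in files:
        # Overwrite the previous snapshot; Git keeps the older versions.
        shutil.copy2(f, outputs_repo / Path(f).name)
    return f"Outputs of pipeline commit {pipeline_commit}"


def commit_snapshot(outputs_repo: Path, message: str) -> None:
    """Record the snapshot in the outputs repo.

    Assumes `git` is installed and `outputs_repo` is already a Git repo.
    """
    subprocess.run(["git", "add", "--all"], cwd=outputs_repo, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=outputs_repo, check=True)
```

Writing the main pipeline's commit hash into the outputs commit message is one possible way to link the two histories, so that a given PDF can be traced back to the pipeline state that produced it. Tagging both repos with the same name (`v2`, `v3`, ...) would be another.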