From 97b511e71f4f63dfe2551460a6d3a04729948613 Mon Sep 17 00:00:00 2001 From: Mohammad Akhlaghi Date: Tue, 29 Oct 2019 13:23:42 +0000 Subject: Suggestion on good usage of /dev/shm in README-hacking.md When you are working with large files and there is some good RAM in the system (large/powerful computers), it is beneficial to work in the shared memory directory and not the actual persistent storage (like HDD or SSD). With this commit, a fully working demo has been added to `README-hacking.md' (under the tips of "Make programming") to show how to effectively work in situations like this. --- README-hacking.md | 63 +++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 57 insertions(+), 6 deletions(-) (limited to 'README-hacking.md') diff --git a/README-hacking.md b/README-hacking.md index 338f03a..9ecda33 100644 --- a/README-hacking.md +++ b/README-hacking.md @@ -913,12 +913,13 @@ for the benefit of others. - *Large files*: If you are dealing with very large files (thus having multiple copies of them for intermediate steps is not possible), one - solution is the following strategy. Set a small plain text file as - the actual target and delete the large file when it is no longer - needed by the project (in the last rule that needs it). Below is a - simple demonstration of doing this. In it, we use Gnuastro's - Arithmetic program to add all pixels of the input image with 2 and - create `large1.fits`. We then subtract 2 from `large1.fits` to create + solution is the following strategy (Also see the next item on "Fast + access to temporary files"). Set a small plain text file as the + actual target and delete the large file when it is no longer needed + by the project (in the last rule that needs it). Below is a simple + demonstration of doing this. In it, we use Gnuastro's Arithmetic + program to add all pixels of the input image with 2 and create + `large1.fits`. We then subtract 2 from `large1.fits` to create `large2.fits` and delete `large1.fits` in the same rule (when its no longer needed). We can later do the same with `large2.fits` when it is no longer needed and so on. @@ -938,6 +939,56 @@ for the benefit of others. possible to greatly simplify this repetitive statement and make the code even more readable throughout the whole project. + - *Fast access to temporary files*: Most Unix-like operating systems + will give you a special shared-memory device (directory) called + `/dev/shm`. This directory is actually in your RAM, not in your + persistance storage like the HDD or SSD. Reading and writing from/to + the RAM is much faster than persistant storage, so if you have enough + ram available, it can be very beneficial for large temporary + files. You can make random file names in `/dev/shm` (that don't exist + at the time they are created) with the `mktemp` command and use text + files as targets to keep the temporary name (as described in the item + above under "Large files") for later deletion. For example this fully + working Makefile (which you can actually put in a `Makefile` and run + if you have an `input.fits` in the same directory). + ``` + .ONESHELL: + .SHELLFLAGS = -ec + all: mean-std.txt + shm-template = /dev/shm/$(shell whoami)-XXXXXXXXXX + large1.txt: input.fits + out=$$(mktemp $(shm-template)) + astarithmetic $< 2 + --output=$$out.fits + echo "$$out" > $@ + large2.txt: large1.txt + input=$$(cat $<) + out=$$(mktemp $(shm-template)) + astarithmetic $$input.fits 2 - --output=$$out.fits + rm $$input.fits $$input + echo "$$out" > $@ + mean-std.txt: large2.txt + input=$$(cat $<) + aststatistics $$input.fits --mean --std > $@ + rm $$input.fits $$input + ``` + The important point here is that the template has no suffix. So you + can add the suffix corresponding to your desired format. But more + importantly, when `mktemp` sets the random name, it also checks if no + file exists with that name and creates a file with that exact name at + that moment. So at the end of each recipe above, you'll have two + files in your `/dev/shm`, one empty file with no suffix one with a + suffix. The role of the file without suffix is just to ensure that + string of random characters will not be set by other calls to + `mktemp` and it should be deleted with the file containing a + suffix. This is the reason behind the `rm $$input.fits $$input` + command above: to make sure that first the file with a suffix is + deleted, then the core random file (note that when working in + parallel on powerful systems, other things can happen even in the + time between deleting two files of a single `rm` command). When using + this template, you can put the definition of `shm-template` in + `reproduce/analysis/make/initialize.mk` to be usable in all the + different Makefiles of your analysis. + - **Software tarballs and raw inputs**: It is critically important to document the raw inputs to your project (software tarballs and raw -- cgit v1.2.1