[ILUG] Slightly Off Topic: Concurrent Access to a Text File in a bash script - can I enforce thread safety
Oisin Kim
oisinkim at gmail.com
Sat Aug 4 18:45:26 IST 2007
Hi All,
Apologies for the significant delay in responding; we delivered late!
I got some fantastic advice on this topic from ILUG.
In the final solution I used lockfiles, and since the number of files
I was dealing with wasn't large, the performance hit was only small.
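The post doesn't say how the lockfiles were implemented. As a minimal sketch of one common approach, assuming bash and GNU sleep (for the fractional delay), a directory can serve as the lock: mkdir either creates it or fails in a single atomic step, which is why this trick is generally regarded as safe over NFS too. The helper name append_locked and the "<file>.lock" naming are hypothetical; flock(1) from util-linux is an alternative on Linux.

```shell
#!/bin/bash
# Sketch of lockfile-style serialisation using mkdir as the lock.
# mkdir succeeds for exactly one process at a time, so only the process
# that created the directory may append. append_locked and the
# "<file>.lock" name are hypothetical, not taken from the thread.

append_locked() {
    local target=$1
    shift
    local lockdir="$target.lock"

    # Acquire: retry until mkdir succeeds (fractional sleep needs GNU sleep)
    until mkdir "$lockdir" 2>/dev/null; do
        sleep 0.1
    done

    # Critical section: only the lock holder writes to the shared file
    printf '%s\n' "$*" >> "$target"

    # Release: remove the lock directory so the next waiter can proceed
    rmdir "$lockdir"
}

# Example: five background workers append to one results file
results=$(mktemp)
for i in 1 2 3 4 5; do
    append_locked "$results" "worker $i done" &
done
wait
cat "$results"
```

A real script would also want a trap (or a timeout) to remove the lock directory if the holder is killed, since a stale lock otherwise leaves every other process waiting forever.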
Pádraig's idea of sorting the files by inode gave a significant
performance improvement (double-digit percentage gains) on development
boxes with a single disk, and a smaller one when connected to NFS,
which used a SAN behind the scenes.
Thanks to everyone for their ideas and help.
Cheers,
Oisin
On 7/13/07, Oisin Kim <oisinkim at gmail.com> wrote:
> Thanks for the responses, I promise I'll update all with benchmarking
> results for Pádraig's inode sorting suggestion.
>
> I'll have a good read of the links Pádraig gave too and give my 2c on it.
>
> Thanks all,
>
> Cheers,
> Oisin
>
>
> On 7/13/07, Pádraig Brady <P at draigbrady.com> wrote:
> > Efficiently checksumming files is something I've thought a bit about¹.
> > The biggest bottleneck I've found is disk head seeking, so to
> > minimise that, the handiest thing I've found is to sort by inode
> > (sorting by path is nearly as efficient). One modern CPU should be
> > more than enough to checksum data as fast as most disks can throw at it.
> >
> > Also you do not want the overhead of starting a cksum process per file.
> >
> > As a first pass, can you compare the running speed of the following:
> >
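> > find . -maxdepth 1 -type f -printf "%i\t%f\n" |  # inode<TAB>name
> > sort -k1,1n |     # order by inode number to minimise seeks
> > cut -f2- |        # keep the whole name, even if it contains tabs
> > tr '\n' '\0' |    # NUL-separate the names for xargs -0
> > xargs -r0 cksum   # few cksum processes, many files each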
> >
> > Now, with multiple spindles it would be worth running multiple
> > checksum processes (especially if you have multiple CPUs).
> > So, to answer your original question: how do you synchronize
> > writes to a single file in this case?
> >
> > Well, when you open a file with O_APPEND set (as the shell
> > does for `>> file`), then before each write the file offset
> > private to each process is automatically set to the current
> > file size. All you have to worry about is that cksum never
> > passes a partial line through to the kernel and then gets
> > scheduled out, since another process's output could then be
> > appended mid-line. I think this is OK, but I leave it as an
> > exercise for the reader to verify there are no issues with
> > buffering².
> >
> > ¹ http://www.pixelbeat.org/fslint/
> > ² http://www.pixelbeat.org/programming/stdio_buffering/
> >
>
>
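Pádraig's suggestions (an inode-ordered file list, several cksum processes, all appending through `>>`) can be combined roughly as below. This is a sketch, assuming GNU find (-printf), GNU xargs (-P) and coreutils stdbuf; parallel_cksum, its arguments and the batch size of 16 are hypothetical. stdbuf -oL is one way to settle the buffering question from footnote ²: when stdout is a regular file, stdio normally buffers it in blocks, so a flush could end mid-line, whereas line buffering gives each line its own write(). One caution for the NFS setup mentioned above: the Linux open(2) man page warns that O_APPEND can corrupt files on NFS when several processes append at once, because the client has to emulate the append, so lockfiles may be the safer choice there.

```shell
#!/bin/bash
# Sketch: checksum the regular files in one directory in inode order,
# with several cksum processes appending to a single output file.
# parallel_cksum, its arguments and the batch size of 16 are hypothetical.
# Assumes GNU find (-printf), GNU xargs (-P) and coreutils stdbuf.

parallel_cksum() {
    local dir=$1 out=$2 jobs=${3:-4}

    # The subshell's stdout is opened once with >> (O_APPEND), so every
    # write() from any cksum child lands at the current end of the file.
    # stdbuf -oL line-buffers each cksum, so each output line reaches the
    # kernel in its own write() instead of a block that might end mid-line.
    # Pipeline: inode<TAB>name, sort by inode, drop the inode column,
    # NUL-separate the names, then run up to $jobs cksum processes.
    (
        cd "$dir" || exit 1
        find . -maxdepth 1 -type f -printf '%i\t%f\n' |
            sort -k1,1n |
            cut -f2- |
            tr '\n' '\0' |
            xargs -r0 -P "$jobs" -n 16 stdbuf -oL cksum
    ) >> "$out"
}
```

For example, `parallel_cksum some_dir sums.txt 4` (some_dir is a placeholder). Keep the output file outside the directory being checksummed, otherwise find may pick it up while it is still being written.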