[ILUG] benefits of raw i/o

David Murphy drjolt+ilug at redbrick.dcu.ie
Wed Jun 14 13:18:13 IST 2000


Quoting <Pine.LNX.4.21.0006140309460.1255-100000 at fogarty.jakma.org>
by Paul Jakma <paul at clubi.ie>:

> serial means more or less the same as sequential (to me). but to be
> specific i mean:

> serial I/O == raw I/O == character device I/O.

That sounds like 'sequential raw I/O' to me.

> no data buffering, no filesystem, no vfs. (no vat...). Just a char
> device that plugs you straight into the device driver.

Yes, raw I/O. I think we agree on what it is. Of course, you might
have an LVM in between the application and the SCSI device drivers, if
you're doing raw access to a stripe or whatever.

> I've been arguing that the metadata cache is not the cause of
> slowness,
[...]
> The slowness is in the data cache.

Yes. Never been arguing otherwise.

> From the point of view of it, it sees that within a range of blocks
> ( range*blocksize >> allowed data cache) the usage pattern is
> extremely complex (big database). In order for the data cache to
> correctly predict that usage pattern it must have unacceptably
> complex heuristics.. better then for the data cache to get
> completely out of the way -> raw I/O.

No. Getting the data cache out of the way can mean raw devices, or it
can mean 'direct I/O', that is, I/O to a file on a filesystem that
bypasses the cache. Direct I/O allows a tradeoff between the
performance advantages of uncached access and the management
advantages of filesystems.

> > It was developed because, while raw disks are the ultimate in
> > performance, they are more work to administer than filesystems,
> > for obvious reasons. 

> never having worked with raw I/O: in what way is it more difficult
> to maintain? i would have thought easier. You just point oracle at a
> raw I/O logical volume and forget about it until oracle starts
> telling you that it's running short, at which point you either
> extend the LV or give it a fresh lv. 

Those are LVM issues, and are solved by having an LVM. An example
filesystem issue would be backup - it's kinda awkward to do an
incremental backup of a raw device.

> > Ah, but you don't, 'cos it uses extents, not indirect blocks and
> > fragments and things, and extents can be big.

> but inside the extent you must surely still use fragments/blocks?
> with the same dereferencing overhead as always. ('cept now the
> extent is an extra layer). there must be some layer of finer grained
> access inside the extent, otherwise what happens when the VFS says
> "AcmeFS, give me these blocks"? Does the FS say "uhmm.. here's a
> nice big 256MB blob of data"

My limited understanding is that an extent consists of a starting
point, and a number of blocks, i.e., "File lotsofdata starts 98349349
blocks in, and lasts for 3843438473848347 blocks."
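
To make that concrete, here is a rough sketch in Python. It is only an
illustration of the idea, not how any particular filesystem does it:
the names Extent and map_block are made up, and the length is a small
invented number rather than the one above. The point is that finding a
block inside an extent is plain arithmetic on (start, length), with no
per-block pointers to chase.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Extent:
    """Hypothetical extent record: one contiguous run of disk blocks."""
    logical_start: int   # first file block covered by this extent
    physical_start: int  # disk block where the run begins
    length: int          # number of contiguous blocks in the run


def map_block(extents: List[Extent], file_block: int) -> Optional[int]:
    """Translate a file block number to a disk block, or None if unmapped."""
    for ext in extents:
        offset = file_block - ext.logical_start
        if 0 <= offset < ext.length:
            # Inside the extent the lookup is just start + offset.
            return ext.physical_start + offset
    return None


# "File lotsofdata starts 98349349 blocks in": a single extent covers it.
# The length of 1,000,000 blocks is an assumption for the example.
lotsofdata = [Extent(logical_start=0, physical_start=98349349, length=1_000_000)]
```

So when the layer above asks for file block N, the filesystem can hand
back exactly that block; a badly fragmented file would simply carry
more than one entry in the list.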

> eg SGI XFS is extent based (it calls them "allocation groups"), but
> afaik it still uses more traditional metadata such as superblocks,
> blocks, indirect blocks, fragments, directories, inodes within each
> extent/group.

You're describing a UFS with some extent-like features. Sun have made
similar modifications to UFS in Solaris, but neither is the same as an
extent-based filesystem, where the allocation is done with extents.

-- 
When asked if it is true that he uses his wheelchair as a weapon he will reply:
"That's a malicious rumour. I'll run over anyone who repeats it."
Stephen Hawking - [http://www.smh.com.au/news/0001/07/features/features1.html]
David Murphy - For PGP public key, send mail with Subject: send-pgp-key



