[ILUG] RAID, huge filesystems and data mining.

Ronan Cunniffe rcunniff at stp.dias.ie
Tue Jul 13 14:59:36 IST 2004


Hi all,

   Prompted by the "why RAID" discussion, I want to see what
ILUGgers think of the following data-mining challenge, and my current
sorta idea for solving it.  It's not *my* problem, but it's an interesting
one.

   Large (1-2TB, soon to scale by x10 or thereabouts) proprietary
(multi-owner) data corpus, made up of many (thousands at least)
separate datasets.

   You are holding this data, and mediating access to it for an arbitrary
number of dataminers.  Each user has a very definite set of access
permissions, and it's not a regular pattern (i.e. there's no easy way of
splitting the problem).
   A data-mining run is going to involve 0.1 to 0.5 TB.

   This is (AFAIK) going to run on Red Hat 9, or possibly Fedora or
something more recent.


My best idea about this was to store the whole thing on a single
filesystem, and use hard links to generate "views" onto the dataset for
each user.
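
For the curious, here is a rough Python sketch of what building one such
view might look like.  Everything in it is illustrative: build_view, the
one-directory-per-dataset layout and the dataset names are my assumptions,
not part of the actual setup.  Directories can't be hard-linked, so the
sketch recreates the directory tree and links only the files.

```python
import os
import tempfile


def build_view(corpus_root, view_root, allowed_datasets):
    """Hard-link every file of each allowed dataset into view_root.

    Assumes (hypothetically) that each dataset is a directory directly
    under corpus_root. Returns the number of links created.
    """
    linked = 0
    for name in allowed_datasets:
        src_dir = os.path.join(corpus_root, name)
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            # Mirror the directory structure inside the view.
            rel = os.path.relpath(dirpath, corpus_root)
            dst_dir = os.path.join(view_root, rel)
            os.makedirs(dst_dir, exist_ok=True)
            for fn in filenames:
                # A new name for the same inode: no data is copied.
                os.link(os.path.join(dirpath, fn), os.path.join(dst_dir, fn))
                linked += 1
    return linked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        corpus = os.path.join(tmp, "corpus")
        for ds in ("ds1", "ds2"):
            os.makedirs(os.path.join(corpus, ds))
            with open(os.path.join(corpus, ds, "data.bin"), "wb") as f:
                f.write(b"x" * 1024)
        view = os.path.join(tmp, "views", "alice")
        print(build_view(corpus, view, ["ds1"]), "file(s) linked")
```

One caveat: a hard link shares its inode with the original, so owner and
mode bits are shared too.  Access control would have to come from the
permissions on each user's view directory, not from the linked files.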

The plus side is that setting up a view takes a few seconds at most
(compared to thousands of seconds to copy 100-500GB).
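
To put a rough number on the copying alternative, here is the arithmetic
as a tiny Python sketch.  The 50 MB/s sustained throughput is purely an
assumed figure for illustration, not a measurement of any real hardware.

```python
def copy_seconds(size_gb, throughput_mb_per_s):
    """Seconds to copy size_gb (decimal GB) at a sustained rate in MB/s."""
    return (size_gb * 1000.0) / throughput_mb_per_s


ASSUMED_MB_PER_S = 50.0  # assumption for illustration, not a measurement

for size_gb in (100, 500):
    print(size_gb, "GB:", copy_seconds(size_gb, ASSUMED_MB_PER_S), "s")
# prints:
# 100 GB: 2000.0 s
# 500 GB: 10000.0 s
```

A hard-linked view, by contrast, costs one directory entry per file and
moves no data, so its setup time depends on the file count, not the size.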

The downside is that the whole thing would need to be on a single
filesystem (hard links can't cross filesystem boundaries), rather than
the usual bolt-on-a-few-more-mount-points solution.
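
The single-filesystem constraint can be checked up front.  A minimal
sketch, assuming a plain local filesystem: comparing st_dev from stat()
is a common heuristic for "same filesystem", and same_filesystem is just
a name I made up for it.  If the check is skipped, link() across
filesystems fails with EXDEV.

```python
import os


def same_filesystem(path_a, path_b):
    """Heuristic: two existing paths share a filesystem if st_dev matches."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical check: can views under the current directory link to it?
    print(same_filesystem(".", os.path.abspath(".")))
```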

If anyone has a few bored braincells to throw at this problem, fire away.

Ronan

P.S.  Q: "Why RAID?"
      A: "How *else*?"


