[ILUG] Perl Regex Help Needed

Rory Winston rwinston at eircom.net
Sun Sep 11 16:53:30 IST 2005


Hi all

First of all, yes, I *do* realise that this is not a Perl mailing list 
per se. However, whenever I have gotten really stuck in the past and had 
to turn to this list, the combined expertise gathered here has never 
been found wanting. So apologies in advance. And yes, I have RTFM, etc. 
but I still can't figure this one out.

Consider the following - I have a single concatenated file of questions 
and answers. It looks something like this:

--- FILE 1 ---

Q 1) blah blah blah blah blah blah blah blah blah blah blah blah blah 
blah blah blah
blah blah blah blah blah blah blah blah

A 1) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb

Q 2) blah blah blah blah blah blah blah blah blah blah blah blah blah 
blah blah blah
blah blah blah blah blah blah blah blah

A 2) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb

Q 3) blah blah blah blah) blah blah blah blah blah blah blah blah blah 
blah blah blah)
blah blah blah blah blah blah blah blah

A 3) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb


I want to extract the questions and answers, and mark them up as HTML. 
Sounds simple eh? I wish.

What I really want is a regex that can extract pertinent info from a 
question/answer line (such as the question/answer
number), and then the text itself. My first attempt was:

my $trivia = do { local $/; <TRIVIA> };     # Slurp

while ( $trivia =~ m/^[QA] (\d+)\) (.*)/gm) {
    print "Matched ($1) and ($2)\n";
}

This works - sort of. The problem with the above regex is that it will 
only grab the question/answer text to the end of the line, and not up to 
the next question/answer delimiter. I guess I could add the /s modifier 
so that the . in the (.*) portion also matches newlines, like so:

while ( $trivia =~ m/^[QA] (\d+)\) (.*)/gms) {
    print "Matched ($1) and ($2)\n";
}

But that doesn't work either. It greedily grabs *everything*, right 
through to the end of the file. I tried to coerce the (.*) match to be 
not quite so greedy by adding the lazy ? modifier:

while ( $trivia =~ m/^[QA] (\d+)\) (.*?)/gms) {
    print "Matched ($1) and ($2)\n";
}

But that now grabs *nothing*. At this point (having also tried some 
combinations using the \G anchor), I am well and truly stuck. I just 
want to say "grab everything up until the next instance of a pattern 
that signifies a question/answer".
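
For what it's worth, one way to say "up until the next delimiter" is to
keep the lazy (.*?) but follow it with a lookahead. With nothing after
it, a lazy (.*?) is satisfied by the shortest possible match, which is
the empty string. A minimal sketch follows, assuming every delimiter
starts a line and looks like "Q n) " or "A n) ". The sample data and the
parse_trivia name are made up for illustration, and no HTML markup is
attempted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the slurped file; the real data would come from <TRIVIA>.
my $trivia = <<'END';
Q 1) first question, which
wraps onto a second line

A 1) first answer, also
 wrapped (with a stray paren)

Q 2) second question

A 2) second answer
END

# Hypothetical helper: returns a list of [type, number, text] triples.
sub parse_trivia {
    my ($text) = @_;
    my @items;

    # (.*?) expands lazily until the lookahead succeeds, i.e. until the
    # next line that starts with a Q/A delimiter, or the end of the string.
    # The lookahead consumes nothing, so /g resumes at that delimiter.
    while ( $text =~ m/^([QA]) (\d+)\) (.*?)(?=^[QA] \d+\) |\z)/gms ) {
        my ( $type, $num, $body ) = ( $1, $2, $3 );
        $body =~ s/\s+/ /g;     # collapse wrapped lines into single spaces
        $body =~ s/\s+\z//;     # drop trailing whitespace
        push @items, [ $type, $num, $body ];
    }
    return @items;
}

for my $item ( parse_trivia($trivia) ) {
    my ( $type, $num, $body ) = @$item;
    print "Matched $type ($num) and ($body)\n";
}
```

The /m modifier is what lets ^ inside the lookahead match at the start of
any line, and /s is what lets . cross newlines. A ")" inside the text
does no harm, because only a delimiter at the start of a line ends the
match.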

If anyone can help with this at all, the retro computing community will 
be very thankful!!!

Thanks
Rory




