[ILUG] Perl Regex Help Needed
Rory Winston
rwinston at eircom.net
Sun Sep 11 16:53:30 IST 2005
Hi all
First of all, yes, I *do* realise that this is not a Perl mailing list
per se. However, whenever I have gotten really stuck in the past and had
to turn to this list, the combined expertise gathered here has never
been found wanting. So apologies in advance. And yes, I have RTFM, etc.
but I still can't figure this one out.
Consider the following - I have a single concatenated file of questions
and answers. It looks something like this:
--- FILE 1 ---
Q 1) blah blah blah blah blah blah blah blah blah blah blah blah blah
blah blah blah
blah blah blah blah blah blah blah blah
A 1) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
Q 2) blah blah blah blah blah blah blah blah blah blah blah blah blah
blah blah blah
blah blah blah blah blah blah blah blah
A 2) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
Q 3) blah blah blah blah) blah blah blah blah blah blah blah blah blah
blah blah blah)
blah blah blah blah blah blah blah blah
A 3) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
I want to extract the questions and answers, and mark them up as HTML.
Sounds simple eh? I wish.
What I really want is a regex that can extract pertinent info from a
question/answer line (such as the question/answer
number), and then the text itself. My first attempt was:
my $trivia = do { local $/; <TRIVIA> }; # Slurp
while ( $trivia =~ m/^[QA] (\d)+\) (.*)/gm) {
print "Matched ($1) and ($2)\n";
}
This works - sort of. The problem with the above regex is that it will
only grab the question/answer text
to the end of the line, and not until the next question/answer
delimiter. I guess I could add the /s modifier
so that the (.*) portion also matches newlines, like so:
while ( $trivia =~ m/^[QA] (\d)+\) (.*)/gms) {
print "Matched ($1) and ($2)\n";
}
But that doesn't work either. It greedily grabs *everything*. I tried to
coerce the (.*) match
to be not quite so greedy by adding the ? lazy operator:
while ( $trivia =~ m/^[QA] (\d)+\) (.*?)/gms) {
print "Matched ($1) and ($2)\n";
}
But that now grabs *nothing*. At this point (having also tried some
combinations of the \G assertion), I am
well and truly stuck. I just want to say "grab everything up until the
next instance of a pattern that signifies a question/answer".
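One way to say "up until the next question/answer" is to keep the lazy (.*?) but give it something to stop at: a lookahead for the next delimiter, or the end of the string. With nothing after it, (.*?) is satisfied by the empty string, which would explain why the last attempt grabs nothing. The following is only a minimal sketch under a few assumptions: the sample data and the @items array are made up for illustration, every delimiter is assumed to start a line in exactly the "Q 1) " form, and no line of body text is assumed to begin that way. It also uses (\d+) rather than (\d)+, since the latter keeps only the last digit of a multi-digit number.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real trivia file.
my $trivia = <<'END';
Q 1) first question
continues here
A 1) first answer
continues) here
Q 2) second question
A 2) second answer
END

my @items;

# /m lets ^ match at the start of every line, /s lets . match newlines.
# The lazy (.*?) stops as soon as the lookahead sees the next delimiter
# at the start of a line, or the end of the string (\z).
while ( $trivia =~ m/^([QA]) (\d+)\) (.*?)(?=^[QA] \d+\) |\z)/gms ) {
    my ( $type, $num, $text ) = ( $1, $2, $3 );
    $text =~ s/\s+\z//;    # drop the trailing newline(s) of each entry
    push @items, [ $type, $num, $text ];
}

for my $item (@items) {
    my ( $type, $num, $text ) = @$item;
    print "Matched ($type $num) and ($text)\n";
}
```

Because the lookahead does not consume the next delimiter, the following iteration of the /g loop starts exactly on it, so no entry is skipped.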
If anyone can help with this at all, the retro computing community will
be very thankful!!!
Thanks
Rory