[ILUG] Multi line regular expression help

Marcus Furlong furlongm at gmail.com
Fri Dec 2 07:55:55 GMT 2011


On Fri, Dec 2, 2011 at 07:19, Kingsley G. Morse Jr. <kingsley at loaner.com> wrote:
> Hi Marcus,
>
> You're very welcome.
>
> I'll try to help with your other questions, before
> giving you an updated script.
>
> 1.) I'm not certain that I know exactly which
>    extra spaces and dashes you alluded to, but my
>    guess is that appending
>
>        | sed 's/[ -]*">/">/'
>
>    to the end of the pipe gets rid of them.

Yep, it did.
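
For the archives, here is a minimal sketch of what that extra sed does. The sample line is made up for illustration; it stands in for whatever the earlier stages of the pipe produce.

```shell
# Hypothetical sample line, standing in for the output of the earlier stages.
sample='<tag language="FR" attribute="othertext - ">'

# Drop any run of spaces and dashes that sits directly before the closing quote.
cleaned=$(printf '%s\n' "$sample" | sed 's/[ -]*">/">/')

printf '%s\n' "$cleaned"
# prints: <tag language="FR" attribute="othertext">
```

There is no g flag, so only the first match on each line is rewritten, which is enough here because each line carries one tag.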

> 2.) You can specify only two of the [A-Z] characters
>    with
>
>        [A-Z]{2}

I didn't know this syntax before; it'll definitely come in handy. In the
end I opted to list all the valid language codes explicitly, like so:

(BG|CS|DA|DE|EL|EN|ES|ET|FI|FR|GA|HU|IT|LT|LV|MT|NL|PL|PT|RO|SK|SL|SV)
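
A rough sketch of the difference between the two patterns, using grep -E so each can be tested on its own. The function names and the made-up code (ZZ) are only for illustration.

```shell
langs='BG|CS|DA|DE|EL|EN|ES|ET|FI|FR|GA|HU|IT|LT|LV|MT|NL|PL|PT|RO|SK|SL|SV'

# Succeeds if the argument, e.g. "(FR)", is any two capitals in parentheses.
matches_any_two() { printf '%s\n' "$1" | grep -Eq '^\([A-Z]{2}\)$'; }

# Succeeds only if the code in parentheses is one of the listed languages.
matches_listed() { printf '%s\n' "$1" | grep -Eq '^\(('"$langs"')\)$'; }

for code in '(FR)' '(ZZ)'; do
    if matches_any_two "$code"; then echo "$code matches [A-Z]{2}"; fi
    if matches_listed "$code"; then echo "$code matches the explicit list"; fi
done
```

(FR) passes both checks, while (ZZ) passes only the [A-Z]{2} one, which is why the explicit list is the stricter choice.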


> 3.) My understanding is that the -E option is an
>    undocumented alternative to -r. Both let sed
>    use extended regular expressions.

Yep, using -r on the older version of sed worked a treat.
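
A quick sketch of the equivalence, assuming a GNU sed that accepts both spellings. The sample line is made up, and the last command shows the same substitution in basic regular expression syntax for comparison.

```shell
# Assumes GNU sed, which accepts both -r and -E for extended regular expressions.
line='code (FR)'

with_r=$(printf '%s\n' "$line" | sed -r 's/\(([A-Z]{2})\)/\1/')
with_E=$(printf '%s\n' "$line" | sed -E 's/\(([A-Z]{2})\)/\1/')

# Without either option sed uses basic regular expressions, where the
# roles of ( and \( are swapped and the braces need backslashes.
basic=$(printf '%s\n' "$line" | sed 's/(\([A-Z]\{2\}\))/\1/')

printf '%s\n' "$with_r" "$with_E" "$basic"
# prints "code FR" three times
```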

> Here's an updated script...
>
> #!/bin/bash
>
> echo "<tag>(FR) text
>
> <tag> - (FR) text
>
> <tag> (FR)
> text
>
> <tag>
> (FR) text
>
> <tag>
>  - (FR) text
>
> <tag>
> othertext - (FR) text
>
> <tag>othertext - (FR) text
>
> <tag>othertext -
> (FR) text" | sed -r -n '/>/{N; s/\n//; s/>(.*)\(([A-Z]{2})\)/ language="\2" attribute="\1">/g;p;}' | sed -e 's/> \?/>\n/g' | sed 's/[ -]*">/">/'
>
>
> OK?

Perfect! With a few minor modifications I've got it doing exactly what
it should. Thanks again for all your help!
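
For anyone finding this in the archives, here is a sketch that runs the three sed stages from the script above on just one of the test cases. The function name convert is only for this sketch, and it assumes GNU sed, since -r, \? and \n in the replacement are GNU extensions.

```shell
# The three sed stages from the script above, wrapped in a function.
convert() {
    # Stage 1: on a line containing ">", join the next line to it, then move
    #          the two-letter code and any text before it into attributes.
    # Stage 2: put whatever follows the tag on a line of its own.
    # Stage 3: trim trailing spaces and dashes inside the attribute value.
    sed -r -n '/>/{N; s/\n//; s/>(.*)\(([A-Z]{2})\)/ language="\2" attribute="\1">/g;p;}' |
        sed -e 's/> \?/>\n/g' |
        sed 's/[ -]*">/">/'
}

# One of the awkward cases: the language code sits on the line after the tag.
result=$(printf '%s\n' '<tag>' ' - (FR) text' | convert)
printf '%s\n' "$result"
# prints:
# <tag language="FR" attribute="">
# text
```

The N command is what makes the multi-line cases work: it pulls the following line into the pattern space so the substitution can see the tag and the language code together.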

Marcus.
-- 
Marcus Furlong

