[ILUG] sed question
Brian Foster
blf at utvinternet.ie
Fri Aug 15 07:35:02 IST 2008
| From: Andrew McGill <glug at lunch.za.net>
| Date: Thu, 14 Aug 2008 15:06:40 +0200
|
| On Thursday 14 August 2008 13:52:51 Marcus Furlong wrote:
| >[ ... ] I have a large number of files that are tagged as follows:
| >
| > <first id="34">
| > blah blah
| > <second id="56" name="xyz1">hello hello</second>
| > <second name="xyz4">hello hello</second>
| > <second id="16" name="xyz5">hello hello</second>
| > <first id="3">
| > blah blah blah
| > <second>hello hello</second>
| > <second id="12" name="xyz5">hello hello</second>
| >
| > The "first" tags have no closing tags at all, and may or may not have
| > text between the tag and the next tag. What I want to do is remove the
| > "first" tag and any text up to, but not including the "second" tag.
|
| Sed doesn't do multi-line search and replace [ ... ]
nonsense. you just need to append the lines
into the pattern space first. below is a (maybe
more obscure/complex than necessary?) ‘sed’ script
which seems to do what the OP wanted:
sed -e '/<first/{
:f
s/<first.*\(<second\)/\1/
t
N
bf
}'
or as one line:
sed -e '/<first/{ :f; s/<first.*\(<second\)/\1/; t; N; bf; }'
that works even if “<first” is on the first line.
neither “<first” nor “<second” has to be at the
start of a line.
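
for illustration only, here is a sketch of the same multi-line script with a comment on each step, run against the sample from the original post. the function name strip_first and the here-document are made up for this example; the ‘sed’ commands themselves are unchanged.

```shell
# sketch: the multi-line script, wrapped in a (hypothetical) shell function
strip_first() {
    sed -e '/<first/{
# f is the top of the loop
:f
# once a <second is in the pattern space, delete from <first up to it
s/<first.*\(<second\)/\1/
# if that substitution happened, branch to the end of the script and print
t
# otherwise append the next input line to the pattern space and retry
N
bf
}'
}

# the here-document stands in for one of the tagged files
strip_first <<'EOF'
<first id="34">
blah blah
<second id="56" name="xyz1">hello hello</second>
<second name="xyz4">hello hello</second>
<second id="16" name="xyz5">hello hello</second>
<first id="3">
blah blah blah
<second>hello hello</second>
<second id="12" name="xyz5">hello hello</second>
EOF
```

on that input only the five “<second ...>” lines should come out; both “<first ...>” lines and the “blah” text after them are dropped.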
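it will (probably) go wrong if there is a “<first”
someplace _after_ a “<second” on a line; and may
also go wrong if there is not a “<second” in the
file after the (last) “<first”: the ‘N’ then runs out
of input, and what happens to the lines collected so
far differs between ‘sed’ implementations (GNU ‘sed’
prints them unchanged, POSIX says to quit without
printing them).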
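
a possible guard for the missing-“<second” case, as a sketch rather than a tested fix: only do the ‘N’ (and loop) when not on the last line, so that an unmatched tail falls out of the loop and is printed as it stands. the ‘$!{ ... }’ wrapper is my addition, not part of the script above, and the function name is made up.

```shell
# sketch: same loop, but never run N on the last input line
strip_first_keep_tail() {
    sed -e '/<first/{
:f
s/<first.*\(<second\)/\1/
t
# not on the last line: append the next line and retry
$!{
N
bf
}
# on the last line with no <second found: fall through and print as-is
}'
}

# a <first with no <second after it
printf '%s\n' '<second>kept</second>' '<first id="9">' 'trailing text' |
    strip_first_keep_tail
```

here the trailing “<first” and its text are passed through untouched; whether that or dropping them is the right thing depends on what the OP wants.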
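AFAIK, the multi-line form should work with any
sed(1); i.e., it's not limited to GNU ‘sed’. the
one-line form is less portable: some non-GNU seds
do not accept ‘;’ directly after a label (‘:f’) or
after ‘t’/‘b’.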
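
if a single command line is wanted without leaning on ‘;’ as a separator, one option is to give each command its own ‘-e’, since ‘sed’ joins the ‘-e’ fragments with newlines. a sketch (the function name is made up; the commands are the same as above):

```shell
# sketch: one -e per command, so no ';' is needed after :f, t or bf
strip_first_portable() {
    sed -e '/<first/{' \
        -e ':f' \
        -e 's/<first.*\(<second\)/\1/' \
        -e 't' \
        -e 'N' \
        -e 'bf' \
        -e '}'
}

printf '%s\n' '<first id="3">' 'blah blah blah' '<second>hello hello</second>' |
    strip_first_portable
```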
cheers!
-blf-
--
“How many surrealists does it take to | Brian Foster
change a lightbulb? Three. One calms | somewhere in south of France
the warthog, and two fill the bathtub | Stop E$$o (ExxonMobil)!
with brightly-coloured machine tools.” | http://www.stopesso.com