[ILUG] sed question

Brian Foster blf at utvinternet.ie
Fri Aug 15 07:35:02 IST 2008


  | From: Andrew McGill <glug at lunch.za.net>
  | Date: Thu, 14 Aug 2008 15:06:40 +0200
  | 
  | On Thursday 14 August 2008 13:52:51 Marcus Furlong wrote:
  | >[ ... ]  I have a large number of files that are tagged as follows:
  | >
  | > <first id="34">
  | > blah blah
  | > <second id="56" name="xyz1">hello hello</second>
  | > <second name="xyz4">hello hello</second>
  | > <second id="16" name="xyz5">hello hello</second>
  | > <first id="3">
  | > blah blah blah
  | > <second>hello hello</second>
  | > <second id="12" name="xyz5">hello hello</second>
  | >
  | > The "first" tags have no closing tags at all, and may or may not have
  | > text between the tag and the next tag. What I want to do is remove the
  | > "first" tag and any text up to, but not including the "second" tag.
  | 
  | Sed doesn't do multi-line search and replace [ ... ]

 nonsense.  you just need to append the lines into
 the pattern space first.  below is a (maybe more
 obscure/complex than necessary?) ‘sed’ script
 which seems to do what the OP wanted:

	sed -e '/<first/{
		:f
		s/<first.*\(<second\)/\1/
		t
		N
		bf
	}'

 or as one line:

	sed -e '/<first/{ :f; s/<first.*\(<second\)/\1/; t; N; bf; }'

 that works even if “<first” is on the first line.
 neither “<first” nor “<second” has to be at the
 start of a line.
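
 as a quick check, here is a minimal sketch that feeds
 the sample from the original question through the
 multi-line script.  the ‘strip_first’ wrapper name and
 the here-document are only for illustration (they are
 not part of the original post), and the comments are
 one reading of what each command does:

```shell
# strip_first: hypothetical wrapper around the multi-line script
strip_first() {
	sed -e '/<first/{
		# top of the loop
		:f
		# drop everything from <first up to (not including) <second
		s/<first.*\(<second\)/\1/
		# substitution made: jump to end of script, print the result
		t
		# otherwise append the next input line and try again
		N
		bf
	}' "$@"
}

# sample input copied from the original question
out=$(strip_first <<'EOF'
<first id="34">
blah blah
<second id="56" name="xyz1">hello hello</second>
<second name="xyz4">hello hello</second>
<second id="16" name="xyz5">hello hello</second>
<first id="3">
blah blah blah
<second>hello hello</second>
<second id="12" name="xyz5">hello hello</second>
EOF
)
printf '%s\n' "$out"
```

 with that input only the five “<second” lines should
 remain, in their original order.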

 it will (probably) go wrong if there is a “<first”
 someplace _after_ a “<second” on a line;  and may
 also go wrong if there is not a “<second” in the
 file after the (last) “<first”.
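
 to illustrate the first of those caveats, a small
 sketch with made-up input (the ‘strip_first’ wrapper
 name and the data are assumptions for illustration
 only).  once the substitution succeeds the ‘t’ ends the
 cycle, so a second “<first” left on the same line as
 the “<second” is never looked at again:

```shell
# strip_first: hypothetical wrapper around the multi-line script
strip_first() {
	sed -e '/<first/{
		:f
		s/<first.*\(<second\)/\1/
		t
		N
		bf
	}' "$@"
}

# made-up input: a second <first follows a <second on line 2
bad_out=$(strip_first <<'EOF'
<first id="1">
<second>a</second> <first id="2">
blah
<second>b</second>
EOF
)
printf '%s\n' "$bad_out"
```

 the first “<first” is removed as intended, but the
 output still contains ‘<first id="2">’ and the “blah”
 line that follows it.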

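 AFAIK, the multi-line form should work with any
 sed(1); i.e., it's not limited to GNU ‘sed’.  the
 one-line form is less portable:  POSIX only lets a
 label (after ‘:’, ‘t’ or ‘b’) end at a newline, so
 some non-GNU seds may read the “;” as part of the
 label.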

cheers!
	-blf-
-- 
“How many surrealists does it take to    |  Brian Foster
 change a lightbulb?  Three.  One calms  |  somewhere in south of France
 the warthog, and two fill the bathtub   |     Stop E$$o (ExxonMobil)!
 with brightly-coloured machine tools.”  |       http://www.stopesso.com
