[ILUG] Sed question

Brian Foster blf at utvinternet.ie
Fri Feb 22 17:01:35 GMT 2002


  | Date: Fri, 22 Feb 2002 14:10:05 +0000
  | From: Padraig Brady <padraig at antefacto.com>
  | 
  | Stephen_Reilly at dell.com wrote:
  | > <"
  | > sed -e 's/<[^>]*img/¬<img/g' foo.jsp | #put each <img>...
  | > tr "¬" "\n" |                          #on a new line.
  | > sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp' |
  | > sort -u
  | > ">
  | > 
  | > hmmm, guess I better stop calling files "¬" ...
  | 
  | true. I wouldn't have to do it if sed recognised c escapes
  | like I mentioned previously.

if you are using a Bourne-ish shell (e.g., sh, ksh, bash, ...)
then to insert a newline (before each `foo' in the following
example), put a backslash followed by a literal newline in the
replacement, inside single quotes (sans the indentation):

   sed -e 's/foo/\
   foo/g'

other shells with obnoxious quoting rules are exercises best
left to the reader ....
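
a small way to try the backslash-newline form above without
touching any real file.  the function name nl_before_foo and the
sample input are invented here for illustration only; the sed
command itself is the one shown above, with the continuation line
kept at column 0 so that no indentation leaks into the output.

```shell
#!/bin/sh
# nl_before_foo is a made-up name, used only for this illustration.
# it starts a new line before every `foo' in its input (stdin or files).
nl_before_foo() {
	# the replacement is: backslash, literal newline, then foo.
	# the second line must start at column 0, otherwise its leading
	# whitespace would become part of the replacement text.
	sed -e 's/foo/\
foo/g' "$@"
}

# sample input, also made up; prints three lines: x, fooy, fooz
printf '%s\n' 'xfooyfooz' | nl_before_foo
```

the single quotes matter: inside double quotes a Bourne-ish shell
would swallow the backslash-newline pair before sed ever saw it.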

b.t.w., sed(1) does recognize \n for newline in REs (that is,
in the pattern, as opposed to the replacement); without that,
the hold space would be awkward to use in some cases.
I haven't tried, but I suspect the above IMG problem _might_
be solvable in one sed command, even with multiple IMGs on
one line.  harder, however, might be a SRC on a separate
line from its IMG (which I _think_ is legal HTML).
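
a minimal sketch of how the one-command idea might look.  it is
not taken from the thread, and it assumes each IMG tag and its SRC
sit on a single line, written in lower case as <img ... src="...">
(the SRC-on-another-line case is not handled).  the name img_srcs
and the sample input are invented for illustration.  the loop
relies on the greedy leading .* to pick the last remaining IMG on
the line, prints its SRC, removes that tag from the saved copy,
and goes round again.

```shell
#!/bin/sh
# img_srcs is a made-up name, used only for this illustration.
# it prints the unique src="..." values of <img> tags found in
# its input (stdin or files).
img_srcs() {
	sed -n '
	:next
	/<img[^>]*src="[^"]*"/{
		# save the line in the hold space
		h
		# greedy .* selects the LAST <img ... src="..."> on the line
		s/.*<img[^>]*src="\([^"]*\)".*/\1/p
		# restore the line and cut out the tag just reported
		g
		s/\(.*\)<img[^>]*src="[^"]*"/\1/
		b next
	}' "$@" | sort -u
}

# made-up sample input with two IMGs on one line and a repeat
printf '%s\n' \
	'<p><img src="a.png" alt="x"> and <img class="c" src="b.png"></p>' \
	'no images here' \
	'<img src="a.png">' | img_srcs
```

the tags on a line come out last-to-first, which does not matter
here because sort -u reorders them anyway.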

for your amusement, here's a little bash(1)/sed script that
I threw together a few days ago to solve a stupid little
format conversion problem.  much to my embarrassment, it
took me three tries to get it right ....  ;-(

=====(cut here and below)=====:fixup=====(cut here and below)=====
#!/bin/bash
#
# Copyright © 2002 Brian L Foster.  All rights reserved.
# $Id: :fixup,v 1.1 2002/02/19 17:32:52 blf Exp $
#
case $# in
2)	tex=$1
	raw=$2
	;;
1)	tex=$1
	raw=/dev/stdin
	;;
*)	echo "Usage: $0 source.tex [ dvi2tty.raw ]" >&2
	exit 2
	;;
esac

	# The inner sed(1) script transforms LaTeX  \textsc{word}
	# into the sed command                      s/\<word\>/WORD/g
	# which the outer sed executes.  The inner sed script hence
	# reads the original LaTeX input source, whilst the outer
	# script reads the dvi2tty(1) conversion of that source,
	# writing to stdout a modified version of the conversion.
	#
sed -e 's/IRL£/IEP/g'	\
    -e 's/unix/Unix/g'	\
    -e "$(
	cat -- "$tex" | tr ' \t' '\n\n' | \
		sed -n -e '/\\textsc{\([A-Za-z0-9]\{1,\}\)}/{
			s/^.*\\textsc{\([A-Za-z0-9]\{1,\}\)}.*$/\1/
			h
			y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
			x
			s,^.*$,s/\\<&\\>/,
			G
			s,\n\(.*\)$,\1/g,
			p
		}' | sort | uniq
 	)" -- "$raw"
=====(cut here and above)=====:fixup=====(cut here and above)=====
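
to see what the inner sed in the script above generates, it can be
run on its own.  the sketch below copies the inner pipeline into a
function; the name textsc_to_sed and the sample sentences are
invented for illustration.  the last step assumes a sed that
understands \< and \> as word boundaries (a GNU extension), which
is also what the script itself relies on.

```shell
#!/bin/sh
# textsc_to_sed is a made-up name, used only for this illustration.
# it reads LaTeX on stdin and writes one sed command per distinct
# \textsc{word}, of the form  s/\<word\>/WORD/g
textsc_to_sed() {
	tr ' \t' '\n\n' |
	sed -n -e '/\\textsc{\([A-Za-z0-9]\{1,\}\)}/{
		# keep only the word inside \textsc{...}
		s/^.*\\textsc{\([A-Za-z0-9]\{1,\}\)}.*$/\1/
		# hold space := word; pattern space := WORD
		h
		y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
		# swap: pattern space := word; hold space := WORD
		x
		# pattern space := s/\<word\>/
		s,^.*$,s/\\<&\\>/,
		# append newline + WORD, then fold it into  s/\<word\>/WORD/g
		G
		s,\n\(.*\)$,\1/g,
		p
	}' | sort | uniq
}

# made-up LaTeX input; two distinct commands are generated
cmds=$(printf '%s\n' 'The \textsc{nasa} and \textsc{Posix} folk met \textsc{nasa}.' |
	textsc_to_sed)
printf '%s\n' "$cmds"

# feed the generated commands to an outer sed (GNU sed assumed)
printf '%s\n' 'nasa uses Posix' | sed -e "$cmds"
```

with GNU sed the final line should come out as NASA uses POSIX;
the order of the generated commands depends on the sort locale.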

b.t.w., there's at least one spurious backslash in the above.

cheers!
	-blf-
--
 Innovative, very experienced, Unix and      | Brian Foster    Dublin, Ireland
 Chorus (embedded RTOS) kernel internals     | e-mail: blf at utvinternet.ie
 expert looking for a new position ...       | mobile: (+353 or 0)86 854 9268
  For a resume, contact me, or see my website  http://www.blf.utvinternet.ie



