[ILUG] Sed question

Feargal Reilly feargal at helgrim.com
Fri Feb 22 14:32:24 GMT 2002


At 12:11 22/02/02, Padraig Brady wrote:
>Padraig Brady wrote:
>>Padraig Brady wrote:
>>
>>>Rory Winston wrote:
>>>
>>>>Hi,
>>>>
>>>>I'm trying to use sed to do the following: search through a .jsp file for
>>>>any <img> references, and then generate a bare list of the image filenames.
>>>>So a .jsp page with 3 images inline would generate an output of:
>>>>
>>>>a.gif
>>>>b.gif
>>>>c.gif
>>>>
>>>>I'm trying to do it like the following (for this example, I'm ignoring any
>>>>complications due to case and/or whitespace):
>>>>
>>>>sed -n "/img src=\"/,/\">/p" foo.jsp
>>>>
>>>>But this doesn't just print out image filenames - it prints out entire
>>>>lines.
>>>>Has anyone done anything like this already? If anyone has any grep-based
>>>>solutions that would be great too. Correct me if I'm wrong, but is sed (and
>>>>Perl) able to handle certain types of multi-line matching that grep cannot?
>>>>
>>>>Cheers!
>>>>Rory
>>>
>>>How about:
>>>sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp' foo.jsp
>>>Padraig.
>>>
>>The script above doesn't deal correctly with multiple images
>>on the same line, the following is better:
>>cat foo.jsp |
>>sed -e 's/<[^>]*img/¬<img/g' |
>>tr "¬" "\n" |
>>sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp'
>Stephen's suggestion of not printing duplicate images is
>obviously correct, so for completeness, and removing
>the useless use of cat:
>
>sed -e 's/<[^>]*img/¬<img/g' foo.jsp | #put each <img>...
>tr "¬" "\n" |                          #on a new line.
>sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp' |
>sort -u

The only problem with this is that it'll miss tags that span lines, and
uppercase IMG tags. I happened to be doing a similar thing last week; here's
the Tcl script I did up for it:

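#!/path/to/tclsh
# Usage: foo filename tag
# Print every tag in 'filename' whose name is 'tag' (compared without
# regard to case), one per line, minus the enclosing '<' and '>'.
set f [open [lindex $argv 0] r]
set file [read $f]
close $f
# Each chunk after a '<' begins with a tag, which ends at the first '>'.
set list [split $file <]
foreach i $list {
         if {[string length $i]} {
                 # Fold newlines into spaces so the tag sits on one line.
                 regsub -all "\n" [lindex [split $i >] 0] " " tag
                 # The first word of the tag is its name.
                 if {![string compare -nocase [lindex $tag 0] [lindex $argv 1]]} {
                         puts $tag
                 }
         }
}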

Saved as foo, usage is:
./foo filename tag
It'll spit out every HTML tag in 'filename' whose name is 'tag' (in any
case), one per line, with line breaks inside a tag folded into spaces.
So:
./foo foo.jsp img | sed -n 's/.*src="\([^"]*\)".*/\1/p'

will do the trick for tags that span lines and for IMG tags as well.
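
If tclsh isn't to hand, roughly the same idea (fold the file onto one line,
split on '<', keep the img tags whatever their case) can be sketched as a
plain shell pipeline. This is only a sketch: the function name list_img_srcs
is made up for illustration, and it assumes double-quoted src values and no
'<' or '>' inside attribute values.

```shell
#!/bin/sh
# Hypothetical helper: print the src of every <img> tag in file $1,
# one per line, duplicates removed.
# Assumes src="..." uses double quotes and that attribute values
# contain no '<' or '>'.
list_img_srcs() {
    tr '\n' ' ' < "$1" |            # fold newlines so tags cannot span lines
    tr '<' '\n' |                   # start a new line at every tag
    sed 's/>.*//' |                 # drop everything after the tag's '>'
    grep -i '^img[[:space:]]' |     # keep img tags, in any case
    sed -n 's/.*[Ss][Rr][Cc]="\([^"]*\)".*/\1/p' |
    sort -u
}
```

Usage would be something like: list_img_srcs foo.jsp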


>Padraig.

Feargal Reilly.
http://www.helgrim.com/




