[ILUG] Re: Re: sed question

Marcus Furlong furlongm at hotmail.com
Fri Aug 15 23:22:32 IST 2008


On Friday 15 August 2008 22:18 in <g84rpc$dbf$1 at ger.gmane.org>, Marcus
Furlong wrote:

> On Friday 15 August 2008 05:27 in
> <ddb467af0808142127x7bb9aff9re9a362ad5e264a80 at mail.gmail.com>, Emen Zhao
> wrote:
> 
>> Hello Marcus,
>> 
>> See if this helps. It assumes every element is missing its closing tag, and
>> it doesn't support nested tags. If that's not the case, a more sophisticated
>> script might be needed.
>> 
>> perl -0777 -wpl -e 's{(<(\w+).*?>.*?)(?=\s*(<\w|\z))}{$1. " </$2>"}esg'
>> 
>> Hope this helps.
> 
> It does; at least, it _almost_ does what I need. It doesn't seem to handle
> the case where the tag content starts on a new line, though:
> 
> <third>
> hello
> <third>hello
> 
> becomes
> 
> <third> </third>
> hello
> <third>hello </third>
> 
> I tried undef $/ (as per a different post) but that doesn't seem to help
> either. Any ideas how to fix this?

OK, this only happens when the tag is on the first line, so I added an extra
line and it works perfectly now, thanks!

One final question for the list on the same topic.

Some of the tags contain an attribute, say "my_attribute", which, according
to the DTD, may only take certain values. If the value is not valid, I want
to remove the attribute entirely. E.g. if ASD, SDF, DFG, FGH, GHJ and HJK
are the valid values for this attribute, then the following:

<third my_attribute="ASD">
<third my_attribute="AD AD">
<third my_attribute="">
<third my_attribute="HJK">

would become:

<third my_attribute="ASD">
<third>
<third>
<third my_attribute="HJK">

I threw together the following snippet, which works, but it strikes me as a
horrible hack, and I'm sure there's a Perl/sed one-liner that could do it.
Does anyone know how it could be done somewhat more elegantly?

# Some values contain spaces, so convert them to underscores for the
# loop (which splits on whitespace) and back again before removal.
# Caveat: a value that already contains an underscore, or a "/" or
# other regex metacharacters, will not be handled correctly.
for k in $(grep -o 'my_attribute="[^"]*"' "${xml_filename}" |
           sed -e 's/my_attribute=//' -e 's/ /_/g') ; do
  valid=false
  # the following are the valid values that this attribute can have
  for l in ASD SDF DFG FGH GHJ HJK ; do
    if [ "${k}" = "\"${l}\"" ] ; then
      valid=true
    fi
  done
  if [ "${valid}" = "false" ] ; then
    k=$(echo "${k}" | sed -e 's/_/ /g')
    sed -i -e "s/ my_attribute=${k}//" "${xml_filename}"
  fi
done

Marcus.



