[ILUG] Re: Re: sed question
Marcus Furlong
furlongm at hotmail.com
Fri Aug 15 23:22:32 IST 2008
On Friday 15 August 2008 22:18 in <g84rpc$dbf$1 at ger.gmane.org>, Marcus
Furlong wrote:
> On Friday 15 August 2008 05:27 in
> <ddb467af0808142127x7bb9aff9re9a362ad5e264a80 at mail.gmail.com>, Emen Zhao
> wrote:
>
>> Hello Marcus,
>>
>> Try if this helps. It assumes all elements are missing ending tag, and
>> doesn't support embedded tags. If that's the case, a more sophisticated
>> script might be needed.
>>
>> perl -0777 -wpl -e 's{(<(\w+).*?>.*?)(?=\s*(<\w|\z))}{$1. " </$2>"}esg'
>>
>> Hope this helps.
>
> It does, it _almost_ does what I need. It doesn't seem to handle the case
> where the tag content starts on a new line though:
>
> <third>
> hello
> <third>hello
>
> becomes
>
> <third> </third>
> hello
> <third>hello </third>
>
> I tried undef $/ (as per a different post) but that doesn't seem to help
> either. Any ideas how to fix this?

OK, this only happens if it's on the first line, so I added an extra line and
it works perfectly now, thanks!

One final question for the list on the same topic.
Some of the tags contain an attribute, say "my_attribute", which, according
to the DTD, should only contain certain values. If the value is not valid,
I want to remove the attribute entirely. E.g. if ASD, SDF, DFG, FGH, GHJ and
HJK are the valid values for this attribute, then the following:

<third my_attribute="ASD">
<third my_attribute="AD AD">
<third my_attribute="">
<third my_attribute="HJK">

would become:

<third my_attribute="ASD">
<third>
<third>
<third my_attribute="HJK">

I threw together the following snippet, which works, but it strikes me as a
horrible hack, and I'm sure there's a Perl/sed one-liner that could do it.
Does anyone know how it could be done somewhat more elegantly?
# Some values contain spaces, so convert them to underscores for the
# word-splitting "for" loop and convert them back again before removal.
# [^"]* keeps the match from running on into a later attribute on the line.
for k in `grep -o 'my_attribute="[^"]*"' "${xml_filename}" |
         sed -e 's/my_attribute=//' -e 's/ /_/g'` ; do
    valid=false
    # the following are the valid values that this attribute can have
    for l in ASD SDF DFG FGH GHJ HJK ; do
        # "=" is the portable string comparison for [ ]; "==" is a bashism
        if [ "${k}" = "\"${l}\"" ] ; then
            valid=true
        fi
    done
    if [ "${valid}" = "false" ] ; then
        k=`echo "${k}" | sed -e 's/_/ /g'`
        sed -i -e "s/ my_attribute=${k}//" "${xml_filename}"
    fi
done
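
For comparison, here is one possible shape for a single sed pass. It is only a sketch and has only been reasoned through against the four sample lines above. sed has no negative lookahead, so the idea is to rename the valid attributes to a temporary marker, delete every my_attribute that is left, and then rename the marker back. The function name strip_invalid_my_attribute and the marker __KEEP_MY_ATTRIBUTE__ are made up for illustration. It assumes the marker never occurs in the real file, that the attribute is preceded by a single space (as in the loop above), and that values are double-quoted with no embedded double quotes.

```shell
#!/bin/sh
# Hypothetical helper: reads XML-ish text on stdin and writes it to stdout,
# dropping my_attribute="..." unless its value is one of the valid ones.
# Pass 1 hides valid attributes behind a marker name (assumed to be unused
# in the input), pass 2 deletes every remaining my_attribute, and pass 3
# restores the hidden ones.
strip_invalid_my_attribute() {
    sed -E \
        -e 's/ my_attribute="(ASD|SDF|DFG|FGH|GHJ|HJK)"/ __KEEP_MY_ATTRIBUTE__="\1"/g' \
        -e 's/ my_attribute="[^"]*"//g' \
        -e 's/ __KEEP_MY_ATTRIBUTE__=/ my_attribute=/g'
}

# Example run on the four sample lines from this mail.
printf '%s\n' \
    '<third my_attribute="ASD">' \
    '<third my_attribute="AD AD">' \
    '<third my_attribute="">' \
    '<third my_attribute="HJK">' |
    strip_invalid_my_attribute
```

On those four lines this should print the "would become" output shown earlier. To apply it to a file, something like strip_invalid_my_attribute < "${xml_filename}" > "${xml_filename}.new" would avoid editing in place until the result has been checked.
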
Marcus.