If I want to parse an HTML file (which might be badly formed) and extract a unique list of the tags used, plus the aggregated set of attributes used with each tag, what tool would you recommend? Sed or gawk might do it, but maybe there is something better out there.

Thanks,
Justin MacCarthy
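
To make the task concrete, here is a minimal sketch of one possible approach, using Python's standard-library `html.parser` rather than sed or gawk. That module is designed to tolerate malformed markup, though exactly how it recovers from any given broken input is not guaranteed. The names `TagCollector` and `collect_tags` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record each tag name seen and the set of attribute names used on it."""

    def __init__(self):
        super().__init__()
        # Maps tag name -> set of attribute names seen on any occurrence.
        self.attrs_by_tag = defaultdict(set)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases names.
        # Updating with an empty sequence still registers the tag itself.
        self.attrs_by_tag[tag].update(name for name, _ in attrs)


def collect_tags(html_text):
    """Return {tag: sorted list of attribute names} for the given HTML text."""
    parser = TagCollector()
    parser.feed(html_text)
    parser.close()
    return {tag: sorted(names) for tag, names in sorted(parser.attrs_by_tag.items())}


if __name__ == "__main__":
    # Deliberately sloppy input: unclosed <p> tags and unquoted attribute values.
    sample = '<p class="a" id=x>one<p class=b>two<br/><a href="#">link'
    for tag, names in collect_tags(sample).items():
        print(tag, names)
```

Self-closing tags such as `<br/>` are counted too, because the default `handle_startendtag` forwards to `handle_starttag`. To process a file, its contents could be read into a string and passed to `collect_tags`.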