[ILUG] Re: meta http-equiv useless??

Brian Foster blf at blf.utvinternet.ie
Sun Aug 21 14:09:58 IST 2005


  | Date: Sun, 21 Aug 2005 01:13:36 -0500
  | From: greg wm <ilug at nvpf.org>
  |[ ... ]
  | > wget -ENKkrl19 -nH -w2 -owget.log http://nonviolentpeaceforce.org
  | 
  | my locale is en_IE.UTF-8, so why did wget save in latin-1 format?

 whoa!  slow down here ....

 I suspect the answer is “because wget(1) does not alter
 the charset (used for the page's contents)”.  I suspect
 this for three reasons:

  ◆ no evidence — e.g., a diff(1) listing — has been
    posted that shows wget did so.

  ◆ the wget(1) man page fails to mention any such change.

  ◆ no such change was observed in an experiment (below).

 the page was served up as Latin1 (née ISO-8859-1),
 and that was what wget saved.  the only(?) changes
 wget made were to the URLs.

 my experiment (I use a UTF-8 locale):

 using Opera, I saved a copy of what the URL
   http://nonviolentpeaceforce.org/spanish/welcome.asp
 was served as.  its meta http-equiv was iso-8859-1, and
 it used a mixture of &...; entities and literal Latin1
 characters.  and the page (file) really was Latin1.
 everything was consistent, and as expected and reported.

 then I used the above wget options (sans -r) to fetch
 that same URL.  result?  the `.orig' file was _identical_,
 and the only apparent changes in the `.html' file were the
 URLs (not exhaustively checked).  more to the point, the
 literal Latin1 characters were _identical_.
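
 a minimal sketch (Python) of how such a check could be
 automated.  this is not what I ran.  the helper names are
 made up for illustration, and the sample strings stand in
 for the saved page.  it assumes the URLs that wget rewrites
 are plain ASCII, so comparing only the non-ASCII bytes of
 the `.orig' and `.html' files ignores the link changes.

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if data contains non-ASCII bytes and decodes cleanly as UTF-8."""
    if data.isascii():
        return False  # pure ASCII tells us nothing either way
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_ascii_bytes(data: bytes) -> bytes:
    """Keep only bytes >= 0x80, i.e. the literal non-ASCII characters."""
    return bytes(b for b in data if b >= 0x80)


# Stand-ins for the page as served (Latin1) and a hypothetical UTF-8 re-encoding.
text = "Bienvenido a la página en español"
latin1_page = text.encode("latin-1")
utf8_page = text.encode("utf-8")

# Stand-in for wget's link conversion: only an ASCII URL differs.
orig_file = b'<a href="/spanish/">' + latin1_page + b"</a>"
html_file = b'<a href="spanish/index.html">' + latin1_page + b"</a>"

print(looks_like_utf8(latin1_page))   # Latin1 bytes are not valid UTF-8 here
print(looks_like_utf8(utf8_page))
print(non_ascii_bytes(orig_file) == non_ascii_bytes(html_file))
```

 on real files, read each one with open(path, "rb").read()
 and pass the bytes in.  if the non-ASCII bytes match, the
 encoding of the literal characters was not touched.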

 the conclusion?  wget does not alter the charset used for
 the page's contents.  hence, lacking any diff(1) listing
 to the contrary, I'll claim this did _not_ happen in the
 original situation.  that is, any theory that wget changed
 the charset/encoding of the page's contents is incorrect.
 (I am open to correction, provided evidence is supplied.)

  | the wget manual page mentions nothing at all about character sets.

 broadly, yes: it does not.  why should it?  presuming
 my experiment above is equivalent to what was done,
 wget does not alter the charset of the page contents.


 the Apache/server “default” charset answer is interesting.
 an authoritative override is not what I call a default?!

 FWIW, the http-equiv _is_ used when / useful for viewing
 local HTML files (i.e., not served up by a server).
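
 to illustrate the local-file case, a rough sketch (Python)
 of reading the charset named in a meta http-equiv and using
 it to decode the file.  the function name and the regular
 expression are my own assumptions, not how any particular
 browser does it.  the pattern is simplistic: it expects
 http-equiv to appear before content inside the tag.

```python
import re
from typing import Optional

# Simplistic pattern: <meta ... http-equiv=content-type ... charset=NAME
META_CHARSET = re.compile(
    rb"""<meta[^>]+http-equiv\s*=\s*["']?content-type["']?[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_charset(html: bytes) -> Optional[str]:
    """Return the charset named in a meta http-equiv tag, or None if absent."""
    match = META_CHARSET.search(html[:4096])  # only look near the top of the file
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


# Stand-in for a locally saved Latin1 page (0xE1 is a-acute in Latin1).
sample = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=iso-8859-1"></head>'
    b"<body>p\xe1gina</body></html>"
)

charset = declared_charset(sample)
print(charset)
print(sample.decode(charset))
```

 with no server around to send a Content-Type header, this
 declaration is the only in-band hint a viewer has.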

cheers!
	-blf-
-- 
Experienced (20+ yrs) kernel/software Eng: | Brian Foster   Montpellier,
 • Unix, embedded, &tc;  • Linux;  • doc;  | blf at utvinternet.ie   FRANCE
 • IDL, automated testing, process, &tc.   |  Stop E$$o (ExxonMobile)!
Résumé (CV) http://www.blf.utvinternet.ie  |     http://www.stopesso.com


