[ILUG] PHP plus Celtic languages

kevin kevin at cybercolloids.net
Thu Aug 19 09:07:17 IST 2004


Yes, I specify

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Language" content="kw" />

Much to my surprise, W3C has a content language for Cornish: kw = Kernewek
(the ISO 639-1 code). It seems to work OK in Mozilla and Konqueror.
To continue the pedantic note: what is the correct code to use for
"small t with cedilla", if not &#355;?

Kevin.



On Wednesday 18 August 2004 22:38, Brian Foster wrote:
  | From: kevin <kevin at cybercolloids.net>
  | Date: Wed, 18 Aug 2004 11:21:50 +0100
  |[ ... ]
  | Cornish uses some accents including t-cedilla in words such as
  |
  |  conveţhaz - Verb, to understand
  |
  | I can write this using codes in UTF-8 like conve&#355;haz  [ ... ]

 uh, not exactly.  “&#355;” does not (cannot)
 represent literal UTF-8 per se.  (it _is_ the
 UCS codepoint value for U+0163, which is
 “LATIN SMALL LETTER T WITH CEDILLA”, which
 apparently is the character you want.)

 I cannot recall if the “&#<dec>;” and “&#x<hex>;”
 HTML/XML numeric character references specify UCS
 codepoints (i.e., independent of the document's
 charset/encoding), or character values specific to
 the document's encoding.
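
For what it's worth, numeric character references of both forms name UCS code points, whatever encoding the document itself is stored in. A minimal Python sketch, using only the standard library `html` module (the variable names are illustrative), resolves both spellings to the same character:

```python
import html

# Both the decimal and the hexadecimal reference name code point U+0163.
decimal_ref = html.unescape("&#355;")
hex_ref = html.unescape("&#x163;")

# Compare the two results with each other and with the literal code point.
print(decimal_ref == hex_ref == "\u0163")
print(hex(ord(decimal_ref)))
```

Nothing in the sketch mentions a charset: the reference is resolved to a code point first, and the choice of bytes only comes up when the text is encoded.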

 I presume yer document effectively specifies
 its encoding is UTF-8, in which case my bad
 memory matters less than usual:  the 163 hex
 (355 decimal) UCS value is turned into the
 correct UTF-8 byte sequence (which is the
 two hex bytes C5 A3).
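
The byte sequence quoted above can be checked with a short Python sketch (standard library only; the variable names are illustrative, not anything from the original messages):

```python
import unicodedata

# U+0163 is 355 in decimal.
t_cedilla = chr(0x163)

# Encode the single character as UTF-8 and show the bytes in hex.
utf8_bytes = t_cedilla.encode("utf-8")
print(unicodedata.name(t_cedilla))
print(" ".join(f"{b:02X}" for b in utf8_bytes))

# The same applies to a whole word containing the character.
word = "conve" + t_cedilla + "haz"
print(word.encode("utf-8"))
```

Code points from U+0080 to U+07FF take two bytes in UTF-8, which is why this character needs two bytes while the plain ASCII letters around it need one each.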

pedantically cheers!
	-blf-
--
«How many surrealists does it take to    |  Brian Foster      Montpellier,
 change a lightbulb?  Three.  One calms  |  blf at utvinternet.ie      FRANCE
 the warthog, and two fill the bathtub   |    Stop E$$o (ExxonMobile)!
 with brightly-colored machine tools.»   |        http://www.stopesso.com



