[ILUG] Editing unicode text files.
Brian Foster
blf at blf.utvinternet.ie
Sat Feb 17 10:42:04 GMT 2007
| Date: Fri, 16 Feb 2007 16:31:43 +0000
| From: "Aine Douglas" <aine.douglas at gmail.com>
|
| Can anyone recommend a commandline text editor that is capable of
| editing unicode text files?
|
| I've got some webpages to edit which contain chinese script, and when
| I open them in vi i get long strings of @@@@@^^^???@@ etc, and its a
| pain downloading them for really small edits.
I don't quite grok what it is you want to do.
First, “Unicode” is ambiguous to the point of meaninglessness;
what matters is the encoding, not what is encoded.
( Briefly: Every character is in the UCS (Universal
Character Set, ISO-10646, also called “Unicode”†).
A character's binary representation is an encoding.
US-ASCII, e.g., is the first 128 characters of the UCS;
ISO-8859-1 is the first 256; ISO-8859-15 is a slightly
different set of 256; UTF-8 is all two billion; and
there are many other encodings. )
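To make the character-versus-encoding distinction concrete, here is a minimal Python sketch (not part of the original question; the helper name `encode_hex` is made up for illustration). It takes four characters and shows the bytes each one becomes under a few encodings, or `None` where the encoding has no representation for that character.

```python
def encode_hex(ch, encoding):
    """Return the bytes of `ch` in `encoding` as a hex string,
    or None if that encoding cannot represent the character."""
    try:
        return ch.encode(encoding).hex()
    except UnicodeEncodeError:
        return None

# Escapes are used so the example does not depend on the terminal's encoding:
# U+0041 LATIN CAPITAL LETTER A, U+00E9 e-acute, U+20AC EURO SIGN,
# U+4E2D a common Chinese character.
characters = ("\u0041", "\u00e9", "\u20ac", "\u4e2d")
encodings = ("ascii", "iso-8859-1", "iso-8859-15", "utf-8")

for ch in characters:
    for enc in encodings:
        print(f"U+{ord(ch):04X}  {enc:<12} {encode_hex(ch, enc)}")
```

The same code point yields different bytes (or none at all) depending on the encoding, which is why an editor that guesses the wrong encoding shows rubbish.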
Second, how will the editor be used without downloading
the files in question?
And third, by “command line” do you mean something like
sed(1), or just an editor you can launch from the shell
(like the vi(1) mentioned)?
Editors that can handle the full UCS/Unicode in a variety
of encodings include vim(1), mined, and yudit. Some other
editors, such as joe(1), handle UTF-8 but not necessarily
an arbitrary encoding.
I've only used `vim' in anger (in several senses! ;-) ):
`vim', at least, will autodetect the file's encoding and
map it to yer locale's, and hence you can use `vim' to
edit a SJIS file on a UTF-8 system. The file is saved
in its original encoding. Almost needless to say, this
mapping works best if the system/locale uses UTF-8 (on
Linux), since UTF-8 round-trips the full UCS. Result is,
provided you are displaying UTF-8 correctly (mostly a
matter of fonts), `vim' works quite well (albeit keying
in non-keyboard characters can be a pain: I tend to use
gucharmap(1) and copy-and-paste).
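The decode, edit, re-encode cycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how `vim' implements it: the function name `edit_in_place` is hypothetical, and the file's encoding is passed in by hand rather than autodetected.

```python
def edit_in_place(raw, file_encoding, edit):
    """Decode `raw` from the file's encoding, apply `edit` to the text,
    and re-encode the result in the same (original) encoding."""
    text = raw.decode(file_encoding)
    # What a UTF-8 terminal would be sent; UTF-8 can carry any decoded text.
    on_screen = text.encode("utf-8")
    assert on_screen.decode("utf-8") == text  # lossless round trip
    return edit(text).encode(file_encoding)

# Three Japanese characters (U+65E5 U+672C U+8A9E) stored as Shift_JIS bytes,
# standing in for the contents of a SJIS file.
original = "\u65e5\u672c\u8a9e".encode("shift_jis")

# A "really small edit": append an exclamation mark.
edited = edit_in_place(original, "shift_jis", lambda t: t + "!")
print(original.hex(), "->", edited.hex())
```

The file never changes encoding on disk; only the in-memory and on-screen forms go through UTF-8.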
cheers!
-blf-
† Pedantically, “Unicode” means three different things,
and is not a synonym for the UCS.
--
Experienced (>25 yrs) kernel/software Eng: | Brian Foster Montpellier,
• Unix, embedded, &tc; • Linux; • doc; | blf at utvinternet.ie FRANCE
• IDL, automated testing, process, &tc. | Stop E$$o (ExxonMobile)!
Résumé (CV) http://www.blf.utvinternet.ie | http://www.stopesso.com