This page was written because I kept forgetting issues involving (& mislaying good web pages about) Internationalization. -- Fred Curtis.
Internationalization - sometimes abbreviated 'i18n' (i followed by 18 letters followed by n) -
Localization - sometimes abbreviated 'l10n' (l followed by 10 letters followed by n) -
Unicode (see www.unicode.org) is a standard that attempts to encompass characters from all currently used languages, several historical languages and technical / mathematical symbols. Its character repertoire and code points match those of the Universal Character Set (UCS) of ISO/IEC 10646; the two standards are periodically synchronised.
See the Wikipedia entry for Unicode for discussion on the advantages and disadvantages of Unicode.
Most European languages are written horizontally left-to-right (LTR) with rows running top-to-bottom (TTB). Hebrew, Arabic & Farsi are written right-to-left (RTL) with rows read TTB. Traditional Japanese and Chinese are written vertically TTB with columns running RTL; both are now commonly written horizontally LTR/TTB. Older Mainland Chinese and Mongolian scripts are written vertically but the columns are read LTR. [More on writing directions; references].
Unicode caters for horizontal LTR and RTL scripts, and regards the vertical rendering of scripts as a formatting style. Sometimes the text directions are mixed, e.g. in an English (LTR) document quoting an Arabic (RTL) phrase or vice-versa, or in Arabic when digits in a number are written LTR. Unicode copes with these mixtures by listing characters in logical order (the order in which they should be read), using special control characters to indicate LTR / RTL rendering on a page -- see "Bidirectional Behaviour" in section 3 of the Unicode standard.
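As a rough illustration of "logical order plus direction information", the sketch below uses Python's standard `unicodedata` module to look up the bidirectional class that Unicode assigns to each character of a string stored in reading order. The helper name `bidi_classes` and the sample strings are made up for this example; it only inspects character properties and does not implement the bidirectional reordering algorithm itself.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character, in logical order."""
    return [unicodedata.bidirectional(ch) for ch in text]


# An English word, two Hebrew letters (alef, bet) and two European digits,
# stored in the order in which they are read.
mixed = "abc \u05d0\u05d1 12"
print(bidi_classes(mixed))

# An Arabic letter (beh) followed by an Arabic-Indic digit (one).
print(bidi_classes("\u0628\u0661"))

# Directional control characters: LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK,
# RIGHT-TO-LEFT EMBEDDING and POP DIRECTIONAL FORMATTING.
print(bidi_classes("\u200e\u200f\u202b\u202c"))
```

Classes such as `L`, `R` and `AL` mark strongly LTR or RTL characters, `EN` and `AN` mark European and Arabic digits, and `WS` marks whitespace. A renderer combines these classes with any control characters to decide the visual order on the page.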
Etemad[3] has an excellent discussion on vertical/horizontal text layout.
In Unicode, a character may be represented visually in a variety of context-dependent glyphs, e.g.:
Several Unicode characters may combine to yield a single glyph. The character LATIN CAPITAL LETTER A (U+0041) followed by the character COMBINING DIAERESIS (U+0308) is rendered "Ä". The Unicode encodings of Devanagari and Arabic are quite complex to render.
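The combining behaviour can be checked with Python's standard `unicodedata` module. This is a minimal sketch: the helper name `names` is invented for the example, and normalization is used here only to show that the two-character sequence and the single precomposed character are treated as equivalent. It says nothing about how a font actually draws the glyph. The last lines show the Arabic letter beh, whose four positional shapes exist as separate compatibility characters that all map back to one abstract letter.

```python
import unicodedata


def names(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


# Two code points: a base letter followed by a combining mark.
decomposed = "A\u0308"
print(names(decomposed))

# NFC normalization composes the pair into one precomposed code point.
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), names(composed))

# NFD normalization goes the other way.
print(unicodedata.normalize("NFD", composed) == decomposed)

# Isolated, final, initial and medial presentation forms of Arabic beh.
# NFKC folds each of them back to the single letter U+0628.
beh_forms = "\ufe8f\ufe90\ufe91\ufe92"
print({unicodedata.normalize("NFKC", ch) for ch in beh_forms} == {"\u0628"})
```

In ordinary Arabic text only U+0628 is stored; choosing the positional shape is left to the rendering engine.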
ANSI/POSIX locales
X/Open language independent messages
General:
HTML:
Win32:
Fonts:
Ancient Numidian scripts and traditional (Mangyan) Philippine scripts like Hanunoo were written vertically bottom-to-top (BTT) with columns running left-to-right (LTR). Several ancient languages flirted with alternating horizontal RTL / LTR rows (called boustrophedon, Greek for "turning like [plowing] oxen"), but seem invariably to have settled into either RTL or LTR. Mayan was written in paired vertical columns reading from left to right and top to bottom in a zigzag pattern. Ogham was carved border-like around the edges of rocks and doorways. Foucault[1] refers to "Mexicans" writing in spirals. Gaur[2] lists many writing directions, but provides no corresponding examples of scripts for each direction.
Books written in Asian RTL vertical scripts and in Hebrew / Arabic RTL horizontal scripts have their pages ordered in reverse to the common European style, so the front cover of such a book is where an English reader would expect to find the back cover. Writing direction also affects the order in which the panels of a cartoon book are read.
There's an excellent overview of writing systems at www.omniglot.com and world languages at www.ethnologue.com.
| [1] | There is a reference [which I haven't checked!] to "Mexicans" writing bottom-to-top and in spirals around pp. 34-7 of Michel Foucault's The Order of Things: An Archaeology of the Human Sciences (trans. Alan Sheridan, NY: Vintage, 1970). |
| [2] | Gaur, Albertine. History of Writing. British Library, 1984, pp. 52-54. |
| [3] | Elika J. Etemad, Discussion paper: Robust Vertical Text Layout. This document is now a Unicode Technical Note [PDF only at 2005-07-08; Here is an earlier HTML draft] |