Characters actually used in a language (Ô in English), language codes (id/in for Indonesian), names (Faeroese/Faroese), differentiation (Sámi variants), the glyphs (what should dcaron look like) and the list of languages covered are all disputable. Try to look at the bright side. But if you think that some important aspect is missing or wrong, please don't hesitate to mail your comments to Indrek Hein, firstname.lastname@example.org.
There are many romanisation (transliteration and transcription) systems in use for both roman and non-roman scripts. This database lists only the systems that are widely used in writing geographical names, hence the abbreviation BGN/PCGN -- United States Board on Geographic Names / UK Permanent Committee on Geographical Names. Many other transliteration schemes are approved and used by ISO, major libraries, national bodies etc.
Language codes are used in order of preference:
The following languages (some represented by romanisation systems)
do not require any additional characters
beyond basic Latin:
Armenian, Aymara, Belarusian, Creole,
English, Fijian, Georgian, Greenlandic, Ikiribati,
Kinyarwanda, Kirundi, Kosraean,
Latin, Malay, Maldivian, Nauruan, Ndebele, Neomelanesian (Tok Pisin), Nukuoro,
Palauan, Papiamento, Pedi, Ponapean, Quechua, Sesotho, siSwati, Somali,
Soninke, Swahili, Thai, Toucouleur, Trukese, Tsonga,
Tuvaluan, Ukrainian, Woleaian, Xhosa, Zulu. The list is incomplete,
and some of the aforementioned languages are included in
the database because they nevertheless have a number of 'important'
characters or other possible transcription systems.
The only thing that we can be reasonably sure about
is that for Latin, the basic Latin alphabet should suffice...
so please try to forget about the use of the macron over long
vowels, or we have no sure things left.
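As a rough illustration of what "no additional characters" means in practice, the following Python sketch lists the letters in a text that fall outside the 26 basic Latin letters (upper and lower case). The function name extra_letters is made up for this example and is not part of the database; treating "basic Latin" as A-Z/a-z only is also an assumption.

```python
import string

# Assumption: "basic Latin" means the 52 unaccented letters A-Z and a-z.
BASIC_LATIN_LETTERS = set(string.ascii_letters)


def extra_letters(text):
    """Return the sorted list of distinct letters in text outside A-Z/a-z.

    Digits, spaces and punctuation are ignored; only alphabetic
    characters are considered.
    """
    return sorted(
        {ch for ch in text if ch.isalpha() and ch not in BASIC_LATIN_LETTERS}
    )


if __name__ == "__main__":
    for sample in ("Kinyarwanda", "Sámi", "Faroese: Føroyar"):
        print(sample, "->", extra_letters(sample))
```

An empty result only says that the sample text stays within basic Latin; it proves nothing about the language as a whole, which is exactly why the list above is disputable.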
Some characters for Latin and Cyrillic are not in the UCS. These characters are of two types -- they are either based on a modified shape of an existing character or a combination of a character and additional diacritical marks. All modified (and new) shapes will eventually be allocated in the UCS; characters that may be decomposed, especially those needed only in some rare transcription schemes, may not get a separate code. The general principle in this database is that every character occurring in some language's alphabet needs a separate code; combining characters are needed only for transcription schemes. Additional characters are given codes from the private use area of the UCS, starting at E000.
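The distinction between the two types can be sketched with Python's standard unicodedata module. This is only an illustration and not how the database itself was built: the function name classify and the three labels are invented here, and the upper bound F8FF for the private use area of the Basic Multilingual Plane comes from the Unicode standard, not from the text above. Note that unicodedata.decomposition also reports compatibility decompositions, so "decomposable" is a slightly wider class than "base letter plus diacritic".

```python
import unicodedata

# Private use area of the Basic Multilingual Plane (per the Unicode standard).
PUA_START, PUA_END = 0xE000, 0xF8FF


def classify(ch):
    """Classify a single character as 'private-use', 'decomposable' or 'atomic'.

    'decomposable' means the Unicode Character Database lists a
    decomposition mapping for it (e.g. d + combining caron);
    'atomic' means it has its own code and no decomposition.
    """
    cp = ord(ch)
    if PUA_START <= cp <= PUA_END:
        return "private-use"
    if unicodedata.decomposition(ch):
        return "decomposable"
    return "atomic"


if __name__ == "__main__":
    for ch in ("d", "\u010F", "\u00F8", "\uE000"):
        print(f"U+{ord(ch):04X}", classify(ch))
```

Here dcaron (U+010F) is a combination of d and a combining caron, while o with stroke (U+00F8) is a modified shape with no decomposition, which matches the two types described above.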