Spelling of names

Name preferences

In the KNAB foreign place names data the principal name is always the one considered to be the endonym or the local official name. It is given in the original spelling if possible, or romanized according to internationally accepted systems. For some languages written in non-Roman scripts, KNAB provisionally uses its own systems. See an overview of romanization systems used in KNAB.

For features that are shared by two countries and have two different names, both name forms are usually given as equal; the principal name is simply, mechanically, the name used in the country whose ISO code comes first alphabetically. E.g. for features shared between Spain (ES) and Portugal (PT) the Spanish name is given as the first form. This is done solely for technical reasons and does not imply any judgement on the relative importance of the name forms.

For features shared by more than two countries or having more than two different names, and also for names of features beyond national sovereignty (seas, oceans), the conventional English form is given as the principal name, with French as a parallel name.
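The selection rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LocalName` record and the `choose_principal` function are hypothetical names introduced here, not part of KNAB, and the sketch assumes that each local name is tagged with the two-letter ISO code of the country that uses it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalName:
    """Hypothetical record: a name form and the ISO code of its country."""
    country: str
    name: str


def choose_principal(local_names, english=None, french=None):
    """Return (principal, parallel_names) following the rules in the text.

    Illustrative sketch only; KNAB's actual procedure may differ.
    """
    countries = {n.country for n in local_names}
    distinct_names = {n.name for n in local_names}
    # More than two countries or names, or no sovereign country at all
    # (seas, oceans): conventional English form first, French as parallel.
    if not local_names or len(countries) > 2 or len(distinct_names) > 2:
        return english, ([french] if french else [])
    # Otherwise the name from the alphabetically first ISO code comes first.
    ordered = sorted(local_names, key=lambda n: n.country)
    principal = ordered[0].name
    parallels = [n.name for n in ordered[1:] if n.name != principal]
    return principal, parallels
```

For a river shared by Spain and Portugal, `choose_principal([LocalName("PT", "Tejo"), LocalName("ES", "Tajo")])` puts the Spanish form first purely because ES sorts before PT.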

Romanized and original-script names

In principle, all name variants in KNAB should appear at least in Roman script, as this is used for sorting and queries. Russian names for features outside the C.I.S. may appear only in Cyrillic form; these can be queried using the Russian-language query form.

The non-Roman script forms presented here are in fact regenerated from the transcription and transliteration schemes by which KNAB records the original-script forms. In principle, only those non-Roman script forms are presented that have been recorded de visu, i.e. directly from sources in the original script. (This is why it has not always been possible to give an original-script form for each name variant.) There are, however, cases where the romanization used in a source is judged sufficient to represent the original-script form, and then the non-Roman script forms are generated as well. Errors, which unfortunately can never be totally excluded, can arise at several stages: errors in the original sources (i.e. the romanization used there is inadequate); errors in the transcription or transliteration procedures; errors in the conversion modules.

Encoding of names data

Since July 19, 2004 all the names data in the Internet version of KNAB have been presented in full compliance with the Unicode (or ISO 10646) standard, without any of the conventional sequences (formatting commands, etc.) used earlier. Names that were earlier given in Roman, Greek and Cyrillic scripts are now additionally given in other non-Roman scripts (Arabic, Chinese, Tamil, etc.). Correct presentation of these name forms on your screen and in your applications depends on the use of a recent browser (recommended are Firefox, Netscape 7.1 or Internet Explorer 6.0) and of fonts with the widest possible Unicode coverage.

The widest ranges of Unicode characters are found in fonts such as Arial Unicode MS or Bitstream Cyberbit. For a detailed description of Unicode and suitable fonts please refer to Alan Wood's Unicode Resources and David McCreedy's Gallery of Unicode Fonts. You will also find links there for downloading fonts for missing scripts. A further problem is that even if the Unicode characters are present in a font (e.g. Arial Unicode MS), this does not guarantee correct presentation of the names data, because additional OpenType layout tables are needed to ensure the correct selection of glyphs (ligatures, etc.) and their sequences. With gradual updating these problems will be overcome. The following notes are based on observations with Windows 98 II, so newer computers are likely to perform better.

Arial Unicode MS should correctly present names data in the following scripts: Latin (i.e. Roman, incl. Vietnamese), Arabic (incl. Pashto, Persian, Uighur, Urdu), Armenian, Chinese (simplified and traditional), Cyrillic (incl. characters for Turkic languages), Devanagari (Hindi, Marathi, Nepali, etc.), Georgian, Greek (incl. polytonic), Gujarati, Gurmukhi (Panjabi), Hebrew, Japanese (default), Kannada, Korean, Tamil and Thai. The font also contains characters for Bengali, Lao, Malayalam, Oria, Telugu and Tibetan, but these are not presented correctly (no OpenType tables).

Bitstream Cyberbit contains characters for the following scripts: Latin (Roman), Arabic, Chinese (default), Cyrillic, Greek, Hebrew, Japanese, Thai.

For the following languages/scripts additional fonts are needed (see the links above): Bengali, Burmese (Myanmar), Ethiopian (Amharic and Tigrinya), Unified Canadian Aboriginal Syllabics (Inuktitut), Khmer, Lao, Malayalam, Oria, Sinhala, Telugu, Thaana (Maldivian) and Tibetan. For Burmese and Sinhala it is at present almost impossible to find a publicly available Unicode font, and there are difficulties with Telugu as well. A good selection of fonts is presented on a page of geonames.de by Werner Fröhlich.
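To check in advance whether a given name is likely to need one of these additional fonts, one can look at which scripts its characters belong to. The sketch below is a rough heuristic, not a KNAB tool: Python's standard `unicodedata` module does not expose the Unicode Script property directly, so the first word of each character's Unicode name is used as a stand-in for its script. The function names and the `NEEDS_EXTRA_FONT` set are assumptions made for this example; the set simply restates the list above using the words that appear in Unicode character names.

```python
import unicodedata

# First word of the Unicode character names for the scripts listed above
# as needing an additional font (Unicode uses MYANMAR for Burmese, ETHIOPIC
# for Ethiopian, CANADIAN for the Aboriginal syllabics, ORIYA for Oria).
NEEDS_EXTRA_FONT = {
    "BENGALI", "MYANMAR", "ETHIOPIC", "CANADIAN", "KHMER", "LAO",
    "MALAYALAM", "ORIYA", "SINHALA", "TELUGU", "THAANA", "TIBETAN",
}


def scripts_in(text):
    """Return the set of script labels for the letters in text (heuristic)."""
    scripts = set()
    for ch in text:
        # Only letters are considered; digits, spaces, punctuation and
        # combining marks are skipped.
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def scripts_needing_extra_font(text):
    """Return the scripts in text that appear in the list above."""
    return scripts_in(text) & NEEDS_EXTRA_FONT
```

A name written in Cyrillic or Roman letters yields an empty set, while a Lao or Tibetan name reports the script for which an additional font has to be installed.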

If, for example, you have set Arial Unicode MS as your default font for Unicode pages, you might experience problems in correctly viewing names data for Lao, Tibetan, etc., even if you have installed a font suitable for these scripts and have set your language options to use that font. This is because the browser takes all characters from the default Unicode font and turns to other available fonts only if characters are missing there.

If you are seriously interested in viewing these data correctly, there are two somewhat complicated ways to overcome the presentation problems: 1) for the time of viewing, change your default Unicode font to a font that does not contain the range in question, e.g. Times New Roman (the browser will then follow your preferences in choosing fonts for languages); 2) copy the text into another application, such as a Word document, and change the font there. The second approach can also be automated, as text portions in various scripts often carry the tag <span lang="..">.....</span>. If you replace it with e.g. <font face="....">....</font>, your application will automatically use the fonts prescribed.

NB! Some scripts are read from right to left (Arabic, Hebrew, Thaana), and for these scripts the tag also contains an indication of direction (e.g. <span dir="rtl" lang="ar">); if you delete that, the names are again distorted.
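The tag replacement mentioned above can be automated with a short script. The following is a minimal sketch, assuming that the copied text contains simple, non-nested <span lang="..."> elements like the ones quoted above; the function name `spans_to_fonts` and the language-to-font mapping are hypothetical and must be adapted to the fonts actually installed. The direction indication is carried over to the new tag so that right-to-left names are not distorted.

```python
import re

# Matches a non-nested <span ...lang="xx"...>text</span> element.
SPAN_RE = re.compile(
    r'<span\b([^>]*)\blang="([^"]*)"([^>]*)>(.*?)</span>', re.DOTALL
)
DIR_RE = re.compile(r'\bdir="([^"]*)"')


def spans_to_fonts(html, font_for_lang):
    """Replace language-tagged spans with <font> tags, keeping dir="...".

    font_for_lang maps a language code to a font name chosen by the user.
    Spans whose language is not in the mapping are left untouched.
    """
    def replace(match):
        other_attrs = match.group(1) + match.group(3)
        lang, inner = match.group(2), match.group(4)
        face = font_for_lang.get(lang)
        if face is None:
            return match.group(0)
        direction = DIR_RE.search(other_attrs)
        dir_attr = f' dir="{direction.group(1)}"' if direction else ""
        return f'<font face="{face}"{dir_attr}>{inner}</font>'

    return SPAN_RE.sub(replace, html)
```

For instance, with the mapping `{"ar": "Arial Unicode MS"}` the element `<span dir="rtl" lang="ar">...</span>` becomes `<font face="Arial Unicode MS" dir="rtl">...</font>`, while spans in other languages stay as they were.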

The following technical remarks could be useful to those wishing to view the names data correctly or to use these data in other applications. For various reasons, the Unicode encoding has in some cases been applied differently from Unicode recommendations.

Latin (Roman)

Devanagari, Bengali, Gurmukhi (Panjabi), Kannada, Tamil
Chinese and Japanese