EVS 8:2000 ESET1

5. Repertoire of Latin letters used in Estonia (ESET1)

ESET1 defines a character repertoire that is used in all databases and official documents issued in Estonia. ESET1 is a coded character set at implementation level 1 that conforms to ISO/IEC 10646, i.e. the repertoire does not include combining diacritical marks, and all characters in ESET1 are mapped to ISO/IEC 10646. ESET1 is a fixed character set, i.e. the character repertoire will not be expanded.

ESET1 (242 characters) can formally be viewed as consisting of ISO 646 IRV (character codes <0020>..<007E>, which include all the characters 'A'..'Z' and 'a'..'z' of the basic Latin alphabet) and a supplementary set of 147 Latin characters. ESET1 can also be divided into letters (191+2) and additional characters. Taking into account the intended scope of ESET1, the apostrophe <0027> and the quotation mark <0022> are not considered punctuation marks but are counted among the letters. Some other characters may also occur in personal names and placenames, e.g. hyphen <002D>, full stop <002E>, space <0020>, slash <002F> and comma <002C>; placenames may also contain brackets and digits. The list is presented in Appendix A. ESET1 is a subset of Windows Glyph List 4 (WGL4), and every character can be found in at least one of the code pages ISO 8859-1, 2, 9 or 13.
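The character counts above can be checked with a few lines of Python. This is only an arithmetic sketch: the supplementary characters themselves are listed in Appendix A and are not reproduced here, so the figure 147 is taken from the text and the variable names are illustrative.

```python
# ISO 646 IRV graphic characters occupy the code range <0020>..<007E>.
ISO_646_IRV = [chr(cp) for cp in range(0x0020, 0x007E + 1)]

# Stated in the text; the actual characters are listed in Appendix A.
SUPPLEMENTARY_COUNT = 147

# 'A'..'Z' and 'a'..'z' of the basic Latin alphabet.
basic_letters = [c for c in ISO_646_IRV if c.isalpha()]

print(len(ISO_646_IRV))                        # 95
print(len(basic_letters))                      # 52
print(len(ISO_646_IRV) + SUPPLEMENTARY_COUNT)  # 242
```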

This repertoire of Latin letters has a strictly limited scope: it is used to encode personal names and placenames. Whenever a personal name containing characters not in ESET1 must be entered into a database or registry, the necessary simplifications are decided on an individual basis, taking into account the originating language and the existing latinization schemes for that language. In all such cases the original form in Latin script is also entered into the database for future use, using one of the methods described below.

ESET1 does not specify how the data should be entered or coded internally. It is recommended not to set an upper limit on the length of a personal name. Such a restriction may make it impossible to encode personal names of unpredictable length that contain many characters outside the basic Latin alphabet. For encoding schemes that use a varying number of bytes to represent a character (e.g. UTF-8 or the encoding in angle brackets), it is also important that the sequence of bytes denoting one character be kept together as a whole, i.e. such a sequence must not be broken, e.g. to continue in an additional database field or to insert a hyphenation point.
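Where a name in angle-bracket notation nevertheless has to be divided, the sequences can be kept whole by splitting on token boundaries rather than at a fixed offset. The following is a minimal sketch, not part of the standard; the function name `split_fields` and the `width` parameter are hypothetical.

```python
import re

# One token is either a complete <XXXX> sequence or a single ordinary character.
_TOKEN = re.compile(r"<[0-9A-Fa-f]{4}>|.", re.DOTALL)


def split_fields(encoded: str, width: int) -> list:
    """Split an angle-bracket-encoded string into chunks of at most
    `width` characters without ever breaking a <XXXX> sequence."""
    if width < 6:
        raise ValueError("width must be able to hold one <XXXX> sequence")
    chunks, current = [], ""
    for token in _TOKEN.findall(encoded):
        # Start a new chunk when the next token would not fit completely.
        if len(current) + len(token) > width:
            chunks.append(current)
            current = ""
        current += token
    if current:
        chunks.append(current)
    return chunks


print(split_fields("V<00E4>ino", 8))  # ['V<00E4>i', 'no']
```

Joining the chunks again always yields the original string, since tokens are only regrouped and never altered.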

All ESET1 applications must support at least the following method of encoding:

Characters of the basic Latin alphabet and the additional characters of ISO 646 IRV should not be encoded, i.e. they are written as they are.
Any other character in ESET1 may be represented by its hexadecimal ISO/IEC 10646 code between angle brackets. The code must consist of exactly four hexadecimal digits (0-9 and the letters A-F, in either upper or lower case). For example, the names Väino, Õnne and Vjatšeslav may be written as V<00E4>ino, <00d5>nne and Vjat<0161>eslav.
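The notation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, and the sketch does not check that a character actually belongs to ESET1, because the full list is given only in Appendix A. It also assumes that a literal '<' does not occur in the data.

```python
import re

# Exactly four hexadecimal digits between angle brackets, upper or lower case.
_CODE = re.compile(r"<([0-9A-Fa-f]{4})>")


def encode_angle(text: str) -> str:
    """Leave ISO 646 IRV characters (<0020>..<007E>) as they are and
    write every other character as its four-digit code in angle brackets."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"<{cp:04X}>")
        else:
            raise ValueError(f"U+{cp:X} does not fit into four hexadecimal digits")
    return "".join(out)


def decode_angle(text: str) -> str:
    """Replace every <XXXX> sequence with the character it denotes."""
    return _CODE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(encode_angle("V\u00e4ino"))         # V<00E4>ino
print(decode_angle("<00d5>nne") == "\u00d5nne")  # True
```

The encoder always emits upper-case digits, while the decoder accepts both cases, as the rule allows.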

If this notation is used, the following restrictions apply:

In applications that use code page ISO 8859-15, only characters outside this code page must be encoded. For characters in ISO 8859-15, 8-bit encoding is allowed.
In applications that use code pages ISO 8859-1, ISO 8859-13, CP1252 (Windows 'Western') or CP1257 (Windows 'Baltic'), the following characters may be encoded as 8-bit entities: 'Õ', 'õ', 'Ä', 'ä', 'Ö', 'ö', 'Ü', 'ü'. For all other characters, the notation in angle brackets must be used.
Applications that use any other code page must encode all characters in angle brackets.
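The three code-page rules above can be expressed as a single decision per character. The sketch below is one possible reading, not a normative implementation: the function names are hypothetical, the code page names are those of Python's standard codecs, and "all characters" in the last rule is assumed to mean all characters outside ISO 646 IRV.

```python
import codecs

# The eight letters that may stay 8-bit under ISO 8859-1, ISO 8859-13,
# CP1252 and CP1257: Õ õ Ä ä Ö ö Ü ü.
EIGHT_BIT_LETTERS = set("\u00d5\u00f5\u00c4\u00e4\u00d6\u00f6\u00dc\u00fc")
RESTRICTED_PAGES = {"iso8859-1", "iso8859-13", "cp1252", "cp1257"}


def _stays_as_is(ch: str, page: str) -> bool:
    """Decide whether `ch` may be written directly in code page `page`."""
    if 0x20 <= ord(ch) <= 0x7E:        # ISO 646 IRV is never encoded
        return True
    if page == "iso8859-15":           # anything the code page contains
        try:
            ch.encode(page)
            return True
        except UnicodeEncodeError:
            return False
    if page in RESTRICTED_PAGES:       # only the eight listed letters
        return ch in EIGHT_BIT_LETTERS
    return False                       # any other code page


def encode_for_code_page(text: str, code_page: str) -> bytes:
    page = codecs.lookup(code_page).name   # normalise aliases such as "latin-1"
    out = bytearray()
    for ch in text:
        if _stays_as_is(ch, page):
            out += ch.encode(page)
        elif ord(ch) <= 0xFFFF:
            out += f"<{ord(ch):04X}>".encode(page)
        else:
            raise ValueError("code does not fit into four hexadecimal digits")
    return bytes(out)


print(encode_for_code_page("Vjat\u0161eslav", "cp1257"))      # b'Vjat<0161>eslav'
print(encode_for_code_page("Vjat\u0161eslav", "iso8859-15"))  # b'Vjat\xa8eslav'
```

Note that CP1257 does contain the letter š, yet under the second rule it is still written in angle brackets, because only the eight listed letters may be kept as 8-bit entities.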

Other encoding methods may also be used by mutual agreement. UTF-8 is recommended.

ESET1 is sufficient to write at least the following European languages: Albanian, Bulgarian, Estonian, Greenlandic, Spanish, Dutch, Croatian, Irish, English, Icelandic, Italian, Lithuanian, Latvian, Macedonian, Norwegian, Polish, Portuguese, French, Swedish, Romanian, German, Serbian, Slovak, Slovene, Finnish, Danish, Czech, Turkish, Hungarian.

ESET1 is only partly sufficient to write at least the following European languages: Livonian, Maltese, Sami, Gaelic.