Practical International Data Management - Casing

Casing within languages

The rules of casing vary between languages. In German, for example, nouns begin with an upper-case letter. In Celtic languages there may be upper-case letters within words, for example McDonald, O'Fearghail, ni'Fearghail, Br na gCrann. Prepositions in many languages start with a lower-case letter. In some languages, thoroughfare types such as rue, avenue and via are also written with a lower-case first letter.

Note that some diacritic marks only apply to lower-case letters, so if data is collected entirely in upper case those diacritic marks can never be correctly added when the data is converted to mixed case. In some languages, such as French, it is usual to write upper-case words without diacritics at all, though they must be provided in the lower-case versions of those same words.
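
The loss of information described above can be illustrated with a minimal Python sketch. The function name upper_without_diacritics and the three French sample words are assumptions chosen for illustration; the sketch only shows that several distinct lower-case spellings collapse into one unaccented upper-case form, which is why the diacritics cannot be restored afterwards.

```python
import unicodedata


def upper_without_diacritics(text: str) -> str:
    """Upper-case text and drop combining marks (hypothetical helper)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text.upper())
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Three different French words that differ only in their diacritics.
words = ["pêche", "péché", "pêché"]
upper_forms = {word: upper_without_diacritics(word) for word in words}
print(upper_forms)
```

All three words end up with the same upper-case spelling, so a process that only sees the upper-case data has no way of knowing which lower-case form was intended.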

In some languages, digraphs, which are typographically formed from two letters, are alphabetically a single letter. Thus, in Dutch, ij is a single letter, so when it is capitalised at the start of a word, both letters need processing: IJssel. In Welsh, these apparent combinations are all single letters: ch, dd, ff, ll, ng, ph, rh, th. Thus a personal name originating from Welsh may be written with a lower-case first letter: ffion, ffiske.
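
As a rough sketch of the Dutch case, the following Python function capitalises the first letter of a word and treats a leading ij as one letter when the language is Dutch. The function name capitalise_first and the "nl" language code parameter are assumptions made for this example; a real system would need language information for each record and rules for other languages too.

```python
def capitalise_first(word: str, language: str = "") -> str:
    """Upper-case the first letter of a word (hypothetical helper).

    For Dutch ("nl"), a leading "ij" counts as a single letter,
    so both characters are upper-cased.
    """
    if not word:
        return word
    if language == "nl" and word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[0].upper() + word[1:]


print(capitalise_first("ijssel", "nl"))  # Dutch rule applied
print(capitalise_first("ijssel"))        # naive rule, wrong for Dutch
```

Without the language hint the naive rule upper-cases only the first character, which is not correct for a Dutch word.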

Casing of Personal Names

Because of the complexity and variety of personal names, they should be collected correctly cased (in mixed case, not in upper case) and not post-processed to make any casing changes. It is incorrect to assume, for example, that names always start with a capital letter or that all letters after the first are always in lower case. Collect in mixed case: an upper-case version can always be created from a mixed-case name, but the reverse is not true.
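
The one-way nature of this conversion can be seen in a short Python sketch. It takes names used as examples earlier in this text, upper-cases them, and then tries to restore them with a naive rule (first letter upper case, the rest lower case). The function name naive_recase is an assumption for illustration only.

```python
# Names as they might be collected, correctly cased by the data subject.
collected = ["McDonald", "O'Fearghail", "ffion"]


def naive_recase(upper_name: str) -> str:
    """Naive rule: first letter upper case, all others lower case."""
    return upper_name.capitalize()


for name in collected:
    upper = name.upper()            # always possible, no information needed
    restored = naive_recase(upper)  # guesswork, the original casing is gone
    print(f"{name} -> {upper} -> {restored} (matches original: {restored == name})")
```

None of the three names survives the round trip, which is why the mixed-case form should be the one that is stored.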

Casing of Postal Addresses

Components within postal addresses, such as thoroughfare types, can be processed to correct their case. In some languages the thoroughfare type is written with an upper-case first letter (Street, Straße), in others with a lower-case first letter (rue, via, avenida).
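
One possible way to apply this is a lookup table of thoroughfare types and their conventional casing, sketched below in Python. The table THOROUGHFARE_CASING and the function recase_thoroughfare_types are assumptions for illustration; the table contains only the examples given above, is keyed by word rather than by language, and does not handle thoroughfare types that are joined to the street name.

```python
# Illustrative, incomplete table: lower-cased thoroughfare type -> usual casing.
THOROUGHFARE_CASING = {
    "street": "Street",
    "straße": "Straße",
    "rue": "rue",
    "via": "via",
    "avenida": "avenida",
}


def recase_thoroughfare_types(address_line: str) -> str:
    """Correct the case of known thoroughfare types, leaving other words alone."""
    words = address_line.split(" ")
    # lower() is used rather than casefold() so that "ß" is preserved in the key.
    return " ".join(THOROUGHFARE_CASING.get(word.lower(), word) for word in words)


print(recase_thoroughfare_types("12 Rue de la Paix"))
print(recase_thoroughfare_types("1 Main street"))
```

Only words found in the table are changed, so the rest of the address line keeps the casing in which it was collected.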



