Transliteration the easy way - Microsoft Transliteration Utility

If you are lucky enough :-) to have not one but two alphabets in daily use, one of your regular programming tasks will be transliteration: transforming text from one script (alphabet) to another.

In Serbia, we use both the Latin and the Cyrillic alphabet (and Serbian Cyrillic is not the same as Russian Cyrillic), so a common task is converting text from one to the other and vice versa.
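At its simplest, a hand-written Serbian Latin to Cyrillic converter is a per-letter lookup table plus three digraphs (lj, nj, dž), each of which maps to a single Cyrillic letter. The C# sketch below is a hypothetical helper (SerbianTransliterator and LatinToCyrillic are illustrative names, not part of MTU or the .NET Framework). It is deliberately naive: it does not handle compounds where the two letters of a "digraph" belong to different word parts, such as nadživeti (надживети), which it would turn into наџивети.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of MTU): naive Serbian Latin -> Cyrillic conversion.
public static class SerbianTransliterator
{
    // Digraphs map to a single Cyrillic letter and must be tried before single letters.
    static readonly Dictionary<string, string> Digraphs = new Dictionary<string, string>
    {
        { "lj", "љ" }, { "Lj", "Љ" }, { "LJ", "Љ" },
        { "nj", "њ" }, { "Nj", "Њ" }, { "NJ", "Њ" },
        { "dž", "џ" }, { "Dž", "Џ" }, { "DŽ", "Џ" }
    };

    // One-to-one letters, filled from two strings aligned position by position.
    static readonly Dictionary<char, char> Letters = new Dictionary<char, char>();

    static SerbianTransliterator()
    {
        const string latin    = "abcčćdđefghijklmnoprsštuvzž";
        const string cyrillic = "абцчћдђефгхијклмнопрсштувзж";
        for (int i = 0; i < latin.Length; i++)
        {
            Letters[latin[i]] = cyrillic[i];
            Letters[char.ToUpperInvariant(latin[i])] = char.ToUpperInvariant(cyrillic[i]);
        }
    }

    public static string LatinToCyrillic(string text)
    {
        // Compose decomposed input (e.g. 'c' + combining caron) into precomposed letters.
        string s = text.Normalize(NormalizationForm.FormC);
        StringBuilder sb = new StringBuilder(s.Length);
        int i = 0;
        while (i < s.Length)
        {
            string pair;
            if (i + 1 < s.Length && Digraphs.TryGetValue(s.Substring(i, 2), out pair))
            {
                sb.Append(pair);
                i += 2;
                continue;
            }
            char letter;
            // Characters without a mapping (digits, punctuation, spaces) pass through unchanged.
            sb.Append(Letters.TryGetValue(s[i], out letter) ? letter : s[i]);
            i++;
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // prints: Љубљана, Његош, ЏЕП, Ђорђе
        Console.WriteLine(LatinToCyrillic("Ljubljana, Njegoš, DŽEP, Đorđe"));
    }
}
```

Exceptions like nadživeti are where a rule-based engine pays off compared to a plain lookup table.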

This is not a complicated request, and you can easily write the necessary procedures yourself; however, there is a better way:

Microsoft Transliteration Utility (MTU) is a little-known but very useful tool built for exactly that purpose. It can transliterate text typed into a text box, or convert the contents of one file into another.

It comes with a set of predefined transliterations:

  • Serbian Cyrillic to Latin / Serbian Latin to Cyrillic
  • Bosnian Cyrillic to Latin / Bosnian Latin to Cyrillic
  • Hangul to Romanization
  • Inuktitut to Romanization / Romanization to Inuktitut
  • Malayalam to Romanization / Romanization to Malayalam

You are not limited to the set above; you can create your own transliteration modules using the Module Development Console:

Microsoft Transliteration Utility - Module Development Console

By writing a simple text file, you get the full power of MTU’s parsing engine: definitions of input and output characters, and transliteration rules, including definitions of new states for the transliteration state machine.

And that is not all: you can even use MTU programmatically (but please check the EULA before any commercial use):

  • Add a reference to MSTranslitTools.DLL (located in %programfiles%\Microsoft Transliteration Utility)
  • Add using System.NaturalLanguage.Tools; to your source file
  • The existing transliteration modules (.tms files) are in %CommonProgramFiles%\Transliteration\Modules\Microsoft\
  • Here is a simple code fragment to demonstrate:
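using System;
using System.NaturalLanguage.Tools;

// Load a module; the .tms file is assumed to be in the working directory,
// otherwise pass its full path under %CommonProgramFiles%\Transliteration\Modules\Microsoft\
TransliteratorSpecification specification =
   TransliteratorSpecification.FromSpecificationFile("Serbian Latin to Cyrillic.tms");

// Build a transliterator from the module and convert a string
Transliterator transliterator = Transliterator.FromSpecification(specification);
string rezultat = transliterator.Transliterate("Vesic.Org");

Console.WriteLine(rezultat);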

3.2.2008

.Net, C#, Microsoft, i18n

Dejan Vesić

Breaking changes to culture codes in KB928365 and KB928366

Some security updates are not just security updates.

If you installed yesterday’s updates (or have Automatic Updates turned on):

  • KB928365 - Security update for the .NET Framework 2.0 for Windows Server 2003, Windows XP, and Windows 2000
  • KB928366 - Security update for the .NET Framework 1.1 for Windows XP and Windows 2000

you will get a security fix (nice) and breaking changes (not so nice) for some of the languages in the framework. More precisely, some specific cultures changed their codes:

LCID   Old code     New code     Description (old / new)
2074   sr-SP-Latn   sr-Latn-CS   Serbian (Latin, Serbia and Montenegro) / Serbian (Latin, Serbia)
3098   sr-SP-Cyrl   sr-Cyrl-CS   Serbian (Cyrillic, Serbia and Montenegro) / Serbian (Cyrillic, Serbia)
1068   az-AZ-Latn   az-Latn-AZ   Azeri (Latin, Azerbaijan)
1091   uz-UZ-Latn   uz-Latn-UZ   Uzbek (Latin, Uzbekistan)
1125   div-MV       dv-MV        Divehi (Maldives)
2092   az-AZ-Cyrl   az-Cyrl-AZ   Azeri (Cyrillic, Azerbaijan)
2115   uz-UZ-Cyrl   uz-Cyrl-UZ   Uzbek (Cyrillic, Uzbekistan)
7194   sr-BA-Cyrl   sr-Cyrl-BA   Serbian (Cyrillic) (Bosnia and Herzegovina)
5146   bs-BA-Latn   bs-Latn-BA   Bosnian (Bosnia and Herzegovina)
6170   sr-BA-Latn   sr-Latn-BA   Serbian (Latin) (Bosnia and Herzegovina)
9225   en-CB        en-029       English (Caribbean)

(the Caribbean change looks very suspicious, but that is what the code says; 029 is the UN M.49 numeric code for the Caribbean region)

These changes will cause problems if your application ships satellite assemblies for any of these languages: once a client installs the update(s), those translations simply stop working, and you must recompile and redistribute them under the new culture names.
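The runtime looks for satellite assemblies in a subfolder named after the culture (for example sr-SP-Latn\ next to the application), so a renamed culture no longer finds its old folder. Below is a minimal C# sketch that flags which existing satellite folders are affected, assuming the old/new pairs from the table above cover your languages. CultureRenames and ToNewName are hypothetical names, not a framework API. On the .NET Framework, new CultureInfo(2074).Name shows which name the installed framework actually reports on a given machine.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps pre-update culture names (used as satellite assembly
// folder names) to the new names listed in the table.
public static class CultureRenames
{
    static readonly Dictionary<string, string> OldToNew =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "sr-SP-Latn", "sr-Latn-CS" }, { "sr-SP-Cyrl", "sr-Cyrl-CS" },
            { "az-AZ-Latn", "az-Latn-AZ" }, { "uz-UZ-Latn", "uz-Latn-UZ" },
            { "div-MV", "dv-MV" },          { "az-AZ-Cyrl", "az-Cyrl-AZ" },
            { "uz-UZ-Cyrl", "uz-Cyrl-UZ" }, { "sr-BA-Cyrl", "sr-Cyrl-BA" },
            { "bs-BA-Latn", "bs-Latn-BA" }, { "sr-BA-Latn", "sr-Latn-BA" },
            { "en-CB", "en-029" }
        };

    // Returns the new culture name, or the input unchanged if it was not renamed.
    public static string ToNewName(string oldName)
    {
        string newName;
        return OldToNew.TryGetValue(oldName, out newName) ? newName : oldName;
    }

    static void Main()
    {
        // Example folder names; in practice, list the subdirectories next to your .exe.
        string[] satelliteFolders = { "de-DE", "sr-SP-Latn", "sr-SP-Cyrl", "en-CB" };
        foreach (string folder in satelliteFolders)
        {
            string renamed = ToNewName(folder);
            if (!string.Equals(renamed, folder, StringComparison.OrdinalIgnoreCase))
                Console.WriteLine("{0} -> {1}: rebuild this satellite assembly", folder, renamed);
        }
    }
}
```

A lookup table keeps the list explicit and easy to audit against the KB articles, rather than trying to derive the new names from the old ones by reordering subtags.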

I appreciate the updates, but some sort of warning or note on the official patch pages would have been nice, at the very least.

11.7.2007

.Net, i18n

Dejan Vesić