Transliteration easy way - Microsoft Transliteration Utility

If you are lucky enough :-) to have not one, but two alphabets in daily use, your regular task in programming will be transliteration - transformation of text from one script (alphabet) to another.

In Serbia, we are using Latin as well as Cyrillic alphabet (and that is not same Cyrillic as Russian one) and common task is conversion from one to another and vice-versa.

This is not too complicated request; you can easily create necessary procedures; however, there is a better way:

Microsoft Transliteration Utility (MTU) is not widely known, but very useful tool for just that purpose: transliteration. It can easily transliterate text either typed in a text box or from one file to another.

There is set of predefined translations:

  • Serbian Cyrillic to Latin / Serbian Latin to Cyrillic
  • Bosnian Cyrillic to Latin / Bosnian Latin to Cyrillic
  • Hangul to Romanization
  • Inuktitut to Romanization / Romanization to Inuktitut
  • Malayalam to Romanization / Romanization to Malayalam

You are not limited to above set; you can easily create your own translations, using Module Development Console:

Microsoft Transliteration Utility - Module Development Console
(click on image for larger version)

Creating simple textual file, you can use full power of MTU’s parsing engine: definitions of input and output characters, rules for transliteration including definitions of new states for translation state machine.

This is not the end - you can even use MTU programmatically (although please check EULA for commercial usage):

  • Add reference to MSTranslitTools.DLL (it can be found in %programfiles%\Microsoft Transliteration Utility)
  • Add using System.NaturalLanguage.Tools;
  • Current translation files (.tms) can be found in %CommonProgramFiles%\Transliteration\Modules\Microsoft\
  • Here is simple code fragment to demonstrate:
TransliteratorSpecification specification =
   TransliteratorSpecification.FromSpecificationFile("Serbian Latin to Cyrillic.tms");

Transliterator transliterator = Transliterator.FromSpecification(specification);
string rezultat = transliterator.Transliterate("Vesic.Org");

Console.WriteLine(rezultat);

3.2.2008

.Net, C#, Microsoft, i18n

Dejan Vesić

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a comment