So, you know everything about text, right?–part V

In the previous post I said that we needed a small detour into cultures to conclude the linguistic string comparison section. In .NET, the CultureInfo type holds all the information regarding a specific culture. For example, any instance of CultureInfo will give you the name or the calendar of the culture it represents. Each culture is identified by a unique name which identifies the language/country pair (if you follow this link, you’ll get all the info about how those pairs are built). In practice, this means that “pt-PT” refers to the Portuguese culture in Portugal (which is quite different from the one used, for example, in Brazil). Here’s an example of how you might end up creating an instance of this type:

var culture = new CultureInfo("pt-PT");
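
To make the "name or calendar" remark a bit more concrete, here's a minimal sketch which reads a few members from a CultureInfo instance. The CultureInfoDemo class and its Describe method are hypothetical names I'm using just for illustration; Name, EnglishName, Calendar and NumberFormat are members of CultureInfo. The exact text you'll see depends on the culture data available on your machine, so I'm not showing any output here.

```csharp
using System;
using System.Globalization;

public static class CultureInfoDemo
{
    // Hypothetical helper: builds a one-line summary of the culture
    // identified by the given name (e.g. "pt-PT").
    public static string Describe(string cultureName)
    {
        var culture = new CultureInfo(cultureName);
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0} | {1} | {2} | {3}",
            culture.Name,                          // the language/country pair
            culture.EnglishName,                   // readable name, in English
            culture.Calendar.GetType().Name,       // type of the default calendar
            culture.NumberFormat.CurrencySymbol);  // currency symbol used by the culture
    }

    public static void Main()
    {
        // Same language, two different cultures
        Console.WriteLine(Describe("pt-PT"));
        Console.WriteLine(Describe("pt-BR"));
    }
}
```

Running it with "pt-PT" and "pt-BR" is a quick way to see that the same language can map to cultures with different settings.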

Every thread has two properties which reference culture objects: CurrentUICulture and CurrentCulture. The first is used to obtain the resources which are presented to the user (ex.: this object is used to load the appropriate resource file; that is, if you’re using resource files. If you’re not, shame on you!). By default, any new thread references an object compatible with the language of the installed OS (with MUI, you can change the UI language through the Regional and Language Options applet in the Control Panel).

The second culture property (CurrentCulture) is used for everything that CurrentUICulture isn’t used for (ex.: number formatting, string comparison, etc.). The initial value of this property is influenced by the value selected in the Regional and Language Options applet of the Control Panel. It’s common for both properties to reference the same CultureInfo object. This isn’t obligatory, of course. For instance, nothing prevents us from adapting the user interface to one language while formatting info according to some other culture (a good example of this is a web site which adapts its buttons and labels to different languages, or cultures, and always formats values according to the “en-US” culture because it needs to bill in dollars). To achieve this, you’d have to set those properties to the adequate CultureInfo objects. But we’re digressing…
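
Here's a rough sketch of that web site scenario, assuming a Portuguese user interface with values formatted as “en-US”. ThreadCultureDemo, Configure and FormatPrice are made-up names for this example; the thread properties and the "C" (currency) format string are the real thing.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class ThreadCultureDemo
{
    // Hypothetical setup method: UI resources in Portuguese,
    // formatting and comparisons according to en-US.
    public static void Configure()
    {
        Thread.CurrentThread.CurrentUICulture = new CultureInfo("pt-PT");
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
    }

    // Formats an amount as currency. The "C" format string relies on the
    // thread's CurrentCulture, not on its CurrentUICulture.
    public static string FormatPrice(decimal amount)
    {
        return amount.ToString("C", Thread.CurrentThread.CurrentCulture);
    }

    public static void Main()
    {
        Configure();
        Console.WriteLine(Thread.CurrentThread.CurrentUICulture.Name); // used for loading resources
        Console.WriteLine(Thread.CurrentThread.CurrentCulture.Name);   // used for formatting/comparing
        Console.WriteLine(FormatPrice(1234.5m));
    }
}
```

Passing CurrentCulture explicitly to ToString isn't required (it's what you get by default), but it makes it obvious which of the two properties is doing the work.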

For our string discussion, what matters is understanding that CurrentCulture refers to a CultureInfo object that influences the comparison operations performed over strings. To do this, the CultureInfo object uses a CompareInfo object which knows how to sort characters. In my mother language (which, btw, is Portuguese) there really isn’t any interesting gotcha (at least, none that I can remember). However, that is not the case with German, where ß has the same value as ss. In practice, this means that (and please pardon my lack of German knowledge, so that might not be the correct way to write football) the following snippet might take you by surprise:

// assumes: using System; using System.Globalization; using System.Threading;
var str1 = "fussbal";
var str2 = "fu\u00DFbal"; // \u00DF is ß
Console.WriteLine("{0} : {1}", str1, str2); // print both strings
Console.WriteLine(String.Compare(str1, str2, StringComparison.Ordinal) == 0); // false
Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE"); // change culture
Console.WriteLine(String.Compare(str1, str2, StringComparison.CurrentCulture) == 0); // true

 

As you can see, the comparison does return true after I’ve changed the CultureInfo associated with the current thread’s CurrentCulture property. The experienced reader knows that the previous code can be simplified: a linguistic comparison performs a character expansion before comparing the strings (in our example, that means that ß is treated as ss), and Compare is linguistic by default, since the overload which receives only two strings relies on the current culture. So, you don’t really need to change the CurrentCulture or pass StringComparison.CurrentCulture to the Compare method. Anyway, doing that makes the intent of the code clear, and you should always strive to do that.
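
For the record, this is roughly what the simplified version might look like. DefaultCompareDemo and its two helper methods are names I've made up for the sketch. I'm deliberately not writing the expected result next to the linguistic comparison: it depends on the current culture and on the culture data the runtime relies on, so please run it on your own machine before trusting it.

```csharp
using System;

public static class DefaultCompareDemo
{
    // The two-string overload performs a culture-sensitive comparison
    // based on the current culture; no StringComparison value is needed.
    public static bool LinguisticEquals(string a, string b)
    {
        return String.Compare(a, b) == 0;
    }

    // Ordinal comparison looks at the raw char values: no expansion here.
    public static bool OrdinalEquals(string a, string b)
    {
        return String.Compare(a, b, StringComparison.Ordinal) == 0;
    }

    public static void Main()
    {
        var str1 = "fussbal";
        var str2 = "fu\u00DFbal"; // \u00DF is ß
        Console.WriteLine(OrdinalEquals(str1, str2));    // ordinal: the strings differ
        Console.WriteLine(LinguisticEquals(str1, str2)); // linguistic: see the note above
    }
}
```

The short form saves some typing, but whoever reads the code has to remember which comparison the overload picks by default.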

In one of the previous paragraphs, I mentioned the CompareInfo class: this class is used internally for performing the comparison between the strings. If you need more control, then you’ll be happy to know that nothing prevents you from using that class directly:

var str1 = "fussbal";
var str2 = "fu\u00DFbal";
var culture = new CultureInfo("de-DE");
Console.WriteLine(culture.CompareInfo.Compare(str1, str2) == 0);//true

 

You probably won’t be doing this often, but now you know that it exists. Notice also that there are several overloads of this method which allow you to specify an offset, a length or a CompareOptions value for influencing the returned result. Since we’re talking about CompareInfo, you should also notice that it offers several interesting methods: IndexOf, LastIndexOf, IsPrefix, IsSuffix, etc. These methods give you more control than you get by default when using the similar methods of the String class.
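
Here's a small sketch of what that extra control might look like. CompareInfoDemo and its members are hypothetical names; Compare, IndexOf and IsPrefix, as well as the CompareOptions values, belong to the framework. I've stuck to plain ASCII strings on purpose, so that the example doesn't depend on how a specific culture handles special characters.

```csharp
using System;
using System.Globalization;

public static class CompareInfoDemo
{
    private static readonly CompareInfo German = new CultureInfo("de-DE").CompareInfo;

    // Linguistic comparison which ignores case.
    public static bool EqualsIgnoreCase(string a, string b)
    {
        return German.Compare(a, b, CompareOptions.IgnoreCase) == 0;
    }

    // Compares only a section of 'text' (offset + length) against 'word'.
    public static bool SectionEquals(string text, int offset, int length, string word)
    {
        return German.Compare(text, offset, length, word, 0, word.Length,
                              CompareOptions.IgnoreCase) == 0;
    }

    // Culture-aware, case-insensitive search.
    public static int Find(string text, string value)
    {
        return German.IndexOf(text, value, CompareOptions.IgnoreCase);
    }

    // Culture-aware, case-insensitive "starts with".
    public static bool HasPrefix(string text, string prefix)
    {
        return German.IsPrefix(text, prefix, CompareOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(EqualsIgnoreCase("FUSSBAL", "fussbal"));
        Console.WriteLine(SectionEquals("Der Fussbal", 4, 7, "fussbal"));
        Console.WriteLine(Find("Der Fussbal", "fussbal"));
        Console.WriteLine(HasPrefix("Fussbal spielen", "fuss"));
    }
}
```

CompareOptions is a flags enumeration, so you can combine several values (for instance, IgnoreCase with IgnoreNonSpace) in a single call.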

And I guess this sums it up: linguistic string comparisons rely on CultureInfo objects which end up delegating that work to the CompareInfo class. There’s still more to say about strings, so stay tuned for more!

~ by Luis Abreu on April 25, 2011.

2 Responses to “So, you know everything about text, right?–part V”

  1. Can you discuss any differences between these 2 ways of getting CultureInfo objects?

    var culture1 = new CultureInfo("en-US");
    var culture2 = CultureInfo.GetCultureInfo("en-US");

    Thanks!

  2. left one out of the previous comment, sorry:

    var culture3 = CultureInfo.CreateSpecificCulture("en-US");
