In-Depth

Multilevel Sorting with IComparable and IComparer

IComparable and IComparer sound the same and work in similar ways, but there are important differences you need to know.

A common and relatively simple coding task in C# is to sort an array or List collection of objects. You can do this by using either the IComparable or IComparer interface in conjunction with the Sort method. Although there's plenty of technical information available on these two interfaces, the documentation doesn't provide much guidance for \when to use each interface. In this article I'll describe when and how to use IComparable and IComparer and describe a practical pattern for coding multilevel sorting comparisons.

There are three main sorting scenarios you'll likely encounter.

Scenario 1: You're designing a class and want one default sort order and no additional sort orders. You should derive the class from the IComparable interface and code the CompareTo method directly inside the class.

Scenario 2: You're designing a class and want two or more sort orders but don't want a default sort order. Don't directly derive the class from either IComparable or IComparer. For each sort order you should code a Compare method inside a nested subclass that inherits from the IComparer interface.

Scenario 3: You're designing a class and want a default sort order, along with one or more additional sort orders. You should derive the class from the IComparable interface, code CompareTo inside the class, and code Compare methods inside nested subclasses that inherit from IComparer for each non-default sort order.

Although there are many other object sorting scenarios, if you understand how to deal with these three main scenarios you'll be able to handle any sorting situation.

Scenario 1: A Single Default Sort Order
As always, the best way to explain something is to present sample code. Let's start with Scenario 1, where you just want a single default sort order. Consider the class definition outlined in Listing 1.

The code in Listing 1 defines a simple City class with a name field (such as "boston"), a population field (in millions, such as 1.2), and a region field which can be either "north," "south," "east" or "west." To implement a default sort order for a class, you should derive the class from the IComparable interface as shown on the first line of Listing 1. The IComparable interface contract requires that you implement its CompareTo method. The static methods RegionCompare and RegionValue are helper methods for the CompareTo method.

Imagine that the CompareTo method has been coded to define an ordering so that city objects are sorted first by population (ascending -- that is, low to high); if populations are equal, then by region (where "north" comes before "south" comes before "east" comes before "west"); if populations and regions are equal, then by name (ascending alphabetical order). The City class could be used as shown in Listing 2.

In Listing 2, a List of City objects is instantiated and then six City objects are added to the List collection. The first foreach loop would display the city objects in the order in which they were added, San Diego first through Houston sixth. The List.Sort method is called without any parameters. Behind the scenes the Sort method would use the default ordering defined in the CompareTo method. The display produced by the second foreach loop would be:

boston     1.0 east
chicago    1.2 north
nashville  1.2 south
san diego  1.2 west
houston    1.7 west
phoenix    1.7 west

Notice that cities are listed first by population, then when populations are equal by region where north precedes south, precedes east, precedes west, and if both populations and region are equal the cities are listed in order by name.

The code for the City class constructor is:

public City(string name, double population, string region)
{
  this.name = name;
  this.population = population;
  this.region = region;
}

To keep the size of my code small and to keep the main ideas of multilevel sorting clear, I'm omitting normal error checking, such as checking input parameter values for null. The ToString method is nothing fancy:

public override string ToString()
{
  return this.name.PadRight(10) +
    " " + this.population.ToString("F1") +
    " " + this.region;
}

The CompareTo method is shown in Listing 3.

The CompareTo method accepts a single parameter, which is the same type as the class in which it's defined (a City object in this case) and compares relative to the this object. The method must return 0 if this and other are the same with regard to sort order; return a value greater than 0 (usually +1) if the this object comes after the other object in sort order; and return a value less than 0 (usually -1) if the this object comes before the other object.

Recall that I want to order first by population (ascending), then by region (where north comes before south, then east, west), then by name (ascending). The pattern I use in CompareTo is to first check to see if name, population and region are all the same. In general, there are three common ways to check the orders of class fields:

  • For string fields you can use the String.Compare method to check for alphabetical order.
  • For numeric fields you can use the ==, > and < operators.
  • For string fields that have a non-alphabetical order, a good strategy is to write a pair of helper methods such as the RegionCompare and RegionValue methods shown in Listing 1.

After checking for object equality on all fields, I check for situations where the this object comes after the other object. First I check population, then if that field is equal I check region, then if those two fields are equal I check name. If none of those checks is true and triggers a return 1 statement, I know that the this object must come before the other object and so I return -1.

In general, if you have a class with n fields and you use the pattern I've described here, the CompareTo method will have one return statement for equality, plus n statements to check for this greater than other to return +1, plus a final return -1 statement for a total of n+2 return statements.

The RegionCompare helper method looks like this:

static int RegionCompare(string regionA, string regionB)
{
  if (RegionValue(regionA) == RegionValue(regionB))
    return 0;
  else if (RegionValue(regionA) > RegionValue(regionB))
    return 1;
  else
    return -1;
}

And the RegionValue helper method looks like so:

static int RegionValue(string region) // smaller listed first
{
  if (region == "north") return 2;
  else if (region == "south") return 4;
  else if (region == "east") return 6;
  else return 8; // "west"
}

Comparing string fields in a non-alphabetic way is rather annoy-ing. The RegionValue method assigns an arbitrary integer value to each possible region value. I use values of 2, 4, 6 and 8, but I could've used pretty much any values as long as the string value that comes first in sort order (north in this case) has the lowest integer value, the string that comes second has the next-largest integer value and so on.

Methods like RegionValue, which map strings to integers, are relatively slow and a good reason to use enum types instead of strings, or to encode such information as integers.

Scenario 2: More than One Sort Order
To define a class that can be sorted in two or more ways and that has no default ordering, you should use the IComparer interface.

Notice that the City class doesn't directly inherit from either IComparable or IComparer. The City class in Listing 4 has three sort order routines, each of which is defined in a nested subclass. Each nested subclass inherits from the IComparer interface and implements a Compare method that defines a sort order.

Sorting a collection of City class objects as defined in Listing 4 could look like:

List<City> list = new List<City>();

list.Add(new City("san diego", 1.2, "west"));
list.Add(new City("phoenix", 1.7, "west"));
(etc.)

list.Sort(new City.SortByNameAscending());

foreach (City c in list) { Console.WriteLine(c.ToString()); }

The List.Sort method is passed an instance of one of the subclasses that defines a sort order, SortByNameAscending in this case; this ordering is used to sort the list collection. If the Compare method in SortByNameAscending is coded so that City objects are ordered solely by name in alphabetical order without regard to population or region, the output produced by the foreach statement would be:

boston  1.0 east
chicago 1.2 north
houston 1.7 west
nashville 1.2 south
phoenix 1.7 west
san diego 1.2 west

The SortByNameAscending subclass is defined as:

public class SortByNameAscending : IComparer<City>
{
  public int Compare(City c1, City c2)
  {
    return String.Compare(c1.name, c2.name);
  }
}

Unlike IComparable CompareTo, which accepts one parameter, the IComparer Compare method accepts two parameters.

Compare returns a value of 0 if the two parameters are equal with regard to sort order, a value greater than 0 (usually +1) if the left parameter comes after the right parameter in sort order, and a value less than 0 (usually -1) if the left parameter comes before the right parameter in sort order. I could've coded the Compare method like so:

if (c1.name == c2.name) return 0;
else if (c1.name > c2.name) return 1;
else if (c1.name < c2.name) return -1;

But because this is exactly how String.Compare works, I can just call String.Compare.

The SortByNameDescending subclass looks like this:

public class SortByNameDescending : IComparer<City>
{
  public int Compare(City c1, City c2)
  {
    return -1 * String.Compare(c1.name, c2.name);
  }
}

Returning -1 times an ascending order statement is a standard trick when coding a descending order method. The SortByPopulationName subclass is given in Listing 5.

The SortByPopulationName subclass houses a Compare method that orders first by city population (ascending), then by city name (alphabetical order). You can implement as many non-default sort-ordering subclasses as you wish. For a class with three data fields, if using both ascending and descending orders on all fields there are a total of 78 distinct sort orderings.

Scenario 3: Multiple Sort Orders and a Default Order
If you want to define a class that has a default sort ordering, and also one or more additional sort orders, you should inherit the class from the IComparable interface, implement CompareTo in the class, and define nested subclasses, each of which inherits from IComparer and implements the Compare method. The outline for a City class that has both a default sort order and three non-default sort orders is shown in Listing 6.

There's no new code in Listing 6 that didn't appear in the Scenario 1 and Scenario 2 code. Essentially, the code for the two scenarios is simply combined.

I did make one small organizational change: I created a wrapper subclass named SortRoutines to hold the non-default sort order subclasses. As a rule of thumb, if my class has three or more sort subclasses, I create a wrapper class for modularity and improved code readability. Using the City class defined in Listing 6 could look like this:

List<City> list = new List<City>();
list.Add(new City("san diego", 1.2, "west"));
(etc.)

list.Sort();
foreach (City c in list) { Console.WriteLine(c.ToString()); }

list.Sort(new City.SortRoutines.SortByNameDescending());
foreach (City c in list) { Console.WriteLine(c.ToString()); }

After instantiating and loading the list of City objects, you can use the default sort ordering by calling the Sort method without an argument, or you can use one of the non-default sort orders by passing an instance of the appropriate subclass to Sort.

Putting It Together
You've seen examples of how to deal with three common object sorting scenarios: a default sort order, multiple sort orderings with no default order, and multiple sort ordering plus a default order. Understanding these three scenarios will allow you to handle other scenarios.

Consider Scenario 1 again. The code example derived a City class from the IComparable interface and implemented the required CompareTo method inside the class. A perfectly reasonable alternative is to define a single subclass that inherits from IComparer, which houses and implements the Compare method, and pass that subclass to Sort.

In short, using a nested subclass that inherits from IComparer and implements the Compare method is the most general approach to object sorting. I presented to you a useful pattern for multilevel comparisons where you first check for equality (returning 0), then use n checks for greater than (returning +1) and then fall to a return of -1. Happy object sorting!

Acknowledgement: Thanks to Paul Koch, a developer at Microsoft Research, for reviewing the technical accuracy of this article.

About the Author

Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at jamccaff@microsoft.com.

comments powered by Disqus

Featured

Subscribe on YouTube