Cook Computing

Splitting a String

April 15, 2009 Written by Charles Cook

I recently investigated a bug where someone had tried to split a string into space-separated words using String.Split in C#. The code failed to handle leading and trailing spaces, and runs of more than one space, in the string:


var input = " 1 22  3333  ";
var words = input.Split(' ');
foreach (var word in words) Console.Write("[" + word + "] ");

// outputs [] [1] [22] [] [3333] [] []

Not a particularly interesting bug, but investigating it did throw up a couple of mildly interesting points. I first thought I would try using Regex.Split, which collapses runs of spaces but still returns empty strings for the leading and trailing spaces:


var words = Regex.Split(input, " +");

// outputs [] [1] [22] [3333] []
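
For reference, trimming the input before calling Regex.Split is enough to remove the empty entries at either end. This is only a minimal sketch: the class and method names (TrimSplit, Split) are made up for illustration and are not from the original code.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class TrimSplit
{
    public static string[] Split(string input)
    {
        // Trim(' ') strips leading/trailing spaces so that Regex.Split
        // has nothing to split on at the ends of the string.
        return Regex.Split(input.Trim(' '), " +");
    }
}
```

Calling TrimSplit.Split(" 1 22  3333  ") should give the three words with no empty entries. Note that an input consisting only of spaces still yields a single empty string, because Regex.Split returns the whole (empty) input when the pattern does not match.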

Ignoring the fact I could just call String.Trim to fix this, I tried using Regex.Matches. The interesting point here is that this doesn't compile:


var words = from match in Regex.Matches(input, @"[^ ]+") select match.ToString();

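This is because, in the .NET Framework, the class MatchCollection, the type returned from Regex.Matches, implements only the non-generic IEnumerable and not IEnumerable<T>, so the LINQ query operators are not available on it. The fix is to give the range variable an explicit type, which adds a cast for each element returned from the MatchCollection enumeration: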


var words = from Match match in Regex.Matches(input, @"[^ ]+") select match.ToString();
foreach (var word in words) Console.Write("[" + word + "] ");

// outputs [1] [22] [3333]
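
A query expression with an explicitly typed range variable is shorthand for a call to Enumerable.Cast<T>(), so the same fix can be written in method syntax. The sketch below shows that equivalent form; the names MatchWords and Split are hypothetical, chosen only for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class MatchWords
{
    public static List<string> Split(string input)
    {
        // Cast<Match>() converts the non-generic IEnumerable into
        // IEnumerable<Match>, which is what "from Match match in ..."
        // is translated to by the compiler.
        return Regex.Matches(input, @"[^ ]+")
                    .Cast<Match>()
                    .Select(match => match.ToString())
                    .ToList();
    }
}
```

Cast<Match>() throws InvalidCastException if an element is not a Match, which cannot happen here because a MatchCollection only ever contains Match objects.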

The other interesting point is that I noticed there is no ForEach extension method in LINQ. For example, with words being the IEnumerable<string> produced by the query above, this doesn't compile:


words.ForEach(word => Console.Write("[" + word + "] "));

I suppose this is the case because LINQ is a functional style of programming and so its operators should be side-effect free. ForEach does not fit in with this, since its whole purpose is the side effects of the action passed to it. If you want to use ForEach in this way you can implement your own ForEach extension:


using System;
using System.Collections.Generic;

public static class MyExtensions
{
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (T item in source)
        {
            action(item);
        }
    }
}
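
An alternative that needs no custom extension is to materialize the sequence first, because List<T> has its own ForEach instance method. The sketch below assumes that copying the sequence into a list is acceptable; ForEachDemo and Format are hypothetical names, and the output is collected in a StringBuilder rather than written to the console so that the result can be inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration only.
public static class ForEachDemo
{
    public static string Format(IEnumerable<string> words)
    {
        var output = new StringBuilder();
        // ToList() copies the sequence into a List<string>, whose built-in
        // ForEach method accepts the same kind of lambda.
        words.ToList().ForEach(word => output.Append("[" + word + "] "));
        return output.ToString();
    }
}
```

The trade-off is an extra allocation for the list, which the custom extension above avoids by iterating the source directly.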

Going back to the original problem, I discovered that String.Split has an overload taking a StringSplitOptions value, which solves the problem:


var words = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var word in words) Console.Write("[" + word + "] ");

// outputs [1] [22] [3333]