Ray Toal

Idiomatic word counting script in C#?

My goal is to read from standard input, split the input into words (runs of Unicode letters and apostrophes), count them case-insensitively, and write a report to standard output in which each line is a word, followed by a space, followed by its count. The output should be sorted by word.

This is my code:

using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;

class TraditionalWordCountApp
{
    public static void Main(string[] args)
    {
        SortedDictionary<string, int> counts = new SortedDictionary<string, int>();
        Regex wordRegex = new Regex(@"[\p{L}']+");
        string line;
        while ((line = Console.ReadLine()) != null)
        {
            line = line.ToLower();
            foreach (Match m in wordRegex.Matches(line))
            {
                String word = m.Value;
                int count;
                counts.TryGetValue(word, out count);
                counts[word] = ++count;
            }
        }
        foreach (KeyValuePair<string, int> pair in counts)
        {
            Console.WriteLine("{0} {1}", pair.Key, pair.Value);
        }
    }
}

My concerns:

  • Should things be wrapped in a namespace?
  • Should the class, the Main method, or both be public?
  • Is there a nicer way to declare the sorted dictionary? Java has the diamond operator (<>); does C# have an equivalent?
  • I used the C-style idiom of assigning to a variable and then checking for null, all inside the while-condition! Surely there is a nicer way to do this in C#. Is there? I like neither the embedded assignment nor the billion-dollar-mistake null. :)
  • I searched Stack Overflow a bit for a nicer way to update a word-frequency dictionary. Most languages seem to have a built-in get-with-default method on dictionaries, but the answers I saw said C# has none and recommended TryGetValue. Those answers are old, though. Is there something nicer in the most modern C#?
  • Any nicer way to do the last loop?
  • Would a LINQ-based solution be nicer? More idiomatic? What would it look like? I think (though I am brand new to C#) I could whip something up, but should I? FWIW I do know how to do the LINQ-y stuff in Java but wonder if this is the right way to go in C#.
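
To make the LINQ question concrete, a LINQ-based version might look roughly like the sketch below. It is only a sketch, not a claim about what is idiomatic: the class name LinqWordCountSketch and the helpers ReadLines and CountWords are placeholder names invented here and do not appear in the script above. It assumes the same regex and the same default string ordering that SortedDictionary uses.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class LinqWordCountSketch
{
    // Hypothetical helper (not in the original script): lazily yields
    // lines from the reader until end of input.
    static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Hypothetical helper: lowercases each line, extracts words with the
    // same regex as the original script, groups equal words, and returns
    // (word, count) pairs ordered by word using the default string comparer.
    public static IEnumerable<KeyValuePair<string, int>> CountWords(IEnumerable<string> lines)
    {
        var wordRegex = new Regex(@"[\p{L}']+");
        return lines
            .SelectMany(line => wordRegex.Matches(line.ToLower()).Cast<Match>())
            .GroupBy(m => m.Value)
            .OrderBy(g => g.Key)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()));
    }

    public static void Main()
    {
        foreach (var pair in CountWords(ReadLines(Console.In)))
        {
            Console.WriteLine("{0} {1}", pair.Key, pair.Value);
        }
    }
}
```

The embedded assignment and the null check are still there, but confined to the small ReadLines helper, and the counting itself no longer needs TryGetValue. The trade-off is that GroupBy buffers every match before anything is printed, whereas the dictionary version stores only one entry per distinct word.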