My goal is to read from standard input, split the input into words (runs of Unicode letters and apostrophes), count them case-insensitively, and write a report to standard output in which each line is a word, followed by a space, followed by its count. The output should be sorted by word.
This is my code:
    using System;
    using System.Text.RegularExpressions;
    using System.Collections.Generic;

    class TraditionalWordCountApp
    {
        public static void Main(string[] args)
        {
            SortedDictionary<string, int> counts = new SortedDictionary<string, int>();
            Regex wordRegex = new Regex(@"[\p{L}']+");
            string line;
            while ((line = Console.ReadLine()) != null)
            {
                line = line.ToLower();
                foreach (Match m in wordRegex.Matches(line))
                {
                    String word = m.Value;
                    int count;
                    counts.TryGetValue(word, out count);
                    counts[word] = ++count;
                }
            }
            foreach (KeyValuePair<string, int> pair in counts)
            {
                Console.WriteLine("{0} {1}", pair.Key, pair.Value);
            }
        }
    }

My concerns:
- Should things be wrapped in a namespace?
- Should either the class or the `Main` function or both be `public`?
- Is there a nicer way to declare the sorted dictionary? Java has `<>`, does C#?
- I used the C-style idiom of assigning to a variable and then checking for null, all inside the while-condition! Surely there is a nicer way to do this in C#. Is there? I like neither the embedded assignment nor the billion-dollar-mistake `null`. :)
- I searched StackOverflow a bit for a nicer way to update a word frequency dictionary. Most languages seem to have a built-in get-with-default method on dictionaries, but the answers I saw said no, and they recommended `TryGetValue`. These answers are old, though. Is there something nicer in the most modern C#?
- Any nicer way to do the last loop?
- Would a LINQ-based solution be nicer? More idiomatic? What would it look like? I think (though I am brand new to C#) I could whip something up, but should I? FWIW I do know how to do the LINQ-y stuff in Java but wonder if this is the right way to go in C#.
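
To make the LINQ question concrete, here is a minimal sketch of what such a version might look like. It is only a starting point for discussion, not a claim about the idiomatic answer. The class name `LinqWordCount` and the helper `CountWords` are hypothetical names introduced for illustration. The sketch reads all of standard input at once with `Console.In.ReadToEnd()` instead of looping line by line, which is a different trade-off (more memory, no embedded assignment or null check). It also assumes C# 6 or later for string interpolation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class LinqWordCount
{
    // Hypothetical helper: returns (word, count) pairs sorted by word.
    public static IEnumerable<KeyValuePair<string, int>> CountWords(string text)
    {
        return Regex.Matches(text.ToLower(), @"[\p{L}']+")
            .Cast<Match>()          // MatchCollection is non-generic on older frameworks
            .GroupBy(m => m.Value)  // one group per distinct lowercased word
            .OrderBy(g => g.Key)    // default string comparer, as SortedDictionary uses
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()));
    }

    public static void Main()
    {
        // Assumption: the whole input fits comfortably in memory.
        foreach (var pair in CountWords(Console.In.ReadToEnd()))
        {
            Console.WriteLine($"{pair.Key} {pair.Value}");
        }
    }
}
```

Because the regex never matches a newline, matching against the whole input should find the same words as matching line by line, so the report should be the same as the loop-based version for input that fits in memory.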