Skip to main content
2 of 5
added 597 characters in body
Adriano Repetti
  • 10.6k
  • 1
  • 24
  • 48
  1. It's not mandatory but it's a widely accepted guideline. Break it only if you have a good reason to do so.
  2. No, they don't. Class should be internal (default) and static (because it has not instance methods). Main() method should be private to make clear that it won't be called outside this class but it's not mandatory (someone prefers public because it's logically called from outside).

For the other points things are more complex that a paragraph...

You can create a generator to read from standard input:

private static IEnumerable<string> ReadAllInputs()
{
    while (true)
    {
        string line = Console.ReadLine();
        if (line == null)
            break;

        yield return line;
    }
}

You do not need ToLower() on input, I do not even start to mention all the problems of case insensitive comparison if not done properly: a lower case string may be different from an upper case string (even in number of characters) but they may be compared equal. Don't put yourself in trouble and let the framework do the work for you:

var counts = new SortedDictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);

Note that you do not need a SortedDictionary here, ordering may be done (in this case) when printing output. If performance matters then you should profile by yourself.

When you increment word count you also increment local variable count, not a big performance hit (and compiler may be smart enough to elide local variable) but it's about code clarity. You can do:

counts[word] = count + 1;

Or simply:

++counts[word];

Note that this innocent code ++counts[word] (as well as original counts[word] = ++count) when word doesn't exist in dictionary (TryGetValue returns false) will perform another lookup vanishing part of the benefit of TryGetValue for the first insertion! You have to explicitly call Add().

If you want you can refactor foreach loop to print results with LINQ, not a big gain (IMO), let's see how:

Console.WriteLine(String.Join(Environment.NewLine,
    counts.Select(x => $"{x.Key} {x.Value}")));

If we put everything together with then have something like this:

namespace Test {
    static class TraditionalWordCountApp {
        static void Main(string[] args) {
            var counts = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
            var wordRegex = new Regex(@"(?i)[\p{L}']+");

            foreach (var match in ReadAllInputs().SelectMany(x => wordRegex.Match(x))) {
                if (counts.TryGetValue(match.Value, out count))
                    counts[word] = count + 1;
                else
                    counts.Add(match.Value, 1);    
            }

            Console.WriteLine(String.Join(Environment.NewLine,
                counts.OrderBy(x => x.Key).Select(x => $"{x.Key} {x.Value}")));
        }

        static IEnumerable<string> ReadAllInputs() {
            while (true) { // Yes, in C# this is "safe"
                string line = Console.ReadLine();
                if (line == null)
                    break;

                // No need to match any regex against an empty line
                if (line.Length == 0)
                    continue; 

                yield return line;
            }
        }
    }
}

All these done you may think that...there should be another way to do it. You can use GroupBy(). Profile both versions to see which one has better performance (and the one you feel more easy to understand):

var counts = ReadAllInputs()
    .SelectMany(x => regex.Match(x))
    .GroupBy(x => x.Value, StringComparer.CurrentCultureIgnoreCase)
    .Select(x => new { Word = x.Key, Count = x.Count() })
    .OrderBy(x => x.Word);

Then you print it with:

Console.WriteLine(String.Join(Environment.NewLine,
    counts.Select(x => $"{x.Word} {x.Count}")));
Adriano Repetti
  • 10.6k
  • 1
  • 24
  • 48