I needed to group the same dataset in several different ways. Instead of querying the dataset repeatedly, I wrote an extension method that does all the grouping in a single pass. The caveat is that the results are materialized in dictionaries, because I didn't manage to find a way to avoid that. Maybe you can?
```csharp
using System;
using System.Collections.Generic;

// Extension methods must live in a non-generic static class; this class name is hypothetical.
public static class MultiGroupingExtensions
{
    public static IDictionary<string, Dictionary<object, HashSet<T>>> MultiGroupBy<T>(this IEnumerable<T> source, params (string Label, Func<T, object> Getter)[] groupers)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (groupers == null) throw new ArgumentNullException(nameof(groupers));

        IDictionary<string, Dictionary<object, HashSet<T>>> results = new Dictionary<string, Dictionary<object, HashSet<T>>>();

        using (var enumer = source.GetEnumerator())
        {
            // Single pass over the source: each element is offered to every grouper.
            while (enumer.MoveNext())
            {
                foreach ((var label, var func) in groupers)
                {
                    // One dictionary of buckets per label, created on first use.
                    if (!results.TryGetValue(label, out var dict))
                    {
                        dict = new Dictionary<object, HashSet<T>>();
                        results[label] = dict;
                    }

                    // One bucket per key, created on first use.
                    var key = func(enumer.Current);
                    if (!dict.TryGetValue(key, out var set))
                    {
                        set = new HashSet<T>();
                        dict[key] = set;
                    }
                    set.Add(enumer.Current);
                }
            }
        }
        return results;
    }
}
```
Use case:
```csharp
static void TestMultiGrouping()
{
    string[] data =
    {
        "Black",
        "White",
        "Yellow",
        "green",
        "Red",
        "blue",
        "cyan",
        "Magenta",
        "Orange"
    };

    foreach (var result in data.MultiGroupBy(
        ("First UCase", s => s.Length > 0 && char.IsUpper(s[0])),
        ("Length", s => s.Length),
        ("Length Four", s => s.Length == 4),
        ("Contains 'e'", s => s.Contains('e')),
        ("Num n's", s => s.Count(c => c == 'n'))))
    {
        Console.WriteLine($"Results for {result.Key}:");
        foreach (var dict in result.Value)
        {
            Console.WriteLine($"{dict.Key}: {dict.Value.Count} [{(string.Join(", ", dict.Value))}]");
        }
        Console.WriteLine();
    }
}
```
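To make concrete what a single pass saves, here is a small sketch of the repeated-query approach the question avoids: one `GroupBy` call per criterion. The instrumented iterator `Instrumented` and the counter `Passes` are hypothetical, added only to count how often the source is enumerated; with three `GroupBy(...).ToList()` calls, the source is enumerated three times.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RepeatedQueryDemo
{
    // Number of times a fresh enumeration of the source has started (hypothetical instrumentation).
    public static int Passes;

    // Hypothetical instrumented source: the iterator body, including the increment,
    // runs once each time a new enumeration of the returned sequence begins.
    static IEnumerable<string> Instrumented(string[] data)
    {
        Passes++;
        foreach (var s in data)
            yield return s;
    }

    public static int CountPasses(string[] data)
    {
        Passes = 0;
        var source = Instrumented(data); // deferred: nothing is enumerated yet

        // One GroupBy per criterion; each ToList() forces its own pass over `source`.
        var byFirstUpper = source.GroupBy(s => s.Length > 0 && char.IsUpper(s[0])).ToList();
        var byLength = source.GroupBy(s => s.Length).ToList();
        var byContainsE = source.GroupBy(s => s.Contains("e")).ToList();

        return Passes;
    }

    public static void Main()
    {
        string[] data = { "Black", "White", "Yellow", "green", "Red", "blue", "cyan", "Magenta", "Orange" };
        Console.WriteLine($"Passes over the source: {CountPasses(data)}");
    }
}
```

By contrast, `MultiGroupBy` pulls from `source` only once, at the price of holding every group in memory.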
…IObservable) rather than a pull interface (IEnumerable). Rather than returning a dictionary mapping keys to sets of elements, you can try returning a dictionary mapping keys to streams of elements.

…IEnumerable) was given. Maybe the title of the question is a little misleading: the primary goal was performance optimization rather than "one iteration", but I thought that "one iteration" was the way to optimize, which it still may be if memory allocation matters.

…IEnumerable.ToObservable(), which would be a perfect fit. I think the performance characteristics would be pretty good. With a cold observable (one that doesn't do anything until there's a subscriber), nothing happens until someone is interested in the result, and the elements won't all have to be held in memory at the same time.
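The comments suggest pushing elements to consumers instead of returning materialized sets. As a rough, dependency-free sketch of that idea (plain callbacks standing in for `IObservable<T>`; this is not the Rx API), the hypothetical `MultiGroupPush` below enumerates the source once and hands each element, together with each grouper's key, to that grouper's callback. A consumer that only needs counts per key, like the hypothetical `CountGroups`, then never stores the elements themselves.

```csharp
using System;
using System.Collections.Generic;

public static class PushGrouping
{
    // Hypothetical helper (not from the question or from Rx): enumerates `source`
    // exactly once and pushes each element, with the key computed by each grouper,
    // to that grouper's callback. Nothing is buffered here; callbacks decide what to keep.
    public static void MultiGroupPush<T>(
        this IEnumerable<T> source,
        params (Func<T, object> Getter, Action<object, T> OnElement)[] groupers)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (groupers == null) throw new ArgumentNullException(nameof(groupers));

        foreach (var item in source)
        {
            foreach (var (getter, onElement) in groupers)
            {
                onElement(getter(item), item);
            }
        }
    }

    // Adds one to the running count for `key`; only counts are stored, not elements.
    static void Increment(Dictionary<object, int> counts, object key) =>
        counts[key] = counts.TryGetValue(key, out var n) ? n + 1 : 1;

    // Hypothetical consumer: counts elements per key for two criteria in one pass.
    public static (Dictionary<object, int> ByLength, Dictionary<object, int> ByFirstUpper) CountGroups(IEnumerable<string> data)
    {
        var byLength = new Dictionary<object, int>();
        var byFirstUpper = new Dictionary<object, int>();

        data.MultiGroupPush(
            (s => s.Length, (key, s) => Increment(byLength, key)),
            (s => s.Length > 0 && char.IsUpper(s[0]), (key, s) => Increment(byFirstUpper, key)));

        return (byLength, byFirstUpper);
    }

    public static void Main()
    {
        string[] data = { "Black", "White", "Yellow", "green", "Red", "blue", "cyan", "Magenta", "Orange" };
        var (byLength, byFirstUpper) = CountGroups(data);
        foreach (var kv in byLength) Console.WriteLine($"Length {kv.Key}: {kv.Value}");
        foreach (var kv in byFirstUpper) Console.WriteLine($"First UCase {kv.Key}: {kv.Value}");
    }
}
```

The trade-off is that every consumer must be attached before the single pass starts; a caller that later wants the actual elements of a group has to collect them in its callback, which brings the materialization back.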