I opted for the latter: create a new empty hash and Hash#merge the result onto it, to be 100% sure that the default_proc as well as any internal flags are reset to their defaults:
def gruppiere
  return enum_for(__callee__) { size if respond_to?(:size) } unless block_given?
  {}.merge(
    each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |element, result|
      key = yield element
      result[key] << element
    end
  )
end
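To illustrate why the merge step matters, here is a minimal sketch (the variable names are mine, purely for illustration) showing that merging onto a fresh hash literal keeps the entries but drops the default_proc:

```ruby
# A hash with a default_proc creates entries as a side effect of a lookup.
with_default = Hash.new { |hash, key| hash[key] = [] }
with_default[:a] << 1

# Merging onto a fresh literal copies the entries; the result keeps the
# receiver's (absent) default_proc, not the argument's.
plain = {}.merge(with_default)

plain.default_proc      # nil
plain[:missing]         # nil, and no entry is created
with_default[:missing]  # [], and :missing is now a key of with_default
```

Without the merge, a caller who looks up a group that doesn't exist would silently add an empty group to the result.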
# Hash#fetch
There is actually a better option than using a default_proc. Hash#fetch will get the value corresponding to the key if the key exists and otherwise return a value of our choosing:
def gruppiere
  return enum_for(__callee__) { size if respond_to?(:size) } unless block_given?
  each_with_object({}) do |element, result|
    key = yield element
    result[key] = result.fetch(key, []) << element
  end
end
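As a quick sketch of the Hash#fetch behaviour this relies on (the hash and keys here are made up for illustration): fetch returns the fallback for a missing key but does not store it, which is why the method above assigns the result back into the hash.

```ruby
groups = { a: [1] }

groups.fetch(:a, [])   # [1]: existing key, the stored array is returned
groups.fetch(:b, [])   # []:  missing key, the fallback is returned ...
groups.key?(:b)        # false: ... but nothing is stored

# Hence the explicit assignment in the grouping method:
groups[:b] = groups.fetch(:b, []) << 2

# fetch also bypasses any default_proc entirely.
with_default = Hash.new { |hash, key| hash[key] = [] }
with_default.fetch(:x, :fallback)  # :fallback
with_default.key?(:x)              # false
```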
module EnumerableGruppiereExtension
  def gruppiere
    return enum_for(__callee__) { size if respond_to?(:size) } unless block_given?
    each_with_object({}) do |element, result|
      key = yield element
      # fetch returns [] for a missing key without storing it, so assign the result back.
      result[key] = result.fetch(key, []) << element
    end
  end
end
module EnumerableWithGruppiere
  refine Enumerable do
    # Refinement#include was removed in Ruby 3.2; on Ruby 3.1+ use
    # import_methods EnumerableGruppiereExtension instead.
    include EnumerableGruppiereExtension
  end
end
using EnumerableWithGruppiere
puts [1, 2, 3, 4].gruppiere(&:even?)
#=> { false => [1, 3], true => [2, 4] }
It is, however, not easy to program in a functional way in Ruby. Neither the core and standard library data structures nor the core and standard library algorithms really lend themselves to Functional Programming.
Here is a purely functional version that does not use mutation, side-effects, or looping:
def gruppiere
  return enum_for(__callee__) { size if respond_to?(:size) } unless block_given?
  inject({}) do |result, element|
    key = yield element
    result.merge({ key => result.fetch(key, []) + [element] })
  end
end
Now, you might ask yourself: that actually doesn't look that bad. Why did I say that Ruby is not amenable to Functional Programming?
The reason for this is performance.
Because Hash and Array are mutable, operations such as Hash#merge and Array#+ can only be implemented by copying the entire data structure. If Hash and Array were immutable, as they are in a collections library for a functional language, these operations could instead be implemented using what is called structural sharing: Hash#merge and Array#+ would not return a full copy of the original, only the updated data plus a reference to the old version. This is much more efficient.
For example, here is what the same code would look like in Scala:
extension [A](seq: Iterable[A])
  def gruppiere[B](classifier: A => B): Map[B, Iterable[A]] =
    seq.foldLeft(Map.empty[B, IndexedSeq[A]]) { (result, element) =>
      val key = classifier(element)
      result.updated(key, result.getOrElse(key, IndexedSeq.empty[A]) :+ element)
    }
Iterable(1, 2, 3).gruppiere { _ % 2 == 0 }
//=> Map(false -> Vector(1, 3), true -> Vector(2))
As you can see, it looks more or less identical. Some names are different (e.g. foldLeft instead of inject, getOrElse instead of fetch, etc.), and there are some static type annotations. But other than that, it is the same code. The main difference is in the performance: Map.updated does not copy the map; it returns a map which shares all its data with the original except the one updated key-value pair. The same applies to IndexedSeq.:+ (an alias for IndexedSeq.appended).
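To make structural sharing a bit more concrete in Ruby terms, here is a minimal sketch of an immutable singly linked list. Node and with_head are hypothetical names of my own, not part of Ruby's core or standard library, and this is only an illustration of the idea, not how Scala's collections are actually implemented:

```ruby
# Hypothetical minimal persistent (immutable) list, for illustration only.
Node = Struct.new(:head, :tail) do
  # Returns a new list with value in front. The receiver becomes the tail
  # of the new list as-is; nothing is copied.
  def with_head(value)
    self.class.new(value, self).freeze
  end
end

old_list = Node.new(2, Node.new(3, nil).freeze).freeze
new_list = old_list.with_head(1)

new_list.tail.equal?(old_list)  # true: the old list is reused, not copied
old_list.head                   # 2: the old version is still intact

# Array#+, by contrast, has to build a new array holding every element.
old_array = [2, 3]
new_array = [1] + old_array
new_array.equal?(old_array)     # false
```

Sharing like this is only safe because the nodes are frozen: nobody can change old_list behind new_list's back.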