
I have an array of words and I want to get a hash where the keys are the words and the values are their counts.

Is there a more beautiful way than mine:

def word_count(words)
  result = Hash.new(0)  # missing keys default to 0
  words.each { |word| result[word] += 1 }
  result
end
  • Are you doing the Berkeley SaaS course? Commented Feb 28, 2012 at 11:54
  • Yes, I have a solution, but I'm looking for better versions. Commented Feb 28, 2012 at 12:49
  • If result[word] doesn't exist, it'll throw an exception because there's no + for nil. Commented Nov 18, 2014 at 3:26
  • The hash is created with a default value of 0, so a missing key returns 0, not nil. Commented Dec 5, 2018 at 10:54
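To settle the nil-versus-0 question raised in these comments, here is a minimal sketch comparing a plain hash literal with Hash.new(0). The variable names and sample keys are illustrative only.

```ruby
plain = {}
counted = Hash.new(0)

# A plain hash returns nil for a missing key, so nil + 1 raises NoMethodError
# and the assignment never happens.
begin
  plain["ruby"] += 1
rescue NoMethodError => e
  puts "plain hash: #{e.class}"
end

# Hash.new(0) returns the default 0 for a missing key, so += 1 works.
counted["ruby"] += 1
puts counted["ruby"]          # => 1

# The default is returned on lookup but is NOT stored as a key.
puts counted["perl"]          # => 0
puts counted.key?("perl")     # => false
```

So the question's code is safe: the default only kicks in on reads, and += writes the incremented value back under the key.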

5 Answers

60

The imperative approach you used is probably the fastest implementation in Ruby. With a bit of refactoring, you can write a one-liner:

wf = Hash.new(0).tap { |h| words.each { |word| h[word] += 1 } }
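Object#tap yields the receiver to the block and then returns the receiver itself, which is why this one-liner evaluates to the hash rather than to the return value of each. A minimal sketch, with a sample words array assumed for illustration:

```ruby
words = %w[apple banana apple cherry banana apple]

# tap yields the new hash as h, ignores the block's return value,
# and returns h, so wf ends up being the filled-in hash.
wf = Hash.new(0).tap { |h| words.each { |word| h[word] += 1 } }

puts wf["apple"]   # => 3
puts wf["banana"]  # => 2
puts wf["cherry"]  # => 1
```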

Another imperative approach using Enumerable#each_with_object:

wf = words.each_with_object(Hash.new(0)) { |word, acc| acc[word] += 1 }
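Unlike inject, each_with_object passes the same memo object to every iteration and returns it at the end, so the block does not have to return the accumulator. Note the argument order: the element comes first, then the memo. A sketch with assumed sample data, comparing the two:

```ruby
words = %w[to be or not to be]

# each_with_object: block args are (element, memo); the memo is returned.
wf = words.each_with_object(Hash.new(0)) { |word, acc| acc[word] += 1 }

# The equivalent inject/reduce takes (memo, element) and must return the memo.
wf_inject = words.inject(Hash.new(0)) { |acc, word| acc[word] += 1; acc }

puts wf == wf_inject  # => true
puts wf["to"]         # => 2
```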

A functional/immutable approach using existing abstractions:

wf = words.group_by(&:itself).map { |w, ws| [w, ws.length] }.to_h

Note that this is still O(n) in time, but it makes three passes (group_by, map and to_h) and creates two intermediate collections along the way (the grouped hash and the array of pairs).
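To make those passes and intermediate collections visible, here is the same pipeline split into separate steps. The sample data is assumed; &:itself needs Ruby 2.2 or later.

```ruby
words = %w[a b a c b a]

# Pass 1: group_by builds a hash mapping each word to an array of its occurrences.
groups = words.group_by(&:itself)

# Pass 2: map turns each group into a [word, count] pair (an array of arrays).
pairs = groups.map { |w, ws| [w, ws.length] }

# Pass 3: to_h converts the pairs back into a hash.
wf = pairs.to_h

puts wf["a"]  # => 3
```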

Finally, a frequency counter (histogram) is a common abstraction that you'll find in some libraries, such as Facets with Enumerable#frequency.

require 'facets'
wf = words.frequency
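If you would rather not add the Facets dependency, a small helper gives the same kind of result. The method below is a hypothetical local stand-in, not Facets' actual implementation, and it is written as a plain method rather than a patch to Enumerable:

```ruby
# Hypothetical stand-in for Facets' Enumerable#frequency: counts each item.
def frequency(enum)
  enum.each_with_object(Hash.new(0)) { |item, counts| counts[item] += 1 }
end

words = %w[red green red blue red]
wf = frequency(words)

puts wf["red"]   # => 3
puts wf["blue"]  # => 1
```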

5 Comments

Maybe simply str.split(" ").reduce(Hash.new(0)) { |h,w| h[w] += 1; h }?
Some pinch-of-salt speed testing, Ruby 2.0.0p451 on a MacBook running Mavericks: Declarative: 100.times { words.inject(Hash.new 0) { |h, w| h[w] += 1; h } }: avg 1.17s. Imperative: 100.times { hist = Hash.new 0; words.each { |w| hist[w] += 1 } }: avg 1.09s. words was an array of 10k random words, and generating the array alone took 0.2s on average; subtracting that, imperative was about 9% faster.
Thank you for the last note about Facets. I've re-implemented this several times now, and Facets saves me the trouble of redoing it or starting my own standard lib. For others: check out Facets; it's like an extension of Ruby's standard library.
Great answer. I prefer the readability of group_by(&:itself)
Also, each_with_object fits better here than reduce IMO.
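The speed comparison in the comments above was timed by hand. A minimal sketch for repeating it with the standard library's Benchmark.bm follows. The random word generation (10,000 words of 3 to 6 letters) is an assumption chosen to resemble that comment's setup, and timings will vary by machine and Ruby version.

```ruby
require "benchmark"

# Assumed test data: 10_000 random lowercase "words" of length 3..6.
letters = ("a".."z").to_a
words = Array.new(10_000) { letters.sample(rand(3..6)).join }

results = {}
Benchmark.bm(18) do |x|
  x.report("inject:") do
    100.times { results[:inject] = words.inject(Hash.new(0)) { |h, w| h[w] += 1; h } }
  end
  x.report("each:") do
    100.times do
      hist = Hash.new(0)
      words.each { |w| hist[w] += 1 }
      results[:each] = hist
    end
  end
  x.report("each_with_object:") do
    100.times { results[:ewo] = words.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 } }
  end
end

# All three approaches must agree before their timings are worth comparing.
puts results.values.uniq.size == 1
```

Generating the data outside the timed blocks avoids the need to subtract generation time afterwards.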
17

Posted on a related question, but posting here for visibility as well:

Ruby 2.7 and later provide the Enumerable#tally method, which solves this.

From the trunk documentation:

Tallies the collection. Returns a hash where the keys are the elements and the values are numbers of elements in the collection that correspond to the key.

["a", "b", "c", "b"].tally #=> {"a"=>1, "b"=>2, "c"=>1}
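On Rubies older than 2.7, tally is not available. A minimal sketch that uses tally when it exists and otherwise falls back to the each_with_object counter; the helper name word_tally is hypothetical:

```ruby
# Hypothetical helper: prefer Enumerable#tally (Ruby 2.7+), otherwise count manually.
# Caveat: the hash from tally has no default (missing keys give nil),
# while the fallback hash returns 0 for missing keys.
def word_tally(words)
  if words.respond_to?(:tally)
    words.tally
  else
    words.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }
  end
end

counts = word_tally(%w[a b c b])
puts counts["b"]  # => 2
```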


7

With inject:

str = 'I have array of words and I want to get a hash, where keys are words'
result = str.split.inject(Hash.new(0)) { |h,v| h[v] += 1; h }

=> {"I"=>2, "have"=>1, "array"=>1, "of"=>1, "words"=>2, "and"=>1, "want"=>1, "to"=>1, "get"=>1, "a"=>1, "hash,"=>1, "where"=>1, "keys"=>1, "are"=>1}

I don't know about the efficiency.
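The trailing ; h in that block is required: inject uses the block's return value as the accumulator for the next element, and h[v] += 1 evaluates to the new count, not to the hash. A sketch with assumed sample data showing both the working form and the failure without it:

```ruby
words = %w[x y x]

# Correct: the block returns h, so the same hash is passed to the next iteration.
ok = words.inject(Hash.new(0)) { |h, v| h[v] += 1; h }
puts ok["x"]  # => 2

# Broken: h[v] += 1 evaluates to the new count (an Integer), so on the second
# iteration h is 1 and the lookup 1["y"] raises instead of counting.
error = nil
begin
  words.inject(Hash.new(0)) { |h, v| h[v] += 1 }
rescue StandardError => e
  error = e
end
puts error.class
```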

2 Comments

According to the documentation of the Facets method posted by tokland, inject is slower.
Also, if you use inject and you need to return the object at the end of the block like above (; h), you should use each_with_object instead.
3

This one is elegant:

  words.group_by(&:itself).transform_values(&:count)
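Hash#transform_values (Ruby 2.4+) keeps the keys and maps only the values, so it replaces the map { ... }.to_h step of the group_by version in the first answer, and &:count turns each group array into its size. A sketch with assumed sample data checking that the two forms agree:

```ruby
words = %w[cat dog cat bird cat]

# group_by(&:itself) buckets equal words; transform_values replaces each
# bucket (an Array) with its element count.
wf = words.group_by(&:itself).transform_values(&:count)

# The map/to_h pipeline from the first answer, for comparison.
via_map = words.group_by(&:itself).map { |w, ws| [w, ws.length] }.to_h

puts wf == via_map  # => true
puts wf["cat"]      # => 3
```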


2
irb(main):001:0> %w(foo bar foo bar).each_with_object(Hash.new(0)) { |w, m| m[w] += 1 }
=> {"foo"=>2, "bar"=>2}

as @mfilej said

