75

I have a sorted array:

[
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

I would like to get something like this but it does not have to be a hash:

[
  {:error => 'FATAL <error title="Request timed out.">', :count => 2},
  {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1}
]

14 Answers

137

The following code prints what you asked for. I'll let you decide how to actually use it to generate the hash you are looking for:

# sample array
a=["aa","bb","cc","bb","bb","cc"]

# make the hash default to 0 so that += will work correctly
b = Hash.new(0)

# iterate over the array, counting duplicate entries
a.each do |v|
  b[v] += 1
end

b.each do |k, v|
  puts "#{k} appears #{v} times"
end

Note: I just noticed you said the array is already sorted. The above code does not require sorting. Using that property may produce faster code.
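Because the input is already sorted, equal entries sit next to each other, so a single pass that groups consecutive runs is enough. A minimal sketch of that idea, assuming Ruby 2.2+ for `itself` (the names `errors` and `counts` are illustrative, not from the question):

```ruby
# Sorted input: identical entries are adjacent.
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# Enumerable#chunk groups consecutive elements that share the block's value,
# yielding [value, run] pairs; the size of each run is the count for that value.
counts = errors.chunk(&:itself).map do |error, run|
  { :error => error, :count => run.size }
end

counts.each { |entry| puts "#{entry[:count]}: #{entry[:error]}" }
```

On unsorted input this would report one entry per run rather than one per distinct value, so it only fits the sorted case.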


4 Comments

I do not actually need to print it, just a hash did the trick. Thanks!
I know I'm late, but, wow. Hash defaults. That's a really cool trick. Thanks!
And if you wanted to find the max occurrence (and do it in a single line): a.inject(Hash.new(0)) {|hash, val| hash[val] += 1; hash}.entries.max_by {|entry| entry.last} ....gotta love it!
You should learn Enumerable to avoid a procedural coding style.
69

You can do this very succinctly (one line) by using inject:

a = ['FATAL <error title="Request timed out.">',
      'FATAL <error title="Request timed out.">',
      'FATAL <error title="There is insufficient ...">']

b = a.inject(Hash.new(0)) {|h,i| h[i] += 1; h }

b.to_a.each {|error,count| puts "#{count}: #{error}" }

Will produce (on Ruby 1.9+, where hashes preserve insertion order):

2: FATAL <error title="Request timed out.">
1: FATAL <error title="There is insufficient ...">

3 Comments

With Ruby 1.9+ you can use each_with_object instead of inject: a.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }.
@Andrew - thanks, I prefer the naming of each_with_object since it better matches other similar method names on ruby enumerables.
Note that each_with_object simplifies the code a little because it doesn't require the accumulator to be the return value of the block.
43

Using Enumerable#tally

["a", "b", "c", "b"].tally 

#=> { "a" => 1, "b" => 2, "c" => 1 }

Note: Only for Ruby versions >= 2.7
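To get from the tally to the array-of-hashes shape shown in the question, the resulting hash can be mapped pair by pair. A small sketch, assuming Ruby 2.7+ (the names `errors` and `report` are illustrative):

```ruby
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# tally returns { element => count }; map turns each pair into a small hash.
report = errors.tally.map { |error, count| { :error => error, :count => count } }

report.each { |entry| puts "#{entry[:count]}: #{entry[:error]}" }
```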

2 Comments

amazing ruby stuff
Wow, and it's not even a Rails thing. Brilliant. You just saved me like 12 lines.
31

If you have an array like this:

words = ["aa","bb","cc","bb","bb","cc"]

where you need to count duplicate elements, a one line solution is:

result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }

Comments

28

A different approach to the answers above, using Enumerable#group_by.

[1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k,v| [k, v.count] }.to_h
# {1=>1, 2=>2, 3=>3, 4=>1}

Breaking that into its different method calls:

a = [1, 2, 2, 3, 3, 3, 4]
a = a.group_by(&:itself) # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
a = a.map { |k,v| [k, v.count] } # [[1, 1], [2, 2], [3, 3], [4, 1]]
a = a.to_h # {1=>1, 2=>2, 3=>3, 4=>1}

Enumerable#group_by was added in Ruby 1.8.7; Object#itself requires Ruby 2.2+ and Array#to_h requires Ruby 2.1+.

2 Comments

I like (&:itself), that's just the right amount of clever!
A one-line as elegant as they come. This should be the accepted answer!
21

How about the following:

things = [1, 2, 2, 3, 3, 3, 4]
things.uniq.map{|t| [t,things.count(t)]}.to_h

It sort of feels cleaner and more descriptive of what we're actually trying to do.

I suspect it would also perform better with large collections than the ones that iterate over each value.

Benchmark Performance test:

a = (1...1000000).map { rand(100)}
                       user     system      total        real
inject                 7.670000   0.010000   7.680000 (  7.985289)
array count            0.040000   0.000000   0.040000 (  0.036650)
each_with_object       0.210000   0.000000   0.210000 (  0.214731)
group_by               0.220000   0.000000   0.220000 (  0.218581)

So it is quite a bit faster.
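The code behind these timings is not shown. A hedged sketch of how such a comparison could be set up with the standard `benchmark` library follows; the lambda bodies are assumptions based on the other answers here, not necessarily what was measured, and the array is smaller so the sketch runs quickly.

```ruby
require 'benchmark'

# Sample data: smaller than the array above so the sketch finishes quickly.
data = Array.new(100_000) { rand(100) }

# Candidate implementations (assumed; the measured code is not shown in the answer).
candidates = {
  'inject'           => -> { data.inject(Hash.new(0)) { |h, v| h[v] += 1; h } },
  'array count'      => -> { data.uniq.map { |v| [v, data.count(v)] }.to_h },
  'each_with_object' => -> { data.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 } },
  'group_by'         => -> { data.group_by(&:itself).map { |k, v| [k, v.count] }.to_h }
}

# Sanity check: every approach must produce the same counts.
results = candidates.transform_values(&:call)
expected = results.values.first
raise 'implementations disagree' unless results.values.all? { |r| r == expected }

# Benchmark.bm prints user, system, total and real time for each labelled report.
Benchmark.bm(18) do |bm|
  candidates.each { |label, code| bm.report(label) { code.call } }
end
```

Note that the sample has only 100 distinct values. `uniq` followed by `count` scans the array once per distinct value, so its advantage may shrink as the number of distinct values grows.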

2 Comments

Doesn't things.uniq and things.count(t) iterate over the array?
Entirely possible it does, under the hood, so perhaps I've described that wrong. Either way, the performance gain appears to be real, I think...
12

With Ruby >= 2.4 you can combine itself (added in 2.2) with Hash#transform_values (added in 2.4): array.group_by(&:itself).transform_values(&:count)

With some more detail:

array = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
];

array.group_by(&:itself).transform_values(&:count)
 => { "FATAL <error title=\"Request timed out.\">"=>2,
      "FATAL <error title=\"There is insufficient system memory to run this query.\">"=>1 }

Comments

8

Personally I would do it this way:

# myprogram.rb
a = ['FATAL <error title="Request timed out.">',
'FATAL <error title="Request timed out.">',
'FATAL <error title="There is insufficient system memory to run this query.">']
puts a

Then run the program and pipe it to uniq -c (which only counts adjacent duplicate lines, so this relies on the array being sorted):

ruby myprogram.rb | uniq -c

Output:

 2 FATAL <error title="Request timed out.">
 1 FATAL <error title="There is insufficient system memory to run this query.">

Comments

3
a = [1,1,1,2,2,3]
a.uniq.inject([]){|r, i| r << { :error => i, :count => a.select{ |b| b == i }.size } }
=> [{:count=>3, :error=>1}, {:count=>2, :error=>2}, {:count=>1, :error=>3}]

2 Comments

Oh, don't do that. You're reiterating through the whole array for each value!
There are good solutions up there. Just want to mention the existence of Array#count: a = [1,1,1,2,2,3]; a.uniq.inject([]){|r, i| r << { :error => i, :count => a.count(i) } }
1

If you want to use this often, I suggest doing this:

# lib/core_extensions/array/duplicates_counter
module CoreExtensions
  module Array
    module DuplicatesCounter
      def count_duplicates
        self.each_with_object(Hash.new(0)) { |element, counter| counter[element] += 1 }.sort_by{|k,v| -v}.to_h
      end
    end
  end
end

Load it with

Array.include CoreExtensions::Array::DuplicatesCounter

And then use from anywhere with just:

the_ar = %w(a a a a a a a  chao chao chao hola hola mundo hola chao cachacho hola)
the_ar.count_duplicates
{
           "a" => 7,
        "chao" => 4,
        "hola" => 4,
       "mundo" => 1,
    "cachacho" => 1
}

Comments

1

Since #tally is for 2.7 and up, and I'm not there yet, it's easy to use the #count method on the array. Use #uniq on the array to get one copy of each member of the array, and then find #count for that member in the array:

counts=Hash.new
arr.uniq.each {|name| counts[name]=arr.count(name) }

Example:

arr = [ 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5]
arr.uniq # => [1, 2, 3, 4, 5]
counts=Hash.new; arr.uniq.each {|name| counts[name]=arr.count(name) }

gives us

counts # => {1=>1, 2=>2, 3=>5, 4=>2, 5=>1}

Comments

0

Simple implementation:

(errors_hash = {}).default = 0
array_of_errors.each { |error| errors_hash[error] += 1 }

1 Comment

That first line could be written more clearly using errors_hash = Hash.new(0)
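Both spellings give the hash a default of 0, which is what lets `+= 1` work on a key that has not been seen yet. A small sketch of that behaviour (the variable names are illustrative):

```ruby
counts = Hash.new(0)

# Reading a missing key returns the default but does not store the key.
counts['timeout']        # returns 0
counts.key?('timeout')   # false

# `counts[k] += 1` expands to `counts[k] = counts[k] + 1`, so the first
# increment reads the default 0 and stores 1.
counts['timeout'] += 1

# Setting the default after creation behaves the same as Hash.new(0).
(other = {}).default = 0
other['timeout'] += 1

puts counts == other
```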
0

Here is the sample array:

a=["aa","bb","cc","bb","bb","cc"]

  1. Select all the unique keys.
  2. For each key, we'll accumulate them into a hash to get something like this: {'bb' => ['bb', 'bb']}

    res = a.uniq.inject({}) {|accu, uni| accu.merge({ uni => a.select{|i| i == uni } })}
    # => {"aa"=>["aa"], "bb"=>["bb", "bb", "bb"], "cc"=>["cc", "cc"]}

Now you are able to do things like:

res['aa'].size 

Comments

-3
def find_most_occurred_item(arr)
    return 'Array has unique elements already' if arr.uniq == arr
    # count the occurrences of each element
    m = arr.inject(Hash.new(0)) { |h,v| h[v] += 1; h }
    # highest count among all elements
    max = m.values.max
    # report every element that reaches the highest count
    m.select { |_, v| v == max }.map { |k, v| "#{k} appears #{v} times" }
end

puts find_most_occurred_item([1, 2, 3,4,4,4,3,3])

Comments
