How to count duplicate elements in a Ruby array

Question

I have a sorted array:

[
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

I would like to get something like this but it does not have to be a hash:

[
  {:error => 'FATAL <error title="Request timed out.">', :count => 2},
  {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1}
]

nimrodm · Accepted Answer · 2009-02-20 14:39:02Z

137

The following code prints what you asked for. I'll let you decide how to use it to generate the hash you are looking for:

# sample array
a=["aa","bb","cc","bb","bb","cc"]

# make the hash default to 0 so that += will work correctly
b = Hash.new(0)

# iterate over the array, counting duplicate entries
a.each do |v|
  b[v] += 1
end

b.each do |k, v|
  puts "#{k} appears #{v} times"
end

Note: I just noticed you said the array is already sorted. The above code does not require sorting. Using that property may produce faster code.
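As a rough sketch of how the sorted property could be used: in a sorted array equal elements sit next to each other, so each run can be collapsed into one entry without building a lookup hash. The helper name count_runs is made up for this illustration, and chunk_while assumes Ruby 2.3 or later. Whether this is actually faster than the hash approach would need measuring.

```ruby
# Hypothetical helper: counts runs of equal, adjacent elements.
# It only gives per-value totals if the input is sorted (or at least grouped).
def count_runs(sorted)
  sorted.chunk_while { |a, b| a == b }
        .map { |run| { :error => run.first, :count => run.size } }
end

errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

count_runs(errors).each { |entry| puts "#{entry[:count]}: #{entry[:error]}" }
```

On an unsorted array the same value can show up in several runs, so the Hash.new(0) version above is the safer default.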

answered Feb 20, 2009 at 14:39

nimrodm

4 Comments

Željko Filipin Over a year ago

I do not actually need to print it, just a hash did the trick. Thanks!

Matchu Over a year ago

I know I'm late, but, wow. Hash defaults. That's a really cool trick. Thanks!

codecraig Over a year ago

And if you wanted to find the max occurrence (and do it in a single line): a.inject(Hash.new(0)) {|hash, val| hash[val] += 1; hash}.entries.max_by {|entry| entry.last} ....gotta love it!

phil pirozhkov Over a year ago

You should learn Enumerable to avoid procedure coding style.

vladr · Accepted Answer · 2011-09-27 13:47:32Z

69

You can do this very succinctly (one line) by using inject:

a = ['FATAL <error title="Request timed out.">',
      'FATAL <error title="Request timed out.">',
      'FATAL <error title="There is insufficient ...">']

b = a.inject(Hash.new(0)) {|h,i| h[i] += 1; h }

b.to_a.each {|error,count| puts "#{count}: #{error}" }

This produces the following on Ruby 1.9+, where hashes keep insertion order (on Ruby 1.8 the order may differ):

2: FATAL <error title="Request timed out.">
1: FATAL <error title="There is insufficient ...">
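If the array-of-hashes shape from the question is wanted, the counts hash can be mapped into it. A small sketch; the method name to_error_counts is made up here and the sample strings are placeholders:

```ruby
# Hypothetical helper: turns a list into [{:error => ..., :count => ...}, ...]
def to_error_counts(list)
  # Same inject-based counting as above.
  counts = list.inject(Hash.new(0)) { |h, i| h[i] += 1; h }
  # Each [key, value] pair of the hash becomes one small hash.
  counts.map { |error, count| { :error => error, :count => count } }
end

sample = ["timeout", "timeout", "out of memory"]
p to_error_counts(sample)
```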

edited Sep 27, 2011 at 13:47

user142162

answered Feb 21, 2009 at 2:17

vladr

3 Comments

Andrew Marshall Over a year ago

With Ruby 1.9+ you can use each_with_object instead of inject: a.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }.

Matt Huggins Over a year ago

@Andrew - thanks, I prefer the naming of each_with_object since it better matches other similar method names on ruby enumerables.

the Tin Man Over a year ago

Note that each_with_object simplifies the code a little because it doesn't require the accumulator to be the return value of the block.
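To make that difference concrete, here is a side-by-side sketch. The two method names are made up for the comparison; the bodies are the one-liners from this thread.

```ruby
# Hypothetical helpers, named here only for comparison.

# inject: the block's return value becomes the next accumulator,
# so the hash has to be returned explicitly with "; h".
def count_with_inject(list)
  list.inject(Hash.new(0)) { |h, o| h[o] += 1; h }
end

# each_with_object: the same hash is passed to every call and the
# block's return value is ignored. Note the swapped block arguments.
def count_with_each_with_object(list)
  list.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }
end

a = ["aa", "bb", "cc", "bb", "bb", "cc"]
puts count_with_inject(a) == count_with_each_with_object(a) # prints true
```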

Santhosh · Accepted Answer · 2021-02-24 16:47:10Z

43

Using Enumerable#tally

["a", "b", "c", "b"].tally 

#=> { "a" => 1, "b" => 2, "c" => 1 }

Note: Only for Ruby versions >= 2.7
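Since tally returns a plain Hash, the usual Enumerable methods can be chained onto it. A minimal sketch, assuming Ruby 2.7+; the helper name most_common is made up for this example:

```ruby
# Hypothetical helper: [element, count] pairs, most frequent first.
# sort_by is not guaranteed to be stable, so the order among ties is unspecified.
def most_common(list)
  list.tally.sort_by { |_, count| -count }
end

pairs = most_common(["a", "b", "c", "b"])
p pairs.first # the most frequent element together with its count
```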

edited Feb 24, 2021 at 16:47

answered Sep 1, 2019 at 21:06

Santhosh

2 Comments

Yakob Ubaidi Over a year ago

amazing ruby stuff

J.M. Janzen Over a year ago

Wow, and it's not even a Rails thing. Brilliant. You just saved me like 12 lines.

the Tin Man · Accepted Answer · 2017-05-04 21:15:21Z

31

If you have an array like this:

words = ["aa","bb","cc","bb","bb","cc"]

where you need to count duplicate elements, a one line solution is:

result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }

edited May 4, 2017 at 21:15

the Tin Man

answered May 28, 2014 at 11:35

Manish Shrivastava

the Tin Man · Accepted Answer · 2017-05-04 21:16:47Z

28

A different approach to the answers above, using Enumerable#group_by.

[1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k,v| [k, v.count] }.to_h
# {1=>1, 2=>2, 3=>3, 4=>1}

Breaking that into its different method calls:

a = [1, 2, 2, 3, 3, 3, 4]
a = a.group_by(&:itself) # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
a = a.map { |k,v| [k, v.count] } # [[1, 1], [2, 2], [3, 3], [4, 1]]
a = a.to_h # {1=>1, 2=>2, 3=>3, 4=>1}

Enumerable#group_by was added in Ruby 1.8.7.

edited May 4, 2017 at 21:16

the Tin Man

answered Jan 24, 2017 at 18:04

Kaoru

2 Comments

Dan Bechard Over a year ago

I like (&:itself), that's just the right amount of clever!

zor-el Over a year ago

A one-liner as elegant as they come. This should be the accepted answer!

SqlZim · Accepted Answer · 2018-03-23 21:35:04Z

21

How about the following:

things = [1, 2, 2, 3, 3, 3, 4]
things.uniq.map{|t| [t,things.count(t)]}.to_h

It sort of feels cleaner and more descriptive of what we're actually trying to do.

I suspect it would also perform better with large collections than the ones that iterate over each value.

Benchmark Performance test:

a = (1...1000000).map { rand(100)}
                        user     system      total        real
inject              7.670000   0.010000   7.680000 (  7.985289)
array count         0.040000   0.000000   0.040000 (  0.036650)
each_with_object    0.210000   0.000000   0.210000 (  0.214731)
group_by            0.220000   0.000000   0.220000 (  0.218581)

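So in this benchmark it is quite a bit faster. Bear in mind that the test data has only 100 distinct values: uniq plus count scans the whole array once per distinct value, so it gets slower as the number of distinct values grows.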
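The script behind those numbers is not shown, so the following is only a guess at how such a comparison could be set up with the standard benchmark library. The constants N, DISTINCT and APPROACHES are assumptions made for this sketch; the four lambdas are the one-liners from the answers in this thread. Timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Assumptions: 100_000 elements keeps the run short, and 100 distinct
# values mirrors rand(100) above. Raise DISTINCT to see how each approach scales.
N = 100_000
DISTINCT = 100

APPROACHES = {
  'inject'           => ->(a) { a.inject(Hash.new(0)) { |h, v| h[v] += 1; h } },
  'array count'      => ->(a) { a.uniq.map { |t| [t, a.count(t)] }.to_h },
  'each_with_object' => ->(a) { a.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 } },
  'group_by'         => ->(a) { a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h }
}.freeze

data = Array.new(N) { rand(DISTINCT) }

# 18 is the label column width, wide enough for "each_with_object".
Benchmark.bm(18) do |x|
  APPROACHES.each { |name, fn| x.report(name) { fn.call(data) } }
end
```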

edited Mar 23, 2018 at 21:35

SqlZim

answered Apr 5, 2017 at 13:42

Carpela

2 Comments

Santhosh Over a year ago

Doesn't things.uniq and things.count(t) iterate over the array?

Carpela Over a year ago

Entirely possible it does, under the hood, so perhaps I've described that wrong. Either way, the performance gain appears to be real, I think...

Ana María Martínez Gómez · Accepted Answer · 2018-09-24 20:54:04Z

From Ruby 2.2 you can use itself, and from Ruby 2.4 transform_values, which together give: array.group_by(&:itself).transform_values(&:count)

With some more detail:

array = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

array.group_by(&:itself).transform_values(&:count)
 => { "FATAL <error title=\"Request timed out.\">"=>2,
      "FATAL <error title=\"There is insufficient system memory to run this query.\">"=>1 }

dan · Accepted Answer · 2012-05-03 22:03:20Z

8

Personally I would do it this way:

# myprogram.rb
a = ['FATAL <error title="Request timed out.">',
'FATAL <error title="Request timed out.">',
'FATAL <error title="There is insufficient system memory to run this query.">']
puts a

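Then run the program and pipe it to uniq -c. Note that uniq -c only counts adjacent duplicate lines, so this relies on the array being sorted as in the question (otherwise pipe through sort first):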

ruby myprogram.rb | uniq -c

Output:

 2 FATAL <error title="Request timed out.">
 1 FATAL <error title="There is insufficient system memory to run this query.">

answered May 3, 2012 at 22:03

dan

Milan Novota · Accepted Answer · 2009-02-20 14:56:51Z

3

a = [1,1,1,2,2,3]
a.uniq.inject([]){|r, i| r << { :error => i, :count => a.select{ |b| b == i }.size } }
=> [{:count=>3, :error=>1}, {:count=>2, :error=>2}, {:count=>1, :error=>3}]

answered Feb 20, 2009 at 14:56

Milan Novota

2 Comments

glenn mcdonald Over a year ago

Oh, don't do that. You're reiterating through the whole array for each value!

Mr. Ronald Over a year ago

There are good solutions up there. I just want to mention the existence of Array#count: a = [1,1,1,2,2,3]; a.uniq.inject([]){|r, i| r << { :error => i, :count => a.count(i) } }

Arnold Roa · Accepted Answer · 2018-07-28 03:39:49Z

If you want to use this often, I suggest doing this:

# lib/core_extensions/array/duplicates_counter
module CoreExtensions
  module Array
    module DuplicatesCounter
      def count_duplicates
        self.each_with_object(Hash.new(0)) { |element, counter| counter[element] += 1 }.sort_by{|k,v| -v}.to_h
      end
    end
  end
end

Load it with

Array.include CoreExtensions::Array::DuplicatesCounter

And then use from anywhere with just:

the_ar = %w(a a a a a a a  chao chao chao hola hola mundo hola chao cachacho hola)
the_ar.count_duplicates
{
           "a" => 7,
        "chao" => 4,
        "hola" => 4,
       "mundo" => 1,
    "cachacho" => 1
}

Moshe Yudkowsky · Accepted Answer · 2022-01-05 21:22:05Z

Since #tally is for 2.7 and up, and I'm not there yet, it's easy to use the #count method on the array. Use #uniq on the array to get one copy of each member of the array, and then find #count for that member in the array:

counts=Hash.new
arr.uniq.each {|name| counts[name]=arr.count(name) }

Example:

arr = [ 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5]
arr.uniq # => [1, 2, 3, 4, 5]
counts=Hash.new; arr.uniq.each {|name| counts[name]=arr.count(name) }

gives us

counts # => {1=>1, 2=>2, 3=>5, 4=>2, 5=>1}

Evan Senter · Accepted Answer · 2009-02-21 02:24:20Z

0

Simple implementation:

(errors_hash = {}).default = 0
array_of_errors.each { |error| errors_hash[error] += 1 }

answered Feb 21, 2009 at 2:24

Evan Senter

1 Comment

the Tin Man Over a year ago

That first line could be written more clearly using errors_hash = Hash.new(0)

the Tin Man · Accepted Answer · 2012-11-13 18:17:50Z

0

Here is the sample array:

a=["aa","bb","cc","bb","bb","cc"]

Select all the unique keys.
For each key, we'll accumulate them into a hash to get something like this: {'bb' => ['bb', 'bb']}

    res = a.uniq.inject({}) {|accu, uni| accu.merge({ uni => a.select{|i| i == uni } })}
    # => {"aa"=>["aa"], "bb"=>["bb", "bb", "bb"], "cc"=>["cc", "cc"]}

Now you are able to do things like:

res['aa'].size

edited Nov 13, 2012 at 18:17

the Tin Man

answered Nov 13, 2012 at 18:14

magicgregz

S.Shah · Accepted Answer · 2020-04-19 16:04:43Z

-3

def find_most_occurred_item(arr)
  return 'Array has unique elements already' if arr.uniq == arr

  # Count how often each element occurs.
  counts = arr.inject(Hash.new(0)) { |h, v| h[v] += 1; h }
  # The highest count; several elements may share it.
  max_count = counts.values.max
  # One message per element that reaches the highest count.
  counts.select { |_, count| count == max_count }
        .map { |item, count| "#{item} appears #{count} times" }
end

puts find_most_occurred_item([1, 2, 3, 4, 4, 4, 3, 3])

answered Apr 19, 2020 at 16:04

S.Shah
