How to count number of substrings in python, if substrings overlap?

Question

The count() function returns the number of times a substring occurs in a string, but it fails in case of overlapping strings.

Let's say my input is:

^_^_^-_-

I want to find how many times ^_^ occurs in the string.

mystr=input()
happy=mystr.count('^_^')
sad=mystr.count('-_-')
print(happy)
print(sad)

Output is:

1
1

I am expecting:

2
1

How can I achieve the desired result?

Mad Physicist · Accepted Answer · 2018-03-15 19:16:59Z

New Version

You can solve this problem without writing any explicit loops using regex. As @abhijith-pk's answer cleverly suggests, you can search for the first character only, with the remainder being placed in a positive lookahead, which will allow you to make the match with overlaps:

def count_overlapping(string, pattern):
    regex = '{}(?={})'.format(re.escape(pattern[:1]), re.escape(pattern[1:]))
    # Consume iterator, get count with minimal memory usage
    return sum(1 for _ in re.finditer(regex, string))

[IDEOne Link]

Using [:1] and [1:] for the indices allows the function to handle the empty string without special processing, while using [0] and [1:] for the indices would not.

Old Version

You can always write your own routine using the fact that str.find allows you to specify a starting index. This routine will not be very efficient, but it should work:

def count_overlapping(string, pattern):
    count = 0
    start = -1
    while True:
        start = string.find(pattern, start + 1)
        if start < 0:
            return count
        count += 1

[IDEOne Link]

Usage

Both versions return identical results. A sample usage would be:

>>> mystr = '^_^_^-_-'
>>> count_overlapping(mystr, '^_^')
2
>>> count_overlapping(mystr, '-_-')
1
>>> count_overlapping(mystr, '')
9
>>> count_overlapping(mystr, 'x')
0

Notice that the empty string is found len(mystr) + 1 times. I consider this to be intuitively correct because it is effectively between and around every character.

ClumsyPuffin · Accepted Answer · 2018-03-15 07:20:37Z

3

you can use regex for a quick and dirty solution :

import re
mystr='^_^_^-_-'
print(len(re.findall('\^(?=_\^)',mystr)))

answered Mar 15, 2018 at 7:20

ClumsyPuffin

4,0791 gold badge19 silver badges17 bronze badges

6 Comments

Mad Physicist Over a year ago

This is awesome. Search for first character, positive lookahead on the rest. May I steal and generalize it?

ClumsyPuffin Over a year ago

@MadPhysicist Yes , Please and thanks for for explaining it nicely :)

Sociopath Over a year ago

I knew one of the regex ninjas will come out and post better answer :-)

Mad Physicist Over a year ago

@Abhijithpk. See the update to my answer. Seems to work like the crap I posted before, but so much more elegant. Thanks for showing me this

ClumsyPuffin Over a year ago

@AyushGoyal Have a look at MadPhysicist 's updated answer ,He is explaining how positive lookahead works and he uses re.escape to avoid escaping regex special characters like ^ instead of my crude backslash escape sequence

|

Sociopath · Accepted Answer · 2018-03-15 07:04:26Z

1

You need something like this

def count_substr(string,substr):
    n=len(substr)
    count=0
    for i in range(len(string)-len(substr)+1):
        if(string[i:i+len(substr)] == substr):      
            count+=1
    return count

mystr=input()
print(count_substr(mystr,'121'))

Input: 12121990

Output: 2

answered Mar 15, 2018 at 7:04

Sociopath

13.4k22 gold badges53 silver badges82 bronze badges

6 Comments

Mad Physicist Over a year ago

This is a totally functional solution, but it has a number of inefficiencies. The main one is that string[i:i+len(substr)] creates a new string object every single time.

Sociopath Over a year ago

totally agree with you. It's not an efficient solution but only thing I could come up with. will surly improve :-)

Mad Physicist Over a year ago

It's completely functional, so +1. I really don't think mine is much better, but I couldn't think of a better way either :)

Mad Physicist Over a year ago

I especially like that we get the same result for an empty patter without any special processing.

Mad Physicist Over a year ago

I think it is absolutely ideal to do that :) I even addressed it in my answer.

|

Collectives™ on Stack Overflow

How to count number of substrings in python, if substrings overlap?

3 Answers 3

Comments

6 Comments

6 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

6 Comments

6 Comments

Linked

Related