
I can globally replace matches of a regular expression with re.sub(), and I can count matches with

for match in re.finditer(pattern, string): count += 1

Is there a way to combine these two, so that I can count my substitutions without making two passes through the source string?

Note: I'm not interested in whether the substitution matched. I'm interested in getting the exact count of matches from the same call, instead of making one call to count and another to substitute.

  • A duplicate of Python: how to substitute and know whether it matched. Commented Mar 30, 2020 at 22:57
  • As noted in the question, I'm not interested in whether the substitution was made, I'm interested in (a) knowing how many substitutions were made and (b) doing it via the same call that's making the substitutions. Commented Mar 30, 2020 at 23:44

3 Answers

You can use re.subn.

re.subn(pattern, repl, string, count=0, flags=0)

It returns a tuple (new_string, number_of_subs_made).

For illustration, I'm using the same example as @Shubham Sharma.

import re

text = "Jack 10, Lana 11, Tom 12, Arthur, Mark"
out_str, count = re.subn(r"(\d+)", repl='repl', string=text)

# out_str --> 'Jack repl, Lana repl, Tom repl, Arthur, Mark'
# count --> 3
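
As a small sketch of how the returned number behaves, the snippet below caps the replacements with the optional count argument and also runs against a string with no match. The sample strings and variable names are only illustrative.

```python
import re

text = "Jack 10, Lana 11, Tom 12, Arthur, Mark"

# Cap the number of replacements at 2; the second tuple element reports
# how many substitutions were actually made, not how many could match.
limited_str, limited_n = re.subn(r"\d+", "repl", text, count=2)

# No match: the string comes back unchanged and the reported number is 0.
same_str, zero_n = re.subn(r"\d+", "repl", "Arthur, Mark")

print(limited_str, limited_n)
print(same_str, zero_n)
```

So the second element can be used directly as the substitution count, and a value of 0 means nothing was replaced.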

You can pass a repl function while calling the re.sub function. The function takes a single match object argument, and returns the replacement string. The repl function is called for every non-overlapping occurrence of pattern.

Try this:

import re

count = 0
def count_repl(mobj):  # mobj is an re.Match object
    global count
    count += 1  # count each substitution
    return "your_replacement_string"  # the replacement string

text = "The original text"  # source string
new_text = re.sub(r"pattern", repl=count_repl, string=text)  # count and replace the matches in one pass

OR,

You can use re.subn, which performs the same operation as re.sub but returns a tuple (new_string, number_of_subs_made).

new_text, count = re.subn(r"pattern", repl="replacement", string=text)

Example:

import re

count = 0
def count_repl(mobj):
    global count
    count += 1
    return f"ID: {mobj.group(1)}"

text = "Jack 10, Lana 11, Tom 12, Arthur, Mark"
new_text = re.sub(r"(\d+)", repl=count_repl, string=text)

print(new_text)
print("Number of substitutions:", count)

Output:

Jack ID: 10, Lana ID: 11, Tom ID: 12, Arthur, Mark
Number of substitutions: 3
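
The two approaches can also be combined: re.subn accepts a function for repl just as re.sub does, so the global counter is not needed. A minimal sketch, where label_id is a hypothetical helper name chosen for this example:

```python
import re

text = "Jack 10, Lana 11, Tom 12, Arthur, Mark"

def label_id(mobj):
    # Hypothetical helper: build the replacement from the captured digits.
    return f"ID: {mobj.group(1)}"

# re.subn calls label_id once per match and returns the count alongside
# the new string, so no global state is required.
new_text, n_subs = re.subn(r"(\d+)", label_id, text)

print(new_text)
print("Number of substitutions:", n_subs)
```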

import re


text = "Jack 10, Lana 11, Tom 12"
count = len([x for x in re.finditer(r"(\d+)", text)])
print(count)

# Output: 3

OK, there's a better way, which also performs the substitution in the same call:

import re


text = "Jack 10, Lana 11, Tom 12"
count = re.subn(r"(\d+)", repl="replacement", string=text)[1]
print(count)

# Output: 3
