Splitting string in multiple places Python

Question

I am trying to get a set of numbers out of a string. The numbers are nestled between characters.

Here is an example: NC123456Sarah Von Winkle

NC is the only part of the string that is a guarantee
123456 is the number I want to extract
Sarah Von Winkle is the name, it can be anything

So I cannot just split at 'S' and 'C' to try and grab the digits.

Code

Nothing tried so far.

Problem

I have no idea how to approach this.

How can I split the string to get only the digits in the middle?

Regex is your friend for something like this: docs.python.org/3/library/re.html
– Ryan Schaefer, Commented Nov 6, 2020 at 17:14
Good question (context/task + optional code + problem + question) - good answer (at least I tried) 😉️
– hc_dev, Commented Feb 1, 2022 at 21:58

wasif · Accepted Answer · 2020-11-06 17:15:53Z

1

You can use Regex for this:

import re
s='NC123456Sarah Von Winkle'
m=''.join(re.findall(r'NC(\d+).*',s))
print(int(m))
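As a rough sketch of what each step in the snippet above produces (the intermediate names found and joined are only for illustration): with a single capturing group, re.findall returns a list of the captured strings, and join merely unwraps that one-element list here.

```python
import re

s = 'NC123456Sarah Von Winkle'

# With one capturing group, findall returns a list of the captured strings.
found = re.findall(r'NC(\d+).*', s)
print(found)  # ['123456']

# join concatenates the list elements; with a single element it just unwraps it.
joined = ''.join(found)
print(int(joined))  # 123456
```

If the input could contain several matches, join would concatenate all captured numbers into one string, which may not be what you want.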

answered Nov 6, 2020 at 17:15

wasif

15.6k3 gold badges19 silver badges38 bronze badges


2 Comments

hc_dev Over a year ago

Why join? Are we supposed to search a repetitive pattern .. and concatenate the found numbers to a single int ?

hc_dev Over a year ago

While this code snippet may be the solution, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.

Dharman · Accepted Answer · 2020-11-06 17:29:07Z

1

You can try re, a module in Python's standard library.

import re

sample_string = "NC123456Sarah Von Winkle"
result_digits = re.findall(r"\d+", sample_string, flags=0)

Then your result should be ['123456']. If you want just an integer instead of a string, you can convert it with int(result_digits[0]).
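A minimal sketch of the same call with and without the flags argument. flags=0 is the default value of that parameter, so it can be omitted; the names with_flags and without_flags are only for illustration.

```python
import re

sample_string = "NC123456Sarah Von Winkle"

# flags=0 is the default, so both calls behave identically.
with_flags = re.findall(r"\d+", sample_string, flags=0)
without_flags = re.findall(r"\d+", sample_string)

print(with_flags == without_flags)  # True

# Convert the first (and here only) run of digits to an integer.
number = int(with_flags[0])
print(number)  # 123456
```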

edited Nov 6, 2020 at 17:29

Dharman♦

33.9k27 gold badges103 silver badges156 bronze badges

answered Nov 6, 2020 at 17:23

Snoopy

1481 silver badge7 bronze badges

1 Comment

hc_dev Over a year ago

Nice concise regex \d+! Can you explain it and the reason for flags=0 ?

hc_dev · Accepted Answer · 2022-02-01 20:33:38Z

0

Use the re module:

import re
s = "NC123456Sarah Von Winkle"
t = re.findall("[0-9]+",s)
print(t)

This will give:

['123456']

The regular expression (pattern) is composed of:

the character range [0-9], which matches any single digit from 0 to 9 in the string s
the quantifier +, which requires at least one occurrence of the preceding pattern (here [0-9]), so a whole run of consecutive digits is matched at once
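To illustrate the role of the quantifier, here is a small sketch. The input strings are hypothetical and were chosen so that they contain more than one digit.

```python
import re

# Hypothetical input with two separate runs of digits.
s = "NC123456Sarah Von Winkle 2020"
runs = re.findall("[0-9]+", s)
print(runs)  # ['123456', '2020']

# Without the + quantifier, every digit is matched on its own.
singles = re.findall("[0-9]", "NC123")
print(singles)  # ['1', '2', '3']
```

Note that findall returns every run of digits, so the number after "NC" is the first element only when it is the first digit run in the string.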

edited Feb 1, 2022 at 20:33

hc_dev

9,6941 gold badge30 silver badges47 bronze badges

answered Nov 6, 2020 at 17:16

Roshin Raphel

2,7195 gold badges28 silver badges43 bronze badges

hc_dev · Accepted Answer · 2022-02-01 22:03:32Z

To match and capture (= extract) the number, you can use a regular-expression.

TL;DR: I would recommend re.match(r'NC(\d+)', s).group(1) (details in the last section).

Regex to match a number

To match a number with a minimum length of 1 digit, use the regular expression (pattern) \d+ for one or more digits, optionally inside a capturing group as (\d+), where:

\d is a character class (meta-character) for digits (of range 0-9)
+ is a quantifier that matches one or more occurrences of the preceding pattern
( and ) form a capturing group around the enclosed sub-regex
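A short sketch of what the capturing group changes, assuming the example string from the question: group(0) is the whole match, while group(1) is only the part inside the parentheses. The variable names are illustrative.

```python
import re

s = 'NC123456Sarah Von Winkle'
m = re.search(r'NC(\d+)', s)

whole_match = None
captured = None
if m is not None:
    whole_match = m.group(0)  # everything the pattern matched
    captured = m.group(1)     # only the digits inside the parentheses

print(whole_match)  # NC123456
print(captured)     # 123456
```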

Test your regex on regex101 or regexplanet and choose the right flavor/language/engine (here: Python).

In Python use the built-in regex module re. Define the regex as raw-string like r'\d+'.

Find to extract only the number or empty list

Either use the function re.findall to find a list of occurrences:

import re

s = 'NC123456Sarah Von Winkle'
pattern = r'\d+'
occurrences = re.findall(pattern, s)

print(occurrences)

Prints:

['123456']

The first number occurrences[0] is yours if not empty:

if len(occurrences) == 0:
    print('no number found in: ' + s)
else:
    number = occurrences[0]

Split to get all parts

Or use the function re.split to split the string into parts:

import re

s = 'NC123456Sarah Von Winkle'
pattern = r'(\d+)'
parts = re.split(pattern, s)

print(parts)

Prints:

['NC', '123456', 'Sarah Von Winkle']

Note: without the capture-group (i.e. without parentheses ()) the output would be just: ['NC', 'Sarah Von Winkle'] (excluding the splitter-pattern)

Here the number is the second part, parts[1], as long as a non-numeric prefix like "NC" is guaranteed and is followed directly by the number.
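One caveat, sketched under the assumption that the name itself may contain digits (the input below is hypothetical): re.split splits at every digit run unless you limit it with the maxsplit parameter.

```python
import re

# Hypothetical input where the name part also contains a digit.
s = 'NC123456Sarah 2nd'
pattern = r'(\d+)'

# Without a limit, the string is split at every run of digits.
all_parts = re.split(pattern, s)
print(all_parts)  # ['NC', '123456', 'Sarah ', '2', 'nd']

# maxsplit=1 stops after the first run, keeping the name in one piece.
parts = re.split(pattern, s, maxsplit=1)
print(parts)  # ['NC', '123456', 'Sarah 2nd']
```

In both cases parts[1] is still the number directly after the prefix.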

Extract with a capturing-group

Use the group method of the match object together with a regex containing a capturing group:

import re

s = 'NC123456Sarah Von Winkle'
capture_number_pattern = re.compile(r'NC(\d+)')
extracted = capture_number_pattern.match(s).group(1)

print(extracted)

Prints:

123456

Note: re.compile returns a compiled pattern. This can improve performance when the pattern is reused multiple times, and it can make the code more readable.
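As a sketch of that reuse, the compiled pattern can be applied to a batch of strings. The records list is hypothetical, and entries without digits directly after "NC" are simply skipped.

```python
import re

capture_number_pattern = re.compile(r'NC(\d+)')

# Hypothetical batch of inputs; the second one has no digits right after "NC".
records = ['NC123456Sarah Von Winkle', 'NCHelloWorld2022', 'NC42Bob']

numbers = []
for record in records:
    # match only succeeds if the pattern matches at the start of the string.
    match = capture_number_pattern.match(record)
    if match:
        numbers.append(int(match.group(1)))

print(numbers)  # [123456, 42]
```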

Pay attention: To make your matching robust and defensive, test whether there is a match. Otherwise an AttributeError is raised at runtime when nothing matches, as this Python shell session shows:

>>> extracted = capture_number_pattern.match('NCHelloWorld2022').group(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

You can test if a match was found or fail-fast if match is None:

s = 'NCHelloWorld2022'
match = capture_number_pattern.match(s)
if not match:
    print('No number found in:' + s)
else:
    print(match.group(1))

Prints:

No number found in:NCHelloWorld2022
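The fail-fast variant could look like the following sketch. The helper name extract_number and the choice of ValueError are assumptions for illustration, not part of the re module.

```python
import re

capture_number_pattern = re.compile(r'NC(\d+)')

def extract_number(s):
    # Hypothetical helper: raise immediately instead of returning None.
    match = capture_number_pattern.match(s)
    if match is None:
        raise ValueError('No number found in: ' + s)
    return int(match.group(1))

print(extract_number('NC123456Sarah Von Winkle'))  # 123456

try:
    extract_number('NCHelloWorld2022')
except ValueError as error:
    print(error)  # No number found in: NCHelloWorld2022
```

Raising a descriptive exception keeps the failure close to its cause, rather than surfacing later as an AttributeError on None.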
