
I am trying to get a set of numbers out of a string. The numbers are nestled between characters.

Here is an example: NC123456Sarah Von Winkle

  • NC is the only part of the string that is guaranteed
  • 123456 is the number I want to extract
  • Sarah Von Winkle is the name; it can be anything

So I cannot just split at 'S' and 'C' to try and grab the digits.

Code

Nothing tried so far.

Problem

I have no idea how to approach this.

How can I split the string to get only the digits in the middle?

Comments
  • Regex is your friend for something like this: docs.python.org/3/library/re.html (commented Nov 6, 2020 at 17:14)
  • Good question (context/task + optional code + problem + question) - good answer (at least I tried) 😉️ (commented Feb 1, 2022 at 21:58)

4 Answers

Answer 1

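You can use a regex for this:

import re

s = 'NC123456Sarah Von Winkle'
# findall returns the captured group as a list, here ['123456'];
# join turns that list into a single string
m = ''.join(re.findall(r'NC(\d+).*', s))
print(int(m))  # 123456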
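On the "Why join?" question: here is a minimal sketch of the same extraction using re.search. It returns a single match object (or None) instead of a list, so no join is needed. Like the question, it assumes that NC is followed directly by the digits.

```python
import re

s = 'NC123456Sarah Von Winkle'

# re.search returns a Match object for the first match, or None if there is none;
# group(1) is the text captured by the first (\d+) group.
match = re.search(r'NC(\d+)', s)
number = int(match.group(1)) if match else None
print(number)  # 123456
```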

2 Comments

Why join? Are we supposed to search for a repeating pattern and concatenate the found numbers into a single int?
While this code snippet may be the solution, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
Answer 2

You can try re, which is part of Python's standard library.

import re

sample_string = "NC123456Sarah Von Winkle"
result_digits = re.findall(r"\d+", sample_string, flags=0)

Then your result should be ['123456']. If you want just an integer instead of a string, you can convert it with int(result_digits[0]).
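On the flags=0 question: 0 is simply the default value of the flags parameter of re.findall, so passing it changes nothing. This small sketch compares both calls and guards against an empty list before calling int().

```python
import re

sample_string = "NC123456Sarah Von Winkle"

# flags=0 is the default, so these two calls are equivalent.
with_flags = re.findall(r"\d+", sample_string, flags=0)
without_flags = re.findall(r"\d+", sample_string)
print(with_flags == without_flags)  # True

# Only index and convert if at least one number was found.
number = int(with_flags[0]) if with_flags else None
print(number)  # 123456
```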

1 Comment

Nice concise regex \d+! Can you explain it and the reason for flags=0?
Answer 3

Use the regex module re:

import re
s = "NC123456Sarah Von Winkle"
t = re.findall("[0-9]+",s)
print(t)

This will give:

['123456']

The regular expression (pattern) is composed of:

  • the character range [0-9], which matches any single digit from 0 to 9 (findall then collects all matches in the string s)
  • the quantifier +, which matches one or more occurrences of the preceding pattern (here [0-9]).
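To see what the + quantifier contributes, here is a quick comparison with and without it on the same string:

```python
import re

s = "NC123456Sarah Von Winkle"

# Without +, [0-9] matches exactly one digit per match,
# so findall returns every digit separately.
print(re.findall("[0-9]", s))   # ['1', '2', '3', '4', '5', '6']

# With +, a run of consecutive digits is consumed as one match.
print(re.findall("[0-9]+", s))  # ['123456']
```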


Answer 4

To match and capture (= extract) the number, you can use a regular expression.

TL;DR: I would recommend re.match(r'NC(\d+)', s).group(1) (details in the last section).

Regex to match a number

To match a number of at least one digit, use the regular expression (pattern) \d+ (one or more digits), optionally inside a capturing group as (\d+), where:

  1. \d is a character class for digits; for str patterns in Python 3 it matches any Unicode decimal digit, which includes 0-9 (use the re.ASCII flag or [0-9] to accept only 0-9)
  2. + is a quantifier that matches one or more occurrences of the preceding pattern
  3. ( and ) form a capturing group around the enclosed sub-pattern
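One subtlety of \d is that, for str patterns in Python 3, it matches any Unicode decimal digit and not only 0-9. Here is a minimal check, using hypothetical input with Arabic-Indic digits (U+0661 to U+0663):

```python
import re

# Hypothetical input: U+0661..U+0663 are ARABIC-INDIC DIGIT ONE, TWO, THREE.
s = 'NC\u0661\u0662\u0663Sarah'

# By default, \d matches any Unicode decimal digit in a str pattern.
unicode_digits = re.findall(r'\d+', s)
print(unicode_digits == ['\u0661\u0662\u0663'])  # True

# re.ASCII (or an explicit [0-9]) restricts matching to the ASCII digits 0-9.
print(re.findall(r'\d+', s, flags=re.ASCII))  # []
print(re.findall(r'[0-9]+', s))               # []
```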

Test your regex on regex101 or regexplanet and choose the right flavor/language/engine (here: Python).

In Python, use the built-in regex module re. Define the regex as a raw string, like r'\d+', so that Python does not interpret the backslash as a string escape.
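This minimal sketch shows why the raw string matters. r'\d+' and '\\d+' are the same string, but some escapes such as \b mean something different in a normal string literal (a backspace character) than in a regex (a word boundary).

```python
import re

# A raw string keeps the backslash, so both literals are the same 3 characters.
print(r'\d+' == '\\d+')       # True
print(len(r'\d+'))            # 3

# In a normal literal '\b' is one backspace character; in a raw string it is
# backslash + 'b', which the regex engine reads as a word boundary.
print(len('\b'), len(r'\b'))  # 1 2
print(re.findall(r'\bNC', 'NC123456Sarah'))  # ['NC']
print(re.findall('\bNC', 'NC123456Sarah'))   # [] (pattern starts with a backspace)
```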

Find to extract only the number (or an empty list)

Either use the function re.findall to get a list of all occurrences:

import re

s = 'NC123456Sarah Von Winkle'
pattern = r'\d+'
occurrences = re.findall(pattern, s)

print(occurrences)

Prints:

['123456']

The first number, occurrences[0], is yours if the list is not empty:

if not occurrences:
    print('no number found in: ' + s)
else:
    number = occurrences[0]
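The same check can be wrapped in a small function that returns an int or None instead of printing. first_number is a hypothetical helper name used only for illustration.

```python
import re
from typing import Optional


def first_number(text: str) -> Optional[int]:
    """Hypothetical helper: return the first run of digits as an int, or None."""
    occurrences = re.findall(r'\d+', text)
    return int(occurrences[0]) if occurrences else None


print(first_number('NC123456Sarah Von Winkle'))  # 123456
print(first_number('NCSarah Von Winkle'))        # None
```

Since only the first number is needed, re.search (which stops at the first match) would work equally well here. findall is kept to mirror the approach above.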

Split to get all parts

Or use the function re.split to split the string into parts:

import re

s = 'NC123456Sarah Von Winkle'
pattern = r'(\d+)'
parts = re.split(pattern, s)

print(parts)

Prints:

['NC', '123456', 'Sarah Von Winkle']

Note: without the capturing group (i.e. without the parentheses), the output would be just ['NC', 'Sarah Von Winkle'], because re.split drops the matched separator unless it is captured.

Here you get the number as the second part, parts[1], as long as a non-numeric prefix like "NC" is guaranteed and directly followed by the number.
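One caveat with re.split is that every digit run becomes a separator, so a name that contains digits gets split as well. This sketch uses a hypothetical name "Sarah Von Winkle 2nd" and the maxsplit parameter to split only at the first number:

```python
import re

s = 'NC123456Sarah Von Winkle 2nd'  # hypothetical input with a digit in the name

# Without maxsplit, the "2" in the name is treated as another separator.
print(re.split(r'(\d+)', s))
# ['NC', '123456', 'Sarah Von Winkle ', '2', 'nd']

# maxsplit=1 splits only at the first digit run and keeps the rest intact.
# (Unpacking into three names fails if the string contains no digits at all.)
prefix, number, name = re.split(r'(\d+)', s, maxsplit=1)
print(prefix, number, name)  # NC 123456 Sarah Von Winkle 2nd
```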

Extract with a capturing-group

Use the group method of the match object together with a regex containing a capturing group:

import re

s = 'NC123456Sarah Von Winkle'
capture_number_pattern = re.compile(r'NC(\d+)')
extracted = capture_number_pattern.match(s).group(1)

print(extracted)

Prints:

123456

Note: re.compile returns a compiled pattern object. Reusing it can improve readability and avoids recompiling when the pattern is used many times (the re module also caches recently used patterns, so the speed-up is often small).
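To illustrate the reuse case, here is a sketch that compiles the pattern once and applies it to several hypothetical records. A second capturing group (.*) picks up the name.

```python
import re

# Hypothetical sample records in the question's format.
records = ['NC123456Sarah Von Winkle', 'NC42John Doe', 'NCnobody']

# Compile once, reuse in the loop; group 1 is the number, group 2 the name.
record_pattern = re.compile(r'NC(\d+)(.*)')

results = []
for record in records:
    match = record_pattern.match(record)
    if match:  # skip records without a number after NC
        results.append((int(match.group(1)), match.group(2)))

print(results)  # [(123456, 'Sarah Von Winkle'), (42, 'John Doe')]
```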

Pay attention: to make your matching robust, test whether there is a match. If nothing matches, match returns None, and calling group on it raises an error at runtime, as this Python shell session shows:

>>> extracted = capture_number_pattern.match('NCHelloWorld2022').group(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

You can test whether a match was found, or fail fast if match is None:

s = 'NCHelloWorld2022'
match = capture_number_pattern.match(s)
if not match:
    print('No number found in: ' + s)
else:
    print(match.group(1))

Prints:

No number found in: NCHelloWorld2022
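Note that match only succeeds at the beginning of the string. If NC could appear later, as in the hypothetical input 'ID: NC123456Sarah', search scans the whole string instead:

```python
import re

capture_number_pattern = re.compile(r'NC(\d+)')

# Pattern.match only tries to match at position 0 of the string.
print(capture_number_pattern.match('ID: NC123456Sarah'))  # None

# Pattern.search scans forward and returns the first match anywhere.
found = capture_number_pattern.search('ID: NC123456Sarah')
print(found.group(1) if found else None)  # 123456
```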
