Grabbing CSV Information with Regex in Python

Question

I'm trying to extract all the phone numbers from a CSV document and append them to a list in string format. Here is a sample of my input:

[email protected],John,Doe,,,(555) 555-5555

And here is the code I am using:

l = []
with open('sample.csv', 'r') as f:
    reader = csv.reader(f)
    for x in reader:
        number = re.search(r'.*?@.*?,.*?,.*?,.*?,.*?,(.*?),',x)
        if number in x:
            l.append(''.join(number))

Basically, I'm trying to check if there is a number at a certain position in the row (where the parentheses are) and then append that to a list as a string using join. However, I keep getting this error:

Traceback (most recent call last):
  File "C:/Users/svillamil/Desktop/Final Phone.py", line 14, in <module>
    number = re.search(b'.*?@.*?,.*?,.*?,.*?,.*?,(.*?),', x)
  File "C:\Users\svillamil\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 182, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

How do I get around this?

You're using the wrong tool for the job. Also, x is not a string. Look at the documentation for the csv library.
– user554546, Commented Jan 10, 2017 at 16:18
Why don't you just split each line of the file on commas and iterate through the elements, checking whether each one matches (???)?
– amphibient, Commented Jan 10, 2017 at 16:18

mkrieger1 · Accepted Answer · 2017-01-10 16:21:50Z

3

Iterating over a csv.reader gives you a list of strings for each row.
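
To see why the original re.search call fails, here is a minimal sketch using an in-memory file. io.StringIO stands in for sample.csv, and the email address is a made-up placeholder, not the one from the question:

```python
import csv
import io
import re

# Stand-in for sample.csv; the address is a placeholder, not from the question.
sample = io.StringIO("jdoe@example.com,John,Doe,,,(555) 555-5555\n")

rows = list(csv.reader(sample))
first_row = rows[0]

print(type(first_row).__name__)  # list
print(first_row[5])              # (555) 555-5555

# Passing the whole list to re.search reproduces the error from the question.
try:
    re.search(r'\d', first_row)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the TypeError message differs slightly between Python versions, but the cause is the same: re.search wants a string, and each row is a list.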

Taking the value at index 5 already gives you the phone number (if I counted correctly). You don't need a regular expression to do this.

import csv

l = []
with open('sample.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        number = row[5]  # sixth column holds the phone number
        if number:       # skip rows where the field is empty
            l.append(number)

(Conversely, if you insisted on using a regular expression, you wouldn't need csv to do the splitting and could just iterate over the raw lines of the file.)
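
A minimal sketch of that regex-only route, assuming the phone number is always the sixth comma-separated field and no field contains a quoted comma. The pattern name SIXTH_FIELD and the in-memory sample (with placeholder email addresses) are illustrative assumptions, not part of the original question:

```python
import io
import re

# Five comma-terminated fields, then capture the sixth field.
SIXTH_FIELD = re.compile(r'^(?:[^,]*,){5}([^,]*)')

# Stand-in for the raw lines of sample.csv.
raw_lines = io.StringIO(
    "jdoe@example.com,John,Doe,,,(555) 555-5555\n"
    "nophone@example.com,Jane,Roe,,,\n"
)

numbers = []
for line in raw_lines:
    match = SIXTH_FIELD.match(line.rstrip('\n'))
    if match and match.group(1):  # skip short lines and empty phone fields
        numbers.append(match.group(1))

print(numbers)
```

Indexing into the row from csv.reader remains the simpler option; this only shows what the alternative would look like.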

answered Jan 10, 2017 at 16:21

mkrieger1


1 Comment

SVill Over a year ago

This worked. Thank you. I was so focused on using regular expressions, I didn't even think to use the index numbers.

amphibient · Answer · 2017-01-10 18:12:06Z

1

You should just split each line of the file on commas and iterate through the elements, checking whether each one matches (...), assuming a phone number can appear at any delimited position in a line:

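import re

result = []

# Compile once, outside the loops; the raw string keeps the backslashes literal.
# Matches an element that starts with "(", any three characters, then ")".
pattern = re.compile(r"\(...\)")

with open('sandbox.txt', 'r') as f:
    fileLines = f.readlines()

for fileLine in fileLines:
    fileLine = fileLine.strip()
    lineElems = fileLine.split(',')

    for lineElem in lineElems:
        if pattern.match(lineElem):
            print("Adding %s" % lineElem)
            result.append(lineElem)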

edited Jan 10, 2017 at 18:12

answered Jan 10, 2017 at 16:32

amphibient


2 Comments

mkrieger1 Over a year ago

"Just split a file line by comma" – that's exactly what the csv module is for.

amphibient Over a year ago

I was trying to offer a more elegant alternative to "(b'.*?@.*?,.*?,.*?,.*?,.*?,(.*?),', x)", which is ugly and chaotic.
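
One concrete difference between the two approaches discussed in these comments: csv.reader understands quoted fields, while str.split(',') does not. A small sketch with a made-up row (the name and address are placeholders) in which one field contains a comma:

```python
import csv
import io

# Hypothetical row: the second field is quoted because it contains a comma.
line = 'jdoe@example.com,"Doe, John",,,,(555) 555-5555\n'

naive_fields = line.strip().split(',')
csv_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields))  # 7: the quoted name is split in two
print(len(csv_fields))    # 6: the quoted name stays one field
print(csv_fields[5])      # (555) 555-5555
```

If the data never contains quoted commas, both approaches give the same fields; the csv module simply removes the need to rely on that.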

pragman · Answer · 2017-01-10 16:23:36Z

-1

x is a list which contains each field of the row.

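So one approach is to join the fields back into a single string and then apply the regex:

foo = ','.join(x)
# [^,]* also matches when the phone number is the last field (no trailing comma)
number = re.search(r'.*?@.*?,.*?,.*?,.*?,.*?,([^,]*)', foo)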

Or you can iterate over each field in the row and check whether it's a number:

for row in reader:
    for field in row:
        number = re.search(r'<phone-number-regex>', field)
        if number:  # re.search returns None when nothing matches
            l.append(number.group())
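
A runnable sketch of the second approach. The PHONE pattern below is only an assumption standing in for <phone-number-regex>: it accepts the (555) 555-5555 shape from the question and nothing else. The in-memory sample and the email addresses are placeholders too:

```python
import csv
import io
import re

# Assumed pattern: "(ddd) ddd-dddd" only; adjust for other phone formats.
PHONE = re.compile(r'\(\d{3}\) \d{3}-\d{4}')

# Stand-in for sample.csv; the second row has no phone number.
sample = io.StringIO(
    "jdoe@example.com,John,Doe,,,(555) 555-5555\n"
    "jroe@example.com,Jane,Roe,,,\n"
)

phones = []
for row in csv.reader(sample):
    for field in row:
        # fullmatch requires the whole field to be a phone number
        if PHONE.fullmatch(field):
            phones.append(field)

print(phones)
```

Using fullmatch rather than search avoids picking up fields that merely contain something phone-like in the middle of other text.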

answered Jan 10, 2017 at 16:23

pragman

