
I'm trying to extract all the phone numbers from a CSV document and append them to a list in string format. Here is a sample of my input:

[email protected],John,Doe,,,(555) 555-5555

And here is the code I am using:

l = []
with open('sample.csv', 'r') as f:
    reader = csv.reader(f)
    for x in reader:
        number = re.search(r'.*?@.*?,.*?,.*?,.*?,.*?,(.*?),',x)
        if number in x:
            l.append(''.join(number))

Basically, I'm trying to check if there is a number at a certain position in the row (where the parentheses are) and then append that to a list as a string using join. However, I keep getting this error:

Traceback (most recent call last):
  File "C:/Users/svillamil/Desktop/Final Phone.py", line 14, in <module>
    number = re.search(b'.*?@.*?,.*?,.*?,.*?,.*?,(.*?),', x)
  File "C:\Users\svillamil\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 182, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

How do I get around this?

  • Is the use of regex mandatory? Commented Jan 10, 2017 at 16:14
  • You're using the wrong tool for the job. Also, x is not a string. Look at the documentation for the csv library. Commented Jan 10, 2017 at 16:18
  • Why don't you just split each line of the file on commas and iterate through the elements, checking whether each one matches (???)? Commented Jan 10, 2017 at 16:18

3 Answers


Iterating over a csv.reader gives you a list of strings for each row.
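
A quick way to see this, and to reproduce the error from the question, is to feed the sample line to csv.reader through an in-memory file. This is only a sketch: io.StringIO stands in for sample.csv, and the short pattern is an illustrative placeholder rather than the one from the question.

```python
import csv
import io
import re

# In-memory stand-in for sample.csv (an assumption, to keep the sketch self-contained).
sample = io.StringIO("[email protected],John,Doe,,,(555) 555-5555\n")

row = next(csv.reader(sample))
print(type(row).__name__)  # the row is a list, not a str
print(row)

# re.search expects a str (or bytes), so passing the whole row raises TypeError.
try:
    re.search(r"\(\d{3}\)", row)
    raised = False
except TypeError:
    raised = True
print(raised)
```

Each element of the row is a plain string, so either index into the row or run the regex on one field at a time.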

Taking the value at index 5 already gives you the phone number (if I counted correctly). You don't need a regular expression to do this.

import csv

l = []
with open('sample.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # Column 5 holds the phone number; skip short rows and empty fields.
        if len(row) > 5 and row[5]:
            l.append(row[5])

(Conversely, if you insisted on using a regular expression, you wouldn't need csv to do the splitting and could just iterate over the raw lines of the file.)
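
For completeness, the regex-only route might look roughly like this. It is a sketch under assumptions: the pattern only covers the "(ddd) ddd-dddd" format shown in the sample row, the helper name phones_from_lines is made up, the second data row is invented for illustration, and io.StringIO stands in for the opened file.

```python
import io
import re

# Hypothetical pattern: matches "(ddd) ddd-dddd", the format in the sample row.
PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

def phones_from_lines(lines):
    """Collect every PHONE match from an iterable of text lines."""
    found = []
    for line in lines:
        found.extend(PHONE.findall(line))
    return found

# In-memory stand-in for open('sample.csv'); a real file object iterates the same way.
# The second row is made up and has no phone number.
sample = io.StringIO(
    "[email protected],John,Doe,,,(555) 555-5555\n"
    "[email protected],Jane,Roe,,,\n"
)
numbers = phones_from_lines(sample)
print(numbers)
```

Note that this ignores CSV quoting entirely, which is fine only as long as the pattern is specific enough not to match other fields.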


1 Comment

This worked. Thank you. I was so focused on using regular expressions, I didn't even think to use the index numbers.

You can split each line of the file on commas and iterate through the elements, checking whether each one matches (...). This assumes a phone number can appear at any delimited position in a line:

import re

result = []

# Compile once: matches a field starting with "(", any three characters, then ")".
pattern = re.compile(r"\(...\)")

with open('sandbox.txt', 'r') as f:
    for fileLine in f:
        lineElems = fileLine.strip().split(',')

        for lineElem in lineElems:
            if pattern.match(lineElem):
                print("Adding %s" % lineElem)
                result.append(lineElem)

2 Comments

"Just split a file line by comma" – that's exactly what the csv module is for.
I was trying to offer a more elegant alternative to "(b'.*?@.*?,.*?,.*?,.*?,.*?,(.*?),', x)", which is ugly and chaotic.

x is a list which contains each field of the row.

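So one approach is to join the list back into a single string and then apply the regex:

foo = ','.join(x)
# number is a match object (or None); the captured field is number.group(1)
number = re.search(r'.*?@.*?,.*?,.*?,.*?,.*?,(.*?),', foo)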

Or you can iterate over each field in the row and check whether it's a phone number:

for row in reader:
    for field in row:
        number = re.search(r'<phone-number-regex>', field)
        if number:  # re.search returns None when nothing matches
            l.append(number.group())
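
As a concrete sketch of the per-field variant, here is one way it could look with a placeholder pattern filled in. The pattern is an assumption (it only accepts the "(ddd) ddd-dddd" format from the question's sample), phone_fields is a made-up helper name, and io.StringIO stands in for the real file.

```python
import csv
import io
import re

# Hypothetical stand-in for <phone-number-regex>: matches "(ddd) ddd-dddd".
PHONE_FIELD = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

def phone_fields(rows):
    """Return every field, in any column, that is entirely a phone number."""
    return [
        field
        for row in rows
        for field in row
        if PHONE_FIELD.fullmatch(field.strip())
    ]

# In-memory stand-in for the opened CSV file.
sample = io.StringIO("[email protected],John,Doe,,,(555) 555-5555\n")
numbers = phone_fields(csv.reader(sample))
print(numbers)
```

Using fullmatch rather than search means a field only counts when the whole field is a phone number, not when one merely appears inside longer text.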
