
I'm trying to extract unit information from a text file. This function always returns 'm', regardless of the actual unit in the file. What am I doing wrong?

def get_seba_unit(file):
    with open(file) as f:
        unit = ''
        lines = f.readlines()
        if lines[10].find('m'):
            unit = 'm'
        elif lines[10].find('cm'):
            unit = 'cm'
        elif lines[10].find('°C'):
            unit = '°C'
        print('found Unit: ' + unit + ' for sensor: ' + file)
        return(unit)
  • What does the line say? It's looking for an 'm' anywhere in the line, not just at the place you want it to look. Commented Mar 14, 2017 at 14:23
  • E.g. 01.01.2016 00:10:47 0,427 m Commented Mar 14, 2017 at 14:27
  • find returns the position of the occurrence, or -1 if the substring is not found. In an if condition, -1 is interpreted as True. Commented Mar 14, 2017 at 14:27
  • And in your case, the first if condition is true whenever the first 'm' on that line is not at index 0 (including when there is no 'm' at all), so all the other branches are ignored. Commented Mar 14, 2017 at 14:28
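To see the behaviour described in these comments, here is a quick check of what str.find returns on the sample line quoted above, and how an if statement treats those return values. The variable name line is just for illustration.

```python
# The sample data line quoted in the comments.
line = '01.01.2016 00:10:47 0,427 m'

print(line.find('m'))   # 26: index of the first 'm'
print(line.find('cm'))  # -1: 'cm' does not occur
print(line.find('0'))   # 0: found, but at index 0

# An if statement tests truthiness: only 0 is falsy, so -1 counts as True.
print(bool(line.find('m')), bool(line.find('cm')), bool(line.find('0')))  # True True False

# Explicit tests say what is actually meant.
print(line.find('cm') != -1)  # False
print('cm' in line)           # False: same test, more idiomatic
```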

2 Answers


This does not do what you think it does:

if lines[10].find('m'):

find returns the index of the substring you are looking for, or -1 if it's not found. So unless m is the first character on the line (index 0), your condition will always be True: in Python any non-zero number, including -1, is truthy.

You might want to try if 'm' in lines[10]: instead.

Also, check for cm before m; otherwise you'll never find cm, because every line that contains cm also contains m.
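Putting this answer's two suggestions together, a corrected version might look like the sketch below. It assumes, as in the question, that the unit is on line index 10. unit_from_line is a hypothetical helper, and encoding='utf-8' is an assumption about the file (it matters for reading the ° character correctly).

```python
def unit_from_line(line):
    """Return the first known unit that occurs in `line`, or '' if none does."""
    # 'cm' must be tested before 'm': every line containing 'cm' also
    # contains 'm', so testing 'm' first would shadow 'cm'.
    for unit in ('cm', 'm', '°C'):
        if unit in line:  # substring test, True/False (no -1 pitfall)
            return unit
    return ''


def get_seba_unit(path):
    # Assumption: the file is UTF-8 encoded and the unit is on line index 10.
    with open(path, encoding='utf-8') as f:
        lines = f.readlines()
    unit = unit_from_line(lines[10])
    print('found Unit: ' + unit + ' for sensor: ' + path)
    return unit


print(unit_from_line('01.01.2016 00:10:47 0,427 m'))  # m
```

Note that in is still a substring test anywhere in the line, so an 'm' elsewhere on line 10 (for example in a sensor name) would still be picked up; matching the unit right after the number, as in the regex answer, avoids that.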


If what you're looking for is a way to extract the units from your data, I'd use a simple regex like the one below:

import io
import re
from collections import defaultdict

data = io.StringIO("""
1cm
2m
3°C
1cm 10cm
2m 20m
3°C           30°C
""")


def get_seba_unit(file):
    # Raw string, so the escapes \d and \. reach the re module unchanged.
    floating_point_regex = r"([-+]?\d*\.\d+|\d+)"
    content = file.read()
    res = defaultdict(set)  # unit -> set of numeric strings found with it

    for suffix in ['cm', 'm', '°C']:
        # A number immediately followed by the unit suffix, e.g. '10cm'.
        p = re.compile(floating_point_regex + suffix)
        for value in p.findall(content):
            res[suffix].add(value)

    return dict(res)


print(get_seba_unit(data))

And you'd get output like this (the order of the elements inside each set may vary):

{'cm': {'1', '10'}, 'm': {'2', '20'}, '°C': {'3', '30'}}

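Of course, the code above assumes that each value is an integer or a number with a '.' decimal point, immediately followed by its unit, so a line like 0,427 m from the comments would not match as-is. The main idea is to attack this problem with regular expressions.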
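As a sketch of adapting the pattern to the sample line from the comments, the hypothetical parse_value_and_unit below accepts either ',' or '.' as the decimal separator and optional whitespace before the unit. The pattern and the function name are assumptions for illustration, not part of the answer above.

```python
import re

# Assumed line format, e.g. '01.01.2016 00:10:47 0,427 m': a number with an
# optional ',' or '.' decimal part, optional whitespace, then a known unit.
# 'cm' is listed before 'm' so the alternation tries the longer unit first;
# the trailing \b stops 'm' from matching the start of a word like 'min'.
VALUE_WITH_UNIT = re.compile(r'([-+]?\d+(?:[.,]\d+)?)\s*(cm|m|°C)\b')


def parse_value_and_unit(line):
    """Return (value, unit) for the first match in `line`, or None."""
    match = VALUE_WITH_UNIT.search(line)
    if match is None:
        return None
    number, unit = match.groups()
    # float() only accepts '.', so normalise a decimal comma first.
    return float(number.replace(',', '.')), unit


print(parse_value_and_unit('01.01.2016 00:10:47 0,427 m'))  # (0.427, 'm')
```

Because the unit has to follow a number, the date and time fields at the start of the line are skipped rather than mistaken for a value.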
