
I have been using Python 2.7 for some time and have recently switched to Python 3. I have already updated my code in several places, but the cause of my current problem eludes me. I am trying to load a dataset using np.loadtxt. Because the data also contains strings, I import the full array as strings and want to convert some entries to float afterwards. This conversion fails, and I do not understand why. All I can see is that in Python 3 every string gets the prefix 'b'. I suspect this is related, but I cannot find a concise answer. Code and error below.

    filename = 'train.csv'
    raw_data = open(filename, 'rb')
    data = np.loadtxt(raw_data, delimiter=",", dtype = 'str')
    dataset = data[1:,1:]
    print(dataset)
    original_data = dataset
    test = float(dataset[0,0])
    print(test)

Result

    [["b'60'" "b'RL'" "b'65'" ..., "b'WD'" "b'Normal'" "b'208500'"]
     ["b'20'" "b'RL'" "b'80'" ..., "b'WD'" "b'Normal'" "b'181500'"]
     ["b'60'" "b'RL'" "b'68'" ..., "b'WD'" "b'Normal'" "b'223500'"]
     ..., 
     ["b'70'" "b'RL'" "b'66'" ..., "b'WD'" "b'Normal'" "b'266500'"]
     ["b'20'" "b'RL'" "b'68'" ..., "b'WD'" "b'Normal'" "b'142125'"]
     ["b'20'" "b'RL'" "b'75'" ..., "b'WD'" "b'Normal'" "b'147500'"]]
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-38-c154945cd6f1> in <module>()
          5 print(dataset)
          6 original_data = dataset
    ----> 7 test = float(dataset[0,0])
          8 print(test)

    ValueError: could not convert string to float: "b'60'"
  • Hi Sem, you probably don't have to open the file in the first place but can pass the filename to loadtxt. Also you might want to try numpy.genfromtxt which is more powerful for parsing data. Apart from that, can you provide a sample line from your data file? Commented Oct 24, 2016 at 10:04
  • Hi dnalow, using genfromtxt already fixes it. Thanks for your comment! Will post answer below. Kind regards, Sem Commented Oct 24, 2016 at 10:29

1 Answer


As suggested by dnalow, something goes wrong in the type conversion because I first open the file in binary mode and then read from it. The solution is not to combine open(filename, 'rb') with np.loadtxt, but to pass the filename directly to np.genfromtxt. Code below.

    import numpy as np

    filename = 'train.csv'
    data = np.genfromtxt(filename, delimiter=",", dtype = 'str')
    dataset = data[1:,1:]
    print(dataset)
    original_data = dataset
    test = float(dataset[0,0])
    print(test)

Result

    [['60' 'RL' '65' ..., 'WD' 'Normal' '208500']
     ['20' 'RL' '80' ..., 'WD' 'Normal' '181500']
     ['60' 'RL' '68' ..., 'WD' 'Normal' '223500']
     ..., 
     ['70' 'RL' '66' ..., 'WD' 'Normal' '266500']
     ['20' 'RL' '68' ..., 'WD' 'Normal' '142125']
     ['20' 'RL' '75' ..., 'WD' 'Normal' '147500']]
    60.0
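
For what it's worth, the `b'...'` prefix in the question looks like the result of turning a bytes object into text with `str()` instead of decoding it. The sketch below is not the actual loadtxt code path, only a minimal illustration of that effect, assuming a field read from a file opened with `'rb'` arrives as bytes; the variable names are made up for the example.

```python
# A field as it would come out of a file opened in binary mode ('rb').
raw = b"60"

# str() on bytes returns the printable representation, prefix and quotes included.
as_text = str(raw)

# Decoding gives the actual characters.
decoded = raw.decode("utf-8")

print(repr(as_text))   # "b'60'"
print(repr(decoded))   # '60'
print(float(decoded))  # 60.0

# The representation string is not a valid number, hence the ValueError.
try:
    float(as_text)
except ValueError as exc:
    print("ValueError:", exc)
```

Passing the filename to the reader, as in the code above, avoids handling bytes by hand.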

2 Comments

By the way, you can also try not specifying the dtype. I think genfromtxt will then attempt an automatic type conversion and return a structured array. However, structured arrays are used a bit differently from plain ndarrays, so it might not be what you want.
Experiment with dtype=None; and read up on structured arrays.
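
A rough sketch of the dtype=None suggestion, using a small in-memory sample instead of train.csv so it runs on its own. The column names (`id`, `subclass`, `zoning`, `price`) are invented for the example and are not taken from the real file; `encoding="utf-8"` is passed explicitly so the string column comes back as text rather than bytes.

```python
import io

import numpy as np

# Hypothetical stand-in for the CSV file: a header line plus three rows.
sample = io.StringIO(
    "id,subclass,zoning,price\n"
    "1,60,RL,208500\n"
    "2,20,RL,181500\n"
    "3,60,RL,223500\n"
)

# dtype=None lets genfromtxt guess a type per column;
# names=True takes the field names from the header line.
data = np.genfromtxt(sample, delimiter=",", dtype=None,
                     names=True, encoding="utf-8")

print(data.dtype.names)      # field names taken from the header
print(data["zoning"])        # string column
print(data["price"].mean())  # numeric column, usable without float()
```

The result is a one-dimensional structured array, so columns are selected by field name (`data["price"]`) rather than by a second index such as `data[:, 3]`.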
