I'm reading in data and trying to create a NumPy array of shape (194, 1). So it should look like: [[4], [0], [9], ...]

I'm doing this:

def parse_data(file_name):
    data = []
    target = []
    with open(file_name) as f:
        for line in f:
            temp = line.split()
            x = [float(x) for x in temp[:2]]
            y = float(temp[2])
            data.append(np.array(x))
            target.append(np.array(y))
    return np.array(data), np.array(target)

x, y = parse_data("data.txt")

When I inspect y.shape, it's (194,), not (194, 1) as I expected.

x, however, has shape (194, 2), as I'd expect.

Any idea what I'm doing incorrectly?

Thanks!

  • Can you provide some lines from data.txt? Commented Apr 12, 2018 at 20:22

3 Answers

3

You seem to have expected np.array(y) to automatically turn your scalar y into a 1-element row. That's not how NumPy works.

np.array(y) is 0-dimensional. Putting a bunch of those in a list and calling np.array on the list produces a 1-dimensional result, not a 2-dimensional one.
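
A minimal sketch of that difference, using made-up values rather than the asker's file (the names values, scalars and rows are only illustrative):

```python
import numpy as np

values = [4.0, 0.0, 9.0]  # made-up sample targets, not from data.txt

# Each np.array(v) is 0-dimensional, so combining them yields a 1-D array.
scalars = np.array([np.array(v) for v in values])

# Wrapping each value in a list makes 1-element rows, so the result is 2-D.
rows = np.array([np.array([v]) for v in values])

print(np.array(4.0).shape)  # ()
print(scalars.shape)        # (3,)
print(rows.shape)           # (3, 1)
```

With 194 values instead of 3, the same pattern should give (194,) and (194, 1) respectively.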

1

When np.array() is called on a list of 0-dimensional NumPy arrays (arrays built from scalars), it combines them into a single 1-dimensional array, which gives you the (194,) shape.

You can always reshape y to your desired shape:

def parse_data(file_name):
    data = []
    target = []
    with open(file_name) as f:
        for line in f:
            temp = line.split()
            x = [float(x) for x in temp[:2]]
            y = float(temp[2])
            data.append(np.array(x))
            target.append(y)
    return np.array(data), np.array(target).reshape(-1, 1)

x, y = parse_data("data.txt")

Of course you can also fix your problem with:

target.append(np.array([y]))

An example of the behavior I stated above:

import numpy as np
a = np.array(5)
b = np.array(4)
v = [a, b]
v            # [array(5), array(4)]
np.array(v)  # array([5, 4]): the 0-d arrays are combined into one 1-d array of shape (2,)

3 Comments

return np.array(data), np.array(target).reshape(-1, 1) is likely to be better, in case the amount of data varies.
I think it should be (-1, 1), not (1, -1).
You could also use return np.array(data), np.vstack(target). It is a bit more concise and works as long as target is 1D.

0

I'd skip the np.array in the iteration.

def parse_data(file_name):
    data = []
    target = []
    with open(file_name) as f:
        for line in f:
            temp = line.split()
            x = [float(x) for x in temp[:2]]
            y = float(temp[2])
            data.append(x)
            target.append(y)
    return np.array(data), np.array(target)

This would create data like:

 [[1.0, 2.0],[3.0, 4.0], ....]

and target like

 [1.2, 3.2, 3.1, ...]

np.array(data) then turns the list of lists into a 2d array, and np.array(target) turns the list of numbers into a 1d array.

It is then easy to reshape the 1d array or add a dimension to it, making it (1,n) or (n,1) or whatever you need.
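
As a rough sketch of that last step, with made-up numbers (the names col, col_alt and row are only for illustration):

```python
import numpy as np

target = np.array([1.2, 3.2, 3.1])  # made-up 1-D values

col = target.reshape(-1, 1)      # (n, 1); -1 lets NumPy infer n
col_alt = target[:, np.newaxis]  # same (n, 1) shape via indexing
row = target.reshape(1, -1)      # (1, n)

print(col.shape, col_alt.shape, row.shape)  # (3, 1) (3, 1) (1, 3)
```

Either form of the column version should work for the (194, 1) case in the question; which one to use is mostly a matter of taste.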

Remember the basic array construction methods are:

np.array([1,2,3])             # 1d
np.array([[1,2],[3,4]])       # 2d
