How to slice a multidimensional array into two arrays?

Question

I have read all kinds of tutorials, but somehow I can't apply them to my task.

My objective is to extract the data from a text file and later plot some histograms based on it. However, I'm new to Python and I'm stuck on the basics of slicing an array. The text file contains a raw dataset; each item is on its own row and each row has multiple attributes. The attributes are separated by commas.

I'm trying to split the dataset into two. The first attribute (the cultivar) of each row goes into one array, and the rest of the attributes (the attributes of the given cultivar) of each item go into a second array. The raw data is in 178 by 14 format.

I successfully got the first array extracted with the following code:

readFile = open('wine.data', 'r')
cultivar = np.loadtxt(readFile, delimiter=',', usecols=[0], unpack=True)

But when I try to make the second array, I run into problems.

readFile = open('wine.data','r')
attributes = np.loadtxt(readFile, delimiter=',', usecols=[-13], unpack=True)

Whatever I pass to the usecols argument, it's either wrong syntax (as in the code above) or I get a distorted array, like this:

[[ 1.00000000e+00 1.00000000e+00 1.00000000e+00 ..., 3.00000000e+00 3.00000000e+00 3.00000000e+00]
 [ 1.42300000e+01 1.32000000e+01 1.31600000e+01 ..., 1.32700000e+01 1.31700000e+01 1.41300000e+01]
 [ 1.71000000e+00 1.78000000e+00 2.36000000e+00 ..., 4.28000000e+00 2.59000000e+00 4.10000000e+00]
 ...,
 [ 1.04000000e+00 1.05000000e+00 1.03000000e+00 ..., 5.90000000e-01 6.00000000e-01 6.10000000e-01]
 [ 3.92000000e+00 3.40000000e+00 3.17000000e+00 ..., 1.56000000e+00 1.62000000e+00 1.60000000e+00]
 [ 1.06500000e+03 1.05000000e+03 1.18500000e+03 ..., 8.35000000e+02 8.40000000e+02 5.60000000e+02]]

The whole python code is here:

import numpy as np
import matplotlib.pyplot as plt
import urllib


readFile = open('wine.data', 'r')

first = np.loadtxt(readFile, delimiter=',', usecols=[0], unpack=True)
readFile = open('wine.data','r')
rest = np.loadtxt(readFile, delimiter=',', usecols=[-13], unpack=True)
readFile.close()

print(rest)

Raw data: http://pastebin.com/YqV1AZ3r

Oh, I forgot to show you guys the raw data. Here: pastebin.com/YqV1AZ3r
– seppo, Commented Dec 4, 2014 at 13:39

kalhartt · Accepted Answer · 2014-12-04 12:26:30Z

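usecols needs to be a sequence of column indexes, and [-13] names only a single column. So to get columns 1-13 (everything after the cultivar column) you can do

readFile = open('wine.data', 'r')
# unpack=True transposes the result to shape (13, 178); omit it for (178, 13).
rest = np.loadtxt(readFile, delimiter=',', usecols=range(1,14), unpack=True)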
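
To see what usecols and unpack each do to the result, here is a minimal sketch on a made-up four-row sample (the SAMPLE string and the load_columns helper are hypothetical names for illustration, not part of wine.data or NumPy). It reads from an in-memory string so it runs without the file.

```python
import io
import numpy as np

# Made-up sample: 4 items, each with 1 label column + 3 attribute columns.
SAMPLE = (
    "1,14.23,1.71,2.43\n"
    "1,13.20,1.78,2.14\n"
    "2,12.37,0.94,1.36\n"
    "3,13.27,4.28,2.26\n"
)

def load_columns(text, cols, unpack):
    """Hypothetical helper: load only the given column indexes from CSV text."""
    return np.loadtxt(io.StringIO(text), delimiter=',', usecols=cols, unpack=unpack)

# Columns 1..3 are the attributes; column 0 (the label) is skipped.
by_item = load_columns(SAMPLE, range(1, 4), unpack=False)  # one row per item
by_attr = load_columns(SAMPLE, range(1, 4), unpack=True)   # one row per attribute

print(by_item.shape)  # (4, 3)
print(by_attr.shape)  # (3, 4)
```

The two results hold the same numbers; unpack=True only transposes them, which is probably why the array in the question looks "distorted".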

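Also, there is no need to read the file twice. You can read the file once and split it with NumPy indexing. Leave out unpack=True here: it transposes the array, so data[:,0] would be the first item rather than the first column.

readFile = open('wine.data', 'r')
data = np.loadtxt(readFile, delimiter=',')  # shape (178, 14), one row per item
readFile.close()
first = data[:,0]   # cultivar of each item, shape (178,)
rest = data[:,1:]   # the other 13 attributes, shape (178, 13)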
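
As a self-contained sketch of the read-once-then-slice idea, the example below uses a made-up four-row sample in place of wine.data (SAMPLE and split_labels are hypothetical names). It assumes unpack is left at its default of False, so each row of the loaded array is one item.

```python
import io
import numpy as np

# Made-up sample: 4 items, each with 1 label column + 3 attribute columns.
SAMPLE = (
    "1,14.23,1.71,2.43\n"
    "1,13.20,1.78,2.14\n"
    "2,12.37,0.94,1.36\n"
    "3,13.27,4.28,2.26\n"
)

def split_labels(text):
    """Hypothetical helper: load CSV text once, then slice off the first column."""
    data = np.loadtxt(io.StringIO(text), delimiter=',')  # rows are items
    labels = data[:, 0]   # every row, column 0
    attrs = data[:, 1:]   # every row, columns 1 onward
    return labels, attrs

labels, attrs = split_labels(SAMPLE)
print(labels.shape)  # (4,)
print(attrs.shape)   # (4, 3)

# A single attribute across all items, e.g. as input to a histogram:
first_attr = attrs[:, 0]
```

In data[:, 1:], the part before the comma selects rows (":" means all of them) and the part after selects columns, so the same pattern works for any column range.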


2 Comments

JNevens Over a year ago

I was just about to answer the same thing. Just read the data file at once and split it using :

seppo Over a year ago

Woah. That wasn't complicated. That does the trick! Thank you very much! I'd upvote your answer if I could. And thanks for the file-reading tip.
