
I have read all kinds of tutorials, but somehow I can't apply them to my task.

My objective is to extract the data from a text file and later plot some histograms based on it. However, I'm new to Python and I'm stuck on the basics of slicing an array. The text file contains a raw dataset: each item is on its own row, and each row has multiple attributes separated by commas.

I'm trying to split the dataset in two: the first attribute of each row (the cultivar) into one array, and the remaining attributes of each item (the attributes of that cultivar) into a second array. The raw data is 178 rows by 14 columns.

I successfully extracted the first array with the following code:

readFile = open('wine.data', 'r')
cultivar = np.loadtxt(readFile, delimiter=',', usecols=[0], unpack=True)
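For reference, here is a minimal sketch of what that call returns, run on made-up rows held in a string rather than the real file (`sample` and its values are hypothetical placeholders in the same 14-column layout; `io.StringIO` stands in for `open('wine.data')`). `np.loadtxt` also accepts a filename directly, which avoids leaving a file handle open. Because `usecols` selects a single column, the result is already one-dimensional, so `unpack=True` has no visible effect here.

```python
import io
import numpy as np

# Hypothetical placeholder rows in the wine.data layout: cultivar label first,
# then 13 numeric attributes (made-up values, not the real dataset).
sample = "\n".join(
    ",".join([str(label)] + [str(label * 10 + j) for j in range(13)])
    for label in (1, 2, 3)
)

# Same call as above, but reading from the in-memory string.
cultivar = np.loadtxt(io.StringIO(sample), delimiter=",", usecols=[0], unpack=True)

print(cultivar.shape)     # (3,)
print(cultivar.tolist())  # [1.0, 2.0, 3.0]
```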

But when I try to make the second array, I run into problems.

readFile = open('wine.data','r')
attributes = np.loadtxt(readFile, delimiter=',', usecols=[-13], unpack=True)

Whatever I put into usecols, it's either syntactically wrong (as in the code above) or I get a distorted array like this:

[[ 1.00000000e+00 1.00000000e+00 1.00000000e+00 ..., 3.00000000e+00 3.00000000e+00 3.00000000e+00]
 [ 1.42300000e+01 1.32000000e+01 1.31600000e+01 ..., 1.32700000e+01 1.31700000e+01 1.41300000e+01]
 [ 1.71000000e+00 1.78000000e+00 2.36000000e+00 ..., 4.28000000e+00 2.59000000e+00 4.10000000e+00]
 ...,
 [ 1.04000000e+00 1.05000000e+00 1.03000000e+00 ..., 5.90000000e-01 6.00000000e-01 6.10000000e-01]
 [ 3.92000000e+00 3.40000000e+00 3.17000000e+00 ..., 1.56000000e+00 1.62000000e+00 1.60000000e+00]
 [ 1.06500000e+03 1.05000000e+03 1.18500000e+03 ..., 8.35000000e+02 8.40000000e+02 5.60000000e+02]]
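The layout of this output hints at what happened: the first printed row appears to be the cultivar column (1, 1, 1, ..., 3, 3, 3) and the last one the final attribute (1065, 1050, ..., 560), so each printed row is a column of the file. That is what `unpack=True` does: it transposes the result. A minimal sketch on made-up rows (`sample` is a hypothetical placeholder for `wine.data`) compares the two shapes:

```python
import io
import numpy as np

# Hypothetical placeholder rows: cultivar label, then 13 made-up attributes.
sample = "\n".join(
    ",".join([str(label)] + [str(label * 10 + j) for j in range(13)])
    for label in (1, 2, 3)
)

rows = np.loadtxt(io.StringIO(sample), delimiter=",")               # file layout
cols = np.loadtxt(io.StringIO(sample), delimiter=",", unpack=True)  # transposed

print(rows.shape)         # (3, 14): one row per sample
print(cols.shape)         # (14, 3): one row per column of the file
print(cols[0].tolist())   # [1.0, 2.0, 3.0]: the first row holds the labels
```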

Here is the full Python code:

import numpy as np
import matplotlib.pyplot as plt
import urllib


readFile = open('wine.data', 'r')

first = np.loadtxt(readFile, delimiter=',', usecols=[0], unpack=True)
readFile = open('wine.data','r')
rest = np.loadtxt(readFile, delimiter=',', usecols=[-13], unpack=True)
readFile.close()

print(rest)

Raw data: http://pastebin.com/YqV1AZ3r

  • What does the raw data look like? Commented Dec 4, 2014 at 12:20
  • Oh, I forgot to show you guys the raw data. Here: pastebin.com/YqV1AZ3r Commented Dec 4, 2014 at 13:39

1 Answer


usecols needs to be a sequence of column indices, so to get columns 1 through 13 you can do:

import numpy as np

# unpack=True transposes the result: rest has shape (13, 178), one row per attribute
rest = np.loadtxt('wine.data', delimiter=',', usecols=range(1, 14), unpack=True)
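To see what `usecols=range(1, 14)` selects without the real file, here is a minimal sketch that parses made-up rows from an in-memory string (`sample` is a hypothetical placeholder; `io.StringIO` stands in for the open file). It also shows that adding `unpack=True` returns the same selection transposed.

```python
import io
import numpy as np

# Hypothetical placeholder rows: cultivar label, then 13 made-up attributes.
sample = "\n".join(
    ",".join([str(label)] + [str(label * 10 + j) for j in range(13)])
    for label in (1, 2, 3)
)

# usecols takes a sequence of column indices; range(1, 14) means columns 1..13.
rest = np.loadtxt(io.StringIO(sample), delimiter=",", usecols=range(1, 14))
print(rest.shape)    # (3, 13): one row per sample

# The same selection with unpack=True comes back transposed.
rest_t = np.loadtxt(io.StringIO(sample), delimiter=",", usecols=range(1, 14), unpack=True)
print(rest_t.shape)  # (13, 3): one row per attribute
```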

Also, there is no need to read the file twice. You can read it once and split it using NumPy indexing, like so:

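import numpy as np

# Read the file once; without unpack=True each row of data is one sample
data = np.loadtxt('wine.data', delimiter=',')
first = data[:, 0]  # column 0: cultivar labels, shape (178,)
rest = data[:, 1:]  # columns 1-13: the remaining attributes, shape (178, 13)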
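Here is a minimal sketch of the read-once-then-slice approach on made-up rows (`sample` and `cultivar_2` are hypothetical names). It calls `np.loadtxt` without `unpack=True`. With `unpack=True`, the array comes back transposed (14 rows for this layout), and `data[:, 0]` would then return the first sample's 14 values instead of the cultivar column. The last lines show how boolean indexing on the labels picks out one cultivar's attribute rows.

```python
import io
import numpy as np

# Hypothetical placeholder rows in the wine.data layout (made-up values).
sample = "\n".join(
    ",".join([str(label)] + [str(label * 10 + j) for j in range(13)])
    for label in (1, 2, 3)
)

# One read; each row of `data` is one sample, as in the file.
data = np.loadtxt(io.StringIO(sample), delimiter=",")

first = data[:, 0]  # cultivar labels, shape (3,)
rest = data[:, 1:]  # the 13 attributes, shape (3, 13)

# Boolean mask: keep only the attribute rows whose label is 2.
cultivar_2 = rest[first == 2]
print(cultivar_2.shape)  # (1, 13)
print(cultivar_2[0, 0])  # 20.0
```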

2 Comments

I was just about to answer the same thing: read the data file once and split it using : slicing.
Whoa. That wasn't complicated. That does the trick! Thank you very much! I'd upvote your answer if I could. And thanks for the file-reading tip.
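The question's end goal is histograms. Once `first` (labels) and `rest` (one row per sample) are split as in the answer, per-cultivar histograms can be computed with boolean masks and `np.histogram`. The sketch below is minimal and uses small made-up arrays in place of the real ones, with only two attribute columns instead of 13. The column and bin count are hypothetical choices. The bin edges are computed once over all samples, so every cultivar shares the same bins and the counts are comparable.

```python
import numpy as np

# Made-up stand-ins for the answer's arrays: `first` holds cultivar labels,
# `rest` holds one row per sample (2 attribute columns here, not 13).
first = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
rest = np.array([
    [14.0, 1.0],
    [13.0, 2.0],
    [12.5, 3.0],
    [12.0, 1.5],
    [13.5, 4.0],
    [13.0, 2.5],
])

column = 0  # hypothetical: which attribute to histogram
bins = 4    # hypothetical bin count

# Bin edges over all samples, so every cultivar is counted on the same bins.
_, edges = np.histogram(rest[:, column], bins=bins)

hist = {}
for label in np.unique(first):
    values = rest[first == label, column]         # this cultivar's samples only
    counts, _ = np.histogram(values, bins=edges)  # counts per shared bin
    hist[int(label)] = counts
    print(int(label), counts)
```

To draw them, the same `values` and `edges` can be passed to matplotlib, e.g. `plt.hist(values, bins=edges, alpha=0.5, label=str(label))` once per cultivar.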
