0

I am new to Python, and I am trying to 'migrate' an Excel Solver model that I have created to Python, in hopes of faster processing.

I receive a .csv sheet that I use as the input for the model; it is always in the same format.

The model uses 4 different metrics associated with each of products A, B, and C, and from those I determine how to price A, B, and C.

I am at the very nascent stage of effectively inputting this data to Python. This is what I have, and I would not be surprised if there is a better approach, so open to trying anything you veterans have to recommend!

import csv

f = open("141881.csv")
for row in csv.reader(f):

    price = row[0]

    a_metric1 = row[1]
    a_metric2 = row[2]
    a_metric3 = row[3]
    a_metric4 = row[4]

    b_metric1 = row[7]
    b_metric2 = row[8]
    b_metric3 = row[9]
    b_metric4 = row[10]

    c_metric1 = row[13]
    c_metric2 = row[14]
    c_metric3 = row[15]
    c_metric4 = row[16]

The .csv file comes in the format of price,a_metric1,a_metric2,a_metric3,a_metric4,,price,b_metric1,b_metric2,b_metric3,b_metric4,price,,c_metric1,c_metric2,c_metric3,c_metric4

I skip the second and third price column as they are identical to the first one.

However when I run the python script, I get the following error:

    c_metric1 = row[13]
IndexError: list index out of range

And I have no idea why this occurs, when I can see the data is there myself (in Excel, this .csv file goes all the way to column Q, or what I understand as row[16]).

Your help is appreciated, and any advice on my approach is more than welcomed.

Thanks in advance!

3
  • Take a look at pandas. You can use its very powerful read_csv function (the older DataFrame.from_csv has since been removed from pandas) and various other methods. Commented Sep 29, 2014 at 21:10
  • You may want to consider using a DictReader. That way, instead of having to assign each column to a variable a_metric3 or c_metric2, you can just use row['a_metric3'] and row['c_metric2']. Or you may want to put this into some 2D structure, like metric['a'][0], or just metric[0, 0]. Really, without knowing what you're planning to do with this data, it's hard to say how you should organize it. Commented Sep 29, 2014 at 21:14
  • Meanwhile, if you want us to debug your problem, you need to include a minimal, complete, verifiable example. That includes some minimal input data that, when fed into this program, reproduces your error. The problem could well be that some of your rows only have 13 columns, but how could we possibly know that, or tell you how to deal with it, if we can't see the input? Commented Sep 29, 2014 at 21:15
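
The DictReader suggestion from the comments could look roughly like the sketch below. The field names are assumptions based on the layout described in the question (the blank and repeated price columns get placeholder names), and the sample line is made up so the snippet runs without the real file.

```python
import csv
import io

# Assumed column names, following the layout described in the question.
# The blank columns and the repeated price columns get placeholder names.
FIELDNAMES = [
    "price", "a_metric1", "a_metric2", "a_metric3", "a_metric4",
    "blank1", "price_b",
    "b_metric1", "b_metric2", "b_metric3", "b_metric4",
    "price_c", "blank2",
    "c_metric1", "c_metric2", "c_metric3", "c_metric4",
]

# Hypothetical data line standing in for one row of 141881.csv.
SAMPLE = "10,1,2,3,4,,10,5,6,7,8,10,,9,10,11,12\n"

# Passing fieldnames explicitly means the first line is treated as data.
reader = csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDNAMES)
rows = list(reader)

# Values are looked up by name instead of by position.
print(rows[0]["a_metric3"], rows[0]["c_metric2"])
```

With the real file you would pass the open file object instead of the StringIO. If a row is shorter than the list of field names, DictReader fills the missing keys with None rather than raising an IndexError.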

4 Answers

1

Using print() can be your friend here:

import csv
with open('141881.csv') as file_handle:
    file_reader = csv.reader(file_handle)
    for row in file_reader:
        print(row)

The code above will print out EACH row.

To print out ONLY the first row, replace the for loop with: print(next(file_reader)) (assuming Python 3)

Printing out row(s) will allow you to see what exactly a "row" is.

P.S. Using with is advisable because it handles the opening and closing of the file for you
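
Building on the printing idea, a small sketch that reports only the rows that are too short can narrow things down faster than reading every printed row. The sample data and the helper name find_short_rows are made up for illustration; the expected width of 17 comes from the question's column A to Q layout.

```python
import csv
import io

# Hypothetical sample standing in for 141881.csv: a short header row
# followed by one full 17-column data row.
SAMPLE = (
    "Product A,,,,,,Product B\n"
    "10,1,2,3,4,,10,5,6,7,8,10,,9,10,11,12\n"
)

EXPECTED_COLUMNS = 17  # columns A through Q


def find_short_rows(lines, expected=EXPECTED_COLUMNS):
    """Return (line_number, cell_count) for every row shorter than expected."""
    short = []
    for line_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) < expected:
            short.append((line_number, len(row)))
    return short


short_rows = find_short_rows(io.StringIO(SAMPLE))
print(short_rows)
```

For the real file, pass the handle from `with open('141881.csv', newline='') as file_handle:` to find_short_rows instead of the StringIO.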

1

Look into pandas.

Read the file with:

import pandas as pd

data = pd.read_csv('141881.csv')

To read a column:

col = data['column_name']

To read a row by position:

row = data.iloc[row_number]

0
  • The csv module in Python turns a spreadsheet into a matrix: a list of lists

The csv module turns each line of your input into a list by splitting the line into cells. In other words, each row becomes one list with as many elements as that row has columns in your Excel spreadsheet.

Try in a terminal (the output below is only an illustration):

>>> import csv
>>> f = open("141881.csv")
>>> print(list(csv.reader(f)))
[['id', 'name', 'company', 'email'], ['1563', 'defoe', 'SuperFastCompany', '[email protected]'], ['1564', 'doe', 'Awsomestartup', 'doe@awesomestartup'], ...]

So that is why you iterate through the rows of your spreadsheet, assigning each value to a new variable.

I recommend reading up on the basics of list manipulation.

But...

  • What is an IndexError? Catching exceptions:

If one row has fewer columns than the others, indexing past its end raises an error such as the one you described. (An empty cell in the middle of a row does not cause this; it simply comes back as an empty string.) IndexError means the list has no element at the index you asked for. In other words, if some lines of your spreadsheet are shorter than the others, there is no value to assign and Python raises an IndexError. That is why knowing how to catch exceptions can be very useful for seeing the problem. Try to verify that every row's list has the same length and, if it does not, assign an empty value, for example:

try:
    # If the row always has 17 cells, unpack it in one statement.
    # The underscores discard the empty and duplicate-price columns.
    (price, a_metric1, a_metric2, a_metric3, a_metric4, _, _,
     b_metric1, b_metric2, b_metric3, b_metric4, _, _,
     c_metric1, c_metric2, c_metric3, c_metric4) = row
except ValueError:
    # Unpacking a row with the wrong number of cells raises ValueError
    # (indexing past the end, as in row[13], raises IndexError instead).
    # Print how many cells the row actually has.
    print(len(row))

Now you can avoid the error by assigning None to the values that do not appear in the CSV file.
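
One way to do that assignment is to pad each row to the expected width before indexing into it. This is only a sketch: pad_row is a hypothetical helper, and the width of 17 is taken from the layout in the question.

```python
EXPECTED_COLUMNS = 17  # columns A through Q in the question's layout


def pad_row(row, expected=EXPECTED_COLUMNS, fill=None):
    """Return a copy of row extended with fill values up to expected cells."""
    # A negative repeat count yields an empty list, so rows that are
    # already long enough come back unchanged.
    return list(row) + [fill] * (expected - len(row))


short_row = ["10", "1", "2"]
padded = pad_row(short_row)
print(len(padded), padded[13])
```

After padding, row[13] through row[16] always exist, so the original indexing code no longer raises IndexError; it yields None for the missing cells instead, which the rest of the model then has to handle.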

You can read more about catching exceptions in the Python documentation.

0

Thanks everyone for your input - printing the results made me realize that I was getting the IndexError because of the very first row, which only had headers. Skipping that row got rid of the error.
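
For reference, skipping the header row might look like the sketch below. The sample data, the function name read_metrics, and the grouping of the metrics into lists are assumptions for illustration; the column positions follow the layout described in the question.

```python
import csv
import io

# Hypothetical stand-in for 141881.csv: one short header row, then data rows.
SAMPLE = (
    "Product A,,,,,,Product B\n"
    "10,1,2,3,4,,10,5,6,7,8,10,,9,10,11,12\n"
    "20,2,3,4,5,,20,6,7,8,9,20,,10,11,12,13\n"
)


def read_metrics(lines):
    """Skip the header row and return one dict of floats per data row."""
    reader = csv.reader(lines)
    # Discard the header row; the None default avoids StopIteration
    # if the input is empty.
    next(reader, None)
    records = []
    for row in reader:
        records.append({
            "price": float(row[0]),
            "a": [float(cell) for cell in row[1:5]],
            "b": [float(cell) for cell in row[7:11]],
            "c": [float(cell) for cell in row[13:17]],
        })
    return records


records = read_metrics(io.StringIO(SAMPLE))
print(records[0]["c"])
```

Converting to float matters because csv.reader returns every cell as a string, which would otherwise break any arithmetic in the pricing model.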

I will look into pandas, it seems like that will be useful for the type of work I am doing.

Thanks again for all of your help, much appreciated.
