I am trying to read a CSV file in Python 2.7 to create a dictionary. The CSV file looks like this:

SI1440269,SI1320943,SI1321085
SI1440270,SI1320943,SI1321085,SI1320739
SI1440271,SI1320943
SI1440273,SI1321058,SI1320943,SI1320943

The number of entries in each row is not fixed. The first-column entries should be my keys. My code:

import csv
reader = csv.reader(open('test.csv'))

result = {}
for column in reader:
    key = column[0]
    if key in result:
        pass
    result[key] = column[1:]
print result

Output:

{'SI1440273': ['SI1321058', 'SI1320943', 'SI1320943'], '': ['', '', ''], 'SI1440271': ['SI1320943', '', ''], 'SI1440270': ['SI1320943', 'SI1321085', 'SI1320739'], 'SI1440269': ['SI1320943', 'SI1321085', '']}

How can I get rid of the null (empty) values in the output? Also, how can I get the keys in the output to appear in the same order as in the CSV file?

Edit: I want a single row per key.

  • Just for the record, it is not really a CSV file. Commented Jul 5, 2015 at 18:54
  • Also for the record, I believe the variable you define as column is actually a row :) Commented Jul 5, 2015 at 18:57
  • I am not sure I understand what the expected output is here. Do you want to keep only a single row per "key"? Commented Jul 5, 2015 at 19:00
  • I just ran your program and I'm getting different results: {'SI1440270 SI1320943 SI1321085 SI1320739 SI1440271 SI1320943': [], 'SI1440273 SI1321058 SI1320943 SI1320943': [], 'SI1440269 SI1320943 SI1321085': []}. Can you explain a little more what you want here? Commented Jul 5, 2015 at 19:02
  • Your for loop iterates over each row in your CSV file, not each column. You can see this if you put a print statement at the top of your loop: print(column). This prints a row of your file, not a column. Commented Jul 5, 2015 at 19:48
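To make the row-versus-column point concrete, here is a minimal sketch. It assumes two hypothetical sample lines modeled on the question's data and passes them to csv.reader as a plain list, since csv.reader accepts any iterable of strings, not just an open file. It also shows where the empty strings in the question's output come from: consecutive or trailing commas produce empty fields.

```python
import csv

# Hypothetical sample lines modeled on the question's data; a list of
# strings stands in for the open test.csv file.
lines = [
    "SI1440269,SI1320943,SI1321085,",
    "SI1440271,SI1320943,,",
]

rows = list(csv.reader(lines))
for row in rows:
    # Each item yielded by csv.reader is one whole row, split into fields.
    # Empty fields (between or after commas) come back as ''.
    print(row)
```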

3 Answers

You could use csv.DictReader as follows:

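import csv

result = {}
with open('test.csv') as csvfile:
    # Space-delimited: the first field goes under "id" and any extra
    # fields are collected in a list under "data" (the restkey).
    reader = csv.DictReader(csvfile, delimiter=" ", fieldnames=["id"], restkey="data")
    for row in reader:
        print row
        # A row with only an id has no "data" key, so default to [].
        result[row["id"]] = row.get("data", [])

print result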

This gives you a dictionary per row, so you can process the file one line at a time. I then also collect them all into a single result dictionary.

From this you will get the following output:

{'data': ['SI1320943', 'SI1321085'], 'id': 'SI1440269'}
{'data': ['SI1320943', 'SI1321085', 'SI1320739', 'SI1440271', 'SI1320943'], 'id': 'SI1440270'}
{'data': ['SI1321058', 'SI1320943', 'SI1320943'], 'id': 'SI1440273'}
{'SI1440273': ['SI1321058', 'SI1320943', 'SI1320943'], 'SI1440270': ['SI1320943', 'SI1321085', 'SI1320739', 'SI1440271', 'SI1320943'], 'SI1440269': ['SI1320943', 'SI1321085']}
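One detail of csv.DictReader worth knowing here: fields beyond those named in fieldnames are gathered into a list stored under restkey, but a row that has no extra fields gets no restkey entry at all, so indexing row["data"] on such a row raises KeyError. The sketch below assumes hypothetical space-separated input, including an id-only row with the made-up key SI1440272, and uses row.get to fall back to an empty list.

```python
import csv

# Hypothetical space-separated lines; "SI1440272" is a made-up id-only row.
lines = [
    "SI1440269 SI1320943 SI1321085",
    "SI1440272",
]

result = {}
reader = csv.DictReader(lines, delimiter=" ", fieldnames=["id"], restkey="data")
for row in reader:
    # restkey is only set when a row has more fields than fieldnames,
    # so id-only rows need a default instead of row["data"].
    result[row["id"]] = row.get("data", [])

print(result)
```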

Try this:

import csv
reader = csv.reader(open('test.csv'))

result = {row[0]:row[1:] for row in reader if row and row[0]}
print result

If you also want to eliminate the empty strings from the values, do the following:

import csv
reader = csv.reader(open('test.csv'))

result = {row[0]:[i for i in row[1:] if i] for row in reader if row and row[0]}
print result

To preserve the order of entry, use an OrderedDict:

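import csv
from collections import OrderedDict

result = OrderedDict()  # remembers the order in which keys are inserted
with open('test.csv') as f:
    for row in csv.reader(f):
        if row and row[0]:  # skip blank rows and rows without a key
            result[row[0]] = [i for i in row[1:] if i]  # drop empty fields

for key in result:
    print key, ":", result[key]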

5 Comments

This solved my problem partially. I want the keys in the output to be in the same order as in the CSV file. It's not happening with your code.
Import it with from collections import OrderedDict and use it.
I used it but it didn't work. Code:

import csv
from collections import OrderedDict
result = OrderedDict()
reader = csv.reader(open('test.csv'))
result = {row[0]:[i for i in row[1:] if i] for row in reader if row and row[0]}
print result

Just for your knowledge: a dictionary in Python 2.7 is a hash table, which doesn't preserve any order. If you want to preserve the insertion order of keys, use OrderedDict from collections. I have added a third code sample; make use of it.
Is it compulsory that the word 'OrderedDict' appears at the beginning of the output?
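Two things in this thread can be shown in one small sketch. First, in the code from the comment that reported "didn't work", the {...} dict comprehension rebinds result to a plain dict, so the OrderedDict created just before it is discarded; passing (key, value) pairs to OrderedDict keeps the order. Second, print result shows the object's repr, which starts with "OrderedDict(", so the word is not compulsory if you format the items yourself. The sample lines are hypothetical and stand in for test.csv. (As a side note, on Python 3.7 and later a plain dict also preserves insertion order.)

```python
import csv
from collections import OrderedDict

# Hypothetical sample lines standing in for test.csv.
lines = [
    "SI1440273,SI1321058,SI1320943",
    "SI1440269,SI1320943,SI1321085,",
    ",,,",
]

# Build the OrderedDict from (key, value) pairs; a {...} comprehension
# would create a plain dict instead.
result = OrderedDict(
    (row[0], [field for field in row[1:] if field])  # drop empty fields
    for row in csv.reader(lines)
    if row and row[0]  # skip blank rows and rows without a key
)

# print(result) would show the repr, starting with "OrderedDict(".
# Formatting the items yourself leaves the class name out.
for key, values in result.items():
    print("%s: %s" % (key, ", ".join(values)))
```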

As already noted, this is not CSV, so reading the file line by line and splitting each line would be more appropriate. Use OrderedDict to keep the input order:

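from collections import OrderedDict

result = OrderedDict()  # keeps keys in the order they appear in the file
with open('test.csv') as f:
    for line in f:
        row = line.strip().split()  # split on any run of whitespace
        if not row:  # skip blank lines
            continue
        key = row[0]
        result[key] = row[1:]
print result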

2 Comments

Why is it not CSV? Care to explain?
CSV = Comma-Separated Values, so fields are separated by commas. Here I see that they are separated by spaces, so split is easier; csv.reader gives the result noted in the comment by @hobenkr.
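The practical difference between the two approaches shows up when fields are separated by more than one space. str.split() with no argument treats any run of whitespace as a single separator, while csv.reader with delimiter=" " treats every single space as a separator, so a doubled space yields an empty field. A minimal sketch, assuming a hypothetical line with a doubled space:

```python
import csv

# Hypothetical line with a doubled space between the first two ids.
line = "SI1440269  SI1320943 SI1321085"

split_fields = line.split()  # runs of whitespace count as one separator
csv_fields = next(csv.reader([line], delimiter=" "))  # every space separates

print(split_fields)
print(csv_fields)
```

For genuinely comma-separated data, csv.reader remains the better fit, since it also handles quoted fields.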
