Restructure CSV file with Python

Question

I have a csv file that looks like this:

Date     Name    Wage
5/1/19   Joe     $100
5/1/19   Sam     $120
5/1/19   Kate    $30
5/2/19   Joe     $120
5/2/19   Sam     $134
5/2/19   Kate    $56
5/3/19   Joe     $89
5/3/19   Sam     $90
5/3/19   Kate    $231

I would like to restructure it to look like this:

Date      Joe    Sam    Kate
5/1/19    $100   $120   $30
5/2/19    $120   $134   $56
5/3/19    $89    $90    $231

I am not sure how to approach it. Here is what I started writing:

import csv

with open ('myfile.csv', 'rb') as filein, open ('restructured.csv', 'wb') as fileout:
  rows = list(csv.DictReader(filein, skipinitialspace=True))
  names = ...  # NOT SURE HOW TO GET THIS
  fieldnames = ['Date'] + ['{}'.format(i) for i in names]
  csvout = csv.DictWriter(fileout, fieldnames=fieldnames, extrasaction='ignore', restval='NA')
  csvout.writeheader()
  for row in rows:
    row['{}'.format(row['Name'].strip())] = row['Wage']
    csvout.writerow(row)

The csv module is just a parser that yields the CSV rows as lists or dicts. It does not by itself transform the rows into something else. – glenfant, Commented Jun 5, 2019 at 15:31
Thank you. Would you mind pointing me to a pandas example that does something similar? – manticora, Commented Jun 5, 2019 at 15:33
@manticora This video could help you: youtube.com/watch?v=dcqPhpY7tWk – hpelleti, Commented Jun 5, 2019 at 15:42
What is the separator? Does list(csv.DictReader(filein, skipinitialspace=True)) return what you expect? – Serge Ballesta, Commented Jun 5, 2019 at 15:45
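The separator question matters because the sample above is displayed aligned with spaces. As a minimal sketch, assuming the file really is space-aligned (ALIGNED below is a hypothetical two-line excerpt, not the real file), the default comma delimiter reads each whole line as a single field, while splitting each line on whitespace recovers the three columns:

```python
import csv
import io

# Hypothetical excerpt, assuming the file is aligned with spaces as displayed
ALIGNED = "Date     Name    Wage\n5/1/19   Joe     $100\n"

# The default delimiter is ',', so each whole line ends up as one field
rows = list(csv.DictReader(io.StringIO(ALIGNED), skipinitialspace=True))
print(rows)

# One alternative: split every line on runs of whitespace
lines = ALIGNED.splitlines()
header = lines[0].split()
records = [dict(zip(header, line.split())) for line in lines[1:] if line.strip()]
print(records)
```

The split() approach only works while no value itself contains spaces (a full name such as "Joe Smith" would break it).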

Serge Ballesta · Accepted Answer · 2019-06-05 17:13:50Z

2

It can be done with the csv module. Here is the way for Python 3:

import csv
import collections

# the csv docs recommend newline='' for files used with the csv module
with open('myfile.csv', 'r', newline='') as filein, open('restructured.csv', 'w', newline='') as fileout:
    data = collections.defaultdict(dict)  # date -> {name: wage}
    names = set()                         # every distinct name becomes a column
    for row in csv.DictReader(filein, skipinitialspace=True):
        data[row['Date']][row['Name']] = row['Wage']
        names.add(row['Name'])
    csvout = csv.DictWriter(fileout, fieldnames=['Date'] + list(names))
    csvout.writeheader()
    # sorted() compares the date strings as text, not as dates
    for dat in sorted(data.keys()):
        row = data[dat]
        row['Date'] = dat
        csvout.writerow(row)

The generated CSV should look like this (the name columns may come out in a different order, because names is a set):

Date,Kate,Joe,Sam
5/1/19,$30,$100,$120
5/2/19,$56,$120,$134
5/3/19,$231,$89,$90

It is the same for Python 2, except for the with open(...) line, which should be:

with open ('myfile.csv', 'rb') as filein, open ('restructured.csv', 'wb') as fileout:

edited Jun 5, 2019 at 17:13

answered Jun 5, 2019 at 16:06

Serge Ballesta



3 Comments

manticora

It did work for me, thank you very much! But the data is not sorted by date :( My first column looks like this: Date 5/1/19 5/2/19 5/19/19 5/29/19 5/24/19 5/27/19 5/21/19 5/9/19. I tried sorting it with Python afterwards, but got the following error: ValueError: time data '5' does not match format '%m-%d-%y'

Serge Ballesta

It can easily be sorted by date. See my edit to the line for dat in sorted(data.keys()):

manticora

I think it doesn't recognize the values as dates, because this time it sorted them this way: 5/1/19, 5/10/19, 5/11/19, and so on.
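The ordering described in that comment (5/1/19, 5/10/19, 5/11/19, ...) is exactly what plain string sorting produces. A minimal sketch of one fix, assuming every date is written M/D/YY as in the question: pass a key= that parses each string with datetime.strptime, using a format with slashes ('%m/%d/%y') so it matches the data. The function name restructure and the SAMPLE data are hypothetical; the name columns are also sorted here so their order is stable between runs.

```python
import csv
import collections
import io
from datetime import datetime

def restructure(filein, fileout):
    """Long-to-wide reshape like the csv answer above, but with dates
    sorted chronologically and name columns sorted alphabetically.
    Assumes dates are written M/D/YY."""
    data = collections.defaultdict(dict)  # date -> {name: wage}
    names = set()
    for row in csv.DictReader(filein, skipinitialspace=True):
        data[row['Date']][row['Name']] = row['Wage']
        names.add(row['Name'])
    writer = csv.DictWriter(fileout, fieldnames=['Date'] + sorted(names))
    writer.writeheader()
    # key= parses each string into a datetime, so 5/2/19 sorts before 5/10/19
    for date in sorted(data, key=lambda d: datetime.strptime(d, '%m/%d/%y')):
        writer.writerow({'Date': date, **data[date]})

# Hypothetical sample with the dates deliberately out of order
SAMPLE = """Date,Name,Wage
5/10/19,Joe,$10
5/2/19,Joe,$20
5/1/19,Joe,$30
"""
out = io.StringIO()
restructure(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With real files, call it as restructure(filein, fileout) inside with open('myfile.csv', newline='') as filein, open('restructured.csv', 'w', newline='') as fileout:.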

RomanPerekhrest · Answer · 2019-06-05 16:18:57Z

2

Simply with the pandas library:

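import pandas as pd

# the sample data is aligned with spaces, so split on runs of whitespace
df = pd.read_csv("test.csv", sep=r"\s+")
# one row per Date, one column per Name; each Date/Name pair has a single
# Wage, so the identity lambda simply passes that value through
p_table = pd.pivot_table(df, values='Wage', columns=['Name'], index='Date',
                         aggfunc=lambda x: x)
p_table = p_table.reset_index()  # move Date from the index back into a column
p_table.columns.name = None      # drop the leftover "Name" label on the columns

print(p_table)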

The output:

     Date   Joe  Kate   Sam
0  5/1/19  $100   $30  $120
1  5/2/19  $120   $56  $134
2  5/3/19   $89  $231   $90

Reference links:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
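pivot_table also sorts the Date index, and because the dates are strings it sorts them as text, so 5/10/19 would land before 5/2/19. A sketch of one way around that, assuming M/D/YY dates: convert the column with pd.to_datetime before pivoting. It uses the built-in aggregation name aggfunc='first' in place of the lambda, and the SAMPLE data is hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample in the question's layout, dates out of order
SAMPLE = """Date,Name,Wage
5/10/19,Joe,$10
5/10/19,Sam,$11
5/2/19,Joe,$20
5/2/19,Sam,$21
"""

df = pd.read_csv(io.StringIO(SAMPLE))
# Parse the M/D/YY strings into datetimes so the pivoted index sorts by date
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')
wide = pd.pivot_table(df, values='Wage', index='Date', columns='Name',
                      aggfunc='first')
wide = wide.reset_index()
wide.columns.name = None
print(wide)
```

wide.to_csv('restructured.csv', index=False) then writes it without the numeric row index.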

edited Jun 5, 2019 at 16:18

answered Jun 5, 2019 at 15:58

RomanPerekhrest


1 Comment

m13op22

I like your aggregating function here; I hadn't seen or thought of that before.

m13op22 · Answer · 2019-06-05 16:35:42Z

1

What you want to do is also known as converting from long to wide format. With pandas you can easily do this using DataFrame.pivot:

import pandas as pd

df = pd.read_csv("myfile.csv", sep=',')

# Restructure the dataframe: one row per Date, one column per Name
tdf = df.pivot(index='Date', columns='Name', values='Wage')

tdf.to_csv("restructured.csv", sep=',')

print(tdf)

Output:

Name     Joe  Kate   Sam
Date                    
5/1/19  $100   $30  $120
5/2/19  $120   $56  $134
5/3/19   $89  $231   $90
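One difference from the pivot_table answer: DataFrame.pivot only reshapes, and it raises ValueError when a Date/Name pair occurs more than once, whereas pivot_table aggregates the duplicates. A sketch with a hypothetical sample in which Joe has two rows for 5/1/19; aggfunc='last' is just one choice of how to resolve them:

```python
import io
import pandas as pd

# Hypothetical sample: Joe appears twice for 5/1/19
SAMPLE = """Date,Name,Wage
5/1/19,Joe,$100
5/1/19,Joe,$105
5/1/19,Sam,$120
"""
df = pd.read_csv(io.StringIO(SAMPLE))

pivot_failed = False
try:
    df.pivot(index='Date', columns='Name', values='Wage')
except ValueError as err:
    # pivot() cannot place two values in the same Date/Name cell
    pivot_failed = True
    print('pivot failed:', err)

# pivot_table() aggregates instead; 'last' keeps the later duplicate row
wide = df.pivot_table(index='Date', columns='Name', values='Wage', aggfunc='last')
print(wide)
```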

edited Jun 5, 2019 at 16:35

answered Jun 5, 2019 at 16:14

m13op22



Kurtis Streutker · Answer · 2019-06-05 15:53:47Z

This should get you on the right track:

data.csv

5/1/19,Joe,$100
5/1/19,Sam,$120
5/1/19,Kate,$30
5/2/19,Joe,$120
5/2/19,Sam,$134
5/2/19,Kate,$56
5/3/19,Joe,$89
5/3/19,Sam,$90
5/3/19,Kate,$231

data = {}       # date -> {name: wage}
people = set()  # distinct names, used as the column headers
with open('data.csv', 'r') as f:
    for line in f.read().splitlines():
        values = line.split(',')  # [date, name, wage]

        if values[0] not in data:
            data[values[0]] = {}

        data[values[0]][values[1]] = values[2]
        people.add(values[1])

print('Date,' + ','.join([per for per in people]))
for date in data:
    print(f"{date},{','.join([data[date][per] for per in people])}")

Output:

Date,Sam,Kate,Joe
5/1/19,$120,$30,$100
5/2/19,$134,$56,$120
5/3/19,$90,$231,$89

I think the OP wants to save the result as a CSV file, not print it.
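To address that comment, a minimal sketch that writes the same date -> {name: wage} structure to a file with csv.writer instead of printing it. write_wide is a hypothetical helper name and the in-memory data stands in for what the loop above builds. The names are sorted so the column order is stable, and .get() leaves a blank cell where a person has no wage for a date (the original indexing would raise KeyError there).

```python
import csv
import io

def write_wide(data, people, fileout):
    """Write a {date: {name: wage}} dict as wide CSV rows."""
    columns = sorted(people)  # stable column order across runs
    writer = csv.writer(fileout)
    writer.writerow(['Date'] + columns)
    for date, wages in data.items():
        # .get(..., '') leaves a blank cell if someone has no wage that day
        writer.writerow([date] + [wages.get(name, '') for name in columns])

# Hypothetical in-memory data standing in for the dict built from data.csv
data = {'5/1/19': {'Joe': '$100', 'Sam': '$120'}, '5/2/19': {'Joe': '$120'}}
people = {'Joe', 'Sam'}
buf = io.StringIO()
write_wide(data, people, buf)
print(buf.getvalue())
```

For the real file, call write_wide(data, people, fileout) inside with open('restructured.csv', 'w', newline='') as fileout:.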
