225

I'm trying to parse through a csv file and extract the data from only specific columns.

Example csv:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

I'm trying to capture only specific columns, say ID, Name, Zip and Phone.

Code I've looked at has led me to believe I can refer to a specific column by its corresponding number, e.g. Name would correspond to 2, and iterating through each row using row[2] would produce all the items in column 2. Only it doesn't.

Here's what I've done so far:

import sys, argparse, csv
from settings import *

# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
 fromfile_prefix_chars="@" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file

# open csv file
with open(csv_file, 'rb') as csvfile:

    # get number of columns
    for line in csvfile.readlines():
        array = line.split(',')
        first_item = array[0]

    num_columns = len(array)
    csvfile.seek(0)

    reader = csv.reader(csvfile, delimiter=' ')
        included_cols = [1, 2, 6, 7]

    for row in reader:
            content = list(row[i] for i in included_cols)
            print content

I'm expecting this to print out only the specific columns I want for each row, but it doesn't; I get the last column only.

4 Comments

  • Why the 'rb' flag to open()? Shouldn't it be a simple 'r'? Commented May 12, 2013 at 2:13
  • @Elazar: in Python 2 (which the OP is using) "rb" is appropriate for passing to csv.reader. Commented May 12, 2013 at 2:14
  • Why does your example CSV file show the pipe character as the delimiter but your example code uses a space? Commented Sep 30, 2015 at 14:06
  • @KellyS.French I thought it would help visualize the data for the purposes of this question. Commented Apr 22, 2017 at 15:37

16 Answers

229

The only way you would be getting the last column from this code is if you don't include your print statement in your for loop.

This is most likely the end of your code:

for row in reader:
    content = list(row[i] for i in included_cols)
print content

You want it to be this:

for row in reader:
    content = list(row[i] for i in included_cols)
    print content

Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.

Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']

so if you wanted to save all of the info in your column Names into a variable, this is all you need to do:

names = df.Names

It's a great module and I suggest you look into it. If for some reason your print statement was in the for loop and it was still only printing out the last column, that shouldn't happen, but let me know if my assumption was wrong. Your posted code has a lot of indentation errors, so it was hard to know what was supposed to be where. Hope this was helpful!


3 Comments

Is it possible to remove the index numbers from the query? @Ryan Saxe
Yes, just iterate through it in a for loop (see the sketch after these comments).
"Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module." Ah, yes you're not doing Python until you've used Pandas!
134
import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('file.txt') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

print(columns['name'])
print(columns['phone'])
print(columns['street'])
      

With a file like

name,phone,street
Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.

Will output

>>> 
['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']

Or alternatively if you want numerical indexing for the columns:

with open('file.txt') as f:
    reader = csv.reader(f)
    next(reader)
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v)
print(columns[0])

>>> 
['Bob', 'James', 'Smithers']

To change the delimiter, add delimiter=" " to the appropriate instantiation, i.e. reader = csv.reader(f, delimiter=" ")
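For the question's pipe-delimited layout, the same pattern might look like the sketch below. It assumes the header names match the question exactly; the strip() calls and the skip are only there because of the spaces around the pipes and the trailing | in the example rows:

import csv
from collections import defaultdict

columns = defaultdict(list)

with open('file.txt') as f:
    reader = csv.DictReader(f, delimiter='|')
    for row in reader:
        for k, v in row.items():
            if k is None or not k.strip():   # skip the empty field created by the trailing '|'
                continue
            columns[k.strip()].append(v.strip())

print(columns['Name'])
print(columns['Zip'])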

Comments

36

Use pandas:

import pandas as pd
my_csv = pd.read_csv(filename)
column = my_csv.column_name
# you can also use my_csv['column_name']

Discard unneeded columns at parse time:

my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])
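Applied to the question's columns, that might look like the sketch below ('institutions.csv' is a hypothetical file name, and it assumes the header names match exactly):

import pandas as pd

df = pd.read_csv('institutions.csv', usecols=['ID', 'Name', 'Zip', 'Phone'])
print(df.head())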

P.S. I'm just aggregating what others have said in a simple manner. Actual answers are taken from here and here.

2 Comments

I think Pandas is a perfectly acceptable solution. I use Pandas often and really like the library, but this question specifically referenced the CSV module.
@frankV Well, the title, the tags and the first paragraph do not forbid pandas in any way, AFAI can see. I've actually just hoped to add a simpler answer to those already made here (other answers use pandas, too).
22

You can use numpy.loadtxt(filename). For example, if this is your database .csv:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

And you want the Name column:

import numpy as np 
b=np.loadtxt(r'filepath\name.csv',dtype=str,delimiter='|',skiprows=1,usecols=(1,))

>>> b
array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 
      dtype='|S7')

More easily, you can use genfromtxt:

b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True,dtype=None)
>>> b['Name']
array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 
      dtype='|S7')
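Several of the question's columns can also be pulled at once from the structured array that genfromtxt returns; a sketch, reusing the same call as above (the path is a placeholder from the answer):

import numpy as np

b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True, dtype=None)

# multi-field indexing on the structured array selects several named columns at once
print(b[['Name', 'Zip', 'Phone']])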

1 Comment

@G Is there meant to be an r beside 'filepath\name.csv'?
19

With pandas you can use read_csv with the usecols parameter:

df = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])

Example:

import pandas as pd
import io

s = '''
total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
'''

df = pd.read_csv(io.StringIO(s), usecols=['total_bill', 'day', 'size'])
print(df)

   total_bill  day  size
0       16.99  Sun     2
1       10.34  Sun     3
2       21.01  Sun     3

Comments

7

Context: For this type of work you should use the amazing python petl library. That will save you a lot of work and potential frustration from doing things 'manually' with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.), which is fine, but if you plan to work with a lot of data in your career from various strange sources, learning something like petl is one of the best investments you can make. Getting started should only take 30 minutes after you've done pip install petl. The documentation is excellent.

Answer: Let's say you have the first table in a csv file (you can also load directly from the database using petl). Then you would simply load it and do the following.

from petl import fromcsv, look, cut, tocsv 

#Load the table
table1 = fromcsv('table1.csv')
# Keep only the columns you want
table2 = cut(table1, 'Song_Name','Artist_ID')
#have a quick look to make sure things are ok. Prints a nicely formatted table to your console
print look(table2)
# Save to new file
tocsv(table2, 'new.csv')
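Applied to the question's columns, the same pattern would be roughly (a sketch; the file name and header names are assumed to match the question):

from petl import fromcsv, cut, tocsv

table1 = fromcsv('table1.csv')                      # hypothetical file name
table2 = cut(table1, 'ID', 'Name', 'Zip', 'Phone')  # keep only the wanted columns
tocsv(table2, 'subset.csv')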

Comments

5

I think there is an easier way

import pandas as pd

dataset = pd.read_csv('table1.csv')
ftCol = dataset.iloc[:, 0].values

Here, in iloc[:, 0], the : means all rows and 0 is the position of the column; in the example below, ID will be selected.

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
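To grab several columns by position rather than just one, a sketch (it assumes ID, Name, Zip and Phone sit at positions 0, 1, 5 and 6, as in the header above):

import pandas as pd

dataset = pd.read_csv('table1.csv')
# a list of positions selects multiple columns at once
subset = dataset.iloc[:, [0, 1, 5, 6]].values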

Comments

3
import pandas as pd 
csv_file = pd.read_csv("file.csv") 
column_val_list = csv_file.column_name.to_numpy()  # the private ._ndarray_values attribute is deprecated; to_numpy() is the public equivalent

1 Comment

You'll have to pip install pandas first
2

From CSV File Reading and Writing you can import csv and use this code:

import csv

with open('names.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['first_name'], row['last_name'])
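The same approach works for the question's columns; a sketch, assuming the header row contains exactly these names:

import csv

with open('institutions.csv', newline='') as csvfile:   # hypothetical file name
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['ID'], row['Name'], row['Zip'], row['Phone'])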

Comments

1

Thanks to the way you can index and subset a pandas dataframe, a very easy way to extract a single column from a csv file into a variable is:

myVar = pd.read_csv('YourPath', sep = ",")['ColumnName']

A few things to consider:

  • The snippet above will produce a pandas Series, not a DataFrame.
  • The suggestion from ayhan with usecols will also be faster if speed is an issue. Testing the two approaches using %timeit on a 2122 KB csv file yields 22.8 ms for the usecols approach and 53 ms for my suggested approach.
  • And don't forget import pandas as pd.
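For completeness, a sketch of the two approaches being compared (the path and column name are placeholders):

import pandas as pd

# read everything, then select one column (returns a Series)
myVar = pd.read_csv('YourPath', sep=',')['ColumnName']

# restrict the parse to that column up front (the faster variant in the timing above)
myVar = pd.read_csv('YourPath', sep=',', usecols=['ColumnName'])['ColumnName']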

Comments

1

If you need to process the columns separately, I like to destructure the columns with the zip(*iterable) pattern (effectively "unzip"). So for your example:

ids, names, zips, phones = zip(*(
  (row[1], row[2], row[6], row[7])
  for row in reader
))
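A self-contained sketch of the same pattern on in-memory data (the rows and column positions below are made up to mirror the question's header):

import csv
import io

data = io.StringIO(
    "ID,Name,Address,City,State,Zip,Phone\n"
    "10,Cara,130 W,Mo,AL,36101,3345550100\n"
    "11,Dane,140 E,Bi,AL,35203,2055550101\n"
)
reader = csv.reader(data)
next(reader)  # skip the header row

# unzip the selected fields into per-column tuples
ids, names, zips, phones = zip(*((row[0], row[1], row[5], row[6]) for row in reader))
print(names)   # ('Cara', 'Dane')
print(zips)    # ('36101', '35203')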

Comments

1
import pandas as pd

dataset = pd.read_csv('Train.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
  • X is a bunch of columns; use it if you want to read more than one column
  • y is a single column; use it to read one column
  • [:, 1:-1] means [row_index : to_row_index, column_index : to_column_index]

Comments

1
import csv

with open('input.csv', encoding='utf-8-sig') as csv_file:
    # DictReader reads the header row itself, so there is no need to skip it manually
    reader = csv.DictReader(csv_file)

    # collect every value from the 'Time' column into a list
    Time_col = {'Time': []}
    for record in reader:
        Time_col['Time'].append(record['Time'])

    print(Time_col)

1 Comment

Welcome to Stack Overflow. Code is a lot more helpful when it is accompanied by an explanation. Stack Overflow is about learning, not providing snippets to blindly copy and paste. Please edit your answer and explain how it answers the specific question being asked. See How to Answer.
0
SAMPLE.CSV
a, 1, +
b, 2, -
c, 3, *
d, 4, /

import pandas as pd

column_names = ["Letter", "Number", "Symbol"]
df = pd.read_csv("sample.csv", names=column_names)
print(df)
OUTPUT
  Letter  Number Symbol
0      a       1      +
1      b       2      -
2      c       3      *
3      d       4      /

letters = df.Letter.to_list()
print(letters)
OUTPUT
['a', 'b', 'c', 'd']

Comments

0
import csv
import sys
import os

def narrow_csv(input_csv_path, columns_to_keep):
    """
    Reads a CSV file, selects specified columns, and writes them to a new CSV file.

    Args:
        input_csv_path (str): Path to the input CSV file.
        columns_to_keep (list): List of column names to keep.

    Returns:
        str: Path to the newly created CSV file, or None if an error occurred.
    """
    try:
        with open(input_csv_path, 'r', newline='', encoding='utf-8') as infile:
            reader = csv.DictReader(infile)
            fieldnames = [col for col in columns_to_keep if col in reader.fieldnames] #ensure only valid columns are added.
            if not fieldnames:
                print(f"Error: None of the specified columns found in the CSV file.")
                return None

            base_name, extension = os.path.splitext(input_csv_path)
            output_csv_path = f"{base_name}.narrowed{extension}"

            with open(output_csv_path, 'w', newline='', encoding='utf-8') as outfile:
                writer = csv.DictWriter(outfile, fieldnames=fieldnames)
                writer.writeheader()
                for row in reader:
                    new_row = {col: row[col] for col in fieldnames}
                    writer.writerow(new_row)

        return output_csv_path

    except FileNotFoundError:
        print(f"Error: Input CSV file not found: {input_csv_path}")
        return None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python script.py <input_csv_path> <column1,column2,...>")
        sys.exit(1)

    input_csv_path = sys.argv[1]
    columns_string = sys.argv[2]
    columns_to_keep = [col.strip() for col in columns_string.split(',')]

    output_path = narrow_csv(input_csv_path, columns_to_keep)

    if output_path:
        print(output_path)
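A hypothetical programmatic use of that helper (the file and column names are placeholders):

# writes e.g. institutions.narrowed.csv next to the input and returns its path
output_path = narrow_csv('institutions.csv', ['ID', 'Name', 'Zip', 'Phone'])
if output_path:
    print(f"Wrote {output_path}")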

Comments

-3

To fetch the column names, instead of using readlines() it is better to use readline(), which avoids the loop and reading the complete file into an array.

with open(csv_file, 'rb') as csvfile:

    # get number of columns
    line = csvfile.readline()
    first_item = line.split(',')
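A Python 3 sketch of the same idea using the csv module, so quoted fields containing commas are still counted correctly ('input.csv' is a placeholder path):

import csv

with open('input.csv', newline='') as f:
    header = next(csv.reader(f))   # read only the first row
    num_columns = len(header)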

Comments
