
I am working on a Python project in which I read CSV files using Python's csv library. I don't need all of a file's data, just a few lines to do some analysis, so I only want to read in a sample (a certain number of lines). I could simply do that as follows:

import csv

num_rows = 1000
# path: location of the CSV file, defined elsewhere
with open(path, newline='') as my_file:
    sample_reader = csv.reader(my_file)
    count = 0
    for row in sample_reader:
        # do sth with row
        count += 1
        if count >= num_rows:
            break

My problem:

How does 'sample_reader' read in the lines while iterating over it? Does it only read in a 'row' for each for-loop iteration? Or does it use a buffer, or even worse does it read in the whole file before the iteration?

I tried to find an answer by reading the documentation (https://docs.python.org/3/library/csv.html#csv.reader), and even looked at the source code, but I couldn't find any useful information.


2 Answers


Does it only read in a 'row' for each for-loop iteration? Or does it use a buffer, or even worse does it read in the whole file before the iteration?

As the documentation states, csv.reader returns a reader object, which is an iterator.

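In your example, you pull one row at a time from this iterator, so the whole file is never read into memory. Each loop iteration calls the reader's __next__() method, which in turn requests only as many lines from the file object as it needs to build one row (usually one line; more only if a quoted field contains line breaks). The file object does its own buffered I/O underneath, but that buffer holds a small chunk of the file, not the whole file.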

The glossary entry for iterator in the Python documentation describes this behavior:

An object representing a stream of data. Repeated calls to the iterator’s __next__() method (or passing it to the built-in function next()) return successive items in the stream.
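If you want to check the lazy behavior yourself, here is a minimal sketch. It assumes an in-memory io.StringIO standing in for your file, and counting_lines is a hypothetical helper (not part of the csv module) that records how many lines the reader has requested so far:

```python
import csv
import io


def counting_lines(lines, counter):
    """Yield lines unchanged, recording how many have been requested."""
    for line in lines:
        counter["lines"] += 1
        yield line


# Hypothetical data: 10,000 short rows standing in for a large CSV file.
data = io.StringIO("".join(f"{i},{i * 2}\n" for i in range(10_000)))
counter = {"lines": 0}

# csv.reader accepts any iterator that yields strings, not only file objects.
reader = csv.reader(counting_lines(data, counter))

# Pull just three rows, then stop.
first_rows = [next(reader) for _ in range(3)]

print(first_rows)
print(counter["lines"])  # lines requested from the source so far
```

With simple rows like these, the counter should match the number of rows you pulled, which suggests the reader is not running ahead through the rest of the data.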

What would read the whole file into memory is something like this:

# list() consumes the reader completely before the loop starts
sample_reader = list(csv.reader(my_file))

# Loop over the first 1000 rows of the list
for row in sample_reader[:num_rows]:
    print(row)  # Do something with each row

This exhausts the iterator and loads every row of the file into a list before the slice is taken. That is fine for small files, but for large files like yours it is faster and far lighter on memory to read one row at a time from the iterator, as you are already doing.
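As an aside, the counter-and-break loop can also be written with itertools.islice, which stops pulling rows once it has enough. This is only a sketch of one alternative: read_sample is a hypothetical helper name, and the io.StringIO data stands in for a real file.

```python
import csv
import io
from itertools import islice


def read_sample(lines, num_rows):
    """Return at most num_rows parsed rows without reading the rest."""
    return list(islice(csv.reader(lines), num_rows))


# Hypothetical in-memory data standing in for a CSV file.
data = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
sample = read_sample(data, 2)
print(sample)
```

With a real file you would pass the open file object instead, for example read_sample(my_file, 1000) inside the with block from the question.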


csv.reader returns a reader object that calls the __next__ method of the iterator passed to it (in this case, a file object). Each call returns a list of strings, one per field in the corresponding row of the file. Note that once the reader has traversed the file, the file is at its end and further iteration yields nothing. If you want to read it again, you can reset the file position with seek(0), though this is not recommended.
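To illustrate the exhaustion and the seek(0) reset, here is a small sketch. It assumes an io.StringIO in place of a real file, and it creates a fresh reader after rewinding rather than reusing the old one:

```python
import csv
import io

# Hypothetical in-memory data standing in for an open CSV file.
data = io.StringIO("a,b\n1,2\n")

reader = csv.reader(data)
first_pass = list(reader)   # consumes every row
exhausted = list(reader)    # nothing left to read

data.seek(0)                # move the position back to the start
second_pass = list(csv.reader(data))

print(first_pass)
print(exhausted)
print(second_pass)
```

Reopening the file is usually the cleaner option, since it avoids keeping track of where the position currently is.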
