
My laptop has 8 GB of memory, and I was trying to read and process a big CSV file when I ran into memory issues. I found a solution, which is to use chunksize to process the file chunk by chunk, but apparently when using chunksize the object returned becomes a TextFileReader, and the code I was using to process normal CSVs no longer works with it. This is the code I'm trying to use to count how many sentences are in the CSV file:

import pandas as pd

# read only the header row to decide whether the first line is a one-word header
wdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(wdata.count(' ') == 0)
# with chunksize, read_csv returns a TextFileReader instead of a DataFrame
wdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=1000)

data = wdata.count()  # this is the line that raises the error below
print(data)

The error I'm getting is:

Traceback (most recent call last):
  File "table.py", line 24, in <module>
    data = wdata.count()
AttributeError: 'TextFileReader' object has no attribute 'count'

I also tried another way around it by running this code:


TextFileReader = pd.read_csv(fileinput, chunksize=1000)  # the number of rows per chunk

dfList = []
for df in TextFileReader:
    dfList.append(df)  # keeping every chunk rebuilds the whole file in memory

df = pd.concat(dfList, sort=False)
print(df)

and it gives this error:


   data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 908, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2132, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 4

1 Answer


You have to iterate over the chunks:

csv_length = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=10000):
    csv_length += chunk.count()
print(csv_length)
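
Note that DataFrame.count() returns a per-column Series of non-null counts, so csv_length above ends up as a one-element Series rather than a plain integer. If all you want is the total number of rows, a minimal equivalent sketch (reusing the fileinput and skip variables from the question) is:

import pandas as pd

csv_length = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=10000):
    csv_length += len(chunk)  # number of rows in this chunk only
print(csv_length)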

Comments

This is printing 1000, 1000, 1000, 1000 for more than 200 times.
@programmingfreak Yes, obviously: you are reading chunks with a length of 1000 and printing the length of each one.
@programmingfreak You have to add the count of each chunk to a variable to get the full length.
I know, I tried appending all of them to print the length of the file, but it killed the process automatically for some reason. Is there a way around it?
The other attempt doesn't really make sense: your memory is too small to hold the full CSV, so you can't read the chunks and append them back together. You have to process each chunk and then clear it out of memory (a sketch of this pattern follows below).
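To illustrate the last comment, here is a rough sketch of the chunk-at-a-time pattern: do all the work for a chunk inside the loop and only keep small running totals, never the chunks themselves. The word-count aggregate is just a hypothetical example of per-chunk work; fileinput and skip are the variables from the question.

import pandas as pd

sentence_total = 0
word_total = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=10000):
    # aggregate what you need from this chunk, then let it be garbage-collected;
    # appending chunks to a list would rebuild the whole file in memory
    sentence_total += chunk['sentences'].count()  # non-null sentences in this chunk
    word_total += chunk['sentences'].str.split().str.len().sum()  # hypothetical extra aggregate
print(sentence_total, word_total)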
