
I'm trying to import the data, but I'm getting a MemoryError. I increased the virtual memory; the total data size is 2.71 GB. I thought about setting the column dtypes in advance to optimize memory consumption, so I found this site: Optimize Pandas Memory Usage for Large Datasets

import pathlib
import pandas as pd

base_path = pathlib.Path('dataset')

base_airbnb = pd.DataFrame()

for file in base_path.iterdir():
    df = pd.read_csv(r'dataset\{}'.format(file.name))
    base_airbnb = base_airbnb.append(df)

display(base_airbnb)

How to set pandas column types to decrease memory consumption?

ParserError: Error tokenizing data. C error: out of memory

1 Answer

First off, df.append is deprecated (it was removed entirely in pandas 2.0); pd.concat should be used instead.

import pathlib

import numpy as np
import pandas as pd

base_path = pathlib.Path('dataset')
base_airbnb = []

for file in base_path.iterdir():
    # Collect each DataFrame in a plain list; concatenating once at the
    # end avoids the repeated copying that append-in-a-loop causes.
    base_airbnb.append(pd.read_csv(rf'dataset\{file.name}',
                                   dtype={'a': np.float64, 'b': np.int32, 'c': 'Int64'}))

base_airbnb = pd.concat(base_airbnb)

As for how to set dtypes, follow the pattern given in the documentation:

  • {'a': np.float64, 'b': np.int32, 'c': 'Int64'}
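To check whether the dtypes are actually saving anything, you can compare memory footprints before and after. A minimal sketch, where 'dataset/listings.csv' and the columns a, b, c are placeholders, not names from your data:

import numpy as np
import pandas as pd

df_default = pd.read_csv('dataset/listings.csv')  # dtypes inferred by pandas
df_typed = pd.read_csv(
    'dataset/listings.csv',
    dtype={'a': np.float64, 'b': np.int32, 'c': 'Int64'},  # 'Int64' is nullable, so it tolerates missing values
)

# deep=True also counts the heap memory held by object (string) columns
print(df_default.memory_usage(deep=True).sum())
print(df_typed.memory_usage(deep=True).sum())

Object (string) columns usually dominate; the deep=True numbers make that visible and tell you which columns are worth converting.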

1 Comment

I don't understand how the error persists... Error tokenizing data. C error: out of memory
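If the C parser still runs out of memory even with explicit dtypes, reading each file in chunks usually gets around it, because the parser only ever holds one slice of the file at a time. A sketch along those lines, reusing the same placeholder file name and dtype mapping as above:

import numpy as np
import pandas as pd

dtypes = {'a': np.float64, 'b': np.int32, 'c': 'Int64'}
chunks = []

# chunksize turns read_csv into an iterator of smaller DataFrames,
# so only one slice of the file is parsed at a time
for chunk in pd.read_csv('dataset/listings.csv', dtype=dtypes, chunksize=100_000):
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)

Note that the final concat still needs the whole dataset in RAM; chunking only relieves pressure during parsing, which is the stage the tokenizing error comes from.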
