
I'm trying to import the data, but I'm getting a MemoryError. I increased the virtual memory; the total data size is 2.71 GB. I thought about setting the column dtypes in advance to optimize memory consumption, so I found this site: Optimize Pandas Memory Usage for Large Datasets

import pathlib
import pandas as pd

base_path = pathlib.Path('dataset')

base_airbnb = pd.DataFrame()

for file in base_path.iterdir():
    df = pd.read_csv(r'dataset\{}'.format(file.name))
    base_airbnb = base_airbnb.append(df)

display(base_airbnb)

How to set pandas column types to decrease memory consumption?

ParserError: Error tokenizing data. C error: out of memory

1 Answer

First off, df.append is deprecated (it was removed entirely in pandas 2.0); pd.concat should be used instead.

import pathlib

import numpy as np
import pandas as pd

base_path = pathlib.Path('dataset')
base_airbnb = []

for file in base_path.iterdir():
    # Collect each DataFrame in a plain list; concatenating once at the
    # end avoids the repeated copying that append-in-a-loop causes.
    base_airbnb.append(pd.read_csv(rf'dataset\{file.name}',
                                   dtype={'a': np.float64, 'b': np.int32, 'c': 'Int64'}))

base_airbnb = pd.concat(base_airbnb)

As for how to set dtypes, follow the pattern given in the documentation:

  • {'a': np.float64, 'b': np.int32, 'c': 'Int64'}
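To check whether the dtypes are actually saving anything, you can compare memory footprints before and after. A minimal sketch, where 'dataset/listings.csv' and the columns a, b, c are placeholders, not names from your data:

import numpy as np
import pandas as pd

df_default = pd.read_csv('dataset/listings.csv')  # dtypes inferred by pandas
df_typed = pd.read_csv(
    'dataset/listings.csv',
    dtype={'a': np.float64, 'b': np.int32, 'c': 'Int64'},  # 'Int64' is nullable, so it tolerates missing values
)

# deep=True also counts the heap memory held by object (string) columns
print(df_default.memory_usage(deep=True).sum())
print(df_typed.memory_usage(deep=True).sum())

Object (string) columns usually dominate; the deep=True numbers make that visible and tell you which columns are worth converting.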

1 Comment

I don't understand how the error persists... Error tokenizing data. C error: out of memory
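If the C parser still runs out of memory even with explicit dtypes, reading each file in chunks usually gets around it, because the parser only ever holds one slice of the file at a time. A sketch along those lines, reusing the same placeholder file name and dtype mapping as above:

import numpy as np
import pandas as pd

dtypes = {'a': np.float64, 'b': np.int32, 'c': 'Int64'}
chunks = []

# chunksize turns read_csv into an iterator of smaller DataFrames,
# so only one slice of the file is parsed at a time
for chunk in pd.read_csv('dataset/listings.csv', dtype=dtypes, chunksize=100_000):
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)

Note that the final concat still needs the whole dataset in RAM; chunking only relieves pressure during parsing, which is the stage the tokenizing error comes from.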
