
I have a large database (more than 5 million rows) accessible with Microsoft Access. So far I am able to pull the entire table into a data frame with Python, but it takes very long to process (more than 15 minutes), which is nuts as I only need to work on a much smaller section of the entire table ...

import pyodbc
import pandas as pd


conn_str = (
    r'DRIVER={Adaptive Server Enterprise};'
    r'DBQ=\\path_where_the_table_is_located.mdb;'
    r'SERVER=server_name;'
    r'DATABASE=...;'
    r'UID=xxx;'
    r'PWD=xxx;'
    r'PORT=xxx')


con = pyodbc.connect(conn_str)
cursor = con.cursor()

# read the whole table into a data frame
df = pd.read_sql('select * from table_to_be_converted_into_a_df', con)

How can I enhance my query above to only request a smaller portion of the entire table and run it much faster?

df1 = df.loc[df['date'] == '2021-07-07']

This is the code I run to shrink the data frame once it is loaded. I would like to add this filter somehow to the initial query so that I ONLY query the data I need and it runs much faster.

1 Answer


Altering the SQL query itself such that it only returns rows of that date would look something like...

select * from table_to_be_converted_into_a_df where date = '2021-07-07';
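On the Python side, that filter can be wired into pd.read_sql instead of building the date into the SQL string by hand. A rough sketch, reusing the connection string from the question (the table and column names are just the placeholders used above):

import pyodbc
import pandas as pd

con = pyodbc.connect(conn_str)

# The database does the filtering, so only rows for the requested date
# travel back to pandas. The '?' placeholder is filled in by the driver.
query = 'select * from table_to_be_converted_into_a_df where date = ?'
df = pd.read_sql(query, con, params=['2021-07-07'])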

This will reduce the total amount of data returned from the DB to your python script—which could significantly speed up your script. However, if your table table_to_be_converted_into_a_df does NOT have an index on the date column, then your query will still be scanning the entire table, which may take a while.

If that's the case, consider adding an index to the date column.
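If the table really does live in an Access (.mdb) file, one way to add such an index is a plain SQL DDL statement run over the same connection. A sketch under those assumptions (the index name idx_date is just illustrative, and you need write access to the file):

cursor = con.cursor()
# Index the date column so the WHERE clause above can use it instead of
# scanning the whole table. [date] is bracketed because DATE is a
# reserved word in Access SQL.
cursor.execute('CREATE INDEX idx_date ON table_to_be_converted_into_a_df ([date])')
con.commit()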


Comments

Amazing! I tried your code and I get my result in less than 3 seconds instead of 20 minutes, sounds like a good improvement to me :)
@chris is it possible to add an index in an Access DB?
