
Any help on this problem will be greatly appreciated.

So basically I want to run a query against my SQL database and store the returned data in a pandas data structure.

I have attached the code for my query.

I am reading the pandas documentation, but I'm having trouble identifying the return type of my query.

I tried to print the query result, but it doesn't give any useful information.

Thanks!

from sqlalchemy import create_engine

engine2 = create_engine('mysql://THE DATABASE I AM ACCESSING')
connection2 = engine2.connect()
dataid = 1022
resoverall = connection2.execute("""
    SELECT
       sum(BLABLA) AS BLA,
       sum(BLABLABLA2) AS BLABLABLA2,
       sum(SOME_INT) AS SOME_INT,
       sum(SOME_INT2) AS SOME_INT2,
       100*sum(SOME_INT2)/sum(SOME_INT) AS ctr,
       sum(SOME_INT2)/sum(SOME_INT) AS cpc
    FROM daily_report_cooked
    WHERE campaign_id = '%s'
""" % dataid)

So I'd like to understand the format/datatype of my variable "resoverall" and how to put it into a pandas data structure.


18 Answers


Here's the shortest code that will do the job:

from pandas import DataFrame

df = DataFrame(resoverall.fetchall())  # rows from the result proxy
df.columns = resoverall.keys()         # column names from the result proxy

You can go fancier and parse the types as in Paul's answer.


8 Comments

This worked for me for 1,000,000 records fetched from an Oracle database.
df = DataFrame(cursor.fetchall()) returns ValueError: DataFrame constructor not properly called! It appears that a tuple of tuples is not acceptable to the DataFrame constructor. There is also no .keys() on the cursor, in either dictionary or tuple mode.
Just note that the keys method will only work with results obtained using sqlalchemy. Pyodbc uses the description attribute for columns; see the sketch after these comments.
@BowenLiu Yes; with psycopg2 you can use df.columns = [x.name for x in resoverall.description]
Why does "df.columns = resoverall.keys()" not work?
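
For illustration, here is a minimal sketch of the description-based pyodbc approach mentioned in the comments above; the connection string and table name are illustrative assumptions:

import pyodbc
import pandas as pd

# Hypothetical connection; adjust the string for your driver/server
cnxn = pyodbc.connect("DRIVER={SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;")
cursor = cnxn.cursor()
cursor.execute("SELECT * FROM myTable")

# pyodbc exposes column metadata via cursor.description;
# each entry is a 7-item sequence whose first element is the column name
columns = [col[0] for col in cursor.description]
df = pd.DataFrame.from_records(cursor.fetchall(), columns=columns)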

Edit: Mar. 2015

As noted below, pandas now uses SQLAlchemy both to read from (read_sql) and insert into (to_sql) a database. The following should work:

import pandas as pd

df = pd.read_sql(sql, cnxn)

Previous answer: Via mikebmassey from a similar question

import pyodbc
import pandas.io.sql as psql

cnxn = pyodbc.connect(connection_info)
cursor = cnxn.cursor()  # not actually needed by frame_query; shown for completeness
sql = "SELECT * FROM TABLE"

df = psql.frame_query(sql, cnxn)
cnxn.close()

4 Comments

This seems to be the best way to do it, as you don't need to manually use .keys() to get the column index. Probably Daniel's answer was written before this method existed. You can also use pandas.io.sql.read_frame()
@openwonk where would you implement pd.read_sql() in the code snippet above?
Actually, since my last response, I've used pyodbc and pandas together quite a bit. Adding new answer with example, FYI.
Hello, what is cursor used for in your example, please?

If you are using SQLAlchemy's ORM rather than the expression language, you might find yourself wanting to convert an object of type sqlalchemy.orm.query.Query to a Pandas data frame.

The cleanest approach is to get the generated SQL from the query's statement attribute, and then execute it with pandas's read_sql() method. E.g., starting with a Query object called query:

df = pd.read_sql(query.statement, query.session.bind)
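
For context, here is a minimal sketch of how such a Query object might arise; the model, engine URL, and filter below are illustrative assumptions, not part of the original answer:

import pandas as pd
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

# Hypothetical ORM model, used only for illustration
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine('sqlite://')  # in-memory database for the sketch
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

query = session.query(User).filter(User.id > 10)  # a sqlalchemy.orm.query.Query
df = pd.read_sql(query.statement, query.session.bind)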

2 Comments

A more efficient approach is to get the statement from sqlalchemy and let pandas do the query itself with pandas.read_sql_query, passing query.statement to it. See this answer: stackoverflow.com/a/29528804/1273938
Thanks @LeoRochael! I edited my answer. Definitely cleaner!

1. Using MySQL-connector-python

# pip install mysql-connector-python

import mysql.connector
import pandas as pd

mydb = mysql.connector.connect(
    host = 'host',
    user = 'username',
    passwd = 'pass',
    database = 'db_name'
)
query = 'select * from table_name'
df = pd.read_sql(query, con = mydb)
print(df)

2. Using SQLAlchemy

# pip install pymysql
# pip install sqlalchemy

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('mysql+pymysql://username:password@localhost:3306/db_name')

query = '''
select * from table_name
'''
df = pd.read_sql_query(query, engine)
print(df)

1 Comment

simple and great answer!

Edit 2014-09-30:

pandas now has a read_sql function. You definitely want to use that instead.
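
For example, a minimal sketch (connection setup omitted; cnxn is assumed to be an open database connection or SQLAlchemy engine):

import pandas as pd

df = pd.read_sql("SELECT * FROM myTable", cnxn)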

Original answer:

I can't help you with SQLAlchemy -- I always use pyodbc, MySQLdb, or psycopg2 as needed. But when doing so, a function as simple as the one below tends to suit my needs:

import datetime
import decimal

import pyodbc
import numpy as np
import pandas

def __processCursor(cur, dataframe=False, index=None):
    '''
    Processes a database cursor with data on it into either
    a structured numpy array or a pandas dataframe.

    input:
    cur - a pyodbc cursor that has just received data
    dataframe - bool. if false, a numpy record array is returned
                if true, return a pandas dataframe
    index - list of column(s) to use as index in a pandas dataframe
    '''
    datatypes = []
    colinfo = cur.description
    for col in colinfo:
        if col[1] == unicode:  # Python 2 only; this answer predates Python 3
            datatypes.append((col[0], 'U%d' % col[3]))
        elif col[1] == str:
            datatypes.append((col[0], 'S%d' % col[3]))
        elif col[1] in [float, decimal.Decimal]:
            datatypes.append((col[0], 'f4'))
        elif col[1] == datetime.datetime:
            datatypes.append((col[0], 'O4'))
        elif col[1] == int:
            datatypes.append((col[0], 'i4'))

    data = []
    for row in cur:
        data.append(tuple(row))

    array = np.array(data, dtype=datatypes)
    if dataframe:
        output = pandas.DataFrame.from_records(array)

        if index is not None:
            output = output.set_index(index)

    else:
        output = array

    return output

cnn, cur = myConnectToDBfunction()  # your own connection helper
cmd = "SELECT * FROM myTable"
cur.execute(cmd)
dataframe = __processCursor(cur, dataframe=True)

7 Comments

I think you need to import decimal somewhere up top?
@joefromct Perhaps, but this answer is so obsolete I really should just strike the whole thing and show the pandas methods.
It may be relevant for some... the reason I was studying this was because of my other issue, using read_sql() here stackoverflow.com/questions/32847246/…
It's relevant for those who can't use SQLAlchemy which doesn't support all databases.
@lamecicle somewhat disagree. IIRC, read_sql can still accept non-SQLAlchemy connections through, e.g., pyodbc, psycopg2, etc.

MySQL Connector

For those who work with the MySQL connector, you can use this code as a start. (Thanks to @Daniel Velkov.)

import pandas as pd
import mysql.connector

# Setup MySQL connection
db = mysql.connector.connect(
    host="<IP>",              # your host, usually localhost
    user="<USER>",            # your username
    password="<PASS>",        # your password
    database="<DATABASE>"     # name of the database
)

# You must create a Cursor object. It will let you execute all the queries you need
cur = db.cursor()

# Use all the SQL you like
cur.execute("SELECT * FROM <TABLE>")

# Put it all to a data frame
sql_data = pd.DataFrame(cur.fetchall())
sql_data.columns = cur.column_names

# Close the session
db.close()

# Show the data
print(sql_data.head())

Comments


Here's the code I use. Hope this helps.

import pandas as pd
from sqlalchemy import create_engine

def getData():
  # Parameters
  ServerName = "my_server"
  Database = "my_db"
  UserPwd = "user:pwd"
  Driver = "driver=SQL Server Native Client 11.0"

  # Create the connection
  engine = create_engine('mssql+pyodbc://' + UserPwd + '@' + ServerName + '/' + Database + "?" + Driver)

  sql = "select * from mytable"
  df = pd.read_sql(sql, engine)
  return df

df2 = getData()
print(df2)

Comments


This is a short and crisp answer to your problem:

from __future__ import print_function
import MySQLdb
import pandas as pd

# Connecting to the MySQL database
connection = MySQLdb.connect(
             host="hostname",
             port=0000,
             user="userID",
             passwd="password",
             db="table_documents",
             charset='utf8'
           )
print(connection)

# Getting data from the database into a DataFrame
sql_for_df = 'select * from tabledata'
df_from_database = pd.read_sql(sql_for_df, connection)

Comments


Like Nathan, I often want to dump the results of a sqlalchemy or sqlsoup Query into a Pandas data frame. My own solution for this is:

query = session.query(tbl.Field1, tbl.Field2)
DataFrame(query.all(), columns=[column['name'] for column in query.column_descriptions])

1 Comment

If you have a query object. It's more efficient to get the statement from sqlalchemy and let pandas do the query itself with pandas.read_sql_query, passing query.statement to it. See this answer: stackoverflow.com/a/29528804/1273938

resoverall is a sqlalchemy ResultProxy object. You can read more about it in the sqlalchemy docs, which explain the basics of working with Engines and Connections. What matters here is that resoverall is dict-like.

Pandas likes dict-like objects when creating its data structures; see the online docs.
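
As a minimal sketch of that idea (assuming resoverall came from connection2.execute as in the question):

import pandas as pd

# Each row of the ResultProxy is dict-like; fetchall() yields the rows
# and keys() yields the column names
df = pd.DataFrame(resoverall.fetchall(), columns=list(resoverall.keys()))

This is essentially the same pattern as the accepted answer above.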

Good luck with sqlalchemy and pandas.

Comments


Simply use pandas and pyodbc together. You'll have to modify your connection string (connstr) according to your database specifications.

import pyodbc
import pandas as pd

# MSSQL Connection String Example
connstr = "Server=myServerAddress;Database=myDB;User Id=myUsername;Password=myPass;"

# Query Database and Create DataFrame Using Results
df = pd.read_sql("select * from myTable", pyodbc.connect(connstr))

I've used pyodbc with several enterprise databases (e.g. SQL Server, MySQL, MariaDB, IBM).

4 Comments

How to write this dataframe back to MSSQL using pyodbc, other than using sqlalchemy?
Use the to_sql method on the DataFrame object. That method defaults to SQLite, so you have to explicitly pass it an object pointing to the MSSQL database. See docs.
I tried the below and I have around 200K rows with 13 columns; it still had not completed after 15 minutes. Any ideas? df.to_sql('tablename',engine,schema='schemaname',if_exists='append',index=False)
That does seem slow... I would probably need to see the whole code in action, sorry. I wish pandas were more optimized for light ETL work, but alas...

This question is old, but I wanted to add my two cents. I read the question as "I want to run a query against my [My]SQL database and store the returned data as a pandas data structure [DataFrame]."

From the code it looks like you mean a MySQL database, and I assume you mean a pandas DataFrame.

import MySQLdb as mdb
import pandas.io.sql as sql

conn = mdb.connect('<server>', '<user>', '<pass>', '<db>')
df = sql.read_frame('<query>', conn)

For example,

conn = mdb.connect('localhost', 'myname', 'mypass', 'testdb')
df = sql.read_frame('select * from testTable', conn)

This will import all rows of testTable into a DataFrame.

Comments


Here is mine, just in case you are using pymysql:

import pymysql
from pandas import DataFrame

host   = 'localhost'
port   = 3306
user   = 'yourUserName'
passwd = 'yourPassword'
db     = 'yourDatabase'

cnx    = pymysql.connect(host=host, port=port, user=user, passwd=passwd, db=db)
cur    = cnx.cursor()

query  = """ SELECT * FROM yourTable LIMIT 10"""
cur.execute(query)

field_names = [i[0] for i in cur.description]  # column names from cursor metadata
get_data = [xx for xx in cur]                  # fetch the remaining rows

cur.close()
cnx.close()

df = DataFrame(get_data)
df.columns = field_names

Comments


pandas.io.sql.write_frame is DEPRECATED: https://pandas.pydata.org/pandas-docs/version/0.15.2/generated/pandas.io.sql.write_frame.html

You should change to using pandas.DataFrame.to_sql: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
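
For reference, here is a minimal to_sql sketch; the engine URL and table name are illustrative assumptions:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///example.db')  # illustrative engine

df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
df.to_sql('my_table', engine, if_exists='replace', index=False)  # write the frame to SQL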

There is another solution described in PYODBC to Pandas - DataFrame not working - Shape of passed values is (x,y), indices imply (w,z).

As of Pandas 0.12 (I believe) you can do:

import pandas
import pyodbc

sql = 'select * from table'
cnn = pyodbc.connect(...)

data = pandas.read_sql(sql, cnn)

Prior to 0.12, you could do:

import pandas
from pandas.io.sql import read_frame
import pyodbc

sql = 'select * from table'
cnn = pyodbc.connect(...)

data = read_frame(sql, cnn)

1 Comment

This is by far the easiest way

It's been a long time since the last post, but maybe it helps someone...

A shorter way than Paul H's:

my_data = query.all()  # query is a sqlalchemy Query object, as in Paul H's answer
my_df = pandas.DataFrame(my_data)

Comments


The way I do this:

db = db_class()  # your own database wrapper class
db.execute(query)
mydata = [x for x in db.fetchall()]
df = pd.DataFrame(data=mydata)

Comments


If the result type is a ResultSet, you should convert it to a dictionary first; then the DataFrame columns will be collected automatically.

This works on my case:

df = pd.DataFrame([dict(r) for r in resoverall])

Comments


Here is a simple solution I like:

Put your DB connection info in a YAML file in a secure location (do not version it in the code repo).

---
host: 'hostname'
port: port_number_integer
database: 'databasename'
user: 'username'
password: 'password'

Then load the conf into a dictionary, open the DB connection, and load the result set of the SQL query into a data frame:

import yaml
import pymysql
import pandas as pd

db_conf_path = '/path/to/db-conf.yaml'

# Load DB conf
with open(db_conf_path) as db_conf_file:
    db_conf = yaml.safe_load(db_conf_file)

# Connect to the DB
db_connection = pymysql.connect(**db_conf)

# Load the data into a DF
query = '''
SELECT *
FROM my_table
LIMIT 10
'''

df = pd.read_sql(query, con=db_connection)

Comments
