0

I have a database of every major airport's lat/long coords all across the world. I only need a portion of them (specifically in the USA) that are listed in a separate .csv file.

This csv file has two columns I extracted data from into two lists: The origin airport code (IATA code) and the destination airport code (also IATA).

My database has a column for IATA, and essentially I'm trying to query this database to get the respective lat/long coords for each airport in the two lists I have.

Here's my code:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///airport_coordinates.db')

# The dataframe that contains the IATA codes for the airports I need
airport_relpath = "data/processed/%s_%s_combined.csv" % (file, airline)
script_dir = os.path.dirname(os.getcwd())
temp_file = os.path.join(script_dir, airport_relpath)
fields = ["Origin_Airport_Code", "Destination_Airport_Code"]
df_airports = pd.read_csv(temp_file, usecols=fields)

# the origin/destination IATA codes for the airports I need
origin = df_airports.Origin_Airport_Code.values
dest = df_airports.Destination_Airport_Code.values

# query the database for the lat/long coords of the airports I need
sql = ('SELECT lat, long FROM airportCoords WHERE iata IN %s' %(origin))
indexcols = ['lat', 'long']

df_origin = pd.read_sql(sql, engine)
# testing the origin coordinates
print(df_origin)

This is the error I'm getting:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such 
table: 'JFK' 'JFK' 'JFK' ... 'MIA' 'JFK' 'MIA' [SQL: "SELECT lat, long 
FROM airportCoords WHERE iata IN ['JFK' 'JFK' 'JFK' ... 'MIA' 'JFK' 
'MIA']"] (Background on this error at: http://sqlalche.me/e/e3q8)

It's definitely because I'm not querying it correctly (since it thinks my queries are supposed to tables).

I tried looping through the list to query each element individually, but the list contains over 604,885 elements and my computer was not able to come up with any output.

4
  • You also don't need to create the SQLAlchemy engine yourself, you can pass in the sqlite:///... URL directly to pandas.read_sql(). Commented Nov 13, 2018 at 15:50
  • @MartijnPieters You're right, I mistakenly left that pasted in from another function in which I created/entered data into the database. Commented Nov 13, 2018 at 15:54
  • I've removed the sqlite3 connection from the question, no need to distract with that. Commented Nov 13, 2018 at 16:05
  • Save yourself some headache....if you're using sqlalchemy you should use the ORM....its SO MUCH easier than having to hand build SQL commands Commented Nov 13, 2018 at 16:33

1 Answer 1

1

Your error is in using string interpolation:

sql = ('SELECT lat, long FROM airportCoords WHERE iata IN %s' %(origin))

Because origin is a Numpy array this results in a [....] SQL identifier syntax in the query; see the SQLite documentation:

If you want to use a keyword as a name, you need to quote it. There are four ways of quoting keywords in SQLite:

[...]
[keyword] A keyword enclosed in square brackets is an identifier. [...]

You asked SQLite to check if iata is in a table named ['JFK' 'JFK' 'JFK' ... 'MIA' 'JFK' 'MIA'] because that's the string representation of a Numpy array.

You are already using SQLAlchemy, it would be easier if you used that library to generate all SQL for you, including the IN (....) membership test:

from sqlalchemy import *

filter = literal_column('iata', String).in_(origin)
sql = select([
    literal_column('lat', Float),
    literal_column('long', Float),
]).select_from(table('airportCoords')).where(filter)

then pass sql in as the query.

I used literal_column() and table() objects here to shortcut directly to the names of the objects, but you could also ask SQLAlchemy to reflect your database table directly from the engine object you already created, then use the resulting table definition to generate the query:

metadata = MetaData()
airport_coords = Table('airportCoords', metadata, autoload=True, autoload_with=engine)

at which point the query would be defined as:

filter = airport_coords.c.iata.in_(origin)
sql = select([airport_coords.c.lat, airport_coords.c.long]).where(filter)

I'd also include the iata code in the output, otherwise you will have no path back to connecting IATA code to the matching coordinates:

sql = select([airport_coords.c.lat, airport_coords.c.long, airport_coords.c.iata]).where(filter)

Next, as you say you have 604,885 elements in the list, so you probably want to load that CSV data into a temporary table to keep the query efficient:

engine = create_engine('sqlite:///airport_coordinates.db')

# code to read CSV file
# ...
df_airports = pd.read_csv(temp_file, usecols=fields)

# SQLAlchemy table wrangling
metadata = MetaData()
airport_coords = Table('airportCoords', metadata, autoload=True, autoload_with=engine)
temp = Table(
    "airports_temp",
    metadata,
    *(Column(field, String) for field in fields),
    prefixes=['TEMPORARY']
)
with engine.begin() as conn:
    # insert CSV values into a temporary table in SQLite
    temp.create(conn, checkfirst=True)
    df_airports.to_sql(temp.name), engine, if_exists='append')

# Join the airport coords against the temporary table
joined = airport_coords.join(temp, airport_coords.c.iata==temp.c.Origin_Airport_Code)

# select coordinates per airport, include the iata code
sql = select([airport_coords.c.lat, airport_coords.c.long, airport_coords.c.iata]).select_from(joined)
df_origin = pd.read_sql(sql, engine)
Sign up to request clarification or add additional context in comments.

7 Comments

Just as a note in case it is of interest, as of 1.2 one can pass lists as parameters for textual SQL using the bindparam.expanding feature.
If I wanted to avoid repeats (as the csv has many of the same airports repeating), how would I go about changing the 'joined' parameter?
@TimKrash: you can add .distinct() to the sql query object.
@TimKrash: try df_origin = pd.read_sql(sql.distinct(), engine) for example.
@MartijnPieters unfortunately that didn't work. It's okay though, it probably doesn't matter so much at this step of what I'm trying to do. Thank you very much for the help, though!
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.