80

This seems like a simple enough question, but I can't figure out how to convert a Pandas DataFrame to a GeoDataFrame so that I can do a spatial join.

Here is an example of what my data looks like using df.head():

    Date/Time           Lat       Lon       ID
0   4/1/2014 0:11:00    40.7690   -73.9549  140
1   4/1/2014 0:17:00    40.7267   -74.0345  NaN

In fact, this DataFrame was created from a CSV so if it's easier to read the CSV directly as a GeoDataFrame that's fine too.

1
  • 2
    use GeoPandas Commented Dec 16, 2015 at 21:17

3 Answers

141

Convert the DataFrame's content (e.g. Lat and Lon columns) into appropriate Shapely geometries first and then use them together with the original DataFrame to create a GeoDataFrame.

from geopandas import GeoDataFrame
from shapely.geometry import Point

geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
df = df.drop(['Lon', 'Lat'], axis=1)
gdf = GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)

Result:

    Date/Time           ID      geometry
0   4/1/2014 0:11:00    140     POINT (-73.95489999999999 40.769)
1   4/1/2014 0:17:00    NaN     POINT (-74.03449999999999 40.7267)
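
Since the goal in the question is a spatial join, here is a minimal sketch of that next step. The `zones` polygons and their `zone` names are made up for illustration (two bounding boxes, not real boundaries), and the `predicate` keyword assumes GeoPandas 0.10 or later (older releases call it `op`).

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, box

# The two sample rows from the question
df = pd.DataFrame({
    "Date/Time": ["4/1/2014 0:11:00", "4/1/2014 0:17:00"],
    "Lat": [40.7690, 40.7267],
    "Lon": [-73.9549, -74.0345],
    "ID": [140, None],
})

# Note the order: x is longitude, y is latitude
geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
gdf = gpd.GeoDataFrame(df.drop(["Lon", "Lat"], axis=1), crs="EPSG:4326", geometry=geometry)

# Hypothetical polygons to join against
zones = gpd.GeoDataFrame(
    {"zone": ["east", "west"]},
    geometry=[box(-74.0, 40.7, -73.9, 40.8), box(-74.1, 40.7, -74.0, 40.8)],
    crs="EPSG:4326",
)

# Attach the zone that contains each point; both layers share the same CRS
joined = gpd.sjoin(gdf, zones, how="left", predicate="within")
print(joined[["Date/Time", "zone"]])
```

Both frames should be in the same CRS before joining; if they are not, reproject one of them with `to_crs` first.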

Since the geometries often come in the WKT format, I thought I'd include an example for that case as well:

import geopandas as gpd
import shapely.wkt

geometry = df['wktcolumn'].map(shapely.wkt.loads)
df = df.drop('wktcolumn', axis=1)
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
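
As an alternative sketch, newer GeoPandas releases (0.9 and later, as far as I know) can parse a WKT column directly with `GeoSeries.from_wkt`. The sample rows and the `ID` values below are made up; only the `wktcolumn` name comes from the snippet above.

```python
import pandas as pd
import geopandas as gpd

# Made-up sample data; 'wktcolumn' matches the column name used above
df = pd.DataFrame({
    "ID": [140, 141],
    "wktcolumn": ["POINT (-73.9549 40.769)", "POINT (-74.0345 40.7267)"],
})

# Parse the WKT strings into geometries and tag them with a CRS
geometry = gpd.GeoSeries.from_wkt(df["wktcolumn"], crs="EPSG:4326")
gdf = gpd.GeoDataFrame(df.drop("wktcolumn", axis=1), geometry=geometry)
print(gdf)
```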
5
  • 1
    Thanks again! That's much simpler and runs very fast - much better than iterating through every row of the df at my n=500,000 :) Commented Dec 16, 2015 at 22:42
  • 7
    Gosh, thanks! I check this answer like every 2 days :) Commented Dec 21, 2016 at 16:25
  • 1
    you'd think this would be the first entry in the documentation! Commented May 14, 2017 at 16:53
  • +1 for the shapely.wkt. It took me a while to figure this out! Commented Dec 12, 2017 at 15:14
  • 1
    In order to avoid deleting lat/lon columns from the pandas df (in case you need to use it later), I would instead recommend dropping lat/lon in the creation of gdf like so gdf = GeoDataFrame(df.drop(['Lon', 'Lat'], axis=1), crs=crs, geometry=geometry) Commented May 27, 2020 at 19:43
58

Update 2019-12: The official documentation does it succinctly using geopandas.points_from_xy like so:

import geopandas

gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude)
)

You can also set a crs or z (e.g. elevation) value if you want.
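
A minimal sketch of both options, assuming a hypothetical `Elevation` column and made-up coordinates. Here `z` is passed to `points_from_xy` and the CRS is set on the `GeoDataFrame` itself:

```python
import pandas as pd
import geopandas

# Made-up sample data; the Elevation column is hypothetical
df = pd.DataFrame({
    "Longitude": [-73.9549, -74.0345],
    "Latitude": [40.7690, 40.7267],
    "Elevation": [10.0, 25.0],
})

gdf = geopandas.GeoDataFrame(
    df,
    # z is optional; when given, the points become 3D
    geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude, z=df.Elevation),
    crs="EPSG:4326",
)
print(gdf.geometry.iloc[0].has_z)
```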


Old Method: Using shapely

One-liners! Plus some performance pointers for big-data people.

Given a pandas.DataFrame with an x (longitude) column and a y (latitude) column like so:

df.head()
             x           y
0   229.617902  -73.133816
1   229.611157  -73.141299
2   229.609825  -73.142795
3   229.607159  -73.145782
4   229.605825  -73.147274

Let's convert the pandas.DataFrame into a geopandas.GeoDataFrame as follows:

Library imports and shapely speedups:

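import geopandas as gpd
import shapely.geometry  # needed for shapely.geometry.Point below
import shapely.speedups  # only relevant for Shapely 1.x
shapely.speedups.enable() # enabled by default from version 1.6.0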

Code + benchmark times on a test dataset I have lying around:

# Martin's original version (list comprehension over zip):
# %timeit 1.87 s ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",  # the {'init': 'epsg:4326'} form is deprecated
                       geometry=[shapely.geometry.Point(xy) for xy in zip(df.x, df.y)])

# Pandas apply method (builds one Point per row):
# %timeit 8.59 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",
                       geometry=df.apply(lambda row: shapely.geometry.Point((row.x, row.y)), axis=1))

Using DataFrame.apply is surprisingly slow by comparison (roughly 4.6 times slower on this dataset), but it may be a better fit for some other workflows (e.g. on bigger datasets using the dask library).


1
  • Thanks for the comparison, indeed the zip version is way faster Commented Mar 27, 2019 at 10:58
0

Here's a function taken from the internals of geopandas and slightly modified to handle a dataframe with a geometry/polygon column already in wkt format.

from geopandas import GeoDataFrame
import shapely
import shapely.wkb
import shapely.wkt

def df_to_geodf(df, geom_col="geom", crs=None, wkt=True):
  """
  Transforms a pandas DataFrame into a GeoDataFrame.
  The column 'geom_col' must be a geometry column in WKT representation
  (the default) or, with wkt=False, in WKB representation.
  To be used to convert df based on pd.read_sql to gdf.
  Note that the geometry column of the input df is overwritten in place.
  Parameters
  ----------
  df : DataFrame
      pandas DataFrame with geometry column in WKT or WKB representation.
  geom_col : string, default 'geom'
      column name to convert to shapely geometries
  crs : pyproj.CRS, optional
      CRS to use for the returned GeoDataFrame. The value can be anything accepted
      by :meth:`pyproj.CRS.from_user_input() <pyproj.crs.CRS.from_user_input>`,
      such as an authority string (eg "EPSG:4326") or a WKT string.
      If not set, tries to determine CRS from the SRID associated with the
      first geometry in the database, and assigns that to all geometries.
  wkt : bool, default True
      If True, parse geom_col as WKT text; if False, parse it as WKB
      (raw bytes or hex-encoded text).
  Returns
  -------
  GeoDataFrame
  """

  if geom_col not in df:
    raise ValueError("Query missing geometry column '{}'".format(geom_col))

  geoms = df[geom_col].dropna()

  if not geoms.empty:
    if wkt:
      load_geom = shapely.wkt.loads
    elif isinstance(geoms.iat[0], bytes):
      # WKB stored as raw bytes
      load_geom = shapely.wkb.loads
    else:
      # WKB stored as hex-encoded text
      def load_geom(x):
        return shapely.wkb.loads(str(x), hex=True)

    df[geom_col] = geoms = geoms.apply(load_geom)
    if crs is None:
      # Requires Shapely >= 2.0; on Shapely 1.x the equivalent call was
      # shapely.geos.lgeos.GEOSGetSRID(geoms.iat[0]._geom)
      srid = shapely.get_srid(geoms.iat[0])
      # if no defined SRID in geodatabase, returns SRID of 0
      if srid != 0:
        crs = "epsg:{}".format(srid)

  return GeoDataFrame(df, crs=crs, geometry=geom_col)
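
To make the two WKB cases in the function concrete, here is a small sketch of the same geometry encoded as raw bytes and as hex text, and how each one is loaded. The point coordinates are just the first sample row from the question; nothing here touches a real database.

```python
import shapely.wkb
from shapely.geometry import Point

point = Point(-73.9549, 40.769)

raw = point.wkb       # bytes, as a database driver may return them
text = point.wkb_hex  # the same WKB, hex-encoded as a str

# Raw bytes load directly; hex text needs hex=True
from_bytes = shapely.wkb.loads(raw)
from_text = shapely.wkb.loads(text, hex=True)
print(from_bytes.equals(from_text))
```

This is why the function checks `isinstance(..., bytes)` on the first value before choosing a loader.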
