SQL Group By and min (MySQL)

Question

I have the following SQL:

select code, distance, location from places;

The output is below:

CODE    DISTANCE            LOCATION
106     386.895834130068    New York, NY
80      2116.6747774121     Washington, DC
80      2117.61925131453    Alexandria, VA
106     2563.46708627407    Charlotte, NC

I want to be able to just get a single code and the closest distance. So I want it to return this:

CODE    DISTANCE            LOCATION
106     386.895834130068    New York, NY
80      2116.6747774121     Washington, DC

I originally had something like this:

SELECT code, min(distance), location
FROM places
GROUP BY code
HAVING distance > 0
ORDER BY distance ASC

The min worked fine as long as I didn't need the location associated with the smallest distance. How do I get min(distance) together with the correct location? (Depending on the order of the inserts into the table, you can end up with the New York distance but Charlotte as the location.)

chris, why are you so concerned about performance under each answer? Wouldn't you execute the proposed queries once and cache the results to get a simple 1:1 mapping from each code to its closest location? As far as I can tell, distances between codes and locations do not change very often...
– Kuba Wyrostek, Commented Jul 27, 2012 at 8:41

Zane Bien · Accepted Answer · 2012-07-27 08:36:05Z

9

To get the correct associated location, you'll need to join a subselect which gets the minimum distance per code on the condition that the distance in the outer main table matches with the minimum distance derived in the subselect.

SELECT a.code, a.distance, a.location
FROM   places a
INNER JOIN
(
    SELECT   code, MIN(distance) AS mindistance
    FROM     places
    GROUP BY code
) b ON a.code = b.code AND a.distance = b.mindistance
ORDER BY a.distance
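
As a quick check of this join, here is a minimal sketch that loads the four sample rows from the question into an in-memory SQLite database through Python's `sqlite3` module and runs the join with `a.location` in the select list. SQLite stands in for MySQL purely for convenience (an assumption of this sketch, not something from the thread), and the helper name `closest_per_code` is hypothetical.

```python
import sqlite3

# Sample rows taken from the question.
ROWS = [
    (106, 386.895834130068, "New York, NY"),
    (80, 2116.6747774121, "Washington, DC"),
    (80, 2117.61925131453, "Alexandria, VA"),
    (106, 2563.46708627407, "Charlotte, NC"),
]

# Join each row to the per-code minimum so the location comes from
# the same row as the minimum distance.
QUERY = """
SELECT a.code, a.distance, a.location
FROM   places a
INNER JOIN
(
    SELECT   code, MIN(distance) AS mindistance
    FROM     places
    GROUP BY code
) b ON a.code = b.code AND a.distance = b.mindistance
ORDER BY a.distance
"""


def closest_per_code(rows):
    """Hypothetical helper: load rows into a throwaway table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (code INTEGER, distance REAL, location TEXT)"
    )
    conn.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in closest_per_code(ROWS):
    print(row)
```

On the sample data this returns the New York row for code 106 and the Washington row for code 80. One property worth knowing: if two locations tie for the minimum distance within a code, the join returns both of them, so a tiebreaker is needed if exactly one row per code is required.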

edited Jul 27, 2012 at 8:36

answered Jul 27, 2012 at 8:30

Zane Bien


17 Comments

cdub Over a year ago

how's the performance on this for like 100,000s of locations?

Zane Bien Over a year ago

@chris, since you're using MySQL, this is likely the most efficient solution you will find. You'll have to make sure you have the proper indexes set up on code and distance fields.

cdub Over a year ago

yeah code is a PK so that's okay, but distance is calculated with math (using lat and long and google's map api)

Zane Bien Over a year ago

@chris, if performance becomes suboptimal for your needs, you may want to look into using a spatial database instead of a relational one. Relational databases will only get you so far with these types of queries, but spatial databases are much more geared towards them.

ErikE Over a year ago

Use a "bounding box" to get a shorter candidate list, then use the distance function. That lets you exclude most of the 100,000.
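
As a rough illustration of the bounding-box idea, the sketch below first rejects candidates with cheap latitude/longitude range checks and only then computes a great-circle distance for the survivors. Everything in it is an assumption made for illustration: the function names, the haversine formula as the distance function, the approximate city coordinates, and the Philadelphia origin. It also ignores the poles and the antimeridian. In SQL the same prefilter would be a pair of BETWEEN conditions on indexed latitude and longitude columns.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Lat/lon box enclosing the circle of radius_km around (lat, lon).

    Assumes the circle stays clear of the poles and the antimeridian.
    """
    delta = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(delta)
    dlon = math.degrees(
        math.asin(math.sin(delta) / math.cos(math.radians(lat)))
    )
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def nearest_within(origin, candidates, radius_km):
    """Return (name, distance_km) of the closest candidate, or None."""
    lat, lon = origin
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    best = None
    for name, clat, clon in candidates:
        # Cheap range check first; most candidates are rejected here.
        if not (min_lat <= clat <= max_lat and min_lon <= clon <= max_lon):
            continue
        # Exact distance only for the few survivors.
        d = haversine_km(lat, lon, clat, clon)
        if d <= radius_km and (best is None or d < best[1]):
            best = (name, d)
    return best


# Approximate coordinates, for illustration only.
CANDIDATES = [
    ("New York, NY", 40.7128, -74.0060),
    ("Washington, DC", 38.9072, -77.0369),
    ("Alexandria, VA", 38.8048, -77.0469),
    ("Charlotte, NC", 35.2271, -80.8431),
]
ORIGIN = (39.9526, -75.1652)  # hypothetical origin (roughly Philadelphia)

print(nearest_within(ORIGIN, CANDIDATES, 150.0))
```

The trade-off is that the search radius has to be chosen up front: if nothing falls inside the box, the search must be repeated with a larger radius.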


tobias86 · Accepted Answer · 2012-07-27 08:06:56Z

You can try to do a nested lookup between the minimum grouping and the original table.

This seems to do the trick

SELECT MinPlaces.Code, MinPlaces.Distance, Places.Location 
FROM Places INNER JOIN
(
    SELECT Code, MIN(Distance) AS Distance
    FROM Places
    GROUP BY Code
    HAVING MIN(Distance) > 0 
) AS MinPlaces ON Places.Code = MinPlaces.Code AND Places.Distance = MinPlaces.Distance
ORDER BY MinPlaces.Distance ASC

UPDATE: Tested using the following:

DECLARE @Places TABLE ( Code INT, Distance FLOAT, Location VARCHAR(50) )

INSERT INTO @Places (Code, Distance, Location)
VALUES
(106, 386.895834130068, 'New York, NY'),
(80, 2116.6747774121, 'Washington, DC'),
(80, 2117.61925131453, 'Alexandria, VA'),
(106, 2563.46708627407, 'Charlotte, NC')

SELECT MinPlaces.Code, MinPlaces.Distance, P.Location 
FROM @Places P INNER JOIN
(
    SELECT Code, MIN(Distance) AS Distance
    FROM @Places
    GROUP BY Code
    HAVING MIN(Distance) > 0 
) AS MinPlaces ON P.Code = MinPlaces.Code AND P.Distance = MinPlaces.Distance
ORDER BY MinPlaces.Distance ASC

And this yields one row per code, each with its closest location: New York, NY for code 106 and Washington, DC for code 80.

You're still self-joining Places, which will perform worse than a sequence project...

ErikE · Accepted Answer · 2012-07-27 08:22:58Z

0

You did not say your DBMS. The following solutions are for SQL Server.

WITH D AS (
   SELECT code, distance, location,
      Row_Number() OVER (PARTITION BY code ORDER BY distance) Seq
   FROM places
)
SELECT *
FROM D
WHERE Seq = 1
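
MySQL 8.0 and later also support common table expressions and ROW_NUMBER(), so the same pattern is no longer limited to SQL Server. The sketch below runs it against an in-memory SQLite database through Python's `sqlite3` module, which needs SQLite 3.25 or newer for window functions. SQLite is only a stand-in for convenience. The helper name `first_per_code`, the extra "Arlington, VA" row, and the `location` tiebreaker in the ORDER BY are assumptions added to show how ties behave.

```python
import sqlite3

# Sample rows from the question, plus one hypothetical row that ties
# with Washington, DC on distance.
SAMPLE_ROWS = [
    (106, 386.895834130068, "New York, NY"),
    (80, 2116.6747774121, "Washington, DC"),
    (80, 2117.61925131453, "Alexandria, VA"),
    (106, 2563.46708627407, "Charlotte, NC"),
    (80, 2116.6747774121, "Arlington, VA"),  # hypothetical tie
]

# Number the rows within each code by distance and keep the first one.
# Ordering by location as well makes the choice among ties deterministic.
QUERY = """
WITH d AS (
    SELECT code, distance, location,
           ROW_NUMBER() OVER (
               PARTITION BY code ORDER BY distance, location
           ) AS seq
    FROM places
)
SELECT code, distance, location
FROM d
WHERE seq = 1
ORDER BY distance
"""


def first_per_code(rows):
    """Hypothetical helper: load rows into a throwaway table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (code INTEGER, distance REAL, location TEXT)"
    )
    conn.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in first_per_code(SAMPLE_ROWS):
    print(row)
```

Unlike a join on MIN(distance), this always yields exactly one row per code. With the tie above, the tiebreaker picks "Arlington, VA" for code 80 because it sorts before "Washington, DC".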

If you have a table with unique Codes, and an index in your Places table on [Code, Distance] then a CROSS APPLY solution could be better:

SELECT
   X.*
FROM
   Codes C
   CROSS APPLY (
      SELECT TOP 1 *
      FROM Places P
      WHERE C.Code = P.Code
      ORDER BY P.Distance
   ) X

I cannot work on a solution for MySQL until much later.

P.S. You cannot rely on insertion order. Do not try!

edited Jul 27, 2012 at 8:22

answered Jul 27, 2012 at 8:03

ErikE


5 Comments

cdub Over a year ago

what do you mean unique codes, as the sample i gave has duplicate codes

ErikE Over a year ago

If you have a separate table listing all the Codes with 1 row per code!

cdub Over a year ago

yes i know i can't rely on insertion order. anyway my code is actual a user.id which comes from a user table and links to a locations table with distance and location

cdub Over a year ago

then yes there are unique and they have a one to many relationship with locations table

cdub Over a year ago

is it in mysql and is the performance fast as i'll have 100,000s of locations
