
I'm querying bus stops from a database, and I want it to return only one stop per bus line/direction. This query does just that:

    Stop.select("DISTINCT line_id, direction")

Except that it won't give me any attributes other than those two. I tried a couple of other queries to make it return the id in addition to the line_id and direction fields (ideally it would return all columns), with no luck:

    Stop.select("DISTINCT line_id, direction, id")

and

    Stop.select("DISTINCT(line_id || '-' || direction), id")

In both cases the DISTINCT is effectively lost and all rows are returned, because DISTINCT applies to every selected column, including the unique id.

Some awesome dude helped me out and suggested using a subquery to return all the ids:

    Stop.find_by_sql("SELECT DISTINCT a1.line_id, a1.direction, (SELECT a2.id from stops a2 where a2.line_id = a1.line_id AND a2.direction = a1.direction ORDER BY a2.id ASC LIMIT 1) as id FROM stops a1")

I can then extract all the ids and perform a 2nd query to fetch the full attributes for each stop.

Is there a way to do it all in one query AND have it return all the attributes?

  • I'm not sure what you're asking makes any sense. Either you want the stops or you don't, and there will be more rows if you want the stops. If you want the stop ids in one column, wrap array_agg around the subquery and remove the LIMIT. It seems, though, that you are going to run another query after this; otherwise why return just the stop_ids? I think it is best to state what you want in the question. It may be a lot easier for people to answer. Commented Feb 15, 2011 at 21:03
  • I want the stops, but only one per line/direction. So if a bus line has 2 directions, I want the query to return one stop for each direction. @pothibo's answer is right on, though; thanks anyway. Commented Feb 15, 2011 at 21:13

2 Answers

    Stop.select("DISTINCT ON (line_id, direction) *")
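For what it's worth, PostgreSQL documents DISTINCT ON as keeping the first row of each group of rows whose DISTINCT ON expressions are equal. Which row counts as "first" is only predictable when an ORDER BY is given, and that ORDER BY has to start with the DISTINCT ON expressions. In ActiveRecord this would presumably look like `Stop.select("DISTINCT ON (line_id, direction) *").order("line_id, direction, id")`, keeping the lowest id per line/direction. The Ruby sketch below does not touch a database: it is a toy in-memory model of that rule on plain hashes, and the `distinct_on` helper and sample rows are hypothetical.

```ruby
# Toy in-memory model of PostgreSQL's DISTINCT ON (hypothetical helper, not an ActiveRecord API).
# Sorting by the DISTINCT ON keys plus a tie-breaker plays the role of ORDER BY;
# uniq then keeps the first row of each key group, which is what DISTINCT ON does.
def distinct_on(rows, keys, tie_breaker)
  rows
    .sort_by { |row| keys.map { |k| row[k] } + [row[tie_breaker]] }
    .uniq { |row| keys.map { |k| row[k] } }
end

# Made-up sample stops, deliberately not stored in id order.
stops = [
  { id: 7, line_id: 1, direction: "N" },
  { id: 3, line_id: 1, direction: "N" },
  { id: 5, line_id: 1, direction: "S" },
  { id: 4, line_id: 2, direction: "N" },
  { id: 9, line_id: 1, direction: "S" }
]

result = distinct_on(stops, [:line_id, :direction], :id)
result.each do |row|
  puts "#{row[:line_id]} #{row[:direction]} -> stop #{row[:id]}"
end
# prints:
# 1 N -> stop 3
# 1 S -> stop 5
# 2 N -> stop 4
```

Every column of the chosen row survives, which is what the question asks for; the tie-breaker is what makes the choice repeatable.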

2 Comments

Nice, I haven't seen that before. The question was Postgres-specific, so this answer is good, but it's worth noting that it is a Postgres-specific answer: postgresql.org/docs/8.1/static/queries-select-lists.html
Actually 'DISTINCT (line_id, direction), *' works for me too with postgres

Not so fast: the other answer selects stop_id arbitrarily.

This is why your question makes no sense. We can pull stop_ids and still have distinct line_id and direction pairs, but we have no idea why we got the stop_id we did.

    create temp table test(line_id integer, direction char(1), stop_id integer);
    insert into test values
            (1, 'N', 1),
            (1, 'N', 2),
            (1, 'S', 1),
            (1, 'S', 2),
            (2, 'N', 1),
            (2, 'N', 2),
            (2, 'S', 1),
            (2, 'S', 2)
    ;
    select distinct on (line_id, direction) * from test;
    -- Do this again, but insert the stop_ids in reverse order.
    -- Could that possibly change the result of our "robust" query?!
    drop table test;
    create temp table test(line_id integer, direction char(1), stop_id integer);
    insert into test values
            (1, 'N', 2),
            (1, 'N', 1),
            (1, 'S', 2),
            (1, 'S', 1),
            (2, 'N', 2),
            (2, 'N', 1),
            (2, 'S', 2),
            (2, 'S', 1)
    ;
    select distinct on (line_id, direction) * from test;

First select:

     line_id | direction | stop_id
    ---------+-----------+---------
           1 | N         |       1
           1 | S         |       1
           2 | N         |       1
           2 | S         |       1

Second select:

     line_id | direction | stop_id
    ---------+-----------+---------
           1 | N         |       2
           1 | S         |       2
           2 | N         |       2
           2 | S         |       2

So we got away without grouping by stop_id, but we have no guarantee about why we got the one we did. All we know is that it is a valid stop_id. Updates, inserts, and other operations can shuffle the physical order of rows, and no RDBMS guarantees that order.

This is what I meant in my comment on the question. There is no stated reason for picking one stop_id over another, yet somehow you desperately need that stop_id (or whatever else).
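The experiment above can be replayed as a toy Ruby model, assuming (for illustration only) that "no ORDER BY" means "whichever row of a group happens to be stored first". PostgreSQL makes no such promise, which is exactly the point of the answer. Sorting on stop_id first, the in-memory analogue of `ORDER BY line_id, direction, stop_id`, makes both tables give the same result. The helper names `unordered_pick` and `ordered_pick` are hypothetical.

```ruby
# Toy model, not PostgreSQL itself: rows are [line_id, direction, stop_id]
# arrays, and array order stands in for the table's physical row order.
first_insert = [
  [1, "N", 1], [1, "N", 2], [1, "S", 1], [1, "S", 2],
  [2, "N", 1], [2, "N", 2], [2, "S", 1], [2, "S", 2]
]
second_insert = [
  [1, "N", 2], [1, "N", 1], [1, "S", 2], [1, "S", 1],
  [2, "N", 2], [2, "N", 1], [2, "S", 2], [2, "S", 1]
]

# DISTINCT ON (line_id, direction) with no ORDER BY:
# keep whichever row of each group comes first in storage order.
def unordered_pick(rows)
  rows.uniq { |line_id, direction, _stop_id| [line_id, direction] }
end

# DISTINCT ON (line_id, direction) ... ORDER BY line_id, direction, stop_id:
# sort first, so the lowest stop_id in each group always wins.
def ordered_pick(rows)
  rows.sort.uniq { |line_id, direction, _stop_id| [line_id, direction] }
end

p unordered_pick(first_insert).map(&:last)  # prints [1, 1, 1, 1]
p unordered_pick(second_insert).map(&:last) # prints [2, 2, 2, 2]
p ordered_pick(first_insert).map(&:last)    # prints [1, 1, 1, 1]
p ordered_pick(second_insert).map(&:last)   # prints [1, 1, 1, 1]
```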

2 Comments

Gotcha. In my case I'm going to order the results by distance from the user. Will adding an ORDER BY clause make sure the first stop_id is selected?
Your warning is valid. But the question wasn't about the validity of the result. It was about the query itself being wrong. I'm guessing this query is a simple test and not the actual query.
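On ordering by distance: PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, so `ORDER BY distance` on its own is rejected alongside `DISTINCT ON (line_id, direction)`. One common shape is an inner query with `ORDER BY line_id, direction, <distance>` to pick the nearest stop per group, wrapped in an outer query ordered by `<distance>` alone. The Ruby sketch below models those two steps in memory; the `StopRow` struct, the coordinates, the user position and the Euclidean distance are all made up for illustration, and a real app would compute distance in SQL.

```ruby
# Hypothetical stops with made-up planar coordinates (not the Stop model).
StopRow = Struct.new(:id, :line_id, :direction, :x, :y)

stops = [
  StopRow.new(1, 1, "N", 0.0, 5.0),
  StopRow.new(2, 1, "N", 0.0, 1.0),
  StopRow.new(3, 1, "S", 4.0, 0.0),
  StopRow.new(4, 1, "S", 9.0, 0.0),
  StopRow.new(5, 2, "N", 2.0, 2.0)
]
user_x, user_y = 0.0, 0.0

# Illustrative straight-line distance from the (hypothetical) user position.
distance = ->(stop) { Math.hypot(stop.x - user_x, stop.y - user_y) }

# Inner step, like DISTINCT ON (line_id, direction) ... ORDER BY line_id, direction, distance:
# within each line/direction group, the nearest stop comes first and is kept.
nearest_per_group = stops
  .sort_by { |s| [s.line_id, s.direction, distance.(s)] }
  .uniq { |s| [s.line_id, s.direction] }

# Outer step, like wrapping the above in a subquery and ordering by distance alone.
result = nearest_per_group.sort_by { |s| distance.(s) }
result.each do |s|
  puts format("line %d %s: stop %d (%.2f away)", s.line_id, s.direction, s.id, distance.(s))
end
# prints:
# line 1 N: stop 2 (1.00 away)
# line 2 N: stop 5 (2.83 away)
# line 1 S: stop 3 (4.00 away)
```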
