
I have the following schema:

CREATE TABLE author (
    id   integer
  , name varchar(255)
);
CREATE TABLE book (
    id        integer
  , author_id integer
  , title     varchar(255)
  , rating    integer
);

And I want each author with its last book:

SELECT book.id, author.id, author.name, book.title as last_book
FROM author
JOIN book book ON book.author_id = author.id

GROUP BY author.id
ORDER BY book.id ASC

Apparently you can do that in MySQL (see the question "Join two tables in MySQL, returning just one row from the second table").

But postgres gives this error:

ERROR: column "book.id" must appear in the GROUP BY clause or be used in an aggregate function: SELECT book.id, author.id, author.name, book.title as last_book FROM author JOIN book book ON book.author_id = author.id GROUP BY author.id ORDER BY book.id ASC

It's because:

When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
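Under that rule, the SELECT list may contain only grouped columns and aggregates. The minimal sketch below shows what is still allowed. It uses Python's built-in sqlite3 module, and SQLite stands in for PostgreSQL here only so the example runs without a server. Grouping by author_id and taking max(id) gives the id of each author's last book, but not its title. The remaining columns have to come from another join or subquery.

```python
import sqlite3

# In-memory SQLite database with the question's schema and sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER, name VARCHAR(255));
    CREATE TABLE book (id INTEGER, author_id INTEGER, title VARCHAR(255), rating INTEGER);
    INSERT INTO author (id, name) VALUES (1, 'Bob'), (2, 'David'), (3, 'John');
    INSERT INTO book (id, author_id, title, rating) VALUES
        (1, 1, '1st book from bob', 5),
        (2, 1, '2nd book from bob', 6),
        (3, 1, '3rd book from bob', 7),
        (4, 2, '1st book from David', 6),
        (5, 2, '2nd book from David', 6);
""")

# Every SELECT-list entry is either the grouped column or an aggregate,
# so this query satisfies the GROUP BY rule.
rows = conn.execute("""
    SELECT author_id, max(id) AS last_book_id
    FROM book
    GROUP BY author_id
    ORDER BY author_id
""").fetchall()
print(rows)  # [(1, 3), (2, 5)]
```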

How can I tell Postgres: "Give me only the last row of the joined table, when ordered by joined_table.id"?


Edit: With this data:

INSERT INTO author (id, name) VALUES
  (1, 'Bob')
, (2, 'David')
, (3, 'John');

INSERT INTO book (id, author_id, title, rating) VALUES
  (1, 1, '1st book from bob', 5)
, (2, 1, '2nd book from bob', 6)
, (3, 1, '3rd book from bob', 7)
, (4, 2, '1st book from David', 6)
, (5, 2, '2nd book from David', 6);

I should see:

book_id author_id name    last_book
3       1         "Bob"   "3rd book from bob"
5       2         "David" "2nd book from David"

7 Answers

145
select distinct on (author.id)
    book.id, author.id, author.name, book.title as last_book
from
    author
    inner join
    book on book.author_id = author.id
order by author.id, book.id desc

See the PostgreSQL documentation for DISTINCT ON:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
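To make the "first row of each set" semantics concrete, here is a plain-Python emulation. It illustrates the idea and does not reflect how PostgreSQL implements it: sort the joined rows, then keep the first row seen for each author id. The helper `distinct_on` and the hard-coded `joined` list, which mirrors the author/book join over the question's data, are hypothetical illustrations.

```python
# Rows of the author/book join: (author_id, author_name, book_id, title)
joined = [
    (1, "Bob", 1, "1st book from bob"),
    (1, "Bob", 2, "2nd book from bob"),
    (1, "Bob", 3, "3rd book from bob"),
    (2, "David", 4, "1st book from David"),
    (2, "David", 5, "2nd book from David"),
]


def distinct_on(rows, key, order):
    """Hypothetical helper: sort rows by `order`, then keep the first row per `key`.

    `order` starts with the same value as `key`, mirroring PostgreSQL's rule that
    the leftmost ORDER BY expressions must match the DISTINCT ON expressions.
    """
    result = []
    seen = set()
    for row in sorted(rows, key=order):
        k = key(row)
        if k not in seen:  # first row of this group in sort order
            seen.add(k)
            result.append(row)
    return result


# Like: DISTINCT ON (author.id) ... ORDER BY author.id, book.id DESC
last_books = distinct_on(joined, key=lambda r: r[0], order=lambda r: (r[0], -r[2]))

# Same key, ascending book id: the "first row" is now each author's first book.
first_books = distinct_on(joined, key=lambda r: r[0], order=lambda r: (r[0], r[2]))

for row in last_books:
    print(row)
```

Changing only the secondary sort key changes which row survives, which is why the ORDER BY is what makes the result predictable.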

With DISTINCT ON, the leftmost ORDER BY expressions must match the DISTINCT ON expressions. If that is not the order you want for the final result, wrap the query in a subquery and reorder:

select 
    *
from (
    select distinct on (author.id)
        book.id, author.id, author.name, book.title as last_book
    from
        author
        inner join
        book on book.author_id = author.id
    order by author.id, book.id desc
) authors_with_last_book
order by authors_with_last_book.name

Another solution is to use a window function, as in Lennart's answer. Another, very generic one that needs neither DISTINCT ON nor window functions is this:

select 
    book.id, author.id, author.name, book.title as last_book
from
    book
    inner join
    (
        select author.id as author_id, max(book.id) as book_id
        from
            author
            inner join
            book on author.id = book.author_id
        group by author.id
    ) s
    on s.book_id = book.id
    inner join
    author on book.author_id = author.id
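The generic query uses only joins, GROUP BY and max, so it also runs on SQLite. The sketch below runs it through Python's sqlite3 module on the question's sample data. SQLite is a stand-in for PostgreSQL here, and the final ORDER BY is an addition that makes the output order deterministic.

```python
import sqlite3

# Question's schema and sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER, name VARCHAR(255));
    CREATE TABLE book (id INTEGER, author_id INTEGER, title VARCHAR(255), rating INTEGER);
    INSERT INTO author (id, name) VALUES (1, 'Bob'), (2, 'David'), (3, 'John');
    INSERT INTO book (id, author_id, title, rating) VALUES
        (1, 1, '1st book from bob', 5), (2, 1, '2nd book from bob', 6),
        (3, 1, '3rd book from bob', 7), (4, 2, '1st book from David', 6),
        (5, 2, '2nd book from David', 6);
""")

rows = conn.execute("""
    SELECT book.id, author.id, author.name, book.title AS last_book
    FROM book
    -- s holds one (author_id, highest book id) pair per author
    JOIN (SELECT author.id AS author_id, max(book.id) AS book_id
          FROM author JOIN book ON author.id = book.author_id
          GROUP BY author.id) s
      ON s.book_id = book.id
    JOIN author ON book.author_id = author.id
    ORDER BY author.id
""").fetchall()
for row in rows:
    print(row)
```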

8 Comments

Does the job. DISTINCT ON is a bit Postgres-specific, though. If there is another way, I'd be glad to know it.
My guess goes to DISTINCT ON. But don't guess: check both solutions with EXPLAIN ANALYZE. There is a new solution in my answer.
DISTINCT ON is a cool feature, but remember that it requires sorting, which is fine as long as the sort can run in memory. Once the data set in the subquery grows larger, the sort involves disk operations (a temp file is written to disk to perform the sort).
@Zorg does the "window" solution avoid that? It's so strange this isn't more trivial...
Oh believe me... 'Strange this isn't more trivial' has been boggling me for months. I wish the Postgres team (and I do love them dearly) would get on this issue rather than exotic xyz; it would touch so many more people than 'Allow invisible PROMPT 2' and the other stuff they work on. Just like Zorg says, you need to use DISTINCT ON in an outer query over the subquery results. Your performance will instantly go to sh*t regardless of what indexes you have, as the entire subquery is evaluated before your outer DISTINCT and sort...
35

I've done something similar for a chat system, where room holds the metadata and list contains the messages. I ended up using a PostgreSQL LATERAL JOIN, which worked like a charm.

SELECT MR.id AS room_id, MR.created_at AS room_created, 
    lastmess.content as lastmessage_content, lastmess.datetime as lastmessage_when
FROM message.room MR
    LEFT JOIN LATERAL (
        SELECT content, datetime
        FROM message.list
        WHERE room_id = MR.id
        ORDER BY datetime DESC 
        LIMIT 1) lastmess ON true
ORDER BY lastmessage_when DESC NULLS LAST, MR.created_at DESC

For more info see https://www.heap.io/blog/postgresqls-powerful-new-join-type-lateral
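SQLite does not support LATERAL, and it is used below only so the sketch runs without a PostgreSQL server. The closest per-row lookup SQLite offers is a correlated scalar subquery with ORDER BY ... LIMIT 1, which is a different technique. It can return only one column, whereas the LATERAL subquery above returns both content and datetime. The sketch applies this substitute to the question's author/book tables rather than the chat schema. Like LEFT JOIN LATERAL ... ON true, it keeps authors without books (John) and gives them a NULL title.

```python
import sqlite3

# Question's schema and sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER, name VARCHAR(255));
    CREATE TABLE book (id INTEGER, author_id INTEGER, title VARCHAR(255), rating INTEGER);
    INSERT INTO author (id, name) VALUES (1, 'Bob'), (2, 'David'), (3, 'John');
    INSERT INTO book (id, author_id, title, rating) VALUES
        (1, 1, '1st book from bob', 5), (2, 1, '2nd book from bob', 6),
        (3, 1, '3rd book from bob', 7), (4, 2, '1st book from David', 6),
        (5, 2, '2nd book from David', 6);
""")

# For each author row, the subquery picks the newest book, just as the
# LATERAL subquery picks the newest message per room.
rows = conn.execute("""
    SELECT a.id,
           a.name,
           (SELECT b.title
              FROM book b
             WHERE b.author_id = a.id
             ORDER BY b.id DESC
             LIMIT 1) AS last_book
    FROM author a
    ORDER BY a.id
""").fetchall()
for row in rows:
    print(row)  # John's last_book is None (SQL NULL)
```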

4 Comments

This answer seems easier to understand and more modern. Does anyone know if there are drawbacks to this over the other ones?
This is what I'm looking for. Join lateral seems similar with subquery but it allows reference columns to "outer query"
Link in heap.io has moved to here.
FWIW, LATERAL is not Postgres-specific. It was introduced in SQL:1999. The first implementation I know of was SQL Server 2005 (though they named it CROSS APPLY), with Db2 9.1 in second place in 2006. All major DBMSs support it, but some in-memory databases, like H2, do not.
14

This may look archaic and overly simple, but it does not depend on window functions, CTEs or aggregating subqueries. In most cases it is also the fastest.

SELECT bk.id, au.id, au.name, bk.title as last_book
FROM author au
JOIN book bk ON bk.author_id = au.id
WHERE NOT EXISTS (
    -- keep bk only if the same author has no book with a higher id
    SELECT *
    FROM book nx
    WHERE nx.author_id = bk.author_id
    AND nx.id > bk.id
    )
ORDER BY bk.id ASC
    ;
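The anti-join is plain SQL, so it can be checked on SQLite through Python's sqlite3 module. The sketch below assumes the question's sample data, with SQLite standing in for PostgreSQL. Only each author's highest-id book has no "next" book, so only that row survives the NOT EXISTS filter.

```python
import sqlite3

# Question's schema and sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER, name VARCHAR(255));
    CREATE TABLE book (id INTEGER, author_id INTEGER, title VARCHAR(255), rating INTEGER);
    INSERT INTO author (id, name) VALUES (1, 'Bob'), (2, 'David'), (3, 'John');
    INSERT INTO book (id, author_id, title, rating) VALUES
        (1, 1, '1st book from bob', 5), (2, 1, '2nd book from bob', 6),
        (3, 1, '3rd book from bob', 7), (4, 2, '1st book from David', 6),
        (5, 2, '2nd book from David', 6);
""")

rows = conn.execute("""
    SELECT bk.id, au.id, au.name, bk.title AS last_book
    FROM author au
    JOIN book bk ON bk.author_id = au.id
    WHERE NOT EXISTS (
        SELECT *
        FROM book nx
        WHERE nx.author_id = bk.author_id
          AND nx.id > bk.id   -- a newer book by the same author
    )
    ORDER BY bk.id
""").fetchall()
for row in rows:
    print(row)
```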

4 Comments

I find it hard to believe that NOT EXISTS is ever the fastest for anything except extremely trivial datasets...
EXISTS() is older than SQL-92-style joins. Before outer joins existed, we had to build them using select ... from a where not exists(select ... from b where ...) union all select ... from a,b where ... That is why developers put a lot of effort into implementing them. When available, indexes are used to implement the anti-join (on most platforms).
The anti-join approach was faster than the other solutions suggested here for me. Always check with an explain analyze. This also surprised me @Ajax.
This query worked like magic for me! I have tried many other solutions including the accepted answer of using DISTINCT ON, but this is by far the fastest for me. I am using this in a view, and this query performs by far the best, when other WHERE and/or ORDER BY conditions are added when the view is queried. Thank you so so much! :D
10

You could add a condition to the join that selects only one row. This worked for me.

Like this:

SELECT 
    book.id, 
    auth1.id, 
    auth1.name, 
    book.title as last_book
FROM author auth1
JOIN book book ON (book.author_id = auth1.id
                   AND book.id = (select max(b.id) from book b where b.author_id = auth1.id))
ORDER BY book.id ASC

This way you get the data from the book with the highest id. If the table had a date column, you could do the same with max(date).
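The join condition relies on a correlated subquery that computes the current author's highest book id. The sketch below runs the corrected query through Python's sqlite3 module on the question's sample data, with SQLite as a stand-in for PostgreSQL. No GROUP BY is needed, because the join condition already keeps at most one book per author.

```python
import sqlite3

# Question's schema and sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER, name VARCHAR(255));
    CREATE TABLE book (id INTEGER, author_id INTEGER, title VARCHAR(255), rating INTEGER);
    INSERT INTO author (id, name) VALUES (1, 'Bob'), (2, 'David'), (3, 'John');
    INSERT INTO book (id, author_id, title, rating) VALUES
        (1, 1, '1st book from bob', 5), (2, 1, '2nd book from bob', 6),
        (3, 1, '3rd book from bob', 7), (4, 2, '1st book from David', 6),
        (5, 2, '2nd book from David', 6);
""")

rows = conn.execute("""
    SELECT book.id, auth1.id, auth1.name, book.title AS last_book
    FROM author auth1
    JOIN book ON book.author_id = auth1.id
             -- the subquery is re-evaluated for each author row (correlated)
             AND book.id = (SELECT max(b.id) FROM book b WHERE b.author_id = auth1.id)
    ORDER BY book.id
""").fetchall()
for row in rows:
    print(row)
```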

3 Comments

this one gives an error : ERROR: aggregate functions are not allowed in JOIN conditions
That is weird. I use this myself on some queries. It is very useful. What PostgreSQL version are you using?
This one and the one with LATERAL JOIN (works similar I guess?) were the fastest for me, comparing to solutions with DISTINCT and WHERE NOT EXISTS (which surprisingly was the slowest of them all).
7

Here is one way:

SELECT book_id, author_id, author_name, last_book
FROM (
    SELECT b.id as book_id
         , a.id as author_id
         , a.name as author_name
         , b.title as last_book
         , row_number() over (partition by a.id
                              order by b.id desc) as rn
    FROM author a
    JOIN book b 
        ON b.author_id = a.id
) last_books
WHERE rn = 1;
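The row_number() approach can also be run on SQLite, which supports window functions from version 3.25 onward. The sketch below executes it through Python's sqlite3 module on the question's sample data, with SQLite as a stand-in for PostgreSQL. The version check and the trailing ORDER BY are additions: the first fails early on older SQLite builds, and the second makes the output order deterministic.

```python
import sqlite3

# Window functions need SQLite >= 3.25.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this sketch needs SQLite 3.25+ for row_number()")

# Question's schema and sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER, name VARCHAR(255));
    CREATE TABLE book (id INTEGER, author_id INTEGER, title VARCHAR(255), rating INTEGER);
    INSERT INTO author (id, name) VALUES (1, 'Bob'), (2, 'David'), (3, 'John');
    INSERT INTO book (id, author_id, title, rating) VALUES
        (1, 1, '1st book from bob', 5), (2, 1, '2nd book from bob', 6),
        (3, 1, '3rd book from bob', 7), (4, 2, '1st book from David', 6),
        (5, 2, '2nd book from David', 6);
""")

rows = conn.execute("""
    SELECT book_id, author_id, author_name, last_book
    FROM (
        SELECT b.id AS book_id, a.id AS author_id, a.name AS author_name,
               b.title AS last_book,
               -- rn = 1 marks the highest book id within each author
               row_number() OVER (PARTITION BY a.id ORDER BY b.id DESC) AS rn
        FROM author a
        JOIN book b ON b.author_id = a.id
    ) last_books
    WHERE rn = 1
    ORDER BY author_id
""").fetchall()
for row in rows:
    print(row)
```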

5 Comments

I'm getting subquery in FROM must have an alias
@pinouchon, replace the 2nd last line ) with ) a for example (which would give the subquery an alias of a)
@OGHaza Yes, thanks. I was wondering if I have to include the alias in the select. But just naming the alias is enough.
@pinouchon, as long as there are no conflicts in column names you don't need to use the alias, although it's good practice to anyway at least for complex queries (e.g. if you join table A to table B, and both have a column called ID, you can't just SELECT ID - it'll throw an "ambiguous column name ID" error)
I like this solution much more than the accepted answer, as IMHO it's much easier to understand and quite easy to verify.
0

As a slight variation on @wildplasser's suggestion, which still works across implementations, you can use max rather than NOT EXISTS. This reads better if you prefer short joins to long WHERE clauses:

select * 
  from author au
  join (
    select max(id) as max_id, author_id
      from book bk
     group by author_id) as lb 
    on lb.author_id = au.id
  join book bk
    on bk.id = lb.max_id;

Or, to give the subquery a name, which clarifies things, use WITH:

with last_book as 
   (select max(id) as max_id, author_id
      from book bk
     group by author_id)

select * 
  from author au
  join last_book lb
    on au.id = lb.author_id
  join book bk
    on bk.id = lb.max_id;
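The WITH version can be checked on SQLite, which supports common table expressions. The sketch below runs it through Python's sqlite3 module on the question's sample data, with SQLite as a stand-in for PostgreSQL. Explicit columns replace `select *` so the result is easy to read. Note that the outer query must join the book table itself (`join book bk`), since the CTE only carries the id.

```python
import sqlite3

# Question's schema and sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER, name VARCHAR(255));
    CREATE TABLE book (id INTEGER, author_id INTEGER, title VARCHAR(255), rating INTEGER);
    INSERT INTO author (id, name) VALUES (1, 'Bob'), (2, 'David'), (3, 'John');
    INSERT INTO book (id, author_id, title, rating) VALUES
        (1, 1, '1st book from bob', 5), (2, 1, '2nd book from bob', 6),
        (3, 1, '3rd book from bob', 7), (4, 2, '1st book from David', 6),
        (5, 2, '2nd book from David', 6);
""")

rows = conn.execute("""
    WITH last_book AS (
        -- one row per author: the id of that author's newest book
        SELECT max(id) AS max_id, author_id
        FROM book
        GROUP BY author_id
    )
    SELECT bk.id, au.id, au.name, bk.title
    FROM author au
    JOIN last_book lb ON au.id = lb.author_id
    JOIN book bk ON bk.id = lb.max_id
    ORDER BY au.id
""").fetchall()
for row in rows:
    print(row)
```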

1 Comment

Be really careful with using max... if the other columns you are selecting differ, you could wind up with a result that is a mix of two columns... not likely ever something you actually want.
0
-- DISTRIBUTED BY is Greenplum syntax; drop that clause on plain PostgreSQL
create temp table book_1 as (
SELECT
id
,title
,author_id
,row_number() OVER (PARTITION BY author_id ORDER BY id DESC) as rownum
FROM
book)  distributed by ( id );

select author.id, b.id, author.name, b.title as last_book
from
    author
    left join
    (select * from book_1 where rownum = 1) b on b.author_id = author.id
order by author.id, b.id desc
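The temp-table variant only picks the last book if row_number() is partitioned by author_id and ordered by id descending. The sketch below runs that form on SQLite through Python's sqlite3 module, with the question's sample data and SQLite as a stand-in for PostgreSQL. The DISTRIBUTED BY clause is omitted because SQLite does not accept it. Because of the LEFT JOIN, authors without books (John) appear with NULLs.

```python
import sqlite3

# Window functions need SQLite >= 3.25.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this sketch needs SQLite 3.25+ for row_number()")

# Question's schema and sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER, name VARCHAR(255));
    CREATE TABLE book (id INTEGER, author_id INTEGER, title VARCHAR(255), rating INTEGER);
    INSERT INTO author (id, name) VALUES (1, 'Bob'), (2, 'David'), (3, 'John');
    INSERT INTO book (id, author_id, title, rating) VALUES
        (1, 1, '1st book from bob', 5), (2, 1, '2nd book from bob', 6),
        (3, 1, '3rd book from bob', 7), (4, 2, '1st book from David', 6),
        (5, 2, '2nd book from David', 6);

    -- rownum = 1 marks each author's newest book
    CREATE TEMP TABLE book_1 AS
    SELECT id, title, author_id,
           row_number() OVER (PARTITION BY author_id ORDER BY id DESC) AS rownum
    FROM book;
""")

rows = conn.execute("""
    SELECT author.id, b.id, author.name, b.title AS last_book
    FROM author
    LEFT JOIN (SELECT * FROM book_1 WHERE rownum = 1) b ON b.author_id = author.id
    ORDER BY author.id
""").fetchall()
for row in rows:
    print(row)
```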
