
For each customer_id, I want to find the first delivery order. For each row of Delivery, I compare order_date with the smallest order_date of that customer_id.

Why does the SELECT statement below only return the row that contains the smallest order_date?

Code.

CREATE TABLE IF NOT EXISTS Delivery (
  delivery_id int,
  customer_id int,
  order_date date,
  customer_pref_delivery_date date
);

TRUNCATE TABLE Delivery;

INSERT INTO Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date)
VALUES (1, 1, '2019-08-01', '2019-08-02');
INSERT INTO Delivery VALUES (2, 2, '2019-08-02', '2019-08-02');
INSERT INTO Delivery VALUES (3, 1, '2019-08-11', '2019-08-12');
INSERT INTO Delivery VALUES (4, 3, '2019-08-24', '2019-08-24');
INSERT INTO Delivery VALUES (5, 3, '2019-08-21', '2019-08-22');
INSERT INTO Delivery VALUES (6, 2, '2019-08-11', '2019-08-13');
INSERT INTO Delivery VALUES (7, 4, '2019-08-09', '2019-08-09');

-- Write your PostgreSQL query statement below
WITH temp AS (
    SELECT
        d.delivery_id,
        d.customer_id,
        d.order_date,
        d.customer_pref_delivery_date
    FROM Delivery d
    WHERE order_date = (SELECT MIN(order_date) FROM Delivery D
                        WHERE d.customer_id = D.customer_id)
)
select * from temp;

Input:

Delivery table:
+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1           | 1           | 2019-08-01 | 2019-08-02                  |
| 2           | 2           | 2019-08-02 | 2019-08-02                  |
| 3           | 1           | 2019-08-11 | 2019-08-12                  |
| 4           | 3           | 2019-08-24 | 2019-08-24                  |
| 5           | 3           | 2019-08-21 | 2019-08-22                  |
| 6           | 2           | 2019-08-11 | 2019-08-13                  |
| 7           | 4           | 2019-08-09 | 2019-08-09                  |
+-------------+-------------+------------+-----------------------------+

Output:

 delivery_id | customer_id | order_date | customer_pref_delivery_date 
-------------+-------------+------------+-----------------------------
           1 |           1 | 2019-08-01 | 2019-08-02
(1 row)
  • Your aliases D and d are treated case-insensitively, so they are considered to be the same. Use a different alias for the table in your subquery. (Or explicitly quote them both, in both places; then they will be treated case-sensitively.) Commented Oct 7 at 10:58
  • Debug questions require a minimal reproducible example in the post itself, not just elsewhere. Basic questions can be expected to be FAQs (see "How much research effort is expected of Stack Overflow users?"), and a zillionth duplicate is not a useful contribution to the site. See the tour, How to Ask, and the Help center. It is explicitly not allowed to compose posts using LLMs; include only what you have explored yourself and composed yourself after interacting with one. Commented Oct 7 at 22:43

1 Answer


The problem is that your correlation names d and D are the same name. PostgreSQL folds unquoted identifiers to lower case, so identifiers are case-insensitive by default.

https://www.postgresql.org/docs/current/sql-syntax-lexical.html includes interesting details about how PostgreSQL identifiers are treated.
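
As a small illustration of that folding rule, here is a sketch that assumes the Delivery table and sample rows from the question. The first two statements should run; the commented-out one is expected to fail because the quoted alias no longer matches the unquoted reference.

```sql
-- Unquoted identifiers are folded to lower case, so D and d name the same alias.
SELECT d.delivery_id
FROM Delivery D
WHERE D.customer_id = 4;

-- A quoted identifier keeps its case, so "D" is a different name from d.
SELECT "D".delivery_id
FROM Delivery "D"
WHERE "D".customer_id = 4;

-- Mixing the two forms is expected to raise an error,
-- because there is no FROM-clause entry named d:
-- SELECT d.delivery_id FROM Delivery "D";
```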

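This means that the subquery is not examining a subset of rows related to the outer query. Inside the subquery, the alias D (folded to d) hides the outer d, so both sides of the comparison refer to the inner table. It is effectively evaluating d.customer_id = d.customer_id, which is true for every row with a non-NULL customer_id. So it always returns the same single value for MIN(order_date), computed over the full set of rows, and only the row with that date passes the outer WHERE clause.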
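
To see this in isolation, the subquery from the question can be run on its own with both aliases written the way PostgreSQL resolves them. This is only a sketch against the sample data from the question:

```sql
-- The subquery as PostgreSQL resolves it: both references point at the inner alias,
-- so the WHERE clause filters nothing (customer_id is never NULL in the sample data).
SELECT MIN(order_date)
FROM Delivery d
WHERE d.customer_id = d.customer_id;

-- It therefore returns the same value as the unfiltered aggregate:
SELECT MIN(order_date)
FROM Delivery;
```

With the sample rows, both statements return 2019-08-01, which is why only delivery 1 appears in the output.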

The query works if you use distinct correlation names.

WITH temp AS (
    SELECT
        d.delivery_id,
        d.customer_id,
        d.order_date,
        d.customer_pref_delivery_date
    FROM Delivery d
    WHERE order_date = (SELECT MIN(order_date) FROM Delivery d2 WHERE d.customer_id = d2.customer_id)
)
select * from temp;

It also works if you delimit the upper-case correlation name, which causes their case difference to be significant.

WITH temp AS (
    SELECT
        d.delivery_id,
        d.customer_id,
        d.order_date,
        d.customer_pref_delivery_date
    FROM Delivery d
    WHERE order_date = (SELECT MIN(order_date) FROM Delivery "D" WHERE d.customer_id = "D".customer_id)
)
select * from temp;

By the way, in this example there is no reason to use a common table expression. If you just do select * from temp then you might as well just run the inner query without using a CTE.
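
For example, the corrected query could be written without the CTE roughly as follows. The alias d2 is just an illustrative name, and the columns are qualified to make it clear which table each one comes from:

```sql
SELECT d.delivery_id,
       d.customer_id,
       d.order_date,
       d.customer_pref_delivery_date
FROM Delivery d
WHERE d.order_date = (
    -- Earliest order date for the customer of the current outer row
    SELECT MIN(d2.order_date)
    FROM Delivery d2
    WHERE d2.customer_id = d.customer_id
);
```

With the sample data this returns deliveries 1, 2, 5 and 7, one per customer, in no guaranteed order.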

An alternative method of getting your query result that does use a CTE is to use a window function:

WITH temp AS (
  SELECT delivery_id, customer_id, order_date, customer_pref_delivery_date,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rownum
  FROM Delivery
) 
SELECT * FROM temp
WHERE rownum = 1;
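
One difference worth noting: ROW_NUMBER() returns exactly one row per customer, even if a customer has several orders on the same earliest date, whereas the correlated subquery returns all of them. If you want to keep ties, a sketch using RANK() instead would look like this (the names ranked and rnk are just illustrative):

```sql
WITH ranked AS (
  SELECT delivery_id, customer_id, order_date, customer_pref_delivery_date,
    -- RANK() gives every row that shares the earliest order_date a rank of 1
    RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rnk
  FROM Delivery
)
SELECT delivery_id, customer_id, order_date, customer_pref_delivery_date
FROM ranked
WHERE rnk = 1;
```

The sample data has no ties, so both variants return the same four rows there.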

5 Comments

Wow, it's such a simple mistake lol. What's more interesting is that both ChatGPT 5 Pro and Gemini 2.5 pro can't solve this
Amazing. You're saying that LLMs don't actually know how to program? Huh. Who knew?
That's not my point, my point is that it's interesting as a "most simple program that ChatGPT can't debug". It would be funny to make a list of stupidly easy programming bugs that ChatGPT can't handle
At least ChatGPT and ClaudeAI came up with the correct answer when I asked them what was wrong. Both LLMs can perform more than 60 to 80 percent of the tasks that professional programmers can accomplish. And much faster.
I don't know what you asked ChatGPT, but this is part of the answer I get when asking what is wrong with your query: PostgreSQL is case-insensitive for table aliases, so Delivery d and Delivery D use the same alias internally, which causes ambiguity and scoping issues.
