22

Is this correct?

SELECT * 
FROM   contract 
JOIN   team USING (name_team) 
JOIN   player USING(name_player) 
WHERE  name_team = ? 
AND    DATE_PART('YEAR',date_join)>= ? 
AND    DATE_PART('YEAR',date_leave)<= ?

My table contract has the player name, team name and the dates when he joined and left the club.
I want to make a function listing all players that were on the team in specific years.
The above query doesn't seem to be working ...

2
  • there is no error.. the results are not correct.. Commented Dec 18, 2010 at 23:29
  • Then you don't have data for the criteria you've specified Commented Dec 18, 2010 at 23:30

2 Answers 2

108

The currently accepted answer does not answer the question. And it is wrong in principle. a BETWEEN x AND y translates to:

a >= x AND a <= y

Including the upper bound, while people typically need to exclude it:

a >= x AND a < y

With dates you can easily adjust. For the year 2009 use '2009-12-31' as upper bound.
But it's not as simple with timestamps which allow fractional digits. Modern Postgres versions use an 8-byte integer internally to store up to 6 fractional seconds (µs resolution). Knowing this we could still make it work, but that's not intuitive and depends on an implementation detail. Bad idea.

Moreover, a BETWEEN x AND y does not find overlapping ranges. We need:

b >= x AND a < y

And players that never left are not considered, yet.

Proper answer

Assuming the year 2009, I'll rephrase the question without changing its meaning:

"Find all players of a given team who joined before 2010 and did not leave before 2009."

Basic query:

SELECT p.*
FROM   team     t
JOIN   contract c USING (name_team) 
JOIN   player   p USING (name_player) 
WHERE  t.name_team = ? 
AND    c.date_join  <  date '2010-01-01'
AND    c.date_leave >= date '2009-01-01';

But there is more:

If referential integrity is enforced with FK constraints, the table team itself is just noise in the query and can be removed.

While the same player can leave and rejoin the same team, we also need to fold possible duplicates, for instance with DISTINCT.

And we may need to provide for a special case: players that never left. Assuming those players have NULL in date_leave.

"A player that is not known to have left is assumed to be playing for the team to this day."

Refined query:

SELECT DISTINCT p.* 
FROM   contract c
JOIN   player   p USING (name_player) 
WHERE  c.name_team = ? 
AND    c.date_join  <  date '2010-01-01'
AND   (c.date_leave >= date '2009-01-01' OR c.date_leave IS NULL);

Operator precedence works against us, AND binds before OR. We need parentheses.

Related answer with optimized DISTINCT (if duplicates are common):

Typically, names of natural persons aren't unique and a surrogate primary key is used. But, obviously, name_player is the primary key of player. If all you need is player names we don't need the table player in the query, either:

SELECT DISTINCT name_player 
FROM   contract
WHERE  name_team = ? 
AND    date_join  <  date '2010-01-01'
AND   (date_leave >= date '2009-01-01' OR date_leave IS NULL);

SQL OVERLAPS operator

The manual:

OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant.

To take care of potential NULL values, COALESCE seems easiest:

SELECT DISTINCT name_player 
FROM   contract
WHERE  name_team = ? 
AND    (date_join, COALESCE(date_leave, CURRENT_DATE)) OVERLAPS
       (date '2009-01-01', date '2010-01-01');  -- upper bound excluded

Range type with index support

In Postgres 9.2 or later you can also operate with actual range types:

SELECT DISTINCT name_player 
FROM   contract
WHERE  name_team = ? 
AND    daterange(date_join, date_leave) &&
       daterange '[2009-01-01,2010-01-01)';  -- upper bound excluded

Range types add some overhead and occupy more space. 2 x date = 8 bytes; 1 x daterange = 14 bytes on disk or 17 bytes in RAM. But in combination with the overlap operator && the query can be supported with a GiST index.

Also, no need to special-case NULL values. NULL means "open range" in a range type - exactly what we need. The table definition doesn't even have to change: we can create the range type on the fly - and support the query with a matching expression index:

CREATE INDEX mv_stock_dr_idx ON mv_stock USING gist (daterange(date_join, date_leave));

Related:

Sign up to request clarification or add additional context in comments.

6 Comments

I'll up vote you but you know no one is ever gonna see this being it's 2+ years old right? :)
@ScottMarlowe: Thanks. Well, an edit to your answer pulled this one into the "active" list and I couldn't refrain from correcting what seemed incorrect to me. SO is the place to act out on one's OCD and get upvotes for it, isn't it? :)
Agreed. And yeah overlaps is the better answer by far.
This is the most complete and correct answer, because other answers does not resolve the case when the player joined the team BEFORE and left if AFTER the requested period of time.
@ScottMarlowe, I saw it just now.
|
3

Why not use between without the date part thing:

WHERE datefield BETWEEN '2009-10-10 00:00:00' AND '2009-10-11 00:00:00'

or something like that?

3 Comments

UPPER CASE IS NOT REQUIRED BY THE SQL SPEC WHY WAS MY ANSWER EDITED? See, upper case isn't necessarily easier to read. Joshua, please don't edit my answer because it offends your upper case sensibilities.
For those learning, I think it is much easier to read because it extremely clearly distinguishes SQL keywords from developer editable object names.
I think that the OP was asking how to compare date ranges. Isn't this condition to check whether a datetime is in a time range?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.