23

In PostgreSQL 9.4 the window functions have the new option of a FILTER to select a sub-set of the window frame for processing. The documentation mentions it, but provides no sample. An online search yields some samples, including from 2ndQuadrant but all that I found were rather trivial examples with constant expressions. What I am looking for is a filter expression that includes the value of the current row.

Assume I have a table with a bunch of columns, one of which is of date type:

col1 | col2 |     dt
------------------------
  1  |  a   | 2015-07-01
  2  |  b   | 2015-07-03
  3  |  c   | 2015-07-10
  4  |  d   | 2015-07-11
  5  |  e   | 2015-07-11
  6  |  f   | 2015-07-13
...

A window definition for processing on the date over the entire table is trivially constructed: WINDOW win AS (ORDER BY dt)

I am interested in knowing how many rows are present in, say, the 4 days prior to the current row (inclusive). So I want to generate this output:

col1 | col2 |     dt     | count
--------------------------------
  1  |  a   | 2015-07-01 |   1
  2  |  b   | 2015-07-03 |   2
  3  |  c   | 2015-07-10 |   1
  4  |  d   | 2015-07-11 |   3
  5  |  e   | 2015-07-11 |   3
  6  |  f   | 2015-07-13 |   4
...

The FILTER clause of the window functions seems like the obvious choice:

count(*) FILTER (WHERE current_row.dt - dt <= 4) OVER win

But how do I specify current_row.dt (for lack of a better syntax)? Is this even possible?

If this is not possible, are there other ways of selecting date ranges in a window frame? The frame specification is no help as it is all row-based.

I am not interested in alternative solutions using sub-queries, it has to be based on window processing.

3
  • Interesting problem. As Postgres regular you should know to provide a table definition (or better yet, a complete CREATE TABLE script) Commented Jul 14, 2015 at 4:55
  • @ErwinBrandstetter Is this something that would merit flagging as a "feature request"? To me it seems quite valuable to be able to filter the window frame on the basis of some condition present in the current row. Commented Jul 16, 2015 at 14:15
  • Yes, It would be useful for various problems, I have seen several related questions here on SO. I am afraid the feature would require a lot of work due to principal changes in the implementation, but it won't hurt to document public interest in the feature. Seems related to the missing RANGE BETWEEN ... PRECEDING/FOLLOWING feature that is already documented in the ToDo Wiki Commented Jul 16, 2015 at 14:49

3 Answers 3

7

You are not actually aggregating rows, so the new aggregate FILTER clause is not the right tool. A window function is more like it, a problem remains, however: the frame definition of a window cannot depend on values of the current row. It can only count a given number of rows preceding or following with the ROWS clause.

To make that work, aggregate counts per day and LEFT JOIN to a full set of days in range. Then you can apply a window function:

SELECT t.*, ct.ct_last4days
FROM  (
   SELECT *, sum(ct) OVER (ORDER BY dt ROWS 3 PRECEDING) AS ct_last4days
   FROM  (
      SELECT generate_series(min(dt), max(dt), interval '1 day')::date AS dt
      FROM   tbl t1
      ) d
   LEFT   JOIN (SELECT dt, count(*) AS ct FROM tbl GROUP BY 1) t USING (dt)
   ) ct
JOIN  tbl t USING (dt);

Omitting ORDER BY dt in the widow frame definition usually works, since the order is carried over from generate_series() in the subquery. But there are no guarantees in the SQL standard without explicit ORDER BY and it might break in more complex queries.

SQL Fiddle.

Related:

Sign up to request clarification or add additional context in comments.

2 Comments

When the FILTER clause applies to aggregate functions in the presence of an OVER clause they act as window functions (...) otherwise they act as regular aggregates so that would invalidate your opening sentence, if I understand both the documentation and you correctly. Accepted for being a working solution and meticulous attention to detail, but only reluctantly so for the dependency on sub-queries, for which you are obviously not to blame.
@Patrick: Aggregate functions (can) fold rows, while window functions don't. You obviously don't want to fold rows, so you are in the realm of window functions. The FILTER clause implements some features of window functions for aggregate functions, but you need a window function anyway.
5

I don't think there is any syntax that means "current row" in an expression. The gram.y file for postgres makes a filter clause take just an a_expr, which is just the normal expression clauses. There is nothing specific to window functions or filter clauses in an expression. As far as I can find, the only current row notion in a window clause is for specifying the window frame boundaries. I don't think this gets you what you want.

It's possible that you could get some traction from an enclosing query:

http://www.postgresql.org/docs/current/static/sql-expressions.html

When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.22), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments (and filter_clause if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query.

but it's not obvious to me how.

Comments

3

https://www.postgresql.org/docs/release/11.0/

Window functions now support all framing options shown in the SQL:2011 standard, including RANGE distance PRECEDING/FOLLOWING, GROUPS mode, and frame exclusion options

https://dbfiddle.uk/p-TZHp7s

You can do something like

count(dt) over(order by dt RANGE BETWEEN INTERVAL '3 DAYS' PRECEDING AND CURRENT ROW)

1 Comment

This answer is the exact feature the OP (and I!) was looking for. Sadly the feature was only 7 years too late for the OP. Glad I found it though!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.