I have a case where the user can specify an arbitrary number of parameters that will be filtered against a table. Simply put, there is a series of parameters with 64 buckets each. Taken together, they form one linear sequence of numbers. Each record contains an arbitrary number of bucket points.

Further, these numbers fall in ranges within each bucket.

The user can specify the desired value range for any number of parameters. Records that overlap ALL of the specified parameters (buckets) are returned.

You'll notice each record has a low and a high array. Together these give the range. By checking whether either one overlaps the requested values, I can get the results considerably faster than with a range query. This is an optimization technique.

Here is an example with two conditions:

SELECT  id
FROM    mytable2
WHERE   (val_low && (ARRAY(SELECT generate_series((0 * 64) + 20, (0 * 64) + 28))) OR
        val_high && (ARRAY(SELECT generate_series((0 * 64) + 20, (0 * 64) + 28))))
AND     (val_low && (ARRAY(SELECT generate_series((1 * 64) + 12, (1 * 64) + 15))) OR 
        val_high && (ARRAY(SELECT generate_series((1 * 64) + 12, (1 * 64) + 15))))

The val_low and val_high buckets are tested for intersection against an array of specified ranges.
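
To make the intended semantics of the two-condition query concrete, here is a minimal Python model of it. The helper names (overlaps, record_matches) and the sample record are hypothetical and only illustrate the logic; this is not how PostgreSQL evaluates the query.

```python
def overlaps(a, b):
    """Model of the PostgreSQL array && operator: True if a and b share an element."""
    return not set(a).isdisjoint(b)


def record_matches(val_low, val_high, requested):
    """A record matches when, for EVERY requested series, either its
    val_low array or its val_high array overlaps that series."""
    return all(overlaps(val_low, r) or overlaps(val_high, r) for r in requested)


# The two conditions from the query above, expanded like generate_series
# (both bounds inclusive).
requested = [
    list(range((0 * 64) + 20, (0 * 64) + 28 + 1)),
    list(range((1 * 64) + 12, (1 * 64) + 15 + 1)),
]

# Hypothetical record, laid out like a row of mytable2.
rec_low = [22, 64 + 10, 128 + 5, 192 + 7]
rec_high = [25, 64 + 13, 128 + 9, 192 + 7]

print(record_matches(rec_low, rec_high, requested))
```

The first condition is satisfied through val_low (22 lies in 20..28) and the second only through val_high (77 lies in 76..79), which is why both arrays have to be tested.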

The problem is that I have to build this query dynamically in a function. The parameter list is passed to the function (as an array of a user-defined type), the query is generated dynamically, then executed.

It works, but I want to be able to do this without having to build dynamic SQL in a function.

Specifically, the function will be passed a custom type array as follows:

param_num int,
val_low   int,
val_high  int

The values in the generate_series function call are (param_num * 64) + val_low, (param_num * 64) + val_high.
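
As a quick check of that arithmetic, the sketch below maps one (param_num, val_low, val_high) triple to the bounds and values that generate_series would produce. The function names and the BUCKETS_PER_PARAM constant are assumptions made for illustration.

```python
BUCKETS_PER_PARAM = 64  # each parameter owns 64 consecutive bucket numbers


def series_bounds(param_num, val_low, val_high):
    """Return the (start, stop) arguments passed to generate_series."""
    offset = param_num * BUCKETS_PER_PARAM
    return offset + val_low, offset + val_high


def series(param_num, val_low, val_high):
    """Expand a triple into the list of bucket values it covers.
    generate_series includes both bounds, hence stop + 1."""
    start, stop = series_bounds(param_num, val_low, val_high)
    return list(range(start, stop + 1))


print(series_bounds(0, 20, 28))  # (20, 28)
print(series_bounds(1, 12, 15))  # (76, 79)
print(series(1, 12, 15))         # [76, 77, 78, 79]
```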

Is this possible?

Sample data creation:

DROP TABLE IF EXISTS
        mytable2;

CREATE TABLE
        mytable2
        (
                id          INT NOT NULL PRIMARY KEY,
                val_low     int[],
                val_high    int[]
        );

SELECT  SETSEED(0.20130725);

WITH    t AS
        (
        SELECT  id,
                1 + FLOOR(RANDOM() * 24) AS l1, (RANDOM() * 8)::int AS h1,
                1 + FLOOR(RANDOM() * 24) AS l2, (RANDOM() * 8)::int AS h2,
                1 + FLOOR(RANDOM() * 24) AS l3, (RANDOM() * 8)::int AS h3,
                1 + FLOOR(RANDOM() * 24) AS l4, (RANDOM() * 8)::int AS h4
        FROM    generate_series(1, 500000) id
        )
INSERT
INTO    mytable2
SELECT  T.id, array[t.l1, (1 * 64) + t.l2, (2 * 64) + t.l3, (3 * 64) + t.l4], 
        array[t.l1 + t.h1, (1 * 64) + t.l2 + t.h2, (2 * 64) + t.l3 + t.h3, 
        (3 * 64) + t.l4 + t.h4]
FROM    T;

CREATE INDEX
    ix_mytable2_vhstore_low
ON      mytable2
USING   GIN (val_low);


CREATE INDEX
    ix_mytable2_vhstore_high
ON      mytable2
USING   GIN (val_high);

Sample query:

--EXPLAIN ANALYZE
SELECT COUNT(1)
FROM
(
    SELECT  id
    FROM    mytable2
    WHERE   (val_low && (ARRAY(SELECT generate_series(20, 28))) OR val_high &&
                (ARRAY(SELECT generate_series(20, 28))))
        AND (val_low && (ARRAY(SELECT generate_series((1 * 64) + 12, (1 * 64) + 15)))
                OR val_high && (ARRAY(SELECT generate_series((1 * 64) + 12, (1 * 64) + 15))))
) m;

Results: 54983

  • I don't see the table values being used. That means either all or none rows will be returned. Correct? Commented Jul 27, 2013 at 13:12
  • All rows where either val_low or val_high overlap ALL of the supplied series are returned. Commented Jul 27, 2013 at 13:16

1 Answer

SQL Fiddle

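with s as (
    -- one row per parameter triple [param_num, val_low, val_high],
    -- expanded into the array of bucket values it covers
    select array(select generate_series(
            a[i][1] * 64 + a[i][2], a[i][1] * 64 + a[i][3]
        )) as a
    from
        (values (array[[0,20,28],[1,12,15]])) s(a)
        cross join
        generate_series(1, array_length(array[[0,20,28],[1,12,15]], 1)) g(i)
)
select id
from mytable2 cross join s
group by id
-- keep only ids for which no parameter fails the overlap test
having count((not(val_low && a or val_high && a)) or null) = 0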

array[[0,20,28],[1,12,15]] is the passed parameter. Each inner triple is [param_num, val_low, val_high].
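
For readers who want to trace the cross join / group by / having logic by hand, here is a rough Python model of it. The function name matching_ids and the three sample rows are hypothetical; only the parameter array comes from the query above.

```python
from collections import defaultdict


def matching_ids(rows, params):
    """rows: iterable of (id, val_low, val_high).
    params: list of [param_num, val_low, val_high] triples."""
    # CTE s: expand every triple into the set of bucket values it covers.
    expanded = [set(range(p * 64 + lo, p * 64 + hi + 1)) for p, lo, hi in params]

    # cross join + group by id: count, per id, the parameters that fail.
    failures = defaultdict(int)
    for rec_id, val_low, val_high in rows:
        for a in expanded:
            hit = bool(a.intersection(val_low)) or bool(a.intersection(val_high))
            failures[rec_id] += 0 if hit else 1

    # having count(...) = 0: keep ids with no failing parameter.
    return sorted(rec_id for rec_id, n in failures.items() if n == 0)


rows = [
    (1, [22, 74], [25, 77]),  # overlaps both parameters
    (2, [5, 74], [10, 77]),   # misses parameter 0
    (3, [20, 90], [30, 95]),  # misses parameter 1
]

print(matching_ids(rows, [[0, 20, 28], [1, 12, 15]]))  # [1]
print(matching_ids(rows, [[0, 20, 28]]))               # [1, 3]
```

Counting failures and requiring zero is the same as requiring every parameter to match, which is what the original two-condition query expresses with AND.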

8 Comments

Thanks for this. I'll have to play with it for a while since this is returning more records than it should.
The source data is a range type.
the random is seeded, so the results will be the same. But, you can use the query sample to see what the results should be, and compare to your new function.
The setseed makes it pseudo-random, so we will both get the same set. But, it isn't an issue. You can compare the results of the two queries, and if they are the same, your query works.
Remove the "[1,12,15]" part of the array (leaving only the first segment), and you'll see 0 results, instead of 187168. Also, on the fiddle page, your source was int, not int[], so I'm not sure that will process the same.