0

I'm having some trouble formulating the correct recursive behaviour to get what I want in SQL. I'm limited to the BigQuery environment and I want to avoid using any JavaScript if I can, so I wanted to try and recreate this behaviour just using CTEs if possible. I can do exactly what I want in Python, but have been unable to do it in SQL. Yes I know SQL is not really designed for this, but I do have a reason for it so would like to try and get it to work. I want to be able to provide any length array and return a list of permutations. For example:

INPUT [1, 1, 1] OUTPUT [0, 0, 0]
INPUT [2, 1, 1] OUTPUT [[0, 0, 0], [1, 0, 0]]
INPUT [2, 1, 2] OUTPUT [[0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
INPUT [1, 5] OUTPUT [[0,0], [0,1], [0,2], [0,3], [0,4]]

I don't mind how the inputs and outputs are formatted as long as I can parse them one way or another. E.g. returning rows of strings are fine, each element in a separate row is fine as I can use array_agg, etc.

In Python doing this recursively or using for-loops is pretty trivial. I start with a list of zeroes and work my way along each index, incrementing the value at that index until it reaches input value-1 and store each permutation before moving on.

In SQL this is what I have so far, which is only really the first step:

WITH RECURSIVE
  OrigSeq AS (SELECT [2, 1, 2] as some_numbers), -- input sequence
  BaseSeq AS (SELECT ARRAY(SELECT 0 as a FROM UNNEST(OrigSeq.some_numbers)) AS base FROM OrigSeq), -- get array of 0s of same size as input sequence
  Sequences AS (
    (SELECT 0 AS perm_id, 0 as idx, BaseSeq.base[0] as iter, OrigSeq.some_numbers[0]-1 AS orig, BaseSeq.base as base, OrigSeq.some_numbers as num FROM OrigSeq, BaseSeq)
    UNION ALL
    -- increment index
    SELECT perm_id, idx+1 as idx, base[idx+1] as iter, num[idx+1]-1 as orig, base, num FROM Sequences WHERE idx < ARRAY_LENGTH(num)-1
  ),
  Seq2 AS (
    SELECT * FROM Sequences
    UNION ALL
    -- parse iters - not doing this quite right and my brain stops functioning around here
    SELECT perm_id+1, idx, base[idx]+1 as iter, orig, base, num FROM Sequences where iter <= orig
  ),
  Seq3 AS (
    Select * From Seq2
    UNION ALL
    SELECT perm_id, idx, iter, orig, base, num from Sequences
  )
SELECT perm_id, idx, orig, iter
FROM Seq3
ORDER BY perm_id, idx

Any guidance appreciated.

2 Answers 2

2

Recursive solution example for BigQuery SQL
Source:

WITH RECURSIVE
 OrigSeq as(
   select [1, 1, 1] as some_numbers --  [0, 0, 0]
  union all
  select [2, 1, 1] --  [[0, 0, 0], [1, 0, 0]]
  union all
  select [2, 1, 2] --  [[0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
  union all
  select [1, 5] as some_numbers --  [[0,0], [0,1], [0,2], [0,3], [0,4]]
)

Query:

  1. It is necessary to distinguish the rows somehow. Just give some kind of Id.
  2. Expand every array element from 0 to some_numbers[N]-1
  3. Recursively: Cartesian product sets {0,...,N1}X{0,...,N2}X{0,...,N3}...
  4. Concat back
,OrigSeqOrdered as( 
  select * ,row_number()over(order by (select null)) id
  from OrigSeq
)
,r as(
  select id, 0 lvl, nn,some_numbers,ARRAY_LENGTH(some_numbers) l
    ,[cast(nn as string)] arr
  from OrigSeqOrdered
  cross join UNNEST(GENERATE_ARRAY(0, some_numbers[0]-1)) as nn 
  union all
  select id,lvl+1, nn,some_numbers,l
    ,array_concat(arr,[cast(nn as string)]) arr
  from r
  cross join UNNEST(GENERATE_ARRAY(0, some_numbers[lvl+1]-1)) as nn
  where (lvl+1)<l
)
select id ,concat("[ [",array_to_string(array_agg(array_to_string(arr,",")),"], [:"),"] ]") sa
from r
where lvl=(l-1)
group by id
order by id 
id sa
1 "[ [0,0,0] ]"
2 "[ [1,0,0], [:0,0,0] ]"
3 "[ [1,0,1], [:1,0,0], [:0,0,1], [:0,0,0] ]"
4 "[ [0,1], [:0,2], [:0,3], [:0,4], [:0,0] ]"

Or as array<Int64>

r as(
  select id,0 lvl, nn,some_numbers
    ,ARRAY_LENGTH(some_numbers) l  -- array length, used for recursion stop
    ,[nn] arr
  from OrigSeqOrdered 
  cross join UNNEST(GENERATE_ARRAY(0, some_numbers[0]-1)) as nn 
  union all
  select id
     ,lvl+1, nn,some_numbers,l
    ,array_concat(arr,[nn]) arr
  from r
  cross join UNNEST(GENERATE_ARRAY(0, some_numbers[lvl+1]-1)) as nn 
  where (lvl+1)<l
)
select id ,some_numbers, arr,lvl,nn
from r
where lvl=(l-1)  -- only completed array's
order by id,lvl

Details for cartesian product of sets: Set A{0,1}, set B{0,1}, Set C{0,1,2}.( For OP array[2,2,3])

Create array for some_numbers[0]

GENERATE_ARRAY(0, some_numbers[0]-1)
3->(3-1)
Output: [0,1,2]

Expand array to set of rows

UNNEST([0,1,2])
Output:{0}
       {1}
       {2}

Recursive Cartesian product

Anchor part of recursive query
|  First recursion
|  |            Next recursion  
↓  ↓            ↓ 
A  x B     =    x C   =
          a,b       a,b,c
{0} {0}  {0,0} ---→{0,0,0}
{1} {1}  {1,0}  |-→{0,0,1}
         {0,1}  └-→{0,0,2}
         {1,1}     {1,0,0}
                   {1,0,1}
                   {1,0,2}
                   {0,1,0}
                   {0,1,1}
                   {0,1,2}
                   {1,1,0}
                   {1,1,1}
                   {1,1,2}

Concatenating arrays

first array_concat([0],0)->[0,0]  
then  array_concat([0,0],1)->[0,0,1]

Fiddle

Sign up to request clarification or add additional context in comments.

1 Comment

I feel like yours are closer to what I was thinking in my head and didn't quite manage to achieve. Without doing some checks it's hard to tell if these would be less or more optimal than Guillaume's solutions. His are less complicated to read, though, so I personally prefer that in code. Thanks for your contribution!
1

Recursivity to append

A first recursive loop can compute all possible values at each position,
then a second recursive one will append to an array starting from the empty array.

WITH RECURSIVE
  OrigSeq AS (SELECT ARRAY[2, 1, 2] AS a),
  -- List the indices of our array (which will give us the number of cells too):
  Pos AS (SELECT n, pos FROM OrigSeq, UNNEST(a) AS n WITH OFFSET AS pos),
  -- For each position list its possible values:
  Poss AS
  (
    SELECT n - 1 AS n, Pos.pos FROM Pos -- Per specification, "2" should give a range from 0 to 1 (2 is the count, not the max), so start one step beyond.
    UNION ALL
    SELECT n - 1, pos FROM Poss WHERE n > 0
  ),
  -- Start from an empty array, generate as many arrays of 1 element as there are possibilities for cell 1, and so on:
  Combi AS
  (
    SELECT [] a, 0 next
    UNION ALL
    SELECT ARRAY_CONCAT(a, [ n ]), next + 1
    FROM Combi, Poss
    WHERE Poss.pos = next
  )
-- Keep only the completed arrays:
SELECT * from Combi WHERE next = (SELECT COUNT(1) FROM Pos);
a lvl
{1,0,1} 3
{0,0,1} 3
{1,0,0} 3
{0,0,0} 3

See the PostgreSQL equivalent in a fiddle:

Recursivity to decrement

There is a way to first generate all positions as posToDecrement,
then recursively starting from the initial numbers array as a, joining unnest(a) to posToDecrement to generate the different combinations, and array_agg(a.value) over (order by pos rows between unbounded preceding and unbounded following), discarding rows where any value falls under 0 (min() < 0).

You'll find a fiddle proof-of-concept for PostgreSQL.

However my intuition tells there is room for improvement:

  • given a start array of 3 elements, on each iteration I'd like to generate 3 triplets per previous triplet (thus 1 triplet on first iteration, 3 on the second one, 9 on the third, etc.) where each new triplet is the previous, minus 1 on 1 of its fields (thus if iteration 0 was [3,2,4], iteration 1 should produce [2,2,4],[3,1,4],[3,2,3]); relying on a UNION non-ALL to discard duplicates (at iteration 2 two paths lead to [2,1,4]: [2,2-1,4] and [3-1,1,4]).
  • as UNNEST generates 3 rows per triplet, and a windowed ARRAY_AGG can generate a new array of 3 elements for each of those rows, there should be no need to join anything else than the UNNEST peudo-table.
  • however I'm stuck trying to ARRAY_AGG(prev.n - CASE WHEN v.pos = CURRENT_ROW THEN 1 ELSE 0 END) FROM UNNEST(…) v
    (which already made questions on SO)
  • thus for now I have only the suboptimal DISTINCT after having joined Pos (listing all possible indices)
  • another way could be to ARRAY_CONCAT(ARRAY_AGG(v.n) OVER (… TO 1 PRECEDING), v.n - 1, ARRAY_AGG(v.n) OVER (… FROM 1 FOLLOWING))

2 Comments

Ah that's interesting, I had somehow missed ARRAY_CONCAT when looking into available functions to use, which definitely makes life easier. This does exactly what I want and as expected has passed all the use cases I've provided to it so far. I'll give it until the end of the day to see if anyone else has any alternatives/improvements before I accept this answer.
After I posted my question and slept on the problem, I remembered I have access to copilot. I know AI solutions aren't allowed here so I won't add it, but I will say it is remarkably close to what you have come up with. There are some differences, I think the main one being that the combi and poss CTEs are simplified into one.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.