-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
Cube fails to generate correct SQL when a join's ON
condition relies on columns from other cubes (let's call them "intermediate cubes"), which are themselves joined to the primary query cube. If no fields are explicitly selected from these intermediate cubes, they are not included in the FROM
clause, even though they are necessary for a subsequent join's ON
condition. This results in an SQL error because the aliases/tables referenced in the final join's ON
clause are not available.
In the example below, test_facts
joins to merchant_and_product_dims
. This join's ON
condition ({merchant_dims.merchant_id} = {merchant_and_product_dims.merchant_id} AND {product_dims.product_id} = {merchant_and_product_dims.product_id}
) requires merchant_dims
and product_dims
to be available. However, if no dimensions/measures are selected directly from merchant_dims
or product_dims
, Cube omits them from the FROM
clause, breaking the join to merchant_and_product_dims
.
To Reproduce
-
Define the following Cube schema:
cubes: - name: merchant_dims sql: | ( SELECT 101 AS merchant_sk, 'M1' AS merchant_id UNION ALL SELECT 102 AS merchant_sk, 'M2' AS merchant_id ) public: false dimensions: - name: merchant_sk sql: merchant_sk type: number primary_key: true - name: merchant_id sql: merchant_id type: string - name: product_dims sql: | ( SELECT 201 AS product_sk, 'P1' AS product_id UNION ALL SELECT 202 AS product_sk, 'P2' AS product_id ) public: false dimensions: - name: product_sk sql: product_sk type: number primary_key: true - name: product_id sql: product_id type: string - name: merchant_and_product_dims sql: | ( SELECT 'M1' AS merchant_id, 'P1' AS product_id, 'Organic' AS acquisition_channel UNION ALL SELECT 'M1' AS merchant_id, 'P2' AS product_id, 'Paid' AS acquisition_channel UNION ALL SELECT 'M2' AS merchant_id, 'P1' AS product_id, 'Referral' AS acquisition_channel ) public: false dimensions: - name: product_id sql: product_id type: string primary_key: true - name: merchant_id sql: merchant_id type: string primary_key: true - name: acquisition_channel sql: acquisition_channel type: string - name: test_facts sql: | ( SELECT DATE '2023-01-01' AS reporting_date, 101 AS merchant_sk, 201 AS product_sk, 100 AS amount UNION ALL SELECT DATE '2023-01-01' AS reporting_date, 101 AS merchant_sk, 202 AS product_sk, 150 AS amount UNION ALL SELECT DATE '2023-01-02' AS reporting_date, 102 AS merchant_sk, 201 AS product_sk, 200 AS amount ) public: false joins: - name: merchant_dims relationship: many_to_one sql: "{CUBE}.merchant_sk = {merchant_dims.merchant_sk}" - name: product_dims relationship: many_to_one sql: "{CUBE}.product_sk = {product_dims.product_sk}" - name: merchant_and_product_dims # This join depends on merchant_dims and product_dims relationship: many_to_one sql: "{merchant_dims.merchant_id} = {merchant_and_product_dims.merchant_id} AND {product_dims.product_id} = {merchant_and_product_dims.product_id}" dimensions: - name: reporting_date sql: reporting_date type: time primary_key: true - name: merchant_sk sql: merchant_sk type: number primary_key: true - name: product_sk sql: product_sk type: number primary_key: true - name: acquisition_channel # This dimension triggers the join to merchant_and_product_dims sql: "{merchant_and_product_dims.acquisition_channel}" type: string measures: - name: amount_sum sql: amount type: sum
-
Execute a query that selects the
acquisition_channel
fromtest_facts
(which implicitly requires joiningmerchant_and_product_dims
) and any measure. For example:{ "measures": ["test_facts.amount_sum"], "dimensions": [ "test_facts.reporting_date", "test_facts.merchant_sk", "test_facts.product_sk", "test_facts.acquisition_channel" ] }
Actual Results:
The generated SQL is incorrect and causes an error as the "merchant_dims".merchant_id
and "product_dims".product_id
columns can not be resolved.
This is because merchant_dims
and product_dims
tables are referenced in the ON
clause for merchant_and_product_dims
but are not included in the FROM
clause.
-- Incorrect SQL Generated
SELECT
"test_facts".reporting_date,
"test_facts".merchant_sk AS merchant_sk,
"test_facts".product_sk AS product_sk,
"merchant_and_product_dims".acquisition_channel AS acquisition_channel,
SUM("test_facts".amount) AS amount_sum
FROM
(
(
SELECT
CAST('2023-01-01' AS DATE) AS reporting_date,
101 AS merchant_sk,
201 AS product_sk,
100 AS amount
UNION ALL
SELECT
CAST('2023-01-01' AS DATE) AS reporting_date,
101 AS merchant_sk,
203 AS product_sk,
150 AS amount
UNION ALL
SELECT
CAST('2023-01-02' AS DATE) AS reporting_date,
102 AS merchant_sk,
201 AS product_sk,
200 AS amount
UNION ALL
SELECT
CAST('2023-01-02' AS DATE) AS reporting_date,
103 AS merchant_sk,
202 AS product_sk,
200 AS amount
)
) AS "test_facts"
LEFT JOIN (
(
SELECT
'M1' AS merchant_id,
'P1' AS product_id,
'Organic' AS acquisition_channel
UNION ALL
SELECT
'M1' AS merchant_id,
'P2' AS product_id,
'Paid' AS acquisition_channel
UNION ALL
SELECT
'M2' AS merchant_id,
'P1' AS product_id,
'Referral' AS acquisition_channel
)
) AS "merchant_and_product_dims" ON "merchant_dims".merchant_id = "merchant_and_product_dims".merchant_id
AND "product_dims".product_id = "merchant_and_product_dims".product_id
GROUP BY
1,
2,
3,
4
Expected behavior
Cube should recognize that the join to merchant_and_product_dims
depends on merchant_dims
and product_dims
being joined to test_facts
first. Therefore, merchant_dims
and product_dims
should be included in the FROM
clause.
-- Expected SQL
SELECT
"test_facts".reporting_date,
"test_facts".merchant_sk AS merchant_sk,
"test_facts".product_sk AS product_sk,
"merchant_and_product_dims".acquisition_channel AS acquisition_channel,
SUM("test_facts".amount) AS amount_sum
FROM
(
(
SELECT
CAST('2023-01-01' AS DATE) AS reporting_date,
101 AS merchant_sk,
201 AS product_sk,
100 AS amount
UNION ALL
SELECT
CAST('2023-01-01' AS DATE) AS reporting_date,
101 AS merchant_sk,
203 AS product_sk,
150 AS amount
UNION ALL
SELECT
CAST('2023-01-02' AS DATE) AS reporting_date,
102 AS merchant_sk,
201 AS product_sk,
200 AS amount
UNION ALL
SELECT
CAST('2023-01-02' AS DATE) AS reporting_date,
103 AS merchant_sk,
202 AS product_sk,
200 AS amount
)
) AS "test_facts"
LEFT JOIN (
SELECT
101 AS merchant_sk,
'M1' AS merchant_id
UNION ALL
SELECT
102 AS merchant_sk,
'M1' AS merchant_id
UNION ALL
SELECT
103 AS merchant_sk,
'M2' AS merchant_id
) AS "merchant_dims" -- This join should be present
ON "test_facts".merchant_sk = "merchant_dims".merchant_sk
LEFT JOIN (
SELECT
201 AS product_sk,
'P1' AS product_id
UNION ALL
SELECT
202 AS product_sk,
'P1' AS product_id
UNION ALL
SELECT
203 AS product_sk,
'P2' AS product_id
) AS "product_dims" -- This join should be present
ON "test_facts".product_sk = "product_dims".product_sk
LEFT JOIN (
(
SELECT
'M1' AS merchant_id,
'P1' AS product_id,
'Organic' AS acquisition_channel
UNION ALL
SELECT
'M1' AS merchant_id,
'P2' AS product_id,
'Paid' AS acquisition_channel
UNION ALL
SELECT
'M2' AS merchant_id,
'P1' AS product_id,
'Referral' AS acquisition_channel
)
) AS "merchant_and_product_dims" ON "merchant_dims".merchant_id = "merchant_and_product_dims".merchant_id
AND "product_dims".product_id = "merchant_and_product_dims".product_id
GROUP BY
1,
2,
3,
4
Version:
v1.3.6