Skip to content

Incorrect SQL: Join dependency on columns from other unreferenced joined cubes is not resolved #9557

@nandresen-stripe

Description

@nandresen-stripe

Describe the bug
Cube fails to generate correct SQL when a join's ON condition relies on columns from other cubes (let's call them "intermediate cubes"), which are themselves joined to the primary query cube. If no fields are explicitly selected from these intermediate cubes, they are not included in the FROM clause, even though they are necessary for a subsequent join's ON condition. This results in an SQL error because the aliases/tables referenced in the final join's ON clause are not available.

In the example below, test_facts joins to merchant_and_product_dims. This join's ON condition ({merchant_dims.merchant_id} = {merchant_and_product_dims.merchant_id} AND {product_dims.product_id} = {merchant_and_product_dims.product_id}) requires merchant_dims and product_dims to be available. However, if no dimensions/measures are selected directly from merchant_dims or product_dims, Cube omits them from the FROM clause, breaking the join to merchant_and_product_dims.

To Reproduce

  1. Define the following Cube schema:

    cubes:
      - name: merchant_dims
        sql: |
          (
            SELECT 101 AS merchant_sk, 'M1' AS merchant_id
            UNION ALL
            SELECT 102 AS merchant_sk, 'M2' AS merchant_id
          )
        public: false
        dimensions:
          - name: merchant_sk
            sql: merchant_sk
            type: number
            primary_key: true
          - name: merchant_id
            sql: merchant_id
            type: string
    
      - name: product_dims
        sql: |
          (
            SELECT 201 AS product_sk, 'P1' AS product_id
            UNION ALL
            SELECT 202 AS product_sk, 'P2' AS product_id
          )
        public: false
        dimensions:
          - name: product_sk
            sql: product_sk
            type: number
            primary_key: true
          - name: product_id
            sql: product_id
            type: string
    
      - name: merchant_and_product_dims
        sql: |
          (
            SELECT 'M1' AS merchant_id, 'P1' AS product_id, 'Organic' AS acquisition_channel
            UNION ALL
            SELECT 'M1' AS merchant_id, 'P2' AS product_id, 'Paid' AS acquisition_channel
            UNION ALL
            SELECT 'M2' AS merchant_id, 'P1' AS product_id, 'Referral' AS acquisition_channel
          )
        public: false
        dimensions:
          - name: product_id
            sql: product_id
            type: string
            primary_key: true
          - name: merchant_id
            sql: merchant_id
            type: string
            primary_key: true
          - name: acquisition_channel
            sql: acquisition_channel
            type: string
    
      - name: test_facts
        sql: |
          (
            SELECT DATE '2023-01-01' AS reporting_date, 101 AS merchant_sk, 201 AS product_sk, 100 AS amount
            UNION ALL
            SELECT DATE '2023-01-01' AS reporting_date, 101 AS merchant_sk, 202 AS product_sk, 150 AS amount
            UNION ALL
            SELECT DATE '2023-01-02' AS reporting_date, 102 AS merchant_sk, 201 AS product_sk, 200 AS amount
          )
        public: false
        joins:
          - name: merchant_dims
            relationship: many_to_one
            sql: "{CUBE}.merchant_sk = {merchant_dims.merchant_sk}"
          - name: product_dims
            relationship: many_to_one
            sql: "{CUBE}.product_sk = {product_dims.product_sk}"
          - name: merchant_and_product_dims # This join depends on merchant_dims and product_dims
            relationship: many_to_one
            sql: "{merchant_dims.merchant_id} = {merchant_and_product_dims.merchant_id} AND {product_dims.product_id} = {merchant_and_product_dims.product_id}"
        dimensions:
          - name: reporting_date
            sql: reporting_date
            type: time
            primary_key: true
          - name: merchant_sk
            sql: merchant_sk
            type: number
            primary_key: true
          - name: product_sk
            sql: product_sk
            type: number
            primary_key: true
          - name: acquisition_channel # This dimension triggers the join to merchant_and_product_dims
            sql: "{merchant_and_product_dims.acquisition_channel}"
            type: string
        measures:
          - name: amount_sum
            sql: amount
            type: sum
  2. Execute a query that selects the acquisition_channel from test_facts (which implicitly requires joining merchant_and_product_dims) and any measure. For example:

    {
      "measures": ["test_facts.amount_sum"],
      "dimensions": [
        "test_facts.reporting_date",
        "test_facts.merchant_sk",
        "test_facts.product_sk",
        "test_facts.acquisition_channel"
      ]
    }

Actual Results:
The generated SQL is incorrect and causes an error as the "merchant_dims".merchant_id and "product_dims".product_id columns can not be resolved.

This is because merchant_dims and product_dims tables are referenced in the ON clause for merchant_and_product_dims but are not included in the FROM clause.

-- Incorrect SQL Generated
SELECT
  "test_facts".reporting_date,
  "test_facts".merchant_sk AS merchant_sk,
  "test_facts".product_sk AS product_sk,
  "merchant_and_product_dims".acquisition_channel AS acquisition_channel,
  SUM("test_facts".amount) AS amount_sum
FROM
  (
    (
      SELECT
        CAST('2023-01-01' AS DATE) AS reporting_date,
        101 AS merchant_sk,
        201 AS product_sk,
        100 AS amount
      UNION ALL
      SELECT
        CAST('2023-01-01' AS DATE) AS reporting_date,
        101 AS merchant_sk,
        203 AS product_sk,
        150 AS amount
      UNION ALL
      SELECT
        CAST('2023-01-02' AS DATE) AS reporting_date,
        102 AS merchant_sk,
        201 AS product_sk,
        200 AS amount
      UNION ALL
      SELECT
        CAST('2023-01-02' AS DATE) AS reporting_date,
        103 AS merchant_sk,
        202 AS product_sk,
        200 AS amount
    )
  ) AS "test_facts"
  LEFT JOIN (
    (
      SELECT
        'M1' AS merchant_id,
        'P1' AS product_id,
        'Organic' AS acquisition_channel
      UNION ALL
      SELECT
        'M1' AS merchant_id,
        'P2' AS product_id,
        'Paid' AS acquisition_channel
      UNION ALL
      SELECT
        'M2' AS merchant_id,
        'P1' AS product_id,
        'Referral' AS acquisition_channel
    )
  ) AS "merchant_and_product_dims" ON "merchant_dims".merchant_id = "merchant_and_product_dims".merchant_id
  AND "product_dims".product_id = "merchant_and_product_dims".product_id
GROUP BY
  1,
  2,
  3,
  4

Expected behavior
Cube should recognize that the join to merchant_and_product_dims depends on merchant_dims and product_dims being joined to test_facts first. Therefore, merchant_dims and product_dims should be included in the FROM clause.

-- Expected SQL
SELECT
  "test_facts".reporting_date,
  "test_facts".merchant_sk AS merchant_sk,
  "test_facts".product_sk AS product_sk,
  "merchant_and_product_dims".acquisition_channel AS acquisition_channel,
  SUM("test_facts".amount) AS amount_sum
FROM
  (
    (
      SELECT
        CAST('2023-01-01' AS DATE) AS reporting_date,
        101 AS merchant_sk,
        201 AS product_sk,
        100 AS amount
      UNION ALL
      SELECT
        CAST('2023-01-01' AS DATE) AS reporting_date,
        101 AS merchant_sk,
        203 AS product_sk,
        150 AS amount
      UNION ALL
      SELECT
        CAST('2023-01-02' AS DATE) AS reporting_date,
        102 AS merchant_sk,
        201 AS product_sk,
        200 AS amount
      UNION ALL
      SELECT
        CAST('2023-01-02' AS DATE) AS reporting_date,
        103 AS merchant_sk,
        202 AS product_sk,
        200 AS amount
    )
  ) AS "test_facts"
  LEFT JOIN (
    SELECT
      101 AS merchant_sk,
      'M1' AS merchant_id
    UNION ALL
    SELECT
      102 AS merchant_sk,
      'M1' AS merchant_id
    UNION ALL
    SELECT
      103 AS merchant_sk,
      'M2' AS merchant_id
  ) AS "merchant_dims" -- This join should be present
  ON "test_facts".merchant_sk = "merchant_dims".merchant_sk
  LEFT JOIN (
    SELECT
      201 AS product_sk,
      'P1' AS product_id
    UNION ALL
    SELECT
      202 AS product_sk,
      'P1' AS product_id
    UNION ALL
    SELECT
      203 AS product_sk,
      'P2' AS product_id
  ) AS "product_dims" -- This join should be present
  ON "test_facts".product_sk = "product_dims".product_sk
  LEFT JOIN (
    (
      SELECT
        'M1' AS merchant_id,
        'P1' AS product_id,
        'Organic' AS acquisition_channel
      UNION ALL
      SELECT
        'M1' AS merchant_id,
        'P2' AS product_id,
        'Paid' AS acquisition_channel
      UNION ALL
      SELECT
        'M2' AS merchant_id,
        'P1' AS product_id,
        'Referral' AS acquisition_channel
    )
  ) AS "merchant_and_product_dims" ON "merchant_dims".merchant_id = "merchant_and_product_dims".merchant_id
  AND "product_dims".product_id = "merchant_and_product_dims".product_id
GROUP BY
  1,
  2,
  3,
  4

Version:
v1.3.6

Metadata

Metadata

Assignees

No one assigned

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      close