1

Attempting to optimize a portion of a query that is joining two related tables, and getting odd results compared to other queries in the project with similar structures. Here is a very simplified example where I'm still seeing the issue:

SELECT
    `j`.`job_date` AS `VOUCHERDATE`,
    `labor_equipment`.`time_entry` AS `HOURS`
FROM `uat_portal`.`jobs` `j`
    LEFT JOIN(
        SELECT
            `uat_portal`.`jobs_employees`.`job_id`,
            `uat_portal`.`jobs_employees`.`time_entry`
        FROM
            `uat_portal`.`jobs_employees`
        UNION ALL
        SELECT
            `uat_portal`.`jobs_equipment`.`job_id`,
            `uat_portal`.`jobs_equipment`.`time_entry`
        FROM
            `uat_portal`.`jobs_equipment`
    ) `labor_equipment`
    ON
        `j`.`id` = `labor_equipment`.`job_id`

This yields the following EXPLAIN statement- as you can see all rows are fetched for jobs_employees and jobs_equipment

1   PRIMARY j   index   NULL    idx_jobs_job_date   3   NULL    218110  Using index 
1   PRIMARY <derived2>  ref key0    key0    5   uat_portal.j.id 10      
2   DERIVED jobs_employees  index   NULL    job_id_index    4   NULL    953371  Using index 
3   UNION   jobs_equipment  index   NULL    job_id_index    4   NULL    391702  Using index 

EXPLAIN statement for the above query showing that all rows are fetched for jobs_employees and jobs_equipment

Removing the UNION ALL and just joining one or the other table yields the expected results where only a row or two are fetched:

SELECT
    `j`.`job_date` AS `VOUCHERDATE`,
    `labor_equipment`.`time_entry` AS `HOURS`
FROM `uat_portal`.`jobs` `j`
    LEFT JOIN(
        SELECT
            `uat_portal`.`jobs_employees`.`job_id`,
            `uat_portal`.`jobs_employees`.`time_entry`
        FROM
            `uat_portal`.`jobs_employees`
    ) `labor_equipment`
    ON
        `j`.`id` = `labor_equipment`.`job_id`
1   SIMPLE  j   index   NULL    idx_jobs_job_date   3   NULL    218110  Using index 
1   SIMPLE  jobs_equipment  ref job_id_index    job_id_index    4   uat_portal.j.id 1   Using index 

EXPLAIN statement for the above query showing that only 2 rows are fetched for jobs_employees, which is close to the expected 1

SELECT
    `j`.`job_date` AS `VOURCHERDATE`,
    `labor_equipment`.`time_entry` AS `HOURS`
FROM `uat_portal`.`jobs` `j`
    LEFT JOIN(
        SELECT
            `uat_portal`.`jobs_equipment`.`job_id`,
            `uat_portal`.`jobs_equipment`.`time_entry`
        FROM
            `uat_portal`.`jobs_equipment`
    ) `labor_equipment`
    ON
        `j`.`id` = `labor_equipment`.`job_id`
1   SIMPLE  j   index   NULL    idx_jobs_job_date   3   NULL    218110  Using index 
1   SIMPLE  jobs_equipment  ref job_id_index    job_id_index    4   uat_portal.j.id 1   Using index 

EXPLAIN statement for the above query showing that only 1 row is fetched for jobs_equipment, as expected

With where statements back in place, just this very small and heavily simplified portion of the larger query is taking about 3 seconds with the UNION ALL in place, which isn't terrible but it compounds with the further complexity (phpmyadmin currently cannot load the associated View at all).

I saw something similar with UNION ALL's in other portions of the project causing similar issues, and that came down to the fields in the subquery's SELECT being different types (I had 3 tables UNION ALL'd, one with an int primary key and two with Unsigned BigInt primary keys).

In this case though, everything seems to line up for all three tables involved in the query. Here are my SHOW CREATE TABLE statements for all three tables:

CREATE TABLE `jobs` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `id_bigint_test` bigint(20) unsigned NOT NULL,
  `parent_job` int(11) NOT NULL,
  `workorder_id` int(11) DEFAULT NULL,
  `wo_day_id` int(11) NOT NULL,
  `quote_id` int(11) NOT NULL,
  `job_type` varchar(100) NOT NULL,
  `job_name` varchar(255) NOT NULL,
  `job_number` varchar(25) NOT NULL,
  `job_date` date NOT NULL,
  `job_color` varchar(10) NOT NULL,
  `onsite_time` time NOT NULL,
  `sales_person` varchar(25) NOT NULL,
  `badging_needed` tinyint(1) NOT NULL,
  `badging_completed` tinyint(1) NOT NULL,
  `notes` text NOT NULL,
  `location` varchar(25) NOT NULL,
  `added_on` datetime NOT NULL,
  `added_by` varchar(25) NOT NULL,
  `updated_on` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `updated_by` varchar(25) NOT NULL,
  `removed` tinyint(1) NOT NULL,
  `removed_on` datetime NOT NULL,
  `removed_by` varchar(25) NOT NULL,
  `required_crew_size` varchar(255) DEFAULT NULL,
  `min_skill_level` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_jobs_job_date` (`job_date`),
  KEY `idx_jobs_job_number` (`job_number`),
  KEY `indx_jobs_parent_job` (`parent_job`),
  KEY `id_bigint_test` (`id_bigint_test`)
) ENGINE=InnoDB AUTO_INCREMENT=209896 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
CREATE TABLE `jobs_equipment` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `rate_card_owned_equipment_id` int(11) NOT NULL,
  `job_id` int(11) NOT NULL,
  `job_id_bigint_test` bigint(20) unsigned NOT NULL,
  `equipment_id` int(11) NOT NULL,
  `owned_equipment_id` varchar(255) DEFAULT NULL,
  `start_time` varchar(25) NOT NULL,
  `end_time` varchar(25) NOT NULL,
  `override_time` tinyint(1) NOT NULL,
  `time_entry` float(8,2) NOT NULL,
  `billable` tinyint(1) NOT NULL,
  `removed` tinyint(1) NOT NULL,
  `removed_by` varchar(25) NOT NULL,
  `removed_on` datetime NOT NULL,
  `added_on` datetime NOT NULL,
  `added_by` varchar(25) NOT NULL,
  `updated_on` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `updated_by` varchar(25) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `job_id_index` (`job_id`),
  KEY `job_id_bigint_index` (`job_id_bigint_test`)
) ENGINE=InnoDB AUTO_INCREMENT=392211 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
CREATE TABLE `jobs_employees` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `rate_card_labor_id` int(11) NOT NULL,
  `job_id` int(11) NOT NULL,
  `job_id_bigint_test` bigint(20) unsigned NOT NULL,
  `employee_id` int(11) NOT NULL,
  `title` varchar(255) NOT NULL,
  `labor_id` varchar(255) DEFAULT NULL,
  `category` varchar(50) NOT NULL,
  `start_time` varchar(25) NOT NULL,
  `end_time` varchar(25) NOT NULL,
  `override_time` tinyint(1) NOT NULL,
  `truck` varchar(100) NOT NULL,
  `trailer` varchar(100) NOT NULL,
  `job_action` varchar(100) NOT NULL,
  `time_entry` float(8,2) NOT NULL,
  `billable` tinyint(1) NOT NULL,
  `incl_break` tinyint(1) NOT NULL,
  `added_on` datetime NOT NULL,
  `added_by` varchar(25) NOT NULL,
  `updated_on` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `updated_by` varchar(25) NOT NULL,
  `removed` tinyint(1) NOT NULL,
  `removed_on` datetime NOT NULL,
  `removed_by` varchar(25) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `job_id_index` (`job_id`),
  KEY `job_id_bigint_index` (`job_id_bigint_test`)
) ENGINE=InnoDB AUTO_INCREMENT=1018745 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci

Thinking it was still a data type issue like before, I've tried alternate column types for the jobs 'id' and jobs_equipment/jobs_employees 'job_id' columns- you'll notice the 'job_id_bigint_test' columns in the SHOW CREATE TABLE statements.

Thinking it was an indexing issue (though the individual tables perform fine without UNION ALL's) I've tried deleting and recreating the indexes for the 'job_id' columns and creating and removing foreign key constraints.

To try to force the optimizer to pull what I'm expecting, I've tried fetching different columns, joining on different columns, and haphazardly adding GROUP BY and WHERE statements.

UPDATE 9/15- added some actual examples of selects used by the View.

8
  • BTW, if you don't get a good answer here, there's a database admin StackExchange site where a lot of DBA types hang out where you might get a better answer. Commented Sep 12 at 22:43
  • May you add other columns to your question's query? Because with just j.id with no column from the LEFT JOINed labor_equipment, we could just answer that SELECT id FROM jobs would return the same results… Commented Sep 13 at 5:53
  • Some references that I found no explanation in, but that could serve as starting points for further exploration: the official doc about subquery optimization focuses on IN (<subquery>), and SO question MariaDB using PK for inner query result looks exactly like your question (still with no answer) ; there's also How to improve UNION query but it's not as near to your question. Commented Sep 13 at 7:24
  • Your query tells "UNION then JOIN". UNION needs in all rows reading. Why this is a surprise for you? In general, w/o any restrictions, (SELECT UNION SELECT) JOIN is not an equivalent for SELECT JOIN UNION SELECT JOIN. For example, when any non-stable construction (variable, function) is used. Commented Sep 13 at 19:33
  • @GuillaumeOutters Fair point, I oversimplified! I will further edit the example after looking into Osasunkanmi's proposed solution, if that's feasible. This is a small piece of a much larger query for sure. Commented Sep 15 at 20:49

1 Answer 1

0

You can perform two efficient joins separately and then combine their results. Use UNION instead of UNION ALL to automatically remove duplicate job_ids

SELECT `j`.`id`
FROM `uat_portal`.`jobs` `j`
INNER JOIN `uat_portal`.`jobs_employees` `je` ON `j`.`id` = `je`.`job_id`

UNION

SELECT `j`.`id`
FROM `uat_portal`.`jobs` `j`
INNER JOIN `uat_portal`.`jobs_equipment` `jq` ON `j`.`id` = `jq`.`job_id`;

This runs two separate, fast queries and then merges the unique results.

Sign up to request clarification or add additional context in comments.

2 Comments

There just lacks the case where one jobs row has no corresponding row in neither jobs_employees nor jobs_equipment: according to the LEFT JOIN, its jobs.id should appear in the results, while your UNION of INNER JOINs will miss it. Of course it would be easier to understand what jdfr228 would like if he included a column from the LEFT JOINed tables so that we understand the desired result. Probably he wants each job ID associated with either one or more employee and/or equipment IDs, or null for an orphan job.
But perhaps SELECT j.id FROM uat_portal.jobs j LEFT JOIN (<your query>) labeq ON j.id = labeq.id would perfectly answer. You could add it as a second §. But: 1. This relies on OP (original poster, jdfr228) needing additional columns from labeq (else a simple IN would be sufficient, instead of a LEFT JOIN); 2. If they differ (e.g. , 'emp' / 'eq' AS emp_or_eq) then of course UNION becomes UNION ALL (→ further gain); 3. still, this should be EXPLAINed, because if each job has employees or eqs the subquery table risks materializing (defeat your optim): thus a 2nd, more uncertain §.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.