I have created a MySQL query that works, but I feel there must be a better way. The query will be used by a PHP script whose purpose is to assign conditions & subconditions to new participants in an online experiment based on how many times each combination of condition & subcondition was already assigned to previous participants. (There are 2556 conditions and 8 subconditions, all combinations are allowed.) The purpose of the query is to calculate the number of "filtered completes" for each condition, defined as follows:
- for a given condition C, the number of completes in each subcondition S is the number of unique data rows with condition==C and subcondition==S
- for a given condition C, the "target" is 1 + the smallest number of completes for any subcondition S in C
- for a given condition C and subcondition S, the "filtered completes" for S in C is the minimum of the "target" for C and the actual number of completes for S in C
- the total "filtered completes" for the condition C is the sum of filtered completes for all subconditions S in C
Intuitively, at a given moment, I want to act as though my goal is to assign the "target" number of participants to each subcondition in a given condition. The "target" can increase only once ALL subconditions in the condition have met the target. At a given moment, I want to ignore completes in any subcondition that exceed the target. OK, here now is the query which works:
select `condition`, (
LEAST( `Target`, `subcond_0` ) +
LEAST( `Target`, `subcond_1` ) +
LEAST( `Target`, `subcond_2` ) +
LEAST( `Target`, `subcond_3` ) +
LEAST( `Target`, `subcond_4` ) +
LEAST( `Target`, `subcond_5` ) +
LEAST( `Target`, `subcond_6` ) +
LEAST( `Target`, `subcond_7` ) ) as `filtered_completes`
from (
select `condition`,
(1+LEAST( `subcond_0`, `subcond_1`, `subcond_2`, `subcond_3`, `subcond_4`, `subcond_5`, `subcond_6`, `subcond_7` )) as `Target`,
`subcond_0`, `subcond_1`, `subcond_2`, `subcond_3`, `subcond_4`, `subcond_5`, `subcond_6`, `subcond_7`
from (
select `condition`,
SUM( CASE `subcondition` WHEN 0 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_0`,
SUM( CASE `subcondition` WHEN 1 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_1`,
SUM( CASE `subcondition` WHEN 2 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_2`,
SUM( CASE `subcondition` WHEN 3 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_3`,
SUM( CASE `subcondition` WHEN 4 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_4`,
SUM( CASE `subcondition` WHEN 5 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_5`,
SUM( CASE `subcondition` WHEN 6 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_6`,
SUM( CASE `subcondition` WHEN 7 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_7`
from
(
select `condition`, `subcondition`, count( distinct id ) from test_table_3
where approve=1 group by `condition`, `subcondition`
) as T1 group by `condition`
) as T2
) as T3;
Reasons I don't think it's good are (1) it requires me to hand-code the subconditions, so it would not be easy to change to a different number of subconditions, (2) the nested selects look ugly to me, and (3) it seems to run a bit slow. Any suggestions for how to improve it? (Should I be doing this in PHP instead?)