I have created a MySQL query that works, but I feel there must be a better way. The query will be used by a PHP script whose purpose is to assign conditions & subconditions to new participants in an online experiment based on how many times each combination of condition & subcondition was already assigned to previous participants. (There are 2556 conditions and 8 subconditions, all combinations are allowed.) The purpose of the query is to calculate the number of "filtered completes" for each condition, defined as follows:
- for a given condition C, the number of completes in each subcondition S is the number of unique data rows with condition==C and subcondition==S
- for a given condition C, the "target" is 1 + the smallest number of completes for any subcondition S in C
- for a given condition C and subcondition S, the "filtered completes" for S in C is the minimum of the "target" for C and the actual number of completes for S in C
- the total "filtered completes" for the condition C is the sum of filtered completes for all subconditions S in C
Intuitively, at a given moment, I want to act as though my goal is to assign the "target" number of participants to each subcondition in a given condition. The "target" can increase only once ALL subconditions in the condition have met the target. At a given moment, I want to ignore completes in any subcondition that exceed the target. OK, here now is the query which works:
select `condition`, (
LEAST( `Target`, `subcond_0` ) +
LEAST( `Target`, `subcond_1` ) +
LEAST( `Target`, `subcond_2` ) +
LEAST( `Target`, `subcond_3` ) +
LEAST( `Target`, `subcond_4` ) +
LEAST( `Target`, `subcond_5` ) +
LEAST( `Target`, `subcond_6` ) +
LEAST( `Target`, `subcond_7` ) ) as `filtered_completes`
from (
select `condition`,
(1+LEAST( `subcond_0`, `subcond_1`, `subcond_2`, `subcond_3`, `subcond_4`, `subcond_5`, `subcond_6`, `subcond_7` )) as `Target`,
`subcond_0`, `subcond_1`, `subcond_2`, `subcond_3`, `subcond_4`, `subcond_5`, `subcond_6`, `subcond_7`
from (
select `condition`,
SUM( CASE `subcondition` WHEN 0 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_0`,
SUM( CASE `subcondition` WHEN 1 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_1`,
SUM( CASE `subcondition` WHEN 2 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_2`,
SUM( CASE `subcondition` WHEN 3 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_3`,
SUM( CASE `subcondition` WHEN 4 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_4`,
SUM( CASE `subcondition` WHEN 5 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_5`,
SUM( CASE `subcondition` WHEN 6 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_6`,
SUM( CASE `subcondition` WHEN 7 THEN `count( distinct id )` ELSE 0 END ) AS `subcond_7`
from
(
select `condition`, `subcondition`, count( distinct id ) from test_table_3
where approve=1 group by `condition`, `subcondition`
) as T1 group by `condition`
) as T2
) as T3;
Reasons I don't think it's good are
- it requires me to hand-code the subconditions, so it would not be easy to change to a different number of subconditions
- the nested selects look ugly to me
- it seems to run a bit slow. Any suggestions for how to improve it?
Should I be doing this in PHP instead?
Here is some sample data as requested, eliminating columns that I don't think are relevant to the question:
id subjid condition subcondition approve
51 A3GM1SF5FUOS8L 1175 3 1
52 A3GM1SF5FUOS8L 456 0 1
53 A3GM1SF5FUOS8L 1301 0 1
54 A3GM1SF5FUOS8L 499 5 1
55 A3GM1SF5FUOS8L 886 5 1
56 A3GM1SF5FUOS8L 2257 7 1
57 A3GM1SF5FUOS8L 1955 4 1
58 A3GM1SF5FUOS8L 890 2 1
59 A3GM1SF5FUOS8L 1667 0 1
60 A3GM1SF5FUOS8L 1546 2 1
61 A3GM1SF5FUOS8L 1315 5 1
62 A3GM1SF5FUOS8L 139 1 1
63 A3GM1SF5FUOS8L 374 3 1
64 A3GM1SF5FUOS8L 1249 4 1
65 A3GM1SF5FUOS8L 1658 7 1
66 A3GM1SF5FUOS8L 2241 3 1
67 A3GM1SF5FUOS8L 167 6 1
68 A3GM1SF5FUOS8L 1370 0 1
69 A3GM1SF5FUOS8L 26 3 1
70 A3GM1SF5FUOS8L 2499 6 1