I'm trying to accomplish a query that requires a calculated column using a subquery that passes the date reference via a variable. I'm not sure if I'm not "doing it right" but essentially the query never finishes and spins for minutes on end. This is my query:
select @groupdate:=date_format(order_date,'%Y-%m'), count(distinct customer_email) as num_cust,
(
select count(distinct cev.customer_email) as num_prev
from _pj_cust_email_view cev
inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @groupdate) and (cev.customer_email=prev_purch.customer_email)
where cev.order_date > @groupdate
) as prev_cust_count
from _pj_cust_email_view
group by @groupdate;
Subquery has an inner join accomplishes the self-join that only gives me the count of people that have purchased prior to the date in @groupdate. The EXPLAIN is below:
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
| 1 | PRIMARY | _pj_cust_email_view | ALL | NULL | NULL | NULL | NULL | 140147 | Using temporary; Using filesort |
| 2 | UNCACHEABLE SUBQUERY | cev | ALL | IDX_EMAIL | NULL | NULL | NULL | 140147 | Using where |
| 2 | UNCACHEABLE SUBQUERY | prev_purch | ref | IDX_EMAIL | IDX_EMAIL | 768 | cart_A.cev.customer_email | 1 | Using where |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
And the structure of the table _pj_cust_email_view is as such:
'_pj_cust_email_view', 'CREATE TABLE `_pj_cust_email_view` (
`order_date` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
`customer_email` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
KEY `IDX_EMAIL` (`customer_email`),
KEY `IDX_ORDERDATE` (`order_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'
Again, as I said earlier, I'm not really sure that this is the best way to accomplish this. Any criticism, direction is appreciated!
Update
I've made a little progress, and I'm now doing the above procedurally by iterating through all known months instead of months in the database and setting the vars ahead of time. I don't like this still. This is what I've got now:
Sets the user defined vars
set @startdate:='2010-08', @enddate:='2010-09';
Gets total distinct emails in the given range
select count(distinct customer_email) as num_cust
from _pj_cust_email_view
where order_date between @startdate and @enddate;
Gets the total count of customers who had purchased prior to the given range
select count(distinct cev.customer_email) as num_prev
from _pj_cust_email_view cev
inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @startdate) and (cev.customer_email=prev_purch.customer_email)
where cev.order_date between @startdate and @enddate;
Where @startdate is set to the start of the month and @enddate signifies the end of that month's range.
I really feel like this still can be done in one full query.