5
\$\begingroup\$

Details

Can anyone instruct me on a more optimised way of querying this statistics of registered users? I currently lack the knowledge to create a better query. Right now it works fine, but I want to learn more about other ways to do the same thing and improve on it, this way I can expand my knowledge on MySQL, and study on other functions.

Also, I'm doing everything in a single query, would it better to split it into more focused queries?

QUERY

SELECT
  # Count users with complete profile registered current month and year
  SUM(CASE WHEN completed_registration = 1 AND MONTH(created_at) = MONTH(CURDATE()) AND YEAR(created_at) = YEAR(CURDATE()) THEN 1 ELSE 0 END)  AS completed_0,

  # Count users with complete profile registered 1 month ago
  SUM(CASE WHEN completed_registration = 1 AND MONTH(created_at) = MONTH(CURDATE() - INTERVAL 1 MONTH) AND YEAR(created_at) = YEAR(CURDATE() - INTERVAL 1 MONTH) THEN 1 ELSE 0 END ) AS completed_1,

  # Count users with incomplete profile and verified email registered current month and year
  SUM(CASE WHEN completed_registration = 0 AND verified_email = 1 AND MONTH(created_at) = MONTH(CURDATE()) AND YEAR(created_at) = YEAR(CURDATE()) THEN 1 ELSE 0 END) AS incomplete_0,

  # Count users with incomplete profile and verified email registered 1 month ago
  SUM(CASE WHEN completed_registration = 0 AND verified_email = 1 AND MONTH(created_at) = MONTH(CURDATE() - INTERVAL 1 MONTH) AND YEAR(created_at) = YEAR(CURDATE() - INTERVAL 1 MONTH) THEN 1 ELSE 0 END) AS incomplete_1,

  # Count users with unverified email registered current month and year
  SUM(CASE WHEN verified_email = 0 AND MONTH(created_at) = MONTH(CURDATE()) AND YEAR(created_at) = YEAR(CURDATE()) THEN 1 ELSE 0 END) AS unverified_0,

  # Count users with unverified email registered 1 month ago
  SUM(CASE WHEN verified_email = 0 AND MONTH(created_at) = MONTH(CURDATE() - INTERVAL 1 MONTH) AND YEAR(created_at) = YEAR(CURDATE() - INTERVAL 1 MONTH) THEN 1 ELSE 0 END) AS unverified_1,

  # Count total users registered as Onsite Teacher this month
  ( SELECT COUNT(*) FROM onsite_teachers WHERE MONTH(created_at) = MONTH(CURDATE()) AND YEAR(created_at) = YEAR(CURDATE()) ) AS onsite_teachers_0,

  # Count total users registered as Onsite Teacher last month
  ( SELECT COUNT(*) FROM onsite_teachers WHERE MONTH(created_at) = MONTH(CURDATE() - INTERVAL 1 MONTH) AND YEAR(created_at) = YEAR(CURDATE() - INTERVAL 1 MONTH) ) AS onsite_teachers_1,

  # Count how many Onsite Teachers have been hired this month
  ( SELECT COUNT(*) FROM purchases INNER JOIN purchased_profiles ON purchased_profiles.purchase_id = purchases.id AND purchased_profiles.profile_type = 'onsite_teacher' AND purchases.transaction_status = 'completed' AND MONTH(purchases.created_at) = MONTH(NOW()) AND YEAR(purchases.created_at) = YEAR(NOW()) ) AS current_month_purchased_onsite_teachers,

  # Count total users registered as Translator this month
  ( SELECT COUNT(*) FROM translators WHERE MONTH(created_at) = MONTH(CURDATE()) AND YEAR(created_at) = YEAR(CURDATE()) ) AS translators_0,

  # Count total users registered as Translator last month
  ( SELECT COUNT(*) FROM translators WHERE MONTH(created_at) = MONTH(CURDATE() - INTERVAL 1 MONTH) AND YEAR(created_at) = YEAR(CURDATE() - INTERVAL 1 MONTH) ) AS translators_1,

  # Count how many Translators have been hired this month
  ( SELECT COUNT(*) FROM purchases INNER JOIN purchased_profiles ON purchased_profiles.purchase_id = purchases.id AND purchased_profiles.profile_type = 'translator' AND purchases.transaction_status = 'completed' AND MONTH(purchases.created_at) = MONTH(NOW()) AND YEAR(purchases.created_at) = YEAR(NOW()) ) AS current_month_purchased_translators,

  # Count total users registered as Interpreter this month
  ( SELECT COUNT(*) FROM interpreters WHERE MONTH(created_at) = MONTH(CURDATE()) AND YEAR(created_at) = YEAR(CURDATE()) ) AS interpreters_0,

  # Count total users registered as Interpreter last month
  ( SELECT COUNT(*) FROM interpreters WHERE MONTH(created_at) = MONTH(CURDATE() - INTERVAL 1 MONTH) AND YEAR(created_at) = YEAR(CURDATE() - INTERVAL 1 MONTH) ) AS interpreters_1,

  # Count how many Interpreters have been hired this month
  ( SELECT COUNT(*) FROM purchases INNER JOIN purchased_profiles ON purchased_profiles.purchase_id = purchases.id AND purchased_profiles.profile_type = 'interpreter' AND purchases.transaction_status = 'completed' AND MONTH(purchases.created_at) = MONTH(NOW()) AND YEAR(purchases.created_at) = YEAR(NOW()) ) AS current_month_purchased_interpreters,

  # Count all registered users
  COUNT(*) AS total_users,

  # Count all registered clients
  ( SELECT COUNT(*) FROM clients ) AS total_clients,

  # Sum all completed sales totals
  ( SELECT SUM(purchase_total) FROM purchases WHERE transaction_status = 'completed' ) AS total_profits,

  # Count all completed sales
  ( SELECT COUNT(*) FROM purchases WHERE transaction_status = 'completed' ) AS total_sales,

  # Count all complete profiles
  SUM(CASE WHEN completed_registration = 1 THEN 1 ELSE 0 END) AS total_registered_users,

  # Count all incomplete profiles
  SUM(CASE WHEN verified_email = 1 AND completed_registration = 0 THEN 1 ELSE 0 END) AS total_incomplete_users,

  # Count all users with unverified email
  SUM(CASE WHEN verified_email = 0 THEN 1 ELSE 0 END) AS total_unverified_users,

  # Sum current month completed sales total
  ( SELECT SUM(purchase_total) FROM purchases WHERE transaction_status = 'completed' AND MONTH(created_at) = MONTH(CURDATE()) AND YEAR(created_at) = YEAR(CURDATE()) ) AS profits_0,

  # Sum last month completed sales total
  ( SELECT SUM(purchase_total) FROM purchases WHERE transaction_status = 'completed' AND MONTH(created_at) = MONTH(CURDATE() - INTERVAL 1 MONTH) AND YEAR(created_at) = YEAR(CURDATE() - INTERVAL 1 MONTH) ) AS profits_1

FROM
  users
\$\endgroup\$
4
  • \$\begingroup\$ @CupOfJoe is this ran in batch mode? Curious about the #Comments \$\endgroup\$ Commented Jan 14, 2015 at 2:44
  • \$\begingroup\$ @Phrancis well this runs using PHP Laravel 3 PDO connection, how can I check if it runs in batch mode? I'm using raw queries with Laravel's DB instance. I thought I had specified I was using PHP in the tags but for some reason I didn't, sorry about that. \$\endgroup\$ Commented Jan 14, 2015 at 3:27
  • \$\begingroup\$ No worries. As I posted in my answer, I think you would gain performance regardless of input system, if you made it into a stored procedure. This may have to be run from MySQL WorkBench, although I feel sure you could just pass it from Laravel, as long as your DB allows it. \$\endgroup\$ Commented Jan 14, 2015 at 4:37
  • \$\begingroup\$ What you may and may not do after receiving answers. \$\endgroup\$ Commented Jan 14, 2015 at 15:36

2 Answers 2

4
\$\begingroup\$

General observations

Lines that extend to 353 columns are hard to read. You should find a code formatting style with line breaks and indentation that emphasize the structure of your query.

In many cases, you compare months and years like this:

MONTH(created_at) = MONTH(CURDATE() - INTERVAL 1 MONTH) AND YEAR(created_at) = YEAR(CURDATE() - INTERVAL 1 MONTH)

This is inefficient, because the server needs to call the MONTH() and YEAR() function on every single created_at value, and cannot take advantage of indexes. (I hope that your created_at columns are indexed.) Instead, you want to compare created_at against thresholds:

created_at BETWEEN CURDATE() - INTERVAL (DAYOFMONTH(CURDATE()) - 1) DAY - INTERVAL 1 MONTH
               AND CURDATE() - INTERVAL (DAYOFMONTH(CURDATE()) - 1) DAY - INTERVAL 1 SECOND

Sales query

Let's start with the easy decision first: four of the twenty-something SELECTs are not like the others. In fact, they have nothing to do whatsoever with the user statistics. The sales figures should clearly be split into a separate query:

SELECT SUM(purchase_total) AS total_profits
     , SUM(CASE WHEN created_at      >= CURDATE() - INTERVAL (DAYOFMONTH(CURDATE()) - 1) DAY THEN purchase_total END) AS profits_0
     , SUM(CASE WHEN created_at BETWEEN CURDATE() - INTERVAL (DAYOFMONTH(CURDATE()) - 1) DAY - INTERVAL 1 MONTH
                                    AND CURDATE() - INTERVAL (DAYOFMONTH(CURDATE()) - 1) DAY - INTERVAL 1 SECOND THEN purchase_total END) AS profits_1
     , COUNT(*) AS total_sales
    FROM purchases
    WHERE transaction_status = 'completed';

User summary query

Another five SELECTs are distinguished by the fact that they are timeless system-wide totals of certain categories of users. This should be another separate query.

SELECT COUNT(*) AS total_users
     , COUNT(CASE WHEN completed_registration = 1 THEN 1 END) AS total_registered_users
     , COUNT(CASE WHEN completed_registration = 0 THEN 1 END) AS total_incomplete_users
     , COUNT(CASE WHEN verified_email = 0 THEN 1 END) AS total_unverified_users
     , (SELECT COUNT(*) FROM clients) AS total_clients
    FROM users;

Arguably, total_clients should be further split into a standalone query.

This month vs. last month query

SQL excels at working with sets of data. However, when you create a table with one row and twenty columns, you don't really have a data set anymore — you have twenty scalar values. In that form, you can only address the twenty values individually, not collectively.

Therefore, I recommend transposing the table, such that it is tall rather than wide. Let's start by creating a view:

CREATE OR REPLACE VIEW user_events AS
    SELECT 'Complete profile registered' AS type, created_at
        FROM users
        WHERE completed_registration = 1
    UNION ALL
    SELECT 'Incomplete profile and verified email', created_at
        FROM users
        WHERE completed_registration = 0 AND verified_email = 1
    UNION ALL
    SELECT 'Unverified email registered this month', created_at
        FROM users
        WHERE verified_email = 0

    UNION ALL

    SELECT 'Onsite Teachers registered', created_at
        FROM onsite_teachers
    UNION ALL
    SELECT 'Onsite Teachers hired', created_at
        FROM purchases
            INNER JOIN purchased_profiles
                ON purchased_profiles.purchase_id = purchases.id
                AND purchased_profiles.profile_type = 'onsite_teacher'
        WHERE purchases.transaction_status = 'completed'
    UNION ALL
    SELECT 'Translators registered', created_at
        FROM translators
    UNION ALL
    SELECT 'Translators hired', created_at
        FROM purchases
            INNER JOIN purchased_profiles
                ON purchased_profiles.purchase_id = purchases.id
                AND purchased_profiles.profile_type = 'translator'
        WHERE purchases.transaction_status = 'completed'
    UNION ALL
    SELECT 'Interpreters registered', created_at
        FROM interpreters
    UNION ALL
    SELECT 'Interpreters hired', created_at
        FROM purchases
            INNER JOIN purchased_profiles
                ON purchased_profiles.purchase_id = purchases.id
                AND purchased_profiles.profile_type = 'interpreter'
        WHERE purchases.transaction_status = 'completed';

Then we can obtain monthly aggregate counts:

SELECT type
     , COUNT(CASE WHEN created_at      >= CURDATE() - INTERVAL (DAYOFMONTH(CURDATE()) - 1) DAY THEN 1 END) AS this_month
     , COUNT(CASE WHEN created_at BETWEEN CURDATE() - INTERVAL (DAYOFMONTH(CURDATE()) - 1) DAY - INTERVAL 1 MONTH
                                      AND CURDATE() - INTERVAL (DAYOFMONTH(CURDATE()) - 1) DAY - INTERVAL 1 SECOND THEN 1 END) AS last_month
    FROM user_events
    GROUP BY type;
\$\endgroup\$
2
  • 1
    \$\begingroup\$ I posted a follow up question to your answer here. Cheers! \$\endgroup\$ Commented Jan 14, 2015 at 18:11
  • \$\begingroup\$ I tested using an SQL view as suggested in your answer against my original query using MySQL Workbench and with the view took on average 0.0286s to complete while the original took 0.0166s, it is slower with the view but the code is more readable and reusable. The only difference was that the view would retrieve past 6 months and past 3 weeks of values for every field, while the original retrieved only for some fields. \$\endgroup\$ Commented Jan 15, 2015 at 15:48
1
\$\begingroup\$

It looks like this is being ran from the console, perhaps explaining the lack of formatting. I would recommend, if this is not ad-hoc, to make a stored procedure so the execution plan will be stored along with the code.

Example code:

DELIMITER //
CREATE PROCEDURE foo()
  BEGIN
-- code here
 END; //
DELIMITER ;

In your case, for example (just using the first statements; combine as many as needed) with some formatting added:

DELIMITER //
CREATE PROCEDURE user_statistics()
BEGIN
SELECT
  -- Count users with complete profile registered current month and year
  SUM
  (
    CASE 
      WHEN completed_registration = 1 
        AND MONTH(created_at) = MONTH(CURDATE()) 
        AND YEAR(created_at) = YEAR(CURDATE()) 
      THEN 1 
      ELSE 0 
    END
  )  AS completed_0,
  -- Count users with complete profile registered 1 month ago
  SUM
  (
    CASE 
      WHEN completed_registration = 1 
        AND MONTH(created_at) = MONTH(CURDATE() - INTERVAL 1 MONTH) 
        AND YEAR(created_at) = YEAR(CURDATE() - INTERVAL 1 MONTH) 
      THEN 1 
      ELSE 0 
    END 
  ) AS completed_1, -- etc.
END; //
DELIMITER ;

Then, all you need to do from console (or anywhere else) is pass this to MySQL:

CALL user_statistics()
\$\endgroup\$
1
  • \$\begingroup\$ I need to thank you! first for opening a new door for me, I had never heard off stored procedures, and in my findings, it looks interesting, and secondly I had to open up MySQLWorkbench and this is one awesome tool, can be better... but I constantly struggled on testing my queries and learn a bit more about what I'm doing right/wrong, and with the editor it gives me all the feedback and information I require, thank you! I'm currently testing this out and any suggestions for further reading on this much appreciated. \$\endgroup\$ Commented Jan 14, 2015 at 5:50

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.