
Basically, what I want to do is get all the children in another table who fall within a certain birth year. I have two tables; let's say one of them is a school table:

school (school_id, child_id)

children (child_id, birth_year, name, etc.)

My first attempt used a subquery, something like this:

SELECT *,
    (SELECT COUNT(*) FROM school LEFT JOIN children ON school.child_id = children.child_id) AS total
FROM school
LEFT JOIN children ON school.child_id = children.child_id
GROUP BY birth_year

The problem with this query is that the subquery runs across all the records: if I have 1000 records, I think the query (and the subquery) will run 1000 times before the rows are grouped by birth_year. That is slow, almost 3-5 seconds for 500 sample rows.
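For what it's worth, the inner SELECT in that first attempt never refers to the outer row, so it is an uncorrelated subquery: it returns the same grand total on every row rather than a per-year count. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL shows this; the sample rows are made up, and only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE children (child_id INTEGER PRIMARY KEY, birth_year INTEGER, name TEXT);
    CREATE TABLE school (school_id INTEGER, child_id INTEGER);
    INSERT INTO children VALUES
        (1, 2009, 'Ann'), (2, 2009, 'Ben'), (3, 2010, 'Cal'),
        (4, 2011, 'Dee'), (5, 2011, 'Eve'), (6, 2012, 'Fay');
    INSERT INTO school VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5);
""")

rows = conn.execute("""
    SELECT children.birth_year,
           COUNT(*) AS per_year,
           -- the inner children/school refer to the subquery's own FROM clause,
           -- so this value is the same for every group
           (SELECT COUNT(*) FROM school LEFT JOIN children
              ON school.child_id = children.child_id) AS total
    FROM school LEFT JOIN children ON school.child_id = children.child_id
    GROUP BY children.birth_year
    ORDER BY children.birth_year
""").fetchall()

print(rows)  # [(2009, 2, 5), (2010, 1, 5), (2011, 2, 5)]
```

The per_year column, computed by the GROUP BY itself, already holds the per-year count; the subquery adds nothing but the overall total.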

So, to optimize it, I'm doing this instead:

Recursive Query Using PHP

First, I get the distinct birth years of ALL children who are in school, with a query like:

SELECT DISTINCT birth_year FROM school LEFT JOIN children ON school.child_id = children.child_id

It returns data like:

birth_year
==========
2009
2010
2011

I then use these years in further queries from PHP (say the result is stored in the $row variable):

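$new_array = array();
foreach ($row as $r) {
    // Pseudocode only: get_child_data() stands for a query returning the children born in
    // $r->birth_year; count() gives the number of children for that year (2009-2011 here)
    $new_array[$r->birth_year] = count($this->db->get_child_data($r->birth_year));
}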
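Below is a rough Python/sqlite3 equivalent of that loop, assuming get_child_data() returns the children in school who were born in the given year (the PHP helper itself isn't shown, so that behaviour is an assumption). It issues one COUNT query per distinct birth year; the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE children (child_id INTEGER PRIMARY KEY, birth_year INTEGER, name TEXT);
    CREATE TABLE school (school_id INTEGER, child_id INTEGER);
    INSERT INTO children VALUES
        (1, 2009, 'Ann'), (2, 2009, 'Ben'), (3, 2010, 'Cal'),
        (4, 2011, 'Dee'), (5, 2011, 'Eve'), (6, 2012, 'Fay');
    INSERT INTO school VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5);
""")

# Step 1: distinct birth years of children who are in school.
years = [year for (year,) in conn.execute("""
    SELECT DISTINCT children.birth_year
    FROM school JOIN children ON school.child_id = children.child_id
    ORDER BY children.birth_year
""")]

# Step 2: one extra COUNT query per year, mirroring the PHP foreach loop.
counts = {}
for year in years:
    (n,) = conn.execute("""
        SELECT COUNT(*)
        FROM school JOIN children ON school.child_id = children.child_id
        WHERE children.birth_year = ?
    """, (year,)).fetchone()
    counts[year] = n

print(counts)  # {2009: 2, 2010: 1, 2011: 2}
```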

Although this runs three additional queries, it is really fast because each count is simple: it takes less than 0.5 seconds for 500 sample rows.

However, is there any way to optimize this further? Or better, is there a way to do it in a single query with similar performance?

I tried the following, but it ends up extremely slow and crashes my WAMP server.

SELECT * FROM children WHERE birth_year IN (SELECT GROUP_CONCAT(DISTINCT birth_year) FROM school LEFT JOIN children ON school.child_id = children.child_id )

The subquery

SELECT GROUP_CONCAT(DISTINCT birth_year) FROM school LEFT JOIN children ON school.child_id = children.child_id

when run on its own, correctly and quickly returns

2009,2010,2011

And when I query

SELECT * FROM children WHERE birth_year IN (2009,2010,2011)

It is also fast, so I'm quite confused why combining the two queries makes it slow enough to crash my WAMP server.

Sorry for the long post, and thanks in advance.
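Separately from the speed problem, note that GROUP_CONCAT is an aggregate that returns a single string value, so `birth_year IN (SELECT GROUP_CONCAT(...))` compares each birth_year against one string such as '2009,2010,2011', not against three separate years. The sketch below uses SQLite in place of MySQL (made-up rows) to show that the two forms return different rows; how the string is coerced during the comparison differs between SQLite and MySQL, so only the mismatch itself should be taken from it.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE children (child_id INTEGER PRIMARY KEY, birth_year INTEGER, name TEXT);
    CREATE TABLE school (school_id INTEGER, child_id INTEGER);
    INSERT INTO children VALUES
        (1, 2009, 'Ann'), (2, 2009, 'Ben'), (3, 2010, 'Cal'),
        (4, 2011, 'Dee'), (5, 2011, 'Eve'), (6, 2012, 'Fay');
    INSERT INTO school VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5);
""")

# GROUP_CONCAT yields ONE row holding ONE string, not a list of values.
(concat,) = conn.execute("""
    SELECT GROUP_CONCAT(DISTINCT children.birth_year)
    FROM school LEFT JOIN children ON school.child_id = children.child_id
""").fetchone()
print(repr(concat))  # a single string such as '2009,2010,2011' (order not guaranteed)

# IN against that single string is not the same as IN against three integers.
via_concat = conn.execute("""
    SELECT child_id FROM children
    WHERE birth_year IN (SELECT GROUP_CONCAT(DISTINCT children.birth_year)
                         FROM school LEFT JOIN children
                           ON school.child_id = children.child_id)
""").fetchall()
via_list = conn.execute(
    "SELECT child_id FROM children WHERE birth_year IN (2009, 2010, 2011)"
).fetchall()
print(len(via_concat), len(via_list))
```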


The problem is that using subqueries in the IN or SELECT clauses often causes them to be run again for each row in the outer query. To avoid this, try joining to the subquery. This should result in the subquery being run only once, and cached.

SELECT c.*
FROM children c
INNER JOIN (
    SELECT birth_year
    FROM school
    LEFT JOIN children ON school.child_id = children.child_id
    GROUP BY birth_year
) b ON c.birth_year = b.birth_year
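A quick sanity check of the derived-table join, again with sqlite3 standing in for MySQL and made-up rows: it returns the same children as the literal `IN (2009, 2010, 2011)` query. This only checks the results; whether the derived table is evaluated once and cached is up to the database's optimizer and isn't shown here.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE children (child_id INTEGER PRIMARY KEY, birth_year INTEGER, name TEXT);
    CREATE TABLE school (school_id INTEGER, child_id INTEGER);
    INSERT INTO children VALUES
        (1, 2009, 'Ann'), (2, 2009, 'Ben'), (3, 2010, 'Cal'),
        (4, 2011, 'Dee'), (5, 2011, 'Eve'), (6, 2012, 'Fay');
    INSERT INTO school VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5);
""")

# Join to the derived table of school birth years instead of using IN (subquery).
joined = conn.execute("""
    SELECT c.child_id
    FROM children c
    INNER JOIN (
        SELECT children.birth_year AS birth_year
        FROM school LEFT JOIN children ON school.child_id = children.child_id
        GROUP BY children.birth_year
    ) b ON c.birth_year = b.birth_year
    ORDER BY c.child_id
""").fetchall()

# The hand-written list the asker found to be fast.
literal = conn.execute(
    "SELECT child_id FROM children WHERE birth_year IN (2009, 2010, 2011) ORDER BY child_id"
).fetchall()

print(joined == literal)  # True
```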

That said, it looks like you're just getting every child in any school anyway, so a simpler JOIN might also give you the same result.

SELECT c.*
FROM children c
INNER JOIN school s ON c.child_id = s.child_id

If you just want the count of kids in each school for each birth_year:

SELECT s.school_id, c.birth_year, COUNT(c.child_id) AS total
FROM school s
LEFT JOIN children c ON c.child_id = s.child_id
GROUP BY s.school_id, c.birth_year
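A small sqlite3 sketch (hypothetical rows) of the per-school, per-year count. It selects only the grouped columns (school_id and birth_year) next to the count, which also keeps the query valid under MySQL's ONLY_FULL_GROUP_BY mode (enabled by default since MySQL 5.7.5) and shows which birth year each count belongs to.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE children (child_id INTEGER PRIMARY KEY, birth_year INTEGER, name TEXT);
    CREATE TABLE school (school_id INTEGER, child_id INTEGER);
    INSERT INTO children VALUES
        (1, 2009, 'Ann'), (2, 2009, 'Ben'), (3, 2010, 'Cal'),
        (4, 2011, 'Dee'), (5, 2011, 'Eve'), (6, 2012, 'Fay');
    INSERT INTO school VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5);
""")

# One row per (school, birth year) pair with the number of children in it.
rows = conn.execute("""
    SELECT s.school_id, c.birth_year, COUNT(c.child_id) AS total
    FROM school s
    LEFT JOIN children c ON c.child_id = s.child_id
    GROUP BY s.school_id, c.birth_year
    ORDER BY s.school_id, c.birth_year
""").fetchall()

print(rows)  # [(1, 2009, 2), (1, 2010, 1), (2, 2011, 2)]
```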

Comments:

Actually, I did try a subquery (see my first attempt), since I want the total number of children for each year. The bottleneck seems to be the COUNT subquery, though...
The key here is the location of the subquery. I have to say though, it's not exactly clear what your end goal is.
I think I'd go with the recursive PHP query; it's blazing fast. I'm talking about 4 seconds (subqueries) vs 0.2 seconds (recursive PHP). You're right, my end goal is to get the total number of children in school for each year. The easiest way to do it is probably the recursive query, as it lets PHP do the hard work.

If you are trying to get the number of children in each birth year:

SELECT c.birth_year, COUNT(c.child_id) AS total
FROM children c
INNER JOIN school s ON c.child_id = s.child_id
GROUP BY c.birth_year

If you need additional filters, you can add a WHERE clause for the years you want.
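A sketch of that single-query approach with sqlite3 standing in for MySQL (hypothetical rows, including one child born in 2012 who is not in any school), plus the optional WHERE filter on a range of years. Because of the INNER JOIN with school, the child who isn't in school is not counted. The 2010-2011 range is an arbitrary example.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE children (child_id INTEGER PRIMARY KEY, birth_year INTEGER, name TEXT);
    CREATE TABLE school (school_id INTEGER, child_id INTEGER);
    INSERT INTO children VALUES
        (1, 2009, 'Ann'), (2, 2009, 'Ben'), (3, 2010, 'Cal'),
        (4, 2011, 'Dee'), (5, 2011, 'Eve'), (6, 2012, 'Fay');
    INSERT INTO school VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5);
""")

# One query, one row per birth year; child 6 (not in school) drops out via the INNER JOIN.
per_year = conn.execute("""
    SELECT c.birth_year, COUNT(c.child_id) AS total
    FROM children c
    INNER JOIN school s ON c.child_id = s.child_id
    GROUP BY c.birth_year
    ORDER BY c.birth_year
""").fetchall()
print(per_year)  # [(2009, 2), (2010, 1), (2011, 2)]

# Same query with a WHERE clause restricting the years (parameters bound safely).
filtered = conn.execute("""
    SELECT c.birth_year, COUNT(c.child_id) AS total
    FROM children c
    INNER JOIN school s ON c.child_id = s.child_id
    WHERE c.birth_year BETWEEN ? AND ?
    GROUP BY c.birth_year
    ORDER BY c.birth_year
""", (2010, 2011)).fetchall()
print(filtered)  # [(2010, 1), (2011, 2)]
```

This gives the same per-year totals as the PHP loop in a single round trip.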

Comment:

That's what I was trying to do in my first attempt: SELECT *, (SELECT COUNT(*) FROM school LEFT JOIN children ON school.child_id = children.child_id) as total FROM school LEFT JOIN children ON school.child_id = children.child_id GROUP BY birth_year, which is slow. I think joining school with children in the subquery is necessary for a more accurate result (so that children who are not in school are not included).
