Improve the "good" example in "How to ask a good question"

Question

Close to the end of How do I ask a good question?, we see how a poor question can be transformed into a much better one.

It doesn't go quite as far as I would like, however. The sql tag guidance says that presenting the database schema helps users to understand the queries, and may also present opportunities for improvement (e.g. to the indexes).

Could somebody more expert than I with SQL please add some suitable data definition statements to this example, to make it a true exemplar of how to present a database query question?

To make sure I fully understand your gripes with the page: the sql help asks for three items, namely 1) context, 2) schema, and 3) indexes and the output of EXPLAIN SELECT (when asking about performance, which the example question seems to do), and you object to 2) and 3) being missing from help/how-to-ask? That page is not mod-editable (it requires a CM or the page being unlocked), so we'd better get it right on the first try.
– Mast Mod, Commented Mar 9 at 9:08
Yes - mainly the second item, since indexes are only required for a "performance" question. That said, including the indexes and EXPLAIN SELECT would make it a paragon. If we need to get it right, then perhaps we need a proposal in a Community answer here, so we can refine it until we have a consensus?
– Toby Speight, Commented Mar 9 at 9:21
Considering the lack of popularity of your question so far, I think you can make a suggestion in the question itself that can perhaps be reviewed in the answers. Any future feedback can still be collected on this question then. Not ideal, but the alternative is reviewing in the comments which isn't ideal either.
– Mast Mod, Commented Mar 18 at 5:29

J_H · Accepted Answer · 2025-06-30 16:54:09Z

Well stated, @TobySpeight. I agree with you.

We're running this query in a loop, so speed is important.

for ($cluster_hosts as $hostname) {
    $res = mysql_query("SELECT *

If this were a real question that I was answering, I would immediately lean on that for loop. It's an anti-pattern. The rule is pretty much to avoid making repeated queries to the DB backend if you possibly can. Let the backend do the looping, rather than the app.

And then of course in a production query the SELECT * should be SELECT a, b, c ..., to improve maintainability, and to avoid retrieving columns the app will just ignore. Pruning SELECT a, b, c down to SELECT a, b can alter the query plan, for example when we have a covering index so we needn't consult disk blocks containing the rows and can rely just on the index blocks.
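To make the covering-index point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for the MySQL backend in the docs example, and the table `t` and index `idx_t_a_b` are hypothetical names made up for illustration. Plan wording differs between engines and versions, so treat the output as indicative only.

```python
import sqlite3

# Hypothetical table and index, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b INTEGER, c TEXT);
    CREATE INDEX idx_t_a_b ON t (a, b);
""")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Column c is not in the index, so the table rows must be consulted.
wide_plan = plan("SELECT a, b, c FROM t WHERE a = 1")
# Columns a and b are both in the index, so the index alone suffices.
narrow_plan = plan("SELECT a, b FROM t WHERE a = 1")

print(wide_plan)
print(narrow_plan)
```

In SQLite the narrower query should be reported as using a covering index, while the wider one is not. The MySQL counterpart would be the output of EXPLAIN SELECT.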

But the bigger issue is that we don't actually care about the query plan for that SELECT *. Suppose it's typically a 2-second query, and we have N cluster hosts, so the total elapsed time will be about 2 × N seconds. The appropriate thing to do is (quickly!) create an N-row indexed temp table to JOIN against, or to supply a big ugly IN conjunct in the WHERE clause. And then we can analyze what we really care about: Does that single giant query complete in less than 2 × N seconds?

There are several bits of intuition behind that:

The app loop hides information.
Random I/O seeking takes longer than sequential tablescan.
Multiple rows can fit into a single disk block (in wide table row blocks and especially in narrow index blocks).
Repeated queries might (re)fetch the very same blocks on each repetition.

Hiding information from the backend optimizer is a big no-no. We know that N cluster hosts will be dealt with before we click "done" on the stopwatch, but we didn't advise the optimizer of that. So it can't devise the best plan, since we're not letting it see into the future. It's entirely possible that the best plan for N small queries is random reads based on an index, but the best plan for a single giant query is to tablescan since we'll eventually inspect every block anyway.
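As a rough sketch of handing the whole job to the backend at once, the following compares the app-side loop with the two single-query alternatives mentioned above (an IN list, and a JOIN against a temp table). It uses Python's sqlite3 rather than PHP and MySQL, and the `host_status` table, its columns, and the host names are all assumptions invented for the example, not the schema from the docs.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE host_status (hostname TEXT PRIMARY KEY, load_avg REAL);
    INSERT INTO host_status VALUES ('alpha', 0.5), ('beta', 1.25), ('gamma', 3.0);
""")
cluster_hosts = ["alpha", "gamma"]

# Anti-pattern: one query (and one round trip) per host.
looped = []
for hostname in cluster_hosts:
    looped += conn.execute(
        "SELECT hostname, load_avg FROM host_status WHERE hostname = ?",
        (hostname,),
    ).fetchall()

# Alternative 1: a single query with an IN list, one placeholder per host.
placeholders = ", ".join("?" for _ in cluster_hosts)
via_in = conn.execute(
    "SELECT hostname, load_avg FROM host_status "
    f"WHERE hostname IN ({placeholders}) ORDER BY hostname",
    cluster_hosts,
).fetchall()

# Alternative 2: load the hosts into an indexed temp table and JOIN against it.
conn.execute("CREATE TEMP TABLE wanted_hosts (hostname TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO wanted_hosts VALUES (?)", [(h,) for h in cluster_hosts]
)
via_join = conn.execute("""
    SELECT s.hostname, s.load_avg
    FROM host_status AS s
    JOIN wanted_hosts AS w ON w.hostname = s.hostname
    ORDER BY s.hostname
""").fetchall()

print(sorted(looped) == via_in == via_join)
```

All three return the same rows, but only the last two let the optimizer see the full set of hosts when it chooses a plan.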

A question should disclose CREATE INDEX details when performance is a concern, sure. But any UNIQUE index should always be revealed, as it affects correctness: it tells us how to interpret the given relation. We always want to know about PRIMARY KEY columns.
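For illustration, the data definition statements a question might disclose could look something like the sketch below. The `host` table, its columns, and the index name are hypothetical (the docs example does not give a schema), and SQLite via Python's sqlite3 stands in for MySQL. The second INSERT shows why a UNIQUE constraint is a correctness matter rather than just a performance detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the kind of DDL worth pasting into a question.
    CREATE TABLE host (
        host_id      INTEGER PRIMARY KEY,
        hostname     TEXT NOT NULL UNIQUE,
        cluster_name TEXT NOT NULL
    );
    -- Secondary index: mainly relevant when the question is about performance.
    CREATE INDEX idx_host_cluster_name ON host (cluster_name);
""")

conn.execute("INSERT INTO host (hostname, cluster_name) VALUES ('alpha', 'east')")
try:
    # Same hostname again: the UNIQUE constraint rejects the row.
    conn.execute("INSERT INTO host (hostname, cluster_name) VALUES ('alpha', 'west')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM host").fetchone()[0]
print(duplicate_rejected, row_count)
```

A reader who sees this DDL knows that a lookup by hostname yields at most one row, which changes how the query under review should be read.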

There are good reasons, for example managing space on diverse storage media, to do an equi-join on two UNIQUE columns of a pair of tables. In that case we essentially have one relation rather than two.
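A small sketch of that one-relation-in-two-tables situation, again with hypothetical table and column names and with SQLite via Python's sqlite3 as a stand-in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical split: narrow, frequently read columns in one table,
    -- bulky columns in another, both keyed by a unique hostname.
    CREATE TABLE host_core  (hostname TEXT PRIMARY KEY, cluster_name TEXT);
    CREATE TABLE host_notes (hostname TEXT UNIQUE, notes TEXT);
    INSERT INTO host_core  VALUES ('alpha', 'east'), ('beta', 'west');
    INSERT INTO host_notes VALUES ('alpha', 'long free-form text'),
                                  ('beta', 'more text');
""")

# Equi-join on the two unique columns: each hostname matches at most once,
# so the result behaves like a single wider relation.
combined = conn.execute("""
    SELECT c.hostname, c.cluster_name, n.notes
    FROM host_core AS c
    JOIN host_notes AS n ON n.hostname = c.hostname
    ORDER BY c.hostname
""").fetchall()

print(combined)
```

Without the uniqueness information, a reader could not rule out the join multiplying rows.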

As I mentioned, @TobySpeight raises a good point.

But there are some subtleties to the particular question in the docs. If we wish to revisit the advice given by the docs, maybe we'd rather tackle a simpler question that uses another language, perhaps Rust or Python?

Ignoring whether Toby is correct, you haven't answered the question of what we should change the example question to. Let's assume everyone agrees with you: how do we move forward?
– Peilonrayz Mod, Commented Jun 30 at 17:01
@Peilonrayz Maybe keep an eye out in the coming weeks for a "bad" question or two that can be paraphrased for the docs, and post "bad" + "good" here as an Answer? (It wouldn't be polite to paste someone's question verbatim into the docs.) I feel the plan of making edits for a while in this meta question is a good one, until the dust settles.
– J_H, Commented Jun 30 at 17:11

guest271314 · Accepted Answer · 2025-08-17 20:10:07Z

In my opinion, we should forget about trying to evaluate questions or answers as "good" or "bad".

They are just questions and answers. Just data.

The whole idea of "curating" "good" questions leads to endless pedantic activity in comments, and to questions and answers getting closed by question/answer zealots.

And closing questions and answers leads to the state StackExchange is in now, dying or already dead - except for endless conjecture in comments, and mod-squad selectively choosing what is "civil" and not - basically just run of the mill social media.

Individuals can decide for themselves whether to answer a question, or not.

The fact that some other user internally decides that a question is "bad", and externally casts their individual vote for closure, deletion, or whatever, doesn't have any impact whatsoever on my interpretation of the question and my prospective answer.

The last place I'm looking for opinions from other people about a question or answer on SE Web sites is from SE users who actually think there's such a thing as a "community" on SE Web sites, meaning they have some social idea and responsibility they've created for themselves about questions and answers, instead of just answering the question.

(Caveat: My obvious bias in this comment...) UV'd... Hits the nail on the head regarding "community"... Most Q&A that piqued my interest on SO and CR could be dealt with in published literature or doco... Answers could be reduced to RTFM, perhaps augmented with TH... (Try Harder)... Personalities leak in because this is SM... Some prefer cold, "suit & tie" 'crispness', some (like me) lean toward casual, 'collegial'... Questions are imbued with OP's personality. Answers are reviewed (and 'voted') by the personalities of the readers. We're not all "cut from the same cloth"
– user272752, Commented Aug 25 at 20:15
