Timeline for Why is "Select * from table" considered bad practice
Current License: CC BY-SA 3.0
36 events
| when toggle format | what | by | license | comment | |
|---|---|---|---|---|---|
| Jul 2, 2020 at 7:25 | review | Suggested edits | |||
| Jul 2, 2020 at 14:20 | |||||
| Apr 10, 2014 at 17:09 | comment | added | user1068 | @Bratch - Your view seems 'academic' to me - ignoring the realities of actual companies and the way work really gets done. Those things you mention are nice, but the reality is almost nobody has that stuff screwed on tight. You should not create maintenance issues when you have a choice, and here there is a clearly better choice. I've been getting paid for this stuff for 15 years, and I've read a lot of comments like yours - "if we were doing it right this would not be a problem" well, in the real world, we do it wrong, and a good developer plans for that. | |
| Apr 10, 2014 at 14:56 | comment | added | Craig Tullis | @gbjbaanb I guess I'm asserting that the performance difference between using ordinals directly and using the names of the columns is a miniscule, unimportant fraction of the overall cost of that call to the database. We're talking micrososeconds vs milliseconds. You're also presuming that the SQL you derive your ordinals from is embedded in your imperative code in the same place, but that is neither a universal best practice nor a guarantee. Using the column names is just a better pattern for writing and maintaining solid code. This goes to code correctness, not mere convenience. | |
| Apr 10, 2014 at 7:33 | comment | added | gbjbaanb | @Craig I agree, I just disagree with the practice of "I'll do it this way solely because it makes my life easier" when the more efficient way is hardly more difficult. DB query code (for example) is tightly coupled between specifying the columns and reading them, why wouldn't you use an integer positional lookup for half a dozen columns you wrote out 2 lines earlier? Its the attitude more than anything, one that gives rise to using select * in the first place. Optimisation: there's a trade off obviously, but I see much "we'll fix it later" that is just an excuse to be sloppy that costs to fix. | |
| Apr 10, 2014 at 3:24 | comment | added | Craig Tullis |
I just had a look at some of the .NET DbDataReader code (which won't be all that different from other providers). The string indexer (this[string]) calls GetOrdinal(string), and then calls GetValue(int) with the result. There's a HashTable involved. HashTable lookups are pretty darn fast. The HashTable is cached for follow-on column ordinal look-ups. All in all, this just plain isn't where your performance issues are going to be. Use the column names. It's a much better pattern for maintainable code, and creates a far less brittle dependency on the database.
|
|
| Apr 10, 2014 at 3:01 | comment | added | Craig Tullis | Also, it would probably be better to use the term "ordinal" to refer to the positions of the columns in the result set. "Index" doesn't necessarily mean quite the same thing, and has additional unrelated meaning related to databases, so it introduces a little confusion in this context. I suppose the term "position" works pretty well, too. | |
| Apr 10, 2014 at 2:54 | comment | added | Craig Tullis | @gbjbaanb "Premature optimization" isn't a compliment. You have much more serious performance problems elsewhere in your code. The only way to know is to follow reasonable practices during development, then profile the running code to find out where the actual performance issues are. I can just about guarantee they won't be where you thought they would be and you'll realize belatedly that you put a lot of effort into optimizing where it didn't matter. As an anti-bonus, you also ended up with code that is much harder to maintain due to use of "magic numbers." Maintenance is hugely important. | |
| Apr 9, 2014 at 22:47 | comment | added | Bratch | @user1068, maybe at companies with poor planning or no change controls, or even worse, poor security practices that allow anyone to make schema changes and break all software accessing the table. Select * or columns wouldn't even matter. Maybe this would be good material for the DBA SE. | |
| Apr 9, 2014 at 9:13 | comment | added | gbjbaanb | String lookups are used everywhere, but I'm not talking about that - I'm saying that if you use a string lookup for picking columns out of your DB query so you can use 'select *' (and pull unnecessary columns down), you're trading convenience for performance. Using a integer index stops this kind of laziness and is a useful tool to ensure good practice. String lookups are much slower than integer indexing, and the mass use of them (for convenience) does have an impact on overall performance as well. It is not something to assume is fair practice because something else is slow. | |
| Apr 9, 2014 at 0:17 | comment | added | Aaronaught | This is utter nonsense. String comparisons are used everywhere in practice, and they're not slow. The fastest databases available today are distributed hash tables that take strings as keys. I can assure you that I'm very fussy about performance and know a whole lot about optimization, and the penalty for string comparisons vs. primitive type comparisons is completely irrelevant in every instance except maybe for operating systems and microcontroller software. The primary factor in performance is I/O; secondary is the computational complexity, e.g. O(N²) vs. O(N). | |
| Apr 8, 2014 at 7:26 | comment | added | gbjbaanb | @Aaronaught seriously, one little bit of bad practice does mean its ok because you do it once. People who use string comparitors use them all over the place, array indexes become keyed by strings and next thing you know, you've slowed everything down - and then you look at benchmarks and say "it must be that big db connection because I profiled the indexes and they look insignificant". Overall performance comes from an eye on how everything works. | |
| Apr 8, 2014 at 0:04 | comment | added | Aaronaught | You've got to be kidding me. The string comparisons are several orders of magnitude less expensive than the actual database operation. The fact that you would even suggest that this matters one whit in practice is proof positive that you've never even attempted to profile it. | |
| Apr 6, 2014 at 21:53 | comment | added | gbjbaanb | @Aaronaught and yet you criticise indexes for being "an error waiting to happen" yet you advocate "select *". String compares v integer compares - they're hugely different, and a program based on string compares is going to be very slow, it may be "a few CPU instructions" (plus a few memory accesses, and cache stalls) but they add up. Sometimes they really add up. | |
| Apr 6, 2014 at 13:52 | comment | added | Aaronaught |
Really? You think accessing columns by index instead of by name is a good idea because it's faster? Talk about micro-optimization... you're talking about queries that are, at minimum, on the order of several milliseconds, and you're saving maybe a dozen CPU instructions at the very high cost of any kind of maintainability. Hard-coding column indexes is "an error waiting to happen" - doing SELECT * is almost never going to affect the correctness of any half-decently designed program.
|
|
| Apr 6, 2014 at 10:50 | comment | added | gbjbaanb | @Aaronaught who says he's using a major framework? who says he isn't using index access? You can assume all you like, but at the end of the day - its an error waiting to happen solely to save someone typing out what they wanted to use - laziness is always poor programming practice. (and index is good - its fast, unlike string compares for everything. There are reasons apps run as fast today as they used to years ago even though we have vastly faster computers. This is one of them) | |
| Apr 6, 2014 at 2:01 | comment | added | Aaronaught |
Terrible answer. Almost none of the major frameworks today care about having extra columns in the result set. Even if you don't use a framework, most of the time SELECT * should work - I mean, you are referencing columns by name, and not by index, aren't you? Index-based mapping is just bad. There are plenty of reasons not to use SELECT *, but this isn't one of them.
|
|
| Apr 4, 2014 at 17:12 | comment | added | Adriano Varoli Piazza | @MooingDuck right, I misread that. | |
| Apr 4, 2014 at 16:30 | comment | added | Mooing Duck |
@AdrianoVaroliPiazza: What he meant was that performance (via being explicit about columns) is a better reason to avoid * than the inability to access by column. He was NOT suggesting that * is a good thing.
|
|
| Apr 4, 2014 at 16:00 | comment | added | Adriano Varoli Piazza | @jwenting I fail to see the performance improvement of receiving all columns of a table or join across the network when I only needed 2 columns. | |
| Apr 4, 2014 at 6:53 | comment | added | jwenting | @MooingDuck he seems to equate correctness with looking pretty, and THAT's what is the problem. Performance should never be sacrificed to make code look "pretty". If it works as it should, what it looks like is not that important (though, as I said, making it look good is rewarding, and it can make maintenance easier, but if it looks good but doesn't perform to specs it's not correct). | |
| Apr 3, 2014 at 23:26 | comment | added | Craig Otis | @immibis I agree. While the order of column data returned by a database query might be maintained, using the order of your columns (especially if you're using * instead of specifying them!) to obtain your data seems extremely error-prone. | |
| Apr 3, 2014 at 22:14 | comment | added | Stack Exchange Broke The Law | @gbjbaanb then access the columns by name. Anything else would be obviously stupid unless you specified the column order in the query. | |
| Apr 3, 2014 at 20:45 | comment | added | gbjbaanb | @TallGuy the point is you do not know which order the columns will appear in (ie if something changes). I prefer to be explicit in these things, so I am not surprised by code that reads columnX yet one day suddenly starts to be populated with columnN. Its sloppy, lazy programming practice. (and its also inefficient if you only use a subset). | |
| Apr 3, 2014 at 19:26 | comment | added | TallGuy | I don't understand the point in this answer. If you add a column to a table, both the SELECT * and the SELECT [Columns] will work, the only difference is that if the code needs to bind to the new column, the SELECT [Columns] will need to be modified whereas the SELECT * will not. If a column is removed from a table, the SELECT * will break at the point of binding, whereas the SELECT [Columns] will break when the query is executed. It seems to me that the SELECT * is the more flexible option as any changes to the table would only require changes to the binding. Am I missing something? | |
| Apr 3, 2014 at 18:35 | comment | added | Mooing Duck | @jwenting: I think gbjbannb misunderstood you as saying that performance matters more than correctness, and then you affirmed this misunderstanding. What you originally said was performance is more correct than convenience, not correctness, which I think is far less disagreeable. | |
| Apr 3, 2014 at 17:13 | comment | added | Nick | @jwenting There's gross performance negligence which borders on or outright crosses over into incorrect code (making something O(n!) that could easily be O(n), for example), and then there's sacrificing correctness for the sake of (often misunderstood) performance "gains" (such as putting NOLOCK on every single query because "it improves performance and prevents deadlocks"). Performance is a tradeoff, but as gbjbaanb said, code that doesn't need to work is easily the fastest code. | |
| Apr 3, 2014 at 16:17 | comment | added | gbjbaanb | @jwenting think for a bit. Fast code that does not work, is code that does not work. If performance matters more than correctness, I have some awesome code that runs in almost 0 milliseconds that I can sell you for a quite reasonable sum :) | |
| Apr 3, 2014 at 15:54 | comment | added | user1068 | @Bratch, in real life production environments, you might have hundreds of applications using the same tables and there's no possible way all those applications can be maintained properly. You are correct in sentiment, but practically, the argument fails just due to the realities of working in copmanies. Schema changes to active tables happens all the time. | |
| Apr 3, 2014 at 15:36 | comment | added | Bratch | Someone shouldn't be making schema changes to the database your program uses without knowing what it impacts. If this is happening, then we could equally ask what if someone renames one of the columns, or changes the data type. See the second answer here stackoverflow.com/questions/3180375/select-vs-select-column | |
| Apr 3, 2014 at 14:51 | comment | added | jwenting | @gbjbaanb yes, it does. Sadly many people forget that you should write working code rather than pretty code (though if you can combine the two, even better). And poorly performant code, code that causes excessive network traffic for example, or overloads the database server, is NOT correct code. Which is exactly what can easily happen when doing "select *", especially if you're pulling in BLOB or CLOB columns you don't need. | |
| Apr 3, 2014 at 14:49 | comment | added | gbjbaanb | @jwenting really? performance matters more than correctness? Anyway, I don't see that "select *" performs better than selecting only the columns you want. | |
| Apr 3, 2014 at 14:38 | comment | added | jwenting | Performance, network load, etc. etc. are far more important than convenience of getting the columns back in the order and with the name you want. | |
| Apr 3, 2014 at 13:51 | comment | added | PlasmaHH | When using APIs where in code you access columns by column name, nothing much will happen, unless a column you want to use isn't there anymore. | |
| Apr 3, 2014 at 12:45 | history | edited | gbjbaanb | CC BY-SA 3.0 |
edited body
|
| Apr 3, 2014 at 12:14 | vote | accept | the baconing | ||
| Apr 3, 2014 at 11:59 | history | answered | gbjbaanb | CC BY-SA 3.0 |