Timeline for Why is "Select * from table" considered bad practice

Current License: CC BY-SA 3.0

36 events

when toggle format	what		by	license	comment
Jul 2, 2020 at 7:25	review	Suggested edits
Jul 2, 2020 at 14:20
Apr 10, 2014 at 17:09	comment	added	user1068		@Bratch - Your view seems 'academic' to me - ignoring the realities of actual companies and the way work really gets done. Those things you mention are nice, but the reality is almost nobody has that stuff screwed on tight. You should not create maintenance issues when you have a choice, and here there is a clearly better choice. I've been getting paid for this stuff for 15 years, and I've read a lot of comments like yours - "if we were doing it right this would not be a problem" well, in the real world, we do it wrong, and a good developer plans for that.
Apr 10, 2014 at 14:56	comment	added	Craig Tullis		@gbjbaanb I guess I'm asserting that the performance difference between using ordinals directly and using the names of the columns is a miniscule, unimportant fraction of the overall cost of that call to the database. We're talking micrososeconds vs milliseconds. You're also presuming that the SQL you derive your ordinals from is embedded in your imperative code in the same place, but that is neither a universal best practice nor a guarantee. Using the column names is just a better pattern for writing and maintaining solid code. This goes to code correctness, not mere convenience.
Apr 10, 2014 at 7:33	comment	added	gbjbaanb		@Craig I agree, I just disagree with the practice of "I'll do it this way solely because it makes my life easier" when the more efficient way is hardly more difficult. DB query code (for example) is tightly coupled between specifying the columns and reading them, why wouldn't you use an integer positional lookup for half a dozen columns you wrote out 2 lines earlier? Its the attitude more than anything, one that gives rise to using select * in the first place. Optimisation: there's a trade off obviously, but I see much "we'll fix it later" that is just an excuse to be sloppy that costs to fix.
Apr 10, 2014 at 3:24	comment	added	Craig Tullis		I just had a look at some of the .NET DbDataReader code (which won't be all that different from other providers). The string indexer (`this[string]`) calls `GetOrdinal(string)`, and then calls `GetValue(int)` with the result. There's a HashTable involved. HashTable lookups are pretty darn fast. The HashTable is cached for follow-on column ordinal look-ups. All in all, this just plain isn't where your performance issues are going to be. Use the column names. It's a much better pattern for maintainable code, and creates a far less brittle dependency on the database.
Apr 10, 2014 at 3:01	comment	added	Craig Tullis		Also, it would probably be better to use the term "ordinal" to refer to the positions of the columns in the result set. "Index" doesn't necessarily mean quite the same thing, and has additional unrelated meaning related to databases, so it introduces a little confusion in this context. I suppose the term "position" works pretty well, too.
Apr 10, 2014 at 2:54	comment	added	Craig Tullis		@gbjbaanb "Premature optimization" isn't a compliment. You have much more serious performance problems elsewhere in your code. The only way to know is to follow reasonable practices during development, then profile the running code to find out where the actual performance issues are. I can just about guarantee they won't be where you thought they would be and you'll realize belatedly that you put a lot of effort into optimizing where it didn't matter. As an anti-bonus, you also ended up with code that is much harder to maintain due to use of "magic numbers." Maintenance is hugely important.
Apr 9, 2014 at 22:47	comment	added	Bratch		@user1068, maybe at companies with poor planning or no change controls, or even worse, poor security practices that allow anyone to make schema changes and break all software accessing the table. Select * or columns wouldn't even matter. Maybe this would be good material for the DBA SE.
Apr 9, 2014 at 9:13	comment	added	gbjbaanb		String lookups are used everywhere, but I'm not talking about that - I'm saying that if you use a string lookup for picking columns out of your DB query so you can use 'select *' (and pull unnecessary columns down), you're trading convenience for performance. Using a integer index stops this kind of laziness and is a useful tool to ensure good practice. String lookups are much slower than integer indexing, and the mass use of them (for convenience) does have an impact on overall performance as well. It is not something to assume is fair practice because something else is slow.
Apr 9, 2014 at 0:17	comment	added	Aaronaught		This is utter nonsense. String comparisons are used everywhere in practice, and they're not slow. The fastest databases available today are distributed hash tables that take strings as keys. I can assure you that I'm very fussy about performance and know a whole lot about optimization, and the penalty for string comparisons vs. primitive type comparisons is completely irrelevant in every instance except maybe for operating systems and microcontroller software. The primary factor in performance is I/O; secondary is the computational complexity, e.g. O(N²) vs. O(N).
Apr 8, 2014 at 7:26	comment	added	gbjbaanb		@Aaronaught seriously, one little bit of bad practice does mean its ok because you do it once. People who use string comparitors use them all over the place, array indexes become keyed by strings and next thing you know, you've slowed everything down - and then you look at benchmarks and say "it must be that big db connection because I profiled the indexes and they look insignificant". Overall performance comes from an eye on how everything works.
Apr 8, 2014 at 0:04	comment	added	Aaronaught		You've got to be kidding me. The string comparisons are several orders of magnitude less expensive than the actual database operation. The fact that you would even suggest that this matters one whit in practice is proof positive that you've never even attempted to profile it.
Apr 6, 2014 at 21:53	comment	added	gbjbaanb		@Aaronaught and yet you criticise indexes for being "an error waiting to happen" yet you advocate "select *". String compares v integer compares - they're hugely different, and a program based on string compares is going to be very slow, it may be "a few CPU instructions" (plus a few memory accesses, and cache stalls) but they add up. Sometimes they really add up.
Apr 6, 2014 at 13:52	comment	added	Aaronaught		Really? You think accessing columns by index instead of by name is a good idea because it's faster? Talk about micro-optimization... you're talking about queries that are, at minimum, on the order of several milliseconds, and you're saving maybe a dozen CPU instructions at the very high cost of any kind of maintainability. Hard-coding column indexes is "an error waiting to happen" - doing `SELECT *` is almost never going to affect the correctness of any half-decently designed program.
Apr 6, 2014 at 10:50	comment	added	gbjbaanb		@Aaronaught who says he's using a major framework? who says he isn't using index access? You can assume all you like, but at the end of the day - its an error waiting to happen solely to save someone typing out what they wanted to use - laziness is always poor programming practice. (and index is good - its fast, unlike string compares for everything. There are reasons apps run as fast today as they used to years ago even though we have vastly faster computers. This is one of them)
Apr 6, 2014 at 2:01	comment	added	Aaronaught		Terrible answer. Almost none of the major frameworks today care about having extra columns in the result set. Even if you don't use a framework, most of the time `SELECT ` should work - I mean, you are* referencing columns by name, and not by index, aren't you? Index-based mapping is just bad. There are plenty of reasons not to use `SELECT *`, but this isn't one of them.
Apr 4, 2014 at 17:12	comment	added	Adriano Varoli Piazza		@MooingDuck right, I misread that.
Apr 4, 2014 at 16:30	comment	added	Mooing Duck		@AdrianoVaroliPiazza: What he meant was that performance (via being explicit about columns) is a better reason to avoid `` than the inability to access by column. He was NOT suggesting that `` is a good thing.
Apr 4, 2014 at 16:00	comment	added	Adriano Varoli Piazza		@jwenting I fail to see the performance improvement of receiving all columns of a table or join across the network when I only needed 2 columns.
Apr 4, 2014 at 6:53	comment	added	jwenting		@MooingDuck he seems to equate correctness with looking pretty, and THAT's what is the problem. Performance should never be sacrificed to make code look "pretty". If it works as it should, what it looks like is not that important (though, as I said, making it look good is rewarding, and it can make maintenance easier, but if it looks good but doesn't perform to specs it's not correct).
Apr 3, 2014 at 23:26	comment	added	Craig Otis		@immibis I agree. While the order of column data returned by a database query might be maintained, using the order of your columns (especially if you're using * instead of specifying them!) to obtain your data seems extremely error-prone.
Apr 3, 2014 at 22:14	comment	added	Stack Exchange Broke The Law		@gbjbaanb then access the columns by name. Anything else would be obviously stupid unless you specified the column order in the query.
Apr 3, 2014 at 20:45	comment	added	gbjbaanb		@TallGuy the point is you do not know which order the columns will appear in (ie if something changes). I prefer to be explicit in these things, so I am not surprised by code that reads columnX yet one day suddenly starts to be populated with columnN. Its sloppy, lazy programming practice. (and its also inefficient if you only use a subset).
Apr 3, 2014 at 19:26	comment	added	TallGuy		I don't understand the point in this answer. If you add a column to a table, both the SELECT * and the SELECT [Columns] will work, the only difference is that if the code needs to bind to the new column, the SELECT [Columns] will need to be modified whereas the SELECT * will not. If a column is removed from a table, the SELECT * will break at the point of binding, whereas the SELECT [Columns] will break when the query is executed. It seems to me that the SELECT * is the more flexible option as any changes to the table would only require changes to the binding. Am I missing something?
Apr 3, 2014 at 18:35	comment	added	Mooing Duck		@jwenting: I think gbjbannb misunderstood you as saying that performance matters more than correctness, and then you affirmed this misunderstanding. What you originally said was performance is more correct than convenience, not correctness, which I think is far less disagreeable.
Apr 3, 2014 at 17:13	comment	added	Nick		@jwenting There's gross performance negligence which borders on or outright crosses over into incorrect code (making something O(n!) that could easily be O(n), for example), and then there's sacrificing correctness for the sake of (often misunderstood) performance "gains" (such as putting NOLOCK on every single query because "it improves performance and prevents deadlocks"). Performance is a tradeoff, but as gbjbaanb said, code that doesn't need to work is easily the fastest code.
Apr 3, 2014 at 16:17	comment	added	gbjbaanb		@jwenting think for a bit. Fast code that does not work, is code that does not work. If performance matters more than correctness, I have some awesome code that runs in almost 0 milliseconds that I can sell you for a quite reasonable sum :)
Apr 3, 2014 at 15:54	comment	added	user1068		@Bratch, in real life production environments, you might have hundreds of applications using the same tables and there's no possible way all those applications can be maintained properly. You are correct in sentiment, but practically, the argument fails just due to the realities of working in copmanies. Schema changes to active tables happens all the time.
Apr 3, 2014 at 15:36	comment	added	Bratch		Someone shouldn't be making schema changes to the database your program uses without knowing what it impacts. If this is happening, then we could equally ask what if someone renames one of the columns, or changes the data type. See the second answer here stackoverflow.com/questions/3180375/select-vs-select-column
Apr 3, 2014 at 14:51	comment	added	jwenting		@gbjbaanb yes, it does. Sadly many people forget that you should write working code rather than pretty code (though if you can combine the two, even better). And poorly performant code, code that causes excessive network traffic for example, or overloads the database server, is NOT correct code. Which is exactly what can easily happen when doing "select *", especially if you're pulling in BLOB or CLOB columns you don't need.
Apr 3, 2014 at 14:49	comment	added	gbjbaanb		@jwenting really? performance matters more than correctness? Anyway, I don't see that "select *" performs better than selecting only the columns you want.
Apr 3, 2014 at 14:38	comment	added	jwenting		Performance, network load, etc. etc. are far more important than convenience of getting the columns back in the order and with the name you want.
Apr 3, 2014 at 13:51	comment	added	PlasmaHH		When using APIs where in code you access columns by column name, nothing much will happen, unless a column you want to use isn't there anymore.
Apr 3, 2014 at 12:45	history	edited	gbjbaanb	CC BY-SA 3.0	edited body
Apr 3, 2014 at 12:14	vote	accept	the baconing
Apr 3, 2014 at 11:59	history	answered	gbjbaanb	CC BY-SA 3.0

toggle format