3

I've been trying to find concise answers to the following questions and I've read a lot but I'm still unsure. Maybe there are no simple answers because scenarios can be so different.

But here are the questions:

1) Should I index a tinyint column where all records are either a 0 or 1?

Example Query where postActive column has every record as a 0 or 1:

SELECT postId, postName, postTitle
FROM postTable
WHERE postDate > Now()
AND postActive = 1

2) Should I index an int column where all records have 0 except records that have an alternate int value that is unique in the table?

For this example, the column orderProcessingId is going to be 100% unique in the table (other than zeros), and the vast majority of records have 0 for orderProcessingId.

SELECT orderId, orderInformation, orderData, orderStuff
FROM orderTable
WHERE orderProcessingId = 38457237

In both these tables there are hundreds of thousands of records.

I guess the reason I'm question things is because it seems like indexing would depend on a relatively distributed frequency of values to improve performance. But in example 1 there is only extremely high frequency of both (and only two) values, and in example 2 there is extremely high frequency of one value (0) and no repeat frequency for the exceptions.

But my ideas are all based pretty much on conjecture... so. What can you tell me?

1 Answer 1

3

I guess the reason I'm question things is because it seems like indexing would depend on a relatively distributed frequency of values to improve performance.

Not necessarily.

Should I index a tinyint column where all records are either a 0 or 1?

Depends – on whether the distribution of those values is largely equal, and if not whether you are mostly interested in the values occurring significantly less.

Should I index an int column where all records have 0 except records that have an alternate int value that is unique in the table?

Since you are trying to search for one of those rare and unique values – yes.

(I hope you are not confusing 0 and NULL here, btw. – because if your integer type entries were unique except for those with a value 0, you could not use a UNIQUE index here.)

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks CBroe. In the first example, most records are 1. But, there are like 700k 1's and like 400k 0's. I'm ALWAYS only interested in looking at the 1's. For example 2, are you saying I should allow Null and use a unique index for better performance? (Right now I'm using 0 and a Normal index)
In the case of 700k 1's and 400k 0's it might even get slower with the index in some cases. If you have 1% 1's and 99% 0's MySQL will do the right thing most of the time (use the index only when you select the 1's).
Well, there’s an “it depends” in almost every answer to a question like this. To surely now, the asker has to give it a try, measure and use EXPLAIN to find out what effects really occur.
@CBroe I agree. Also be warned that EXPLAIN can be very misleading when you have 50% 0's and 50% uniform integers (eg. it shows 50 rows but there are actually 100k). It might be better with NULL's instead of 0's, but I haven't tested that.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.