Tuesday, October 28. 2008
The age old question of why or why is my table index not being used is probably the most common question that ever gets asked even by expert database users.
In this brief article we will cover the most common reasons and try to order by statistical significance.
For the uninitiated, how do you know when an index is not being used? You use EXPLAIN, EXPLAIN ANALYZE or PgAdmin's nifty graphical Explain plan described in Reading PgAdmin Graphical Explain Plans.
Okay your query is not using an index, why and what can you do about it?
Display comments as (Linear | Threaded)
You're right, a varchar_pattern_ops index won't work for IN (or more generally, plain "=") queries. This is because its equality operator is ~=~ ... which, in the original conception, didn't necessarily work like =. pattern_ops uses straight strcmp comparison while regular varchar comparison is based on strcoll, and in non-C locales strcoll can say that two strings are equal even if they aren't bitwise equal.
For the last release or so there's been a hard-wired requirement that varchar = be plain bitwise equality, which is what ~=~ does. So for 8.4 we have gotten rid of ~=~ and made regular = be the equality member of the varchar_pattern_ops opclass.
In short: as of 8.4 a pattern_ops index will be able to service plain "=" and hence IN queries, but in earlier releases you really do need two indexes if you're not using C locale.
#1 Tom Lane on 2008-10-29 00:07
You missed 0, the most common case, which is that the planner is smarter than the question, and using that index wouldn't have been the right thing anyhow.
One of the common causes for using a sequential scan instead of an indexed one you didn't mention is that it will happen if the query is accessing a large portion of the table. Sometimes people wonder why the index isn't being used for a query that is accessing, say 30% of the rows. The reality is that if you're accessing that many, given how typical rows are clustered together you might as well avoid the overhead of using the index and just look at the whole thing.
With default parameters I believe that crossover point (where it's considered faster to just access the whole thing) is normally at 20% of the table--if the planner believes you're going to see more than that, it might as well just fetch the whole thing. Which is normally correct, but can be wrong particularly for very large tables where the data is packed tightly (a typical example is a large historical archive you're pulling a section out of). That sort of thing is where temporary changes to random_page_cost and enable_seqscan can be handy as pseudo-hints to the optimizer.
#3 Greg Smith on 2008-10-30 23:20
"...I tend to think hints are a bad idea and the best solution is just to make the planner better..."
You're kidding, right?
Denying that, in certain, simple cases a hint like "use index_X" is somehow destroying the "beauty" of the database is living in a fantasy world. Mr. Fetter makes an excellent case by talking about how he makes "temporary changes" to system operation variables in order to bend the optimizer to his way of seeing things.
If, in fact, the person writing the query knows about the structure of the table, and the relation returned rows have to a particular index, it seems far more reasonable to let the person give the optimizer a hint on where to start. That's why they're called hints!
Assuming that the optimizer can be made all-knowing and all-seeing seems pretty irrational to me, but that's just my view.
#4 T Cornwell on 2009-03-26 14:54
I still say that hints are a bad idea. In reality I find they are more abused than useful and in some cases they lead to really stupid things if some other dba comes along not knowing there is a hint living in code somewhere and restructures the table such that the hint is more than useless.
The statistics of the database know more about the distribution of your data than you do. Hints require you constantly tweak your queries as new indexes are added or mroe data is added.
Hints should not live in queries. If they must exist, they should live as part of the table definition or someother meta data elsewhere. (which seems even more fantasy thinking I suppose)
#4.1 Regina on 2009-03-26 16:34
first of all, I wanted to say that I have a profund respect for postgre conceptors as it is a hard and complex job to build a solid database software as postgre.
However, I don't understand your strong opposition to SQL Hints which seems quite old (cf. http://archives.postgresql.org/pgsql-performance/2003-12/msg00181.php :
Tom Lane is "Doctor Optimization," and if you look at past discussion threads of this sort, you'll see that he tends to rather strongly oppose the introduction of "hints.")
Of course, making the optimizer as best as possible is the right way.
Of course you should use the right data-type, etc.
Of course hints can "lead to really stupid things".
Of course "The statistics of the database know more about the distribution of your data than you do" ... except in some rare cases where the optimizer cannot go the right way!
A simple case is with a (indexed) column with few values (ex: a "status" column with "new" or "old" possible values only).
If you have well written your application, you should use bind variables. So your query from the planner point of view should look like something as :
SELECT * from table where status = $status;
Let say that data repartition is 1% of "new" lines and 99% of "old" lines.
What's the best for our query? Even the most clever optimizer could not answer!
The best for "old" status would be a FULL TABLE SCAN.
The best for "new" status would be an INDEX RANGE SCAN.
And that's a really simple example and there are many tricky situations where an execution plan should go the wrong way...
Moreover, in production situations, changing the query or the design is a mess! And a hint can sometimes simply fix the problem!
Why this should be so bad to fix a heavy SQL request in 5 minutes instead of struggling during months to convince the development team or whoever to change their design and still have users complaining of the performances?
And no matter for me to change the DB settings (as far as it is well configured) for a single query!
#4.1.1 Sylvain Dussart on 2010-09-14 20:34
Ok, so hints are bad because they go with the query. I can agree with that, but only because they are not more visible within the db.
Perhaps a happy medium is to register queries (a la stored procedures) and then also associate and register hints with parameters, or at least comments, from the designer that can give some context to the reason for the hint.
It seems like the problem is between what the query designer knows about what will be returned vs what the optimizer can guess based on table statistics - with limited statistics, and zero knowledge about the reason for the query. In this scenario, the optimizer should work great *most* of the time, but will not be optimal sometimes. Ah well, enough speculation for now.
#5 T Cornwell on 2009-03-27 08:17
If the planner refuses to use the index, and if you know for sure (tested) the index is good and should be used, you can turn off sequence scan (SET enable_seqscan = off;) then run your query.
#6 Robert Xiong on 2009-11-13 12:43
Syndicate This Blog
Show tagged entries