UPDATE: Thanks all for the suggestions. For now we ended up increasing seq_page_cost from 1 to 2 at the database level. That got us back to our old, much faster speeds without any change in code, and it seems to have improved the speed of other queries as well, without slowing any down.
ALTER DATABASE mydb SET seq_page_cost=2;
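If you want to verify what the planner is actually using, or revert the override later, something like the following should work (a sketch; mydb is the placeholder database name from above):

```sql
-- Check the value in effect (run from a session connected to mydb;
-- reconnect after the ALTER DATABASE for it to take effect)
SHOW seq_page_cost;

-- Revert to the PostgreSQL default (1) if the override stops helping
ALTER DATABASE mydb RESET seq_page_cost;
```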
As Jeff suggested, we'll try to come up with a standalone example that exhibits the behavior. The example below was more to demonstrate the construct; table names and fields were changed to protect the innocent, which is why we didn't bother showing explain plans. The behavior also seems to depend on the distribution of data, and it gets worse when stats are updated (via VACUUM ANALYZE). We didn't see this in PostgreSQL 8.3, and this was a system recently upgraded from 8.3 to 8.4.

--- ORIGINAL ARTICLE HERE ---
This is a very odd thing, and I think it has happened to us perhaps once before. It's a bit puzzling, and we aren't particularly happy with our workaround, because it's something that looks a bit bizarre to a casual observer. The hack is turning the enable_seqscan setting off for a particular query to force the planner to use the indexes available to it.
What is particularly troubling about this problem is that it wasn't always this way. This is a piece of query code we've had in an application for a while, and it has worked, shall I say, really fast: response times of 300 ms to 1 sec for what is not a trivial query against a not-so-trivially sized hierarchy of tables. At any rate, one day this query that we were very happy with suddenly started hanging, taking 5 minutes to run. Sure, data had been added and so forth, but that didn't completely explain this sudden change of behavior. The plan it took had changed drastically: it just suddenly decided to stop using a critical index it had always used. Well, it was still using the index, but only on the root table, not the children. Querying a child table directly proved that it still refused to use the index there, so the hierarchy didn't seem to be at fault.
We checked our dev server, which has the same PostgreSQL 8.4.2 install as production (both running on Windows, though production is Windows 2008 and dev is Windows 2003), with data that was probably about two weeks older than production's. Dev was working perfectly and production was not.
Then we vacuum analyzed some critical tables on dev, and dev started to behave in the same slow, stupid way with the same stupid plan. Isn't vacuum analyzing supposed to help the situation? After comparing the plans from before and after the VACUUM ANALYZE, we realized what had changed: it had stopped using this critical index on the child tables.
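If you want to reproduce that kind of before/after comparison yourself, the steps would look roughly like this, a sketch using the watered-down items query from below:

```sql
-- Capture the plan while it is still fast
EXPLAIN SELECT item_name, geom
FROM items
WHERE items.feature_type IN
    (SELECT f.feature_type FROM lufeature_types AS f WHERE f.grouping IN ('G','K'));

-- Refresh the statistics, which is what flipped the plan for us
VACUUM ANALYZE items;

-- Run the same EXPLAIN again and diff the two plans:
-- look for the index scans on the child tables turning into seq scans
```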
We reduced our query down to what we think is the problem area. In a very watered-down form it looked something like this:
SELECT item_name, geom
FROM items
WHERE items.feature_type IN
    (SELECT f.feature_type FROM lufeature_types AS f WHERE f.grouping IN ('G','K'));
That used to use the btree index we had on feature_type for all the tables, but then it suddenly stopped. Changing it to an explicit IN list, where the list is what the subquery would return, made it use the btree index again. But that was a very undesirable thing to do: we like our code as short as possible, and such a stupid solution would require running one query, parsing the result into an array, and then feeding it to a second query. And no, even making the subquery a CTE did not help.
SELECT item_name, geom FROM items WHERE items.feature_type IN('A', 'B','C');
We finally ended up changing the code to do the following, which allowed us to keep the SQL statement as we liked it and still use the index. But the idea of giving the planner these kinds of hints, especially when it never needed them before, has always struck me as a bit unpleasant.
set enable_seqscan = off;
SELECT item_name, geom
FROM items
WHERE items.feature_type IN
    (SELECT f.feature_type FROM lufeature_types AS f WHERE f.grouping IN ('G','K'));
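One refinement worth noting: a plain SET persists for the rest of the session, so if you only want the hint to apply to this one query, you can scope it to a transaction with SET LOCAL (standard PostgreSQL behavior; the setting reverts at COMMIT or ROLLBACK):

```sql
BEGIN;
SET LOCAL enable_seqscan = off;  -- only in effect until COMMIT/ROLLBACK
SELECT item_name, geom
FROM items
WHERE items.feature_type IN
    (SELECT f.feature_type FROM lufeature_types AS f WHERE f.grouping IN ('G','K'));
COMMIT;
```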
For those unfamiliar with the enable_seqscan setting: turning it off doesn't disable sequential scanning completely, since in some cases there is no alternative. It makes the planner treat a sequential scan as a last resort, used only when there is no index it can possibly use. Sequential scans are not always a bad thing even if you do have indexes, since an index scan is not a completely free strategy either.