Manually setting table column statistics
Question: How do you deal with bad stat counts?

You have a large table and the default planner stats collector underestimates distinct counts of a critical query column thus resulting in much less than optimal query plans. How can you manually set this?

PostgreSQL 9.0 introduced ability to set two settings on table columns: n_distinct and n_distinct_inherited which are described a bit in ALTER TABLE help.

The n_distinct is the estimated number of distinct values for that column with -1 or any negative number representing a percentage of estimated table count instead of a true count.

n_distinct_inherited is a setting useful for parent tables that denotes the estimated distinct count sum of all a parent's child tables.

By tweaking these settings when they are less than optimal, you can influence the query planner to produce better plans. Why this is necessary is mostly for large tables where the stat collector will not query the whole table to determine stats. The stats collector generally queries at most 10-30% of a table.

Determine If you need to set counts

It's always nice to have the stat collector do all these things for you especially if you have a table that is constantly updated and distinct counts can fluctuate a lot. For static tables you may just want to set them manually. So how do you know whether you should bother or not. Well you can check the current values the stats collector has with this query:


-- determine if your stats are fine - compare estimates with what you know they are --
SELECT tablename, schemaname, attname As colname, n_distinct,
array_to_string(most_common_vals, E'\n') AS common_vals,
array_to_string(most_common_freqs, E'\n') As dist_freq
FROM pg_stats
WHERE tablename = 'table_of_interest'
ORDER BY schemaname, tablename, attname;

You would then compare with your actuals

SELECT count(DISTINCT column_of_interest) FROM table_of_interest;

Will give you the current count.

Setting n_distinct and n_distinct_inherited
You may want to bump this up or down when you set the value. Next to set the column distinct count stats you would do something like below replacing 50 with the count you computed:

-- set stats (non parent tables ) --
ALTER TABLE table_of_interest
   ALTER COLUMN column_of_interest
   SET (n_distinct=50);
   
-- set stats (parent tables in an inheritance hierarchy ) --
ALTER TABLE table_of_interest
   ALTER COLUMN column_of_interest
   SET (n_distinct_inherited=50);