How to force PostgreSQL to use a pre-calculated value

This question is one that has come up a number of times in PostGIS newsgroups worded in many different ways. The situation is that if you use a function a number of times not changing the arguments that go into the function, PostgreSQL still insists on recalculating the value even when the function is marked IMMUTABLE. I have tested this on 8.2 and 8.3 with similarly awful results.

This issue is not so much a problem if function calculations are fast, but spatial function calculations relative to most other functions you will use are pretty slow especially when dealing with large geometries. As a result your query could end up twice as slow. Even setting the costs of these functions to relatively high does not help the situation.

To demonstrate here is a non-PostGIS version of the issue that everyone should be able to run and demonstrates its not a PostGIS only issue.

CREATE OR REPLACE FUNCTION fn_very_slow(IN param_sleepsecs numeric) RETURNS numeric AS
$$
BEGIN
	PERFORM pg_sleep(param_sleepsecs);
	RETURN param_sleepsecs;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE STRICT ;

--runs in 4524 ms
SELECT fn_very_slow(i*0.5) As firstcall
FROM generate_series(1,5,2) As i;

--runs in 9032 ms - no cache, but in spatial functions (say ST_Distance)
-- we have tried this does sometimes cache and return in 4524ms
SELECT fn_very_slow(i*0.5) As firstcall,
	fn_very_slow(i*0.5)  As secondcallsame
FROM generate_series(1,5,2) As i;


--runs in 9032 ms - no cache
SELECT firstcall,
	firstcall + 1 As secondcalldifferent
FROM (SELECT fn_very_slow(i*0.5) As firstcall
FROM generate_series(1,5,2) As i
) As foo;
Solution:

Our solution to this problem I find kind of ugly, hard to explain, and not ideal. The solution we use is to wrap in a subquery and put an ORDER BY in the subselect. It doesn't seem to matter what that ORDER BY is. You could do ORDER BY 1 and it works though if you have a preferred order, you should use that. The ORDER BY seems to trick the planner into materializing the subselect with the costly function so by the time it hits the main one, it sees the costly calculation as a constant.

Watch what happens when we throw in a meaningless ORDER BY clause

--runs in 4524 ms - caches
SELECT firstcall,
	firstcall + 1 As secondcalldifferent
FROM (SELECT fn_very_slow(i*0.5) As firstcall
FROM generate_series(1,5,2) As i
ORDER BY 1 OFFSET 0) As foo;

Note if you leave out the ORDER BY the planner may or may not materialize the subquery, but ORDER BY seems to almost guarantee it.

Note this is not optimal because for large datasets, you just want the cached result to be reused. You don't want the result to be materialized since you loose the usefulness of indexes.

If anyone has any thoughts on the matter I would love to hear them since this is a big cause of some frustration when you are trying to run spatial queries that have to return in 3 seconds or less.