Trojan SQL Function Hack - A PL Lemma in Disguise

Have you ever noticed that in PostgreSQL you can put set returning functions in the SELECT part of an sql statement if the function is written in language SQL or C. Try the same trick for PL written functions such as plpgsql, plperl, plr etc, and you get a slap on the wrist of the form ERROR: set-valued function called in context that cannot accept a set. For Plpgsql and other PL languages you must put the set returning function in the FROM clause.

Below is a simple example:


--Build test data
CREATE TABLE test
(
  test_id serial NOT NULL,
  test_date date,
  CONSTRAINT pk_test PRIMARY KEY (test_id)
)
WITH (OIDS=FALSE);

INSERT INTO test(test_date)
	SELECT current_date + n
		FROM generate_series(1,1000) n;
		
--test function with sql
CREATE OR REPLACE FUNCTION fnsqltestprevn(id integer, lastn integer)
  RETURNS SETOF test AS
$$
	SELECT *
	FROM test 
	WHERE test_id < $1 ORDER BY test_id
	LIMIT $2
$$
  LANGUAGE 'sql' VOLATILE;
  
--Test example 1 works fine
SELECT (fnsqltestprevn(6,5)).*;

--Test example 2 works fine
SELECT *
	FROM fnsqltestprevn(6,5);
	
--Same test function written as plpgsql
CREATE OR REPLACE FUNCTION fnplpgsqltestprevn(id integer, prevn integer)
  RETURNS SETOF test AS
$$
DECLARE
    rectest test;
BEGIN
	FOR rectest 
		IN(SELECT * 
			FROM test 
			WHERE test_id < id 
			ORDER BY test_id LIMIT prevn)  
								LOOP
		RETURN NEXT rectest;
	END LOOP;
END;
$$
LANGUAGE 'plpgsql' VOLATILE;

--Test example 1 - gives error
-- ERROR:  set-valued function called in context that cannot accept a set
SELECT (fnplpgsqltestprevn(6,5)).*;

--Test example 2 works fine
SELECT *
	FROM fnplpgsqltestprevn(6,5);

So it appears that PostgreSQL is not quite as democratic as we would like.

--But what if we did this?
CREATE OR REPLACE FUNCTION fnsqltrojtestprevn(id integer, prevn integer)
	RETURNS SETOF test AS
$$
	SELECT * FROM fnplpgsqltestprevn($1, $2);
$$
LANGUAGE 'sql' VOLATILE;

--Test example 1 - works fine
SELECT (fnsqltrojtestprevn(6,5)).*;

--Test example 2 works fine
SELECT *
	FROM fnsqltrojtestprevn(6,5);

All interesting, but so what? you may ask. It is bad practice to put set returning functions in a SELECT clause. Such things are commonly mistakes and should be avoided.

Functional Row Expansion

It turns out that there are a whole class of problems in SQL where the simplest way to achieve the desired result is via a technique we shall call Functional Row Expansion. By that, we mean that for each record in a given set, we want to return another set of records that can not be expressed as a constant join expression. Basically the join expression is different for each record or the function we want to apply is too complicated to be expressed as a static join statement or join at all.

Taking the above example. Lets say for each record in test, you want to return the 4 records preceding including the current one. So basically you want to explode each row into 5 or fewer rows. Your general gut reaction would be do something as follows:


these give error: ERROR: function expression in FROM cannot refer to other relations of same query level
	
		SELECT test.test_id As ref_id, test.test_date as ref_date, targ.*
			FROM test , 
				(SELECT tinner.* 
					FROM test as tinner 
						WHERE tinner.test_id <= test.test_id 
					ORDER BY tinner.test_id LIMIT 5) As targ;
					
		SELECT test.test_id As ref_id, test.test_date as ref_date, targ.*
			FROM test,fnsqltrojtestprevn(test.test_id, 5) As targ;
	


--But this does what you want
	
SELECT test.test_id As ref_id, test.test_date as ref_date, 
	(fnsqltrojtestprevn(test.test_id, 5)).*
  FROM test
	

Keep in mind what makes the above tricky is that you want to return at most 4 of the preceding plus current. If you want to return all the preceding plus current, then you can do a trivial self join as follows:

	
SELECT test.test_id As ref_id, test.test_date as ref_date, targ.*
  FROM test INNER JOIN
	 test As targ ON targ.test_id <= test.test_id
  ORDER BY test.test_id, targ.test_id
	

So as you can see - its sometimes tricky to tell when you need to use this technique and when you don't.

For this trivial example, writing the function as an SQL only function works fine and is the best to use. SQL functions unfortunately lack the ability to define dynamic sql statements, among other deficiencies so resorting to using a pl language is often easier which means you lose this useful feature of sql functions. Stuffing a pl function in an SQL function just might do the trick. We haven't tried this on other pl languages except plpgsql, but we suspect it should work the same.