Tuesday, August 12. 2008More Aggregate Fun: Who's on First and Who's on LastPrinter FriendlyComments
Display comments as
(Linear | Threaded)
This aggregate really helpful, and is exactly what I needed to find "hot spot" records.
However, it is also useful to know how many "first" matches have been found (using the examples above, "how many jones' have an age of two"; the example above will result with one). I've been wracking my brain on this one (aggregate functions are relatively new to me), and I've come up with these functions to solve this issue: CREATE OR REPLACE FUNCTION explode_array(anyarray) RETURNS SETOF anyelement AS $$ SELECT ($1)[s] from generate_series(1,array_upper($1, 1)) AS s; $$ LANGUAGE 'sql' IMMUTABLE ROWS 1000; CREATE OR REPLACE FUNCTION count_first_element(anyarray) RETURNS integer AS $$ SELECT COUNT(*)::integer FROM explode_array($1) AS e WHERE e=$1[1]; $$ LANGUAGE 'sql' IMMUTABLE; CREATE AGGREGATE count_first(anyelement) ( SFUNC=array_append, STYPE=anyarray, FINALFUNC=count_first_element ); --and a similar (but modified) test: SELECT max(age) As oldest_age, min(age) As youngest_age, count_first(age) As youngest_count, count(*) As numinfamily, family, first(name) As youngest_name, last(name) as oldest_name FROM (SELECT * FROM (SELECT 2 As age , 'jimmy' As name, 'jones' As family UNION ALL SELECT 2 As age, 'c' As name , 'jones' As family UNION ALL SELECT 3 As age, 'aby' As name, 'jones' As family UNION ALL SELECT 35 As age, 'Bartholemu' As name, 'Smith' As family ) As foo ORDER BY age, family, name) as foo2 WHERE age is not null GROUP BY family; I haven't prepared a similar "count_last" aggregate, nor have I thoroughly tested this function. Also, I don't think that I handled NULL records gracefully as first_element_state above. ------------------ Also of importance is that the ORDER BY needs to have the first/last index before anything else (i.e., age needs to appear first). Failure to sort on this first will yield errors (e.g., try "ORDER BY family, name, age" which will incorrectly place aby as the youngest). The above example should have sorted in this order to reinforce this point.
#1
on
2008-09-12 03:36
I Liked this articles very much. It is very much helpful to me.
Thanks a lot, Regards, Shamsu Zoha
#2
on
2008-11-25 08:06
in order to get the first NOT NULL element:
CREATE OR REPLACE FUNCTION firstnotnull_element_state(anyarray, anyelement) RETURNS anyarray AS $$ SELECT CASE WHEN $2 is not null AND $1[1] IS NULL THEN array_prepend($2, $1) ELSE $1 END; $$ LANGUAGE 'sql' IMMUTABLE CREATE AGGREGATE firstnotnull(anyelement) ( SFUNC=firstnotnull_element_state, STYPE=anyarray, FINALFUNC=first_element ) HTH, M. Mamin
#3
on
2010-04-19 04:13
If you are looking for the first not null value, then there is no need to carry around an array of the information to deal with the initial state making the aggregate much simpler.
CREATE OR REPLACE FUNCTION first_notnull_state(anyelement,anyelement) RETURNS anyelement AS $$ SELECT COALESCE($1,$2); $$ LANGUAGE 'sql' IMMUTABLE; CREATE AGGREGATE first_notnull(anyelement) ( SFUNC=first_notnull_state, STYPE=anyelement ) ;
#3.1
on
2011-04-13 11:46
This post really saved me today Regina, thanks!
To be able to get the first (or any in the group) I think is very useful for things like this: Select ST_Difference(first(a.the_geom), ST_Union(b.the_geom)) from table1 a inner join table2 b on ST_Intersects(a.the_geom, b.the_geom) group by a.gid; When it is nessecary to union the geometries in the second table before difference to get the wanted difference result. From my understanding it is a more robust way of doing it than grouping on a.the_geom which is the alternative. Am I right? Thanks Nicklas
#4
on
2010-08-25 17:31
Nicklas,
Haven't thought much of using it in that way. I would think the speed wouldn't be much different or it might be faster to do the way without first. e.g. GROUP BY a.gid, a.the_geom Remember the grouping by a.the_geom will just group by the bounding box so its a pretty light grouping anyway.
#4.1
on
2010-08-25 21:49
You are right that grouping the geometries together with gid is a better idea. I didn't think about that when I read about first and last :-)
But I think I have had some quite similar problem recently when first or last would have been the only option. But I cannot recall it. Anyway, it is a nice functionality. /Nicklas
#4.1.1
on
2010-08-27 18:27
Nicklas,
In 9.1 -- you can just group by gid without need of grouping by the geometry because in 9.1 you can leave out the grouping by geometry if gid is a primary key. Check out depesz article: http://www.depesz.com/index.php/2010/08/08/waiting-for-9-1-recognize-functional-dependency-on-primary-keys/
#4.1.1.1
on
2010-08-28 16:16
Might be worth while to add that this workaround is no longer necessary as PG now has windowing functions (first_value, last_value) which can do exactly that.
http://www.postgresql.org/docs/current/static/functions-window.html
#6
on
2012-01-08 16:17
The work around is still needed. Window functions serve a different purpose. In this case we are using it as an aggregate to consolidate a number of records. Window functions unless you throw in a limit or something is going to return the same number of records as what you started out with, which is not desirable in this case.
#6.1
on
2012-01-08 20:32
The author does not allow comments to this entry
|
Entry's LinksQuicksearchCalendar
Categories
ArchivesBlog Administration |
One of the coolest features of PostgreSQL is the ability to write functions using plain old SQL. This feature it has had for a long time. Even before PostgreSQL 8.2. No other database to our knowledge has this feature. By SQL we mean sans procedural m
Tracked: Jun 02, 05:07
Tracked: Aug 14, 22:09
Tracked: Jan 08, 13:31