Monday, July 03. 2017
Printer Friendly
This weekend we spent sometime moving PostGIS/pgRouting windows buildbot Winnie to new hardware.
Leo did the hardware and I handled installing and reconfiguring stuff.
While I was at it, I upgraded to new Jenkins.
Vicky Vergara has been bugging me to setup pgTap
so she can run her pgRouting pgTap tests to make sure they work on windows.
She's got 22488 tests. She just loves pgTap.
Last time I tried installing pgTap I gave up, but I was in mood for experimentation so gave it another chance.
Continue reading "Installing pgTap in windows with msys2 and mingw64"
Monday, January 27. 2014
Printer Friendly
I've read from many that PL/V8 mathematic operations are generally faster than what you get with SQL functions and PL/pgsql functions. One area where I thought
this speed would be really useful was for writing Map Algebra call-back functions. A PostGIS 2.1+ map algebra callback function signature looks like:
func_name(double precision[][][] value, integer[][] pos, text[] VARIADIC userargs)
Continue reading "Writing PostGIS raster Map Algebra Callback Functions in PLV8"
Sunday, August 18. 2013
Printer Friendly
Two of the big changes in PostGIS 2.1 raster are the improved speed and functionality of the raster ST_Union function and ST_Clip. Aside from speed, the big thing with ST_Union in 2.1 is that it applies operations to all bands by default.
These are our most favorite funcitons of all. This is a continuation of Word Play with spatial SQL, except we'll be generating rasters instead of geometries and exercising some raster functions in addition to geometry functions.
Although these SQL statements look long and somewhat complicated, they are easily wrappable in an SQL function. We have, for example, an sql function to write letters on parcels that just takes as input the parcel id and the words to write.
This uses the postgis_letters extension, which we've finally put up on github postgis_letters. For rendering the images, we used our quickie viewer which relies on ASP.NET or PHP and JQuery. We'll be putting that up on github as well once
we've dusted it off a bit.
If you are interested in the aerial and parcel geometry data we are using here, we grabbed it from MassGIS for Cambridge, Massachusetts area. You might recognize the base query in our upcoming DZone PostGIS refCard.
So here it goes. An exercise in raster expression.
Continue reading "Raster Words using PostGIS 2.1"
Thursday, February 14. 2013
Printer Friendly
A while back I mentioned to Bborie (aka dustymugs) and Sandro (aka strk): We're missing ability to label our images with text.
Picture this: What if someone has got a road or a parcel of land and they want to stamp
the parcel id or the road name on it and they don't want to have to depend on some piece of mapping software. Many report writers and databases can talk to each other directly
such as we demonstrated in Rendering PostGIS raster graphics with LibreOffice
and we've got raster functionality in the database already. Can we cut out the middleman?
Some ideas came to mind. One we could embed a font lib into PostGIS thus having the ultimate bloatware minus the built-in coffee maker.
strk screeched. He's still recovering from my constant demands of having him upgrade his PostgreSQL version. Okay fair enough.
Bborie suggested why don't you import your letters as rasters and then vectorize them. So that's exactly what we are going to demonstrate in this article
and in doing so we will combine some of the new features coming in PostGIS 2.1 we've been talking about in Waiting for PostGIS 2.1 series.
Warning this article will have a hint of Rube Goldbergishness in it.
If you are easily offended by seeing stupid tricks done with SQL, stop reading now.
We are going to take a natural vector product and rasterize it just so
we can vectorize it again so we can then rasterize it again. Don't think about it too much. It may trigger activity in parts of your brain you didn't know were there
thus resulting in stabbing pains similar to what you experience by quickly guplping down a handful of Wasabi peas.
So here are the steps for creating your own font set you can
overlay on your geometries and rasters.
See if you can spot the use of window functions and CTEs in these examples.
Continue reading "Saying Happy Valentine in PostGIS"
Monday, July 16. 2012
Printer Friendly
I have updated instructions on my gist page for building with PostgreSQL 9.4 Build v8 and plv8
As mentioned in our previous article Building on MingW deploying on VC we often build on MingW and deploy on Windows servers running EDB distributed VC PostgreSQL builds
for extensions we want that don't come packaged. One of the new ones we are really excited about is the PL/V8 and PL/Coffee ones. Could we do it
and would it actually work on a VC build. YES WE CAN and yes it does. I HAZ Coffee and a V8: .
Here are some instructions we hope others will find useful. Even if you aren't on
Windows, you might still find them useful since MingW behaves much like other Unix environments.
If you are on windows, and just want to start using PLV8 and PLCoffee. We have binary builds for both PostgreSQL 9.2 Windows 32-bit (pg92plv8jsbin_w32.zip) and PostgreSQL 9.2 Windows 64-bit (pg92plv8jsbin_w64.zip) which you should be able to just extract into your PostgreSQL 9.2 beta windows install. We quickly tested with EDB VC++ builds and they seem to work fine
on standard VC++ PostgreSQL 9.2beta2 installs. We haven't bothered building for lower PostgreSQL, but if there is some interest, we'd be happy to try.
Continue reading "Building PLV8JS and PLCoffee for Windows using MingW64 w64-w32"
Sunday, July 01. 2012
Printer Friendly
In last article Finding Contiguous primary keys we detailed one of many ways of finding continuous ranges in data, but the approach would only work on higher-end dbs like Oracle 11G, SQL Server 2012, and PostgreSQL 8.4+. Oracle you'd have to replace the EXCEPT I think with MINUS. It wouldn't work on lower Oracle because of use of CTEs. It wouldn't work on lower SQL Server because it uses window LEAD function which wasn't introduced into SQL Server until SQL Server 2012. Someone on reddit provided a Microsoft SQL Server implementation which we found particularly interesting because - it's a bit shorter and it's more cross-platform. You can make it work with minor tweaks on any version of PostgreSQL, MySQL, SQL Server and even MS Access. The only downside I see with this approach is that it uses correlated subqueries which tend to be slower than window functions. I was curious which one would be faster, and to my surprise, this version beats the window one we described in the prior article. It's in fact a bit embarrassing how well this one performs. This one finished in 462 ms on this dataset and the prior one we proposed took 11seconds on this dataset. Without further ado. To test with we created a table:
CREATE TABLE s(n int primary key);
INSERT INTO s(n)
SELECT n
FROM generate_series(1,100000) As n
WHERE n % 200 != 0;
Continue reading "Contiguous Ranges of primary keys: a more cross platform and faster approach"
Tuesday, October 18. 2011
Printer Friendly
One of the great lessons learned in building PostGIS extensions is my rediscovery of SED. SED turned out to be
mighty useful in this regard and I'll explain a bit in this article. Unfortunately there is still a lot I need to learn about it
to take full advantage of it and most of my use can be summed up as monkey see, monkey scratch head, monkey do. In addition I came across what I shall refer to as Pain points with using
the PostgreSQL Extension model. Part of which has a lot to do with the non-granular management of changes in PostGIS,
the day to day major flux of changes happening in PostGIS 2.0 space, and my attempt at trying to creat upgrade freeze points amidst these changes.
When PostGIS 2.0 finally arrives, the freeze points will be better defined and not change from day to day. So some of these issues
may not be that big of a deal.
Continue reading "Lessons learned Packaging PostGIS Extensions: Part 2"
Saturday, April 30. 2011
Printer Friendly
We like to enforce business rules at the database level wherever
we can, for the simple reason, particularly the business we are in, most database update happens
outside the end-user application layer.
That is not to say you shouldn't enforce at the application level too, but that the database is the last
line of defense, is usually more self-documenting than application code can be, and also protects you from your
programmers, even when that your programmers is you.
Domains are objects that you will find in many high-end
standards-compliant databases. They exist in SQL Server, Oracle, IBM Db2, Firebird, and PostgreSQL to name a few.
Domains have existed for a really long time in PostgreSQL. In PostGIS topology, Sandro Santilli (usually known as strk), takes advantage of them for fleshing out the topology support, and I got turned on to them by him.
With that said - let's dive into domains.
What are domains?
Domains are essentially a reusable packaging of check constraints. You use them as if they were a custom data type.
The nice thing about them is that they are usually transparent to applications that
don't understand them.
Example 1: Enforce pay ending/pay day happens only on certain days of the week
Here is an example -- suppose you had a payment system, and you had a rule that the pay thru end date has to
fall on a Friday. You could create a domain such as the following:
CREATE DOMAIN dom_payday
AS date
CONSTRAINT check_dow CHECK (trim(to_char(VALUE, 'day')) = 'friday');
COMMENT ON DOMAIN dom_payday IS 'Company payday rules';
Continue reading "Using Domains to Enforce Business Rules"
Monday, October 05. 2009
Printer Friendly
In our prior story about allocating people with the power of window aggregation, we saw our valiant hero and heroine trying
to sort people into elevators
to ensure that each elevator ride was not over capacity. All was good in the world until someone named Frank came along and spoiled the party.
Frank rightfully pointed out that our algorithm was flawed because should Charlie double his weight, then we could have one elevator ride over capacity.
We have a plan.
Continue reading "Allocating People into Groups with SQL the Sequel"
Wednesday, July 01. 2009
Printer Friendly
PostgreSQL 8.4 has ANSI SQL:2003 window functions support. These are often classified under the umbrella terms of basic Analytical or Online Application Processing (OLAP) functions.
They are used most commonly for producing cumulative sums, moving averages and generally rolling calculations that need to look at a subset of the overall dataset (a window frame of data) often relative to a particular row.
For users who use SQL window constructs extensively, this may have been one reason in the past to not to give PostgreSQL a second look. While you may not
consider PostgreSQL as a replacement for existing projects because of the cost of migration, recoding and testing, this added new feature is definitely a selling point
for new project consideration.
If you rely heavily on windowing functions, the things you probably want to know most about the new PostgreSQL 8.4 offering are:
- What SQL window functionality is supported?
- How does PostgreSQL 8.4 offering compare to that of the database you are currently using?
- Is the subset of functionality you use supported?
To make this an easier exercise we have curled thru the documents of the other database vendors to distill what the SQL Windowing functionality they provide in their core product.
If you find any mistakes or ambiguities in the below please don't hesitate to let us know and we will gladly amend.
For those who are not sure what this is and what all the big fuss is about, please read our rich commentary on the topic of window functions.
Continue reading "Window Functions Comparison Between PostgreSQL 8.4, SQL Server 2008, Oracle, IBM DB2"
Friday, March 13. 2009
Printer Friendly
In our PLPython Part 4: PLPython meets aggregates we demonstrated the power of PostgreSQL in combination
with PLPython to create a sequence of ascii plots by using the power of aggregation. Our friend Simon Greener over at Spatial Db Advisor told me the example
was clever but ascii plots was so 70ish and ugly, why didn't I do SVG plots? He felt people being predominantly visual would be caught up in the ugliness of Ascii plots and miss the point.
At least Bob found them neat and can appreciate the elegance in simplicity, thanks Bob..
To make it up
to Simon, we shall demonstrate 2 features we left out in our prior article.
- Pretty SVG graphs -- so I don't have to hear more about SVG from Simon and how I'm caught in the 70s.
- Ability to plot a schedule -- schedule item, start date, end date using a multi-column aggregate function. Here is a good use for multi-column aggregates or at least I think so. (Yeh for Multi-column aggregates)
If you are into spatial processing in the database or how many ways you can play with XML particularly Oracle Locator, Oracle Spatial, PostGIS and SQL Server 2008, we highly recommend Simon's articles.
Continue reading "PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots"
Sunday, September 07. 2008
Printer Friendly
One thing I'm really looking forward to have in the upcoming PostgreSQL 8.4 is the introduction
of the WITH RECURSIVE feature that IBM DB2 and SQL Server 2005 already have. Oracle has it too but in a non-standard CONNECT BY so is much less portable.
This is a feature that is perhaps more important to
us for the kind of work we do than the much complained about lack of windowing functions.
I was recently taking a snoop at IBM DB2 newsletter. Why I read magazines and newsletters on databases I don't even use I guess is to see what I'm missing out on
and to sound remotely educated on the topic when I run into one of those people. I also have a general fascination with magazines.
In it their latest newsletter they had examples of doing Fibonacci and Graphs with Common Table Expressions (CTEs).
Robert Mala's Fibonacci CTE
Robert Mala's Graph CTE
Compare the above to David Fetter's Fibonacci Memoizing
example he posted in our comments way back when.
I'd be interested in seeing what solutions David and others come out with using new features of 8.4. We can see a before 8.4 and after 8.4 recipe.
As a slightly off-topic side note - of all the Database magazines I have read - Oracle Magazine is the absolute worst. SQL Server Magazine and IBM DB2 are pretty decent.
The real problem is that Oracle's magazine is not even a database magazine.
Its a mishmash of every Oracle offering known to man squashed into a compendium that can satisfy no one. You would think that Oracle as big as their database is
would have a magazine dedicated to just that.
Perhaps there is another magazine besides Oracle Magazine, but haven't found it so I would be interested to know if I missed something.
Monday, December 31. 2007
Printer Friendly
Have you ever noticed that in PostgreSQL you can put set returning functions in the SELECT part
of an sql statement if the function is written in language SQL or C. Try the same trick for PL written functions
such as plpgsql, plperl, plr etc, and you get a slap on the wrist of the form
ERROR: set-valued function called in context that cannot accept a set. For Plpgsql and other PL languages you must put the set returning function in the FROM clause.
Below is a simple example:
--Build test data
CREATE TABLE test
(
test_id serial NOT NULL,
test_date date,
CONSTRAINT pk_test PRIMARY KEY (test_id)
)
WITH (OIDS=FALSE);
INSERT INTO test(test_date)
SELECT current_date + n
FROM generate_series(1,1000) n;
--test function with sql
CREATE OR REPLACE FUNCTION fnsqltestprevn(id integer, lastn integer)
RETURNS SETOF test AS
$$
SELECT *
FROM test
WHERE test_id < $1 ORDER BY test_id
LIMIT $2
$$
LANGUAGE 'sql' VOLATILE;
--Test example 1 works fine
SELECT (fnsqltestprevn(6,5)).*;
--Test example 2 works fine
SELECT *
FROM fnsqltestprevn(6,5);
--Same test function written as plpgsql
CREATE OR REPLACE FUNCTION fnplpgsqltestprevn(id integer, prevn integer)
RETURNS SETOF test AS
$$
DECLARE
rectest test;
BEGIN
FOR rectest
IN(SELECT *
FROM test
WHERE test_id < id
ORDER BY test_id LIMIT prevn)
LOOP
RETURN NEXT rectest;
END LOOP;
END;
$$
LANGUAGE 'plpgsql' VOLATILE;
--Test example 1 - gives error
-- ERROR: set-valued function called in context that cannot accept a set
SELECT (fnplpgsqltestprevn(6,5)).*;
--Test example 2 works fine
SELECT *
FROM fnplpgsqltestprevn(6,5);
So it appears that PostgreSQL is not quite as democratic as we would like.
--But what if we did this?
CREATE OR REPLACE FUNCTION fnsqltrojtestprevn(id integer, prevn integer)
RETURNS SETOF test AS
$$
SELECT * FROM fnplpgsqltestprevn($1, $2);
$$
LANGUAGE 'sql' VOLATILE;
--Test example 1 - works fine
SELECT (fnsqltrojtestprevn(6,5)).*;
--Test example 2 works fine
SELECT *
FROM fnsqltrojtestprevn(6,5);
All interesting, but so what? you may ask. It is bad practice to put
set returning functions in a SELECT clause. Such things are commonly mistakes and should be avoided.
Functional Row Expansion
It turns out that there are a whole class of problems in SQL where the simplest way to achieve the desired result is via
a technique we shall call Functional Row Expansion. By that, we mean that for each record in a given set, we want to
return another set of records that can not be expressed as a constant join expression. Basically the join expression is different
for each record or the function we want to apply is too complicated to be expressed as a static join statement or join at all.
Taking the above example. Lets say for each record in test, you want to return the 4 records preceding including the current one. So basically you want to
explode each row into 5 or fewer rows. Your general gut reaction would be do something as follows:
these give error: ERROR: function expression in FROM cannot refer to other relations of same query level
SELECT test.test_id As ref_id, test.test_date as ref_date, targ.*
FROM test ,
(SELECT tinner.*
FROM test as tinner
WHERE tinner.test_id <= test.test_id
ORDER BY tinner.test_id LIMIT 5) As targ;
SELECT test.test_id As ref_id, test.test_date as ref_date, targ.*
FROM test,fnsqltrojtestprevn(test.test_id, 5) As targ;
--But this does what you want
SELECT test.test_id As ref_id, test.test_date as ref_date,
(fnsqltrojtestprevn(test.test_id, 5)).*
FROM test
Keep in mind what makes the above tricky is that you want to return at most 4 of the preceding plus current. If you want to return all the preceding plus current, then
you can do a trivial self join as follows:
SELECT test.test_id As ref_id, test.test_date as ref_date, targ.*
FROM test INNER JOIN
test As targ ON targ.test_id <= test.test_id
ORDER BY test.test_id, targ.test_id
So as you can see - its sometimes tricky to tell when you need to use this technique and when you don't.
For this trivial example, writing the function as an SQL only function works fine and is the best to use. SQL functions unfortunately
lack the ability to define dynamic sql statements, among other deficiencies so resorting to using a pl language is often easier which means you lose this useful feature of sql functions.
Stuffing a pl function in an SQL function just might do the trick. We haven't tried this on other pl languages except plpgsql, but we suspect it should work the same.
Friday, December 07. 2007
Printer Friendly
One of the annoying things about PostgreSQL unlike some other databases we have worked with is that simple views
are not automatically updateable. There is some work involved to make views updateable. For simple views,
this is annoying, but for more complex views it is a benefit to be able to control how things are updated. In a later version of PostgreSQL perhaps 8.4 or 8.5 this will
be ratified and PostgreSQL will enjoy the same simplicity of creating simple updateable views currently offered by MySQL and SQL Server and other DBMSs, but
still allow for defining how things should be updated for more complex views. For this exercise we are using PostgreSQL 8.2.5, but most of it should work
for lower versions with slight modification.
For this exercise, we shall create a fairly complex updateable view to demonstrate how one goes about doing this.
Here is a scenario where being able to control how a view is updated comes in very handy.
We all know relational databases are great because they give you great
mobility on how you slice and dice information. At times for data entry purposes, the good old simple flat file is just more user-friendly.
Problem: You are developing an inventory application for a molecular biology lab and they have the following requirements:
- They want to keep track of how much of each supply they use for each project grant for funding purposes and report on that monthly or daily.
- They want to keep track of how much of each supply they ordered, what they have left and their usage over time.
- They however want data entry to be as simple as possible. They want a simple flat file structure to input data that has columns for each project usage and column for purchase quantity.
They have 2 projects going on. One on Multiple Sclerosis Research (MS) and one on Alzheimer's. Each is funded by different grants and for grant cost allocation purposes, they need to keep track of the supplies they use on each project.
How do you present a flat file inventory entry screen, but behind the scenes have a inventory and inventory transaction scheme so you can run period reports and aggregate summaries and have automatic totaling?
Possible Solution: One way to do it is with a crosstab summary view that is updateable. Views are incredibly useful abstraction tools.
You do that in PostgreSQL by creating insert, update, and delete rules on your views. For our particular case, we will not be allowing deletion so we will not have a delete rule.
In our system we have 2 tables for simplicity. inventory and inventory_flow. I know we should have a project lookup table or in 8.3 possibly use an ENUM, but to make this short, we are skipping that.
CREATE TABLE inventory
(
item_id serial NOT NULL,
item_name varchar(100) NOT NULL,
CONSTRAINT pk_inventory PRIMARY KEY (item_id),
CONSTRAINT inventory_item_name_idx UNIQUE (item_name)
)
WITH (OIDS=FALSE);
CREATE TABLE inventory_flow
(
inventory_flow_id serial NOT NULL,
item_id integer NOT NULL,
project varchar(100),
num_used integer,
num_ordered integer,
action_date timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT pk_inventory_flow PRIMARY KEY (inventory_flow_id),
CONSTRAINT fk_item_id FOREIGN KEY (item_id)
REFERENCES inventory (item_id)
ON UPDATE CASCADE ON DELETE RESTRICT
)
WITH (OIDS=FALSE);
CREATE VIEW vwinventorysummary As
SELECT i.item_id, i.item_name,
SUM(CASE WHEN iu.project = 'Alzheimer''s'
THEN iu.num_used ELSE 0 END) As total_num_used_altz,
SUM(CASE WHEN iu.project = 'MS' THEN iu.num_used ELSE 0 END) As total_num_used_ms,
CAST(NULL As integer) As add_num_used_altz,
CAST(NULL As integer) As add_num_used_ms,
CAST(NULL As integer) As add_num_ordered,
SUM(COALESCE(iu.num_ordered,0)) - SUM(COALESCE(iu.num_used,0)) As num_remaining
FROM inventory i LEFT JOIN inventory_flow iu ON i.item_id = iu.item_id
GROUP BY i.item_id, i.item_name;
CREATE RULE updinventory AS
ON UPDATE TO vwinventorysummary
DO INSTEAD (
UPDATE inventory
SET item_name = NEW.item_name WHERE inventory.item_id = NEW.item_id;
INSERT INTO inventory_flow(item_id, project, num_used, num_ordered)
SELECT NEW.item_id, 'Alzheimer''s', NEW.add_num_used_altz, 0
WHERE NEW.add_num_used_altz IS NOT NULL;
INSERT INTO inventory_flow(item_id, project, num_used, num_ordered)
SELECT NEW.item_id, 'MS', NEW.add_num_used_ms, 0
WHERE NEW.add_num_used_ms IS NOT NULL;
INSERT INTO inventory_flow(item_id, project, num_used, num_ordered)
SELECT NEW.item_id, 'Resupply', 0, NEW.add_num_ordered
WHERE NEW.add_num_ordered IS NOT NULL;);
CREATE RULE insinventory AS
ON INSERT TO vwinventorysummary
DO INSTEAD (
INSERT INTO inventory (item_name) VALUES (NEW.item_name);
INSERT INTO inventory_flow (item_id, project, num_used)
SELECT i.item_id AS new_itemid, 'Altzeimer''s', NEW.add_num_used_altz
FROM inventory i
WHERE i.item_name = NEW.item_name
AND NEW.add_num_used_altz IS NOT NULL;
INSERT INTO inventory_flow (item_id, project, num_used)
SELECT i.item_id AS new_item_id, 'MS', NEW.add_num_used_ms
FROM inventory i
WHERE i.item_name = NEW.item_name AND
NEW.add_num_used_ms IS NOT NULL;
INSERT INTO inventory_flow (item_id, project, num_ordered)
SELECT i.item_id AS new_item_id, 'Initial Supply', NEW.add_num_ordered
FROM inventory i
WHERE i.item_name = NEW.item_name AND
NEW.add_num_ordered IS NOT NULL;
);
Now look at what happens when we insert and update our view
--NOTE: here we are using the new multi-row valued insert feature introduced in 8.2
INSERT INTO vwinventorysummary(item_name, add_num_ordered)
VALUES ('Phenol (ml)', 1000), ('Cesium Chloride (g)', 20000),
('Chloroform (ml)', 10000), ('DNA Ligase (ml)', 100);
UPDATE vwinventorysummary
SET add_num_used_ms = 5, add_num_used_altz = 6 WHERE item_name = 'Cesium Chloride (g)';
UPDATE vwinventorysummary SET add_num_used_ms = 2 WHERE item_name = 'Phenol (ml)';
UPDATE vwinventorysummary SET item_name = 'CSCL (g)' WHERE item_name = 'Cesium Chloride (g)';
The slick thing about this is if you were to create a linked table in something like say Microsoft Access and designated item_id as the primary key,
then the user could simply open up the table and update as normally and behind the scenes the rules would be working to do the right thing.
|