Friday, July 01. 2016
Continue reading "regexp_split_to_table and string_to_array unnest performance"
Whenever you need to split a text into multiple records breaking by some delimeter, there are two common options that PostgreSQL provides. The first is
regpexp_split_to_table and then next popular is using the unnest function in combination with string_to_array.
Here is an example using regexp_split_to_table:
FROM regexp_split_to_table('john,smith,jones', ',') AS a;
You can achieve the same result by using the construct:
FROM unnest(string_to_array('john,smith,jones', ',')) AS a;
With short text you won't notice much perfomance difference. But what happens if we pass in a humungous text?
Sunday, March 17. 2013
We have all the setup executables for PostgreSQL 8.4-9.2 32-bit and
PostgreSQL 9.1-9.2 x64-bit for PostGIS 2.0.3. We'll have the 9.0x64-bit
ones up shortly but do have the binaries.
We are planning to release these to stackbuilder sometime next week, but
would like some feedback first with any issues people run into.
One person already reported a serious bug in our original packaging which we
fixed. If you already installed 2.0.3 the x64-bit version or any of the 9.2
versions (both x32 and x64) that was not marked as -2.zip, -2.exe, you'll
want to reinstall. The 8.4-9.1 32-bits were not affected so just have a -1.
Packages can be downloaded from:
General details on what is included:
As always if you are installing for PostgreSQL x64 make sure to pick a zip
or installer with 64 in the name. The unmarked or marked with 32 are just
for PostgreSQL 32-bit.
Friday, June 08. 2012
Continue reading "Finding contiguous primary keys"
I recently had the need to figure out which ranges of my keys were contiguously numbered. The related exercise is finding gaps in data as well.
Reasons might be because you need to determine what data did not get copied or what records got deleted. There are lots of ways of accomplishing this, but this is the
first that came to mind. This approach uses window aggregates
lead function and common table expressions, so requires PostgreSQL 8.4+
Monday, January 16. 2012
Continue reading "Table Inheritance and the tableoid"
If I could name a number one feature I love most about PostgreSQL, it's the table inheritance feature which we described in How to Inherit and Uninherit. A lot of people use it for table partitioning using CONSTRAINT EXCLUSION. Aside from that, in combination with PostgreSQL schema search_path (customizable by user and/or database) it makes for a very flexible abstraction tool. For example, for many of our web apps that service many departments where each department/client wants to keep a high level of autonomy, we have a schema set aside for each
that inherits from a master template schema. Each department site uses a different set of accounts with the primary schema being that of the department/client so that they are hitting their own tables.
Inheritance allows us to keep data separate,do roll-up reports if we need to, use the same application front-end, and yet allows us the ability to add new columns in just one place (the master template schema). It is more flexible than other approaches because for example we may have a city organization that need to share tables, like for example a system loaded list of funding source shared across the agency. We can set aside these shared tables in a separate schema visible to all or have some have their own copy they can change if they don't want to use the shared one.
Every once in a while, we find ourselves needing to query the whole hierarchy and needing to know which table the results of the query are coming from. To help
solve that issue, we employ the use of the system column tableoid which all user tables have. The tableoid is the the object id of a table. PostgreSQL has many system columns that you have to explicitly select
and can't be accessed with a SELECT * with the tableoid being one of them. These are: tableoid, cmax,cmin, xmin,xmax,ctid which are all described in System Columns. The PostgreSQL docs on inheritance have examples of using it, but we thought it worthwile to repeat the exercise since it's not that common knowledge and is unique enough feature of PostgreSQL that others coming from other relational databases, may miss the treat. I've often demonstrated
it to non-PostgreSQL users who use for example SQL Server or MySQL, and they literally fall out of their chair when I show the feature to them and its endless possibilities.
Sunday, January 08. 2012
Continue reading "The wonders of Any Element"
PostgreSQL has this interesting placeholder called anyelement which it has had for a long time and its complement anyarray. They are used when you want to define a function that can handle many types arguments or can output many types of outputs. They are particularly useful for defining aggregates, which we demonstrated in
Who's on First and Who's on Last and several other aggregate articles.
Anyelement / anyarray can be used just as conveniently in other functions. The main gotcha is that when you pass in the first anyelement/anyarray all subsequent anyelement / anyarray must match the same data type as the first anyelement / anyarray.
Friday, November 11. 2011
Continue reading "XPathing XML data with PostgreSQL"
One of my favorite tools and I think that of many folks working with GIS and other kinds of Multimedia is the GDAL/OGR suite.
Though I started using it to conquer GIS ETL activities, I found myself using it for problems that are inherently not GIS at all. I talked
about the GDAL OGR2OGR component a while ago in GDAL OGR2OGR for Data Loading
and this time I'll talk tangentially about its raster capabilities. It is a fantastic tool for converting between various raster formats and applying various raster operations.
In PostGIS world the new 2.0 raster functionality puts an SQL wrapper around much of its power. I'm not going to talk about that though except as a fleeting comment to explore later (we've got cool 2 band Map Algebra in PostGIS 2.0 to flaunt its stuff).
So what does this have to do with XPathing XML data with PostgreSQL? Well that's what I'm going to talk about what to do with machine generated data that comes at you in XML format.
A lot of machine generated data is hitting us in an XML like form. I talked about GPX data and navigating that in Which by the way GDAL/OGR can load and export easily into/out of a PostGIS enabled database.
GDAL exposes another kind of machine generated data in XML format which turns out to be very useful for all kinds of things. This is Exchangeable image file format (EXIF) data. There are all kinds
of random text information embedded in pictures and this varies depending on what camera is taking it. Newer cameras like the ones you have built into your iphone or android
embed location based information into them sometimes like where you were standing when you took the picture. Most cameras these days embed the time the picture was taken.
This information is important because if you are taking electronic notes while you are snapping your pictures, it provides an easy way to match up your notes with the picture about the object. So what does this EXIF info look like when you point GDAL at it? We'll see.
Tuesday, November 01. 2011
Ever have the need to create a holding table say spreadsheet data with say 100 columns. You need to create a table to hold this stuff. Or perhaps you were feeling in a sadist mood and wanted to abuse your PostgreSQL database to see how many columns you can create in a table of a specific data type.
Here is a quick script to do it:
SELECT 'CREATE TABLE data_import('
|| array_to_string(array_agg('field' || i::text || ' varchar(255)'), ',') || ');'
FROM generate_series(1,10) As i;
SELECT 'CREATE TABLE data_import('
|| string_agg('field' || i::text || ' varchar(255)', ',') || ');'
FROM generate_series(1,10) As i;
Both variants will return output that looks like this:
CREATE TABLE data_import(field1 varchar(255),field2 varchar(255),field3 varchar(255),field4 varchar(255)
,field5 varchar(255),field6 varchar(255),field7 varchar(255)
,field8 varchar(255),field9 varchar(255),field10 varchar(255));
Now if you want it to also execute because you are running it as part of an sql script, you could wrap it in an anonymous function.
DO language 'plpgsql'
DECLARE var_sql text := 'CREATE TABLE data_import('
|| string_agg('field' || i::text || ' varchar(255)', ',') || ');'
FROM generate_series(1,10) As i;
Wednesday, October 05. 2011
In most release notices, it's the big shiny sexy features that get all the glamor, but in reality on day to day use
it's the small usability enhancements that make the most difference. I'm reminded about this now that I'm working
on upgrade scripts and extensions for PostGIS. There are a couple of new features that make application upgrades easier that I
regret not having in older versions of PostgreSQL we support and additional ones I had in other databases that I find lacking in PostgreSQL. PostgreSQL 8.2 for example brought us
DROP IF EXISTS ...
and all I can say is thank goodness we dropped support of prior versions of PostgreSQL in PostGIS 1.4 otherwise developing upgrade scripts would have been more of a nightmare.
PostgreSQL 8.4 introduced the ability to add additional columns to a view using
CREATE OR REPLACE VIEW as
long as those columns were at the end of the view which Gabrielle Roth demonstrates an example of in This week’s find: CREATE OR REPLACE VIEW
If you were a MySQL user or application developer not having such features would be one reason to frown on PostgreSQL
and MySQL users and other database converts still have reasons to frown for lack of usability features they had
in their other database that they feel naked without in PostgreSQL.
In 9.1 we got two new DDL commands not much talked about that I am very excited about.
CREATE TABLE .. IF NOT EXISTS. I can't tell you how many times I've heard MySQL users whine about the lack of this in PostgreSQL
and I felt their pain. It would be really nice to have this feature for other things such as TYPES or even possibly a CREATE OR REPLACE TYPE which would allow
some alteration of types like adding attributes at the end.
- ALTER TYPE ..[ADD ATTRIBUTE] [RENAME ATTRIBUTE] [ADD VALUE].
The [ADD VALUE] is a clause specific to ENUM types which allows you to add new enum values before or after an existing. The lack of that feature
in prior versions was the major reason I stayed away from enums.
- And of cause my favorite CREATE EXTENSTION ALTER EXTENSION family which admittedly do get talked about a lot more often and which I'll discuss more in a later
I know it sounds like I'm complaining. That's because I am. Honestly though, I think the first step to caring about something is really taking notice of its
flaws and wanting to change them. The strength of an open source project is the ease with which it allows its developers and users to have a great impact on its direction. This is something I do think PostgreSQL excels much much better than most open source projects. I find a ton of flaws in PostGIS I'd like to change and have and I am greatful that PostGIS, like PostgreSQL is not resistant to change if the community wants it. If you are going to take notice of flaws in other products without admitting to your own or admitting that some things are easier in other products and learning from them, then you are a hypocrite or living in a closet. Now getting back to my complaining. Things I miss in PostgreSQL that I had in others which I'm sure I'm not alone.
- Being able to change a table column type of a table column that is used in a VIEW and have PostgreSQL just correct the type in the view
or allow me the option to change it later. This is something we had in SQL Server which Leo whines about often. Actually Leo's whining is more annoying than
the actual problem itself. The notice is at least very descriptive which is more than I can say for other databases.
- Being able to reorder columns in a table. Again something fairly trivial to do in SQL Server and MySQL but not possible in PostgreSQL.
Monday, September 26. 2011
Continue reading "Bulk Revoke of Permissions for Specific Group/User role"
UPDATE Turns out there is a simpler way of getting rid of roles that have explicit permissions to objects as Tom Lane pointed out in the comments.
DROP OWNED BY some_role;
Will drop the permissions to objects a user has rights to even if they don't own the objects. Of course this needs to be applied with caution since it will drop tables
and other things you don't want necessarily dropped. So it is best to first run a:
REASSIGN OWNED BY some_role TO new_role;
And then run the DROP OWNED BY.
The REASSIGN OWNED BY which is what we did originally is not sufficient since it doesn't drop the permissions or reassign
them as we assumed it would. This is noted in the docs.
And then you will be allowed to
DROP ROLE some_role
One of the things that is still tricky in PostgreSQL is permission management. Even though 9.0 brought us default privileges and the like, these permissions aren't
retroactive so still a pain to deal with if you already have objects defined in your database.
One of the annoyances we come across with is deleting roles. Lets say you have a role and it has explicit permissions to an object.
PostgreSQL won't allow you to delete this role if it owns objects or has explicit permissions to objects. In order to delete it seems you have
to go in and clear out all those permissions. To help with that -- we wrote a quickie script that will generate a script to revoke all permissions on objects
for a specific role. It looks like this:
Monday, June 27. 2011
Continue reading "Using PgAdmin PLPgSQL Debugger"
I'm one of those old-fashioned folks that debugs with print lines and raise notices. They're nice.
They always work, you can put clock time stops in there and don't require any fancy configuration.
At a certain point you do have to pull out a real debugger to see what is going on. This often
happens when your one-liners are no longer good enough and now you have to write 20 liners of plpgsql code.
Such is the case with geocoding and the PostGIS tiger geocoder specifically. Lots of interest has revived
on that with people submitting bug reports and we've got paying clients in need of a fairly easy and speedy drop-in geocoder
that can be molded to handle such things as road way locations, badly mis-spelled real estate data, or just simply
to get rid of their dependency on Google, Yahoo, MapQuest, ESRI and other online or pricey geocoding tools.
So I thought I'd take this opportunity to supplement our old-fashioned debugging with plpgsqldebugger goodness.
In this article, we'll show you how to configure the plpgsql debugger integrated in PgAdmin and run with it.
Thursday, June 16. 2011
Continue reading "State of PostGIS and PostGIS Windows binaries for 9.1 9.0 and 8.4"
PostGIS 2.0.0 has inched a lot closer to completion. This past week, Paul enabled his gserialization work which changed the on disk-format of PostGIS and in return I think we'll have a much better platform to grow on. With this change we now have the 3D index and bounding box bindings in place. Say hello to the &&& operator which is like &&, but is 3D aware and comes with its own companion 3D spatial indexes. This will allow you to do true 3D bounding box searches with any of the new 2.5/3D geometries we have in place for PostGIS 2.0.0. We are still noodling out the semantics of boxes. Read Paul's call for action on the The Box Plan?, if you are interested. PostgreSQL 8.4 is the lowest supported version for PostGIS 2.0.0. It took a bit of squabbling between PSC members to make that decision, but I put my foot down and I think in the end was for the best to allow us to use new features, less platforms to test, and get rid of some unnecessary code.
PostGIS Windows 32-bit Experimental builds fresh off the presses
With all these changes, if you are running an earlier alpha release of PostGIS 2.0.0, you'll need to do a dump restore since the on disk format is now changed.
If you are on windows and want to give some of this all a test drive, you can download one of our PostGIS 2.0.0 Windows experimental builds. We still only have 32-bit builds. We have builds
for PostgreSQL 8.4, PostgreSQL 9.0, and PostgreSQL 9.1 beta 2. The problems we faced in PostgreSQL 9.1 beta 1 were resolved in beta 2 so that most regress tests past except some minor ones involving stupid things like difference in line number marking of errors. Complement your PostgreSQL 9.1 beta 2 meal with a yummy large helping of PostGIS 2.0.0 goodness.
Friday, June 03. 2011
Continue reading "Variadic Functions in PostgreSQL"
PostgreSQL 8.4 introduced the ability to create user-defined variadic functions. These are basically
functions that take as input an undefined number of arguments where the argument that is an undefined number are all of the same type and are the last input arguments. Depesz went over it two years ago in Waiting for 8.4 variadic functions,
so we are a bit late to the party. In a nutshell -- variadic functions are syntactic sugar for functions that would otherwise take arrays. In this article we'll provide some more demonstrations of them to supplement Depesz article.
I was reminded that I had never explored this feature, when recently documenting one of the
new PostGIS 2.0 Raster functions - ST_Reclass which employs this feature.
I think ST_Reclass is a superb function and one of my favorite raster functions thus far that I hope to put to good use soon. Our new PostGIS family member,Bborie Park, is running thru our
PostGIS Raster milestones much faster than I had dreamed. He's already implemented a good chunk of stuff we discussed in Chapter 13 - PostGIS Raster and had stated you probably won't see in PostGIS 2.0. He's
going a bit faster than I can catalog them, so the documentation is already embarrassingly behind the fantastic functionality that is already present in PostGIS 2.0.
Tuesday, May 10. 2011
What is the difference between CURRENT_TIMESTAMP and clock_timestamp()
CURRENT_TIMESTAMP is an ANSI-SQL Standard variable you will find in many relational databases including PostgreSQL, SQL Server, Firebird, IBM DB2 and MySQL to name a few
that records the start of the transaction. The important thing to keep in mind about it is there is only one entry per transaction so if you have a long running transaction,
you won't be seeing it changing as you go along.
clock_timestamp() is a PostgreSQL function that always returns the current clock's timestamp. I don't think I'm alone in using it for doing simple benchmarking and other things
where for example I need to record the timings of each part of a function within the function using pedestrian RAISE NOTICE debug print statements.
There is another cool way I like using it, and that is for a batch of records each with an expensive function call, benchmarking how long it takes to process each record.
One of the things I'm working on is improving the speed of the tiger_geocoder packaged in PostGIS 2.0. The first root of attack seemed to me would be the normalize_address function
which I was noticing was taking anywhere from 10% to 50% of my time in the geocode process. That's a ton of time if you are trying to batch geocode a ton of records. The thing is
the function is very particular to how badly formed the address is so a whole batch could be held up by one bad apple and since the batch doesn't return until all are processed, it makes
the whole thing seem to take a while.
So rather than looping thru each, I thought it would be cool if I could run the batch, but for each record have it tell me how long it took to process relative to the rest so I could get
a sense of what a problem address looks like. So I wrote this query:
the_time - COALESCE(lag(the_time) OVER(ORDER BY the_time), CURRENT_TIMESTAMP) As process_time,
the_time - CURRENT_TIMESTAMP As diff_from_start
FROM (SELECT address_1, city, state, zip,
pprint_addy(normalize_address(coalesce(address_1,'') || ', ' || coalesce(city || ' ','') || state || ' ' || zip)) As pp_addr,
clock_timestamp() As the_time
FROM testgeocode LIMIT 1000) As foo )
WHERE process_time > '00:00:00.016'::interval;
Which returned an output something like this:
address_1 | city | state | zip | pp_addr | the_time | process_time | diff_from_start
48 MAIN ST .. | S.. | MA | 021.. | 48 MAIN .. | 2011-05-10 03:24:43.078-04 | 00:00:00.032 | 00:00:00.032
15 ... | | MA | 018... | 15 GREN... | 2011-05-10 03:24:50.796-04 | 00:00:00.031 | 00:00:07.75
Friday, April 08. 2011
Continue reading "Using RETURNS TABLE vs. OUT parameters"
In a prior article Use of Out and InOut Parameters
we demonstrated how to use OUT parameters and INOUT parameters to return a set of records from a PostgreSQL function.
There is another approach to doing this, and that is to use the ANSI Standard RETURNS TABLE construct.
If you come from a SQL Server or IBM DB2 background, the RETURNS TABLE construct is probably most familiar, but still
how you use it and what is legal in it is a little different than it is in SQL Server or IBM DB2. We'll save the
contrast compare as a topic for another article.
In terms of performance between using OUT vs. RETURNS TABLE, we haven't noticed much of a difference. The main thing that is
nice about RETURNS TABLE is just that it's syntactically more pleasing in the sense that its clearer the structure of what you are returning.
In these next examples, we'll demonstrate similar examples we showed in the aforementioned article except using the
Be warned that the
RETURNS TABLE construct is only available for PostgreSQL 8.4+, while the OUT approach
has existed since PostgreSQL 8.1. With that said, if you need your code to work on 8.3 or lower, you can't use RETURNS TABLE.
When in doubt about a feature and you are creating code that needs to support earlier versions of PostgreSQL
(as we have to in the PostGIS development group),
or you want to get stubborn users off old antiquated versions of PostgreSQL and need a bit of ammunition
(as we have to (on PostGIS development including our own developers - and you know who you are :) ) )
PostgreSQL feature matrix.
It will save you a lot of grief.
Wednesday, March 30. 2011
I am happy to report, that the final proof of the PostGIS in Action E-Book got released today
and the printed version is scheduled for release Aprill 11th, 2011 and should be available on Amazon and other locations around then. The other e-Reader formats will come after that.
You can buy from here or download the two free chapters, if you haven't already.
Each hard-copy purchase comes with a free E-Book version. There is a coupon in the back of the book when you get it to get the E-Book versions.
Yes, I know it's been a really really long time.
On the bright side, we produced twice as much content as we had set out to do and that was with keeping things as concise as we
could get away with, still managing to cover more than we set out to cover, and stripping out as many unnecessary words as we could muster.
So 520 pages and almost 2 years later, this is where we are.
A good chunk of the additional bulk of the book was the appendices which are about 150 pages
total and focus strictly on PostgreSQL and SQL. After many comments from early reviewers, we thought it unfair not to have a good chunk of PostgreSQL
and just general relational database content to familiarize programmers and GIS folks with the RDBMS that PostGIS lives in. Most GIS folk unfortunately
have the hardest time with getting up to speed with SQL and just standard RDBMS management.
Two free chapters and accompanying code for all chapters
The two free chapters we selectively picked because we thought they would be most beneficial to newcomers and people new to relational databases.
So the free chapters are:
- Chapter 1: What is a spatial database? Which provides a fast paced history of PostGIS, PostgreSQL, Spatial Databases and moves into
an even faster journey into converting flat file restaurant locations to spatial point geometries, loading in an ESRI shapefile of roads. Then shows you how to write standard
spatial queries and render the results.
- Appendix C: SQL Primer -- goes through querying information_schemas, the common points of writing SELECT, INSERT, UPDATE, DELETE SQL statements and the finer points of using aggregate functions, Windowing constructs and common table expressions as well
as a brief overview of how PostgreSQL stacks up with other relational databases (SQL Server, Oracle, IBM DB2, MySQL, Firebird) in SQL features.
- All the chapter code and accompanying data. It's a bit hefty at 57 MB.
So even if you don't buy our book, we hope you find the free chapters useful.
You can get a more detailed listing of all the chapters from the PostGIS in Action book site.
We'd like to thank all those who supported us through this long and unpredictable journey. Hopefully we'll have several more, though hopefully
a bit less nerve-racking than this first one.