Sunday, December 11. 2011
Recommended Books: SQL and Relational Theory: How to write accurate SQL code SQL Pocket guide
In our article The Pure Relational database is dead there were a lot of misunderstandings as a result of our poor choice of words. People thought we were bashing the relational model because in their mind that was what pure meant. I got hit with a lot of poetic insults. I still can't think of an alternative word to use for what I meant. Simple doesn't really do it as even relational databases with just standard types were far from simple when you consider the planner and all the other stuff going on under the hood to protect you from the underlying storage structure. What I was trying to say is that in the beginning most relational databases just supported a standard set of types which you could not expand on and most people when they think relational today still think just that. That type of relational database is in my book dead or almost dead.
How did this all start. Well whenever we use something like PostgreSQL to store anything complex -- take your pick: geometry data, tree like structures which we use ltree for, full-text query constructs, and Yes XML we get bashed by some know-it-all who has a very narrow view of what a relational database should be doing and suggesting we use a NoSQL database, a graph engine or a full text engine or normalize our data more. I have also learned XML is a dirty word to many people. I mistakenly thought XML was a complex type people could relate to, but turns out they can relate to it so well that it brings up tragic memories I can only equate to Post Traumatic Stress Disorder suffered by war veterans or (early or wrong) technology adopters. That was not my intent either. XML was just merely an example. I will not say you should use XML in your tables, but I will also not say you should stay clear of it as many people wanted me to say. I will say its use is rare, but it has its place. It has its place just as any other complex type and it has its own special needs for navigation, indexing etc. which many relational databases handle fine enough.
Continue reading "The Relational Model is very much alive"
Sunday, December 04. 2011
What happens when you take a bunch of auto-generated tests and test results which you burn into rasters with SQL? You get a code fingerprint of sorts. Well that was the dream I wanted to explore. Now that PostGIS 2.0 has introduced a new spatial type called raster and lots of functionality including some very cool Map Algebra functionality to go with it, this is now possible without too much fuss. My first attempt at it didn't produce quite what I expected. I have to admit it is absolutely gorgeous and will be hopefully useful once I figure out how to read it, tweak the rendering process to achieve more of a breakout of results, or patent it as a fabric style.
Now for the story of how a computer nerd is born and how rasters can be useful to nerds.
Continue reading "Code Profiling with PostGIS raster"
Saturday, December 03. 2011
A lot of redditers took offense at our article XPathing XML data with PostgreSQL with the general consensus, if you are going to be stuffing XML in a relational database where will you stop? That is not what relational databases are designed for. We had comitted a sacrilegious sin and worsed yet encouraging bad habits by forcing people to think more about different options they have for storing data in a relational database and god forbid demonstrating querying such columns with xml specific functions. What were we thinking? How dare we try to query XML data with SQL? Perhaps we were thinking like this guy or this guy, both equally misguided spatial relational database folk. Of course we stepped one foot further by actually defining a column as xml and dare storing data in it for later consumption rather than just an intermediary step.
If I want to store documents, that are navigateable I should be using a document database like MongoDb, CouchDB etc designed for that kind of stuff. If I've got graphs I should be using a graph database. This got me thinking that the "Pure Relational Database" is dead, and I'm surprised most people don't seem to realize it.
So while "Relational databases" have changed over the last 25 years, most people's notions of them have not kept up with the pace of its change.
First let me define what I mean by Pure. A pure relational database is one with standard meat and potato types like text, dates, numbers well suited for counting money and computing how close the world is to total bankruptcy which you store as fields in a row of a table and that you then define foreign keys / constraints / primary keys on to relate them to other tables. You reconstitute your real world objects by stitching these tables together with joins etc and return sets using where conditions, summarize by using group bys or other mathy like constructs. Don't get me wrong; these are very beautiful things because they allow for easy slicing of dimensions and not having to think about all the dimensions that make up an object all at once. In fact it was so beautiful that some people thought, "wow that's cool, but it would be even cooler if I could store more complex objects in those columns with their own specific needs for querying." and so was born the object relational database as some people refer to them that thought relational but also understood that different types had their own unique needs for querying, storage, indexing etc.
Nowadays most, if not all, relational like databases have standardized on some variant of SQL. In essence though, the pure relational database doesn't allow you to define new types or have exotic types such as arrays, xml, graphs, geometries, rasters, sparse matrices etc. Much less thinking involved and less likely you will shoot yourself in the foot by dumping a bunch of xml in a field and trying to do something with it. When it is used to store more complex things such as spreadsheets and other user documents, these are stored as blobs and just retrieved. Even such use is frowned upon.
Well most relational databases I can think of nowadays have richer types: e.g. PostgreSQL, Oracle and Firebird all support arrays as a column type. Some even allow you to define custom types and functions to support your custom types e.g. PostgreSQL (I could go on forever), Oracle has rich user defined type support too, and SQL Server 2005+ with each version getting better and better for user defined custom types and introducing more exotic types and support infrastructure. Even MySQL/Drizzle (mostly in the form of different storage engines). Even my favorite light-weight SQLite under the hood has some tricks that aren't what I would call relational. E.g. Spatialite/RasterLite has a whole geometry type library built on SQLite with functions you can call from SQL and I'm sure there are lots of middleware tools you don't know about using the SQLite and Firebird engine for more than relational tasks (e.g. HTML5 anyone/ CAD anyone).
Monday, November 21. 2011
In this day and age of clouds and what not, it is still important to keep tabs on your database growth to ensure it doesn't outgrow your disks or so that you can be prepared when it does. A common question that is asked is how big your database is or all your databases combined. Luckily PostgreSQL has had for a long time functions to help you answer this easily. In this little article, I'll demonstrate how to answer these questions and more with some PostgreSQL built-in functions
Continue reading "How big is my database and my other stuff"
Friday, November 11. 2011
Recommended Books: PostgreSQL 9.0 Reference Manual - Volume 1A: The SQL Language
One of my favorite tools and I think that of many folks working with GIS and other kinds of Multimedia is the GDAL/OGR suite. Though I started using it to conquer GIS ETL activities, I found myself using it for problems that are inherently not GIS at all. I talked about the GDAL OGR2OGR component a while ago in GDAL OGR2OGR for Data Loading and this time I'll talk tangentially about its raster capabilities. It is a fantastic tool for converting between various raster formats and applying various raster operations. In PostGIS world the new 2.0 raster functionality puts an SQL wrapper around much of its power. I'm not going to talk about that though except as a fleeting comment to explore later (we've got cool 2 band Map Algebra in PostGIS 2.0 to flaunt its stuff). So what does this have to do with XPathing XML data with PostgreSQL? Well that's what I'm going to talk about what to do with machine generated data that comes at you in XML format.
A lot of machine generated data is hitting us in an XML like form. I talked about GPX data and navigating that in Which by the way GDAL/OGR can load and export easily into/out of a PostGIS enabled database.
GDAL exposes another kind of machine generated data in XML format which turns out to be very useful for all kinds of things. This is Exchangeable image file format (EXIF) data. There are all kinds of random text information embedded in pictures and this varies depending on what camera is taking it. Newer cameras like the ones you have built into your iphone or android embed location based information into them sometimes like where you were standing when you took the picture. Most cameras these days embed the time the picture was taken. This information is important because if you are taking electronic notes while you are snapping your pictures, it provides an easy way to match up your notes with the picture about the object. So what does this EXIF info look like when you point GDAL at it? We'll see.
Continue reading "XPathing XML data with PostgreSQL"
Thursday, November 03. 2011
As mentioned in Lessons Learned Packaging PostGIS extensions, I am working on PostGIS 2.0.0 extension packaging. One of the things I wanted to know was what objects, types, functions etc were installed by my extension. The new packaging system allows for cataloging this relatively easily, but I couldn't find a function or view for this and didn't see one mentioned in the manual, so I created this query which seems to work pretty well as far as I can tell. The basic idea being that any object that an extension depends on that is not an extension is part of the extension package.
The output looks like:
Continue reading "What objects are in a PostgreSQL installed extension"
Tuesday, November 01. 2011
Ever have the need to create a holding table say spreadsheet data with say 100 columns. You need to create a table to hold this stuff. Or perhaps you were feeling in a sadist mood and wanted to abuse your PostgreSQL database to see how many columns you can create in a table of a specific data type. Here is a quick script to do it:
Both variants will return output that looks like this:
CREATE TABLE data_import(field1 varchar(255),field2 varchar(255),field3 varchar(255),field4 varchar(255) ,field5 varchar(255),field6 varchar(255),field7 varchar(255) ,field8 varchar(255),field9 varchar(255),field10 varchar(255));
Now if you want it to also execute because you are running it as part of an sql script, you could wrap it in an anonymous function.
Friday, October 28. 2011
FOSS 4G 2011 Videos. My resident PostGIS developer strk says he can't see them because blip.tv is using some sort of proprietary video swf format. I can't really tell what he is talking about. Does anyone know if fosslic videos are available in other formats like ogg or a Gnash swf viewer compatible format?
We mentioned in prior article Our FOSS4G 2011 the new Stuff and provided the slides in that article. Now we have the video to go with it.Here is a partial list of PostGIS videos:
Continue reading "Many PostGIS FOSS4G 2011 videos have landed"
Thursday, October 27. 2011
One of the main features I love about PostgreSQL is its array support. This is a feature you won't find in most relational databases, and even databases that support some variant of it, don't allow you to use it as easily. It is one of the features that makes building aggregate functions wicked easy in PostgreSQL with no messy compiling required. Aside from building aggregate functions, it has some other common day uses. In this article, I'll cover two common ways we use them which I will refer to as the ANY and Contains tricks.
I like to think of this approach as YeSQL programming style: how SQL can be augmented by more complex data types and index retrieval mechanisms. Arrays and many other data types (spatial types, keyvalue (hstore), ltree etc) are far from relational structures, yet we can query them easily with SQL and can even relate them.
Continue reading "PostgreSQL Array: The ANY and Contains trick"
Tuesday, October 18. 2011
One of the great lessons learned in building PostGIS extensions is my rediscovery of SED. SED turned out to be mighty useful in this regard and I'll explain a bit in this article. Unfortunately there is still a lot I need to learn about it to take full advantage of it and most of my use can be summed up as monkey see, monkey scratch head, monkey do. In addition I came across what I shall refer to as Pain points with using the PostgreSQL Extension model. Part of which has a lot to do with the non-granular management of changes in PostGIS, the day to day major flux of changes happening in PostGIS 2.0 space, and my attempt at trying to creat upgrade freeze points amidst these changes. When PostGIS 2.0 finally arrives, the freeze points will be better defined and not change from day to day. So some of these issues may not be that big of a deal.
Continue reading "Lessons learned Packaging PostGIS Extensions: Part 2"
Friday, October 14. 2011
It is with sadness that I learned of the passing of Dennis Ritchie - inventor of C and who made much of Unix, other operating systems, and many software (including Postgres) possible. More details at Remembering Dennis Ritchie: Software Pioneer and Dennis Ritchie, in Memoriam.
Dennis Ritchie co-authored the book, The C Programming Language, a classic, which many of my peers grew up with. It was one of the textbooks at MIT for Civil Engineering 1.00 when we were attending when the course was essentially an introduction to programming with C.
The harsh irony is that when Steve Jobs passed away I was probably the only one around me who felt no remorse and hoped the curve fanaticism Jobs fueled would die with him. When Dennis Ritchie passed away I was probably one of the few around me who knew who he was and appreciated the great contributions he made to the computer industry.
Wednesday, October 12. 2011
This is about improvements to GIST indexes that I hope to see in PostgreSQL 9.2. One is a patch for possible inclusion in PostgreSQL 9.2 called SP-GiST, Space-Partitioned GiST created by Teodor Sigaev and Oleg Bartunov whose basic technique is described in SP-GiST: An Extensible Database Index for Supporting Space Partitioning Trees. For those who don't know Teodor and Oleg, they are the great fellows that brought us many other GiST and GIN goodnesses that many specialty PostgreSQL extensions enjoy -- e.g. PostGIS, trigrams, ltree, pgsphere, hstore, full-text search to name a few.
Another is a recent one just committed by Alexander Korotkov which I just recently found out about on New node splitting algorithm for GIST and admit I don't know enough about to judge. I have to admit to being very clueless when it comes to the innards of index implementations so don't ask me any technical details. It's one of those short-comings among the trillion others I have that I have learned to accept will probably never change.
What the SP-GIST patch will provide in terms of performance and speed was outlined in PGCon 2011: SP-GiST - a new indexing infrastructure for PostgreSQL Space-Partitioning trees in PostgreSQL.
What it provides specifically for PostGIS is summarized in Paul's call for action noted below. As a passionate user of PostGIS ,ltree, tsearch, and hstore, I'm pretty excited about these patches and other GIST and general index enhancements and there potential use in GIST dependent extensions. I'm hoping to see these spring to life in PostgreSQL 9.2 and think it will help to further push the envelope of where PostgreSQL can go as a defacto platform for cutting-edge technology and scientific research. I think one of PostgreSQL's greatest strength is its extensible index API.
Paul's PostGIS newsgroup note about seeking funding for faster GIST indexes , work done so far on SP-GIST and call for further action is rebroadcast in it's entirety here.
Thanks to the sponsorship of Michigan Technological University, we now have 50% of the work complete. There is a working patch at the commitfest https://commitfest.postgresql.org/action/patch_view?id=631 which provides quad-tree and kd-tree indexes. However, there is a problem: unless the patch is reviewed and goes through more QA/QC, it'll never get into PostgreSQL proper. In case you think I am kidding: we had a patch for KNN searching ready for the 9.0 release, but it wasn't reviewed in time, so we had to wait all the way through the 9.1 cycle to get it. I am looking for sponsors in the $5K to $10K range to complete this work. If you use PostgreSQL in your business, this is a chance to add a basic capability that may help you in all kinds of ways you don't expect. We're talking about faster geospatial indexes here, but this facility will also radically speed any partitioned space. (For example, the suffix-tree, which can search through URLs incredibly fast. Another example, you can use a suffix tree to very efficiently index geohash strings. Interesting.) If you think there's a possibility, please contact me and I will send you a prospectus you can take to your manager. Let's make this happen folks! Paul
Continue reading "Improving speed of GIST indexes in PostgreSQL 9.2"
Thursday, October 06. 2011
In prior articles we talked about the new PostgreSQL 9.1 extension model and upcoming PostGIS 2.0 extensions which we have experimental builds of so far. In this article and other's to follow, we shall provide a documentary of our venture into this new extensions world. We'll discuss some of the obstacles we had with building extensions, lessons learned, and foolishness exposed, with the hope that others can learn from our experience.
First off, the upcoming PostGIS 2.0 extensions will be packaged as at least two extensions -- postgis which will contain both PostGIS geometry/geography types, functions, meta views and tables as well as raster type and associated functions and tables. Topology support, while a part of upcoming PostGIS 2.0, will be packaged as a separate extension called postgis_topology. The main reason for breaking topology out as a separate extension is that it is always stored in a schema called topology and is not relocatable to another schema. The way the current extension model works, all the parts of your extension should live in the same schema. Later we plan to package tiger geocoder as an extension, but this one probably makes more sense to live on http://pgxn.org/ since it is only of interest to United States users, , is purely plpgsql with dependency on PostGIS, and we had beefed it up as part of a consulting contract for a company running PostGIS 1.5. It's the only piece documented in PostGIS 2.0 that works on 1.5 as well (aside from the tiger toplogy loader which has dependency on toplogy), although it has always lived as an extra in the PostGIS code base.
We'll probably package postgis_legacy_functions as an extension too for those people who badly need those 500 alias functions I chucked.
We mentioned in our prior article that we ran into some issues with how our extension worked -- e.g. topology referencing the postgis extension. Most of these turned out just to be ignorance on my part as to how the different pieces fit together and I'll elaborate on these.
Much of what will be described here is also documented in Packaging Related Objects into an Extension.
In the future I'm hoping we'll also see plr and pgrouting packaged as extensions which are common favorites of PostGIS users.
Continue reading "Lessons learned Packaging PostGIS Extensions: Part 1"
Wednesday, October 05. 2011
In most release notices, it's the big shiny sexy features that get all the glamor, but in reality on day to day use
it's the small usability enhancements that make the most difference. I'm reminded about this now that I'm working
on upgrade scripts and extensions for PostGIS. There are a couple of new features that make application upgrades easier that I
regret not having in older versions of PostgreSQL we support and additional ones I had in other databases that I find lacking in PostgreSQL. PostgreSQL 8.2 for example brought us
In 9.1 we got two new DDL commands not much talked about that I am very excited about.
I know it sounds like I'm complaining. That's because I am. Honestly though, I think the first step to caring about something is really taking notice of its flaws and wanting to change them. The strength of an open source project is the ease with which it allows its developers and users to have a great impact on its direction. This is something I do think PostgreSQL excels much much better than most open source projects. I find a ton of flaws in PostGIS I'd like to change and have and I am greatful that PostGIS, like PostgreSQL is not resistant to change if the community wants it. If you are going to take notice of flaws in other products without admitting to your own or admitting that some things are easier in other products and learning from them, then you are a hypocrite or living in a closet. Now getting back to my complaining. Things I miss in PostgreSQL that I had in others which I'm sure I'm not alone.
Thursday, September 29. 2011
Lots of people have been asking the never ending question of when PostGIS is going to get on the band wagon and support KNN GIST like other GIST based types trigrams, full text search etc. Well it's happened in PostGIS 2.0 and now committed. More of the gory details at Indexed Nearest Neighbour Search in PostGIS. In short this will make point / point distance searches and rankings way way faster and help also with other distance searches by providing approximations to start with.
We are still preparing the PostgreSQL 9.1 2.0 32-bit windows builds that will have this functionality and should have that ready in the next day or so.
To summarize what you can expect. We spent a lot of time discussing and were torn between a box distance operator <#> and a centroid box distance operator <->, so we ended up having both. The reason being is that for some kinds of geometries e.g. streets that aren't diagonal a box distance operator seems to be a much better approximation of distance than a centroid box distance operator. For points of course the two concepts are the same and not an approximation so point / point distance you'd be better off using the new KNN sorting than ST_Distance + ST_DWithin as we have suggested in past. I'll be doing some benchmarks in the coming weeks comparing the old way and speed differences you can expect and perhaps throwing together box and centroid cocktails that combine the two weapons into thought provoking WTFs (or as Dave Fetter would say "That's very Rube Goldberg of you").
I suspect I'll probably be sticking with <#> because I like the symbol better and I was one of the ones fighting for it :).
Syndicate This Blog
Show tagged entries