Sunday, August 12. 2012
There has been a lot of talk lately about the schemaless models touted by NoSQL groups and how PostgreSQL fits into this new world order.
Is PostgreSQL object-relational? Is it multi-model? We tend to think of PostgreSQL as type liberal, and its liberalness gets more liberal with each new release. PostgreSQL is fundamentally relational, but has little bias about what data types define each column of related tables. One of PostgreSQL's great strengths is the ease with which different types can coexist in the same table, along with the flexible index plumbing and plan optimizer that allow each type, regardless of how wild, to take full advantage of various index strategies and custom index bindings. Our 3 favorite custom non-built-in types we use in our workflow are
PostGIS (of course), LTree (hierarchical type), and HStore (key-value type). In some cases, we may use all 3 in the same database and sometimes the same table: PostGIS for spatial location, LTree for logical location, and HStore to keep track of random facts about an object that are easier to access this way than in a separate related table and are too random to warrant a separate column for each. Sometimes we are guilty of using xml as well, when we haven't figured out what schema model best fits a piece of data and hstore is too flat a type to work. The advent of JSON in PostgreSQL 9.2 provides a nested schemaless model similar to what the XML type offers, but more JavaScript friendly. I personally see JSON as more of a useful transport type than one I'd build my business around, or as a type to use when you haven't figured out what structure, if any, is most suitable for your data. When you have no clue what structure a piece of data should be stored in, let the data tell you: only by first storing it in a somewhat liberal fashion will you discover how best to retrofit it into a more structured, self-descriptive form. Schemas are great because they are self-describing, but they are not great when your data does not want to sit in a self-described bucket. You may find in the end that some data is just wild and refuses to stay between the lines; then by all means stuff it in xml or json, or create a whole new type that it feels comfortable in.
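To make the "all 3 in the same table" idea concrete, here is a minimal sketch. The table, column names, and sample values are hypothetical, and it assumes the postgis, ltree, and hstore extensions are installable on a PostgreSQL 9.1+ server.

```sql
-- Assumes the three extensions are available on the server (PostgreSQL 9.1+).
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS ltree;
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical table mixing a spatial, a hierarchical and a key-value column.
CREATE TABLE assets (
    asset_id serial PRIMARY KEY,
    name     text NOT NULL,
    geog     geography(Point, 4326),  -- spatial location (PostGIS)
    org_path ltree,                   -- logical location (ltree)
    facts    hstore                   -- loose key/value facts (hstore)
);

-- Each type plugs into the same GiST index machinery.
CREATE INDEX idx_assets_geog     ON assets USING gist (geog);
CREATE INDEX idx_assets_org_path ON assets USING gist (org_path);
CREATE INDEX idx_assets_facts    ON assets USING gist (facts);

INSERT INTO assets (name, geog, org_path, facts)
VALUES ('Pump 7',
        ST_GeogFromText('POINT(-71.057811 42.358274)'),
        'city.public_works.water'::ltree,
        'color => blue, installed => 1998'::hstore);

-- Within 1000 meters of a point, somewhere under city.public_works,
-- and having an "installed" fact recorded.
SELECT name, facts -> 'color' AS color
FROM assets
WHERE ST_DWithin(geog, ST_GeogFromText('POINT(-71.06 42.36)'), 1000)
  AND org_path <@ 'city.public_works'::ltree
  AND facts ? 'installed';
```

The point of the sketch is only that one WHERE clause can combine a spatial test, a hierarchy test, and a key-existence test, each on its own column type.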
Saturday, July 21. 2012
Today's modern web application workflow in its simplest form looks something like this:
- Get the dataset as a JSON object, usually using yet another JSON query object to pass the request, via a JavaScript framework like jQuery/ExtJS/OpenLayers/Leaflet, etc.
- Make changes to the JSON dataset object and send it back to the web server.
- On the web server, unravel the JSON object and save it to the respective database tables. This part is really yucky, as it often involves the web application's server-side language doing the unraveling and then yet another step of setting up stored procedures or other update logic to consume it.
We hate the way people build tiers for the same reason Cartman hates lines at the amusement park. Sure, tiers are great for certain things like building connected microcosms, but most of the time they are overkill and, if applied too early, make your application needlessly complicated. In the end all we care about is data: serving data, analyzing data, getting good data. Everything else is just peacock feathers.
The introduction of JSON type support in PostgreSQL 9.2, along with the languages PL/V8 (PL/JavaScript) and its Pythoness-like twin PL/Coffee, provides several options for bringing your data and application closer together, since they have native support for JSON.
In this first part we'll demonstrate one: an upsert stored procedure that takes a single JSON object instead of separate args, updates existing data, and adds missing records.
In later articles we'll show you the front-end app and also add a sprinkle of PostGIS to demonstrate working with custom types.
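As a rough idea of what such an upsert might look like, here is a minimal PL/V8 sketch. It is not the function from the full article: the inventory table, the function name, and the JSON layout (an array of records) are all assumptions made for illustration, and it requires the plv8 language to be installed.

```sql
-- Hypothetical target table for the sketch.
CREATE TABLE inventory (
    prod_code text PRIMARY KEY,
    descr     text,
    qty       integer
);

CREATE OR REPLACE FUNCTION upsert_inventory(param_inv json)
RETURNS text AS $$
    // Depending on the plv8 version, a json argument may arrive as a string
    // or as an already-parsed object, so handle both cases.
    var inv = (typeof param_inv === 'string') ? JSON.parse(param_inv) : param_inv;
    var num_updated = 0, num_inserted = 0;

    inv.forEach(function (r) {
        // For non-SELECT statements plv8.execute returns the affected row count.
        var n = plv8.execute(
            'UPDATE inventory SET descr = $1, qty = $2 WHERE prod_code = $3',
            [r.descr, r.qty, r.prod_code]);
        if (n > 0) {
            num_updated += n;
        } else {
            plv8.execute(
                'INSERT INTO inventory(prod_code, descr, qty) VALUES ($1, $2, $3)',
                [r.prod_code, r.descr, r.qty]);
            num_inserted += 1;
        }
    });
    return num_updated + ' updated, ' + num_inserted + ' inserted';
$$ LANGUAGE plv8 VOLATILE;

-- Usage: one JSON argument carries the whole batch.
SELECT upsert_inventory(
    '[{"prod_code":"A1","descr":"Widget","qty":10},
      {"prod_code":"B2","descr":"Gadget","qty":3}]'::json);
```

Note that an update-then-insert loop like this is not safe against concurrent writers inserting the same key; it is only meant to show the single-JSON-argument shape.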
Continue reading "PLV8JS and PLCoffee Part 1: Upserting"
Monday, July 16. 2012
I have updated instructions on my gist page, Build v8 and plv8, for building with PostgreSQL 9.4.
As mentioned in our previous article, Building on MingW deploying on VC, we often build on MingW and deploy on Windows servers running EDB-distributed VC PostgreSQL builds when we want extensions that don't come packaged. Among the new ones we are really excited about are PL/V8 and PL/Coffee. Could we do it, and would it actually work on a VC build? YES WE CAN, and yes it does. I HAZ Coffee and a V8.
Here are some instructions we hope others will find useful. Even if you aren't on Windows, you might still find them useful since MingW behaves much like other Unix environments.
If you are on Windows and just want to start using PLV8 and PLCoffee, we have binary builds for both PostgreSQL 9.2 Windows 32-bit (pg92plv8jsbin_w32.zip) and PostgreSQL 9.2 Windows 64-bit (pg92plv8jsbin_w64.zip), which you should be able to just extract into your PostgreSQL 9.2 beta Windows install. We quickly tested them against standard EDB VC++ PostgreSQL 9.2beta2 installs and they seem to work fine. We haven't bothered building for lower PostgreSQL versions, but if there is some interest, we'd be happy to try.
Continue reading "Building PLV8JS and PLCoffee for Windows using MingW64 w64-w32"
Sunday, July 15. 2012
As we discussed in file_textarray_fdw Foreign Data Wrapper, Andrew Dunstan's text array foreign data wrapper works great for bringing in a delimited file without having to worry about the column names until the data is in.
We had demonstrated one way to tag the field names, to avoid having to keep track of index locations, by using hstore and the header column in conjunction.
The problem is that this approach doesn't work for jagged arrays, where not all rows have the same number of columns. I've jury-rigged a small example to demonstrate the issue. Luckily, with the power of PostgreSQL arrays you can usually get around this issue and still have nice names for your columns. We'll demonstrate that too.
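One possible workaround, sketched here with inline sample data instead of a foreign table, is to slice the header array down to the length of each row before pairing the two up with hstore. This is an assumption about how one might do it, not necessarily the approach taken in the full article; the sample rows are made up.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical stand-in for the foreign table: line 1 is the header,
-- and the last data row is "jagged" (it is missing the state column).
WITH raw(line_no, x) AS (
    VALUES (1, ARRAY['name', 'city', 'state']),
           (2, ARRAY['Alice', 'Boston', 'MA']),
           (3, ARRAY['Bob', 'Chicago'])
),
hdr AS (
    SELECT x AS h FROM raw WHERE line_no = 1
)
-- hstore(keys, values) needs both arrays to have the same bounds,
-- so trim the header to the number of values actually present.
SELECT r.line_no,
       hstore(hdr.h[1:array_length(r.x, 1)], r.x) AS rec
FROM raw AS r
CROSS JOIN hdr
WHERE r.line_no > 1;
```

This only covers rows that are shorter than the header. A row with more values than the header has names would still need separate handling.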
Continue reading "Foreign Data Wrap (FDW) Text Array, hstore, and Jagged Arrays"
Tuesday, July 10. 2012
Our new book PostgreSQL: Up and Running is officially out. It's available in hard copy and e-book versions directly from O'Reilly and Safari Books Online, and from Amazon in the Kindle store. It should be available in hard copy from other distributors within the next week or so.
Sadly we won't be attending OSCON this year, but there are several PostgreSQL talks going on. If you are speaking at a talk or other PostgreSQL-related get-together and would like to give out some free coupons for our book, or get a free e-book copy for yourself to see if it's worth the effort of mentioning, please send us an e-mail: lr at pcorp.us .
Our main focus in writing the book was demonstrating features that make PostgreSQL uniquely poised for newer kinds of workflows, with particular focus on PostgreSQL 9.1 and 9.2.
Part of the reason for this focus is our roots, and part is that we wanted to write a short book to get a feel for the audience. We started to use PostgreSQL in 2001 because of PostGIS, but were still predominantly SQL Server programmers. At the time SQL Server did not have a spatial component that integrated seamlessly with SQL.
As die-hard SQLers, PostGIS really turned us on. As years went by, we began to use PostgreSQL not just for our spatial apps, but also for predominantly non-spatial ones that had heavy reporting needs and where we had a choice of platform.
So we came for PostGIS but stayed because of all the other neat features PostgreSQL had that we found lacking in SQL Server. Three off the bat are arrays, regular expressions, and choice of procedural languages. Most other books on the market just treat PostgreSQL like it's any other relational database.
In a sense that's good because it demonstrates that using PostgreSQL does not require a steep learning curve if you've used another relational database. We didn't spend as much time on these common features as we'd have liked in the book, because it's a short book and we figure most users familiar with relational databases are already quite knowledgeable about common features from other experience. It's true that a lot of people coming to PostgreSQL are looking for cost savings, ACID compliance, cross-platform support, and decent speed, but as PostgreSQL improves in speed, ease of use, and unique features, we think we'll be seeing more people migrating just because it's simply better than any other database for the new kinds of workflows we are seeing today, e.g. BigData analysis, integration with other data sources, and leveraging domain-specific languages in a more seamless way with data.
So what's that creature on the cover? It's an elephant shrew (sengi), which is neither an elephant nor a shrew, but is closest in ancestry to the elephant, sea cow, and aardvark.
It is only found in Africa (mostly East Africa around Kenya) and in zoos. It gets its name from its unusually long nose, which it uses for sniffing out insect prey and keeping tabs on its mate. It has some other unusual habits: it's a trailblazer, building trails it uses to scout insect prey, along with escape routes that it memorizes to flee from predators. It's monogamous, but prefers to keep separate quarters from its mate. Males will chase off other males and females will chase off other females. It's fast and can usually outrun its predators.
Friday, June 08. 2012
I recently needed to figure out which ranges of my keys were contiguously numbered. The related exercise is finding gaps in the data.
You might need this to determine what data did not get copied or which records got deleted. There are lots of ways of accomplishing this, but this is the first that came to mind. This approach uses the lead window function and common table expressions, so it requires PostgreSQL 8.4+.
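A minimal sketch of both exercises follows, using made-up key values in place of a real table. The gap query uses lead as described above; the contiguous-range query uses a different but common trick (subtracting row_number from the key), which is a substitution on our part and may not match the query in the full article.

```sql
-- Gaps: compare each key with the next one using lead().
WITH keys(id) AS (
    VALUES (1), (2), (3), (7), (8), (10)   -- hypothetical sample keys
),
nxt AS (
    SELECT id, lead(id) OVER (ORDER BY id) AS next_id
    FROM keys
)
SELECT id + 1 AS gap_start, next_id - 1 AS gap_end
FROM nxt
WHERE next_id - id > 1
ORDER BY gap_start;
-- returns (4, 6) and (9, 9)

-- Contiguous ranges: keys in the same run share the same id - row_number().
WITH keys(id) AS (
    VALUES (1), (2), (3), (7), (8), (10)
),
numbered AS (
    SELECT id, id - row_number() OVER (ORDER BY id) AS grp
    FROM keys
)
SELECT min(id) AS range_start, max(id) AS range_end
FROM numbered
GROUP BY grp
ORDER BY range_start;
-- returns (1, 3), (7, 8) and (10, 10)
```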
Continue reading "Finding contiguous primary keys"
Thursday, June 07. 2012
There is another new feature in 9.2 that doesn't get much press, probably because it's hard to explain. It is a pretty useful feature if you are working with the new json type or the existing hstore type. In prior versions, if you used a subquery and converted the rows to hstore or json, the column names were not preserved. Andrew mentioned a back-port path for this issue in Upgradeable JSON. We described a workaround for this issue in Mail merging using hstore. The workaround for including PostGIS geometry in json record output, as described in Native JSON type support, wouldn't work as nicely without this enhancement. Here is an example to demonstrate.
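A small sketch of the behavior, assuming PostgreSQL 9.2 or later with the hstore extension installed (the subquery and its column aliases are made up):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- The subquery aliases (id, title) become the hstore keys and json field names.
SELECT hstore(t)      AS as_hstore,
       row_to_json(t) AS as_json
FROM (SELECT 1 AS id, 'a'::text AS title) AS t;
-- On 9.2+ the keys come out as id and title.
-- On earlier versions, hstore(t) over a subquery named them f1 and f2 instead.
```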
Continue reading "PostgreSQL 9.2: Preserving column names of subqueries"
Monday, May 21. 2012
One welcome new feature in PostgreSQL 9.2 is the native json support and the companion row_to_json and array_to_json functions. PostGIS also has a json function for outputting geographies and geometries in GeoJSON format, which is almost a standard in web mapping.
Here is an example of how you'd use the new feature. First, create our test table:
CREATE TABLE test(gid serial PRIMARY KEY, title text, geog geography(Point, 4326));
INSERT INTO test(title, geog)
VALUES('a'
, ST_GeogFromText('POINT(-71.057811 42.358274)'));
INSERT INTO test(title, geog)
VALUES('b'
, ST_GeogFromText('POINT(42.358274 -71.057811 )'));
Now with a command like this we can output all data as a single json object.
SELECT array_to_json(array_agg(t))
FROM test As t;
But there is a tiny little problem. Our geog outputs don't look anything like GeoJSON format. Our output looks like this:
[{"gid":1,"title":"a","geog":"0101000020E61000005796E82CB3C351C0E98024ECDB2D4540"}
,{"gid":2,"title":"b","geog":"0101000020E6100000E98024ECDB2D45405796E82CB3C351C0"}]
To follow the GeoJSON standard, our geography object should output like this:
"geog":{"type":"Point","coordinates":[-71.057811000000001,42.358274000000002]}
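One way to get there, sketched against the test table above, is to convert the geography with the PostGIS ST_AsGeoJSON function inside a subquery and cast the result to json so it is nested as an object rather than quoted as a string. Treat this as a sketch of the idea; it relies on the 9.2 behavior of preserving subquery column names.

```sql
-- ST_AsGeoJSON returns text, so cast it to json to have it embedded
-- as a nested object instead of an escaped string.
SELECT array_to_json(array_agg(t))
FROM (
    SELECT gid, title, ST_AsGeoJSON(geog)::json AS geog
    FROM test
) AS t;
```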
Continue reading "PostgreSQL 9.2 Native JSON type support"
One of the things I'm excited about in PostgreSQL 9.2 is the new pg_dump section option (pre-data, data, and post-data) and the exclude-table-data option. Andrew Dunstan blogged about this briefly in pg_dump exclude table data. What is also nice is that pgAdmin III 1.16 supports the section option via the graphical interface. I was a bit disappointed not to find the exclude-table-data option in the pgAdmin III interface, though.
The other nice thing about this feature is that you can use the PostgreSQL 9.2 pg_dump even against a 9.1 or lower database and achieve the same benefit.
The 9.2 pg_restore has similar functionality for restoring specific sections of a backup too.
So what is all this section stuff for? Well, it comes in particularly handy for upgrade scripts. I'll first explain what the sections mean and then give a concrete example of why you want this.
- pre-data - the table structures, functions, etc., without the constraints (such as check and primary key) and indexes.
- data - it's, uhm, the data.
- post-data - all the constraints, primary keys, indexes, etc.
Continue reading "PostgreSQL 9.2 pg_dump enhancements"
Monday, January 16. 2012
If I could name the number one feature I love most about PostgreSQL, it's the table inheritance feature, which we described in How to Inherit and Uninherit. A lot of people use it for table partitioning using CONSTRAINT EXCLUSION. Aside from that, in combination with the PostgreSQL schema search_path (customizable by user and/or database), it makes for a very flexible abstraction tool. For example, for many of our web apps that service many departments, where each department/client wants to keep a high level of autonomy, we have a schema set aside for each that inherits from a master template schema. Each department site uses a different set of accounts, with the primary schema being that of the department/client, so that they are hitting their own tables.
Inheritance allows us to keep data separate, do roll-up reports if we need to, and use the same application front-end, yet add new columns in just one place (the master template schema). It is more flexible than other approaches because, for example, we may have a city organization whose departments need to share tables, such as a system-loaded list of funding sources shared across the agency. We can set aside these shared tables in a separate schema visible to all, or let some departments have their own copy they can change if they don't want to use the shared one.
Every once in a while, we find ourselves needing to query the whole hierarchy and needing to know which table the results of the query are coming from. To help solve that issue, we use the system column tableoid, which all user tables have. The tableoid is the object id of a table. PostgreSQL has many system columns that you have to select explicitly and that can't be accessed with a SELECT *, tableoid being one of them. These are: tableoid, cmax, cmin, xmin, xmax, and ctid, all described in System Columns. The PostgreSQL docs on inheritance have examples of using it, but we thought it worthwhile to repeat the exercise since it's not common knowledge and is a unique enough feature of PostgreSQL that people coming from other relational databases may miss the treat. I've often demonstrated it to non-PostgreSQL users who use, for example, SQL Server or MySQL, and they literally fall out of their chairs when I show them the feature and its endless possibilities.
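Here is a minimal sketch of the setup described above. The schema and table names (master_tpl, dept_a, dept_b, projects) are hypothetical stand-ins for a master template schema and two department schemas.

```sql
CREATE SCHEMA master_tpl;
CREATE SCHEMA dept_a;
CREATE SCHEMA dept_b;

-- The template table defines the structure once.
CREATE TABLE master_tpl.projects (
    project_id serial PRIMARY KEY,
    title      text
);

-- Each department gets its own table that inherits the structure.
CREATE TABLE dept_a.projects () INHERITS (master_tpl.projects);
CREATE TABLE dept_b.projects () INHERITS (master_tpl.projects);

INSERT INTO dept_a.projects (title) VALUES ('Road repair');
INSERT INTO dept_b.projects (title) VALUES ('Park cleanup');

-- Roll-up query over the whole hierarchy; tableoid must be selected
-- explicitly, and casting it to regclass shows the table name.
SELECT p.tableoid::regclass AS source_table, p.project_id, p.title
FROM master_tpl.projects AS p;

-- A column added to the template shows up in every child table.
ALTER TABLE master_tpl.projects ADD COLUMN budget numeric;
```

A department account whose search_path starts with its own schema can then simply refer to projects and hit its own table.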
Continue reading "Table Inheritance and the tableoid"
Wednesday, December 28. 2011
For those who aren't familiar with hstore, it's a key/value storage type that is packaged as an extension or contrib module in PostgreSQL 8.2+. In PostgreSQL 9.0 it got a little extra loving in several ways, one of which was the introduction of the hstore(record) casting function that converts a record to an hstore. In this article, I'll demonstrate how you can use this new casting function to do very sleek mail merges right in the database. The only caveat is that it seems to name the keys correctly only if it is fed a real table or view. Derived queries such as aggregates get keys named f1, f2, etc.
If you are on PostgreSQL 9.1 or above, installing hstore is just a CREATE EXTENSION hstore; SQL command away. If you are on a lower version of PostgreSQL, you can usually find hstore.sql in share/contrib.
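To give a flavor of the idea, here is one possible sketch of a mail merge built on hstore(record). The clients table, the {key} placeholder syntax, and the merge_template helper are all made up for illustration and are not necessarily what the full article uses.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical table to merge from (a real table, so keys are named properly).
CREATE TABLE clients (
    client_id  serial PRIMARY KEY,
    first_name text,
    last_name  text,
    balance    numeric(10,2)
);
INSERT INTO clients (first_name, last_name, balance)
VALUES ('Jane', 'Doe', 120.50);

-- Hypothetical helper: replace each {key} in the template with its hstore value.
CREATE OR REPLACE FUNCTION merge_template(tpl text, vals hstore)
RETURNS text AS $$
DECLARE
    k      text;
    result text := tpl;
BEGIN
    FOR k IN SELECT skeys(vals) LOOP
        result := replace(result, '{' || k || '}', coalesce(vals -> k, ''));
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- hstore(c) turns the whole row into key/value pairs keyed by column name.
SELECT merge_template(
         'Dear {first_name} {last_name}, your balance is {balance}.',
         hstore(c)) AS letter
FROM clients AS c;
```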
Continue reading "Mail Merging using Hstore"
Friday, November 11. 2011
One of my favorite tools, and I think that of many folks working with GIS and other kinds of multimedia, is the GDAL/OGR suite.
Though I started using it to conquer GIS ETL activities, I found myself using it for problems that are inherently not GIS at all. I talked about the GDAL OGR2OGR component a while ago in GDAL OGR2OGR for Data Loading, and this time I'll talk tangentially about its raster capabilities. It is a fantastic tool for converting between various raster formats and applying various raster operations.
In the PostGIS world, the new 2.0 raster functionality puts an SQL wrapper around much of its power. I'm not going to talk about that, though, except as a fleeting comment to explore later (we've got cool 2-band Map Algebra in PostGIS 2.0 to flaunt its stuff).
So what does this have to do with XPathing XML data with PostgreSQL? Well, what I'm going to talk about is what to do with machine-generated data that comes at you in XML format.
A lot of machine-generated data is hitting us in an XML-like form. I talked about GPX data and navigating it in an earlier article; GPX, by the way, is something GDAL/OGR can load and export easily into and out of a PostGIS-enabled database.
GDAL exposes another kind of machine-generated data in XML format, which turns out to be very useful for all kinds of things: Exchangeable image file format (EXIF) data. All kinds of random text information is embedded in pictures, and it varies depending on which camera took them. Newer cameras, like the ones built into your iPhone or Android, sometimes embed location-based information, such as where you were standing when you took the picture. Most cameras these days embed the time the picture was taken.
This information is important because, if you are taking electronic notes while you are snapping your pictures, it provides an easy way to match up your notes with the pictures of the object. So what does this EXIF info look like when you point GDAL at it? We'll see.
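As a taste of the XPath side, here is a minimal sketch using the built-in xpath function. The XML layout and element names below are invented for illustration and are not GDAL's actual output format; it assumes a PostgreSQL build with libxml support.

```sql
-- Hypothetical EXIF-like metadata stored as xml.
WITH pics(pic_name, exif) AS (
    VALUES ('img_001.jpg',
            '<Metadata>
               <MDI key="EXIF_DateTime">2011:11:05 10:15:00</MDI>
               <MDI key="EXIF_Make">ExampleCam</MDI>
             </Metadata>'::xml)
)
-- xpath() returns an xml[] of matches; take the first one and cast to text.
SELECT pic_name,
       (xpath('/Metadata/MDI[@key="EXIF_DateTime"]/text()', exif))[1]::text AS taken_at,
       (xpath('/Metadata/MDI[@key="EXIF_Make"]/text()', exif))[1]::text     AS camera_make
FROM pics;
```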
Continue reading "XPathing XML data with PostgreSQL"
Tuesday, November 01. 2011
Ever have the need to create a holding table for, say, spreadsheet data with 100 columns? Or perhaps you were in a sadistic mood and wanted to abuse your PostgreSQL database to see how many columns of a specific data type you can create in a table.
Here is a quick script to do it, in two variants. The first uses array_agg (PostgreSQL 8.4+) and the second uses string_agg (PostgreSQL 9.0+):
SELECT 'CREATE TABLE data_import('
|| array_to_string(array_agg('field' || i::text || ' varchar(255)'), ',') || ');'
FROM generate_series(1,10) As i;
SELECT 'CREATE TABLE data_import('
|| string_agg('field' || i::text || ' varchar(255)', ',') || ');'
FROM generate_series(1,10) As i;
Both variants will return output that looks like this:
CREATE TABLE data_import(field1 varchar(255),field2 varchar(255),field3 varchar(255),field4 varchar(255)
,field5 varchar(255),field6 varchar(255),field7 varchar(255)
,field8 varchar(255),field9 varchar(255),field10 varchar(255));
Now if you want it to also execute, because you are running it as part of an SQL script, you could wrap it in an anonymous DO block (PostgreSQL 9.0+).
DO LANGUAGE plpgsql
$$
DECLARE var_sql text;
BEGIN
    -- Build the CREATE TABLE statement, one varchar(255) column per series value
    SELECT 'CREATE TABLE data_import('
        || string_agg('field' || i::text || ' varchar(255)', ',') || ');'
    INTO var_sql
    FROM generate_series(1,10) As i;
    -- Run the generated DDL
    EXECUTE var_sql;
END;
$$ ;
Wednesday, October 12. 2011
This is about improvements to GiST indexes that I hope to see in PostgreSQL 9.2. One is a patch for possible inclusion in PostgreSQL 9.2 called SP-GiST (Space-Partitioned GiST), created by Teodor Sigaev and Oleg Bartunov, whose basic technique is described in SP-GiST: An Extensible Database Index for Supporting Space Partitioning Trees. For those who don't know Teodor and Oleg, they are the great fellows who brought us many other GiST and GIN goodies that many specialty PostgreSQL extensions enjoy, e.g. PostGIS, trigrams, ltree, pgsphere, hstore, and full-text search, to name a few.
Another is one recently committed by Alexander Korotkov, which I just found out about in New node splitting algorithm for GIST and admit I don't know enough about to judge. I have to admit to being very clueless when it comes to the innards of index implementations, so don't ask me for any technical details. It's one of those shortcomings, among the trillion others I have, that I have learned to accept will probably never change.
What the SP-GiST patch will provide in terms of performance and speed was outlined in PGCon 2011: SP-GiST - a new indexing infrastructure for PostgreSQL and Space-Partitioning trees in PostgreSQL.
What it provides specifically for PostGIS is summarized in Paul's call for action noted below. As a passionate user of PostGIS, ltree, tsearch, and hstore, I'm pretty excited about these patches and other GiST and general index enhancements and their potential use in GiST-dependent extensions. I'm hoping to see these spring to life in PostgreSQL 9.2 and think they will help to further push the envelope of where PostgreSQL can go as a de facto platform for cutting-edge technology and scientific research. I think one of PostgreSQL's greatest strengths is its extensible index API.
Paul's PostGIS newsgroup note about seeking funding for faster GiST indexes, the work done so far on SP-GiST, and his call for further action is rebroadcast in its entirety here.
Thanks to the sponsorship of Michigan Technological University, we now
have 50% of the work complete. There is a working patch at the
commitfest https://commitfest.postgresql.org/action/patch_view?id=631
which provides quad-tree and kd-tree indexes.
However, there is a problem: unless the patch is reviewed and goes
through more QA/QC, it'll never get into PostgreSQL proper. In case
you think I am kidding: we had a patch for KNN searching ready for the
9.0 release, but it wasn't reviewed in time, so we had to wait all the
way through the 9.1 cycle to get it.
I am looking for sponsors in the $5K to $10K range to complete this
work. If you use PostgreSQL in your business, this is a chance to add
a basic capability that may help you in all kinds of ways you don't
expect. We're talking about faster geospatial indexes here, but this
facility will also radically speed any partitioned space. (For
example, the suffix-tree, which can search through URLs incredibly
fast. Another example, you can use a suffix tree to very efficiently
index geohash strings. Interesting.)
If you think there's a possibility, please contact me and I will send
you a prospectus you can take to your manager. Let's make this happen
folks!
Paul
Continue reading "Improving speed of GIST indexes in PostgreSQL 9.2"