Wednesday, May 09. 2012
Last time we demonstrated how to query delimited text files using the fdw_file that comes pacakged with PostgreSQL 9.1+, this time we'll continue our journey into Flat file querying Foreign Data Wrapper using an experimental foreign data wrapper designed for also querying delimited data, but outputting it as a single column text array table.
This one is called file_textarray_fdw and developed by Andrew Dunstan. It's useful if you are dealing with for example jagged files, where not all columns are not properly filled in for each record or there are just a ton of columns you don't want to bother itemizing before you bring in. The benefit is you can still query and decide how you want to break it apart. You can grab the source code from file_text_array_fdw source code. If you are on windows, we have compiled binaries in our Bag o' FDWs for both PostgreSQL 9.1 32-bit FDW for Windows bag and PostgreSQL 9.1 64-bit FDW for Windows bag that should work fine with the EDB installed windows binaries.
For other systems, the compile is fairly easy if you have the postgresql development libraries installed.
Continue reading "File FDW Family: Part 2 file_textarray_fdw Foreign Data Wrapper"
Thursday, May 03. 2012
Last time we demonstrated how to use the ODBC Foreign Data wrapper, this time we'll continue our journey into Foreign Data Wrapper land by demonstrating what I'll call the File FDW family of Foreign Data Wrappers. There is one that usually comes packaged with PostgreSQL 9.1 which is called fdw_file but there are two other experimental ones I find very useful which are
developed by Andrew Dunstan both of which Andrew demoed in PostgreSQL Foreign Data Wrappers and talked
about a little bit Text files from a remote source. As people who have to deal with text data files day in and out, especially ones from mainframes, these satisfy a certain itch.
- file_fdw - for querying delimited text files.
- file_fixed_length_fdw - this one deals with fixed length data. We discussed methods of importing fixed length data in Import Fixed width data. This is yet another approach but has the benefit that you can also use it to import just a subset of a file.
- file_text_array_fdw - this one queries a delimited file as if each delimiete row was a text array. It is ideal for those less than perfect moments when someone gives you a file with a 1000 columns and you don't have patience to look at what the hell those columns mean just yet.
In this article, we'll just cover the file_fdw one, but will follow up in subsequent articles, demonstrating the array and fixed length record ones.
Continue reading "File FDW Family: Part 1 file_fdw"
Sunday, April 15. 2012
As promised in our prior article: ODBC Foreign Data wrapper on windows, we'll demonstrate how to query SQL Server using the Foreign Data Wrapper. This we are testing on windows.
As far as querying SQL Server / PostgreSQL goes, the Foreign Data Wrapper still lacks many features that the SQL Server Linked Server approach provides.
The key ones we find currently lacking: ability to do updates and reference a table directly from server without knowing underlying structure. That said
the Foreign data Wrapper approach has possiblity to support a lot more data sources with ease. We'll demonstrate in subsequent articles using the www_fdw to query
web services which we've been playing a lot with and the often packaged in file_fdw. Enough of that let's start with a concrete example.
Warning, this is not production ready, but seems like a very promising start and with more testing can become very robust. Although we are demonstrating odbc_fdw on windows,
it is supported on Unix via the UnixODBC, but the data sources you can query will probably be different.
I'm really looking forward to how the FDW technology in PostgreSQL will push the envelop. I've been playing around with the www_fdw as well and been impressed how easily it is to
query webservices with SQL. A very ah-hah moment.
Continue reading "ODBC Foreign Data wrapper to query SQL Server on Window - Part 2"
Tuesday, March 27. 2012
One of the new features in PostgreSQL 9.1 that we've been meaning to try is the new foreign data wrapper support.
Now that we are in compile mode gearing up for releasing PostGIS 2.0.0 for windows (both 32 and 64-bit), we thought we'd give the odbc_fdw a try trying to compile on windows. Last we tried we weren't successful because
we couldn't get past the -lodbc required step.
It turns out there is an easy fix to the ODBC dependency issue and I'm not sure I changed the line right. In the makefile we changed -lodbc to -lodbc32. This was needed for both compiling 32-bit as well as the 64-bit. We compiled the 64-bit version under our Mingw-64 chain
and 32-bit under our old Mingw gcc 3.4.5. Sadly we still don't have our mingw64 (compile for windows 32-bit compile up yet). Our ming64 for windows 32 can compile the 9.2 development branch but not the 9.1.3. Go figure. Anyrate to make a long story short -- we have 32-bit binaries for PostgreSQL (you can use in VC++ builds) and 64-bit binaries as well that you can use for the VC++ EDB builds for those who are interested in experimenting.
PostgreSQL 9.1 Windows 32-bit ODBC FDW PostgreSQL 9.1 Windows 64-bit ODBC FDW
So far we've tried the PostgreSQL 64-bit data wrapper against a SQL Server 2005 DSN and it seems to work fine. Have yet to try it on other ODBC sources.
We'll write up a more detailed article describing how to make the connections.
There is one trick to getting Mingw64 compiled PostgreSQL extensions to work with the Windows 64-bit EDB builds, and that is that when you compile your PostgreSQL under mingw64,
you have to configure with option ----disable-float8-byval as we noted in our PostGIS Window 64 build instructions.
Wednesday, December 28. 2011
For those who aren't familiar with hstore, it's a key/value
storage type that is packaged as an extension or contrib in PostgreSQL 8.2+. In PostgreSQL 9.0 it got a little extra loving in several ways one of which was the introduction
of the hstore(record) casting function that converts a record to an hstore. In this article, I'll demonstrate how you can use this new casting function to do very sleek mail merges right in the database. The only caveat is that it seems to only correctly name the keys if it is fed a real table or view. Derived queries such as aggregates etc get keys named f1, f2, etc.
If you are on PostgreSQL 9.1 or above installing -- hstore is just a CREATE EXTENSION hstore; sql command away. If you are on a lower version of PostgreSQL,
you can usually find the hstore.sql in share/contribs.
Continue reading "Mail Merging using Hstore"
Sunday, December 04. 2011
What happens when you take a bunch of auto-generated tests and test results
which you burn into rasters with SQL?
You get a code fingerprint of sorts. Well that was the dream I wanted to explore. Now that PostGIS 2.0 has introduced a new spatial type called raster
and lots of functionality including some very cool Map Algebra functionality to go with it,
this is now possible without too much fuss. My first attempt at it didn't produce quite what I expected. I have to admit it is absolutely gorgeous
and will be hopefully useful once I figure out how to read it, tweak the rendering process to achieve more of a breakout of results, or patent it as a fabric style.

Now for the story of how a computer nerd is born and how rasters can be useful to nerds.
Continue reading "Code Profiling with PostGIS raster"
Tuesday, October 18. 2011
One of the great lessons learned in building PostGIS extensions is my rediscovery of SED. SED turned out to be
mighty useful in this regard and I'll explain a bit in this article. Unfortunately there is still a lot I need to learn about it
to take full advantage of it and most of my use can be summed up as monkey see, monkey scratch head, monkey do. In addition I came across what I shall refer to as Pain points with using
the PostgreSQL Extension model. Part of which has a lot to do with the non-granular management of changes in PostGIS,
the day to day major flux of changes happening in PostGIS 2.0 space, and my attempt at trying to creat upgrade freeze points amidst these changes.
When PostGIS 2.0 finally arrives, the freeze points will be better defined and not change from day to day. So some of these issues
may not be that big of a deal.
Continue reading "Lessons learned Packaging PostGIS Extensions: Part 2"
Wednesday, October 12. 2011
This is about improvements to GIST indexes that I hope to see in PostgreSQL 9.2. One is a patch for possible inclusion in PostgreSQL 9.2 called SP-GiST, Space-Partitioned GiST created by
Teodor Sigaev and Oleg Bartunov whose basic technique is described in SP-GiST: An Extensible Database Index for Supporting Space Partitioning Trees. For those who don't know Teodor and Oleg, they are the great fellows that brought us many other GiST and GIN goodnesses that many specialty PostgreSQL
extensions enjoy -- e.g. PostGIS, trigrams, ltree, pgsphere, hstore, full-text search to name a few.
Another is a recent one just committed by Alexander Korotkov which I just recently found out about on New node splitting algorithm for GIST and admit I don't know enough about to judge. I have to admit to being very clueless when it comes to the innards of index implementations so don't ask me any technical details. It's one of those short-comings among the trillion others I have that I have learned to accept will probably never change.
What the SP-GIST patch will provide in terms of performance and speed was outlined in
PGCon 2011: SP-GiST - a new indexing infrastructure for PostgreSQL
Space-Partitioning trees in PostgreSQL.
What it provides specifically for PostGIS is summarized in Paul's call for action noted below. As a passionate user of PostGIS
,ltree, tsearch, and hstore, I'm pretty excited about these patches and other GIST and general index enhancements and there potential use in GIST dependent extensions. I'm hoping to see
these spring to life in PostgreSQL 9.2 and think it will help to further push the envelope of where PostgreSQL can go as a defacto platform
for cutting-edge technology and scientific research. I think one of PostgreSQL's greatest strength is its extensible index API.
Paul's PostGIS newsgroup note about seeking funding for faster GIST indexes , work done so far on SP-GIST and call for further action is rebroadcast in it's entirety here.
Thanks to the sponsorship of Michigan Technological University, we now
have 50% of the work complete. There is a working patch at the
commitfest https://commitfest.postgresql.org/action/patch_view?id=631
which provides quad-tree and kd-tree indexes.
However, there is a problem: unless the patch is reviewed and goes
through more QA/QC, it'll never get into PostgreSQL proper. In case
you think I am kidding: we had a patch for KNN searching ready for the
9.0 release, but it wasn't reviewed in time, so we had to wait all the
way through the 9.1 cycle to get it.
I am looking for sponsors in the $5K to $10K range to complete this
work. If you use PostgreSQL in your business, this is a chance to add
a basic capability that may help you in all kinds of ways you don't
expect. We're talking about faster geospatial indexes here, but this
facility will also radically speed any partitioned space. (For
example, the suffix-tree, which can search through URLs incredibly
fast. Another example, you can use a suffix tree to very efficiently
index geohash strings. Interesting.)
If you think there's a possibility, please contact me and I will send
you a prospectus you can take to your manager. Let's make this happen
folks!
Paul
Continue reading "Improving speed of GIST indexes in PostgreSQL 9.2"
Thursday, October 06. 2011
In prior articles we talked about the new PostgreSQL 9.1 extension model
and upcoming PostGIS 2.0 extensions which we have experimental builds of so far.
In this article and other's to follow, we shall provide a documentary of our venture into this new extensions world. We'll discuss some of the obstacles we had with building
extensions, lessons learned, and foolishness exposed, with the hope that others can learn from our experience.
First off, the upcoming PostGIS 2.0 extensions will be packaged as at least two extensions -- postgis which will contain both PostGIS geometry/geography types, functions, meta views and tables as well as raster type and associated functions and tables. Topology support, while a part of upcoming PostGIS 2.0, will be packaged as a separate extension called postgis_topology. The main reason for breaking topology out as a separate extension is that it is always stored in a schema called topology and is not relocatable
to another schema. The way the current extension model works, all the parts of your extension should live in the same schema. Later we plan to package tiger geocoder as an extension, but this one probably makes more sense to live on http://pgxn.org/ since it is only of interest to United States users,
, is purely plpgsql with dependency on PostGIS, and we had beefed it up as part of a consulting contract for a company running PostGIS 1.5. It's the only piece documented in PostGIS 2.0 that works on 1.5 as well (aside from the tiger toplogy loader which has dependency on toplogy), although it has always lived as an extra in the PostGIS code base.
We'll probably package postgis_legacy_functions as an extension too for those people who badly need those 500 alias functions I chucked.
We mentioned in our prior article that we ran into some issues with how our extension worked -- e.g. topology referencing the postgis extension. Most of these turned out just to be ignorance on my part as to how the different pieces fit together and I'll elaborate on these.
Much of what will be described here is also documented in Packaging Related Objects into an Extension.
In the future I'm hoping we'll also see plr and pgrouting packaged as extensions which are common favorites of PostGIS users.
Continue reading "Lessons learned Packaging PostGIS Extensions: Part 1"
Sunday, August 14. 2011
One of the new features I'm excited about in upcoming PostgreSQL 9.1 are extensions. It is also my hope
that for PostGIS 2.0, we'll be able to package PostGIS 2.0 as an extension.
Reinspired in my mission by David Wheeler's recent post and video on Building and Distributing Extensions without C, I decided to take some time to investigate how all the extension pieces fit together.
The three things I like most about extensions
are: - It has a companion sql CREATE EXTENSION and catalog of what's installed and available right from the db, which makes installing/uninstalling relatively painless
- Installed functions don't get backed up with data, which is really a bit of a nightmare for PostGIS folks and relief be much welcome as you can tell in Paul's PostGIS backup and restore
which gets a bit more of an adventure in PostGIS 2.0 now that we have raster and topology and many more fun fun GEOS functions.
- The ease with which you can uninstall,migrate your extension to another schema, and upgrade (in theory). There will be caveats here of course
with changes that require on disk format and involve data.
Of course the ease is all in the thoughtfulness of the packaging. To get some ideas of how we would go about packaging
PostGIS 2.0 as an extension (it could very well be 3 extensions if we decide to package the core postgis, raster, and topology (and even tiger geocoder) as separate extensions), I thought I would take a look at how others have packaged theirs, and how one goes about registering one of these packages
to make it available in CREATE EXTENSION.
Figuring out the extensions you have available ready to install
First I decided to start by doing a little snooping, by applying some lessons from our previous article Querying table, view, column and function descriptions
I wrote this query to figure out what useful functions are available to learn about extensions.
SELECT p.proname AS funcname, d.description
FROM pg_proc p
INNER JOIN pg_namespace n ON n.oid = p.pronamespace
LEFT JOIN pg_description As d ON (d.objoid = p.oid )
WHERE n.nspname = 'pg_catalog'
AND (d.description ILIKE '%extension%' or p.proname ILIKE '%extension%')
ORDER BY n.nspname, p.proname ;
-- which output this --
funcname | description
---------------------------------+-----------------------------------------------------
pg_available_extension_versions | list available extension versions
pg_available_extensions | list available extensions
pg_extension_config_dump | flag an extension's table contents to be emitted by pg_dump
pg_extension_update_paths | list an extension's version update paths
Continue reading "PostgreSQL 9.1 Exploring Extensions"
Monday, June 06. 2011
There once existed programmers who were asked to explain this snippet of code: 1 + 2
- The C programmer explained "It's a common mathematical expression."
- The C++, Java, C# and other impure object-oriented programmers said "We concur. It's a common mathematical expression."
- The Smalltalk programmer explained "1 adds 2."
- The Lisp programmer stood up, a bit in disgust, and said, "No no! You are doing it all wrong!"
The Lisp Programmer then pulled out
a Polish calculator, punched in + 1 2
,and with a very serious face, explained "+ should be pushing those other two around."
I find this episode interesting because while the Lisp programmer I feel is more right, the Smalltalk programmer has managed to follow the rest of the crowd and still stick
to her core principle. This brings us to what does this have to do with trigrams
in PostgreSQL 9.1. Well just like 1 + 2 being a common mathematical expression, abc LIKE '%b%' is a common logical relational database expression that we have long taken for granted as not an indexable operation in most
databases (not any other database to I can think of) until PostgreSQL 9.1, which can utilize trigram indices (the Lisp programmer behind the curtain) to make it fast.
There are 2 main enhancements happening with trigrams in PostgreSQL 9.1
both of which depesz has already touched on in FASTER LIKE/ILIKE
and KNNGIST. This means you can have an even faster trigram search than you ever
have had before and you can do it in such a fashion that doesn't require any PostgreSQL trigram specific syntactical expressions. So while PostgreSQL 9.1 might be understanding LIKE much like all the other databases
you work with, if you have a trigram index in place, it will just be doing it a little faster and sometimes a lot faster using the more clever PostgreSQL 9.1 planner.
This is one example of how you can use applications designed for many databases and still be able to utilize advanced features in
your database of choice. In this article we'll demonstrate.
For this example we'll use a table of 490,000 someodd records consisting of Massachusetts street segments and their names excerpted from TIGER 2010 data. You can
download the trimmed data set from here if you want to play along.
Continue reading "PostgreSQL 9.1 Trigrams teaching LIKE and ILIKE new tricks"
Thursday, October 07. 2010
Universal Unique Identifiers are 16-byte / 32-hexadecimal digit (with 4 -s for separation)
identifiers standardized by the Open Software Foundation.
The main use as far as databases go is to ensure uniqueness of keys across databases. This is important if you have multiple servers or disperate systems that need to replicate or share
data and each can generate data on its own end. You want some non-centralized mechanism to ensure the ids generated from each server will never overlap.
There are various open standards
for generating these ids and each standard will tie the id based on some unique identifier of the computer or a namespace or just a purely random generator algorithm not tied to anything.
Since this is a question often asked by users coming from Microsoft SQL Server, we will demonstrate in this article the same concept in Microsoft SQL Server and how you would achieve similar functionality in PostgreSQL.
Continue reading "Universal Unique Identifiers PostgreSQL SQL Server Compare"
Sunday, August 22. 2010
PostgreSQL offers several options for displaying and querying tree like structures.
In Using Recursive Common Table Expressions (CTE) to represent tree structures
we demonstrated how to use common table expressions to display a tree like structure. Common Table Expressions required PostgreSQL 8.4 and above but was fairly ANSI standards compliant. In addition to that
approach you have the option of using recursive functions. There is yet another common approach for this which is specific to PostgreSQL. This is using the ltree contrib datatype
that has been supported for sometime in PostgreSQL. For one of our recent projects, we chose ltree over the other approaches because the performance is much better when you need to do ad-hoc queries over the tree since it can take advantage of btree and gist indexes
and also has built-in tree query expressions that make ad-hoc queries simpler to do; similar in concept to the tsearch query syntax for querying text.
In this article we'll demonstrate how to use ltree and along the way also show the PostgreSQL 9.0 new features conditional triggers and ordered aggregates.
Continue reading "Using LTree to Represent and Query Hierarchy and Tree Structures"
Thursday, July 29. 2010
One of the new features of PostgreSQL 9.0 is the ability to specify the format of an explain plan.
In prior versions your only choice was text (and graphic explain with tools like PgAdmin III and other GUIS), but in 9.0 on, you have the additional options of
Javascript Object Notation (JSON) which some people have a thing or two to say about them, YAML Ain't Markup Language (YAML) or eXtended Markup Language (XML). The new explain options are itemized in PostgreSQL 9.0 EXPLAIN.
The main benefit of the JSON, XML, YAML formats is that they are easier
to machine parse than the default text version. This will allow for creative renderings of planner trees with minimal coding.
In Part 1 of this series, we'll demonstrate how to output the plans in these various formats and what they look like.
In later parts of this series -- we'll demonstrate how to use Javascript, XSL and other scripting/markup languages
to transform these into works of art you can hang on your wall.
-- START POSTGIS IN ACTION ASIDE --
We just submitted the third major revision of Chapter 3 Data Modeling
of our upcoming PostGIS in Action book.
The second major revision we never submitted and threw it out because it wasn't worldly enough and was too involved. We may
use it later on for an example.
Chapter 3 should be up on Manning Early Access Program (MEAP) soon. If you haven't bought the book yet Buy now.
You don't want to miss out on a major masterpiece in the making. Okay we exaggerate a bit.
-- END POSTGIS IN ACTION ASIDE --
Continue reading "Explain Plans PostgreSQL 9.0 Text, JSON, XML, YAML - Part 1: You Choose"
Wednesday, July 21. 2010
In an earlier article Where is Soundex and other Fuzzy string things we covered the PostgreSQL contrib module fuzzstrmatch which contains the very popular function
soundex that is found in other popular relational databases. We also covered the more powerful levenshtein distance, metaphone and
dmetaphone functions included in fuzzstrmatch, but rarely found in other relational databases.
As far as fuzzy string matching goes, PostgreSQL has other functions up its sleeves. This time we will cover
the contrib module pg_trgm which was introduced in PostgreSQL 8.3. pgtrgm uses a concept called trigrams for doing string comparisons. The pg_trgm module has several functions and gist/gin operators.
Like other contrib modules, you just need to run the /share/contrib/pg_trgm.sql file packaged in your PostgreSQL install to enable it in your database.
For this set of exercises, we'll use trigrams to compare words using the same set of data we tested
with soundex and metaphones. For the next set of exercises, we will be using the places dataset we created in Importing Fixed width data into PostgreSQL with just PSQL.
The most useful are the similarity function and the
% operator. The % operator allows for using a GIST/GIN index and the similarity function allows for narrowing your filter similar to what
levenshtein did for us in fuzzstrmatch.
Continue reading "Fuzzy string matching with Trigram and Trigraphs"
|