Wednesday, November 23. 2016
A while ago, when Foreign Data Wrappers in PostgreSQL were a fairly new thing, we talked about the ODBC_FDW foreign data wrapper. Since then, people have been asking us how to get the ODBC FDW to work on newer PostgreSQL.
Sadly, the ODBC_FDW was stuck in time, not having been updated to the newer FDW API.
Our recommendation was simply to use OGR_FDW, which many distributions, both Linux and Windows, compile with ODBC support.
True, OGR_FDW is billed as a spatial data wrapper, but the reality is that spatial data rarely lives apart from regular attribute data, so a good spatial vector driver supports both vector data and bread-and-butter data types. OGR_FDW is still our go-to for working with spreadsheets and folders of CSV files.
Recently the fine folks at Carto patched the ODBC FDW to work with PostgreSQL 9.5. I do hope they accept my modest patch to make it work with PostgreSQL 9.6 as well.
So now there are two FDWs to choose from for connecting to ODBC data sources. Which one is better? The answer, as almost always, is IT DEPENDS.
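For example, with OGR_FDW you can point a foreign server at an ODBC DSN and import its tables. This is just a minimal sketch under assumptions: the DSN name (mydsn), server, and schema names are hypothetical, and the exact OGR connection string depends on your ODBC driver setup; ogr_all is the import target ogr_fdw documents for pulling in all layers.
CREATE EXTENSION IF NOT EXISTS ogr_fdw;
-- hypothetical system DSN named "mydsn"; OGR's ODBC driver takes an ODBC: prefixed datasource
CREATE SERVER my_odbc_svr
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (datasource 'ODBC:mydsn', format 'ODBC');
-- pull all remote tables into a local staging schema
CREATE SCHEMA IF NOT EXISTS odbc_stage;
IMPORT FOREIGN SCHEMA ogr_all
  FROM SERVER my_odbc_svr INTO odbc_stage;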
Continue reading "ODBC FDW now supports 9.5 and 9.6"
Saturday, November 09. 2013
Alexander Kuznetsov on SQLblog.com
has an interesting series going entitled Learning PostgreSQL. In the series he focuses on what it takes to move
a SQL Server database and app to PostgreSQL and highlights some of the key differences between the two platforms that you should watch out for.
I recommend it to any SQL Server developer planning to make the switch to PostgreSQL or any PostgreSQL consultant tasked with the job
and not familiar with the intricacies of SQL Server.
His PostgreSQL feed can be found here.
So far on his list of articles in the series:
Continue reading "Migrating from SQL Server to PostgreSQL"
Sunday, January 27. 2013
This exercise is a continuation of our How to bulk export tables from MS Access. While this approach will work for other databases besides PostgreSQL, you'll probably need to fiddle with the subroutine to make it work for them. PostgreSQL is fairly ANSI-SQL standard, so not too much fiddling should be required to port this to SQL Server, MySQL, Oracle, etc.
Unlike the prior Visual Basic subroutine we showed that exports the tables, this one just creates an SQL script that you run on the already created PostgreSQL database containing the exported data. We didn't test the quote option, though we coded it into the subroutine, since, like we said, we hate having to quote fields. If perchance you are one of those folks who likes to put spaces in field names to make them more English-like, then you'll need to quote, or revise the other routine to convert your spaces to underscores or some other character.
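The generated script is just plain DDL you run in psql or pgAdmin. As a rough illustration (these table and column names are hypothetical, not output from the actual subroutine), the statements it emits look something like this:
-- recreate the primary key and indexes that existed in the Access database
ALTER TABLE orders ADD CONSTRAINT pk_orders PRIMARY KEY (order_id);
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE UNIQUE INDEX idx_orders_order_number ON orders (order_number);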
Continue reading "How to recreate MS Access primary keys and indexes in PG"
Despite all the nasty things people say about MS Access, it does have fairly database-agnostic, easy-to-use import and export tools and a not-too-shabby query wizard and query tool. But of course, it's not a server-side database, so at a certain point it won't scale for database storage. You can still use it as a front-end to a server-side database such as PostgreSQL or SQL Server. So once you outgrow it for data storage, you'll probably want a quick way to export your data out. Unfortunately, at least in the 2003 version, while you can bulk link tables from an ODBC connection, you can only export one table at a time using the "select table" right-click export. In this article we'll show a quick and dirty MS Access Visual Basic subroutine we use to export all tables. We've only tested this on MS Access 2003, so if you are using a higher version, you might need to customize it a bit. This script should work fine for exporting to any database connection with modification of the connection string.
Continue reading "How to bulk export tables from MS Access"
Sunday, January 06. 2013
In the last article we said you can't have a LEFT JOIN with LATERAL. Turns out we were mistaken: YES, indeed you can, and when you do it is equivalent to, or more powerful than, SQL Server's OUTER APPLY.
Let us say that in our query we want all zips returned, but for the ones that have keys whose value falls in a certain range, we also want those keys returned. The fact that we need all rows, even those with no such keys, necessitates putting the condition in the ON clause rather than the WHERE. We'd do this:
SELECT zip, (h).key, (h).value As val
FROM zcta5 AS foo
LEFT JOIN LATERAL each(hstore(foo) - 'zip'::text) As h
ON ((h).value BETWEEN '12345' and '14567')
ORDER BY zip
LIMIT 5;
Output would be:
  zip  | key    | val
-------+--------+----------
 00601 |        |
 00602 |        |
 00603 |        |
 00606 | awater | 12487.00
 00610 | hu10   | 12618
Friday, January 04. 2013
A while ago we demonstrated how to create cross tabulation tables (aka pivot tables) using the tablefunc extension, basically collapsing
rows into columns.
Recently someone asked me how to do the reverse (convert columns to rows). He found a solution to the problem here: http://stackoverflow.com/questions/1128737/unpivot-and-postgresql
using a combination of array and unnest. That approach is very similar to SQL Server's built-in UNPIVOT predicate.
The solution seemed
nice enough, except that, like the SQL Server UNPIVOT, it required knowing the column names beforehand, which makes it very hard to genericize.
So would it be possible to accomplish this feat without knowing the column names (except for the key) and do it with one SQL statement? I realized that the
PostgreSQL hstore extension fit the bill nicely. In this article I'll demonstrate both approaches by creating a view
using each.
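To give a flavor of the hstore approach, here is a minimal sketch (the table and column names are hypothetical, and this is not necessarily the exact view from the article) that unpivots every column of a row except the key into key/value pairs:
-- assumes CREATE EXTENSION hstore; and a hypothetical table measurements(id, temp, pressure, humidity)
CREATE OR REPLACE VIEW vw_measurements_unpivot AS
SELECT id,
       (each(hstore(m) - 'id'::text)).key   AS col_name,
       (each(hstore(m) - 'id'::text)).value AS col_value
FROM measurements AS m;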
Continue reading "Unpivoting data in PostgreSQL"
Saturday, November 10. 2012
We've been fighting to get clients of ours who are thinking of upgrading or creating new apps to also choose PostgreSQL in the process.
Here I'll itemize some of the obstacles we've run into in making the sale. All of these fall under the "It is not Database X" line item.
By Database X I mean SQL Server, MySQL, and Oracle, and for us in exactly that order. Our obstacle focus is probably a bit different from others' since
we are consultants to mostly Windows shops, or consultants to ISVs who have to sell their applications to U.S. government agencies or units of agencies.
Continue reading "Biggest Obstacle to PostgreSQL Adoption: It is not Database X"
Sunday, July 01. 2012
In the last article, Finding Contiguous primary keys, we detailed one of many ways of finding contiguous ranges in data, but the approach would only work on higher-end databases like Oracle 11G, SQL Server 2012, and PostgreSQL 8.4+. For Oracle you'd have to replace the EXCEPT, I think, with MINUS. It wouldn't work on lower Oracle because of the use of CTEs. It wouldn't work on lower SQL Server because it uses the window LEAD function, which wasn't introduced into SQL Server until SQL Server 2012. Someone on reddit provided a Microsoft SQL Server implementation which we found particularly interesting because it's a bit shorter and it's more cross-platform. You can make it work with minor tweaks on any version of PostgreSQL, MySQL, SQL Server, and even MS Access. The only downside I see with this approach is that it uses correlated subqueries, which tend to be slower than window functions. I was curious which one would be faster, and to my surprise, this version beats the window one we described in the prior article. It's in fact a bit embarrassing how well this one performs: it finished in 462 ms on this dataset, while the prior one we proposed took 11 seconds. Without further ado, here is the table we created to test with:
CREATE TABLE s(n int primary key);
INSERT INTO s(n)
SELECT n
FROM generate_series(1,100000) As n
WHERE n % 200 != 0;
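For reference, a minimal sketch of the kind of correlated-subquery approach being described (not necessarily the exact query from reddit) would look something like this against the table above:
-- each range starts at an n with no predecessor; its end is the smallest n >= the start
-- that has no successor
SELECT start_n.n AS range_start,
       (SELECT MIN(end_n.n)
          FROM s AS end_n
         WHERE end_n.n >= start_n.n
           AND NOT EXISTS (SELECT 1 FROM s AS nx WHERE nx.n = end_n.n + 1)) AS range_end
FROM s AS start_n
WHERE NOT EXISTS (SELECT 1 FROM s AS np WHERE np.n = start_n.n - 1)
ORDER BY start_n.n;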
Continue reading "Contiguous Ranges of primary keys: a more cross platform and faster approach"
Sunday, April 15. 2012
As promised in our prior article, ODBC Foreign Data wrapper on windows, we'll demonstrate how to query SQL Server using the foreign data wrapper. We are testing this on Windows.
As far as querying SQL Server from PostgreSQL goes, the foreign data wrapper still lacks many features that the SQL Server linked server approach provides.
The key ones we find currently lacking: the ability to do updates and to reference a table directly from the server without knowing its underlying structure. That said,
the foreign data wrapper approach has the potential to support a lot more data sources with ease. We'll demonstrate in subsequent articles using the www_fdw to query
web services, which we've been playing a lot with, and the often-packaged file_fdw. Enough of that; let's start with a concrete example.
Warning: this is not production ready, but it seems like a very promising start and with more testing can become very robust. Although we are demonstrating odbc_fdw on Windows,
it is supported on Unix via unixODBC, but the data sources you can query will probably be different.
I'm really looking forward to seeing how the FDW technology in PostgreSQL will push the envelope. I've been playing around with the www_fdw as well and have been impressed by how easy it is to
query web services with SQL. A very ah-hah moment.
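As a rough sketch of what the setup looks like (option names vary between odbc_fdw versions, and the DSN, login, and table names here are hypothetical, so treat this as an outline rather than the exact commands from the article):
CREATE EXTENSION odbc_fdw;
-- "mssql_dsn" is a hypothetical system DSN pointing at the SQL Server instance
CREATE SERVER svr_sqlserver
  FOREIGN DATA WRAPPER odbc_fdw
  OPTIONS (dsn 'mssql_dsn');
CREATE USER MAPPING FOR public SERVER svr_sqlserver
  OPTIONS (username 'sa', password 'secret');
-- expose one remote table; the column list must match the remote structure
CREATE FOREIGN TABLE orders (order_id integer, order_date date, total numeric)
  SERVER svr_sqlserver
  OPTIONS ("schema" 'dbo', "table" 'orders');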
Continue reading "ODBC Foreign Data wrapper to query SQL Server on Windows - Part 2"
Sunday, December 11. 2011
In our article The Pure Relational database is dead there
were a lot of misunderstandings as a result of our poor choice of words. People thought we were bashing the relational model because in their minds that was what
pure meant. I got hit with a lot of poetic insults. I still can't think of an alternative word to use for what I meant. Simple doesn't really do it, as even relational databases with just standard types
were far from simple when you consider the planner and all the other stuff going on under the hood to protect you from the underlying storage structure. What I was trying to say is that in the beginning most relational databases
just supported a standard set of types which you could not expand on, and most people, when they think relational today, still think just that. That type of relational database is, in my book, dead or almost dead.
How did this all start? Well, whenever we use something like PostgreSQL to store anything complex -- take your pick: geometry data, tree-like structures (which we use
ltree for), full-text query constructs, and yes, XML -- we get bashed by some know-it-all who has a very narrow view of what a relational database should be doing, suggesting we use a NoSQL database, a graph engine, or a full-text engine, or normalize our data more. I have also learned XML is a dirty word to many people. I mistakenly thought XML was a complex type people could
relate to, but it turns out they can relate to it so well that it brings up tragic memories I can only equate to the post-traumatic stress disorder suffered by war veterans or (early or wrong) technology adopters. That was not
my intent either. XML was merely an example. I will not say you should use XML in your tables, but I will also not say you should stay clear of it, as many people wanted me to say. I will say its use is rare, but it has its place. It has its place just as any other complex type, and it has its own special needs for navigation, indexing, etc., which many relational databases handle well enough.
Continue reading "The Relational Model is very much alive"
Saturday, December 03. 2011
A lot of redditers took offense at our article XPathing XML data with PostgreSQL,
with the general consensus being: if you are going to be stuffing XML in a relational database, where will you stop? That is not what relational databases are designed for.
We had committed a sacrilegious sin and, worse yet, were encouraging bad habits by forcing people to think more about the different options they have for storing data in a relational database and, god forbid, demonstrating querying such columns with XML-specific functions. What were we thinking? How dare we try to query XML data with SQL? Perhaps we were thinking like this guy or this guy,
both equally misguided spatial relational database folk.
Of course we stepped one foot further by actually defining a column as xml and daring to store data in it for later consumption rather than just as an intermediary step.
If I want to store documents that are navigable, I should be using a document database like MongoDB, CouchDB, etc., designed for that kind of stuff. If I've got graphs, I should be using a graph database. This got me thinking
that the "Pure Relational Database" is dead, and I'm surprised most people don't seem to realize it.
So while "relational databases" have changed over the last 25 years, most people's notions of them have not kept up with the pace of that change.
First let me define what I mean by pure. A pure relational database is one with standard meat-and-potatoes types like text, dates, and numbers well suited for counting money and computing how close the world is to total bankruptcy, which you store as fields in a row of a table, and on which you then define foreign keys / constraints / primary keys to relate them to other tables.
You reconstitute your real-world objects by stitching these tables together with joins and so on, return sets using where conditions, and summarize using group bys or other mathy constructs. Don't get me wrong; these are very beautiful things because they allow for easy slicing of dimensions and not having to think about all the dimensions that make up an object all at once. In fact it was so beautiful that some people thought, "wow, that's cool, but it would be even cooler if I could
store more complex objects in those columns with their own specific needs for querying." And so was born the object-relational database, as some people refer to it, created by folks who thought relational but also understood that different types have their own unique needs for querying, storage, indexing, etc.
Nowadays most, if not all, relational-like databases have standardized on some variant of SQL.
In essence, though, the pure relational database doesn't allow you to define new types or have exotic types such as arrays, xml, graphs, geometries, rasters, sparse matrices, etc.
Much less thinking is involved, and you are less likely to shoot yourself in the foot by dumping a bunch of xml in a field and trying to do something with it.
When it is used to store more complex things such as spreadsheets and other user documents, these are stored as blobs and just retrieved. Even such use is frowned upon.
Well, most relational databases I can think of nowadays have richer types: e.g. PostgreSQL, Oracle, and Firebird all support arrays as a column type. Some even allow you to define custom types and functions to support them: e.g. PostgreSQL (I could go on forever), Oracle, which has rich user-defined type support too, and SQL Server 2005+, which with each version gets better and better for user-defined custom types and introduces more exotic types and support infrastructure. Even MySQL/Drizzle has some of this (mostly in the form of different storage engines). Even my favorite lightweight SQLite has some tricks under the hood that aren't what I would call relational. E.g. SpatiaLite/RasterLite is a whole geometry type library built on SQLite with functions you can call from SQL, and I'm sure there are lots of middleware tools you don't know about
using the SQLite and Firebird engines for more than relational tasks (HTML5 anyone? CAD anyone?).
Monday, September 26. 2011
UPDATE: Turns out there is a simpler way of getting rid of roles that have explicit permissions to objects, as Tom Lane pointed out in the comments.
DROP OWNED BY some_role;
will drop the permissions to objects a role has rights to, even if it doesn't own the objects. Of course this needs to be applied with caution since it will drop tables
and other things you don't necessarily want dropped. So it is best to first run:
REASSIGN OWNED BY some_role TO new_role;
And then run the DROP OWNED BY.
The REASSIGN OWNED BY, which is what we did originally, is not sufficient since it doesn't drop the permissions or reassign
them as we assumed it would. This is noted in the docs.
And then you will be allowed to
DROP ROLE some_role
One of the things that is still tricky in PostgreSQL is permission management. Even though 9.0 brought us default privileges and the like, these permissions aren't
retroactive, so they're still a pain to deal with if you already have objects defined in your database.
One of the annoyances we come across is deleting roles. Let's say you have a role and it has explicit permissions to an object.
PostgreSQL won't allow you to delete this role if it owns objects or has explicit permissions to objects. In order to delete it, it seems you have
to go in and clear out all those permissions. To help with that, we wrote a quickie script that will generate a script to revoke all permissions on objects
for a specific role. It looks like this:
Continue reading "Bulk Revoke of Permissions for Specific Group/User role"
Sunday, September 04. 2011
We've been working on converting some of our SQL Server apps to PostgreSQL. In this
article we'll describe some things to watch out for and provide a function we wrote to automate some of
the conversion.
Although both databases are fairly ANSI-SQL compliant, there are still differences
in their CREATE TABLE statements, data types, and how they handle other things that make porting applications not so trivial.
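To give a sense of the kinds of differences involved (a hand-picked illustration, not output from the conversion function described in the article), a SQL Server table definition and a rough PostgreSQL equivalent might look like:
-- SQL Server flavor (illustrative):
-- CREATE TABLE orders (order_id int IDENTITY(1,1) PRIMARY KEY,
--                      customer_name nvarchar(100) NOT NULL,
--                      placed_at datetime NOT NULL DEFAULT GETDATE());
-- rough PostgreSQL equivalent
CREATE TABLE orders (
    order_id serial PRIMARY KEY,
    customer_name varchar(100) NOT NULL,
    placed_at timestamp NOT NULL DEFAULT now()
);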
Continue reading "SQL Server to PostgreSQL: Converting table structure"
Tuesday, May 10. 2011
Question: What is the difference between CURRENT_TIMESTAMP and clock_timestamp()?
Answer: CURRENT_TIMESTAMP is an ANSI-SQL standard variable you will find in many relational databases, including PostgreSQL, SQL Server, Firebird, IBM DB2, and MySQL to name a few,
that records the start of the transaction. The important thing to keep in mind about it is that there is only one value per transaction, so if you have a long-running transaction,
you won't see it changing as you go along.
clock_timestamp() is a PostgreSQL function that always returns the current clock timestamp. I don't think I'm alone in using it for simple benchmarking and other tasks
where, for example, I need to record the timings of each part of a function within the function using pedestrian RAISE NOTICE debug print statements.
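A quick way to see the difference is to compare the two inside a single transaction; CURRENT_TIMESTAMP stays fixed while clock_timestamp() keeps moving:
BEGIN;
SELECT CURRENT_TIMESTAMP AS txn_start, clock_timestamp() AS wall_clock;
SELECT pg_sleep(2);
-- txn_start is unchanged; wall_clock is roughly 2 seconds later
SELECT CURRENT_TIMESTAMP AS txn_start, clock_timestamp() AS wall_clock;
COMMIT;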
There is another cool way I like using it, and that is for a batch of records each with an expensive function call, benchmarking how long it takes to process each record.
One of the things I'm working on is improving the speed of the tiger_geocoder packaged in PostGIS 2.0. The first route of attack seemed to me to be the normalize_address function,
which I was noticing was taking anywhere from 10% to 50% of my time in the geocode process. That's a ton of time if you are trying to batch geocode a ton of records. The thing is,
the function's speed is very sensitive to how badly formed the address is, so a whole batch can be held up by one bad apple, and since the batch doesn't return until all records are processed, it makes
the whole thing seem to take a while.
So rather than looping through each record, I thought it would be cool if I could run the batch, but for each record have it tell me how long it took to process relative to the rest so I could get
a sense of what a problem address looks like. So I wrote this query:
WITH ctbenchmark
AS
(SELECT *,
the_time - COALESCE(lag(the_time) OVER(ORDER BY the_time), CURRENT_TIMESTAMP) As process_time,
the_time - CURRENT_TIMESTAMP As diff_from_start
FROM (SELECT address_1, city, state, zip,
pprint_addy(normalize_address(coalesce(address_1,'') || ', ' || coalesce(city || ' ','') || state || ' ' || zip)) As pp_addr,
clock_timestamp() As the_time
FROM testgeocode LIMIT 1000) As foo )
SELECT *
FROM ctbenchmark
WHERE process_time > '00:00:00.016'::interval;
Which returned an output something like this:
address_1      | city | state | zip    | pp_addr    | the_time                   | process_time | diff_from_start
---------------+------+-------+--------+------------+----------------------------+--------------+-----------------
48 MAIN ST ..  | S..  | MA    | 021..  | 48 MAIN .. | 2011-05-10 03:24:43.078-04 | 00:00:00.032 | 00:00:00.032
15 ...         |      | MA    | 018... | 15 GREN... | 2011-05-10 03:24:50.796-04 | 00:00:00.031 | 00:00:07.75
Saturday, April 30. 2011
We like to enforce business rules at the database level wherever
we can, for the simple reason that, particularly in the business we are in, most database updates happen
outside the end-user application layer.
That is not to say you shouldn't enforce them at the application level too, but the database is the last
line of defense, is usually more self-documenting than application code can be, and also protects you from your
programmers, even when that programmer is you.
Domains are objects that you will find in many high-end
standards-compliant databases. They exist in SQL Server, Oracle, IBM DB2, Firebird, and PostgreSQL, to name a few.
Domains have existed for a really long time in PostgreSQL. In PostGIS topology, Sandro Santilli (usually known as strk) takes advantage of them for fleshing out the topology support, and I got turned on to them by him.
With that said - let's dive into domains.
What are domains?
Domains are essentially a reusable packaging of check constraints. You use them as if they were a custom data type.
The nice thing about them is that they are usually transparent to applications that
don't understand them.
Example 1: Enforce pay ending/pay day happens only on certain days of the week
Here is an example -- suppose you had a payment system and a rule that the pay-through end date has to
fall on a Friday. You could create a domain such as the following:
CREATE DOMAIN dom_payday
AS date
CONSTRAINT check_dow CHECK (trim(to_char(VALUE, 'day')) = 'friday');
COMMENT ON DOMAIN dom_payday IS 'Company payday rules';
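Once the domain exists, you use it like any other type. A quick usage sketch (the table name here is hypothetical):
CREATE TABLE pay_periods (period_id serial PRIMARY KEY, pay_thru dom_payday NOT NULL);
-- 2011-04-29 was a Friday, so this succeeds
INSERT INTO pay_periods (pay_thru) VALUES ('2011-04-29');
-- 2011-04-28 was a Thursday, so this fails the check_dow constraint
INSERT INTO pay_periods (pay_thru) VALUES ('2011-04-28');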
Continue reading "Using Domains to Enforce Business Rules"