Saturday, November 09. 2013
Recommended Books: PostgreSQL: Up and Running (1st Edition)
Alexander Kuznetsov on SQLblog.com has an interesting series going entitled with Learning PostgreSQL. In the series he focuses on what it takes to move a SQL Server database and App to PostgreSQL and highlights some of the key differences between the two platforms that you should watch out for. I recommend it to any SQL Server developer planning to make the switch to PostgreSQL or any PostgreSQL consultant tasked with the job and not familiar with the intricacies of SQL Server.
His PostgreSQL feed can be found here here
So far on his list of articles in the series:
Continue reading "Migrating from SQL Server to PostgreSQL"
Monday, January 28. 2013
This exercise is a continuation of our How to bulk export tables from MS Access and How to recreate MS Access primary keys and indexes in PostgreSQL. As mentioned in the first article, when you use the built-in export feature of MS Access, it exports autonumber fields as integers instead of the more appropriate PostgreSQL pseudo serial type.
The serial type in PostgreSQL is not really a type though, so this behavior is expected. The serial type is really short-hand for defining an integer column and a corresponding sequence object
with a specific naming convention and setting the default value to the next value of the sequence. So this is what we will do in this exercise after we have already exported our data to PostgreSQL. The script we are about to demonstrate will generate an sQL script containing all the
As a side note, Mark mentioned a similar approach to what we've been describing here, but builds all the table , key structures utilizing python. You can check out the python script at http://code.activestate.com/recipes/52267-reverse-engineer-ms-accessjet-databases
Continue reading "How to map MS Access auto number to PostgreSQL serial"
Sunday, January 27. 2013
This exercise is a continuation of our How to bulk export tables from MS Access. Now while this approach will work for other databases besides PostgreSQL, you'll probably need to fiddle with the subroutine to make it work for some other databases. PostgreSQL is fairly ANSI-SQL standard so not too much fiddling should be required to port to SQL Server, MySQL, Oracle etc.
Unlike the prior Visual basic subroutine we showed that exports the tables, this just creates an SQL script that you run on the already created PostgreSQL database that contains the exported data. We didn't test the quote option though we coded it in the subroutine, since like we said we hate having to quote fields. If perchance you are one of those folks that likes to put spaces in your field names to make it more englishy, then you'll need to quote or revise the other routine to convert your spaces to _ or some other thing.
Continue reading "How to recreate MS Access primary keys and indexes in PG"
Despite all the nasty things people say about MS Access, it does have fairly database agnostic, easy to use import and export tools and a not too shabby query wizard and query tool. But of course, it's not a server side database so at a certain point won't scale for database storage. You can still use it as a front-end to a server-side database such as PostgreSQL or SQL Server. So once you outgrow it for data storage, you'll probably want a quick way to export your data out. Unfortunately, well at least in the 2003 version, while you can bulk link tables from an ODBC connection, you can only export one table at a time using the "select table" right-click export. In this article we'll show a quick and dirty export MS Access Visual basic sub routine we use to export all tables. This we've only tested on MS Access 2003, so if you are using higher, you might need to customize it a bit. This script should work fine for exporting to any database connection with modification of the connection string.
Continue reading "How to bulk export tables from MS Access"
Sunday, January 06. 2013
Let us say that in our query we wanted all my zips returned but for ones that had keys where the value is in a certain range, we want those keys returned. we'd do this. The fact we need all even if they have no such keys necessitates us putting the condition in the ON rather than the WHERE.
Output would be
zip | key | val ------+--------+---------- 00601 | | 00602 | | 00603 | | 00606 | awater | 12487.00 00610 | hu10 | 12618
Friday, January 04. 2013
A while ago we demonstrated how to create cross tabulation tables using tablefunc extension aka (Pivot Tables) (basically collapsing rows into columns). Recently someone asked me how to do the reverse (convert columns to rows). He found a solution to the problem here: http://stackoverflow.com/questions/1128737/unpivot-and-postgresql using a combination of array and unnest. That approach is very similar to SQL Server's built-in Pivot SQL predicate. The solution seemed nice enough except similar to the SQL Server Unpivot, it required knowing the column names beforehand so very hard to genericize. So would it be possible to accomplish this feat without knowing the columns names (except for the key) and be able to do it with one SQL statement. I realized that the PostgreSQL hstore extension fit the bill nicely. In this article I'll demonstrate both approaches by creating a view using both.
Continue reading "Unpivoting data in PostgreSQL"
Saturday, November 10. 2012
We've been fighting to get clients we have thinking of upgrading or creating new apps to also choose PostgreSQL in the process. Here I'll just itemize some of the obstacles we've run into in making the sale. All of these fall under the It is not Database X line item. By database X I mean SQL Server, MySQL, and Oracle and for us in exactly that order. Our obstacle focus is probably a bit different from others since we are consultants to mostly Windows shops or consultants to ISVs who have to sell their applications to U.S. government agencies or units of agencies.
Continue reading "Biggest Obstacle to PostgreSQL Adoption: It is not Database X"
Sunday, July 01. 2012
In last article Finding Contiguous primary keys we detailed one of many ways of finding continuous ranges in data, but the approach would only work on higher-end dbs like Oracle 11G, SQL Server 2012, and PostgreSQL 8.4+. Oracle you'd have to replace the EXCEPT I think with MINUS. It wouldn't work on lower Oracle because of use of CTEs. It wouldn't work on lower SQL Server because it uses window LEAD function which wasn't introduced into SQL Server until SQL Server 2012. Someone on reddit provided a Microsoft SQL Server implementation which we found particularly interesting because - it's a bit shorter and it's more cross-platform. You can make it work with minor tweaks on any version of PostgreSQL, MySQL, SQL Server and even MS Access. The only downside I see with this approach is that it uses correlated subqueries which tend to be slower than window functions. I was curious which one would be faster, and to my surprise, this version beats the window one we described in the prior article. It's in fact a bit embarrassing how well this one performs. This one finished in 462 ms on this dataset and the prior one we proposed took 11seconds on this dataset. Without further ado. To test with we created a table:
Continue reading "Contiguous Ranges of primary keys: a more cross platform and faster approach"
Sunday, April 15. 2012
As promised in our prior article: ODBC Foreign Data wrapper on windows, we'll demonstrate how to query SQL Server using the Foreign Data Wrapper. This we are testing on windows. As far as querying SQL Server / PostgreSQL goes, the Foreign Data Wrapper still lacks many features that the SQL Server Linked Server approach provides. The key ones we find currently lacking: ability to do updates and reference a table directly from server without knowing underlying structure. That said the Foreign data Wrapper approach has possiblity to support a lot more data sources with ease. We'll demonstrate in subsequent articles using the www_fdw to query web services which we've been playing a lot with and the often packaged in file_fdw. Enough of that let's start with a concrete example.
Warning, this is not production ready, but seems like a very promising start and with more testing can become very robust. Although we are demonstrating odbc_fdw on windows, it is supported on Unix via the UnixODBC, but the data sources you can query will probably be different. I'm really looking forward to how the FDW technology in PostgreSQL will push the envelop. I've been playing around with the www_fdw as well and been impressed how easily it is to query webservices with SQL. A very ah-hah moment.
Continue reading "ODBC Foreign Data wrapper to query SQL Server on Window - Part 2"
Sunday, December 11. 2011
Recommended Books: SQL and Relational Theory: How to write accurate SQL code SQL Pocket guide
In our article The Pure Relational database is dead there were a lot of misunderstandings as a result of our poor choice of words. People thought we were bashing the relational model because in their mind that was what pure meant. I got hit with a lot of poetic insults. I still can't think of an alternative word to use for what I meant. Simple doesn't really do it as even relational databases with just standard types were far from simple when you consider the planner and all the other stuff going on under the hood to protect you from the underlying storage structure. What I was trying to say is that in the beginning most relational databases just supported a standard set of types which you could not expand on and most people when they think relational today still think just that. That type of relational database is in my book dead or almost dead.
How did this all start. Well whenever we use something like PostgreSQL to store anything complex -- take your pick: geometry data, tree like structures which we use ltree for, full-text query constructs, and Yes XML we get bashed by some know-it-all who has a very narrow view of what a relational database should be doing and suggesting we use a NoSQL database, a graph engine or a full text engine or normalize our data more. I have also learned XML is a dirty word to many people. I mistakenly thought XML was a complex type people could relate to, but turns out they can relate to it so well that it brings up tragic memories I can only equate to Post Traumatic Stress Disorder suffered by war veterans or (early or wrong) technology adopters. That was not my intent either. XML was just merely an example. I will not say you should use XML in your tables, but I will also not say you should stay clear of it as many people wanted me to say. I will say its use is rare, but it has its place. It has its place just as any other complex type and it has its own special needs for navigation, indexing etc. which many relational databases handle fine enough.
Continue reading "The Relational Model is very much alive"
Saturday, December 03. 2011
A lot of redditers took offense at our article XPathing XML data with PostgreSQL with the general consensus, if you are going to be stuffing XML in a relational database where will you stop? That is not what relational databases are designed for. We had comitted a sacrilegious sin and worsed yet encouraging bad habits by forcing people to think more about different options they have for storing data in a relational database and god forbid demonstrating querying such columns with xml specific functions. What were we thinking? How dare we try to query XML data with SQL? Perhaps we were thinking like this guy or this guy, both equally misguided spatial relational database folk. Of course we stepped one foot further by actually defining a column as xml and dare storing data in it for later consumption rather than just an intermediary step.
If I want to store documents, that are navigateable I should be using a document database like MongoDb, CouchDB etc designed for that kind of stuff. If I've got graphs I should be using a graph database. This got me thinking that the "Pure Relational Database" is dead, and I'm surprised most people don't seem to realize it.
So while "Relational databases" have changed over the last 25 years, most people's notions of them have not kept up with the pace of its change.
First let me define what I mean by Pure. A pure relational database is one with standard meat and potato types like text, dates, numbers well suited for counting money and computing how close the world is to total bankruptcy which you store as fields in a row of a table and that you then define foreign keys / constraints / primary keys on to relate them to other tables. You reconstitute your real world objects by stitching these tables together with joins etc and return sets using where conditions, summarize by using group bys or other mathy like constructs. Don't get me wrong; these are very beautiful things because they allow for easy slicing of dimensions and not having to think about all the dimensions that make up an object all at once. In fact it was so beautiful that some people thought, "wow that's cool, but it would be even cooler if I could store more complex objects in those columns with their own specific needs for querying." and so was born the object relational database as some people refer to them that thought relational but also understood that different types had their own unique needs for querying, storage, indexing etc.
Nowadays most, if not all, relational like databases have standardized on some variant of SQL. In essence though, the pure relational database doesn't allow you to define new types or have exotic types such as arrays, xml, graphs, geometries, rasters, sparse matrices etc. Much less thinking involved and less likely you will shoot yourself in the foot by dumping a bunch of xml in a field and trying to do something with it. When it is used to store more complex things such as spreadsheets and other user documents, these are stored as blobs and just retrieved. Even such use is frowned upon.
Well most relational databases I can think of nowadays have richer types: e.g. PostgreSQL, Oracle and Firebird all support arrays as a column type. Some even allow you to define custom types and functions to support your custom types e.g. PostgreSQL (I could go on forever), Oracle has rich user defined type support too, and SQL Server 2005+ with each version getting better and better for user defined custom types and introducing more exotic types and support infrastructure. Even MySQL/Drizzle (mostly in the form of different storage engines). Even my favorite light-weight SQLite under the hood has some tricks that aren't what I would call relational. E.g. Spatialite/RasterLite has a whole geometry type library built on SQLite with functions you can call from SQL and I'm sure there are lots of middleware tools you don't know about using the SQLite and Firebird engine for more than relational tasks (e.g. HTML5 anyone/ CAD anyone).
Monday, September 26. 2011
Recommended Books: PostgreSQL 9 Admin Cookbook
UPDATE Turns out there is a simpler way of getting rid of roles that have explicit permissions to objects as Tom Lane pointed out in the comments.
Will drop the permissions to objects a user has rights to even if they don't own the objects. Of course this needs to be applied with caution since it will drop tables
and other things you don't want necessarily dropped. So it is best to first run a:
One of the things that is still tricky in PostgreSQL is permission management. Even though 9.0 brought us default privileges and the like, these permissions aren't retroactive so still a pain to deal with if you already have objects defined in your database.
One of the annoyances we come across with is deleting roles. Lets say you have a role and it has explicit permissions to an object. PostgreSQL won't allow you to delete this role if it owns objects or has explicit permissions to objects. In order to delete it seems you have to go in and clear out all those permissions. To help with that -- we wrote a quickie script that will generate a script to revoke all permissions on objects for a specific role. It looks like this:
Continue reading "Bulk Revoke of Permissions for Specific Group/User role"
Sunday, September 04. 2011
We've been working on converting some of our SQL Server apps to PostgreSQL. In this article we'll describe some things to watch out for and provide a function we wrote to automate some of the conversion.
Although both databases are fairly ANSI-SQL compliant, there are still differences with their CREATE TABLE statements, data types, and how they handle other things that makes porting applications not so trivial.
Continue reading "SQL Server to PostgreSQL: Converting table structure"
Friday, June 03. 2011
PostgreSQL 8.4 introduced the ability to create user-defined variadic functions. These are basically functions that take as input an undefined number of arguments where the argument that is an undefined number are all of the same type and are the last input arguments. Depesz went over it two years ago in Waiting for 8.4 variadic functions, so we are a bit late to the party. In a nutshell -- variadic functions are syntactic sugar for functions that would otherwise take arrays. In this article we'll provide some more demonstrations of them to supplement Depesz article.
I was reminded that I had never explored this feature, when recently documenting one of the new PostGIS 2.0 Raster functions - ST_Reclass which employs this feature. I think ST_Reclass is a superb function and one of my favorite raster functions thus far that I hope to put to good use soon. Our new PostGIS family member,Bborie Park, is running thru our PostGIS Raster milestones much faster than I had dreamed. He's already implemented a good chunk of stuff we discussed in Chapter 13 - PostGIS Raster and had stated you probably won't see in PostGIS 2.0. He's going a bit faster than I can catalog them, so the documentation is already embarrassingly behind the fantastic functionality that is already present in PostGIS 2.0.
Continue reading "Variadic Functions in PostgreSQL"
Tuesday, May 10. 2011
Recommended Books: PostGIS in Action PostgreSQL 9.0 Manual - Volume 1A: SQL Reference PostgreSQL 9.0 Reference - Volume 1B: SQL Command Reference
PostgreSQL 9 High Performance / Admin Cookbook combo
What is the difference between CURRENT_TIMESTAMP and clock_timestamp()Answer:
CURRENT_TIMESTAMP is an ANSI-SQL Standard variable you will find in many relational databases including PostgreSQL, SQL Server, Firebird, IBM DB2 and MySQL to name a few that records the start of the transaction. The important thing to keep in mind about it is there is only one entry per transaction so if you have a long running transaction, you won't be seeing it changing as you go along.
clock_timestamp() is a PostgreSQL function that always returns the current clock's timestamp. I don't think I'm alone in using it for doing simple benchmarking and other things where for example I need to record the timings of each part of a function within the function using pedestrian RAISE NOTICE debug print statements.
There is another cool way I like using it, and that is for a batch of records each with an expensive function call, benchmarking how long it takes to process each record. One of the things I'm working on is improving the speed of the tiger_geocoder packaged in PostGIS 2.0. The first root of attack seemed to me would be the normalize_address function which I was noticing was taking anywhere from 10% to 50% of my time in the geocode process. That's a ton of time if you are trying to batch geocode a ton of records. The thing is the function is very particular to how badly formed the address is so a whole batch could be held up by one bad apple and since the batch doesn't return until all are processed, it makes the whole thing seem to take a while.
So rather than looping thru each, I thought it would be cool if I could run the batch, but for each record have it tell me how long it took to process relative to the rest so I could get a sense of what a problem address looks like. So I wrote this query:
Which returned an output something like this:
address_1 | city | state | zip | pp_addr | the_time | process_time | diff_from_start ------------------+------------+-------+------- +-------------------------------------------+--------------+------------------ 48 MAIN ST .. | S.. | MA | 021.. | 48 MAIN .. | 2011-05-10 03:24:43.078-04 | 00:00:00.032 | 00:00:00.032 15 ... | | MA | 018... | 15 GREN... | 2011-05-10 03:24:50.796-04 | 00:00:00.031 | 00:00:07.75
(Page 1 of 6, totaling 83 entries) » next page
Syndicate This Blog
Show tagged entries