
PostGIS in Action
Book Store
About the Authors
Consulting
PostgreSQL
PostGIS
Wednesday, January 25. 2012True or False every which wayRecommended Books: PostgreSQL 9.0 Volume 1 SQL Guide PostgreSQL has aggregate functions called bool_and and bool_or which it's had for as far back as I can remember. What do they do? given rows of premises (things that resolve to booleans), bool_and will return true if all of the premises are true. Similarly bool_or will return true if any of the premises in the set of rows is true. What if however your boolean expressions are not in rows, but instead passed in as a sequence of arbitrary statements of questionable fact. We want a function like bool_or or bool_and that takes an arbitrary number of boolean arguments. Are there functions that fit the bill. Indeed there are, but they don't scream out and say I work with booleans because they fit into a class of function we discussed in The wonders of Any Element and that also happen to be variadic functions. These are none other than greatest and least and they are old timer functions that you can find in most versions of PostgreSQL. We'll demonstrate how to use all 4 with booleans in this article. It must be said that greatest and least are much more useful when applied to other data types like dates and numbers, but we were amused at the parallel with booleans. Side note: we've started to write our book on PostgreSQL that will be published by O'Reilly. We hope to finish this book within the next 5-7 months but have preliminary e-Book drafts before then for early purchasers to scan.
The focus of the book will be PostgreSQL 9.1 with some highlights the upcoming PostgreSQL 9.2. Of course oldie but goodie topics are in the book too. It's a thrill to be writing again.
Continue reading "True or False every which way" Monday, January 16. 2012Table Inheritance and the tableoidIf I could name a number one feature I love most about PostgreSQL, it's the table inheritance feature which we described in How to Inherit and Uninherit. A lot of people use it for table partitioning using CONSTRAINT EXCLUSION. Aside from that, in combination with PostgreSQL schema search_path (customizable by user and/or database) it makes for a very flexible abstraction tool. For example, for many of our web apps that service many departments where each department/client wants to keep a high level of autonomy, we have a schema set aside for each that inherits from a master template schema. Each department site uses a different set of accounts with the primary schema being that of the department/client so that they are hitting their own tables. Inheritance allows us to keep data separate,do roll-up reports if we need to, use the same application front-end, and yet allows us the ability to add new columns in just one place (the master template schema). It is more flexible than other approaches because for example we may have a city organization that need to share tables, like for example a system loaded list of funding source shared across the agency. We can set aside these shared tables in a separate schema visible to all or have some have their own copy they can change if they don't want to use the shared one. Every once in a while, we find ourselves needing to query the whole hierarchy and needing to know which table the results of the query are coming from. To help solve that issue, we employ the use of the system column tableoid which all user tables have. The tableoid is the the object id of a table. PostgreSQL has many system columns that you have to explicitly select and can't be accessed with a SELECT * with the tableoid being one of them. These are: tableoid, cmax,cmin, xmin,xmax,ctid which are all described in System Columns. The PostgreSQL docs on inheritance have examples of using it, but we thought it worthwile to repeat the exercise since it's not that common knowledge and is unique enough feature of PostgreSQL that others coming from other relational databases, may miss the treat. I've often demonstrated it to non-PostgreSQL users who use for example SQL Server or MySQL, and they literally fall out of their chair when I show the feature to them and its endless possibilities. Continue reading "Table Inheritance and the tableoid"
Posted by Leo Hsu and Regina Obe
in 8.2, 8.3, 8.4, 9.0, 9.1, 9.2, beginner, q&a
at
05:52
| Comments (2)
| Trackbacks (0)
Monday, November 21. 2011How big is my database and my other stuffRecommended Books: PostgreSQL 9.0 SQL reference Volume 1A SQL Reference Volume 1B In this day and age of clouds and what not, it is still important to keep tabs on your database growth to ensure it doesn't outgrow your disks or so that you can be prepared when it does. A common question that is asked is how big your database is or all your databases combined. Luckily PostgreSQL has had for a long time functions to help you answer this easily. In this little article, I'll demonstrate how to answer these questions and more with some PostgreSQL built-in functions Continue reading "How big is my database and my other stuff"
Posted by Leo Hsu and Regina Obe
in beginner, q&a
at
01:59
| Comment (1)
| Trackbacks (0)
Defined tags for this entry: postgresql
Thursday, November 03. 2011What objects are in a PostgreSQL installed extensionAs mentioned in Lessons Learned Packaging PostGIS extensions, I am working on PostGIS 2.0.0 extension packaging. One of the things I wanted to know was what objects, types, functions etc were installed by my extension. The new packaging system allows for cataloging this relatively easily, but I couldn't find a function or view for this and didn't see one mentioned in the manual, so I created this query which seems to work pretty well as far as I can tell. The basic idea being that any object that an extension depends on that is not an extension is part of the extension package.
The output looks like: Continue reading "What objects are in a PostgreSQL installed extension"
Posted by Leo Hsu and Regina Obe
in 9.1, intermediate, postgis, q&a
at
16:55
| Comments (3)
| Trackbacks (0)
Defined tags for this entry: extensions, postgis
Tuesday, November 01. 2011How to create an n-column table really fastRecommended Books: PostgreSQL 9.0 SQL Reference 1A PostgreSQL 9.0 SQL Reference 1B Ever have the need to create a holding table say spreadsheet data with say 100 columns. You need to create a table to hold this stuff. Or perhaps you were feeling in a sadist mood and wanted to abuse your PostgreSQL database to see how many columns you can create in a table of a specific data type. Here is a quick script to do it:
Both variants will return output that looks like this: CREATE TABLE data_import(field1 varchar(255),field2 varchar(255),field3 varchar(255),field4 varchar(255)
,field5 varchar(255),field6 varchar(255),field7 varchar(255)
,field8 varchar(255),field9 varchar(255),field10 varchar(255));
Now if you want it to also execute because you are running it as part of an sql script, you could wrap it in an anonymous function.
Posted by Leo Hsu and Regina Obe
in 8.4, 9.0, 9.1, 9.2, beginner, postgresql versions, q&a
at
22:31
| Comments (0)
| Trackbacks (0)
Thursday, October 27. 2011PostgreSQL Array: The ANY and Contains trickRecommended Books: PostgreSQL 9.0 SQL Reference 1A PostgreSQL 9.0 SQL Reference 1B One of the main features I love about PostgreSQL is its array support. This is a feature you won't find in most relational databases, and even databases that support some variant of it, don't allow you to use it as easily. It is one of the features that makes building aggregate functions wicked easy in PostgreSQL with no messy compiling required. Aside from building aggregate functions, it has some other common day uses. In this article, I'll cover two common ways we use them which I will refer to as the ANY and Contains tricks. I like to think of this approach as YeSQL programming style: how SQL can be augmented by more complex data types and index retrieval mechanisms. Arrays and many other data types (spatial types, keyvalue (hstore), ltree etc) are far from relational structures, yet we can query them easily with SQL and can even relate them. Continue reading "PostgreSQL Array: The ANY and Contains trick" Monday, September 26. 2011Bulk Revoke of Permissions for Specific Group/User roleRecommended Books: PostgreSQL 9 Admin Cookbook UPDATE Turns out there is a simpler way of getting rid of roles that have explicit permissions to objects as Tom Lane pointed out in the comments.
Will drop the permissions to objects a user has rights to even if they don't own the objects. Of course this needs to be applied with caution since it will drop tables
and other things you don't want necessarily dropped. So it is best to first run a:
One of the things that is still tricky in PostgreSQL is permission management. Even though 9.0 brought us default privileges and the like, these permissions aren't retroactive so still a pain to deal with if you already have objects defined in your database. One of the annoyances we come across with is deleting roles. Lets say you have a role and it has explicit permissions to an object. PostgreSQL won't allow you to delete this role if it owns objects or has explicit permissions to objects. In order to delete it seems you have to go in and clear out all those permissions. To help with that -- we wrote a quickie script that will generate a script to revoke all permissions on objects for a specific role. It looks like this: Continue reading "Bulk Revoke of Permissions for Specific Group/User role"
Posted by Leo Hsu and Regina Obe
in 8.4, 9.0, 9.1, intermediate, mysql, q&a, sql server
at
00:11
| Comments (3)
| Trackbacks (0)
Sunday, July 31. 2011Querying table, view, column and function descriptionsRecommended Books: PostGIS in Action
PostgreSQL 9.0 Volume 1 SQL Reference One of the biggest complaints about software and perhaps even more specifically FLOSS software is the lack of documentation. I'm not talking about those small little dialogs we throw in our code that rarely see the light of day. I'm talking about stuff you throw in user docs or specifications and so forth that an analyst or end-user reads. The main reasons for this lack of documentation is that while everyone seems to want documentation, few really want to pay for it, and it's time consuming to keep documentation in synch with what the software actually does. Having documentation that is very inconsistent with the software is worse than not having any documentation at all. A good way to keep documentation up to date is to fold it into the process of developing and testing the software such that changes in software expose errors in the documentation and vice-versa or that the act of changing the software corrects the documentation. Part of the way we try to do that on the PostGIS project is to require each function patch to include documentation. We also try to make our documentation executable by auto-generating PostgreSQL help and some test suites from the documentation. For example the process of installing documentation-generated function and type descriptions in PostgreSQL catches inconsistencies between the implementation and what we actually publish about how the functions work. Even the image generator that builds most of the images in the manual is built with PostGIS as a core of it so that when our build-bot is rebuilding the documentation it is exercising some PostGIS output functions. I can't tell you how many times I've flagged changes in the current PostGIS 2.0 release simply by trying to install the docbook generated PostgreSQL comment descriptions and PostgreSQL complains that the function signature no longer exists that the documentations says should be there. So then I go back and correct the documentation or yell at the programmer if the documentation makes more sense than what they coded. On the other side, its easy to catch what we neglected to document simply by scanning the functions in PostgreSQL and seeing which ones don't have descriptions. Okay as far as databases goes, many a DB Programmer/Analyst has waltzed into a project only to be befuddled about the meanings of all these tables used by the application. If you are like us, you don't even want to look at any database documentation that is not part of the definition of the database because you know 99% of the time it's so obsolete or was just a pipe dream of someone working in a vacuum that its more useless than not having any documentation at all. It is nice to have nicely formatted documentation you can read separate from the code, but even sweeter if you can query the documentation just as easily as you can query the data. A good way of having up to date documentation is to weave it in as part of the process of defining the structure. For databases this means using foreign keys, primary keys, and using the commenting features that most relational databases offer these days. For example the databases we commonly work with, PostgreSQL, SQL Server, MS Access, and even MySQL all allow you to provide descriptions for tables, table columns, and sometimes other objects such as functions and stored procs right in the database. PostgreSQL even allows you to provide descriptions of columns in views though that's a bit messier to do. Sadly there isn't a consistent way of pulling these descriptions out of the database that will work for all of these. Each has differently defined meta tables it stores these descriptions in. For thise article, we'll demonstrate how to pull this information from PostgreSQL. PostgreSQL does offer many useful switches in psql for querying this data, but we'll focus our attention on pulling this data via SQL. It's much easier to incorporate this information in auto-generated documentation with SQL because you can have more control what you include and how to format it. Continue reading "Querying table, view, column and function descriptions"
Posted by Leo Hsu and Regina Obe
in beginner, postgis, q&a
at
05:07
| Comments (4)
| Trackbacks (0)
Defined tags for this entry: postgis documentation
Sunday, May 22. 2011Manually setting table column statisticsRecommended Books: PostgreSQL 9.0 Volume 1 SQL Reference PostgreSQL 9 High Performance PostGIS in Action Question: How do you deal with bad stat counts?You have a large table and the default planner stats collector underestimates distinct counts of a critical query column thus resulting in much less than optimal query plans. How can you manually set this? PostgreSQL 9.0 introduced ability to set two settings on table columns: n_distinct and n_distinct_inherited which are described a bit in ALTER TABLE help. The n_distinct is the estimated number of distinct values for that column with -1 or any negative number representing a percentage of estimated table count instead of a true count. n_distinct_inherited is a setting useful for parent tables that denotes the estimated distinct count sum of all a parent's child tables. By tweaking these settings when they are less than optimal, you can influence the query planner to produce better plans. Why this is necessary is mostly for large tables where the stat collector will not query the whole table to determine stats. The stats collector generally queries at most 10-30% of a table. Determine If you need to set countsIt's always nice to have the stat collector do all these things for you especially if you have a table that is constantly updated and distinct counts can fluctuate a lot. For static tables you may just want to set them manually. So how do you know whether you should bother or not. Well you can check the current values the stats collector has with this query:
You would then compare with your actuals
Will give you the current count. Setting n_distinct and n_distinct_inheritedYou may want to bump this up or down when you set the value. Next to set the column distinct count stats you would do something like below replacing 50 with the count you computed:
Posted by Leo Hsu and Regina Obe
in 9.0, 9.1, intermediate, postgresql versions, q&a
at
10:06
| Comments (0)
| Trackbacks (0)
Tuesday, May 10. 2011Difference Between CURRENT_TIMESTAMP and clock_timestamp() and how to exploit themRecommended Books: PostGIS in Action PostgreSQL 9.0 Manual - Volume 1A: SQL Reference
PostgreSQL 9.0 Reference - Volume 1B: SQL Command Reference PostgreSQL 9 High Performance / Admin Cookbook combo
Question:
What is the difference between CURRENT_TIMESTAMP and clock_timestamp() Answer:CURRENT_TIMESTAMP is an ANSI-SQL Standard variable you will find in many relational databases including PostgreSQL, SQL Server, Firebird, IBM DB2 and MySQL to name a few that records the start of the transaction. The important thing to keep in mind about it is there is only one entry per transaction so if you have a long running transaction, you won't be seeing it changing as you go along. clock_timestamp() is a PostgreSQL function that always returns the current clock's timestamp. I don't think I'm alone in using it for doing simple benchmarking and other things where for example I need to record the timings of each part of a function within the function using pedestrian RAISE NOTICE debug print statements. There is another cool way I like using it, and that is for a batch of records each with an expensive function call, benchmarking how long it takes to process each record. One of the things I'm working on is improving the speed of the tiger_geocoder packaged in PostGIS 2.0. The first root of attack seemed to me would be the normalize_address function which I was noticing was taking anywhere from 10% to 50% of my time in the geocode process. That's a ton of time if you are trying to batch geocode a ton of records. The thing is the function is very particular to how badly formed the address is so a whole batch could be held up by one bad apple and since the batch doesn't return until all are processed, it makes the whole thing seem to take a while. So rather than looping thru each, I thought it would be cool if I could run the batch, but for each record have it tell me how long it took to process relative to the rest so I could get a sense of what a problem address looks like. So I wrote this query:
Which returned an output something like this:
address_1 | city | state | zip | pp_addr | the_time | process_time | diff_from_start
------------------+------------+-------+------- +-------------------------------------------+--------------+------------------
48 MAIN ST .. | S.. | MA | 021.. | 48 MAIN .. | 2011-05-10 03:24:43.078-04 | 00:00:00.032 | 00:00:00.032
15 ... | | MA | 018... | 15 GREN... | 2011-05-10 03:24:50.796-04 | 00:00:00.031 | 00:00:07.75
Posted by Leo Hsu and Regina Obe
in 8.4, 9.0, 9.1, cte, db2, firebird, intermediate, mysql, postgresql versions, q&a, sql server, window functions
at
04:23
| Comments (0)
| Trackbacks (0)
Saturday, February 19. 2011Conditional Uniqueness with Partial IndexesRecommended Books: PostgreSQL 9.0 SQL Language Official Reference PostGIS in Action
Question
You have a system of products and categories and you want a product to be allowed to be in multiple categories, but you want a product to only be allowed to be in one main category. How do you enforce this rule in the database? Some people will say -- why can't you just deal with this in your application logic. Our general reason is that much of our updating doesn't happen at our application level. We like enforcing rules at the database level because it saves us from ourselves. We are in the business of massaging data. For this particular kind of example we wanted to make sure the database would provide us a nice safety net so that we wouldn't accidentally assign a product in two main categories. AnswerThere are two approaches we thought of. One is the obvious have a primary category column and a bridge table that has secondary categories. That is an ugly solution because when you do a query you have to do a union and always treat the secondary categories as different from the main. For most use-cases we don't usually care about distinguisihing primary from secondary category. The solution we finally settled on was to have one bridge table with a boolean field for if its the main category. We enforce the only one main category requirement using a partial index. Now not all databases support partial indexes This is one major value of using PostgreSQL that you have so many more options for implementing logic. As some people noted in the comments and the reddit entry. SQL Server 2008 has a similar feature called Filtered Indexes. Though PostgreSQL has had partial indexes for as far back as I can remember. This brings up an interesting point which I have observed -- if you were using PostgreSQL before, you would already know how to use the Filtered Indexes, Multi row inserts introduced in SQL Server 2008, and the SEQUENCES feature coming in SQL Server 2010. So we should all use PostgreSQL, because it teaches us how to use the newer versions of SQL Server before they come out. :)
So how does the partial index solution look: NOTE for simplicity, we are leaving out all the complimentary tables and the foreign key constraints that we also have in place.
Testing it out. It saves us and gives us a nice informative message to boot.
Posted by Leo Hsu and Regina Obe
in 8.2, 8.3, 8.4, 9.0, 9.1, beginner, postgresql versions, q&a, sql server
at
23:12
| Comments (3)
| Trackbacks (0)
Tuesday, January 18. 2011Reverse String Aggregation: Explode concatenated data into separate rows In PostgreSQL, SQL Server and MySQLRecommended Books: PostgreSQL 9 SQL Reference Manual SQL Server MVP Deep Dives High Performance MySQL In our last article we talked about String Aggregation implementing in PostgreSQL, SQL Server, and MySQL. This task is one that makes purist relational database programmers a bit squeamish. In this article we'll talk about the reverse of that, how do you deal with data that someone hands you delimeted in a single field and that you are asked to explode or re-sort based on some lookup table. What are the benefits of having a structure such as? : Well for the casual programmer or simple text file database that knows nothing about JOINS and so forth, it makes it simple to pull a list of people who like Tumbling.
You simply do a WHERE ';' || activities || ';' LIKE '%;Tumbling;%'. It's great for security too because you can determine security with a simple like check and also list all the security groups a member belongs in without doing anything.
Quite easy for even the least data-skilled of programmers to work with because most procedural languages have a split function that can easily parse these into an array useful for stuffing into drop down lists and so forth. As a consultant of semi-techie people
I'm often torn by the dilemma of "What is the way I would program for myself vs. the way that provides the most autonomy to the client". By that I mean
for example I try to avoid heavy-weight things like Wizards that add additional bloated dependencies or slow the speed down of an application. These bloated dependencies may provide ease to the client but make my debugging life harder. So I weight the options
and figure out which way works now and also provides me an easy escape route should things like speed or complexity become more of an issue.
This brings us to the topic of, what is wrong with this model? It can be slow because the LIKE condition you have can't easily take advantage of an index unless using a full text index so not ideal where this is the primary filtering factor. It's also prone to pollution because you can't easily validate that the values in the field are in your valid set of lookups or if your lookup changes, the text can be forced to change with a CASCADE UPDATE/DELETE RULE etc. In cases where this is of minor consequence which is many if referential integrity is not high on your list of requirements, this design is not bad. It might make a purist throw up but oh well there is always dramamine to fall back on. As long as you have done your cost benefit analysis, I don't think there should be any shame of following this less than respected route. While you may despise this model, it has its place and it's a fact of life that one day someone will hand it to you and you may need to flip it around a bit. We shall demonstrate how to do that in this article. Continue reading "Reverse String Aggregation: Explode concatenated data into separate rows In PostgreSQL, SQL Server and MySQL"
Posted by Leo Hsu and Regina Obe
in beginner, mysql, postgresql versions, q&a, sql server
at
18:50
| Comments (4)
| Trackbacks (0)
Friday, December 24. 2010String Aggregation in PostgreSQL, SQL Server, and MySQLRecommended Books: PostgreSQL 9 SQL Reference Manual SQL Server MVP Deep Dives High Performance MySQL Question: You have a table of people and a table that specifies the activities each person is involved in. You want to return a result that has one record per person and a column that has a listing of activities for each person separated by semicolons and alphabetically sorted by activity. You also want the whole set alphabetically sorted by person's name. This is a question we are always asked and since we mentor on various flavors of databases, we need to be able to switch gears and provide an answer that works on the client's database. Most often the additional requirement is that you can't install new functions in the database. This means that for PostgreSQL/SQL Server that both support defining custom aggregates, that is out as an option. Normally we try to come up with an answer that works in most databases, but sadly the only solution that works in most is to push the problem off to the client front end and throw up your hands and proclaim -- "This ain't something that should be done in the database and is a reporting problem." That is in fact what many database purists do, and all I can say to them is wake up and smell the coffee before you are out of a job. We feel that data transformation is an important function of a database, and if your database is incapable of massaging the data into a format your various client apps can easily digest, WELL THAT's A PROBLEM. We shall now document this answer rather than trying to answer for the nteenth time. For starter's PostgreSQL has a lot of answers to this question, probably more so than any other, though some are easier to execute than others and many depend on the version of PostgreSQL you are using. SQL Server has 2 classes of answers neither of which is terribly appealing, but we'll go over the ones that don't require you to be able to install .NET stored functions in your database since we said that is often a requirement. MySQL has a fairly simple, elegant and very portable way that it has had for a really long time. Continue reading "String Aggregation in PostgreSQL, SQL Server, and MySQL"
Posted by Leo Hsu and Regina Obe
in 8.2, 8.3, 8.4, 9.0, cte, db2, intermediate, mysql, oracle, postgresql versions, q&a, sql server, window functions
at
11:24
| Comments (14)
| Trackbacks (0)
Defined tags for this entry: common table expressions, mysql, oracle, postgresql 9.0, sql server, string concatenation, window functions
Friday, November 05. 2010greatest and least - oldies but goodiesRecommended Books: PostgreSQL 8.4 Official The SQL Language
MySQL Administrator's Bible SQL Server MVP Deep Dives Oracle 11G Essentials Problem: You have a set of numbers, or characters or whatever and you are trying to find the max or min of this set?If the values are separate records in a table or query, the answer is well known and respected across all relational databases -- use the aggregate MAX and MIN functions. But what if you have a set of free wheeling numbers or text not in separate records, and you want the max or min of each. Here is where the least and greatest functions come in handy. PostgreSQL has had these functions for as far back as I can remember and is not the only database to sport these marvelous functions. Our beloved MySQL and Oracle database have these functions as well. Sadly our more beloved SQL Server even in the SQL Server 2008 variant - lacks these functions. Okay how to use these functions -- you use it like this:
We would classify these functions along the lines of COALESCE. They are like COALESCE because they take an arbitrary number of arguments and the datatype that is returned is highest datatype that all arguments in the function can be autocast to. If there is no autocast then well you get an error. To demonstrate, guess what happens when you do this: SELECT least(-1, 'Kitty');
Well do this in PostgreSQL at least in 8.3+, you get a nice slap if you haven't installed any deprecated autocasts: ERROR: invalid input syntax for integer: "Kitty" LINE 1: SELECT least('Kitty', -1)Do this in MySQL - so friendly and forgiving, and great reader of minds and you get: -1
OF COURSE I apologize for the ambiguous sarcasm, its just sometimes I want my mind read and sometimes I don't and I just can't figure out whether today is one of those days or the other day.
Posted by Leo Hsu and Regina Obe
in 8.2, beginner, mysql, oracle, other dbms, q&a, sql server
at
01:50
| Comments (0)
| Trackbacks (0)
Thursday, August 12. 2010Starting PostgreSQL in windows without installRecommended Books: Windows Command Line Pocket Consultant Windows scripting Learning Guide This is a question that comes up quite often by windows users, so thought we would share how we normally do it. The question is can you run a PostgreSQL server on your windows desktop/server box without having to install anything? The answer is yes and quite easily. Why would you need to do this. There are a couple of cases -- one you are developing a single user app that you want users to be able to run from anywhere without having to install it first. The other common reason is, you aren't allowed to install anything on a user's pc and you also want to package along a database you already have created. For our purposes, many of our developers develop on portable WAMP like things, and for some of our applications, they need to work in both MySQL and PostgreSQL, so we need an easy way during development to swap one out for the other. Continue reading "Starting PostgreSQL in windows without install"
Posted by Leo Hsu and Regina Obe
in 8.2, 8.3, 8.4, 9.0, beginner, q&a
at
19:25
| Comments (20)
| Trackbacks (0)
(Page 1 of 4, totaling 50 entries)
» next page
|
QuicksearchCalendar
CategoriesArchivesSyndicate This BlogBlog AdministrationShow tagged entries |
|||||||||||||||||||||||||||||||||||||||||||||||||
