Sunday, September 20. 2009
Printer Friendly
A while ago we wrote about DZone RefCards cheatsheets and how its a shame there isn't one for PostgreSQL. They are a very attractive and useful vehicle for learning and brushing up on the most important pieces
of a piece of software or framework. Since that time we have been diligently working on one for PostgreSQL to fill the missing PostgreSQL slot. The fruits of
our labor are finally out, and a bit quicker than we expected. The cheatsheet covers both old features and new features introduced in PostgreSQL 8.4. We hope its useful to many old and new PostgreSQL users.
The Essential PostgreSQL Refcard can be downloaded from Essential PostgreSQL http://refcardz.dzone.com/refcardz/essential-postgresql?oid=hom12841
Thursday, July 30. 2009
Printer Friendly
PostgreSQL has supported what are called Out (output) parameters since version 8.1.
We were surprised it has been that long since we always thought of it as a feature from 8.2+
until it recently came up for discussion on PostGIS newsgroup and we decided to investigate how
long it has been supported.
What are OUT parameters? These are parameters you define as part of the function argument
list that get returned back as part of the result. When you create functions, the arguments
are defaulted to IN parameters when not explicitly specified (which means they are passed in and not returned) which
is why you sometimes see PgAdmin do something like IN somevariable variabletype when you use
the function wizard.
You can have INOUT parameters as well which are function inputs that both get passed in, can be modified
by the function and also get returned.
As a side note - In 8.4, PostgreSQL was enhanced to allow dynamic sql RETURN QUERY using RETURN QUERY EXECUTE syntax for plpgsql queries and also
allow set returning functions being called in the SELECT part for any pl language. In prior versions,
this was only a feature of PL functions written in SQL.
8.3 introduced RETURN query which required a static sql statement, but did make things a bit easier.
One of the common use cases for using OUT parameters is to be able to return multiple outputs from a function without having
to declare a PostgreSQL type as output of the function. In this article we shall cover all variants of this. We'll just focus on sql and plpgsql for this
discussion, since we are not sure to what extent other pl languages (if at all) support IN OUT.
Continue reading "Use of OUT and INOUT Parameters"
Tuesday, June 09. 2009
Printer Friendly
You'll often hear the term planner statistics thrown around by database geeks. Did you update your statistics. This lingo isn't even limited
to PostgreSQL, but is part and parcel to how most decent databases work. For example in PostgreSQL you do a vacuum analyze to update your planner statistics in addition
to cleaning up dead space. In SQL Server you do an UPDATE STATISTICS . In MySQL you do an
ANALYZE TABLE or a more invasive OPTIMIZE TABLE .
Normally all this "update your stats so your planner can be happy" is usually unnecessary unless
you just did a bulk load or a bulk delete or you are noticing your queries are suddenly slowing down. These stat things are generally updated behind the scenes by most databases
on an as needed basis.
What makes SQL really interesting and a bit different from procedural languages is that it is declarative (like functional and logical programming languages) and relies on the database planner to come up with strategies for navigating the data. Its strategy is not fixed as it is in procedural languages.
A big part of this strategy is decided on by the query planner which looks at distributions of data. Given different WHERE conditions for similar queries, it could come up with vastly different strategies if one value has a significantly
higher distribution in a table than another. This is also the mystery of why it sometimes refuses to use an index on a field because it has decided a table scan is more efficient and also why some people consider HINTS evil because they pollute the imperative nature of the language.
Continue reading "Planner Statistics"
Thursday, May 21. 2009
Printer Friendly
Someone asked me this recently and not playing with custom types much, I'm not sure this is
the best solution. Anyrate here it goes. Note this works for PostgreSQL 8.2 and above. Note sure about 8.1 etc.
I assume it probably does.
Let us say you created a custom type something like this:
CREATE TYPE my_type1 AS
(name varchar(150),
rotation_x double precision,
rotation_y double precision,
x_pos integer,
y_pos integer
);
Continue reading "Creating instance of custom type"
Sunday, April 26. 2009
Printer Friendly
Different Linux distros have their preferred place of where stuff goes and of course the default location on windows is completely different from that too. So there isn't really one default location where you can find PostgreSQL data cluster. Of course user's can pick their
locations as well. So what is a casual DBA supposed to do?
The pg_settings table
PostgreSQL has a convenient system table view called pg_settings that stores a lot of information. It stores the location of the data cluster, the pg_hbafile and other conf files.
In additon to that you can interogate it to find out information you will find in the postgresql.conf file. Why sift thru that postgresql.conf file (assuming you can already query your postgresql server) when you can find the answers you are looking
for with an SQL query?
Continue reading "Where is my data and other stuff"
Saturday, April 18. 2009
Printer Friendly
This question is one that has come up a number of times in PostGIS newsgroups worded in many different ways. The situation is that if you use a function a number of times
not changing the arguments that go into the function, PostgreSQL still insists on recalculating the value even when the function is marked IMMUTABLE. I have tested this on
8.2 and 8.3 with similarly awful results.
This issue is not so much a problem if function calculations are fast, but spatial function calculations relative to most other functions you will use are pretty
slow especially when dealing with large geometries. As a result your query could end up twice as slow. Even setting the costs of these functions to relatively high does not help the situation.
To demonstrate here is a non-PostGIS version of the issue that everyone should be able to run and demonstrates its not a PostGIS only issue.
Continue reading "How to force PostgreSQL to use a pre-calculated value"
Saturday, March 28. 2009
Printer Friendly
Even though others have blogged about this in the past and its well-documented in the docs, its a frequently enough asked question, that we thought we'd post it here again
with a couple of additional twists.
How to determine the size of a database on disk
SELECT pg_size_pretty(pg_database_size('somedatabase')) As fulldbsize;
How to determine the size of a database table on disk
NOTE: There are two functions in PostgreSQL - pg_relation_size and pg_total_relation_size. The pg_relation_size just measures the size of the actual table where
as the pg_total_relation_size includes both the table and all its toasted tables and indexes.
SELECT pg_size_pretty(pg_total_relation_size('someschema.sometable')) As fulltblsize,
pg_size_pretty(pg_relation_size('someschema.sometable')) As justthetblsize;
How to determine the size of a database schema
When you are doing a lot of data loading for later massaging to dump into your production tables, you will often generate junk tables in the process and worse yet other
people may generate junk tables period. As a general practice, we like to create a schema just for junk. The reason for that is that it is extremely easy to
exclude schemas from being backed up and to load individual schemas. So with this approach we create or move all tables we are not quite sure if they are needed or we only need temporarily
into a schema
called junk, scratch, possiblejunk or whatever you want to call the schema and exclude it from our backups.
Something of the form:
CREATE SCHEMA scratch;
ALTER TABLE someschema.joeyplaying SET SCHEMA scratch;
After saving the world from junk, you would like to know how much space you have saved your backups from backing up. So you would employ a query something of the form:
SELECT pg_size_pretty(pg_database_size('mycurrentdb')) As fullprod,
pg_size_pretty(CAST(pg_database_size('mycurrentdb') - (SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name) )
FROM information_schema.tables WHERE table_schema = 'scratch') As bigint)) As tobebackedup_size,
pg_size_pretty(CAST((SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name) )
FROM information_schema.tables
WHERE table_schema = 'scratch') As bigint) ) As junk_size;
Size of Geometries in PostGIS tables
PostGIS has a companion function for measuring geometry size which is useful when you want to get a sense of how much space your geometries are taking up on disk
for a set of records.
SELECT ST_Mem_Size(ST_GeomFromText('LINESTRING(220268 150415,220227 150505,220227 150406)'));
SELECT pg_size_pretty(CAST(SUM(ST_Mem_Size(the_geom)) As bigint) ) as totgeomsum
FROM sometable WHERE state = 'MA';
Friday, March 13. 2009
Printer Friendly
In our PLPython Part 4: PLPython meets aggregates we demonstrated the power of PostgreSQL in combination
with PLPython to create a sequence of ascii plots by using the power of aggregation. Our friend Simon Greener over at Spatial Db Advisor told me the example
was clever but ascii plots was so 70ish and ugly, why didn't I do SVG plots? He felt people being predominantly visual would be caught up in the ugliness of Ascii plots and miss the point.
At least Bob found them neat and can appreciate the elegance in simplicity, thanks Bob..
To make it up
to Simon, we shall demonstrate 2 features we left out in our prior article.
- Pretty SVG graphs -- so I don't have to hear more about SVG from Simon and how I'm caught in the 70s.
- Ability to plot a schedule -- schedule item, start date, end date using a multi-column aggregate function. Here is a good use for multi-column aggregates or at least I think so. (Yeh for Multi-column aggregates)
If you are into spatial processing in the database or how many ways you can play with XML particularly Oracle Locator, Oracle Spatial, PostGIS and SQL Server 2008, we highly recommend Simon's articles.
Continue reading "PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots"
Thursday, March 05. 2009
Printer Friendly
PostgreSQL 8.2 and above has this pretty neat feature of allowing you to define aggregate functions
that take more than one column as an input. First we'll start off with a rather pointless but easy to relate to example and then we'll follow up with something
a bit more interesting.
For more examples of creating aggregates in PostgreSQL, check out our other articles:
Continue reading "How to create multi-column aggregates"
Thursday, February 19. 2009
Printer Friendly
This article is a bit of a companion to our article on Setting up PostgreSQL as a Linked Server in Microsoft SQL Server 64-bit
In this article we shall demonstrate using Microsoft SQL Server 2005/2008 OPENQUERY AND OPENROWSET to add, delete and update data in PostgreSQL.
First we must start by saying there are a number of ways of copying data between databases. While OPENROWSET is not necessarily the fasted,
in certain cases such as when you are wrapping this in a stored procedure, it is one of the most convenient ways of doing this.
Why on earth would you want to copy data back and forth between 2 servers and 2 disparate DBMS systems for that matter?
We all would like to think we are an island and live in a world with one DBMS system, but we don't. There are many reasons for having multiple DBMS providers in
an organization. Some are better for some things than others, some are more integrated in an environment -- (for example in a windows shop the SQL Server drivers are already loaded on all
windows machines, but PostgreSQL provides the advantage of being able to run on more platforms such a FreeBSD/Unix/Linux box and with cheaper cost and more options for PL programming so is often better for a front-facing DMZ accessible database),
and there are numerous other reasons that are too hard to itemize. The other question of why triggering from SQL Server instead of PostgreSQL is because
its just a little easier from Microsoft SQL Server. The OPENROWSET and OPENQUERY logic that SQL Server provides is just simply better and easier to use than the dblink provided for PostgreSQL. Anyrate with that said lets move on with the show.
Although this example is focused on using PostgreSQL with Microsoft SQL Server, the same technique applies when
copying retrieving updating data from other databases such as MySQL or Oracle or DB II.
Continue reading "Using Microsoft SQL Server to Update PostgreSQL Data"
Thursday, February 05. 2009
Printer Friendly
In our August 2008/ September 2008 issue we demonstrated the power of PostgreSQL to create median and MS Access-like first and last aggregate functions in
SQL language. In this article we shall demonstrate how to create aggregates with Python. We shall
call this function agg_plot. What it will do is plot each grouping of data and return a plot for each grouping. The steps
we covered in those articles can be applied here.
For this part, we shall use the same library we discussed in PLPython Part 3: Using custom classes, pulling data from PostgreSQL.
Continue reading "PLPython Part 4: PLPython meets aggregates"
Sunday, January 04. 2009
Printer Friendly
Problem
You are developing a consultant search application where by a user looking for a consultant can check all the items in the list they require a consultant to have.
You have 2 tables. consultant and consultant_qual. The consultant_qual has primary key formed by 2 fields consultant_id and qual_id.
qual_id contains a constrained list
with things like 'OpenLayers', 'PostGIS', 'Python', 'C#' etc.
How do you devise a query that given a list of checked options by the user, only returns consultants with not 1 but all of those qualifications?
Continue reading "How to require all checked conditions are met by a result"
Saturday, January 03. 2009
Printer Friendly
In the first part of this series Fusion Charts and PostgreSQL Part 1: Database Analysis of USDA DB in our November/December 2008 issue, we did
some back-seat analysis of a database we had never seen before and formulated some thoughts of things that would be useful to
see in a dashboard as well as starting to develop some views to support our Dashboard. In the second part of our Fusion Charts series,
we covered creating a Dashboard application in ASP.NET that demonstrated both VB.NET and C# variants using the database we loaded and prepped in Part 1.
In this part three of our series, we shall conclude by demonstrating the same application we did in ASP.NET in PHP.
We are going to create a simple dashboard that has the following features:
- A drop-down list to allow the user to pick the kind of chart to display the data in (Bar, column, funnel etc)
- A drop-down list that allows the user to pick the metric to explore -- e.g. Cholestrol, Vitamin K, Caffeine etc.
- 2 charts -- one chart showing the top 5 food groups for our metric and another showing the top 5 foods for our metric
Our final product will look like this:
You can see the app in action - USDA Food Stats and discover some interesting things about the food you eat or were considering eating.
Continue reading "Fusion Charts and PostgreSQL Part 3: PHP Dashboard"
Friday, December 19. 2008
Printer Friendly
In the first part of this series Fusion Charts and PostgreSQL Part 1: Database Analysis of USDA DB in our November/December 2008 issue, we did
some back-seat analysis of a database we had never seen before and formulated some thoughts of things that would be useful to
see in a dashboard as well as starting to develop some views to support our Dashboard.
In this part, we start the fun off by building an ASP.NET app in both VB and C#. In the next part of
this series, we shall perform the same feat with PHP.
We are going to create a simple dashboard that has the following features:
- A drop downlist to allow the user to pick the kind of chart to display the data in (Bar, column, funnel etc)
- A drop downlist that allows the user to pick the metric to explore -- e.g. Cholestrol, Vitamin K, Caffeine etc.
- 2 charts -- one chart showing the top 5 food groups for our metric and another showing the top 5 foods for our metric
Our final product will look like this:
You can see the app in action - USDA Food Stats and discover some interesting things about the food you eat or were considering eating.
Continue reading "Fusion Charts and PostgreSQL Part 2: ASP.NET Dashboard"
Monday, November 17. 2008
Printer Friendly
Sometimes when you are testing or setting up a server or just porting things to another server, you just want to install the same set of users as you had before without restoring any databases.
In PostgreSQL, the users (Login Roles) and group roles are stored at the server level and only the permissions to objects are stored at the database level.
Question: How do you restore just the users and roles without having to do a full pg_dumpall of your server?
Continue reading "Backing up Login Roles aka Users and Group Roles"
|