Sunday, November 28. 2010
Continue reading "Quick Intro to R and PL/R - Part 1"
In this article we'll provide a summary of what PL/R is and how to get running with it. Since we don't like repeating ourselves,
we'll refer you to an article we wrote a while ago which is still fairly relevant today called Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot's Guide
and just fill in the parts that have changed. We should note that particular series was more geared toward the spatial database programmer (PostGIS in particular). There is a lot of overlap between the PL/R, R, and PostGIS user-base which is comprised
of many environmental scientists and researchers in need of powerful charting and stats tools to analyse their data who are high on the smart but low on the money human spectrum.
This series will be more of a general PL/R user perspective. We'll follow more of the same style we did with Quick Intro to PL/Python. We'll end our series with a PL/R cheatsheet similar to what we had for PL/Python.
As stated in our State of PostGIS article, we'll be using log files we generated from our PostGIS stress tests. These stress tests were auto-generated from the PostGIS official documentation.
The raster tests are comprised of 2,095 query executions exercising all the pixel types supported. The geometry/geograpy tests are comprised of 65,892 spatial SQL queries exercising every PostGIS geometry/geography supported in PostGIS 2.0 -- yes this includes TINS, Triangles,Polyhedral Surfaces, Curved geometries and all dimensions of them.
Most queries are unique. If you are curious to see what these log tables look like or want to follow along with these exercises, you can download the tables from here.
What is R and PL/R and why should you care?
R is both a language and an environment for doing statistics and generating graphs and plots. It is GNU-licensed and a common favorite of Universities and Research institutions. PL/R is a procedural language for PostgreSQL that allows you to write database stored functions
in R. R is a set-based and domain specific language similar to SQL except unlike the way relational databases treat data, it thinks of data as matrices, lists and vectors. I tend to think of it as a cross between LISP and SQL though more experienced Lisp and R users will probably disagree with me on that. This makes it easier in many cases to tabulate data both across columns as well as across rows.
The examples we will show in these exercises, could be done in SQL, but they are much more succinct to write in R. In addition to the language itself, there are a whole wealth of statistical and graphing functions available in R that you will not
find in any relational database. These functions are growing as more people contribute packages. Its packaging system called Comprehensive R Archive (CRAN) is similar in concept to Perl's CPAN and the in the works PGXN for PostgreSQL.
Tuesday, November 23. 2010
Continue reading "The State of PostGIS, Joys of Testing, and PLR the Prequel"
I've always enjoyed dismantling things. Deconstruction was a good way of analyzing how things were built by cataloging all the ways
I could dismantle or destroy them. I experimented with mechanical systems, electrical circuitry, chemicals and biological systems sometimes coming close to bodily harm. In later years I decided to play it safe and just stick with programming and computer simulation
as a convenient channel to enjoy my destructive pursuits.
Now getting to the point of this article.
In later articles, I'll start to demonstrate the use of PL/R, the procedural language for PostgreSQL that allows you to program functions in the statistical language and Environment R. To
make these examples more useful, I'll be analyzing data generated from PostGIS tests I've been working on for stress testing the upcoming PostGIS 2.0. PostGIS 2.0 is a major
and probably the most exciting release for us. Paul Ramsey did a summary talk recently of Past, Present, Future of PostGIS at State of PostGIS FOSS4G Japan http://www.ustream.tv/recorded/10667125
which provides a brief glimpse of what's in store in 2.0.
Monday, October 11. 2010
Continue reading "PgAdmin III 1.13 - change in plugin architecture and PostGIS Plugins"
One of the neat changes already present in the PgAdmin III 1.13dev, is the change in plugin architecture. Version 1.13 dev allows for multiple plugin*.ini files. How does this work.
Well if you have a plugins.d folder in your PgAdmin III version folder, it will read all the inis in that folder and load them as plugins.
Recall in PgAdmin III Plug-in Registration: PostGIS Shapefile and DBF Loader,
we demonstrated how to load the PostGIS shapefile and dbf loader as a plugin in PgAdmin III, well this time we will demonstrate how to do it using PgAdmin version 1.13. Better yet, we'll show you the new and improved
PgAdmin III Shapefile and DBF Loader in the works for PostGIS 2.0.
Thursday, July 29. 2010
Continue reading "Explain Plans PostgreSQL 9.0 Text, JSON, XML, YAML - Part 1: You Choose"
One of the new features of PostgreSQL 9.0 is the ability to specify the format of an explain plan.
In prior versions your only choice was text (and graphic explain with tools like PgAdmin III and other GUIS), but in 9.0 on, you have the additional options of
The main benefit of the JSON, XML, YAML formats is that they are easier
to machine parse than the default text version. This will allow for creative renderings of planner trees with minimal coding.
In Part 1 of this series, we'll demonstrate how to output the plans in these various formats and what they look like.
to transform these into works of art you can hang on your wall.
-- START POSTGIS IN ACTION ASIDE --
We just submitted the third major revision of Chapter 3 Data Modeling
of our upcoming PostGIS in Action book.
The second major revision we never submitted and threw it out because it wasn't worldly enough and was too involved. We may
use it later on for an example.
Chapter 3 should be up on Manning Early Access Program (MEAP) soon. If you haven't bought the book yet Buy now.
You don't want to miss out on a major masterpiece in the making. Okay we exaggerate a bit.
-- END POSTGIS IN ACTION ASIDE --
Wednesday, May 12. 2010
We have just packaged up PostGIS binaries for Windows PostgreSQL 9.0 beta 1. These are binaries for PostGIS 1.5 current stable branch and WKT Raster raster support.
You can download these from the PostGIS Windows Experimental Builds section.
When PostGIS 1.5.2 is officially released, we'll be adding PostGIS 1.5.2 for PostgreSQL 9.0 on stack builder section along with the 8.3 and 8.4 versions.
Saturday, April 17. 2010
Continue reading "PostGIS Raster its on: 10 things you can do NOW with raster"
We just finished the first draft of the last chapter of our book: First look at PostGIS WKT Raster. This completes our hard-core writing and now on to more drafting,
polishing all the chapters.
In Chapter 13 we demonstrate how to use PostGIS WKT Raster functions by example and cross breed with PostGIS geometry functionality. I was pleasantly surprised to see how nicely the raster and geometry functions play together.
We had intended this chapter to be short about 20 pages in length, because how much can one say about pixels and pictures. As it turns out, a lot.
Rasters are more versatile than their picture portrayal on a screen. Rasters are a class of structured storage suitable for representing any numeric,
cell based data where each cell has one or more numeric properties (the bands). This covers quite a bit of data you collect with remote sensing and other electronic instrumentation. We had to stretch to over 30 pages; even then we felt we were missing some critical examples.
There is a lot of useful functionality in PostGIS WKT Raster
already and should make a lot of people looking for raster support in PostgreSQL very happy. Although the chapter may portray some scenes of violence and torture inflicted on elephants, you can rest assured
that it is pure illusion and no real elephants or blue elephant dolls were harmed in the making of this chapter.
As a side note -- our book is now listed on Amazon PostGIS in Action.
It is not available in hard-copy yet,but you can pre-order and of course you can order from PostGIS in Action from Manning directly
to get the chapter drafts we have posted, updates as we polish them, and the final book when it comes out in hard print.
The Amazon listing would have been so much more exciting, had they not stripped me of my last name or had Leo married to himself.
UPDATE: It appears I now have a last name again
In hind sight, I suppose OBE is more commonly seen as a title of honor rather than a last
name, so its only fitting that I should be stripped of mine and Tim Berners-Lee gets it tacked on at the end of his name.
To find out more about PostGIS WKT Raster, we encourage you to check out these links.
Now we'll itemize 10 things you can do now with PostGIS WKT Raster. In order to use PostGIS WKT Raster, you need PostGIS 1.3.5 or above. Preferably 1.4 or 1.5 or 2.0 alpha.
PostGIS WKT Raster is currently packaged as a separate library and we have windows binaries available.
Friday, January 01. 2010
Continue reading "Compiling PostGIS 1.5 and installing after Yum PostgreSQL Install"
Since this article was written, Yum now comes packaged with PostGIS 1.5. Please refer to PostGIS Yum Install for PostgreSQL 9.0
if you want to go with a faster more idiot proof install process
A couple of people have asked this, so thought we would address the common issues people run into
with compiling PostGIS.
The first question, some will ask is Why don't you just take the packaged PostGIS that comes with the PostgreSQL
There are two reasons for not installing PostGIS under using Yum and our general reasons for not.
- We like running a newer GEOS than what is generally packaged with Yum because certain PostGIS functionality
is not enabled with older versions of GEOS. For example GEOS 3.1 introduced a couple of new functions like ST_Covers and
the Cascaded Union and prepared geometry speed enhancements require GEOS 3.1+. GEOS 3.2 brought more robust support for dealing with topological
exceptions, new buffering features like one-sided buffer (which the upcoming PostGIS 1.5 exposes if and only if you are running
GEOS 3.2+), faster buffering, and several memory leak cleanups. We highly recommend using GEOS 3.2.0. It is just a more robust product than prior GEOS versions.
- We generally like to run newer versions of PostGIS than what Yum provides and also run multiple versions of PostGIS
on the same server. This is a bit tricky to do with Yum since it only exposes one and generally an older one than what we
care for. PostGIS 1.5 will be especially great since it will have the geodetic support and cool functions like ST_ClosestPoint
and other major goodies.
NOTE: PostGIS 1.3, PostGIS 1.4, PostGIS 1.5 can coexist on the same PostgreSQL daemon service as long as they are installed in different databases. They will all share the same Proj and GEOS. So installing PostGIS 1.5 will not break your
PostGIS 1.4 or 1.3 installs. The newer GEOS 3.2.0 C-API is backward compatible with older GEOS C-API and the C-API is what PostGIS uses.
Thursday, December 31. 2009
This was a truly exciting year for us and the PostgreSQL project and perhaps a bit depressing for MySQL.
The following events happened:
- PostgreSQL 8.4 was released which had blow away features like Common Table Expressions (CTE) , Recursive CTEs, and Windowing Functions. This meant we could finally get some of our hard-core Oracle and SQL server friends really excited about PostgreSQL.
- This is the first year we got out of our shy mode and actually presented at conferences. We presented at PGCon 2009 and OSCON 2009.
- The PostGIS project steering committee was formed with Regina as one of the founding members
- We started writing our PostGIS in Action due out sometime in 2010. Sadly we are a bit behind schedule, but on the bright side, you can buy the book now and it will probably be a bit heftier than the 325 pages we had planned. To celebrate our upcoming book, we have launched our book promo site PostGIS in Action: The Book where the adventure begins. There you will find source code downloads, data, presentations as we put each together. You will also see a brief description of chapters , our progress with each chapter, what you can expect from each chapter, and related links to the chapter content. We are currently at what we hope is our last quarter sprint.
- We wrote a DZone cheatsheet which was confronted with mixed emotions.
- 2009 was also the year Oracle threatened to buy Sun and engulf MySQL in the process. Interestingly this was predictable in someone's wildest dreams. Is this the end of Open source databases as we know it? Only time will tell.
Plans for 2010
What are our plans for 2010?
- Get PostGIS 1.5 out the door some time in January 2010
We hope in 2010 to present at at least one PGCon conference and hopefully make FOSS4G 2010 in September 2010. Our book better be written by then.
- Finish our book and hopefully soon.
- Increase the adoption of PostgreSQL and PostGIS significantly. To paraphrase our favorite Larry's famous words our strategy is to Get big very fast.
- Get PostGIS 2.0 out the door sometime in late 2010.
What will happen to the database industry in 2010
I usually try to keep my mouth shut on these topics. I must say that I have noticed a bit of animosity from some PostgreSQL people toward the whole MySQL/Oracle affair, comments like He lives by the sword, he should die by the sword. Other interesting conjectures as to what this means for Open Source databases, Is Monty right that the apparent rape of MySQL by Oracle is only bad and will cause countless pain and suffering for many. All I can say is "What..ever".
Some argue that Monty's fight is all about money and some don't that he is earnestly trying to save the world from Oracle. To me its a fight about a man who has spent half his life nurturing this child MySQL named after his own son. Of course he has quite a bit of emotional attachment to it, as many in the PostgreSQL clan have an equal attachment to PostgreSQL and we have an equal non-economic (as well as economic) attachment to PostGIS and PostgreSQL. Equally so I'm sure Larry Ellison has perhaps a stronger attachment to the namesake Oracle database named after a CIA project he spent more than half his life nurturing.
So in short the motives on all side are clear and irrelevant to all except said people with said motives. In the end, what is relevant is what is relevant.
Friday, November 06. 2009
Continue reading "PostGIS does Geography"
The upcoming version of PostGIS - PostGIS 1.5 will be an exciting one. It has native geodetic support in the form of the new
geography type, similar in concept to SQL Server's geography support. For windows users, we have experimental binary builds hot off the presses for PostgreSQL 8.3 and 8.4
Thursday, October 22. 2009
This week is a busy week for events. While PostgreSQL is having its PostgreSQL West
conference in Seattle, the biggest Open Source GIS conference of the year is happening in Sydney, Australia FOSS4G 2009.
Sadly given our schedule and the distances of the commutes, we couldn't make either conference.
pointed out that the video is out for Paul Ramsey's FOSS4G 2009 Keynote speech
on Beyond Nerds Bearing Gifts: The Future of the Open Source Economy.
I think its a very important distinction Paul makes between selling software and selling a product, that a lot of people miss when trying to evaluate the solvency of
open source software.
For those who don't know Paul, he's one of the co-founders of the PostGIS project and Refractions Research.
Wednesday, August 26. 2009
Continue reading "PostGIS changing of the Guards"
As David Page already noted, Leo and I are taking over responsibility of building PostGIS windows one-click installers/stack builder from Mark Cave-Ayland. The PostGIS 1.4 windows packaging was a little late in coming this time since it
was our first and also some things changed in the PostGIS packaging for 1.4. Even so we made some mistakes such as statically compiling libproj in with the postgis-1.4.dll and forgetting some new images in the packaged html help, which we will fix in 1.4.1 release.
Mark will still be providing a supporting role and helping out when we screw up or helping us if we run into compile issues as we go along so he's not going away; he will be a great safety net.
When Mark started his role a long time ago, he was as many would like to say "Very entrenched in the dark side,"
and over the years, he has seen the light. As a result, these moments of catching issues in the PostGIS release
cycle that effect windows users such as troubleshooting the memory bug in the loader files that affected Windows Vista users and testing on various Windows OS, has fallen on us, because well we have access
to all windows os.
It also became painful for Mark
to walk in the shadow of darkness once he had seen the light. Luckily we are still windows addicts so this having to constantly test on Windows and building for
Windows is something we would naturally do anyway and yes as shocking as it sounds we do run some production PostgreSQL apps on windows
and it works pretty well, thank you very much. We don't expect this to change any time soon.
As part of this change, we hope to provide more interim windows builds of PostGIS so windows users can experiment with future releases before they come
out. Yes compiling on windows is a tad bit more difficult than on Linux. These PostGIS windows experimental builds can be found http://postgis.net/windows_downloads
Main changes in PostGIS
- The PostGIS steering committee has agreed to be good and not be adding new functions
between micro releases of PostGIS as we have done in the past and as we've been smacked around for. As part of that change,
from PostGIS 1.4 moving forward each micro version will overwrite the previous micro version in the MS Windows registry. E.g. 1.4.1 will overwrite 1.4.0 so no need to uninstall the old
and reinstall to get rid of registry junk. Just install on top of your existing 1.4.
- As of PostGIS 1.4 it is possible to run different versions of PostGIS in different databases on teh same PostgreSQL server install since the .so/.dll from minor to minor have unique names (naming is postgis-1.4.so (postgis-1.4.dll), postgis-1.5.so etc). This is mostly useful
for testing and comparing different versions of PostGIS before you officially upgrade and if you have several different spatial apps using different databases, you don't risk breaking them all at once.
- PostGIS is now an official incubation project of OSGEO. Things are still being drafted. But I guess that means our
PostGIS defacto steering committee composed of Kevin, Paul, Mark, and myself
is now more or less official.
Friday, July 03. 2009
Continue reading "PostGIS 1.4 hot on the heels of PostgreSQL 8.4"
PostgreSQL 8.4 has come out, and while I am a bit disappointed that PostGIS 1.4 has not come out for fear that we've missed a bit of the PostgreSQL 8.4 momentum,
I am happy that we are nearing closer and just maybe we'll have it out by end next week. We now have a PostGIS 1.4RC1 http://postgis.refractions.net/download/postgis-1.4.0rc1.tar.gz tar ball
as well as experimental binary builds of this for windows user's running PostgreSQL 8.3 http://postgis.refractions.net/download/windows/pg83/experimental/postgis/
or PostgreSQL 8.4 http://postgis.refractions.net/download/windows/pg84/experimental/postgis/. Please give both a try.
Working in the Cathedral Really?
As Paul duly noted in his blog entry Working in the Cathedral
the model for PostGIS development is morphing, but I wouldn't call this morphing process one that is entirely toward the Cathedral model. Unlike the perceived Cathedral model, I would like to think we will have more frequent releases and beta releases, perhaps parallel experimental builds and most importantly, more fun.
The main idea being making it much easier for mere mortals and fake mortals to taste test the cookies in the oven while they are cooking. By fake I mean unit tests, build bots, and computer generated people where the fear of destruction is removed.
I feel this is the similar model PostgreSQL goes by or is trying to achieve.
Friday, May 08. 2009
Continue reading "PostGIS 1.3.6 is out and new upcoming PostGIS book"
This month is jam packed with a lot of PostGIS news.
PostGIS 1.3.6 is out
PostGIS 1.3.6 has been released. It is mostly a bug fix relase and is the first PostGIS that can be compiled under PostgreSQL 8.4 beta. Details can be found at
PostGIS 1.3.6 release notes. We don't have Windows binaries ready yet, but expect to see that in the next week or so.
We are writing a PostGIS Book
Leo and I have been writing a PostGIS in Action book for the past couple of months, and now that it is finally listed on the Manning website, we can talk about it.
We are working on our chapter 4 right now. If you are interested in learning PostGIS, check it out. The first chapter is free and with the Manning Early Access Program (MEAP), you can purchase the book now and have great influence on
the direction of the book.
The book starts off hopefully with a gentle introduction to OpenGIS Consortium (OGC) spatial databases and concepts in general and PostgreSQL/PostGIS in particular. As we move further into the book, we cover more advanced ground.
We plan to cover some of the new PostgreSQL 8.4 features in conjunction with PostGIS, writing stored functions to solve spatial problems and some of the other new exciting stuff and ancillary tools for PostGIS such as PgRouting, Tiger Geocoder,
and WKT Raster.
Given all that ground, I suspect our estimate of 325 pages, may be a little low when all is said and done. It is funny that when we started out, we thought to ourselves -- "How can anyone fill up 325 pages." Turns out very easily especially
once you start throwing in diagrams and pictures to demonstrate a point. Diagrams are kind of important to have when describing GIS and geometry concepts. So far its been fun and has forced us to sit down and walk thru all the things we took for granted and thought we understood but didn't. You realize just how little
you understand when you try to explain something to someone else who really doesn't understand. So perhaps the process of explaining is the greatest of all learning experiences.
Saturday, April 18. 2009
Continue reading "How to force PostgreSQL to use a pre-calculated value"
This question is one that has come up a number of times in PostGIS newsgroups worded in many different ways. The situation is that if you use a function a number of times
not changing the arguments that go into the function, PostgreSQL still insists on recalculating the value even when the function is marked IMMUTABLE. I have tested this on
8.2 and 8.3 with similarly awful results.
This issue is not so much a problem if function calculations are fast, but spatial function calculations relative to most other functions you will use are pretty
slow especially when dealing with large geometries. As a result your query could end up twice as slow. Even setting the costs of these functions to relatively high does not help the situation.
To demonstrate here is a non-PostGIS version of the issue that everyone should be able to run and demonstrates its not a PostGIS only issue.
Saturday, March 28. 2009
Even though others have blogged about this in the past and its well-documented in the docs, its a frequently enough asked question, that we thought we'd post it here again
with a couple of additional twists.
How to determine the size of a database on disk
SELECT pg_size_pretty(pg_database_size('somedatabase')) As fulldbsize;
How to determine the size of a database table on disk
NOTE: There are two functions in PostgreSQL - pg_relation_size and pg_total_relation_size. The pg_relation_size just measures the size of the actual table where
as the pg_total_relation_size includes both the table and all its toasted tables and indexes.
SELECT pg_size_pretty(pg_total_relation_size('someschema.sometable')) As fulltblsize,
pg_size_pretty(pg_relation_size('someschema.sometable')) As justthetblsize;
How to determine the size of a database schema
When you are doing a lot of data loading for later massaging to dump into your production tables, you will often generate junk tables in the process and worse yet other
people may generate junk tables period. As a general practice, we like to create a schema just for junk. The reason for that is that it is extremely easy to
exclude schemas from being backed up and to load individual schemas. So with this approach we create or move all tables we are not quite sure if they are needed or we only need temporarily
into a schema
called junk, scratch, possiblejunk or whatever you want to call the schema and exclude it from our backups.
Something of the form:
CREATE SCHEMA scratch;
ALTER TABLE someschema.joeyplaying SET SCHEMA scratch;
After saving the world from junk, you would like to know how much space you have saved your backups from backing up. So you would employ a query something of the form:
SELECT pg_size_pretty(pg_database_size('mycurrentdb')) As fullprod,
pg_size_pretty(CAST(pg_database_size('mycurrentdb') - (SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name) )
FROM information_schema.tables WHERE table_schema = 'scratch') As bigint)) As tobebackedup_size,
pg_size_pretty(CAST((SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name) )
WHERE table_schema = 'scratch') As bigint) ) As junk_size;
Size of Geometries in PostGIS tables
PostGIS has a companion function for measuring geometry size which is useful when you want to get a sense of how much space your geometries are taking up on disk
for a set of records.
SELECT ST_Mem_Size(ST_GeomFromText('LINESTRING(220268 150415,220227 150505,220227 150406)'));
SELECT pg_size_pretty(CAST(SUM(ST_Mem_Size(the_geom)) As bigint) ) as totgeomsum
FROM sometable WHERE state = 'MA';