For starters: PostgreSQL 9 High Performance is a more advanced book than PostgreSQL 9 Admin Cookbook and more of a sit-down read. At about 450 pages, it's a bit longer than the Admin Cookbook, and it is much more a concepts book than a cookbook.
It's not a book you would pick up if you are new to databases and trying to feel your way through PostgreSQL. However, if you feel comfortable with databases in general, not necessarily PostgreSQL specifically, and are trying to eke out the most performance you can, it's a handy book. What surprised me most about this book was how much of it is not specific to PostgreSQL, but in fact covers hardware considerations that are pertinent to most relational databases.
In fact, Greg Smith starts the book off with a fairly shocking statement in the section entitled "PostgreSQL or another database?": "There are certainly situations where other database solutions will perform better." Those are words you will rarely hear from die-hard PostgreSQL users, bent on defending their database of choice against all criticism and framing PostgreSQL as the tool that will solve famine, bring world peace, and cure cancer if only everyone would stop using that other thing and use PostgreSQL instead. :)
That, in my mind, made this book a more trustworthy reference if you came from some other DBMS and wanted to know whether PostgreSQL could meet your needs as well as or better than what you were using before.
In a nutshell, if I were to compare and contrast PostgreSQL 9 Admin Cookbook with PostgreSQL 9 High Performance, I would say the Cookbook is a much lighter read focused on getting familiar with and getting the most out of the software (PostgreSQL), while PostgreSQL 9 High Performance is focused on getting the most out of your hardware and pushing it to its limits to work with PostgreSQL. There is very little overlap of content between the two, and as you take on more sophisticated projects, you'll definitely want both books on your shelf. The PostgreSQL 9 High Performance book isn't going to teach you much about writing better queries, day-to-day management, or how to load data, but it will tell you how to determine when your database is under stress or your hardware is about to kick the bucket, and what is causing that stress. It's definitely a book you want to have if you plan to run large PostgreSQL databases or a high-traffic site with PostgreSQL.
PostgreSQL 9 High Performance is roughly 25% hardware and how to choose the best hardware for your budget, 40% in-depth details about how PostgreSQL works with your hardware and the trade-offs made by PostgreSQL developers to strike a healthy balance of performance vs. reliability, and another 35% about various useful tools for monitoring PostgreSQL performance and general hardware performance. Its focus is mostly on Linux/Unix, which is not surprising since most production PostgreSQL installs are on Linux/Unix. That said, there is some coverage of Windows, such as a FAT32/NTFS discussion, considerations when deploying terabyte-size databases on Windows, and issues with shared memory on Windows.
Full disclosure: I got a free e-Book copy of this book just as I did with PostgreSQL 9 Admin Cookbook.
Question: You have a table of people and a table that specifies the activities each person is involved
in. You want to return a result that has one record per person and a column that has a listing of activities for each person
separated by semicolons and alphabetically sorted by activity. You also want the whole set alphabetically sorted by the person's name.
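To make the question concrete, assume a hypothetical schema along these lines (table and column names are ours, invented for illustration); the sketches further down all use it:

    -- hypothetical tables: one row per person, and one row per person/activity pair
    CREATE TABLE people (
        person_id   integer PRIMARY KEY,
        person_name varchar(100)
    );

    CREATE TABLE activities (
        person_id integer REFERENCES people (person_id),
        activity  varchar(100)
    );

The desired output is one row per person, e.g. Anna | biking; hiking; swimming.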
This is a question we are always asked, and since we mentor on various flavors of databases, we need to be able to switch gears and provide an answer that works on the client's database. Most often the additional requirement is that you can't install new functions in the database. This means that for PostgreSQL and SQL Server, which both support defining custom aggregates, that option is out.
Normally we try to come up with an answer that works in most databases, but sadly the only solution that works in most is to push the problem off to the client front end, throw up your hands, and proclaim -- "This ain't something that should be done in the database; it's a reporting problem." That is in fact what many database purists do, and all I can say to them is: wake up and smell the coffee before you are out of a job.
We feel that data
transformation is an important function of a database, and if your database is incapable of massaging the data into a format
your various client apps can easily digest, WELL THAT'S A PROBLEM.
We shall now document this answer rather than trying to answer it for the umpteenth time. For starters, PostgreSQL has a lot of answers to this question, probably more than any other database, though some are easier to execute than others and many depend on the version of PostgreSQL you are using. SQL Server has two classes of answers, neither of which is terribly appealing, but we'll go over the ones that don't require you to be able to install .NET stored functions in your database, since, as we said, that is often a requirement.
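As a preview, here is a sketch of one of the simpler PostgreSQL answers against the hypothetical tables above; string_agg with an ORDER BY inside the aggregate requires PostgreSQL 9.0 or later:

    -- PostgreSQL 9.0+: sort and concatenate inside the aggregate
    SELECT p.person_name,
           string_agg(a.activity, '; ' ORDER BY a.activity) AS activities
    FROM people p
    JOIN activities a ON a.person_id = p.person_id
    GROUP BY p.person_id, p.person_name
    ORDER BY p.person_name;

And a sketch of the best-known no-.NET SQL Server answer, the FOR XML PATH trick:

    -- SQL Server: build the list with FOR XML PATH(''), then use STUFF
    -- to strip the leading '; ' separator
    SELECT p.person_name,
           STUFF((SELECT '; ' + a.activity
                  FROM activities a
                  WHERE a.person_id = p.person_id
                  ORDER BY a.activity
                  FOR XML PATH('')), 1, 2, '') AS activities
    FROM people p
    ORDER BY p.person_name;

Part of what makes the SQL Server answers less appealing: FOR XML PATH entitizes XML special characters such as &, so values containing them need extra care.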
MySQL has a fairly
simple, elegant and very portable way that it has had for a really long time.
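That way, presumably, is GROUP_CONCAT; a minimal sketch against the same hypothetical tables:

    -- MySQL: GROUP_CONCAT supports both an ORDER BY and a custom SEPARATOR
    SELECT p.person_name,
           GROUP_CONCAT(a.activity ORDER BY a.activity SEPARATOR '; ') AS activities
    FROM people p
    JOIN activities a ON a.person_id = p.person_id
    GROUP BY p.person_id, p.person_name
    ORDER BY p.person_name;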
In Part 2 of PL/R we covered how to build PL/R functions that take arrays and produce textual renderings of generated R objects. We then used this in an aggregate SQL query using array_agg. Often when you are building PL/R functions, you'll have R functions that you want to reuse many times, either inside a single PL/R function or across various PL/R functions.
Unfortunately, calling a PL/R function from another PL/R function is not possible unless you do it via a pg.spi.exec call.
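To illustrate that workaround, here is a minimal sketch; r_helper is a hypothetical PL/R function standing in for whatever you want to reuse:

    -- the only way for one PL/R function to call another is to round-trip
    -- through SQL with pg.spi.exec, which returns the result as a data frame
    CREATE OR REPLACE FUNCTION call_r_helper() RETURNS float8 AS
    $$
      res <- pg.spi.exec("SELECT r_helper(ARRAY[1,2,3]::float8[]) AS answer")
      res[1, "answer"]
    $$ LANGUAGE plr;

That works, but it drags the planner and executor into what is conceptually just an R function call.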
There is another way to embed reusable R code in a PostgreSQL database.
In order to share database-stored R code across various PL/R functions, PL/R has a feature called a plr_module. In this tutorial
we'll learn how to create and register shareable R functions with plr_module. In the next part of this series we'll start to explore generating graphs with PL/R.
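As a taste of the plr_module mechanism itself, registration is nothing more than inserting rows into a specially named table; r_quartiles here is a hypothetical R helper:

    -- PL/R looks for a table named plr_modules: if it exists, each modsrc
    -- chunk of R code is loaded into the R interpreter, in modseq order
    CREATE TABLE plr_modules (
        modseq int4,
        modsrc text
    );

    INSERT INTO plr_modules
    VALUES (0, $r$
    r_quartiles <- function(x) quantile(x, probs = c(0.25, 0.5, 0.75))
    $r$);

    -- make an already-running session pick up newly registered modules
    SELECT reload_plr_modules();

Once registered, r_quartiles can be called from the body of any PL/R function in the database as if it were defined locally.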
In Intro to PL/R and R, we covered how to enable PL/R language in the database and wrote some PL/R functions
that rendered plain text reports using the R environment. Combining R and PostgreSQL in PL/R is most powerful when you can start writing SQL summary queries that use R functions like any other SQL function.
In this next example, we'll be using PostGIS test runs from tests we autogenerated from the Official PostGIS documentation (Documentation Driven Testing (DDT))
as described in the Garden Test section of the PostGIS Developer wiki.
We've also updated some of our logging generator and test patterns so future results may not represent what we demonstrated in the last article.
On a side note: Among the changes in the tests was the introduction of more variants of the Empty Geometry now supported by PostGIS 2.0.
Our beloved PostGIS 2.0 trunk is at the moment somewhat unstable when working with these new forms of emptiness and when stuffing geometries in inappropriate places. It doesn't yet survive the mindless machine-gun battery of tests we have mercilessly inflicted.
It's been great fun trying to build a better dummy while watching Paul run around patching holes to make the software more dummy-proof, as the dummy stumbles across questionable but amusing PostGIS use cases not gracefully handled by his new serialization and empty-geometry logic.
On yet another side note, it's nice to
see that others are doing similar wonderful things with documentation. Check out Euler's comment on catalog tables, where he uses the PostgreSQL SGML documentation to autogenerate PostgreSQL catalog table comments, using OpenJade's OSX to convert the SGML to XML and then XSL, similar to what we did with the PostGIS documentation to autogenerate PostGIS function/type comments and as a platform for our test generator.
For our next exercises we'll be using the power of aggregation to push data into R instead of pulling it from inside R with pg.spi.exec. This will make our functions far more reusable and versatile.
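A minimal sketch of the pattern, with hypothetical names throughout (r_median, test_runs, runtime_ms): define a PL/R function that accepts an array, then let the SQL query push the data in with array_agg:

    -- PL/R maps a float8[] argument to an R vector, available in the
    -- function body as arg1 when the parameter is unnamed
    CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS
    $$
      median(arg1)
    $$ LANGUAGE plr;

    -- aggregation pushes the data into R; no pg.spi.exec needed inside
    SELECT test_name,
           r_median(array_agg(runtime_ms)) AS median_runtime
    FROM test_runs
    GROUP BY test_name
    ORDER BY test_name;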