Tuesday, December 28. 2010
In a prior article we did a review of PostgreSQL 9 Admin Cookbook, by Simon Riggs and Hannu Krosing. In this article
we'll take a look at the companion book PostgreSQL 9 High Performance by Greg Smith.
Both books are published by Packt Publishing and can be bought directly from Packt Publishing or via Amazon. Packt is currently running a 50% off sale if you
buy both books (e-Book version) directly from Packt. In addition Packt offers free shipping for US, UK, Europe and select Asian countries.
For starters: The PostgreSQL 9 High Performance book is a more advanced book than the PostgreSQL 9 Admin Cookbook and is more of a sit-down book. At about 450 pages, it's a bit longer than the PostgreSQL Admin Cookbook. Unlike the PostgreSQL 9 Admin Cookbook, it is more a concepts book and much less of a cookbook.
It's not a book you would pick up if you are new to databases and trying to feel your way thru PostgreSQL, however if you feel comfortable with databases in general, not specific
to PostgreSQL and are trying to eek out the most performance you can it's a handy book. What surprised me most about this book was how much of it is not specific to PostgreSQL, but in fact hardware considerations that are pertinent to most relational databases.
In fact Greg Smith, starts the book off with a fairly
shocking statement in the section entitled PostgreSQL or another database? There are certainly situations where other database solutions will perform better. Those are words you will rarely hear from die-hard PostgreSQL users, bent on defending their database
of choice against all criticism and framing PostgreSQL as the tool that will solve famine, bring world peace, and cure cancer if only everyone would stop using that other thing and use PostgreSQL instead:).
That in my mind, made this book more of a trustworthy reference if you came from some other DBMS, and wanted to know if PostgreSQL could meet your needs comparably or better than what you were using before.
In a nutshell, if I were to contrast and compare the PostgreSQL 9 Admin Cookbook vs. PostgreSQL High Performance, I would say the Cookbook is a much lighter read focused on getting familiar with and getting the most out of the software (PostgreSQL), and PostgreSQL High Perofrmance is focused
on getting the most out of your hardware and pushing your hardware to its limits to work with PostgreSQL. There is very little overlap of content between the two and as you take on more sophisticated projects, you'll definitely want both books on your shelf. The PostgreSQL 9 High Perofrmance book isn't going to teach you
too much about writing better queries,day to day management, or how to load data etc, but it will tell you how to determine when your database is under stress or your hardware is about to kick the bucket and what is causing that stress. It's definitely a book you want to have if you plan to run large PostgreSQL databases or a high traffic
site with PostgreSQL.
PostgreSQL 9 High Performance is roughly about 25% hardware and how to choose the best hardware for your budget, 40% in-depth details about how PostgreSQL works with your hardware and trade-offs made by PostgreSQL developers to get a healthy balance of performance vs. reliability, and another 35% about various useful monitoring
tools for PostgreSQL performance and general hardware performance. Its focus is mostly on Linux/Unix, which is not surprising since most production PostgreSQL installs are on Linux/Unix. That said there is some coverage of windows
such as FAT32/NTFS discussion and considerations when deploying terabyte size databases on Windows and issues with shared memory on Windows.
Full disclosure: I got a free e-Book copy of this book just as I did with PostgreSQL 9 Admin Cookbook.
Quality of Book Print
One person yelled at me on reddit for not going into details about the quality of the print and so forth. So here is my general biased comment of it..I have just the e-Book so can't comment on the hard-copy aside from what I know about PacktPub books I do have hard-copies off. As far as e-Book
quality goes, I can only speak for the PDF version. What I can say about it, is that it fits my requirements. I can copy and paste code from the book,
the index in the back and table of contents is clickable and takes me to any section when I click on the page number link. The text is fairly easy to read. It's not very
colorful though. I think there are only about 3 or 4 color graphs in this book. Some colorful graphical explain plans would have helped A LOT. In short it needs more color :).
Andrew Dunstan also did a review of PostgreSQL 9 High Performance
where he did complain about the code wrapping and the explain plans being difficult to read because of the wrapping. I personally find text explain plans difficult to read
period. Sure things could have been wrapped a bit better or you could have YAML'd the damn thing, but it wouldn't have helped me much. What I really wanted to see was Graphical PgExplain plans along side the text plans.
As strange as it sounds, perhaps because I come from a SQL Server background, I use the graphical explain plan as a roadmap into my information overloaded textual explain plan. Without that crutch,
I feel a bit naked. The other snippets of code like SQL examples, I did find well-indented and easy to read, so have no idea what Andrew was whining about :).
Overview coverage of chapters
I liked David Christensen's review quote about the key theme of this book Measure don't guess. I think that
quote sums this book up quite nicely. Here is my slightly more detailed account. I'm not going to bore you with the gory details of each chapter as was suggested on reddit, but will try to summarize the flavor of each chapter. If you don't like that
tough and write your own review :).
- Chapter 1: PostgreSQL Versions Although the book is called PostgreSQL 9, it really covers PostgreSQL 8.2-9.0 and goes through a history of major enhancements in each release that have improved performance and dears to ask the question, is
really right for your needs? It will give you a good idea of if you are using version X of PostgreSQL, if there is a compelling reason for you to upgrade, and if you have chosen
not to use PostgreSQL in the past, if the issue that stopped you has been remedied.
- Chapter 2: Database Hardware This chapter focuses on selecting the best hardware to run PostgreSQL and in terms of pricing which hardware features are the biggest bang for the buck. Although it phrases the selection in the context of PostgreSQL,
I would say that this is a chapter that is almost as applicable to any relational database today, particularly any you would run on a Unix system. It talks about choosing the right RAID configurations, SANS vs NASs, Disk controllers, onboard RAM, Solid State drives (SSDs) etc and how they play a role in query and general health of your PostgreSQL database server.
All those things that generally bore me to death, but I know are important and something I would quickly pass off to Leo (the one that is suspicious of any server he didn't put together with his own bare hands) and say "Make sure you know this stuff".
- Chapter 3: Database Hardware Benchmarking Again another chapter, not too specific to PostgreSQL. It describes a ton of Unix utilities that would make performance fanatics giddy. Things like sysbench, bonnie, memtest, hdtune etc. Various techniques for doing memory tests with PostgreSQL test queries.
- Chapter 4: Disk Setup Again stuff I would consider not specific to PostgreSQL. It talks about how you should partition your data and why it's generally a bad idea to put your data on an Operating System partition. It talks about various
Linux file system types like ext2, ext3, ext4, XFS, Solaris and FreeBSD (UFS, ZFS options) and the pros and cons of each. File system limits etc. It does get into specific PostgreSQL areas such as how to setup your temp table spaces.
- Chapter 5: Memory for Database Caching This chapter is very PostgreSQL focused and discusses how PostgreSQL does caching and how the various parameters in postgresql.conf affect its caching behavior. It also goes into detail how
about how caching and the parameters have changed from PostgreSQL 8.2 to present. It's definitely a chapter you want to read if you are upgrading or thinking of upgrading your PostgreSQL server.
- Chapter 6: Server Configuration Tuning This is an extension fo the Chapter 5 topics and covers every key parameter in postgresql.conf. Issues to watch out for with connection pooling when running on Windows that are pretty much
non-issues on Linux. Also performance settings that you can set at the client level.
- Chapter 7: Routine Maintenance This is a chapter I would consider an important read for both newbies and long-time PostgreSQL users. It goes into detail
about techniques and tools for monitoring performance of queries, gotchas and advantages of how PostgreSQL implements MVCC, auto explain, log file analysis tools such as pgFouine.
- Chapter 8: Database Benchmarking Mostly focused on using pgbench, setting up custom scripts, and analyzing pgbench results.
- Chapter 9: Database Indexing Covers the various kinds of indexes, how to determine size of an index relative to table size so you can do a better cost/benefit of creating the index, measuring speed of index creation, concepts such as clustering and fill factor, benchmarking how an index improves query performance.
- Chapter 10: Query Optimization This particular chapter WILL help you write better queries. This covers various tools for analyzing queries, how to read explain plans, the new XML/YAML etc outputs introduced in PostgreSQL 9, visual explain plans. Various characteristics of different join strategies employed by PostgreSQL and postgresql settings
that affect query performance.
It uses the pagilla and Dell Store 2 databases for most of the exercises.
- Chapter 11: Database Activity and Statistics This covers the various built-in views in PostgreSQL for checking on statistics and some sample queries covering everything from
reading table I/O, background writer stats to disk usage and table locks.
- Chapter 12: Monitoring and Trending This covers various toolkits (mostly Linux) for monitoring both server as well as specifically PostgreSQL behavior over time. It doesn't go into too much detail
about how to use any of them, but does provide where to find more about each and the pros and cons of each. Items covered are Bucardo checkpostgres tool, Staplr,Cacti etc.
- Chapter 13: Pooling and Caching Covers using pgPool-II, pgBouncer, memchached and pgmemcache.
- Chapter 14: Scaling and Replication Covers using various replication strategies and pros and cons of each : Hot Standby (introduced in PostgreSQL 9), Londiste, Slony, Bucardo, replication features of pgPool-II
- Chapter 15: Partitioning Data Covers how to partition data in PostgreSQL using inheritance, how to arrive at optimal sizes for your partitions, list partitioning, range partitioning and redirecting inserts with partition rules vs. dynamic triggers.
It also provides a brief overview of horizontal partitioning and sharding with PL/Proxy and GridSQL.
- Chapter 16: Avoiding Problems This covers common mistakes people perform when bulk loading data, cautions when backing up with a newer pg_restore than the database version, trigger memory usage and avoiding running out of memory etc.