Postgres OnLine Journal: April 2016 / May 2016 / June 2016
An in-depth Exploration of the PostgreSQL Open Source Database
 

Table Of Contents

From the Editors
What's new and upcoming in PostgreSQL
PostgreSQL Q & A
Basics
PL Programming

From the Editors

 

PGConfUS 2016 PostGIS slides and tutorial material



We gave a PostGIS Intro Training and a PostGIS talk at PGConfUS 2016 in Brooklyn, New York and just got back. A number of people asked if we'd make the slides and material available. We have these posted on our presentation page: http://www.postgis.us/presentations and will be putting on the PostgreSQL Wiki as well in due time. There will be a video coming along for the talk, but the training was not recorded.

We also have two more talks coming up in North Carolina in Early May at FOSS4G NA 2016 - one on PostGIS Spatial Tricks which has more of a GIS specialist focus than the top 10 talk we gave, but there will be some overlap. The other talk is a topic a couple of people asked us in training and after our talk, on routing along constrained paths. If you are attending FOSS4G NA 2016, you won't want to miss our talk pgRouting: A Crash Course which is also the topic of our upcoming book.

Just like FOSS4G NA 2015, there is a pgDay track which is PostgreSQL specific material, useful to a spatial crowd, but not PostGIS focused.

I should also add, that there will be a couple of other talks on routing (not all pgRouting) and other PostGIS topics like PostGIS topology. Couple that I'm looking forward to seeing


What's new and upcoming in PostgreSQL

 

PLV8 binaries for PostgreSQL 9.6 windows both 32-bit and 64-bit



To celebrate recent release of PostgreSQL 9.6beta1, we've started to experiment with our favorite extensions. For starters, PLV8 (aka PL/JavaScript) binaries listed below and upcoming PostGIS 2.3.0 and ogr_fdw detailed here

BigSQL windows 64 distributions doesn't currently come with PL/V8 for windows, but these seem to work on that as well. Note: OpenSCG uses MingW64 chain to build BigSQL on windows as well and a slightly newer than what I use. So when copying, make sure not to overwrite files that came with BigSQL.

Listed below PL/V8 binaries built for PostgreSQL 9.6beta1 for windows 64-bit and 32-bit. These I tested with EDB PostgreSQL 9.6 windows installs and seem to work fine and also passed all regression tests. These were built with PLV8 v1.4.8.

PL/V8 PostgreSQL 9.6 binaries

We hope windows users find these useful.


What's new and upcoming in PostgreSQL

 

PostgreSQL 9.6 phrase text searching how far apart can you go



We've been playing around with new phrase text feature of full-text PostgreSQL 9.6. In doing so, I was curious how big a number one can designate as max that words can be apart. I discovered thru trial and error, that the magic number is 16384 which is much bigger than I had suspected.

The tests

To test out the new full text search phrase feature, we downloaded data from https://archive.org/download/stackexchange. We picked the data science archive since it's relatively small. Data is prepped in stackexchange_posts.zip.

Load up the data from psql with:

\encoding utf8;
\i stackexchange_posts.sql

Add full text index

ALTER TABLE posts ADD COLUMN fts tsvector;
UPDATE posts SET fts =
setweight( to_tsvector(COALESCE(tags,'') ), 'A') ||
setweight( to_tsvector(COALESCE(title,'') ) , 'B') ||
setweight( to_tsvector(COALESCE(body,'') ), 'C');
CREATE INDEX idx_posts_fts_gin ON posts USING gin(fts);

Do a sample query:

SELECT count(*)
FROM posts
WHERE fts @@ to_tsquery('social <5> ( science | scientist)') and title > '';
count
----------------
1

Up this to 20

SELECT count(*)
FROM posts
WHERE fts @@ to_tsquery('social <20> ( science | scientist)') and title > '';
count
----------------
6

Let's get crazy

SELECT count(*)
FROM posts
WHERE fts @@ to_tsquery('social <20000000> ( science | scientist)') and title > '';
ERROR:  distance in phrase operator should not be greater than 16384

and just to confirm the error is right, I did try going up one more and exactly 16384.

On a related note, we've started work on our 3rd edition of PostgreSQL: Up and Running, slated to be due out in January 2017, and hoping for pre-release around time of PostgreSQL 9.6 release. 2nd edition did not cover Full text at all, which I regreted. Full Text will have a fairly nice size showing in the 3rd edition. There will be quite a bit of coverage of 9.5 and 9.6 features in it.


PostgreSQL Q & A

 

PostGIS 2.2 Windows users hold off on installing latest PostgreSQL patch release



Someone reported recently on PostGIS mailing list, that they were unable to install PostGIS 2.2.1 bundle or PostGIS 2.2.2 binaries on a clean PostgreSQL 9.5.2 install. Someone also complained about PostgreSQL 9.3 (though not clear the version) if that is a separate issue or the same. I have tested on PostgreSQL 9.5.2 Windows 64-bit and confirmed the issue. The issue does not affect PostgreSQL 9.5.1 and older. I haven't confirmed its an issue with the 32-bit installs, but I suspect so too. This issue will affect OGR_FDW users and people who used our compiled WWW_FDW.

We've built new binaries for PostGIS 2.2.2 that have CURL compatible with the OpenSSL distributed with lastest PostgreSQL patch release for (PostgreSQL 9.3-9.5). These downloadable from experimental section - http://postgis.net/windows_downloads/ If you still have issues after using these, post to postgis mailing list. We'll be making new installers in a couple of weeks.

What's the issue?

In PostGIS 2.2, we started packaging CURL support with the GDAL driver. GDAL services both PostGIS raster and OGR_FDW extensions packaged with PostGIS 2.2 and enabling curl allowed for a couple more popular raster and vector drivers to be exposed. Since our clients needed SSL support and we needed SSL support with Curl, we compiled CURL with SSL support. However EDB already ships OpenSSL, so we had compiled against a version of OpenSSL that is compatible with what they shipped with PostgreSQL and do not ship it with our binaries. In PostgreSQL 9.5.2 (and I suspect other versions released in last round of releases), the OpenSSL libraries libeasy32.dll and ssleay32.dll got upgraded to 1.0.2g (9.5.1 shipped with 1.0.2f which in theory should be compatible with 1.0.2g since they look like micro differences), but for whatever reason aren't. Might have to do with something else. Anyrate the error you get when you try to do.

CREATE EXTENSION postgis;
is: ERROR: could not load library "../9.5/lib/rtpostgis-2.2.dll":The specified procedure could not be found. Since the www_fdw.dll and ogr_fdw.dll were also compiled with Curl support, I suspect you'll run into the same issue.

How do I fix it now?

If you already installed the latest PostgreSQL 64-bit patch release and are running into this issue, we do have a hot fix which seems to work but haven't throughly tested. We have recompiled the libcurl-4 with the same version of OpenSSL 1.0.2g that ships with latest EDB binaries and that seems to do the trick. To fix your install, replace the libcurl-4.dll packaged with PostGIS 2.2, with the version in http://winnie.postgis.net/download/windows/pg95/buildbot/libcurl7.48withsslx64.zip. On quick tests, this seems to still work okay with PostgrSQL 9.5.1 as well.

We are working on a fix now, that works with PostgreSQL 9.5.2 and is also backward compatible with PostgreSQL 9.5.1 (and other). We will create a new PostGIS 2.2.2 binary package with (latest PostgreSQL patch release). The binaries will be available in the experimental section of http://postgis.net/windows_downloads.

We will not have the installers available for another couple of weeks or month because we want to package pgRouting 2.2.0 with that (which is right now in RC1) and also installer packaging takes a lot more time to prepare and test. Given we do this in our free-time, paying work almost always takes priority over installer packaging.

To those who have asked, if you would like to contribute to our packaging efforts (and other PostGIS activties), I have set up a patreon account.


Basics

 

First Look at pgAdmin 4



When David Page announced pgAdmin 4, I was really excited to try it out. I was impressed I could compile it so easily on windows. I had a few bumps, but not too bad.

One of the reasons I'm excited about it is that it's built on Python and a web framework, and there is a large Python and web developer following in the GIS community, so I suspect someone will step up to the plate to add a mapviewer plugin to this so I can have a seamless PostGIS experience.

The interface is also very slick and pretty and I love the sorting and paging capability now in the query window. Check this sampling from our workshop database.

Still room in Introduction to PostGIS Training

We'll be giving a Training on PostGIS in this coming Monday in PGConf.us New York. Still slots available for signup.

If you are looking for something a bit shorter and will be attending the conference, we'll also be giving a PostGIS talk covering some things you can do with PostGIS.


PL Programming

 

PLV8 for Breaking Long lines of text



Recently we found ourselves needing to purchase and download Zip+4 from the USPS. Zip+4 provides listing of mailable addresses in the US. We intend to use it for address validation.

Each file has one single line with no linefeeds or carriage returns! From spec, each 182-character segment constitutes a record. USPS was nice enough to provide a Java graphical app called CRLF that can inject breaks at specified intervals. That's all nice and well, but with hundreds of files to parse, using their interactive graphical CRLF tool is too tedious.

How could we compose a PostgreSQL function to handle the parsing? Unsure of the performance among procedural languages, we wrote the function in PL/pgSQL, SQL, and PL/V8 to compare. PL/V8 processed the files an astounding 100 times faster than the rest.

PL/V8 is nothing but PL using JavaScript. V8 is a moniker christened by Google to distinguish their JavaScript language engine from all others. It's really not all that different, if at all from any other JavaScript. PL/V8 offers a tiny footprint compared to the stalwarts of PL/Python, PL/Perl, or PL/R. Plus, you can use PL/V8 to create windowing functions. You can't do that with PL/pgSQL and SQL. PL/V8 is sandboxed, meaning that it cannot access web services, network resources, etc. PL/Python, PL/Perl, and PL/R have non-sandboxed versions. For certain applications, being sandboxed is a coup-de-gras.

In our casual use of PL/V8, we found that when it comes to string, array, and mathematical operations, PL/V8 outshines PL/pgSQL, SQL, and in many cases PL/R and PL/Python.

We tested PL/V8 version 1.4.8 on PostgreSQL 9.6 Beta for Windows 64-bit on both the BigSQL Windows 64-bit and the EDB Windows 64-bit distributions

With combined help from Stack Exchange: JavaScript: An Elegant Way to Split String into Segments and the PL/V8 manual, we cooked up this function:

CREATE OR REPLACE FUNCTION split_text(p_text text, p_rec_l integer) 
RETURNS SETOF text AS
$$
   for (var i = 0, cl = p_text.length; i < cl; i += p_rec_l) {
          plv8.return_next( p_text.substring(i, i + p_rec_l) );
   }
$$
LANGUAGE plv8;

A simple test:

SELECT split_text('JohnJackPaulJaneMary', 4);

Output:

split_text
------------
John
Jack
Paul
Jane
Mary
(5 rows)

To try out the function on our Zip+4 files we loaded up the 19 files for Massachusetts into a table. Each file contains one row so we end up with 19 rows in our zip4raw table. The row size varies between 2KB and 27MB with most around 10MB.

Step 1 is to get directory listing of the files:

CREATE SCHEMA staging;
CREATE TABLE staging.dir_list(filename text);
COPY staging.dir_list 
    FROM PROGRAM 'dir C:\data\USPS\zip4\ma\*.txt /b /S'  
    WITH (format 'csv');
CREATE UNLOGGED TABLE staging.zip4raw(data text);

Step 2 is to load each file as a record into zip4raw table. Speed-wise both BigSQL and EDB came out pretty close with BigSQL faster by a tiny margin.

DO language plpgsql
$$
BEGIN
    EXECUTE (
        SELECT string_agg(
            'COPY staging.zip4raw(data) FROM ' || 
            quote_literal(filename) , ';'
        )
        FROM staging.dir_list
    );
END;
$$

-- Query returned successfully with no result in 2.3 secs. (PostgreSQL 9.6beta1 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit BigSQL distrib)
-- Query returned successfully with no result in 2.6 secs. (PostgreSQL 9.6beta1, compiled by Visual C++ build 1800, 64-bit EDB distrib)

At this point we have a table called staging.zip4raw with 19 rows. Now we use the split_text function to break each row into multiple records, at every 182-character intervals.

CREATE UNLOGGED TABLE staging.zip4raw_records AS 
SELECT r 
FROM 
    staging.zip4raw AS z CROSS JOIN LATERAL 
    split_text(z.data,182) AS r;

-- Query returned successfully: 818681 rows affected, 1.3 secs execution time. (PostgreSQL 9.6beta1 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit BigSQL distrib)
-- Query returned successfully: 818681 rows affected, 1.4 secs execution time.(PostgreSQL 9.6beta1, compiled by Visual C++ build 1800, 64-bit EDB distrib)

Using unlogged tables reaps a noticeable performance boost. Without it, execution times peaked into the 4-6 seconds range.

Our executed query looks like this:

CREATE TABLE zip4_detail AS
SELECT 
    substring(r,158, 2)::varchar(2) AS stusps,
    substring(r,160,3)::varchar(3) AS countyfp, 
    substring(r,1,1)::varchar(1) AS copyright_detail_code,
    substring(r,2,5)::varchar(5) AS zip_code,
    substring(r,7, 10)::varchar(10) AS update_key_no,
    substring(r,17,1)::varchar(1) AS action_code,
    substring(r,18,1)::varchar(1) AS record_type_code,
    substring(r,19,4)::varchar(4) AS carrier_route_id,
    substring(r,23,2)::varchar(2) AS street_pre_drctn_abbrev,
    substring(r,25,28)::varchar(28) AS street_name,
    substring(r,53,4)::varchar(4) AS street_suffix_abbrev,
    substring(r,57,2)::varchar(2) AS street_post_drctn_abbrev,
    substring(r,59,10)::varchar(10) AS addr_primary_low_no,
    substring(r,69,10)::varchar(10) AS addr_primary_high_no,
    substring(r,79,1)::varchar(1) AS addr_prmry_odd_even_code,
    substring(r,80,40)::varchar(40) AS building_or_firm_name,
    substring(r,120,4)::varchar(4) AS addr_secondary_abbrev,
    substring(r,124,8)::varchar(8) AS addr_secondary_low_no,
    substring(r,132,8)::varchar(8) AS addr_secondary_high_no,
    substring(r,140,1)::varchar(1) AS addr_secny_odd_even_code,
    substring(r,141,2)::varchar(2) AS low_no_zip_sector_no,
    substring(r,143,2)::varchar(2) AS low_no_zip_segment_no,
    substring(r,145,2)::varchar(2) AS high_no_zip_sector_no,
    substring(r,147,2)::varchar(2) AS high_no_zip_segment_no,
    substring(r,149,1)::varchar(1) AS base_alt_code,
    substring(r,150,1)::varchar(1) AS lacs_status_ind,
    substring(r,151,1)::varchar(1) AS govt_bldg_ind,
    substring(r,152,6)::varchar(6) AS finance_no,
    substring(r,163,2)::varchar(2) AS congressional_dist_no,
    substring(r,165,6)::varchar(6) AS muncipality_ctyst_key,
    substring(r,171,6)::varchar(6) AS urbanization_ctyst_key,
    substring(r,177,6)::varchar(6) AS prefd_last_line_ctyst_key
FROM (
    SELECT r 
    FROM 
        staging.zip4raw AS z CROSS JOIN LATERAL 
        split_text(z.data,182) AS r
) AS c
WHERE r LIKE 'D%';

-- Query returned successfully: 752794 rows affected, 14.7 secs execution time. (PostgreSQL 9.6beta1 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit BigSQL distrib)
-- Query returned successfully: 752794 rows affected, 15.1 secs execution time. (PostgreSQL 9.6beta1, compiled by Visual C++ build 1800, 64-bit EDB distrib)

BigSQL performed slightly better than EDB VC++ in these tests. This agrees with Jim Mlodgenski recent benchmark findings, that PostgreSQL compiled with Mingw64 is slightly faster than PostgreSQL compiled with VC++.