Using PLV8 to build JSON selectors

Unfortunately we missed Postgres Open this year, but we did catch some of the slides. One of them was Embracing the web with JSON and PLV8 by Will Leinweber of Heroku. He had a great slide deck with a lot of interesting points. One surprising for us was that even in tasks that both PL/PgSQL and PL/V8 can do, PL/V8 is sometimes faster as demonstrated in his slides: #51 thru #54

Another interesting point he covered which is the topic of this article is the ease with which you can build PLV8 functions from javascript libs on the web. In particular JSON:Select library. In one article we demonstrated a JQuery app with PLV8 and one of JQuery's foundations is the CSS like selector syntax it provides for JSON and HTML document elements which allows you to drill down a document using CSS3 style referencing, much like what xpath does for xml. One of the glaring features missing in PostgreSQL 9.2 basic JSON support is a function to navigate a JSON document comparable to the PostgreSQL built-in xpath function for xml. So how do we get this json selector goodness available to us in the database? Like all good monkeys, we copy/emulate it.

In this article we'll demonstrate how to install JSON:select and conclude with how to use it to manipulate the GeoJSON Feature Collection we discussed in Creating GeoJSON Feature Collections with JSON and PostGIS.

Creating an in-db JSON selector function

We're stealing this example from Will's presentation described in his slide 71 and the documented json selector code from: https://github.com/lloyd/JSONSelect/blob/master/src/jsonselect.js. For client side javascript, you'd want to use the minified version, but since we are using this for server-side in-db code we don't need to worry about code weight and prefer readability over size. You'll need to install PL/V8 in your database before you can install the select code which means you have to have the binaries installed and then in database to plv8 enable:
CREATE EXTENSION plv8;

Remember if you are on Windows, we just made fresh compiled binaries for windows PostgreSQL 9.2, and if you are not, you can compile your own or beg your package maintainer to offer PL/V8 extension as an option if they don't have it already.

The JSON selector code is pretty long, but here is the basic structure of the function.

CREATE OR REPLACE FUNCTION
json_select(selector text, data json)
returns json as $$
  exports = {};
  //copy the contents of jsonselect.js here
  return JSON.stringify(
    exports.match(selector,
                  JSON.parse(data)));
$$ LANGUAGE plv8 IMMUTABLE STRICT;

Now let's test this out with our GeoJSON example:

SELECT json_select('.properties .loc_name', '{"type":"FeatureCollection",
"features":[
  {"type":"Feature","geometry":{"type":"Point","coordinates":[42.400469999999999,-71.2577]},
     "properties":{"loc_id":1,"loc_name":"Waltham, MA"}},
  {"type":"Feature","geometry":{"type":"Point","coordinates":[42.990189999999998,-71.462590000000006]},
    "properties":{"loc_id":2,"loc_name":"Manchester, NH"}},
 {"type":"Feature","geometry":{"type":"Point","coordinates":[-96.757239999999996,32.909770000000002]},
    "properties":{"loc_id":3,"loc_name":"TI Blvd, TX"}}
]
}');
--- output is huh JSON array object? ---
["Waltham, MA","Manchester, NH","TI Blvd, TX"]

If you are like us you prefer sets over JSON thingies, so how do we go from this JSON thing to a row set we are more familiar with working with?

Calling PLV8 functions from other PL/V8 functions

It turns out calling PL/V8 (javascript) functions from others is easy to do as documented in PLV8 Wiki. Soooo:

CREATE OR REPLACE FUNCTION json_select_multi(selector text, data json)
returns SETOF text as $$
    var func = plv8.find_function("json_select");
    var o = JSON.parse(func(selector,data));
    for(var i=0; i < o.length; i++){
        if (typeof o[i] == 'object') {
            /** we only want to stringify objects and leave numbers etc. alone **/
            if (o[i].length > 0){
                for (var j=0; j < o[i].length; j++) {
                /** if the object is an array of objects we want to expand it **/
                    plv8.return_next(JSON.stringify(o[i][j]) )
                }
            }
            else {
                plv8.return_next(JSON.stringify(o[i]));
            }
        }
        else {
            plv8.return_next(o[i]);
        }
    }
$$ LANGUAGE plv8 IMMUTABLE STRICT;

Testing the functions

Test our new set function.

SELECT json_select_multi('.properties .loc_name', '{"type":"FeatureCollection",
"features":[
  {"type":"Feature","geometry":{"type":"Point","coordinates":[42.400469999999999,-71.2577]},
     "properties":{"loc_id":1,"loc_name":"Waltham, MA"}},
  {"type":"Feature","geometry":{"type":"Point","coordinates":[42.990189999999998,-71.462590000000006]},
    "properties":{"loc_id":2,"loc_name":"Manchester, NH"}},
 {"type":"Feature","geometry":{"type":"Point","coordinates":[-96.757239999999996,32.909770000000002]},
    "properties":{"loc_id":3,"loc_name":"TI Blvd, TX"}}
]
}');
--output is a set of text much better :) ---
 json_select_multi
-------------------
 Waltham, MA
 Manchester, NH
 TI Blvd, TX

Repurposing relational technology

Now if I were a blasphemous document loving person, with little respect for relational databases, I may not be satisfied with just being able to parse JSON documents, but would also want to store them in the database?

CREATE TABLE myfeatures(id serial primary key, data json);
INSERT INTO myfeatures(data)
SELECT json_select_multi('.features',  '{"type":"FeatureCollection",
"features":[
  {"type":"Feature","geometry":{"type":"Point","coordinates":[42.400469999999999,-71.2577]},
     "properties":{"loc_id":1,"loc_name":"Waltham, MA"}},
  {"type":"Feature","geometry":{"type":"Point","coordinates":[42.990189999999998,-71.462590000000006]},
    "properties":{"loc_id":2,"loc_name":"Manchester, NH"}},
 {"type":"Feature","geometry":{"type":"Point","coordinates":[-96.757239999999996,32.909770000000002]},
    "properties":{"loc_id":3,"loc_name":"TI Blvd, TX"}}
]
}')::json;

Which would result in 3 table rows of features. There are varying degrees of blaphemy one can achieve, we can only stand this much.

Querying Spatial feature collection

If we had PostGIS installed in our database as we could get with:

CREATE EXTENSION postgis;

We could even do something as insane as this:

CREATE OR REPLACE FUNCTION json_select_geog(selector text, data json)
returns geography  as $$
    SELECT ST_Union(geom)::geography 
        FROM (SELECT ST_GeomFromGeoJSON(json_select_multi(selector, data)) As geom) As foo ;
$$ LANGUAGE sql IMMUTABLE COST 1;


CREATE INDEX idx_myfeatures_gist_geog
   ON myfeatures USING gist (json_select_geog('.geometry',data));

If we had more than a paltry sampling of data, our spatial json soup index would be used when we write a query something like:

SELECT json_select_multi('.loc_id', data)::integer As loc_id
    , json_select_multi('.loc_name', data)  As loc_name
    FROM myfeatures 
WHERE ST_DWithin(ST_GeogFromText('POINT(42.40 -71.257)')
    , json_select_geog('.geometry', data),100000 );

If you had a lot of spatial data and bigger geometry types like long linestrings or polygons, we suspect, it would probably be more efficient to not go crazier than:

DROP TABLE IF EXISTS myfeatures;
CREATE TABLE myfeatures(id serial primary key, properties json, geog geography(POINT,4326));
INSERT INTO myfeatures(properties, geog)
SELECT json_select_multi('.properties .loc_id',feat)::integer As id, json_select_multi('.properties', feat)::json As properties 
    , json_select_geog('.geometry', feat) As geog
FROM ( SELECT json_select_multi('.features',  '{"type":"FeatureCollection",
"features":[
  {"type":"Feature","geometry":{"type":"Point","coordinates":[42.400469999999999,-71.2577]},
     "properties":{"loc_id":1,"loc_name":"Waltham, MA"}},
  {"type":"Feature","geometry":{"type":"Point","coordinates":[42.990189999999998,-71.462590000000006]},
    "properties":{"loc_id":2,"loc_name":"Manchester, NH"}},
 {"type":"Feature","geometry":{"type":"Point","coordinates":[-96.757239999999996,32.909770000000002]},
    "properties":{"loc_id":3,"loc_name":"TI Blvd, TX"}}
]
}')::json As feat ) AS foo;



When we check the table:

SELECT * FROM myfeatures;
 id |                properties                |            geog
----+------------------------------------------+--------------------------------
  1 | {"loc_id":1,"loc_name":"Waltham, MA"}    | 0101000020E6100000B...
  2 | {"loc_id":2,"loc_name":"Manchester, NH"} | 0101000020E6100000C...
  3 | {"loc_id":3,"loc_name":"TI Blvd, TX"}    | 0101000020E61000004...

In short, for good or bad, PostgreSQL 9.2 provides more food for thought as to how you can organize your data. It's hard not to be a glutton about it and easy to lose sight of the fundamentals.