A while back, we discussed using pgAdmin pgScript as a quicky way for
running a repetitive update script where you want each loop to commit right away. Since stored functions have to commit as a whole, you can't use stored functions alone for this kind of processing.
If you are using the psql client packaged with PostgreSQL 9.3 or above,
then you can take advantage of the \watch command that was introduced in that version of psql. We'll demonstrate that
by doing a batch geocoding exercise with PostGIS tiger geocoder and also revise our example from the prior article to use the more efficient and terser LATERAL construct introduced in PostgreSQL 9.3.
Coming in PostgreSQL 9.4 are dynamic background workers which will provide more options for doing batch processing directly in the database. That unfortunately will require more coding in C and permissions to install custom built extensions.
Our test dataset
Our test dataset is small so not very interesting. In normal case you'd have like 10,000 or more addresses you are geocoding.
The \watch num_seconds command runs preceding statements every num_seconds. So if we wanted to geocode in batches of 500,
we'd do the below in our psql console. We also revised the example to use the number of record limit that is in the latest postgis_tiger_geocoder geocode function.
The downside of the \watch is unlike the pgScript approach you can't say run X number of times, so you'll need to stop it after you have concluded all your addresses have been geocoded. That would be a really nice feature to have for future versions of \watch. Now what are we doing in this example which may not be obvious:
We are using ST_SetSRID to cast to wgs 84 (SRID 4326), instead of doing a ST_Transform. Reason is the tiger geocoder data is in NAD 84 long lat (4269) which is close enough to 4326 that it is good enough to assume they are the same, especially given street center-line accuracy won't be exact anyway.
We do a LEFT JOIN instead of CROSS JOIN (aka ,) because since geocode is a set returning function, if no records are returned, we'd lose a record, and we want to flag records with no matches with a -1 for rating and null for everything else. Using a LEFT JOIN ensures all records are returned and records with no matches return a NULL gc.
You could leave out the word LATERAL entirely and in 9.3 and above things would still work fine since LATERAL is an optional keyword. We like
to include because its sometimes hard visually to distinquish invalid syntax from prior versions (or unintentional usages in newer versions) without that optional keyword.
Side note: you'll want to make sure autocommit is on in psql for batch processing. By default it is.