Postgres OnLine Journal: December 2014
An in-depth Exploration of the PostgreSQL Open Source Database
 

Table Of Contents

Using PostgreSQL Contribs

Using PostgreSQL Contribs

 

Oracle FDW 1.1.0 with SDO_Geometry PostGIS spatial support



Oracle FDW is a foreign data wrapper PostgreSQL extension that allows you to read and write to Oracle database tables from a PostgreSQL database. You can get it via the PGXN network or the main website http://laurenz.github.io/oracle_fdw/.

What is new about the latest 1.1.0 release is that there is now support for the Oracle SDO_GEOMETRY type that allows you to map the most common geometry types POINT, LINE, POLYGON, MULTIPOINT, MULTILINE and MULTIPOLYGON to PostGIS geometry type. Much of the spatial plumbing work was done by Vincent Mora of Oslandia. If we have any Windows Oracle users out there, yes there are binaries available for windows for PostgreSQL 9.1- 9.4 for both 32-bit and 64-bit. The FDW does have a dependency on the OCI.dll which I think comes shipped with Oracle products. Unfortunately, we are not Oracle users so can't kick the tires.


Using PostgreSQL Contribs

 

OGR foreign data wrapper on Windows first taste



This christmas I received something very special from Paul Ramsey and Even Roualt as detailed in Foreign Data Wrappers for PostGIS. It's been something I've been patiently waiting for for 4 years. I think it has a few issues I'm working to replicate, but overall it's much faster than I expected and pretty slick.

So why is ogr_fdw so special, because GDAL/OGR is an avenue to many data sources, NOT JUST GEOSPATIAL. It's the NOT JUST that I am most excited about. Though the focus is geospatial you can use it with non-geospatial datasources, as we described a long time ago in OGR2OGR for data loading

Windows Bag O' FDWs for PostgreSQL 9.4 coming soon

To celebrate this new gift, I've compiled ogr_fdw binaries for PostgreSQL 9.4 windows (both 32-bit and 64-bit). I'll be doing the same for other PostgreSQL versions in due time and hope to include this in upcoming PostGIS 2.2 windows bundle. For now, we'll be including the ogr_fdw in our bag-of-fdws for both 64-bit and 32-bit PostgreSQL 9.4. I'm still testing these out before I release, and will provide download links for these in a separate article.

Now this new bag o fdws, contains my other favorite FDWs -- the file_textarray_fdw, which we can't imagine living without for the kind of work we do. I also included www_fdw. Both file_textarray_fdw and www_fdw are show-cased in our new PostgreSQL: Up and Running, 2nd edition book, which just came out in print. Our PostGIS In Action 2nd edition, is still sadly one or 2 months away from hard-copy print and in final stages of proofing.

PostgreSQL 9.4 Windows OGR_FDW

At first pass, I used the same GDAL library (currently 1.11.1) as we use to build PostGIS 2.1 with. This unfortunately was not good enough for our needs, so I recompiled GDAL 1.11.1 to include Expat 2.1 and FreeXL 1.0.0h, as well as Curl (since I had compiled that for www_fdw) dependencies. With expat dependency, you get access to being able to read Microsoft Excel 2007+ XSLX workbooks and OpenOffice spreadsheets via this FDW and with FreeXL you get access to reading older Microsoft spreadsheet binary format .XLS files.

UPDATE: I've recompiled GDAL with ODBC support

For those interested in the gory details of my compile, I have it on my ogr_fdw_depends_build.sh gist which I'm fine tuning a bit.

I wanted to include ODBC, but it seems under mingw the ODBC is called odbc32 and haven't dug in enough to figure out how to trick GDAL into using that instead of the odbc it is looking for. With ODBC you get a lot of other stuff like any data source you can access via ODBC.

This version of GDAL I tested with the PostGIS 2.1.5 windows binaries we recently released on application stackbuilder and seems to be a simple drop and replace affair (for the packaged libgdal-1.dll) (just make sure you also copy the included expat dll, lib-curl,and libfreexl as well) and keep in mind, if you use these, they may be overwritten by a PostGIS upgrade you do.

When we built GDAL for PostGIS, we took only the defaults, because we didn't want to deal with extra dependencies. This meant really useful common drivers like ODBC, Geospatial webservices, OpenOffice Spreadsheet, Excel XLSX, Excel XLS, and SQLite/SpatiaLite will not be accessible with the version of GDAL packaged with PostGIS windows stackbuilder install. I'm rethinking this for PostGIS 2.2. Right now, PostGIS doesn't need the OGR (vector) api of GDAL, and GDAL is only used by the PostGIS raster api to access the raster drivers. So the GDAL vector (OGR) bindings have just been conveniently hybernating waiting for someone to wake them up. Also since this FDW doesn't actually rely on PostGIS, you don't need PostGIS to use it.

I also don't compile with PostgreSQL support, which at first glance seems like a real oddity. The main reason for that is that I use the same GDAL for multiple versions of PostgreSQL, but I don't distribute libpq.dll since that comes packaged with PostgreSQL EDB installers. So I didn't want to run into weird dependency issues with GDAL trying to hook into a different version of libpq.dll than what it was compiled with. If you do compile with PostgreSQL, then you get PostgreSQL (and for raster -- ability to read PostGIS raster), which are very useful for client side use, but of marginal interest for server side (when you've got postgres_fdw already).

The unhighlighted vector drivers are what you get when you build GDAL with no extra dependencies. Still a very rich list. The yellow highlighted ones are the additional ones you get compiling with expat, freexl and Curl. The green highlighted ones are the additional you get compiling with ODBC.

Supported Formats:
  -> "ESRI Shapefile" (read/write)
  -> "MapInfo File" (read/write)
  -> "UK .NTF" (readonly)
  -> "SDTS" (readonly)
  -> "TIGER" (read/write)
  -> "S57" (read/write)
  -> "DGN" (read/write)
  -> "ODBC" (read/write)
  -> "VRT" (readonly)
  -> "REC" (readonly)
  -> "Memory" (read/write)
  -> "BNA" (read/write)
  -> "CSV" (read/write)
  -> "GML" (read/write)
  -> "GPX" (read/write)
  -> "KML" (read/write)
  -> "GeoJSON" (read/write)
  -> "GMT" (read/write)
  -> "WAsP" (read/write)
  -> "PGeo" (readonly)
  -> "MSSQLSpatial" (read/write)
  -> "PCIDSK" (read/write)
  -> "OpenFileGDB" (readonly)
  -> "XPlane" (readonly)
  -> "AVCBin" (readonly)
  -> "AVCE00" (readonly)
  -> "DXF" (read/write)
  -> "Geoconcept" (read/write)
  -> "GeoRSS" (read/write)
  -> "GPSTrackMaker" (read/write)
  -> "PGDump" (read/write)
  -> "GPSBabel" (read/write)
  -> "SUA" (readonly)
  -> "OpenAir" (readonly)
  -> "PDS" (readonly)
  -> "WFS" (readonly)
  -> "HTF" (readonly)
  -> "AeronavFAA" (readonly)
  -> "Geomedia" (readonly)
  -> "EDIGEO" (readonly)
  -> "GFT" (read/write) (Google Fusion tables)
  -> "GME" (read/write) (Google Map tables)
  -> "SVG" (readonly)
  -> "CouchDB" (read/write)
  -> "Idrisi" (readonly)
  -> "ARCGEN" (readonly)
  -> "SEGUKOOA" (readonly)
  -> "SEGY" (readonly)
  -> "XLS" (readonly)
  -> "ODS" (read/write)
  -> "XLSX" (read/write)
  -> "ElasticSearch" (read/write)
  -> "PDF" (read/write)
  -> "Walk" (readonly)
  -> "CartoDB" (readonly)
  -> "SXF" (readonly)

Compile with more, and you could get as many as this. I'll be putting the MapInfo, ESRI Shapefile and MS Office excel drivers to use right away.

Taking ogr_fdw for a test drive

How you connect to datasources varies a bit from driver to driver. For example an excel workbook is considered a server, and each worksheet is a layer (table in PostgreSQL speak). For datasources like ESRI Shapefile or CSV files, the server is the folder that the files sit in and each individual file is a table.

Paul already demonstrated some examples of using Web Feature service (WFS) and ESRI OpenFileGDB (ESRIs geodatabase) examples https://github.com/pramsey/pgsql-ogr-fdw. Those examples may be exciting to folks who get high on spatial data, but for most people, I imagine they are yawn material. But what about good'ol spreadsheets or other relational databases. Something most people work with?

To take ogr_fdw for a ride

  1. I downloaded a random xlsx file from: Student weight status category reporting results: beginning 2010 and saved to my C:/fdw_data/excel folder on my PostgreSQL server.
  2. Then using the command line tool: ogr_fdw_info that comes packaged with ogr_fdw, I checked to see what spreadsheets the xlsx file had in it with this command:
    ogr_fdw_info -s "C:/fdw_data/excel/Student_Weight_Status_Category_Reporting_Results__Beginning_2010.xlsx"
    and got this: Student Weight Status Category

    Sadly the government had no excel workbooks I could find that had more than one worksheet; a big waste of a workbook. I did try with some of the beauties that people send us and confirmed that yes each worksheet gets listed as a separate layer.

  3. To get the table structure for the spreadsheet I do this:
    ogr_fdw_info -s "C:/fdw_data/excel/Student_Weight_Status_Category_Reporting_Results__Beginning_2010.xlsx" -l "Student Weight Status Category"

    Which outputs:

    CREATE SERVER myserver
      FOREIGN DATA WRAPPER ogr_fdw
      OPTIONS (
        datasource 'C:/fdw_data/excel/Student_Weight_Status_Category_Reporting_Results__Beginning_2010.xlsx',
        format 'XLSX' );
    
    CREATE FOREIGN TABLE student_weight_status_category (
      fid integer,
      location_code varchar,
      county varchar,
      area_name varchar,
      region varchar,
      school_years varchar,
      no__overweight real,
      pct_overweight real,
      no__obese real,
      pct_obese real,
      no__overweight_or_obese real,
      pct_overweight_or_obese real,
      grade_level varchar,
      area_type varchar,
      street_address varchar,
      city varchar,
      state varchar,
      zip_code varchar,
      location_1 varchar )
      SERVER myserver
      OPTIONS ( layer 'Student Weight Status Category' );

    Now this isn't a geospatial file, but GDAL has this narrow-minded view of the world that everything worth recording involves location or space. Unfortunately you can't simply remove that geom column, but you can ignore it. Just set the type to bytea and call it a day and ignore that column when you query. Also I'm going to change the name of the server to reflect its a single workbook This was a bug in the outdated version of ogr_fdw_info I was using. There is no geometry column output if you feed it a non-spatial file

So when all is said and done, you should run a script that looks something like:

CREATE SERVER student_weight_workbook
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (
    datasource 'C:/fdw_data/excel/Student_Weight_Status_Category_Reporting_Results__Beginning_2010.xlsx',
    format 'XLSX' );

CREATE FOREIGN TABLE student_weight_status_category (
  fid integer,
  location_code varchar,
  county varchar,
  area_name varchar,
  region varchar,
  school_years varchar,
  no__overweight real,
  pct_overweight real,
  no__obese real,
  pct_obese real,
  no__overweight_or_obese real,
  pct_overweight_or_obese real,
  grade_level varchar,
  area_type varchar,
  street_address varchar,
  city varchar,
  state varchar,
  zip_code varchar,
  location_1 varchar )
  SERVER student_weight_workbook
  OPTIONS ( layer 'Student Weight Status Category' );

Then you can run any sql and even join your new table with any local table like this:

  SELECT county , sum(no__overweight) As tot_overweight 
  FROM student_weight_status_category 
  WHERE region ILIKE '%North%'
  GROUP BY county
  ORDER BY county;

and get an output like:
    county  | tot_overweight
------------+----------------
ALBANY      |           8084
CLINTON     |           2638
COLUMBIA    |           1903
DELAWARE    |           1581
ESSEX       |            943
FRANKLIN    |           2570
FULTON      |           1209
GREENE      |           1805
HAMILTON    |             46
MONTGOMERY  |           1794
N/A         |          18927
OTSEGO      |           2039
RENSSELAER  |           5366
SARATOGA    |           6747
SCHENECTADY |           4700
SCHOHARIE   |            893
WARREN      |           2714
WASHINGTON  |           2646

I did run into one issue that on occasion if I started with just querying a subset of the columns, I got weird errors like: value too long for type character varying(11) which would mysteriously disappear the second time I run the query or if I do a select of the whole table twice the issue would disappear. I haven't determined if its an issue with the specific data source driver, the ogr_fdw, or somethng about my compile. Haven't had quite the same issue with ESRI shapefiles, though I have had others with things like counts not being the same as first time I ran if I run two simulatenous queries or server crashing if I run two queries and cancel one out. So I suspect there is a dirty memory issue somewhere, but not sure where at the moment.


Using PostgreSQL Contribs

 

Foreign Data Wrappers for PostgreSQL 9.4 Windows



As stated in last article, I've packaged FDW binaries for PostgreSQL 9.4 windows 32-bit and 64-bit and added in the ogr_fdw one. These we've tested with the standard EDB VS built PostgreSQL windows installs and work fine with those.

FDW binaries for PostgreSQL 9.4

Below are the zip files that contain the binaries and dependency files for our favorite FDWs (that aren't normally available for windows):

If you are using Chrome, Chrome is currently flagging the zip files on this site as possible malware with comment Not commonly Downloaded Warning. If you get such a notice click the drop down option it provides and choose Keep. Alternatively try the 7z versions, which you can extract with 7-zip. These for some reason aren't being flagged.
UPDATE: these have been recompiled with GDAL 2.0 branch (2.0.1) to fix some issues with excel driver, also been recompiled with latest ogr_fdw source to fix some issues like datetime handling. However we took out curl support and added sqlite3 support (so drivers now support SQLite formats like GPKG, but no longer support web formats like Google Fusion/CartoDb). If you need the curl support and are happy with not having latest ogr fdw/gdal fixes use the old file now renamed - fdw_win32_94_bin_gdal1.11.7z, fdw_win64_94_bin_gdal1.11.7z

FDWs in the 9.4 downloads

  • file_textarray_fdw - this one is still our favorite because it has no extra depenendencies and handles delimited files (even large ones very fast) and can handle delimited data with varying columns in each row. We've turned a few of our clients that have to deal with third-party flat files onto this one. This is one of those that I really feel should be part of the core PostgreSQL distribution because it's more useful than the packaged file_fdw and fairly trivial to compile.
  • https://github.com/cyga/www_fdw - for querying web services
  • ogr fdw - A monster of an FDW which has already won my heart. Have some weird data source/file you've never heard of before (especially spatial)? There's a good chance OGR FDW can query it. Note this GDAL build has more drivers than what gets packaged with the PostGIS version. I have also included ODBC, so full set of drivers supported is (read-only since this fdw only supports read at the moment):
    Supported Formats:
      -> "ESRI Shapefile" (read/write)
      -> "MapInfo File" (read/write)
      -> "UK .NTF" (readonly)
      -> "SDTS" (readonly)
      -> "TIGER" (read/write)
      -> "S57" (read/write)
      -> "DGN" (read/write)
      -> "VRT" (readonly)
      -> "REC" (readonly)
      -> "Memory" (read/write)
      -> "BNA" (read/write)
      -> "CSV" (read/write)
      -> "GML" (read/write)
      -> "GPX" (read/write)
      -> "KML" (read/write)
      -> "GeoJSON" (read/write)
      -> "GMT" (read/write)
      -> "ODBC" (read/write)
      -> "WAsP" (read/write)
      -> "PGeo" (readonly)
      -> "MSSQLSpatial" (read/write)
      -> "PCIDSK" (read/write)
      -> "OpenFileGDB" (readonly)
      -> "XPlane" (readonly)
      -> "AVCBin" (readonly)
      -> "AVCE00" (readonly)
      -> "DXF" (read/write)
      -> "Geoconcept" (read/write)
      -> "GeoRSS" (read/write)
      -> "GPSTrackMaker" (read/write)
      -> "PGDump" (read/write)
      -> "GPSBabel" (read/write)
      -> "SUA" (readonly)
      -> "OpenAir" (readonly)
      -> "PDS" (readonly)
      -> "WFS" (readonly)
      -> "HTF" (readonly)
      -> "AeronavFAA" (readonly)
      -> "Geomedia" (readonly)
      -> "EDIGEO" (readonly)
      -> "GFT" (read/write)
      -> "GME" (read/write)
      -> "SVG" (readonly)
      -> "CouchDB" (read/write)
      -> "Idrisi" (readonly)
      -> "ARCGEN" (readonly)
      -> "SEGUKOOA" (readonly)
      -> "SEGY" (readonly)
      -> "XLS" (readonly)
      -> "ODS" (read/write)
      -> "XLSX" (read/write)
      -> "ElasticSearch" (read/write)
      -> "PDF" (read/write)
      -> "Walk" (readonly)
      -> "CartoDB" (readonly)
      -> "SXF" (readonly)

We'll be updating the 9.3 ones next with the latest code and adding in the ogr_fdw.