UPDATE: If you are using PostgreSQL 9.5+, you can use the IMPORT FOREIGN SCHEMA feature which is available in ogr_fdw 1.0.1+. We demonstrate this in: ogr fdw IMPORT FOREIGN SCHEMA.
The ogr_fdw comes with this to die for commandline utility called ogr_fdw_info that does generate the table structures for you and will also list all the tables in the Foreign data source if you don't give it a specific table name. So with this utility I wrote a little hack involving using PostgreSQL COPY PROGRAM feature to call out to the ogr_fdw_info commandline tool to figure out the table names and some DO magic to create the tables.
Though ogr_fdw is designed to be a spatial foreign data wrapper, it's turning out to be a pretty nice non-spatial FDW as well especially for reading spreadsheets which we seem to get a lot of. This hack I am about to demonstrate I am demonstrating with LibreOffice/OpenOffice workbook, but works equally well with Excel workbooks and most any data source that OGR supports.
This hack came in really handy for processing workbooks (with several spreadsheets e.g. an Excel or LibreOffice workbook) and SQL Server tables. Sadly I tried it on my MS Access data source, and while it could read the list of tables, PostgreSQL returned blank text when trying to grab the table structure. This was really weird considering from the commandline I can see the table structure. Haven't figured out what caused that. I blame it on some funky character MS Access is spitting out that PostgreSQL is reading as a black hole character or some weird permission issue with reading MS access catalogs from within PostgreSQL. The commandline worked fine using directly even for MS Access and reading the table once the foreign table was created worked fine as well.
The ogr_fdw_info wrapper function
The wrapper function just spits out the SQL to do the work. The reason I have it not do the work is because depending on my needs, I may say want to prefix the table names with something or change the data types it emits. For example in SQL Server, unless the timestamp field is the last field in the table, ogr_fdw doesn't handle it right. I still have to investigate this one. So where it says timestamp, for SQL Server I swap it out for varchar.
Note that I also hard-coded the executable path in there which is very specific to where it will end up if you are running PostgreSQL 9.4 on windows.
Some things to be cognizant of
The COPY FROM is a server SQL command (not the psql variant). As such the executable must be on the server and accessible by the postgres service account
Your data source must also be accessible from the server postgres process, which usually means it must reside on the server somewhere unless its a webservice like CartoDB or WFS.
Your paths will obviously be different if on Linux/Unix/Mac and probably much shorter. I purposely used the windows path because windows users have it harder with windows love of spaces and all that.
2016-01-03 - ogr_fdw_table function updated to prevent COPY FROM from mangling layer names with special characters in it. Done by forcing to text with csv and using | (as delimiter which wouldn't exist in most datasources so prevents , from being used which could be found in name of data source.
layer text DEFAULT''::text)
RETURNS text AS
CREATE TEMP TABLEogr_fdw_out(out text);
IF layer >''THENEXECUTE'COPYogr_fdw_outFROMPROGRAM''"C:/ProgramFiles/PostgreSQL/9.4/bin/ogr_fdw_info"-s"'|| data_source ||'"-l"'|| layer ||'"''WITH(format''csv'',delimiter''|'')';
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
For demonstrating this little puppy in action, I need a workbook with several spreadsheets to do it justice. So I downloaded a LibreOffice spreadsheet template from Monthly Home Budget Workbook
and created a new workbook called Budget2015.ods from that. Note that LibreOffice, OpenOffice, and the Microsoft Excel 2007 and above (the .xlsx) format all are some variant of compressed XML. I will reiterate that what data sources you can read with ogr_fdw is dictated by your libgdal library and what support it is compiled with. In the case of reading LibreOffice/OpenOffice workbooks and the newer Microsoft XLSX files, your GDAL library needs to be compiled with Expat. In order to read older Excel files (xls format), you need to have your libgdal built with FreeXL support. The windows builds we created and detailed in PostgreSQL 9.4 bag-o-fdws are built with Expat and FreeXL (and as you can see, offer xlsx and ods support) and xls.
Reading LibreOffice Calc workbooks (ODS) direclty from PostgreSQL
So to test out, I throw my new fangled budget workbook in my FDW folder and ran this little statement.
And the output is this beautiful thing. Now as a non-spatial person, you have to get over the hump that Layer means Table as far as you are concerned
Form 2 - Equity
Form 3 - Income
Form 4 - Lump Sum Planning
Form 5 - Monthly Cash Flow
Form 7 - Allocation
Form 8 - Spending Details
Form 9 - Work Expenses
Categories and Items
So we've got a couple of spreadsheets (tables, layers) in this workbook. I'm going to demonstrate how to see the structure with 2 different widely different
spreadsheets just to demonstrate a particular behavior of the OGR ODS driver (the XLSX and XLS drivers seem to behave the same as far as I can tell) when faced with something that is not quite tabular looking (or lacking headers) vs. is tabular.
Now if you open up the workbook in LibreOffice Calc and look at the dashboard sheet, you'll see it's got charts and rows of data with no header. It has no header for the table. The only thing that is sure is that
column 3 has got numbers. So since OGR needs to ascibe a column name and can't infer one from the top row, it just calls them field1.. fieldn. OGR always adds an additional column, called fid at the beginning which is really a row number identifier. You'll also sometimes see a geom column as the second column if it thinks it's a spatial datasource.
If you were to do the same exercise using:
SELECTogr_fdw_sql_table('C:/fdw_data/Budget2015.ods','Form 4 - Lump Sum Planning');
You'd get the same CREATE SERVER command since they come from the same workbook, but your CREATE FOREIGN TABLE would look like this:
Note how the table name and the field names have been changed so they are legal field and column names for PostgreSQL (not requiring you to quote the columns or table name).
For example the spaces in the header of spreadsheet cells got converted to underscore and lower case: e.g. Monthly Amount became monthly_amount. At first glance,
it seems a little strange it considered annual_amount varchar and monthly_amount integer. OGR ODS driver by default infers the datatype from the data it sees (there are override environment variables for this, but can't pass them into the OGR_FDW yet I don't think). The Monthly Amount column
had just 0s in it and the Annual Amount column had no values. If you go back and fill in the Annual Amount column with numbers (so that monthly_amount recomputes to numbers with decimals), and then you rerun this command, you'll find that the annual_amount and monthly_amount both then show as real data type.
To prepare the data for querying directly from PostgreSQL, we're going to:
Create a schema to house foreign tables:
CREATE SCHEMA budget2015;
Create a foreign server pointing to this workbook. For this take the CREATE SERVER statement and replace myserver with: svr_budget2015_workbook.
CREATE SERVER svr_budget2015_workbook
FOREIGN DATA WRAPPER ogr_fdw
format 'ODS' );
Apply some DO magic to create the foreign tables: This do magic drops the foreign tables if they exist and recreates them. And also replaces references to myserver with the new server name. Finally it prefixes the tables with buget2015. so they get dumped in the budget2015 schema. You could do the same with set search_path before you run the DO.
DO language plpgsql $$
DECLARE var_sql text;
var_sql :=(WITH conn As(SELECT'C:/fdw_data/Budget2015.ods'::text As conn),
cte_tb_names AS(SELECTunnest(string_to_array(ogr_fdw_sql_table(conn), E'\n'))As table_name
, tb AS(SELECTtrim(table_name)As table_name
FROM cte_tb_names WHERE table_name NOTLIKE'Layers:%'andtrim(table_name)>'')SELECTstring_agg(replace(regexp_replace(ogr_fdw_sql_table(conn, tb.table_name),
'myserver','svr_budget2015_workbook'), E'\n')AssqlFROM tb, conn);
Finally to verify I got the worksheets as linked foreign tables, I run this query:
WHERE table_schema='budget2015'ORDERBY table_name;