File FDW Family: Part 1 file_fdw

Last time we demonstrated how to use the ODBC Foreign Data wrapper, this time we'll continue our journey into Foreign Data Wrapper land by demonstrating what I'll call the File FDW family of Foreign Data Wrappers. There is one that usually comes packaged with PostgreSQL 9.1 which is called fdw_file but there are two other experimental ones I find very useful which are developed by Andrew Dunstan both of which Andrew demoed in PostgreSQL Foreign Data Wrappers and talked about a little bit Text files from a remote source. As people who have to deal with text data files day in and out, especially ones from mainframes, these satisfy a certain itch.

In this article, we'll just cover the file_fdw one, but will follow up in subsequent articles, demonstrating the array and fixed length record ones.

Before we begin, We'll create a staging schema to throw all our foreign data tables.

CREATE SCHEMA staging;

We'll also create a folder on on the root of our postgres server called fdw_data where we will house all the flat files. Make sure the postgres service account has access to it.

Using file_fdw

We'll start with the fdw_file foreign data wrapper extension which is useful for querying CSV, tab-delimited and other delimited files.

Setting up Foreign Data Table using the file_fdw driver

  1. Install the extension with the SQL statement CREATE EXTENSION file_fdw;
  2. Create a Foreign server This is pretty much a formality for file_fdw since hte file always is on the local postgresql server.
    CREATE SERVER file_fdw_server FOREIGN DATA WRAPPER file_fdw;
  3. Create a Foreign Table

    For this exercise, we'll pull a tab delimited data file from FAA Aircraft Reference called aircraft.txt and save to /fdw_data folder. The file layout is found at AircraftFileLayout.txt

    Note that the aircraft.txt has a header row. We'll save this to our fdw_data folder

    CREATE FOREIGN TABLE staging.aircraft (
        Model                         Char (12),
        Last_Change_Date              VarChar(10),
        Region                        VarChar(2),
        Make                          VarChar(6),
        Aircraft_Group                VarChar(6),
        Regis_Code                    VarChar(7),
        Design_Character              VarChar(3),
        No_Engines                    VarChar(11),
        Type_Engine                   VarChar(2),
        Type_Landing_Gear             VarChar(2),
        TC_Data_Sheet_Number          VarChar(8),
        TC_Model                      VarChar(20)
    ) SERVER file_fdw_server
    OPTIONS (format 'csv',header 'true'
    , filename '/fdw_data/aircraft.txt', delimiter E'\t', null '');
    

    You can change the location of a file as well. For example, the query above will not work if you are on windows. You'll need to specify the drive letter as well. You can do this:

    ALTER FOREIGN TABLE staging.aircraft OPTIONS ( SET filename 'C:/fdw_data/aircraft.txt' );

Now to query our table, we simply do this:

SELECT model, region, make, last_change_date
FROM staging.aircraft
WHERE make LIKE 'BEECH%'
ORDER BY last_change_date DESC LIMIT 5;
    model     | region |  make  | last_change_date
--------------+--------+--------+------------------
 B200CT       | CE     | BEECH  | 12/18/2001
 3000         | CE     | BEECH  | 12/17/2001
 B300C        | CE     | BEECH  | 10/11/2001
 C12C         | CE     | BEECH  | 10/11/2001
 65A901       | CE     | BEECH  | 10/11/2001

Notice in the above definition that the Last_change_date field is set to varchar(10), but when we look at the data, it's in American date format and sorting is not using date sorting so records get sorted by month first. To force date sorting, we can make this change

We could do this:

ALTER FOREIGN TABLE staging.aircraft ALTER COLUMN last_change_date type date;

And that works great if you are in the US since the default DateStyle is MDY and rerunning the query gives:

    model     | region |  make  | last_change_date
-------------+--------+--------+------------------
U21A         | CE     | BEECH  | 2004-01-28
B200CT       | CE     | BEECH  | 2001-12-18
3000         | CE     | BEECH  | 2001-12-17
65A901       | CE     | BEECH  | 2001-10-11
C12C         | CE     | BEECH  | 2001-10-11

However if your DateStyle is 'DMY' as it is in Europe and other places, you'll get an error when you go to query the data. We couldn't find a way to force the DateStyle property of a foreign table column aside from doing this:

set DateStyle='MDY'

before running a query on the table