Last time we demonstrated how to query delimited text files using the fdw_file that comes packaged with PostgreSQL 9.1+, this time we'll continue our journey into Flat file querying Foreign Data Wrapper using an experimental foreign data wrapper designed for also querying delimited data, but outputting it as a single column text array table. This one is called file_textarray_fdw and developed by Andrew Dunstan. It's useful if you are dealing with for example jagged files, where not all columns are not properly filled in for each record or there are just a ton of columns you don't want to bother itemizing before you bring in. The benefit is you can still query and decide how you want to break it apart. You can grab the source code from file_text_array_fdw source code. If you are on windows, we have compiled binaries in our Bag o' FDWs for both PostgreSQL 9.1 32-bit FDW for Windows bag and PostgreSQL 9.1 64-bit FDW for Windows bag that should work fine with the EDB installed windows binaries. For other systems, the compile is fairly easy if you have the postgresql development libraries installed.
We'll be using the same database schema and system file folder to house things as we did in our prior article.
make installor you can do just a
makeand copy the .sql and .control files into the PostgreSQL share/extensions folder, and the .so files into your PostgreSQL lib folder. Keep in mind, you need PostgreSQL 9.1 or above.
CREATE EXTENSION file_textarray_fdw;
CREATE SERVER file_tafdw_server FOREIGN DATA WRAPPER file_textarray_fdw;
This is often necessary to do for most foreign data wrappers, though for some reason we were able to skip this step with the file_fdw one.
We need to create a foreign data mapping user to a database user. We don't want to itemize what users can access our data source, so'll create just a group mapping for all users using the group role
public which all our users of our database are members of.
CREATE USER MAPPING FOR public SERVER file_tafdw_server;
For this exercise, we'll use the 2010 Gazetteer natial place which is tab-delimited and can be downloaded from census 2010 places data and more specifically the Gaz_places_national.txt which we will save in our local server fdw_data folder.
CREATE FOREIGN TABLE staging.places2010( x text ) SERVER file_tafdw_server OPTIONS (filename '/fdw_data/Gaz_places_national.txt', encoding 'latin1', delimiter E'\t');
You can change the location of a file as well. For example, the query above will not work if you are on windows. You'll need to specify the drive letter as well. You can do this:
ALTER FOREIGN TABLE staging.places2010 OPTIONS ( SET filename 'C:/fdw_data/Gaz_places_national.txt' );
Now to query our table, we do this:
SELECT x As state, x As place FROM staging.places2010 WHERE x = 'STUSPS' OR (x IN('RI', 'DC') AND x ILIKE 'wa%');
This will give both the header column and all data for Rhode Island and DC fitting our criterion.
state | place --------+------------------------- STUSPS | NAME DC | Washington city RI | Wakefield-Peacedale CDP RI | Warwick city RI | Watch Hill CDP
Note I kept the header, this allows for easily flipping to a more friendly hstore structure. For this next example, you'll need hstore installed and if you don't have it install it with:
CREATE EXTENSION hstore;
We'll use a common table expression, but you can just as easily dump the data into a table of your choosing.
WITH cte As (SELECT hstore(headers.x, p.x) As kval FROM (SELECT x FROM staging.places2010 WHERE x = 'STUSPS') As headers CROSS JOIN staging.places2010 As p WHERE (p.x IN('RI', 'DC') AND p.x ILIKE 'wa%') ) SELECT kval->'STUSPS' As state, kval->'NAME' As name, kval->'GEOID' As geoid FROM cte ORDER BY kval->'NAME';
state | name | geoid -------+-------------------------+--------- RI | Wakefield-Peacedale CDP | 4473130 RI | Warwick city | 4474300 DC | Washington city | 1150000 RI | Watch Hill CDP | 4475200
Every FDW has its own set of options, which makes sense since FDWs are so varied in purpose. I only used a couple of options in this tutorial for the foreign table, but here is a more exhaustive list of what's available for file_textarray_fdw