GDAL OGR2OGR for Data Loading

What is FWTools and OGR GDAL?

FWTools GIS Toolkit is a freely available open source toolkit for Windows and Linux that can do more than GIS tricks. It is a precompiled bundle of Open Source GIS tools. The FW comes from the initials of Frank Warmerdam, the originator of the toolkit and current President of the Open Source Geospatial Foundation (OSGEO).

One key component of the GIS Toolkit is the GDAL/OGR library. Parts of the library have been enhanced by several in the OSGEO community. GDAL is a basic foundation of countless Open source GIS as well as commercial GIS applications. Here are Listings of commercial and open source software that use it and GDAL sponsors.

This is a library which historically has been developed and maintained by Frank Warmerdam, but has started to garner quite a few developers. GDAL is X/MIT licensed (similar to BSD license), therefore the licensing is very generous for commercial use. The toolkit can be downloaded from http://fwtools.maptools.org/

In this article we shall talk a little bit about the OGR2OGR commandline data loader tool packaged in here, which should be of interest to many even those who do not know or care about GIS data. We shall demonstrate some non-GIS uses of this light-weight tool.

If you want to see some GIS example uses of this tool such as loading data into Postgis tables from various GIS formats or exporting data out and brief install instructions, check out our OGR2OGR Tips page.

Although GDAL/OGR was designed to deal with GIS data, a side-effect of the design is that it also handles regular attribute and relational data. The reason is simple. Spatial Data is not an island; it most often accompanies standard relational and attribute data to make it more meaningful. If you load mapping objects and geometries, you want to load the attributes of them as well.

When using OGR2OGr for loading non-spatial data ,which it really wasn't designed for, it does have some additional annoying consequences, but we feel the almost fit in your pocket feel of it makes it a convenient swiss army knife even for non-spatial data.

I would say we are not alone in being addicted to this tool. The ability to load data from various data sources with a simple line of script code is breath-taking. I think Bill Dollins says it well in his A One-Touch (Almost) Data Loader With FWTools and PostGIS. One only needs to look at the OGR supported formats to get a sense of just how powerful this tool is. Enough talk about the greatness of this tool; I'm beginning to make myself nauseous. On with the show.

OGR2OGR can read from any read supported format and write to any write supported format. You can imagine the permutations of data loading you can do here. To get a list of the current drivers your install supports. Launch your FW Tools command line and type:


ogrinfo --formats

(NOTE: some drivers such as the Oracle Spatial and IBM Informix Spatial Datablade require proprietary libraries so you must compile yourself)

Some Caveats - Rought Spots

  1. We had hoped the -nlt NONE would prevent geometry columns from being created, but doesn't appear to or rather we haven't been successful in getting this to work, so blank geometry columns are created.
  2. The overwrite also doesn't appear to work if you also include a SCHEMA. Works if you don't include a SCHEMA option.
  3. Data Types such as varchar are brought in as char, integers as numeric.
  4. Sometimes OGR fails if the Client Encoding is not LATIN1 and you've got high-end characters in your data. To get around this we usually set the database client_encoding to LATIN1 at least while we are loading the database with a command of the form
    ALTER DATABASE pgdbname SET client_encoding=latin1;
  5. Views are brought in as if they were regular tables. Which in a lot of cases is a feature, but it would be nice to override this.
  6. In older version, if you are trying to append to a schema qualified table, it seems to try to recreate the table even if you don't designate update.
  7. Primary keys are not preserved and a dummy key is always created when in create mode.
  8. It can't create schemas or databases
  9. When using the -sql tag, the field lengths and data types are lost

For the following examples we will be a bit verbose and spell out the full connection strings to the PostgreSQL database. Keep in mind any arguments left out will use their default parameters. For example if no host or user or password or port is specified, OGR will use localhost and logged in user and 5432 port. For us this is usually wrong since our PostgreSQL server is on a separate box from our OGR2OGR scripts (e.g. we are using PgAgent that is sitting on a separate box) or is running on a non-standard port so we spell it out. We spell it out too because few tutorials spell these out, leaving many to read deeply in the docs to figure this basic stuff out. Other note - for the below - we put these on multiple lines so they fit on a page. In practice, each ogr2ogr call should be a one liner.

Getting listings of tables/views in a data source

To get a listing of tables/views (layers in GIS lingo), the ogrinfo command-line tool comes in handy. Simply pass in the connection info for the database. Below is an example of querying the tables of an ODBC datasource, PostgreSQL, Dbase:

		
ogrinfo "ODBC:some_dsn"
ogrinfo PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432"
ogrinfo "/path/to/somefile.csv" 
ogrinfo "/path/to/somefile.dbf"
		
	

MySQL to PostgreSQL

For starters, lets say you need to do a nightly or one-time load from MySQL to PostgreSQL. In the past, writing to MySQL with OGR was not possible, but as of OGR 1.3.2, read and write to MySQL is possible (so MySQLers 4+ are now in luck and can now load ESRI shape files among other data directly into MySQL). The PostgreSQL driver in OGR has always supported both directions. Below is a simple example of dumping general non-spatial table from MySQL to PostgreSQL. To dump a spatial MYSQL table, the command is pretty much the same.

The below statement will copy the tables customers and orders from a MySQL database to a PostgreSQL database into the schema called mysqldump.. All the below should be on a single line. I wrapped to save space.


--Pull customers and orders into the mysqldump schema

ogr2ogr -overwrite -update -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword" 
	MYSQL:"mydb,host=myhost,user=mylogin,password=mypassword,port=3306" 
		 -lco OVERWRITE=yes -lco SCHEMA=mysqldump customers orders


--Pull customers and orders into the public schema, and overwrite if it exists

ogr2ogr -overwrite -update -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword" 
	MYSQL:"mydb,host=myhost,user=mylogin,password=mypassword,port=3306" customers orders

--Pull all the tables from mysql into the mysqldump schema

ogr2ogr -overwrite -update -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword" 
	MYSQL:"mydb,host=myhost,user=mylogin,password=mypassword,port=3306"
	

--For nightly loads where you already have the structure of the table -- you can do a truncate table with a psql script and then append with OGR2OGR -- note: nln means new name for layer. In this case we want to take the mysql table orders and load into our mysqldump.orders table.


psql -U pgloginname -d pgdbname -h pghost -c "TRUNCATE TABLE orders"
ogr2ogr -append -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword" 
	MYSQL:"mydb,host=myhost,user=mylogin,password=mypassword,port=3306" -nln "mysqldump.orders" orders

Microsoft SQL Server (and other ODBC) to PostgreSQL

OGR has a driver that supports ODBC DSN connections. Unfortunately this will only work on Windows. Below is an example that imports all tables from a SQL Server database to PostgreSQL. In this example, the DSN I have is using NT Authentication or has the name and password encoded in the System DSN. For this you need to setup a DSN connection to the SQL Server database in ODBC Manager (we described setting up a DSN in Using MS Access with PostgreSQL, except the ODBC driver seems to only work with System DSNs not File DSNs. For more details on referencing the ODBC DSN - check http://www.gdal.org/ogr/drv_odbc.html


ogr2ogr -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword" 
	"ODBC:system_dsn_goes_here" -update -lco OVERWRITE=yes -nlt NONE

As noted, geometry fields seem to be created and right field types are not necessarily created. SQL Server text come in as VARCHARS and VARCHARS come in as CHARS etc. One way to get around this is to create your table structures before hand and use OGR2OGR to just load the data.

E.g.
--This will append to a table called orders reading from the SQL Server (ODBC) table orders and customers.  It will only append
like-named fields.

psql -h pghost -p 5432 -U pguser -d pgdbname -c "TRUNCATE TABLE orders;TRUNCATE TABLE customers;"
ogr2ogr -append -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432" 
	"ODBC:system_dsn_goes_here" orders customers

Or correct the structures afterword by dropping unnecessary fields using a script generation technique similar to what we described in DML to Generate DDL and DCL. So your script builder SQL to drop all the ogc_fid and wkb_geometry fields created by OGR2OGR would look something like this.

SELECT 'ALTER TABLE ' || t.table_schema || '.' || t.table_name || ' DROP COLUMN ogc_fid;' 
FROM information_schema.tables t 
    INNER JOIN information_schema.columns c 
    ON (t.table_name = c.table_name AND t.table_schema = c.table_schema AND c.column_name = 'ogc_fid')
UNION 
SELECT 'ALTER TABLE ' || t.table_schema || '.' || t.table_name || ' DROP COLUMN wkb_geometry;' 
FROM information_schema.tables t 
INNER JOIN information_schema.columns c 
ON (t.table_name = c.table_name AND t.table_schema = c.table_schema AND c.column_name = 'wkb_geometry'; 


Loading Data from DBase (DBF) files

Below is an example of loading a single dbase file. Although the formats do not list DBase as a supported format, the ESRI Shapefile format includes several files (.shp, .dbf, .shx), so the OGR utility uses the ESRI shapefile driver to accomplish this.


ogr2ogr -f "PostgreSQL" 
	PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432"  
		somefile.dbf  -nln mynewschema.mynewtable

Loading Data from Microsoft Accesss

For loading Microsoft Access, if you have a regular old Access database, it is best to use the ODBC driver. While the PGeo driver (ESRI Personal Geodatabase) does work against an MS Access Database, it relies on certain ESRI meta data tables to work so is not really useable for pure Access database use. One advantage of using OGR2OGR to export MS Access tables is that, unlike the MS Access export which requires you to export each table individually, one can do a bulk export of all the Microsoft Access tables or a subset of tables. The downside of using it over the Microsoft Access export feature is that its implementation of datatypes as mentioned, is impoverished, therefore things that should be text or varchar come in as char or varchar. Integers come in as numerics. Below is an example of copying Microsoft Access tables directly into PostgreSQL. To set it up, you would register a System DSN as we did for our SQL Server example. We'll show a slightly different feature here

This example apppends data from a table called company from an MS Access database into a table in accessdump.company in PostgreSQL, but only copies the records with company_group=2


ogr2ogr -append -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432" 
ODBC:"someaccessdsn" -nln "accessdump.company" -where "company_group=2" company

Exporting data out of PostgreSQL

As mentioned, OGR2OGR can be used to create a data dump in some supported writable format. Below is a simple example to export a table to DBase.

This example will create a table called test in the folder C:/datadbf and in addition to the dbf file an annoying .shp and a .shx that you can discard if you know you are not outputing geometries.


ogr2ogr -overwrite -f "ESRI Shapefile" "C:/datadbf/" 
	PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432" -nln test somepgtable

If you want to output only a subset of the data in a table you can use the -sql, -select, -where properties of OGR2OGR - such as shown below


ogr2ogr -overwrite -f "ESRI Shapefile" "C:/datadbf/" -select "company,sector_id" -where "company_group=2"
	PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432" "msaccessdump.company"

If you needed to use an SQL statement to rename fields and so forth, you would do it something like the below. The main issue we have with this is that the OGR tool is not smart enough to figure out the data types and lengths to output the fields so lengths etc are wrong. In general if we need to rename fields, we create a view in PostgreSQL and output the view.


ogr2ogr -overwrite -f "ESRI Shapefile" "C:\datadbf" 
	-sql "SELECT company_name As company,sector_id FROM accessdump.company" 
	PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432" -nln compsec2