Using DbLink to access other PostgreSQL Databases and Servers

People coming from SQL Server and MySQL often complain about how you can't query other databases from within a PostgreSQL database. In Microsoft SQL Server, there is a concept of querying across databases on the same server with dbname.dbo.sometable and querying across servers (even of different types e.g. Oracle) by setting up a linked server and doing something such as servername.dbname.dbo.sometable or using the OPENROWSET(..) or OPENQUERY(...) syntax.

MySQL has a similar feature by using syntax dbname.sometable, but MySQL lacks schemas so there is no way to segregate a database into nice buckets as you can with SQL Server and PostgreSQL.

In this article we shall provide some examples of using the contrib module - dblink to query local PostgreSQL databases and remote PostgreSQL databases. DbLink is probably most comparable in structure to SQL Server's OpenRowset functionality. It lacks the power of SQL Server's Linked Server approach or OPENQUERY that allows for synchronized joins between linked servers/databases and local tables and updates/inserts on linked servers. This makes it not terribly useful in cases where you need to join lots of data with local data. It does however come in handy for bulk copy operations from one database/server to another.

Install DbLink

DbLink is located in share/contribs/dblink.sql of your PostgreSQL install.To use it, load the dblink.sql file in the database you would like to use it in.

Querying other local PostgreSQL Databases with DbLink

DBLink returns a generic data row type. One of the annoying things about this is that in order to use it, you need to specify the output structure. Perhaps in future versions of PostgreSQL, this limitation of lack of introspection of generic row types will be irradicated. The lack of introspection is an issue not only for DbLink, but for other generic row type returning functions such as TableFunc cross tab functions.

Below is an example of querying a database on the same server and cluster using DbLink. Note if no username and password is specified, then DbLink connects with whatever account you are currently using.


SELECT towns.*
FROM dblink('dbname=somedb','SELECT town, pop1980 FROM towns')
    AS towns(town varchar(21), pop1980 integer);


If you are running on a non-standard port (something other than 5432), then you will need to specify the port as follows:


SELECT p.*
FROM dblink('dbname=ma_geocoder port=5433',
            'SELECT gid, fullname, the_geom_4269 
                    FROM ma.suffolk_pointlm')
    AS p(gid int,fullname varchar(100), the_geom geometry);


Querying Remote servers with DbLink and Bulk Insert

DBLink comes in particularly handy for querying remote PgServers and doing bulk inserts. To do so, you will need to specify the full credentials of the remote server. Below is an example of this.


SELECT blockgroups.*
    INTO temp_blockgroups
    FROM dblink('dbname=somedb port=5432 
            host=someserver user=someuser password=somepwd',
    'SELECT gid, area, perimeter, state, county, 
            tract, blockgroup, block, 
            the_geom 
            FROM massgis.cens2000blocks')
            AS blockgroups(gid int,area numeric(12,3),
                perimeter numeric(12,3), state char(2), 
                county char(3), tract char(6), 
                blockgroup char(1), 
                block char(4), the_geom geometry);


Joining with Local Data

This is the major area where DbLink falls short. While you can join remote dblinked tables with local tables, for large tables, this tends to be really slow as indexes are not used and DBLinked function call needs to return the full dataset before they can be joined. Below is a simple example of a join:


SELECT realestate.address, realestate.parcel, s.sale_year, s.sale_amount, 
        FROM realestate INNER JOIN 
                dblink('dbname=somedb port=5432 host=someserver 
                user=someuser password=somepwd',
                'SELECT parcel_id, sale_year, 
                    sale_amount FROM parcel_sales')
            AS s(parcel_id char(10),sale_year int, sale_amount int)
             ON realestate.parcel_id = s.parcel_id;


If sales table is relatively huge, then you may be better forcing a nested join by creating a function that queries dblink and putting that in the SELECT clause and having a dynamic dblink query of the form.


CREATE TYPE payment AS
   (
    payment_id integer,
    customer_id smallint,
    staff_id smallint,
    rental_id integer,
    amount numeric(5,2),
    payment_date timestamp);

CREATE OR REPLACE FUNCTION fn_remote_payments(cust int)
  RETURNS SETOF payment AS
$$

 SELECT * FROM dblink('dbname=pagila host=localhost user=pagila',
    'SELECT payment_id, customer_id, staff_id, 
            rental_id, amount, payment_date 
        FROM payment WHERE customer_id = ' || $1) As
            p(payment_id int,customer_id smallint, 
                staff_id smallint, 
            rental_id integer, amount numeric(5,2), 
            payment_date timestamp )

$$
  LANGUAGE 'sql' VOLATILE;  
  
SELECT rs.customer_id, realestate.address, 
    realestate.parcel, (fn_remote_payments(rs.customer_id)).* 
        FROM realestate rs
        WHERE rs.last_name = 'SMITH';


Potential Improvements

DBLink is not as feature rich as something like Microsoft SQL Server Linked Server or Oracle's DbLink. We can think of 2 features which hopefully will be implemented in future versions of PostgreSQL that will improve DbLink among other contribs.