Saying Happy Valentine in PostGIS

A while back I mentioned to Bborie (aka dustymugs) and Sandro (aka strk): We're missing ability to label our images with text. Picture this: What if someone has got a road or a parcel of land and they want to stamp the parcel id or the road name on it and they don't want to have to depend on some piece of mapping software. Many report writers and databases can talk to each other directly such as we demonstrated in Rendering PostGIS raster graphics with LibreOffice and we've got raster functionality in the database already. Can we cut out the middleman? Some ideas came to mind. One we could embed a font lib into PostGIS thus having the ultimate bloatware minus the built-in coffee maker. strk screeched. He's still recovering from my constant demands of having him upgrade his PostgreSQL version. Okay fair enough.

Bborie suggested why don't you import your letters as rasters and then vectorize them. So that's exactly what we are going to demonstrate in this article and in doing so we will combine some of the new features coming in PostGIS 2.1 we've been talking about in Waiting for PostGIS 2.1 series.

Warning this article will have a hint of Rube Goldbergishness in it. If you are easily offended by seeing stupid tricks done with SQL, stop reading now. We are going to take a natural vector product and rasterize it just so we can vectorize it again so we can then rasterize it again. Don't think about it too much. It may trigger activity in parts of your brain you didn't know were there thus resulting in stabbing pains similar to what you experience by quickly guplping down a handful of Wasabi peas.
So here are the steps for creating your own font set you can overlay on your geometries and rasters.

See if you can spot the use of window functions and CTEs in these examples.

Step 1: Find a font you like

I haven't tried, but in theory you can just write out letters evenly spaced on a piece of paper and scan the paper so you have a single image. This would allow you to automate your own writing in the database thus allowing your database be able to impersonate you. A very exhilarating, but slightly disturbing idea.

Too many years of using a keyboard has made my technical drawing skills less than legible, so instead:

  1. I grabbed a free font set called Kankin
  2. Installed it on my computer
  3. Opened up a spreadsheet and in each cell typed in a different letter across with the font.
  4. Took a snapshot image of masterpiece spreadsheet. Which gave this image:
  5. Imported this image into PostGIS with raster2pgsql using command-line packaged tool:
    raster2pgsql -F -Y pics/fonts/kankin.png font_pics | psql -U postgres -d testpostgis210 -h localhost -p 5432

    For the rest of these steps you are going to need to be running a fairly recent PostGIS 2.1.0 development version which has the ubber-cool ST_Tile function which we demonstrated here and is just well super cool.

Step 2: Chop your image into letter bits, reclass, vectorize

For this exercise I'm assuming the letters are evenly spaced. One of the changes in 2.1.0 is that ST_Polygon function ALWAYS returns a multipolygon. Prior versions sometimes returned Polygons.

  1. Create a table to hold new letter vectors:
    CREATE TABLE font_set(letter char(1), geom geometry(Multipolygon), font_name varchar(50)
      , CONSTRAINT pk_font_set PRIMARY KEY (letter,font_name));
    
  2. Do a quick histogram using the raster ST_Histogram function to get a sense of where the pixel values that repesent letters. In a simple world every thing would be as clear as black and white . You'd only have 2 values: that which represents the drawing of a letter and that which represents whitespace. But our world has all this noise caused by differences in pressure you put on your pen, the ink running out and so forth, and the imperfectness of your snapshotting. Our histogram demonstates the issue.

    WITH fp AS(SELECT ST_Tile(ST_Band(rast,1),ST_Width(rast)/36, ST_Height(rast)) As rast 
            FROM font_pics LIMIT 1 )
    SELECT (h).*
    FROM (SELECT ST_Histogram(rast) As h
            FROM fp) As foo;

    The output of the above query is shown below and was so fast I didn't even bother timing it - so as you can see our letter drawing probably falls in the 0-30 range.

      min   |   max   | count |       percent
    --------+---------+-------+----------------------
          0 |  2.9375 |  1984 |     0.11372234323054
     2.9375 |   5.875 |    23 |  0.00131835377737017
      5.875 |  8.8125 |    32 |  0.00183423134242806
     8.8125 |   11.75 |    16 | 0.000917115671214032
      11.75 | 14.6875 |    18 |  0.00103175513011579
    14.6875 |  17.625 |    35 |  0.00200619053078069
     17.625 | 20.5625 |    16 | 0.000917115671214032
    20.5625 |    23.5 |    22 |  0.00126103404791929
       23.5 | 26.4375 |    15 | 0.000859795941763155
    26.4375 |  29.375 |    16 | 0.000917115671214032
     29.375 | 32.3125 |     0 |                    0
    32.3125 |   35.25 |     0 |                    0
      35.25 | 38.1875 |    36 |  0.00206351026023157
    38.1875 |  41.125 |    36 |  0.00206351026023157
     41.125 | 44.0625 |   541 |   0.0310099736329245
    44.0625 |      47 | 14656 |    0.840077954832053
  3. Looking at the above and eyeballing looks like we want to keep 0 - 38 and chuck the rest. What we need is a way to reclassify our world into black and white.

    Luckily PostGIS 2+ has a function ST_Reclass that allows us to chuck the pixel values we don't want. Now we reclassify and vectorize. See the Seamless Vector / Raster operations flying around like a lab experiment gone nuts. We leave it as an exercise to the reader to figure out what we are doing in this snippet -- or you can read the Second edition of PostGIS in Action book in our upcoming Chapter 7. Chapter 7 first draft will be available on MEAP next month and we hope to explain some of these operations.

    INSERT INTO font_set(letter, geom, font_name)
    WITH fp AS(SELECT ST_Tile(ST_Band(rast,1),ST_Width(rast)/36, ST_Height(rast)) As rast 
            FROM font_pics WHERE filename = 'kankin.png' )
        , fpi As (SELECT (ROW_NUMBER() OVER())::integer As pos, rast
    FROM fp)
    SELECT substring('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', pos,1) As letter
        , ST_Polygon(
            ST_SetUpperLeft(ST_SetBandNoDataValue(
                ST_Reclass(ST_Band(rast,1)
                ,'[0-38]:1,(38-255]:0'::text, '1BB'::text)
                  ,0),0,0)
           ) As geom
        , 'kankin'
    FROM fpi;
    

Putting our fontset to work

So now we've got a table of vectorized letters. To make our new vectorized fontset easier to use, we create a word function.

CREATE OR REPLACE FUNCTION get_word_vector(param_word text, param_font_name text) 
 RETURNS geometry
AS
$$
  WITH letters As
   (SELECT geom, width, sum(width + 1) OVER (ORDER By pos) As run_length
        FROM (SELECT geom, letter,ROW_NUMBER() OVER() As pos, ST_XMax(geom) As width
                , ST_YMax(geom) As height
        FROM font_set CROSS JOIN regexp_split_to_table($1, E'') As l 
        WHERE letter = l AND font_name = $2) As foo
)

SELECT ST_Union(
     ST_Translate(geom, run_length - width,0) 
     ) As word_geom
FROM letters; 
$$
language 'sql';

And now we use it to write a somewhat complicated looking SQL statement that can probably be wrapped into a function and be significantly shorter. This utilizes the ST_Union aggregate function for raster, which is much improved in speed in upcoming 2.1.

-- takes about 300ms to do this (that includes outputting the png image)
WITH the_words AS
        (SELECT pos, ST_Translate(get_word_vector(word, 'kankin'), 0, -pos*100) As geom
                FROM  (VALUES (0,'HAPPY'),(1,'POSTGIS'),(2,'VALENTINE')  ) AS my(pos, word)
        )
 , canvas
    AS (SELECT ST_MakeEmptyRaster((ST_XMAX(ext) - ST_XMin(ext))::integer, 
        (ST_YMAx(ext) - ST_YMin(ext))::integer,0, 0, 1, -1, 0, 0, 0) As rast
        FROM (SELECT ST_Extent(geom) As ext FROM the_words) As foo
            )
SELECT ST_AsPNG(
    ST_Resize(ST_Union(ST_AsRaster(geom
     , canvas.rast
     , ARRAY['8BUI','8BUI', '8BUI']::text[]
     , ARRAY[pos*75,6,0], ARRAY[0,0,0]) 
        ), 0.3,0.3))
     FROM  the_words CROSS JOIN canvas;

Which of course outputs this: