PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots

In our PLPython Part 4: PLPython meets aggregates we demonstrated the power of PostgreSQL in combination with PLPython to create a sequence of ascii plots by using the power of aggregation. Our friend Simon Greener over at Spatial Db Advisor told me the example was clever but ascii plots was so 70ish and ugly, why didn't I do SVG plots? He felt people being predominantly visual would be caught up in the ugliness of Ascii plots and miss the point. At least Bob found them neat and can appreciate the elegance in simplicity, thanks Bob..

To make it up to Simon, we shall demonstrate 2 features we left out in our prior article.

  1. Pretty SVG graphs -- so I don't have to hear more about SVG from Simon and how I'm caught in the 70s.
  2. Ability to plot a schedule -- schedule item, start date, end date using a multi-column aggregate function. Here is a good use for multi-column aggregates or at least I think so. (Yeh for Multi-column aggregates)

If you are into spatial processing in the database or how many ways you can play with XML particularly Oracle Locator, Oracle Spatial, PostGIS and SQL Server 2008, we highly recommend Simon's articles.

Python meets PostgreSQL multi-column aggregates

In this article we shall demonstrate how to create multi-column aggregates with PL/Python. Note you can do this with any PL or SQL language in PostgreSQL, but Python happens to have a lot of nifty libraries we wanted to put to use. For this exercise, We shall create a function agg_svgschedule. What it will do is plot each grouping of data and return an svg schedule plot for each grouping. This aggregate will take 3 columns - a schedule item, a start date and an end date. The steps we covered in How to create multi-column aggregates, can be applied here.

Our basic approach is as follows:

Install Py-SVG

Installing Py-SVG is pretty easy. At the command-line run the following:

  1. Download the Py-SVG package py-svg http://sourceforge.net/projects/py-svg
  2. Extract it
  3. From command line either on linux or window run:
    C:\Python25\python /path/to/pysvg/setup.py install

    For linux users its usually:
    /usr/bin/python2.5/python /path/to/pysvg/setup.py install

Code is shown below and final output

CREATE OR REPLACE FUNCTION svgplot_schedule(param_dataset text)
  RETURNS text AS
 $$
  from SVG import Schedule
  title = "Schedule"
  g = Schedule.Schedule(dict(
      width = 640,
      height = 480,
      graph_title = title,
      show_graph_title = True,
      key = False,
      scale_x_integers = True,
      scale_y_integers = True,
      show_data_labels = True,
      show_y_guidelines = True,
      show_x_guidelines = True,
      # show_x_title = True, # not yet implemented
      x_title = "Time",
      show_y_title = False,
      rotate_x_labels = True,
      rotate_y_labels = False,
      x_label_format = "%m/%d",
      timescale_divisions = "1 day",
      add_popups = True,
      popup_format = "%m/%d/%y",
      area_fill = True,
      min_y_value = 0,
      ))
  #convert our string of | delimited values to an array
  adataset = param_dataset.split('|')
  #pipe in converting each string to number
  g.add_data(dict(data=adataset, title="Data"))
  return g.burn()
$$ LANGUAGE plpythonu;

CREATE OR REPLACE FUNCTION plot_svgschedule_databuild(param_dataset text, param_schitem text, param_schstart date, param_schend date)
  RETURNS text AS
 $$
  if param_dataset is None : 
    return str(param_schitem) + '|' + str(param_schstart) + '|' + str(param_schend)
  return param_dataset + '|' + str(param_schitem) + '|' + str(param_schstart) + '|' + str(param_schend)
$$ LANGUAGE plpythonu;


CREATE AGGREGATE agg_svgschedule(text,date, date) (
  SFUNC=plot_svgschedule_databuild,
  STYPE=text,
  FINALFUNC=svgplot_schedule
);

##Plotting fake data but lets pretend it is real
CREATE TABLE mysvg_scheduledata(pt_id serial, pt_person text, pt_schitem text, pt_start date, pt_end date);
INSERT INTO mysvg_scheduledata(pt_person, pt_schitem, pt_start, pt_end)
    SELECT DISTINCT ON(pers, activity, startdate) pers, activity, startdate, enddate
    FROM
    (SELECT (ARRAY['Jack', 'Jill', 'Tom'])[i] As pers, 
        (ARRAY['Gym', 'Crew', 'Programming', 'Skiing'])[CAST(random()*10*j As integer) % 3 + 1] As activity, 
             CURRENT_DATE + CAST(CAST(k As text) || ' day' as interval) As startdate, 
                CURRENT_DATE + CAST(CAST(k + (CAST(random()*10*j As integer) % 3 + 1) As text) || ' day' as interval) As enddate
        FROM  generate_series(1,3) As i
                CROSS JOIN  generate_series(1,4) As j
                CROSS JOIN generate_series(1,3) As k
            ) As foo
    ORDER BY pers, activity, startdate;


#and now to test
SELECT pt_person,  agg_svgschedule(pt_schitem, pt_start, pt_end) As svggraph
 FROM (SELECT pt_person, pt_schitem, pt_start, pt_end 
        FROM mysvg_scheduledata 
            ORDER BY pt_person, pt_schitem, pt_start, pt_end) As foo
 GROUP BY pt_person
 ORDER BY pt_person;

Which gives us this:

pt_personsvggraph
Jack
Jill
Tom