Table Of Contents
PostgreSQL Q & A
Creating cumulative sums by combining aggregation with windowing
When you want to include a running sum for each line of data in your query, you generally use a window clause with SUM. What if you don't want to list all line items of your data. You want a report that gives you a weekly sum and another that gives you the running sum for the whole year. What do you do then? We'll demonstrate how to do that.
The sample data
Let's start by creating a table with some sales data.
CREATE TABLE sales(id integer GENERATED ALWAYS AS IDENTITY, product varchar(100), order_date date, quantity integer, CONSTRAINT pk_sales PRIMARY KEY(id) ); INSERT INTO sales(product, order_date, quantity) WITH d AS( SELECT f.product, d.ts::date, mod(date_part('epoch', d.ts)::integer, f.ord + 73) AS quantity FROM unnest(ARRAY['pen', 'book']) WITH ORDINALITY AS f(product,ord) CROSS JOIN generate_series('2021-11-01'::timestamp, '2022-01-07'::timestamp, interval '1.2 hours') AS d(ts) WHERE mod(date_part('epoch', d.ts)::integer,7) = 1 ) SELECT * FROM d WHERE quantity > 0;
A basic monthly sum would look like this:
SELECT to_char(order_date, 'YYYY-MM') AS month, product, SUM(quantity) As month_sales FROM sales GROUP BY product, month ORDER BY product, month;
The Output of the above is:
month | product | month_sales ---------+---------+------------- 2021-11 | book | 2550 2021-12 | book | 2640 2022-01 | book | 510 2021-11 | pen | 3076 2021-12 | pen | 3134 2022-01 | pen | 614 (6 rows)
But what if you want a sales to date as of the end of the month in addition to the monthly sum. How do you include that as an additional column?
SELECT to_char(order_date, 'YYYY-MM') AS month, product, SUM(quantity) As month_sales, SUM(SUM(quantity)) OVER(PARTITION BY product ORDER BY to_char(order_date, 'YYYY-MM')) AS sales_to_date FROM sales GROUP BY product, month ORDER BY product, month;
month | product | month_sales | sales_to_date ---------+---------+-------------+--------------- 2021-11 | book | 2550 | 2550 2021-12 | book | 2640 | 5190 2022-01 | book | 510 | 5700 2021-11 | pen | 3076 | 3076 2021-12 | pen | 3134 | 6210 2022-01 | pen | 614 | 6824 (6 rows)
PostgreSQL keeps on adding new great stuff. It's hard to remember all the good stuff that has been added over the years. One of the neat ones from the past is the variadic Unnest function which I believe was introduced in PostgreSQL 9.4. It's rare that I ever had to use it, but today I was handed some data where this function was just literally what the doctor ordered. I can't do anything this sweet in other databases I have used.
Many people have seen the standard unnest used to unnest things like arrays or other component things. But did you know unnest is variadic so can take an unlimited number of arguments and keeps them balanced. If you have an array with 3 things and another with 2, you get a set with 3 things, not 6.
Observe the difference between this which gives you 6 rows
SELECT a, b FROM unnest(ARRAY['a', 'b', 'c']) AS a, unnest(ARRAY[1,2]) AS b;
And this which gives you 3 rows
SELECT a, b FROM unnest(ARRAY['a', 'b', 'c'], ARRAY[1,2] ) AS f(a, b);
I was handed a list of practices from a google spreadsheet where each row consisted of a single medical practice. And stuffed in those rows were all the doctors that worked in such a practice. But their names and titles were in different columns. It is not for me to ask why people enter data this way, but luckily PostgreSQL can make sense of it and explode it.
The raw data
My original table looked like this:
CREATE TABLE practices(practice_name text primary key, emp_name text, emp_credentials text, emp_specialty text); INSERT INTO practices(practice_name, emp_name, emp_credentials, emp_specialty) VALUES ('Seuss Pediatrics', 'Theophrastus Seuss Theodore Giesel', 'MD RN', 'Pediatrics Oncology' ), ('County General', 'Ben Casey David Zorbal', 'MD MD PhD', 'Neurosurgery Neurosurgery' ), ('Practice Of None', NULL, NULL, NULL);
Printed out looks like this:
practice_name | emp_name | emp_credentials | emp_specialty ------------------+--------------------+-----------------+--------------- Seuss Pediatrics | Theophrastus Seuss+| MD +| Pediatrics + | Theodore Giesel | RN | Oncology County General | Ben Casey +| MD +| Neurosurgery + | David Zorbal | MD PhD | Neurosurgery Practice Of None | | | (3 rows)
Normalizing the data
And I needed to normalize it such that I have one person per row and have at least one row per practice even if no one works there.
SELECT p.practice_name, e.name, e.cred, e.spec FROM practices AS p LEFT JOIN LATERAL UNNEST( string_to_array(emp_name, E'\n'), string_to_array(emp_credentials, E'\n'), string_to_array(emp_specialty, E'\n') ) AS e(name, cred, spec) ON true;
Which outputs a table that looks like this:
practice_name | name | cred | spec ------------------+--------------------+--------+-------------- Seuss Pediatrics | Theophrastus Seuss | MD | Pediatrics Seuss Pediatrics | Theodore Giesel | RN | Oncology County General | Ben Casey | MD | Neurosurgery County General | David Zorbal | MD PhD | Neurosurgery Practice Of None | | | (5 rows)
If you don't want to include practices with no staff, you can shorten the code a bit like so.
Side note the
, is short for
CROSS JOIN. I like to spell it out, but people accuse me of being too pedantic about such things. Also note the below is still a LATERAL JOIN, but the word LATERAL is optional for functions.
SELECT p.practice_name, e.name, e.cred, e.spec FROM practices AS p, UNNEST( string_to_array(emp_name, E'\n'), string_to_array(emp_credentials, E'\n'), string_to_array(emp_specialty, E'\n') ) AS e(name, cred, spec) ;
Using PostgreSQL Extensions
PostgreSQL 15 64-bit for Windows FDWs
We are pleased to provide binaries for file_textarray_fdw and odbc_fdw for PostgreSQL 15 Windows 64-bit.
To use these, copy the files into your PostgreSQL 15 Windows 64-bit install folders in same named folders and then run CREATE EXTENSION as usual in the databases of your choice. More details in the packaged README.txt
- Windows 64-bit package fdw_win64_15_bin.zip fdw_win64_15_bin.7z
This package contains the following FDWs:
- odbc_fdw (version 0.5.2.3) for connecting to ODBC data sources such as SQL Server, Oracle, MS Access databases, and anything else that has a 64-bit ODBC driver. Note that since this is for PostgreSQL 64-bit, it can only use ODBC 64-bit connections.
- file_textarray_fdw, great and fast for working with arbitrary and pesky delimited data. Especially where they weren't considerate enough to give you the same number of columns per row.
Note this package does not include ogr_fdw since ogr_fdw is packaged as part of PostGIS packages from EnterpriseDb Stackbuilder (for PostGIS >= 3.0) .
If you do not have PostGIS installed (and don't want to for some reason) and want to use ogr_fdw on windows, you can download from: Winnie's PG 15 Extras. ogr_fdw is a great FDW for querying not just spatial data, but also a ton of other file formats or relational (including odbc, dbase files, spreadsheets) since spatial is a superset.