VARIADIC Unnest

PostgreSQL keeps on adding new great stuff. It's hard to remember all the good stuff that has been added over the years. One of the neat ones from the past is the variadic Unnest function which I believe was introduced in PostgreSQL 9.4. It's rare that I ever had to use it, but today I was handed some data where this function was just literally what the doctor ordered. I can't do anything this sweet in other databases I have used.

Many people have seen the standard unnest used to unnest things like arrays or other component things. But did you know unnest is variadic so can take an unlimited number of arguments and keeps them balanced. If you have an array with 3 things and another with 2, you get a set with 3 things, not 6.

Observe the difference between this which gives you 6 rows


SELECT a, b
FROM unnest(ARRAY['a', 'b', 'c']) AS a, unnest(ARRAY[1,2]) AS b;

And this which gives you 3 rows


SELECT a, b
FROM unnest(ARRAY['a', 'b', 'c'], ARRAY[1,2] ) AS f(a, b);

I was handed a list of practices from a google spreadsheet where each row consisted of a single medical practice. And stuffed in those rows were all the doctors that worked in such a practice. But their names and titles were in different columns. It is not for me to ask why people enter data this way, but luckily PostgreSQL can make sense of it and explode it.

The raw data

My original table looked like this:


CREATE TABLE practices(practice_name text primary key, emp_name text, 
                                        emp_credentials text, emp_specialty text);
INSERT INTO practices(practice_name, emp_name, emp_credentials, emp_specialty)
VALUES 
('Seuss Pediatrics',  
'Theophrastus Seuss
Theodore Giesel', 
'MD
RN',
'Pediatrics
Oncology' ),
('County General',  
'Ben Casey
David Zorbal', 
'MD
MD PhD',
'Neurosurgery
Neurosurgery' ),
('Practice Of None', NULL, NULL, NULL);

Printed out looks like this:

 practice_name   |      emp_name      | emp_credentials | emp_specialty
------------------+--------------------+-----------------+---------------
 Seuss Pediatrics | Theophrastus Seuss+| MD             +| Pediatrics   +
                  | Theodore Giesel    | RN              | Oncology
 County General   | Ben Casey         +| MD             +| Neurosurgery +
                  | David Zorbal       | MD PhD          | Neurosurgery
 Practice Of None |                    |                 |
(3 rows)

Normalizing the data

And I needed to normalize it such that I have one person per row and have at least one row per practice even if no one works there.


SELECT p.practice_name, e.name, e.cred, e.spec
FROM practices AS p
    LEFT JOIN 
    LATERAL 
        UNNEST( string_to_array(emp_name, E'\n'),
                string_to_array(emp_credentials, E'\n'),
              string_to_array(emp_specialty, E'\n') )  AS e(name, cred, spec) ON true;

Which outputs a table that looks like this:

 practice_name   |        name        |  cred  |     spec
------------------+--------------------+--------+--------------
 Seuss Pediatrics | Theophrastus Seuss | MD     | Pediatrics
 Seuss Pediatrics | Theodore Giesel    | RN     | Oncology
 County General   | Ben Casey          | MD     | Neurosurgery
 County General   | David Zorbal       | MD PhD | Neurosurgery
 Practice Of None |                    |        |
(5 rows)

If you don't want to include practices with no staff, you can shorten the code a bit like so. Side note the , is short for CROSS JOIN. I like to spell it out, but people accuse me of being too pedantic about such things. Also note the below is still a LATERAL JOIN, but the word LATERAL is optional for functions.


SELECT p.practice_name, e.name, e.cred, e.spec
FROM practices AS p, 
        UNNEST( string_to_array(emp_name, E'\n'),
                string_to_array(emp_credentials, E'\n'),
              string_to_array(emp_specialty, E'\n') )  AS e(name, cred, spec) ;