One of the main features I love about PostgreSQL is its array support. This is a
feature you won't find in most relational databases, and even databases that support some variant
of it, don't allow you to use it as easily. It is one of the features that makes building
aggregate functions wicked easy in PostgreSQL with no messy compiling required.
Aside from building aggregate functions, it has some other common day uses.
In this article, I'll cover two common ways we use them which I will refer to as the ANY
and Contains tricks.
I like to think of this approach as YeSQL programming style: how SQL can be augmented by more complex data types and index retrieval mechanisms.
Arrays and many other data types (spatial types, keyvalue (hstore), ltree etc) are far from relational structures, yet we can query them easily with SQL and can even relate them.
Using comma separated items in an SQL ANY clause
Ever have a piece of text like this: apple,cherry apple,avocado
or a set of integer ids like this 1,5,6
which perhaps you got from a checkbox picklist? You need to find out the details of the
chosen products.
Well the first steps is to convert your string to an array like so:
SELECT '{apple,cherry apple, avocado}'::text[];
If it's a set of integers you would do.
SELECT '{1,4,5}'::int[];
Which converts your list to an array of: apple,"cherry apple",avocado
or 1,4,5
Now you combine it with your detail query
Let's say your data looked something like this:
CREATE TABLE products(
product_id serial PRIMARY KEY
,product_name varchar(150)
,price numeric(10,2) ) ;
INSERT INTO products(product_name, price)
VALUES
('apple', 0.5)
,('cherry apple', 1.25)
,('avocado', 1.5),('octopus',20.50)
,('watermelon',2.00);
Now to get the details about the products the user selected
SELECT product_name, price
FROM products
WHERE
product_name = ANY('{apple,cherry apple,avocado}'::text[]);
or if you were using ids
SELECT product_name,price
FROM products
WHERE
product_id = ANY('{1,4,5}'::int[]);
Let us say we needed to convert the id array back to a comma delimeted list. This will work in pretty much any version
of PostgreSQL
SELECT array_to_string(ARRAY(SELECT product_name
FROM products
WHERE
product_id = ANY('{1,4,5}'::int[]) ), ',') As prod_list;
Which will give you an output: apple,octopus,watermelon
Now if you are using PostgreSQL 9.0 or higher. You can take advantage
of the very cool string_agg function and combine that with the even cooler ORDER BY of aggregate functions
if you want your list alphabetized. The string_agg approach comes in particularly handy if you are dealing with not one user request but a whole table of
user requests. The string_agg equivalent looks like below --NOTE: it's not much shorter, but is if you are dealing with
sets and can employ the GROUP BY person_id or some such clause. It will also be faster in many cases since it can scan the data
in one step instead of relying on sub queries:
SELECT string_agg(product_name, ',' ORDER BY product_name) As prod_list
FROM products
WHERE product_id = ANY('{1,4,5}'::int[]) ;
Using arrays in column definitions and Contains @> operator
Of course for casual lists, arrays are great for data storage as well. We use them
with caution when portability to other databases is a concern or referential integrity is
a concern. Still they have their place.
Let's say we have a restaurant matching system where the user selects from a
menu of foods they'd like to eat and we try to find a restaurant that has all those
foods. Our very casual database system looks like this.
Our objective is to write the shortest application code we can get away with that is still fairly efficient.
The foods table would be used just as a lookup for our pick list:
CREATE TABLE food(food_name varchar(150) PRIMARY KEY);
INSERT INTO food (food_name)
VALUES ('beef burger'),
('veggie burger'), ('french fries'),
('steak'), ('pizza');
CREATE TABLE restaurants(id serial PRIMARY KEY,
restaurant varchar(100), foods varchar(150)[]);
CREATE INDEX idx_restaurants_foods_gin
ON restaurants USING gin (foods);
INSERT INTO restaurants(restaurant, foods)
VALUES
('Charlie''s', '{beef burger,french fries,pizza}'::varchar[]),
('Rinky Rink', '{beef burger,veggie burger}'::varchar[]),
('International', '{beef burger,spring rolls,egg rolls,pizza}'::varchar[]);
Our application allows users to pick foods they want for a meal and
matches them up with all restaurants that have those items. For this trick we use the array contains operator
to return all restaurants that contain in their food list all the foods the user wants to partake of in this meal.
Our resulting query
would look something like:
SELECT restaurant
FROM restaurants
WHERE foods @> '{beef burger, pizza}'::varchar[];
Which outputs the below. Our dataset is so small that the index doesn't kick in, but if we have a 10000 or more restaurants we'd see the
GIN index doing its magic.
restaurant
---------------
Charlie's
International
As a final food for thought, PostgreSQL supports multi-dimensional arrays as well which has some interesting uses as well. Perhaps we'll
delve into some examples of multi-dimensional arrays in another article.