Whenever you need to split text into multiple records by some delimiter, PostgreSQL provides two common options. The first is
regexp_split_to_table, and the second is the unnest function combined with string_to_array.
Here is an example using regexp_split_to_table:
SELECT a FROM regexp_split_to_table('john,smith,jones', ',') AS a;
You can achieve the same result with this construct:
SELECT a FROM unnest(string_to_array('john,smith,jones', ',')) AS a;
With short text you won't notice much performance difference. But what happens if we pass in a very large text?
We'll create a table with a single row containing one text column. The text column holds 500,000 characters with line breaks thrown in.
DROP TABLE IF EXISTS sample_data;
CREATE UNLOGGED TABLE sample_data AS
SELECT string_agg(CASE WHEN mod(i,64) = 0 THEN E'\n' ELSE CHR(64 + mod(i,64)) END,'') AS data
FROM generate_series(1, 500000) i;
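As a quick sanity check (assuming the table was created as above), you can confirm the size of the generated text:

-- Each of the 500,000 generated values contributes exactly one character,
-- so the aggregated string should be 500,000 characters long
SELECT length(data) FROM sample_data;
-- length
-- --------
-- 500000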
We'll first use regexp_split_to_table to split the single row into multiple rows:
DROP TABLE IF EXISTS sample_data_rows;
CREATE UNLOGGED TABLE sample_data_rows AS
SELECT regexp_split_to_table(data, E'\n')
FROM sample_data;
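For comparison, here is the equivalent using unnest(string_to_array) (a sketch of the counterpart benchmark; run both with \timing on in psql, and expect your numbers to vary by hardware):

DROP TABLE IF EXISTS sample_data_rows;
CREATE UNLOGGED TABLE sample_data_rows AS
SELECT unnest(string_to_array(data, E'\n'))
FROM sample_data;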
Observe that unnest(string_to_array) is orders of magnitude faster than the equivalent regexp_split_to_table.
As the text gets bigger, regexp_split_to_table degrades much faster than linearly. For example, a 250,000-character piece of text takes about
5.7 secs with regexp_split_to_table, compared to 22.3 secs for the 500,000-character example — doubling the input roughly quadruples the time.
There are, however, cases where unnest(string_to_array) can't substitute for regexp_split_to_table.
You need regexp_split_to_table when splitting by a pattern rather than a fixed delimiter — for example, splitting on any run of spaces or commas, as in the following:
SELECT a FROM regexp_split_to_table('john smith,jones', E'[\\s,]+') AS a;
Because string_to_array accepts only a fixed delimiter string, there is no way to achieve the same result with unnest(string_to_array).
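To see why, note that string_to_array treats its delimiter as a literal string, not a pattern — splitting the same input on a single space leaves the comma-joined piece intact:

-- string_to_array splits only on the literal ' ',
-- so 'smith,jones' stays together as one element
SELECT a FROM unnest(string_to_array('john smith,jones', ' ')) AS a;
-- a
-- -------------
-- john
-- smith,jones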