In our last article we talked about implementing String Aggregation in PostgreSQL, SQL Server, and MySQL, a task that makes purist relational database programmers
a bit squeamish. In this article we'll talk about the reverse: how do you deal with data that someone hands you delimited in a single field, which you are asked to explode or re-sort based on some lookup table?
What are the benefits of having a structure such as this?
p_name | activities
--------+--------------------------------
Dopey | Tumbling
Humpty | Cracking;Seating;Tumbling
Jack | Fishing;Hiking;Skiing
Jill | Bear Hunting;Hiking
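For concreteness, here is a minimal sketch of how such a table might be created and populated in PostgreSQL. The table name notes and the column types are our own illustrative choices; only the data comes from the listing above.

-- hypothetical table holding the delimited data shown above
CREATE TABLE notes (
    p_name varchar(75) PRIMARY KEY,
    activities varchar(1000)
);

-- sample rows from the listing
INSERT INTO notes (p_name, activities)
    VALUES ('Dopey', 'Tumbling'),
           ('Humpty', 'Cracking;Seating;Tumbling'),
           ('Jack', 'Fishing;Hiking;Skiing'),
           ('Jill', 'Bear Hunting;Hiking');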
Well, for the casual programmer, or for a simple text-file database that knows nothing about JOINs, it makes it simple to pull a list of people who like Tumbling:
you simply write WHERE ';' || activities || ';' LIKE '%;Tumbling;%'. It's handy for security too, because you can check group membership with a simple LIKE and list all the security groups a member belongs to without joining to anything.
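In full, the query might look like the sketch below (against our hypothetical notes table). The semicolon padding on both sides guards against partial matches such as a hypothetical 'Log Tumbling'; note that || is the ANSI/PostgreSQL concatenation operator, while SQL Server uses + and MySQL uses CONCAT().

-- everyone whose delimited activities list contains Tumbling
SELECT p_name
FROM notes
WHERE ';' || activities || ';' LIKE '%;Tumbling;%';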
It is also quite easy for even the least data-skilled of programmers to work with, because most procedural languages have a split function that can parse these values into an array, useful for stuffing into drop-down lists and so forth. As a consultant to semi-techie clients,
I'm often torn by the dilemma of "What is the way I would program for myself vs. the way that provides the most autonomy to the client?" By that I mean,
for example, that I try to avoid heavyweight things like wizards that add bloated dependencies or slow an application down. Those dependencies may provide ease to the client but make my debugging life harder. So I weigh the options
and figure out which way works now while also leaving me an easy escape route should things like speed or complexity become more of an issue.
This brings us to the question: what is wrong with this model? It can be slow, because the LIKE condition can't easily take advantage of an index (short of a full text index), so it's not ideal where this field is the primary filtering factor. It's also prone to pollution:
you can't easily validate that the values in the field belong to your valid set of lookups, nor, if a lookup value changes, can the text be forced to change with a CASCADE UPDATE/DELETE rule and the like. In cases where this is of minor consequence,
which is many cases if referential integrity is not high on your list of requirements, this design is not bad. It might make a purist throw up, but oh well, there is always Dramamine to fall back on. As long as you have done your cost-benefit analysis, I don't think there should be any shame in following this
less than respected route.
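For contrast, here is a hedged sketch of the normalized alternative the purists would prefer, again using our hypothetical table names: a lookup table of valid activities plus a bridge table whose foreign keys both validate every value and cascade lookup changes automatically.

-- lookup table of valid activities
CREATE TABLE activities (
    activity varchar(100) PRIMARY KEY
);

-- one row per person/activity pair; the foreign keys ensure only
-- valid activities appear and cascade renames/deletes from the lookup
CREATE TABLE people_activities (
    p_name varchar(75) REFERENCES notes(p_name),
    activity varchar(100) REFERENCES activities(activity)
        ON UPDATE CASCADE ON DELETE CASCADE,
    PRIMARY KEY (p_name, activity)
);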
While you may despise this model, it has its place and it's a fact of life that one day someone will hand it to you and you may need to flip it around a bit. We shall demonstrate how to do that in this article.