This is a rebuttal to depesz's charx, varcharx, varchar, and text
and David Fetter's varchar(n) considered harmful.
I respect both depesz and David and in fact enjoy reading their blogs. We just have deferring opinions on the topic.
For starters, I am pretty tired of the following sentiments from some PostgreSQL people:
- 99% of the people who choose varchar(x) over text in PostgreSQL in most cases
are just ignorant folk and don't realize that text is just as fast if not faster than varchar in PostgreSQL.
- stuff your most despised database here compatibility is not high on my priority list.
- It is unfortunate you have to work with the crappy tools you work with that can't see the beauty in PostgreSQL text implementation.
Just get something better that treats PostgreSQL as the superior creature it is.
I don't like these sentiments because, it gives the perception of PostgreSQL users as a bunch of prissy
people who expect everyone to step up to their level of misguided sophistication.
We shouldn't cater to the lowest common denominator always, but when the lowest common denominator
approach is easier to write, easier to read, and achieves the same purpose, why should people be forced to do it any other way.
Domains and triggers are great things, but they are overkill in most cases and are harder to inspect.
I think depesz nails it on the head when he points out the main reason why varchar(x) is worse than text in PostgreSQL.
It is because if you have a big table, changing the size of a varchar field takes eons in PostgreSQL. This is not so much the case with other databases we have worked with.
To me this is not a problem with everyone else, but a problem with PostgreSQL, and something that should be improved on. In fact you would think this would be simpler in
PostgreSQL than any other database since varchar and text are implemented essentially the same. The main difference it seems is that a text field always results in creation of an accompanying toast table
where as varchar doesn't always.
Instead newcomers are patted on the head and told
You young'in, just use text and throw a domain around it if you want to limit size.
I am not going to write a domain or trigger everytime I have to limit the size of a field, for several reasons
and listed in priority.
Easily to access and self-enforcing meta data is important. In many cases, its more important than performance.
When I look at a CREATE TABLE statement or open up a table in a designer, the size there gives a sense of what goes in there. One can
query practically any databases system tables (and the ANSI compliant ones have information_schema.columns) and get the size of a field
with a single query. This simplicity in portability is a beautiful thing. This simplicity is important for people developing cross database tools.
gives a sense of what is safe to select and what is not. The size gives auto form generators a sense of the width to make a field and limits to set
on what can be put in it. It allows the tool to quickly determine
whether to put in a regular text box or a text area.
varchar(...) forces everyone to state this exactly the same across all databases. I don't need to figure
out how to make sense of the logic embedded in a domain object or trigger or the fact that people can state the same thing differently with domains.
If you don't care about the ease with which tool developers can extend their tools to work with your DBMS or making
the common place needs easy, then you don't
care about growing your community.
- Using domains when you are talking about 20 fields is way too much typing if all the fields need to be different sizes.
varchar(x) has a semantic meaning that is easy to define in 2 characters or less. Domains do not.
- And to David -- "In the real world things have min sizes as well as max sizes"
Yes, but maxlength in most cases is the most important thing. So important that it dwarfs the relevancy of other minutia.
Most cases we don't bother with all the other nuances because it clutters the most important thing MAX LENGTH. Less is better than more.
I don't want people stuffing
the kitch sink in a first name field. Many people like me have t100s of first names they got in a naming ceremony.
If they want to call themselves ! or @ or some 4 letter word, I really don't care,
but if you are known by a 100 different names, I really only want to know about one or two of them, not all of them :)
- I am a lazy control freak who loves self-documenting structures. I like order. I like things done consistently and I don't expect to work hard to get that. Being
a lazy control freak means I have more time to trick others and machines into doing my work for me so I can lounge around sipping long island iced teas :).
So in short I expect my relational database, not only to provide me good performance, but to provide me structure and ease of use as well.
I like my crappy tools. My tools
do what they are designed to do and I expect PostgreSQL to do the same without me having to jump thru hoops.
I want easy walls I can put up and tear down. I choose my foreign key constraints and my field sizes carefully
-- because they do more than just perform well. They are self-documenting objects that can draw themselves on a screen and are always
consistent with the simple rules I set up.
If I set a cascade update/delete, it means I expect the child data to go away when the parent does. If I set a foreign
key with restrict delete, it means if the child has data, then the parent is way too important to be allowed to be deleted lightly.
If I use a varchar(x) -- it means I want a max of x characters in that field.
If I use a char(n), it means I expect the data to be exactly (n) characters long - which admittedly is rare.