In Defense of varchar(x)

This is a rebuttal to depesz's charx, varcharx, varchar, and text and David Fetter's varchar(n) considered harmful. I respect both depesz and David and in fact enjoy reading their blogs. We just have deferring opinions on the topic.

For starters, I am pretty tired of the following sentiments from some PostgreSQL people:

I don't like these sentiments because, it gives the perception of PostgreSQL users as a bunch of prissy people who expect everyone to step up to their level of misguided sophistication. We shouldn't cater to the lowest common denominator always, but when the lowest common denominator approach is easier to write, easier to read, and achieves the same purpose, why should people be forced to do it any other way. Domains and triggers are great things, but they are overkill in most cases and are harder to inspect.

I think depesz nails it on the head when he points out the main reason why varchar(x) is worse than text in PostgreSQL. It is because if you have a big table, changing the size of a varchar field takes eons in PostgreSQL. This is not so much the case with other databases we have worked with. To me this is not a problem with everyone else, but a problem with PostgreSQL, and something that should be improved on. In fact you would think this would be simpler in PostgreSQL than any other database since varchar and text are implemented essentially the same. The main difference it seems is that a text field always results in creation of an accompanying toast table where as varchar doesn't always.

Instead newcomers are patted on the head and told You young'in, just use text and throw a domain around it if you want to limit size.

I am not going to write a domain or trigger everytime I have to limit the size of a field, for several reasons and listed in priority.

  1. Easily to access and self-enforcing meta data is important. In many cases, its more important than performance.

    When I look at a CREATE TABLE statement or open up a table in a designer, the size there gives a sense of what goes in there. One can query practically any databases system tables (and the ANSI compliant ones have information_schema.columns) and get the size of a field with a single query. This simplicity in portability is a beautiful thing. This simplicity is important for people developing cross database tools. The size gives a sense of what is safe to select and what is not. The size gives auto form generators a sense of the width to make a field and limits to set on what can be put in it. It allows the tool to quickly determine whether to put in a regular text box or a text area. varchar(...) forces everyone to state this exactly the same across all databases. I don't need to figure out how to make sense of the logic embedded in a domain object or trigger or the fact that people can state the same thing differently with domains. If you don't care about the ease with which tool developers can extend their tools to work with your DBMS or making the common place needs easy, then you don't care about growing your community.

  2. Using domains when you are talking about 20 fields is way too much typing if all the fields need to be different sizes. varchar(x) has a semantic meaning that is easy to define in 2 characters or less. Domains do not.
  3. And to David -- "In the real world things have min sizes as well as max sizes" Yes, but maxlength in most cases is the most important thing. So important that it dwarfs the relevancy of other minutia. Most cases we don't bother with all the other nuances because it clutters the most important thing MAX LENGTH. Less is better than more. I don't want people stuffing the kitch sink in a first name field. Many people like me have t100s of first names they got in a naming ceremony. If they want to call themselves ! or @ or some 4 letter word, I really don't care, but if you are known by a 100 different names, I really only want to know about one or two of them, not all of them :)
  4. I am a lazy control freak who loves self-documenting structures. I like order. I like things done consistently and I don't expect to work hard to get that. Being a lazy control freak means I have more time to trick others and machines into doing my work for me so I can lounge around sipping long island iced teas :).

So in short I expect my relational database, not only to provide me good performance, but to provide me structure and ease of use as well. I like my crappy tools. My tools do what they are designed to do and I expect PostgreSQL to do the same without me having to jump thru hoops. I want easy walls I can put up and tear down. I choose my foreign key constraints and my field sizes carefully -- because they do more than just perform well. They are self-documenting objects that can draw themselves on a screen and are always consistent with the simple rules I set up. If I set a cascade update/delete, it means I expect the child data to go away when the parent does. If I set a foreign key with restrict delete, it means if the child has data, then the parent is way too important to be allowed to be deleted lightly. If I use a varchar(x) -- it means I want a max of x characters in that field. If I use a char(n), it means I expect the data to be exactly (n) characters long - which admittedly is rare.