SQL Coding Standards To Each His Own Part II

Both Josh Berkus and Hubert made blog entries about our last excerpt. In general I will start off by saying that we are more or less in agreement on what is good SQL coding format.

Here are the things I think we can all agree on

The major points of contention I think are

So let us start by looking at Josh's last SQL statement and compare to how we would write it. Josh I hope you didn't come up with this table structure. We personally hate the use of id as a field identifier for every table, but that's another story. We prefer something like forum_id, post_id etc. and whats the deal with giving a field name of date. Stay away from data types as field names.

-- Josh
SELECT persons.id, persons.first_name, persons.last_name, forums.category,
   COUNT(DISTINCT posts.id) as num_posts,
   COALESCE(MAX(comments.rating), 0) AS highest_rating,
   COALESCE(MIN(comments.rating), 0) AS lowest_rating
FROM persons JOIN posts ON persons.id = posts.author
   JOIN forums on posts.forum = forums.id
   LEFT OUTER JOIN comments ON posts.id = comments.post
WHERE persons.status > 0
   AND forums.ratings = TRUE
   AND comments.date > ( now() - INTERVAL '1 year')
GROUP BY persons.id, persons.first_name, persons.last_name, forums.category
HAVING count(DISTINCT posts.id) > 0
ORDER BY persons.last_name, persons.first_name; 


-- Us

-- Leo tends to prefer upper case alias for table lower case or proper case for fields.  
-- Leo also likes to see his SELECT fields across one line regardless how many because 
-- he has 2 wide screen ubber huge monitors and wants to take full advantage of them
-- I couldn't demonstrate this without messing up the format of the page.

SELECT P.id, P.first_name, P.last_name, F.category, COUNT(DISTINCT PO.id) as num_posts, 
  COALESCE(MAX(C.rating), 0) AS highest_rating, COALESCE(MIN(C.rating), 0) AS lowest_rating
FROM persons AS P INNER JOIN posts AS PO ON P.id = PO.author
   INNER JOIN forums As F ON PO.forum = F.id
   LEFT JOIN comments AS C ON PO.id = C.post
WHERE P.status > 0
   AND F.ratings = TRUE
   AND C.date > ( CURRENT_TIMESTAMP - INTERVAL '1 year')
GROUP BY P.id, P.first_name, P.last_name, F.category
HAVING COUNT(DISTINCT PO.id) > 0
ORDER BY P.last_name, P.first_name; 


-- Regina flip flops and often depends on database 
-- she is working with that day or moment
-- whether she upper cases or lower cases or proper cases.
-- It must be noted that Regina also has 2 wide  ubber screens to work with
-- because Leo believes every programmer should have at least 36 inches to work with.
-- Once she finally gets adjusted to the enormity of these things, 
-- she will use the space to keep 20 applications open at the same time and train her mind 
-- for massive parallel concentration and get 20 times more work done
SELECT pe.id, pe.first_name, pe.last_name, f.category, COUNT(DISTINCT po.id) AS num_posts,
   COALESCE(MAX(c.rating), 0) AS highest_rating, COALESCE(MIN(c.rating), 0) AS lowest_rating
FROM persons AS pe INNER JOIN posts AS po ON pe.id = po.author
   INNER JOIN forums AS f ON po.forum = f.id
   LEFT JOIN comments AS c ON po.id = c.post
WHERE pe.status > 0
   AND f.ratings = TRUE
   AND c.date > ( CURRENT_TIMESTAMP - INTERVAL '1 year')
GROUP BY pe.id, pe.first_name, pe.last_name, f.category
HAVING COUNT(DISTINCT po.id) > 0
ORDER BY pe.last_name, pe.first_name; 

I will start off by saying for the above example -- Josh is lucky because his table names are naturally short and we are disadvantaged in this example because two tables start with the same letter. Imagine if the table names were much more descriptive how repetitive and noisy Josh's scene would become.

The reason we prefer short aliases over long is that it allows you to fit more fields on the same line and scan them quickly and still prefix with the table names. Leo prefers upper case his table because he can quickly spot where a table name ends and field begins. Generally speaking you usually have 4 tables or fewer in a statement, the FROM clause is not that far away as a legend, and the tables usually start with different letters, so its not a large mind cramp for us to remember F is for Forum and C is for Comments.

Unlike other languages, you then need to keep on repeating this variable over and over again which is much more typing as well as just makes your statement that much longer to digest.

As a last gripe to what Josh said. Just because you have 255 characters at your disposal as opposed to the 8 you once had, doesn't mean you should feel compelled to use them all. Long live Fortran. It is older than any language I can think of including COBOL and has stood the test of time and is still undergoing innovation.