Both Josh Berkus and Hubert made blog entries about our last excerpt. In general I will start off by saying that we are more or less in agreement on what is good SQL coding format.
Here are the things I think we can all agree on
- SQL Keywords should be upper cased
- Field names should be prefixed with their tables especially when you have a multi-join statement involved
- Use JOIN syntax instead of stuffing everything in the WHERE though we are highly against just saying JOIN. We prefer INNER JOIN
The major points of contention I think are
- Should you use aliases over tables and if you use aliases should you keep them short or more descriptive. Josh thinks table names should be used where possible and when aliases are used they should be longer than a few characters
and Hubert agrees with us that short aliases are fine and in fact desirable. I think we all agree aliases should be meaningful when used, but our idea of what it means to be meaningful is a little different.
- In use of JOIN syntax -- we prefer using INNER JOIN instead of using just JOIN and in fact find it quite irritating that PostgreSQL rewrites our INNERs as JOIN. I suspect Hubert and Josh and many other PostgreSQL folk are on
the other side of the fence on this. The reason we feel strongly about this is there are so many kinds of JOINS - INNER JOIN, RIGHT JOIN, LEFT JOIN, CROSS JOIN, FULL JOIN, and the horrid NATURAL JOIN (that should be shot and put out of its misery). To just say JOIN to us is just confusing.
- While you can write LEFT OUTER JOIN, the OUTER is kind of pointless because no one goes around writing LEFT INNER JOINS
- Use well supported standards where possible. This means CURRENT_TIMESTAMP instead of now(). now() is not in all databases, but most relational databases support CURRENT_TIMESTAMP
So let us start by looking at Josh's last SQL statement and compare to how we would write it.
Josh I hope you didn't come up with this table structure. We personally hate the use of id as a field identifier for every table, but that's another story. We prefer something like forum_id, post_id etc. and whats the deal with giving a field
name of date. Stay away from data types as field names.
SELECT persons.id, persons.first_name, persons.last_name, forums.category,
COUNT(DISTINCT posts.id) as num_posts,
COALESCE(MAX(comments.rating), 0) AS highest_rating,
COALESCE(MIN(comments.rating), 0) AS lowest_rating
FROM persons JOIN posts ON persons.id = posts.author
JOIN forums on posts.forum = forums.id
LEFT OUTER JOIN comments ON posts.id = comments.post
WHERE persons.status > 0
AND forums.ratings = TRUE
AND comments.date > ( now() - INTERVAL '1 year')
GROUP BY persons.id, persons.first_name, persons.last_name, forums.category
HAVING count(DISTINCT posts.id) > 0
ORDER BY persons.last_name, persons.first_name;
-- Leo tends to prefer upper case alias for table lower case or proper case for fields.
-- Leo also likes to see his SELECT fields across one line regardless how many because
-- he has 2 wide screen ubber huge monitors and wants to take full advantage of them
-- I couldn't demonstrate this without messing up the format of the page.
SELECT P.id, P.first_name, P.last_name, F.category, COUNT(DISTINCT PO.id) as num_posts,
COALESCE(MAX(C.rating), 0) AS highest_rating, COALESCE(MIN(C.rating), 0) AS lowest_rating
FROM persons AS P INNER JOIN posts AS PO ON P.id = PO.author
INNER JOIN forums As F ON PO.forum = F.id
LEFT JOIN comments AS C ON PO.id = C.post
WHERE P.status > 0
AND F.ratings = TRUE
AND C.date > ( CURRENT_TIMESTAMP - INTERVAL '1 year')
GROUP BY P.id, P.first_name, P.last_name, F.category
HAVING COUNT(DISTINCT PO.id) > 0
ORDER BY P.last_name, P.first_name;
-- Regina flip flops and often depends on database
-- she is working with that day or moment
-- whether she upper cases or lower cases or proper cases.
-- It must be noted that Regina also has 2 wide ubber screens to work with
-- because Leo believes every programmer should have at least 36 inches to work with.
-- Once she finally gets adjusted to the enormity of these things,
-- she will use the space to keep 20 applications open at the same time and train her mind
-- for massive parallel concentration and get 20 times more work done
SELECT pe.id, pe.first_name, pe.last_name, f.category, COUNT(DISTINCT po.id) AS num_posts,
COALESCE(MAX(c.rating), 0) AS highest_rating, COALESCE(MIN(c.rating), 0) AS lowest_rating
FROM persons AS pe INNER JOIN posts AS po ON pe.id = po.author
INNER JOIN forums AS f ON po.forum = f.id
LEFT JOIN comments AS c ON po.id = c.post
WHERE pe.status > 0
AND f.ratings = TRUE
AND c.date > ( CURRENT_TIMESTAMP - INTERVAL '1 year')
GROUP BY pe.id, pe.first_name, pe.last_name, f.category
HAVING COUNT(DISTINCT po.id) > 0
ORDER BY pe.last_name, pe.first_name;
I will start off by saying for the above example -- Josh is lucky because his table names are naturally short and we are
disadvantaged in this example because two tables start with the same letter. Imagine if the table names were much more descriptive how repetitive and noisy Josh's scene would become.
The reason we prefer short aliases over long is that it allows you to fit more fields on the same line and scan them quickly and still prefix with the table names.
Leo prefers upper case his table because he can quickly spot where a table name ends and field begins. Generally speaking you usually have 4 tables or fewer in a statement, the FROM clause is not that far away
as a legend, and the tables
usually start with different letters, so its not a large mind cramp for us to remember F is for Forum and C is for Comments.
Unlike other languages, you then need to keep on repeating this variable
over and over again which is much more typing as well as just makes your statement that much longer to digest.
As a last gripe to what Josh said. Just because you have 255 characters at your disposal as opposed to the 8 you once had, doesn't mean you should feel compelled to use them all. Long live Fortran. It is older than
any language I can think of including COBOL and has stood the test of time and is still undergoing innovation.