SQL Math Idiosyncracies

Question: What is the answer to SELECT 3/2?

Answer: In integer math, it is 1. A lot of people especially those coming from MySQL or MS Access backgrounds are surprised to find out that in PostgreSQL 3/2 = 1. They view this as some sort of bug.

In actuality, the fact that 3/2 = 1 and 1/3 = 0 is part of the ANSI/ISO-SQL standard that states mathematical operations between two values must be of the same data type of one of the values (not necessarily the same scale and precision though). This is not some idiosyncracy specific to PostgreSQL. If you try the same operation in SQL Server, SQLite,FireBird, and some other ANSI/ISO SQL compliant databases, you will get the same results. So it seems MySQL and MS Access are the odd-balls in this arena, but arguably more practical.

Why is this an SQL Standard? We haven't found any definitive answer to that, but we have our guesses. Our guess is because it is less ambiguous (more precise) and speedier processor wise to only offer the level of accuracy specifically requested for. In terms of standards and a lot of domains (e.g. Engineering), precision is more important than accuracy. In the case of 3/2 it is not quite obvious the benefit, but say you have 1/3. MySQL displays that as .3333 (although internally its probably storing 0.333333...), MS Access displays it as 0.333333333333333. Is MS Access more right? Both are not completely accurate and its ambiguous how inaccurate they are. In the case of PostgreSQL and other ANSI/ISO databases its quite clear how accurate. They very precisely discard the remainder.

There is one particular behavior in PostgreSQL that seems somewhat contradictory to the above, and that is the way it treats Averages. It returns averages in much the same way as MySQL where as something like SQL Server or SQLite returns a truncated integer average when averaging integers. For example, lets say you have a table of all integers. If you do an Average e.g.

--Here we are using a more portable example
--instead of our preferred generate_series approach
--so it can be tested on multiple database platforms
CREATE TABLE dumnum(num integer);
INSERT INTO dumnum(num)
INSERT INTO dumnum(num)
SELECT AVG(num) as theavg, AVG(CAST(num As numeric(10,3))) as theavgm,
SUM(num)/COUNT(num) As intavg,
	4/7 As intmath, 4./7 As floatmath, 
		CAST(4./7 As numeric(10,6)) as precmath, 
        4.000/7 As floatmath2,
		CAST(4./7 As integer) As precintmath
FROM dumnum;

--For mysql the implementation of 
--CAST is a little peculiar.  
--Although MySQL happily accepts numeric and integer, int(11) as data types in table creation and converts to decimal
--It doesn't appear to do the same in CAST (e.g. you can't use numeric or integer in CAST)
--so the above example doesn't work 
--Use instead

SELECT AVG(num) as theavg, AVG(CAST(num As decimal(10,3))) as theavgm,
SUM(num)/COUNT(num) As intavg,
	4/7 As intmath, 4./7 As floatmath, 
		CAST(4./7 As decimal(10,6)) as precmath, 
        4.000/7 As floatmath2,
		CAST(4./7 As SIGNED) As precintmath
FROM dumnum;

Speaking of other databases - has anyone seen the FireFox extension for browsing and creating SQLite databases? It is extremely cute. The following tests on SQLite we ran using this FireFox SQLite management tool.

Running the above on PostgreSQL, SQL Server 2005, SQLite, FireBird, and MySQL yields the following

In terms of the number of significant digits displayed, those are more presentational issues than actual storage so all the more reason to stay away from floating point values.

One can argue that PostgreSQL,SQLite, and MySQL are really not in violation of standards here when it comes to averaging, because after all the ANSI/ISO standard talks about operations between numbers to our knowledge, not functions. So presumably Averaging as a function is left up to the implementation discretion of the database vendor. Nevertheless it is still a bit disconcerting to witness these conflicting behaviors.

Given these disparities between databases, the best thing to do when dealing with operations between numbers is to be very precise and there are a couple of ways of doing this.

Here are some guidelines.

SELECT CAST(x*1.0000/y As numeric(10,4)) As thepreciseavg, 
x*1.00000/y As lessprecisebutmoreaccurate
FROM generate_series(1,4) As x, generate_series(3,10) As y

Needless to say the various different behaviors in databases trying to conform to some not so well-defined standard, leaves one feeling a little woozy.