Saturday, December 03. 2011
A lot of redditers took offense at our article XPathing XML data with PostgreSQL with the general consensus, if you are going to be stuffing XML in a relational database where will you stop? That is not what relational databases are designed for. We had comitted a sacrilegious sin and worsed yet encouraging bad habits by forcing people to think more about different options they have for storing data in a relational database and god forbid demonstrating querying such columns with xml specific functions. What were we thinking? How dare we try to query XML data with SQL? Perhaps we were thinking like this guy or this guy, both equally misguided spatial relational database folk. Of course we stepped one foot further by actually defining a column as xml and dare storing data in it for later consumption rather than just an intermediary step.
If I want to store documents, that are navigateable I should be using a document database like MongoDb, CouchDB etc designed for that kind of stuff. If I've got graphs I should be using a graph database. This got me thinking that the "Pure Relational Database" is dead, and I'm surprised most people don't seem to realize it.
So while "Relational databases" have changed over the last 25 years, most people's notions of them have not kept up with the pace of its change.
First let me define what I mean by Pure. A pure relational database is one with standard meat and potato types like text, dates, numbers well suited for counting money and computing how close the world is to total bankruptcy which you store as fields in a row of a table and that you then define foreign keys / constraints / primary keys on to relate them to other tables. You reconstitute your real world objects by stitching these tables together with joins etc and return sets using where conditions, summarize by using group bys or other mathy like constructs. Don't get me wrong; these are very beautiful things because they allow for easy slicing of dimensions and not having to think about all the dimensions that make up an object all at once. In fact it was so beautiful that some people thought, "wow that's cool, but it would be even cooler if I could store more complex objects in those columns with their own specific needs for querying." and so was born the object relational database as some people refer to them that thought relational but also understood that different types had their own unique needs for querying, storage, indexing etc.
Nowadays most, if not all, relational like databases have standardized on some variant of SQL. In essence though, the pure relational database doesn't allow you to define new types or have exotic types such as arrays, xml, graphs, geometries, rasters, sparse matrices etc. Much less thinking involved and less likely you will shoot yourself in the foot by dumping a bunch of xml in a field and trying to do something with it. When it is used to store more complex things such as spreadsheets and other user documents, these are stored as blobs and just retrieved. Even such use is frowned upon.
Well most relational databases I can think of nowadays have richer types: e.g. PostgreSQL, Oracle and Firebird all support arrays as a column type. Some even allow you to define custom types and functions to support your custom types e.g. PostgreSQL (I could go on forever), Oracle has rich user defined type support too, and SQL Server 2005+ with each version getting better and better for user defined custom types and introducing more exotic types and support infrastructure. Even MySQL/Drizzle (mostly in the form of different storage engines). Even my favorite light-weight SQLite under the hood has some tricks that aren't what I would call relational. E.g. Spatialite/RasterLite has a whole geometry type library built on SQLite with functions you can call from SQL and I'm sure there are lots of middleware tools you don't know about using the SQLite and Firebird engine for more than relational tasks (e.g. HTML5 anyone/ CAD anyone).
Posted by Leo Hsu and Regina Obe in editor note, firebird, mysql, oracle, sql server, sqlite at 11:30 | Comments (12) | Trackbacks (0)
Related entries by tags:
Tracked: Dec 11, 12:52
Tracked: Jan 04, 17:36
Display comments as (Linear | Threaded)
You're just pandering to the ignorant, including yourselves. The most logically sparse, and physically sparse, datastore is 5NF. With SSD as prime storage, there's negligible delay in reading/writing.
The xml weenies are hung up on a fallacy: that the "natural" data structure is the tree. They often offer the organizational chart as the paradigm. They are sufficiently ignorant of the real world to skip the fact that real organizations don't fit the paradigm: there's always some for of "matrix management", wherein each worker bee has more than one supervisor. Organization are relational.
As to documents: so what? If you want to through logic out the window, and replace it with mega-lines of app code to manage the data (just like your grandfather did with COBOL/VSAM), go ahead. Just don't lie about what you're doing. You are not progressing, but retrograding to a technology that was bad 40 years ago.
I have no idea what you are arguing about, but I do resent being called ignorant.
I suspect you misunderstood the point of this article and are getting flamed over my use of XML as an example data type as others seem to have harped on. To be honest I rarely store data as xml and when I do I think I've got perfectly good reasons for doing so. I do use other types like PostGIS geometries, ltrees and so forth much more frequently.
My point was that most modern relational databases support all sorts of data types and allow you to define new ones and have perfectly good facilities for handling them (e.g. storage /indexing / specialty functions and operators), so don't go around reinventing database technology every time you want to define a matrix or graph or what not which seems to be the preferred solution nowadays.
Those who cannot remember the past are condemned to repeat it.
-- George Santayana
By pandering to those who would promote xml (or other hierarchical datastore), to primary structure, one is reactionary and ignores what Dr. Codd figured out. It's that simple. I stand with Pascal in that regard. *No one* has yet specified a smarter datastore than the Relational Model. These alternatives share a singular characteristic: they involve lots mo code in applications. For coders, that's the gravy; not so much for the applications, though.
All of the alternatives which some now embrace, are failed predecessors to the RM. IDMS was the original network database, and IMS was the original hierarchical database. Dr. Codd understood their failure, and invented the RM. xml, as datastore, is just a bad copy of IMS. There's no good reason to promote it as an alternative to the RM.
It's the domino theory: accept the Emperor's New Clothes in the form of arbitrary hierarchical (or other) data types, and IMS (by whatever name) is OK as the whole datastore.
As to tone, the use of the word "pure" in the text is clearly pejorative: that old fuddy-duddy structure which is now passe'. The hierarchical/xml datastore was the product of limited hardware, software, and intellect before the RM. We needn't return to those days.
Those who want to use hierarchical data are welcome to do so, so far as I am concerned. Intellectual honesty in that regard demands that those who wish to do so not also claim the mantel of the RM. You can't have it both ways, they don't coexist.
Think of it this way: would you have any regard for your oncologist if s/he told you that leeches were best option for treating your tumor? An ancient, and discredited, mode, rather than the smartest mode available? That's what the xml crowd are doing.
Again have no clue what you are rambling about. It appears I hit a sore nerve by using the word Pure. I apologize for that. But your harping on XML is simply out of place when that was not the point of my article. I also fail to see how having complex data types in a relational database adds to more developer code. In my experience it results in much less since a lot of that messy lifting can be done much shorter with SQL and functions/operators designed for those types.
I don't see the whole point of the discussion. Just read Chris Date's "Database in Depth" to understand that the relational model supports complex types as "domains" and the whole idea that relational databases are only for simple alphanumeric data is broken. It's only historica baggage, since early RDBMS implementations were limited in supported datatypes. There's nothing in the relational model that forbids complex datatypes (event nested relations) in columns.
Perhaps my use of pure was misleading. Note I talked about the Pure Relational Database, not the model. When did I ever mention the model? I'm talking about RDMS implementations not what the theory said they should have been. My point was that extended data types fit nicely in relational databases today and complement them well and people equate relational databases to simple things with no place for complex data types with their own special traversal needs.
I think we agree on the subject. It's true, you never talked about the model but ignorance of the model is the typical reason why people refuse the idea of using complex datatypes in RDBMSs. My point was simply that complex datatypes in RDBMSs are not a retrofit or marketing but a natural fit. So storing complex data in a RDBMS is safe and good if your level of data abstraction is so that the complex data item you store in the column is atomic in your context.And, by the way, don't we have ever stored dates in RDBMSs? Are dates atomic or complex data types?
You have a point there. Someone suggested my title was all wrong and should have been using complex data types in a relational database then it would not have generated so many flames. That makes sense to me. I apologize for the unfortunate choice of words.
I'm not sure how I could rewrite this to make it clearer and I don't think I wrote anything wrong either. I agree my choice of titles was poor and I could have clarified the intent more.
True SQL is not Codd's vision of what the relational language should have been and the relational databases we have today only partially follow Codd's vision and we get into stupid stink fights about the diversion.
I don't really want to turn this into some stupid pedantic fight about what is and is not the relational model vs. relational database. No wonder so many people don't want to use relational databases if all users are caught up in semantic disputes that go nowhere.
What is true is that pretty much all "Relational databases" have standardized on SQL so they are rightfully synonymous in many people's minds with the word "Relational database" and most of them did not start with arbitrary data types or allow creation of new types. So in my mind the "relational model" is very different from "relational database". One just happens to be patterned after the other.
The reason this discussion got bogged down in semantics is because you used a very strongly opinionated headline based on a strange definition of terms.
If you just said "good datatypes are useful, don't limit yourself to the handful of types defined by SQL", that would accomplish two things:
1. You'd get mostly agreement from people you fundamentally agree with, rather than arguing with people you fundamentally agree with which is what's happening now.
2. It would reveal one of the major reasons _why_ postgresql is great: it's thorough support for user-defined types. You can define sophisticated types with sophisticated indexing strategies, etc., to make them _really_ work as well as if you supported the type natively. For instance, as you know PostGIS is a first-class GIS system despite being a user-defined type.
Yes that is EXACTLY what I was trying to say. I realized after the fact that I used the wrong choice of words to say it.
Apologies to everyone. I forgot that "Pure" in many people's minds equates to the relational model.
I cannot agree with the characterisation of wanting to call a spade, a spade, as pedantism. You may disagree with Codd, and even more with Date, Pascal and others, but they sure have a point that we cannot have any serious discussion while using fuzzy terms.
Can we call it algebra if we define that 2+2=5? So why should we tolerate calling ‘relational databases’ systems where the primary objects, called ‘tables’, are not relations (sets) in any meaningful sense, but actually bags?
More to the point, why should we tolerate calling SQL ‘relational’ when (1) the SQL standards not even mention relational theory and, specially, when (2) all the limitations of SQL and its implementations are ascribed to the relational model, thus obscuring the value of the only real ‘silver bullet’ to appear in computing since functional programming set the course for all programming advances since, even outside the functional programming languages set?
In the end, it does not really matter because people are ‘unqualified and unaware of it’, as the famous eponymous study puts it. And given the philistinism evident in your self-defense, which grantedly can be ascribed not to an idiosyncrasy of yours but to the climate of opinion you live in, people will neither want to, nor accept, be educated, but will rather choose Plato’s cave’s dark end. So, yes, we pedants could as well just be quiet. But, if we cannot pastor, at least we can bleat, and that we will do so we cannot be accused of having been silent when we should have warned.
More modestly, even when restricted to SQL systems your argument is lame, as type extension is not incompatible in any way with a ‘pure SQL’ system. If you had said ‘simplistic SQL’ instead of ‘pure relational’ you would have error, but why be correct when you can make a headline? When legend and fact collide, profit-savy publishers choose legend… only that is farwest cinema, not computing science.
Please rewrite the article. It is just wrong, as SQL is not relational and relational databases have user‐defined types of arbitrary complexity.
Is is all right to make mistake and to ignore fundaments, as long as one corrects oneself when the error is pointed out.
Syndicate This Blog
Show tagged entries
Remote RSS/OPML-Blogroll Feed
No RSS/OPML feed selected