I’m greatly looking forward to attending tomorrow’s MongoUK, the UK meeting for MongoDB, one of a new breed of so called document-oriented databases. I thought I’d pen a few thoughts about MongoDB and document-oriented databases in general before the meeting so I could compare them to my thoughts after the meeting.
MongoDB is “a scalable, high-performance, open source, document-oriented database”. The terms scalable, high-performance, document-oriented, object-oriented, non-relational, schema-free, key/value store, and NoSQL are all used interchangeably for this class of database. All serve to describe the qualities of this class of database whilst at the same time muddying the water a little by providing multiple and sometimes conflicting reasons for choosing them, or more commonly, for switching over from the commonplace relational database.
I’ve followed the rise and rise of document-oriented databases with interest over the last few years and I often sense that the reason for making the switch is to latch onto one of the qualities mentioned above then try hard to make the other qualities work. For example, to make the switch to gain massive scalability then work around the loss of SQL and rich query functionality. The other reason to make the switch is that these databases are new, cool, often a joy to work with, and a refreshing change from the confines of the relational database. I’ll be the first to put my hand up and favour these reasons for considering the switch for certain applications. More specifically, I’m investigating the use of a schema-free database whilst contemplating what this might mean with regard to database transactions, referential integrity, and the ability to query your data.
For several years I’ve worked on an application called Asset DB which records the location and attributes of company assets. There’s no end to the different attributes that might belong to an asset, for example, Serial No, Colour, IP Address, Installation Date, and this list very much depends on the type of asset, for example, a piece of artwork will have an Artist but a PC will have a MAC Address. Added to this, there’s no end to the different types of asset which might be recorded from artwork to waste-bins to paper-clips. Asset DB uses a relational database and we were faced with the conundrum of how to structure and store this mish-mash of data. The problem is not unlike the question of how to store tag data in a relational database (see http://forge.mysql.com/wiki/TagSchema for opinions on how this might be done). As it happens, we completely side-stepped the issue and simply lumped the attributes into a single string field structured in our own format, in effect, turning the field into a flat file. The following gives you a rough representation of the schema we used:
| id | date_added | asset_type | attributes |
| 1 | 2010-06-16 | PC | assetno=ABC010,colour=blue,mac=00:11:2:33:44:55 |
| 2 | 2010-06-17 | Artwork | assetno=ABC020,artist=Picasso,value=$20000 |
The clear advantage is that we can store any key / value data for any asset. The disadvantages are numerous:
- There’s no possibility of using standard database constructs for maintaining structural integrity of the data.
- Over time, our needs have changed and we’ve had to devise way of storing more complex data structures such as lists, maps, time-series data etc.
- To read even a small part of the data requires reading in the entire string of attributes then parsing it. Also, any change to a small part of that data, for example, to amend the artist on a piece of artwork, requires reading, parsing, changing then writing back the entire string to the database.
- The ability to query the data is severely limited, for example, to find all artwork by Picasso, we can’t use the database to do the query for us. Instead every reference to artwork in the database has to be read and parsed, or some sort of full-text search is needed. Additionally, opening the database to third parties isn’t as simple as saying “here’s the database, now run some SQL as you please”.
To be fair, we could have formatted our data in something like XML or JSON (although the decision was made prior to the standardisation of JSON), and perhaps we could have used an XML or object database rather than a relational database, but our needs were very simple at the time. I’d be interested to learn how others have implemented the above in their applications. Meanwhile, I’m hoping to learn whether document-oriented databases might be a good solution.
Why MongoDB? I was first introduced to MongoDB about a year ago and I’ve played with it a few times since then. It appears to provide a good balance between providing a schema free database whilst retaining the ability to richly query your data. As the front page of the MongoDB web site says:
The Best Features of Document Databases, Key-Value Stores, and RDBMSes.
MongoDB bridges the gap between key-value stores (which are fast and highly scalable) and traditional RDBMS systems (which provide rich queries and deep functionality).
Perhaps this is exactly what our application needs and I’m looking forward to learning more tomorrow.
safe :: 17 Jun 2010 ::
Posted in Programming :: ::
1 Comment »
Tags: databases, mongodb