Posts RSS Comments RSSTwitter 18 Posts and 32 Comments till now
This wordpress theme is downloaded from wordpress themes website.

MongoUK

MongoUK has been and gone and the day went very well. Many thanks to the 10gen guys who sponsor MongoDB for putting on a great meeting. My head is spinning with thoughts and ideas about MongoDB and the host of possibilities it clearly opens up for many classes of application.

It was interesting to learn that MongoDB was born primarily out of a desire for high performance and horizontal scalability i.e. the ability to scale out to multiple nodes as needed. Other properties such as its non-relational and schema-free nature are happy side effects as joins and the maintenance of referential integrity are difficult in a highly parallel environment. It also explains why there is no strong drive to implement transactions, a feature completely absent from MongoDB which is not only hard to implement in a highly parallel environment but would also be detrimental to performance.

Some stand-out features of MongoDB discussed at the meeting include the following:

  • MongoDB is schema free and far more capable than simply storing arbitrary key / value pairs. MongoDB stores BSON documents, a binary JSON format meaning that data of any complexity and nesting can be represented.
  • Through the use of BSON, MongoDB understands several basic data types including integers, strings, floating point and datetime. Many similar databases only support strings and perhaps numeric data types usually leaving the developer to do some parsing in their application.
  • Indexes can be created at any level of nesting in a document, and significantly, an index can be created on an array. Imagine a database full of objects with nested attributes one of which is an array of tags then rapidly searching for all objects with a given tag.
  • A large number of drivers are already developed and well supported.
  • MongoDB has extremely rich query support. Apparently the developers considered using SQL as the query language but settled on a JSON query format. The result is an expressive query language which is satisfyingly symmetrical to the nature of the data being queried upon.

In my previous post I hoped to learn whether MongoDB would be a good solution to storing and querying our arbitrary key value data. MongoDB would clearly solve some of our issues and I look forward to putting MongoDB though its paces over the coming days and to assess it with respect to other document-oriented databases.

Will MongoDB Save The Day?

I’m greatly looking forward to attending tomorrow’s MongoUK, the UK meeting for MongoDB, one of a new breed of so called document-oriented databases.  I thought I’d pen a few thoughts about MongoDB and document-oriented databases in general before the meeting so I could compare them to my thoughts after the meeting.

MongoDB is “a scalable, high-performance, open source, document-oriented database”.  The terms scalable, high-performance, document-oriented, object-oriented, non-relational, schema-free, key/value store, and NoSQL are all used interchangeably for this class of database.  All serve to describe the qualities of this class of database whilst at the same time muddying the water a little by providing multiple and sometimes conflicting reasons for choosing them, or more commonly, for switching over from the commonplace relational database.

I’ve followed the rise and rise of document-oriented databases with interest over the last few years and I often sense that the reason for making the switch is to latch onto one of the qualities mentioned above then try hard to make the other qualities work.  For example, to make the switch to gain massive scalability then work around the loss of SQL and rich query functionality.  The other reason to make the switch is that these databases are new, cool, often a joy to work with, and a refreshing change from the confines of the relational database.  I’ll be the first to put my hand up and favour these reasons for considering the switch for certain applications.  More specifically, I’m investigating the use of a schema-free database whilst contemplating what this might mean with regard to database transactions, referential integrity, and the ability to query your data.

For several years I’ve worked on an application called Asset DB which records the location and attributes of company assets.  There’s no end to the different attributes that might belong to an asset, for example, Serial No, Colour, IP Address, Installation Date, and this list very much depends on the type of asset, for example, a piece of artwork will have an Artist but a PC will have a MAC Address.  Added to this, there’s no end to the different types of asset which might be recorded from artwork to waste-bins to paper-clips.  Asset DB uses a relational database and we were faced with the conundrum of how to structure and store this mish-mash of data.  The problem is not unlike the question of how to store tag data in a relational database (see http://forge.mysql.com/wiki/TagSchema for opinions on how this might be done).  As it happens, we completely side-stepped the issue and simply lumped the attributes into a single string field structured in our own format, in effect, turning the field into a flat file.  The following gives you a rough representation of the schema we used:

 +----+------------+------------+-------------------------------------------------+
 | id | date_added | asset_type | attributes                                      |
 +====+============+============+=================================================+
 | 1  | 2010-06-16 | PC         | assetno=ABC010,colour=blue,mac=00:11:2:33:44:55 |
 | 2  | 2010-06-17 | Artwork    | assetno=ABC020,artist=Picasso,value=$20000      |
 +----+------------+------------+-------------------------------------------------+

The clear advantage is that we can store any key / value data for any asset.  The disadvantages are numerous:

  • There’s no possibility of using standard database constructs for maintaining structural integrity of the data.
  • Over time, our needs have changed and we’ve had to devise way of storing more complex data structures such as lists, maps, time-series data etc.
  • To read even a small part of the data requires reading in the entire string of attributes then parsing it.  Also, any change to a small part of that data, for example, to amend the artist on a piece of artwork, requires reading, parsing, changing then writing back the entire string to the database.
  • The ability to query the data is severely limited, for example, to find all artwork by Picasso, we can’t use the database to do the query for us.  Instead every reference to artwork in the database has to be read and parsed, or some sort of full-text search is needed.  Additionally, opening the database to third parties isn’t as simple as saying “here’s the database, now run some SQL as you please”.

To be fair, we could have formatted our data in something like XML or JSON (although the decision was made prior to the standardisation of JSON), and perhaps we could have used an XML or object database rather than a relational database, but our needs were very simple at the time.  I’d be interested to learn how others have implemented the above in their applications. Meanwhile, I’m hoping to learn whether document-oriented databases might be a good solution.

Why MongoDB?  I was first introduced to MongoDB about a year ago and I’ve played with it a few times since then.  It appears to provide a good balance between providing a schema free database whilst retaining the ability to richly query your data.  As the front page of the MongoDB web site says:

The Best Features of Document Databases, Key-Value Stores, and RDBMSes.

MongoDB bridges the gap between key-value stores (which are fast and highly scalable) and traditional RDBMS systems (which provide rich queries and deep functionality).

Perhaps this is exactly what our application needs and I’m looking forward to learning more tomorrow.