Amazon SimpleDB: Super! But too simple?
Saturday, December 15th, 2007
Finally a big company gets it: schema-less, document-oriented databases are the wave of the future.
With yesterday's announcement of Amazon SimpleDB, a way of storing and querying data that many of us have been working toward has finally hit the mainstream. I believe this kind of technology is a game-changer, since it allows simple, flexible storage and retrieval of multi-faceted data (which describes most data on the web). That being said, there appear to be a number of issues with the beta release that will hopefully be ironed out in the months to come.
Here’s what we know so far:
- REST and SOAP APIs
- Domains represent a collection of documents, similar to S3 Buckets
- “Items” or documents can contain up to 256 key-value pairs (called attributes)
- Multiple attributes with the same name allowed e.g. (type=flag, color=red, color=white, color=blue)
- Create, remove or update items and item attributes
- Attribute values limited to 1024 characters
- Very simple query language for searching domains, i.e. (=, !=, <, >, <=, >=, STARTS-WITH, AND, OR, NOT, INTERSECTION and UNION)
- No free-text search capabilities
- Query time limited to 5 seconds; an error is thrown if a query takes any longer
- Query results can be limited and paged (the total number of matching results is not returned)
- No sorting capabilities
- Eventual consistency model used for writes. This means that if you update a document and immediately query it, you may get back the old, un-updated document.
- Pay as you go based on storage and query utilization.
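To make the data model in the list above a bit more concrete, here is a minimal in-memory sketch in Python. It does not talk to SimpleDB over REST or SOAP; the `Domain` class and its `put_attributes` and `query` methods are hypothetical names chosen for illustration. It shows a domain holding items, items holding repeatable attribute names (the flag example), the 256-pair and 1024-character limits, simple comparison and STARTS-WITH predicates, and limit/offset paging with no sort option. Comparing values as plain strings is an assumption of this sketch.

```python
from dataclasses import dataclass, field

# Limits taken from the beta feature list above.
MAX_ATTRIBUTE_PAIRS = 256
MAX_VALUE_LENGTH = 1024


@dataclass
class Domain:
    """Hypothetical in-memory stand-in for a domain (not the real SimpleDB API)."""
    name: str
    # item name -> list of (attribute name, value) pairs; names may repeat
    items: dict = field(default_factory=dict)

    def put_attributes(self, item_name, pairs):
        """Add attribute pairs to an item, enforcing the size limits first."""
        for attr, value in pairs:
            if len(value) > MAX_VALUE_LENGTH:
                raise ValueError(f"value for {attr!r} exceeds {MAX_VALUE_LENGTH} chars")
        existing = self.items.get(item_name, [])
        if len(existing) + len(pairs) > MAX_ATTRIBUTE_PAIRS:
            raise ValueError(f"item {item_name!r} would exceed {MAX_ATTRIBUTE_PAIRS} pairs")
        self.items.setdefault(item_name, []).extend(pairs)

    def query(self, attr, op, operand, limit=10, offset=0):
        """Return one page of item names where ANY value of `attr` matches.

        Values are compared as plain strings (an assumption of this sketch).
        There is no sort option: results come back in insertion order.
        """
        tests = {
            "=": lambda v: v == operand,
            "!=": lambda v: v != operand,
            "<": lambda v: v < operand,
            ">": lambda v: v > operand,
            "<=": lambda v: v <= operand,
            ">=": lambda v: v >= operand,
            "STARTS-WITH": lambda v: v.startswith(operand),
        }
        test = tests[op]
        matches = [name for name, pairs in self.items.items()
                   if any(a == attr and test(v) for a, v in pairs)]
        return matches[offset:offset + limit]


flags = Domain("flags")
flags.put_attributes("us", [("type", "flag"), ("color", "red"), ("color", "white"), ("color", "blue")])
flags.put_attributes("jp", [("type", "flag"), ("color", "red"), ("color", "white")])
flags.put_attributes("se", [("type", "flag"), ("color", "blue"), ("color", "yellow")])
print(flags.query("color", "=", "blue"))                     # ['us', 'se']
print(flags.query("color", "STARTS-WITH", "wh"))             # ['us', 'jp']
print(flags.query("type", "=", "flag", limit=2, offset=2))   # ['se']
```

Note that the paged call only tells you which items are on that page, not how many matched in total, which mirrors the paging limitation listed above.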
My big concerns here are the lack of sorting and free-text search, the eventual consistency model and the attribute size limit. I would use this service for things like tag search, user preference storage and other non-critical metadata, but I'm not sure it would be useful or reliable enough to store things like a username, encrypted password and email info.
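The eventual consistency worry is easiest to see with a toy model. The sketch below is not how SimpleDB is implemented internally (that isn't described here); `EventuallyConsistentStore`, `put`, `get` and `sync` are hypothetical names. A write is acknowledged after reaching one replica and copied to the others later, so a read served by another replica can return the previous value, which is exactly the problem for something like a just-changed email address.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on replica 0 and reach the others only on sync()."""

    def __init__(self, replica_count=2):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (key, value) writes not yet propagated

    def put(self, key, value):
        self.replicas[0][key] = value  # acknowledged after a single replica
        self.pending.append((key, value))

    def get(self, key, replica):
        return self.replicas[replica].get(key)

    def sync(self):
        """Propagate queued writes to every other replica (the 'eventually' part)."""
        for key, value in self.pending:
            for other in self.replicas[1:]:
                other[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.put("alice:email", "old@example.com")
store.sync()
store.put("alice:email", "new@example.com")
print(store.get("alice:email", replica=1))  # old@example.com (stale read)
store.sync()
print(store.get("alice:email", replica=1))  # new@example.com
```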
One great thing that Amazon put into their intro doc, which parallels the ThruDB design, was the following:
Developers can run their applications in Amazon EC2 and store their data objects in Amazon S3. Amazon SimpleDB can then be used to query the object metadata from within the application in Amazon EC2 and return pointers to the objects stored in Amazon S3.
This is exactly the way ThruDB's thrudoc and thrucene services are intended to work together. However, since thrucene is built on Lucene, they offer atomic writes, no hard limits, free-text search and sorting :)
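The pattern described above (bulky objects in a blob store, small searchable metadata in an index, queries that return pointers rather than objects) can be sketched with in-memory stand-ins. Nothing here calls S3, SimpleDB or ThruDB; `object_store`, `metadata_index`, `save_document`, `find_pointers` and `fetch` are hypothetical names, and the `docs/<uuid>` key layout is an assumption made for illustration.

```python
import uuid

# Stand-ins: a dict plays the object store (the S3 / thrudoc role) and a list of
# (pointer, metadata) records plays the metadata index (the SimpleDB / thrucene role).
object_store = {}
metadata_index = []


def save_document(body: bytes, **metadata: str) -> str:
    """Store the full object, then index only its small metadata plus a pointer."""
    key = f"docs/{uuid.uuid4()}"  # hypothetical key layout
    object_store[key] = body
    metadata_index.append((key, metadata))
    return key


def find_pointers(**criteria: str) -> list:
    """Query the metadata index only; return pointers, not object bodies."""
    return [key for key, meta in metadata_index
            if all(meta.get(k) == v for k, v in criteria.items())]


def fetch(key: str) -> bytes:
    """Dereference a pointer against the object store."""
    return object_store[key]


save_document(b"<html>...</html>", type="page", lang="en")
save_document(b"\x89PNG...", type="image", lang="en")
for pointer in find_pointers(type="page"):
    print(pointer, len(fetch(pointer)))
```

Keeping the index small is the point of the split: queries never touch the object bodies, and only the handful of matching objects are fetched afterwards.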
I am excited to get my hands on SimpleDB and I will definitely use it sometime as an alternative search interface to thrucene. However, I think the Amazon engineers had to compromise on too many things in order to provide a ubiquitous database for everyone. I'm sure they will address a number of these limitations in the months to come. Either way, it's an exciting time for us data storage geeks :)
Now if only Google would release BigTable…



