Archive for the ‘ec2’ Category

Moved to Slicehost!

March 8th, 2008

I’m definitely a cloud computing groupie.

Xen has really opened up the options for hosting. For a long time the only way to get root on your host was to pay for a dedicated machine, which was unreliable and hard to reboot or upgrade.

Third Rail has been running on unixshell, which pioneered the Xen hosting approach back in 2005. They ran into problems, though, and have been asking us to move to a wack closed-source virtualization platform.

We’ve also been using EC2, which is by far the leader in this arena, but it’s got some drawbacks… mainly cost of ownership: about $70 a month for a single instance.

This afternoon we moved to a relatively new VPS provider called Slicehost. The guys over there have really done a superb job building a scalable hosting solution.

The admin tools are first rate and they are building a great community. Best of all, their service starts at $20 a month.

We’ll still use EC2 but not for small things like our blogs and low traffic sites.

jake ec2, hosting, slicehost, xen

Announcing: Thrudb – Document Oriented Database Services

November 4th, 2007

There has been a lot of talk recently about how traditional relational databases no longer fit the bill for web development. This is certainly a bit over the top, since every site I’ve ever built or seen built uses an RDBMS. But I think the point is that not a lot has changed in the world of data storage since the ’70s. SQL, DDL and referential integrity are ideas that all came before the onset of the web. Databases are just big spreadsheets really, but is that the best storage structure for web data?

A new breed of databases and data services has emerged in recent years to address this. The first product I came across was an XML database with XQuery, but this system was built to offer everything a regular database offers PLUS a bunch of new features like on-the-fly indexing of any field. The problem with this kind of approach is that it ends up complicating the API. Not to mention XML and performance don’t really fit together. I’m a big believer in simple/fast software components that can be put together to create powerful/fast systems. Google is the best known example of this. They are built to be massively parallel, so much so that there was no way an RDBMS would work. Instead they first built the Google File System, which splits their data into 64MB chunks and spreads it across thousands of machines, making at least 3 copies of any chunk for redundancy. Then they use techniques like MapReduce to create indexes of these documents, split them into index shards and spread those across their network too. Finally they have services running on these machines that coordinate searches across the index shards, return the document ids and fetch the documents from the document store.
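To make the chunk-and-replicate idea concrete, here is a minimal Python sketch. Only the 64MB chunk size and the 3 copies come from the description above. The function name place_chunks, the machine names and the round-robin placement are assumptions for illustration; this is not how GFS actually picks machines.

```python
import math

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB chunks, as described above
REPLICAS = 3                   # at least 3 copies of each chunk


def place_chunks(file_size, machines, chunk_size=CHUNK_SIZE, replicas=REPLICAS):
    """Return a list of (chunk_index, [machines holding a copy])."""
    if len(machines) < replicas:
        raise ValueError("need at least as many machines as replicas")
    chunk_count = math.ceil(file_size / chunk_size)
    placement = []
    for index in range(chunk_count):
        # Hypothetical round-robin placement: each copy lands on a
        # different machine because replicas <= len(machines).
        holders = [machines[(index + r) % len(machines)] for r in range(replicas)]
        placement.append((index, holders))
    return placement


if __name__ == "__main__":
    machines = ["m1", "m2", "m3", "m4"]
    for index, holders in place_chunks(200 * 1024 * 1024, machines):
        print(index, holders)
```

Losing any single machine still leaves two copies of every chunk it held, which is the whole point of the redundancy.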

They have also built a system called BigTable, a column-oriented database that splits a table into columns rather than rows, making it much simpler to distribute and parallelize.
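A toy Python sketch of the rows-versus-columns idea, assuming a made-up table of blog posts. The to_columns helper and the field names are hypothetical, and this only illustrates the layout as described here, not BigTable's real data model.

```python
def to_columns(rows, key="id"):
    """Regroup row records into one mapping per column, keyed by row id."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            if name == key:
                continue
            # Each column only stores the rows that actually have a value,
            # so sparse data costs nothing for the missing fields.
            columns.setdefault(name, {})[row[key]] = value
    return columns


if __name__ == "__main__":
    rows = [
        {"id": 1, "title": "first post", "views": 10},
        {"id": 2, "title": "second post", "views": 25, "tag": "ec2"},
    ]
    columns = to_columns(rows)
    # A query over one field touches only that column.
    print(sum(columns["views"].values()))
```

Each column can then live on a different machine, and a query that needs one field never has to read the others.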

So why are these systems any better than a relational database? Well, for one thing they make it much easier to scale horizontally, meaning you can slap another box onto the network and increase your database capacity. This is exactly how webservers scale, but anyone who has tried to scale their website will tell you it’s never as easy to scale your database as it is your webservers, since traditional databases are inherently monolithic.
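One common way to spread data across boxes is to hash each document key and pick a box from the hash. Here is a minimal Python sketch of that; the shard_for function and the box names are hypothetical, and this is not a description of how any particular system does it.

```python
import hashlib


def shard_for(key, boxes):
    """Pick the box that stores `key` by hashing the key."""
    # md5 is used only as a stable hash here, not for security.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return boxes[int(digest, 16) % len(boxes)]


if __name__ == "__main__":
    boxes = ["db1", "db2", "db3"]
    for key in ["doc-1", "doc-2", "doc-3"]:
        print(key, "->", shard_for(key, boxes))
```

With plain modulo like this, adding a box changes where most keys land, so real systems usually reach for something smarter such as consistent hashing.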

Another benefit is your data structures can be sparsely populated and linked across any number of facets in these systems. The story of del.icio.us or flickr trying to scale using tagging and mysql is a great read because it illustrates the problem you run into when using fixed schemas to hold dynamic/fluid data that wants to be searched, mashed-up, split up and grouped any which way.

Ok, so how do I as a developer address this… Isn’t it obvious? Build a solution from open source components!

I never would have attempted this if it weren’t for Facebook’s Thrift project. It provides much of what I needed to get this off the ground, specifically the ability to build services that can communicate with almost any language. They used it internally to build much of their infrastructure like search and the Facebook platform itself. Thrift on the surface looks like a stripped down version of CORBA. You define structures and services in an IDL and use its code compiler to generate object definitions and a client/server interface. But Thrift offers so much more. Most importantly, the ability to transmit your objects over any protocol, be it binary, xml or json, as well as over any transport (tcp socket, http, file). Another big benefit of Thrift is you can adjust your structure definitions over time while keeping backwards compatibility with your previous definition. BINGO. This is a big deal because one of the big reasons I keep using databases like mysql is so I can adjust my schema as I find bottlenecks or bugs. In fact Google has built a very similar system to Thrift, which is how they store data on GFS, using compressed serialized objects they call protocol buffers.
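The backwards compatibility works because Thrift fields carry numeric ids in the IDL: a reader skips ids it doesn’t know and falls back to defaults for ids that are missing. The sketch below mimics that idea in plain Python. It is not the Thrift API or its generated code, and the schemas, field names and encode/decode helpers are all made up for illustration.

```python
import copy

# Field id -> (name, default). Ids are never reused or renumbered.
SCHEMA_V1 = {1: ("title", ""), 2: ("body", "")}
SCHEMA_V2 = {1: ("title", ""), 2: ("body", ""), 3: ("tags", [])}


def encode(record, schema):
    """Turn a record into {field_id: value}, the stand-in wire format."""
    name_to_id = {name: field_id for field_id, (name, _) in schema.items()}
    return {name_to_id[name]: value for name, value in record.items()}


def decode(wire, schema):
    """Read a wire dict with `schema`: unknown ids are skipped,
    missing ids get the schema default."""
    record = {}
    for field_id, (name, default) in schema.items():
        record[name] = wire.get(field_id, copy.deepcopy(default))
    return record


if __name__ == "__main__":
    old = encode({"title": "hi", "body": "text"}, SCHEMA_V1)
    print(decode(old, SCHEMA_V2))  # new reader, old data
    new = encode({"title": "hi", "body": "text", "tags": ["ec2"]}, SCHEMA_V2)
    print(decode(new, SCHEMA_V1))  # old reader, new data
```

Old data read with the new definition picks up an empty tags list, and new data read with the old definition simply ignores the extra field.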

Ok, so I had a development platform, Thrift, now just add a few months of late night coding and a little Memcached, Spread, CLucene and Brackup and I ended up with…

Thrudb is a set of simple services built on top of Facebook’s Thrift framework that provides indexing and document storage services for building and scaling websites. Its purpose is to offer web developers flexible, fast and easy-to-use services which can enhance or replace traditional data storage and access layers.

Thrudb Features:

  • Client libraries for most languages
  • Multi-master replication
  • Incremental backups and redo logging
  • Multiple storage backends (S3 included)
  • Built for horizontal scalability
  • Simple and powerful search api (Lucene)

Thrudb solves a lot of problems for me. Biggest of all is, now with Thrudb, I can use Amazon EC2 as a stable server farm since my backend database writes directly to S3. In fact, I’ve successfully moved Junkdepot from a traditional hosting facility using a mysql database to multiple EC2 instances using Thrudb in a week.

check it out: http://thrudb.googlecode.com

I’m not saying Thrudb is complete and production ready, but I do think it’s pretty reliable and simple to try out. I’m hoping you the reader can help make it better with your testing, coding and insight…

jake database, ec2, facebook, thrift, thrudb, web

Is Amazon EC2 Worth The Cost?

June 4th, 2007

We have been considering using Amazon EC2 to launch our next Third Rail project, but I’m having trouble justifying the cost. I was under the impression that Amazon charged you $0.10 per cpu hour, but I misread and it’s really just per hour. So that means running one instance for a month would cost ~$72. Not that great when you consider other VPS companies charge ~$50 a month for an equivalent spec virtual machine. I suppose the benefit of using Amazon is you can quickly scale your server instances. But since Amazon doesn’t support static IPs you really have to build a lot of logic around configuring a server to plug into your architecture quickly. I guess I’m not completely sold on Amazon EC2 yet. I’ll see how it works with our next project and let you know.
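The arithmetic behind that ~$72 figure, as a quick Python sketch. The $0.10 hourly rate is from above; the 30-day month and the monthly_cost helper are my assumptions.

```python
HOURLY_RATE = 0.10  # dollars per instance hour, as quoted above


def monthly_cost(hourly_rate=HOURLY_RATE, instances=1, days=30):
    """Cost of running `instances` around the clock for `days` days."""
    return round(hourly_rate * 24 * days * instances, 2)


if __name__ == "__main__":
    print(monthly_cost())             # one instance, always on
    print(monthly_cost(instances=3))  # a small farm
```

Because the charge is per wall-clock hour, an idle instance costs the same as a busy one, which is why it compares poorly with a flat-rate VPS for always-on sites.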

jake amazon, ec2