
is scaling easy? it can be.

The Ruby on Rails folks say scaling is easy, and they are right, but there are many different components to scale. Scaling web servers horizontally behind reverse proxies like Pound or Perlbal, and caching with memcached and Varnish, will get you pretty far. But real web applications need to scale their content, not just their service. Take Flickr, for example: how do you scale millions of photos? Or Twitter: you can't put all those tweets in a single database. That's why the key to real scaling is thinking about your service a year down the road. What decisions can you make now that will make it easier to scale later on (if you are lucky enough to need to)?

The most surefire way to scale your content is to make it easier to federate or partition your data. That means following these simple rules:

1. Keep away from sequential primary keys in your database. Use UUIDs instead: they can be generated anywhere, with no central coordinator and a negligible chance of collision. That makes it much easier to move to a multi-master database setup if you have to, or to split your data into partitioned chunks by hashing the UUID.

2. Don’t use stored procs (ever!). Thankfully most of us are used to not having stored procs in MySQL, so this isn’t a big deal. But once you split your database into smaller pieces, a traditional stored proc can’t search across all of them, not to mention that it’s bad practice to put business logic in the database layer.

3. Think about using dedicated search tools like Lucene for searching across specific types of data. This follows from the rule above: searching across your data is hard once you split it into pieces, but tools like Lucene make it easy to build small meta-indexes of your data, which can pack in a lot more information than a big InnoDB table.

4. Don’t store binary data in your database. Unless you like pain, you should never store things like images in a database; just store the path to them. I’ve found it easy to take the MD5 of the image and use that as the file name, since you can then partition your images evenly across many directories (and eventually disks). Or just use Amazon S3 :)

5. Finally, only store what you need. Scaling becomes much harder when you build a lot of complexity and normalization into your data model. Keep it simple, stupid. People don’t like complicated apps, believe me, I know :)
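To make rule 1 a bit more concrete, here is a minimal Python sketch of minting UUID keys on any app server and mapping each one to a shard by hashing it. The shard count (`NUM_SHARDS`) and the helper names `new_id` and `shard_for` are hypothetical, chosen just for illustration; MD5 is used here only to spread keys evenly, not for security.

```python
import hashlib
import uuid

NUM_SHARDS = 16  # hypothetical shard count, pick whatever fits your setup

def new_id() -> uuid.UUID:
    # uuid4 is random, so any app server can mint IDs without asking a
    # central database for the next sequence number
    return uuid.uuid4()

def shard_for(record_id: uuid.UUID, num_shards: int = NUM_SHARDS) -> int:
    # Hash the UUID's 16 raw bytes and reduce the first 8 bytes of the
    # digest modulo the shard count to pick a partition
    digest = hashlib.md5(record_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

if __name__ == "__main__":
    for record_id in (new_id() for _ in range(5)):
        print(record_id, "-> shard", shard_for(record_id))
```

One caveat with plain modulo: changing the shard count later moves most keys to a different shard, which is why consistent hashing is often used when shards get added over time.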
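The MD5 naming trick from rule 4 can be sketched like this in Python. The `images` root, the `.jpg` extension, and the two-level directory layout (first two hex pairs of the hash) are assumptions for illustration, not a prescribed scheme. Because MD5 output is close to uniform, files land evenly across the 256 × 256 buckets, and only the returned path string needs to go into the database.

```python
import hashlib
from pathlib import Path

def image_path(data: bytes, root: str = "images", ext: str = ".jpg") -> Path:
    # Name the file after the MD5 of its contents (for distribution, not security)
    name = hashlib.md5(data).hexdigest()
    # Use the first two pairs of hex characters as nested directories,
    # e.g. images/d4/1d/d41d8cd9....jpg, giving 256 * 256 buckets
    return Path(root) / name[0:2] / name[2:4] / (name + ext)

def store_image(data: bytes, root: str = "images", ext: str = ".jpg") -> str:
    path = image_path(data, root, ext)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Identical uploads hash to the same name, so each image is written once
    if not path.exists():
        path.write_bytes(data)
    # Store this string in the database instead of the image bytes
    return str(path)
```

Moving a bucket directory to another disk later only requires remapping the prefix, since the file names themselves never change.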
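Rules 2 and 3 come down to the same problem: once data is split, no single database can answer a query on its own. Below is a minimal sketch of the application-level alternative to a stored proc: run the same query on every shard and merge the results in code. In-memory SQLite databases stand in for real shards here, and the `photos` table and helper names are hypothetical. A dedicated index such as Lucene avoids fanning out to every shard on each search, which is the point of rule 3.

```python
import sqlite3

def make_shard() -> sqlite3.Connection:
    # Stand-in for one partition of the data; a real setup would connect
    # to a separate database server per shard
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE photos (id TEXT PRIMARY KEY, owner TEXT, title TEXT)")
    return conn

def search_titles(shards: list[sqlite3.Connection], term: str) -> list[tuple]:
    results = []
    # Fan out: run the same parameterised query against every shard...
    for conn in shards:
        rows = conn.execute(
            "SELECT id, owner, title FROM photos WHERE title LIKE ?",
            (f"%{term}%",),
        ).fetchall()
        results.extend(rows)
    # ...then merge and sort in application code instead of a stored proc
    results.sort(key=lambda row: row[2])
    return results
```

For example, with photos spread over two shards, `search_titles([shard_a, shard_b], "beach")` returns matches from both, sorted by title.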

There are some great talks about scaling data here.

jake programming, scaling, web
