Archive for the ‘coding’ Category

Your tax dollars at work (c++ developers)

January 2nd, 2008

There’s some nice public domain code out there written by government contractors for projects like the Human Genome Project… This one in particular is a nice toolkit for distributed programming in C++…

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC

It has some nice libraries for date math, ASN.1 encoding, a Berkeley DB wrapper, a SQL database wrapper, FastCGI, etc…

Finally we geeks get something we can use from all the taxes we pay…

jake c++, coding

Announcing Thruqueue: Persistent message queue for Thrudb

December 28th, 2007

I’ve just checked in a new Thrudb service that I’ve been working on for the past few days called Thruqueue. I’m sure you can guess by the name that it’s yet another message queue service. But this one has some great features that I think make it stand out.

No hard limits – Create as many queues as you like, send messages as large as you like, send as many messages as you like.

Persistent queues – Under the hood Thruqueue exploits Thrift’s powerful redo logging capabilities, so queues are really managed logs, one log per queue. At specified intervals the logs are pruned to reclaim disk space. Because the queues live in the logs, the memory profile of Thruqueue stays small: only a few items from each queue live in memory at any given time.

Unique Queues – I’ve also added the ability to create unique queues which essentially means no duplicate messages can exist in the queue at once.

Fast! – I’ve done almost no performance optimization, but my initial tests look very promising: in 1 second I can write and then read ~1200 small messages.

Thrift – Want a client in your favorite language? Just run: thrift -favlanguage Thruqueue.thrift
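Here’s a rough sketch of what the unique-queue behaviour above could look like, written in Python for brevity. This is not Thruqueue’s actual code: the `UniqueQueue` class and its `send`/`read` methods are names I made up for illustration, and everything lives in memory rather than in a log.

```python
from collections import deque


class UniqueQueue:
    """FIFO queue that refuses a message that is already waiting in the queue.

    Hypothetical illustration only; not the Thruqueue implementation.
    """

    def __init__(self):
        self._items = deque()   # messages in arrival order
        self._pending = set()   # messages currently in the queue, for fast lookup

    def send(self, message):
        """Enqueue the message; return False if an identical one is still queued."""
        if message in self._pending:
            return False
        self._items.append(message)
        self._pending.add(message)
        return True

    def read(self):
        """Dequeue the oldest message, or return None when the queue is empty."""
        if not self._items:
            return None
        message = self._items.popleft()
        self._pending.discard(message)  # the same message may be sent again now
        return message

    def __len__(self):
        return len(self._items)
```

Note that uniqueness only covers messages that are in the queue at the same time: once a message has been read, sending it again is allowed.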

What’s missing:

Replication – Do you really need this? I could hook this puppy up to Spread, but I’m not sure I see the benefit.

Redundancy – Throxy? TBD

I know that there are certainly a lot of message queues out there, but all of them are either non-persistent, cost money, or require an underlying RDBMS. Let me know what you think.

jake coding, thrift, thrudb

Half-Asynch / Half-Synch Processing Model Added to Thrift

June 10th, 2007

Synchronous processing and asynchronous processing have different strengths and weaknesses. Asynchronous processing is often confusing to system developers, but it scales really well. Synchronous processing (i.e. multi-threaded) is easy to add to a traditional program but is often resource intensive.

A good analogy for asynch vs synch programming is writing a SAX XML parser vs a DOM parser… DOM is easy to code but heavyweight. SAX is more complicated to code but way faster and less resource intensive.

Web services need to support many simultaneous clients and also perform complicated backend processing.

A great backend component we’ve mentioned before is memcached. It uses asynchronous processing so it can support tens of thousands of connections, and it is extremely fast because its actual work is just storing and retrieving data from an in-memory hash table.

But sometimes you need to handle processing-intensive requests like searching a large index or image processing… Doing this with an asynchronous processing model would be tricky and ineffective. At the same time, reading and writing the request over a socket using a synchronous processing model (one thread per request) would be a waste (imagine 200 56k clients connecting to you at the same time: that means 200 threads). The best solution is to perform the network IO using asynchronous processing and the request processing using multiple threads (synchronous processing).
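To make the split concrete, here is a minimal sketch in Python (not the Thrift C++ code, and not ACE). It assumes an asyncio event loop for the asynchronous half and a small thread pool for the synchronous half. The names `process_request`, `serve_client`, `serve_all` and `run_demo` are made up for the example, and `asyncio.sleep(0)` merely stands in for non-blocking socket reads.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def process_request(payload: str) -> str:
    """Synchronous half: stand-in for heavy work, runs on a worker thread."""
    return payload.upper()


async def serve_client(payload: str, pool: ThreadPoolExecutor) -> str:
    """Asynchronous half: one cheap coroutine per client, not one thread."""
    await asyncio.sleep(0)  # placeholder for waiting on a slow client socket
    loop = asyncio.get_running_loop()
    # Hand the complete request to the pool and wait without blocking the loop.
    return await loop.run_in_executor(pool, process_request, payload)


async def serve_all(payloads, workers=4):
    """Serve every client concurrently with a fixed number of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return await asyncio.gather(*(serve_client(p, pool) for p in payloads))


def run_demo(client_count=200):
    """Simulate many simultaneous clients handled by only a few threads."""
    payloads = [f"request {i}" for i in range(client_count)]
    return asyncio.run(serve_all(payloads))
```

Calling `run_demo(200)` handles the 200 simulated clients with four worker threads instead of 200, which is the whole point of the pattern.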

The ACE toolkit defined this design pattern years ago and I’ve used it quite effectively in the past. However, ACE is a very heavyweight, C++-only library and its learning curve is pretty steep. This is why we are using Thrift instead, since it’s interoperable with many languages and contains a lightweight C++ toolkit.

We are building some pretty cool web services using Thrift, and we recently worked with Facebook to add Half-Synch/Half-Asynch support to its C++ toolkit. As a result we can now support thousands of long-lived connections while processing requests in a large thread pool. Best of both worlds!

jake c++, coding, facebook, programming, scaling, thrift, web service

memcache++ – a c++ client for memcached

June 2nd, 2007

I’ve been using memcached for a couple of years now and it’s by far one of the most useful open source tools. It was written by Brad Fitzpatrick of LiveJournal, and it’s used by everyone from Yahoo to Facebook.

It’s so good because it does one thing and does it very well: it provides a distributed in-memory cache.

It’s super fast and super reliable, and it makes speeding up your site or application simple.

The server is written in C and uses libevent, a highly scalable, asynchronous, event-based polling library that allows it to support tens of thousands of concurrent connections.

The client protocol is compact and there are clients in almost every language. The client libraries do most of the work, since they can compress the data that goes into memcached and maintain multiple connections to different memcached instances running on your network (this is why the cache is distributed).
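As a rough illustration of that client-side work, here is a small Python sketch. It is not memcache++ or any real client: `ShardedCache` is a made-up name, plain dicts stand in for connections to memcached instances, and the CRC32-modulo server choice and the size threshold for compression are assumptions picked to keep the example short.

```python
import zlib


class ShardedCache:
    """Toy client that picks a server per key and compresses large values.

    Hypothetical illustration: each dict stands in for one memcached instance.
    """

    def __init__(self, server_names, compress_over=1024):
        self._names = list(server_names)
        self._stores = {name: {} for name in self._names}
        self._compress_over = compress_over  # compress values larger than this

    def server_for(self, key: str) -> str:
        """Map a key to one server; every client with this list agrees."""
        index = zlib.crc32(key.encode("utf-8")) % len(self._names)
        return self._names[index]

    def set(self, key: str, value: bytes) -> None:
        compressed = len(value) > self._compress_over
        payload = zlib.compress(value) if compressed else value
        # Store a flag with the payload so get() knows whether to decompress.
        self._stores[self.server_for(key)][key] = (compressed, payload)

    def get(self, key: str):
        entry = self._stores[self.server_for(key)].get(key)
        if entry is None:
            return None
        compressed, payload = entry
        return zlib.decompress(payload) if compressed else payload
```

The servers never talk to each other here. The cache is "distributed" only because every client maps a given key to the same server.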

The few problems I’ve had with memcached have to do with the C-based memcache clients, libmemcache and apr_memcache.

libmemcache has problems corking the socket on Linux, since it’s written for BSD. I’ve had reliability problems under high load where the client garbles the protocol.

apr_memcache works, but it doesn’t fully support the memcached protocol and it requires APR, which is a dependency I’d rather not build into a non-Apache service.

Plus, I really wanted an OO API to use since I’m writing in C++, not C.

So I built memcache++ about a year ago, based originally on a now-defunct C++ client called memcachedpp.

Now that we have launched Third Rail, I have a place to release memcache++, so grab it here:

http://3.rdrail.net/code/memcache++

Please send me a mail if you have problems with it, or if you are using it. I’m using it on a few projects with great success!

-Jake

jake TR Site, c++, coding, memcached

Facebook’s Thrifty

May 27th, 2007

When I first read about the Facebook platform last week I have to admit I wasn’t all that excited. Sure it’s great to see Facebook open its doors to 3rd parties, but I’m not itching to code the next social slideshow widget. What’s got me all fired up, however, is something else Facebook launched a couple of months ago which largely went unnoticed, called Thrift. It’s what they built their platform with.

Thrift is essentially a framework for building web services that can be accessed by most languages. It’s similar to CORBA in that you define an interface file and Thrift will generate stubs implementing that interface in any of its supported languages:

  • C++
  • PHP
  • Python
  • Java
  • Ruby
  • And recently Perl (thanks to us)

Once you have these stubs you can build the service backend in the above language of your choice and access it from any of the other languages.

This brings us to rule #1 of programming: knowing when to use the right tool for the right job.

You would (hopefully) never write a web frontend layer in c++ when php was built specifically for that purpose. You would also (hopefully) never build a search engine backend in php since it will be difficult to maximize performance while minimizing memory and cpu time.

So with thrift you can build the search engine in c++ and access it via php (which is what facebook does).

The default wire format is a compact binary encoding, but it’s fully extensible to other formats (XML, JSON).

But that’s just the beginning. You can do some really cool things with Thrift, like logging all messages to a file and playing them back (instant redo logs), or versioning your data structures so you keep backwards compatibility with older stored data.
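The log-and-replay idea is easy to picture with a toy example. The Python sketch below does not use Thrift’s own logging classes. It assumes a made-up `Counter` service, a `LoggingProxy` that writes each call as a line of JSON, and a `replay` helper that rebuilds state from that log. All three names are hypothetical.

```python
import io
import json


class Counter:
    """Tiny stand-in service whose state we want to be able to rebuild."""

    def __init__(self):
        self.totals = {}

    def add(self, name, amount):
        self.totals[name] = self.totals.get(name, 0) + amount


class LoggingProxy:
    """Records every call to the log before passing it on to the service."""

    def __init__(self, service, log):
        self._service = service
        self._log = log

    def call(self, method, *args):
        record = {"method": method, "args": list(args)}
        self._log.write(json.dumps(record) + "\n")  # one message per line
        return getattr(self._service, method)(*args)


def replay(log, service):
    """Re-apply every logged message, in order, to a fresh service."""
    log.seek(0)
    for line in log:
        record = json.loads(line)
        getattr(service, record["method"])(*record["args"])
    return service
```

For example, after `proxy = LoggingProxy(Counter(), log)` and a few `proxy.call("add", "hits", 1)` calls, `replay(log, Counter())` ends up with the same totals as the live service.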

Thrift comes with some top notch c++ code to quickly build scalable backend c++ services. We are using thrift today as the search backend for junkdepot.com.

I hope to start a series of articles on how to use Thrift as an alternative to standard LAMP. I really think Thrift will become the backbone of next generation web services. We will also be releasing some of the services we’ve built with Thrift as open source projects soon.

Feel free to ask questions.

-Jake

jake TR Site, coding, facebook, thrift