Archive for the ‘thrift’ Category

Thrift moving to Apache

January 24th, 2008

Facebook is looking to move Thrift to the Apache Incubator. They have submitted a proposal to Apache and are awaiting approval. This is pretty much a shoo-in, since Thrift has really no dependencies besides Boost. Plus the Hadoop team is very keen on integrating Thrift into HBase and probably Hadoop as a whole…

Looks like I’m on their list of initial committers, which would be a nice plus, as I will continue contributing features and target languages as Thrudb develops.

jake thrift, thrudb

Video: Thrift Technical Discussion

January 21st, 2008

Mark Slee and David Reiss from Facebook explain the design and implementation details of the Thrift project.

One interesting point from this talk: Mark mentions a soon-to-be-open-sourced service called “Scribe”, based on their news feed architecture, which acts as what sounds like a distributed work queue for processing Thrift message logs… sweet.


This talk was given at Seneca College on October 25, 2007, as part of the Free Software & Open Source Symposium.

jake thrift, thrudb, video

Announcing Thruqueue: Persistent message queue for Thrudb

December 28th, 2007

I’ve just checked in a new Thrudb service that I’ve been working on for the past few days called Thruqueue. I’m sure you can guess by the name that it’s yet another message queue service. But this one has some great features that I think make it stand out.

No hard limits – Create as many queues as you like, send messages as large as you like, send as many messages as you like.

Persistent queues – Under the hood Thruqueue is exploiting Thrift’s powerful redo logging capabilities, so queues are really managed logs, one log per queue. At specified intervals the logs are pruned to reclaim disk space. This also means the memory profile of Thruqueue stays small, since only a few items from each queue live in memory at any given time.

Unique Queues – I’ve also added the ability to create unique queues which essentially means no duplicate messages can exist in the queue at once.

Fast! – I’ve done almost no performance optimization, but my initial tests look very promising: in 1 second I can write and then read ~1200 small messages.

Thrift – Want a client in your favorite language? Just run: thrift -favlanguage Thruqueue.thrift
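To make the "queues are really managed logs" idea concrete, here is a minimal Python sketch. It is not Thruqueue's code: the class name LogBackedQueue, the JSON-lines log format, and the prune() method are all assumptions made for illustration. It also keeps every pending message in memory, whereas Thruqueue only holds a few items per queue, so treat it as a picture of the logging and unique-queue ideas only.

```python
import json
import os
import tempfile
from collections import deque


class LogBackedQueue:
    """Toy persistent queue (hypothetical): every operation is appended to a log."""

    def __init__(self, path, unique=False):
        self.path = path
        self.unique = unique      # unique queues reject messages already pending
        self.items = deque()
        self._replay()

    def _replay(self):
        # Rebuild the pending messages by replaying the log from the start.
        if not os.path.exists(self.path):
            return
        with open(self.path) as log:
            for line in log:
                record = json.loads(line)
                if record["op"] == "enqueue":
                    self.items.append(record["msg"])
                elif record["op"] == "dequeue":
                    self.items.popleft()

    def _append(self, record):
        # Append one record and force it to disk before acknowledging.
        with open(self.path, "a") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def enqueue(self, msg):
        if self.unique and msg in self.items:
            return False          # duplicate of a pending message
        self._append({"op": "enqueue", "msg": msg})
        self.items.append(msg)
        return True

    def dequeue(self):
        if not self.items:
            return None
        self._append({"op": "dequeue"})
        return self.items.popleft()

    def prune(self):
        # Rewrite the log so it holds only the still-pending messages.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as log:
            for msg in self.items:
                log.write(json.dumps({"op": "enqueue", "msg": msg}) + "\n")
        os.replace(tmp_path, self.path)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "jobs.log")
    queue = LogBackedQueue(path, unique=True)
    queue.enqueue("resize-image-1")
    queue.enqueue("resize-image-2")
    queue.enqueue("resize-image-1")   # rejected: already pending
    queue.dequeue()

    # A "restarted" process recovers the pending messages from the log alone.
    recovered = LogBackedQueue(path, unique=True)
    print(list(recovered.items))
```

Pruning here simply rewrites the log with whatever is still pending; a real service would presumably do this on a timer and in a crash-safe way.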

What’s missing:

Replication – Do you really need this? I could hook this puppy up to Spread, but I’m not sure I see the benefit.

Redundancy – Throxy? TBD

I know that there are certainly a lot of message queues out there, but all of them are either non-persistent, cost money, or require an underlying RDBMS. Let me know what you think.

jake coding, thrift, thrudb

Thrudb Tutorials

December 11th, 2007

Working with thrift structures

November 23rd, 2007

While developing thrudb I’ve been using thrift a lot and have a couple tricks to share.

The first trick is how to do simple reflection. Thrift lets you do things like serialize a structure to a binary string and store it on disk. The problem is that Thrift doesn’t store the structure’s definition along with it, since this information would bloat the message and frankly goes against the design of Thrift, which allows loose structure definitions (see section 4 of the Thrift whitepaper).

To get around this we need to encode the type of structure we have as a field in the struct itself.

Let’s start with an example: say I want to store a mixed list of Email and RSS articles in a file for backup purposes, or better yet in Thrudb.
Here’s our Thrift definition file:

# This is a Thrift definition

enum ObjectType {
    UNKNOWN     = 0,
    EMAIL       = 1,
    RSS_ARTICLE = 2
}

# Dummy struct: carries only the type tag (field 100)
struct SimpleObject {
    100: ObjectType type = UNKNOWN
}

struct Email {
    1:   string     subject,
    2:   string     to_address,
    3:   string     from_address,
    4:   i32        date,
    5:   string     body,
    100: ObjectType type = EMAIL
}

struct RssArticle {
    1:   string     uri,
    2:   string     title,
    3:   string     body,
    4:   i32        date,
    100: ObjectType type = RSS_ARTICLE
}

What we did here is reserve field ID 100 for the struct type and give it a default value from the ObjectType enum, so when a struct is instantiated its type is set automatically. This field is included when a struct is serialized to disk, so when we read the message back we can use our dummy struct, “SimpleObject”, to check its type. SimpleObject ignores all the other fields in the message and loads only field 100 (the enum value). Now we know which structure to allocate.

Here’s a pseudo-code example of this in action:

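    // Pseudo-code: constructing a struct from a serialized string stands in
    // for whatever deserialization call your Thrift bindings provide.
    function load_object($serialized_object) {
        // Reads only field 100 (the type tag); all other fields are ignored
        $type_obj = new SimpleObject( $serialized_object );

        switch ($type_obj->type) {
            case EMAIL:
                return new Email( $serialized_object );
            case RSS_ARTICLE:
                return new RssArticle( $serialized_object );
            default:
                print "Unknown type!";
                return null;
        }
    }

    $object = load_object( get_random_serialized_object() );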
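For something you can actually run, here is a small Python sketch of the same two-pass idea. It does not use the Thrift library or Thrift's real wire format: encode_fields and decode_fields are hypothetical helpers that implement a toy field-tagged encoding (field ID, kind, length, payload), and the structs are trimmed to two fields plus the type tag. What it shows is the part that matters: a first pass that reads only field 100 and skips everything else, followed by a full decode into the right struct.

```python
import struct
from dataclasses import dataclass

TYPE_FIELD_ID = 100
UNKNOWN, EMAIL, RSS_ARTICLE = 0, 1, 2

# Toy header, NOT Thrift's wire format: field id (i16), kind (1 byte), length (i32)
HEADER = struct.Struct(">hci")


def encode_fields(fields):
    """Encode a {field_id: int_or_str} dict as a sequence of tagged fields."""
    out = bytearray()
    for field_id, value in fields.items():
        if isinstance(value, int):
            kind, payload = b"I", struct.pack(">i", value)
        else:
            kind, payload = b"S", value.encode("utf-8")
        out += HEADER.pack(field_id, kind, len(payload)) + payload
    return bytes(out)


def decode_fields(data, wanted=None):
    """Decode tagged fields, skipping any field id not in `wanted`."""
    fields, offset = {}, 0
    while offset < len(data):
        field_id, kind, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        payload = data[offset:offset + length]
        offset += length
        if wanted is not None and field_id not in wanted:
            continue  # unknown field: skip it, like SimpleObject does
        if kind == b"I":
            fields[field_id] = struct.unpack(">i", payload)[0]
        else:
            fields[field_id] = payload.decode("utf-8")
    return fields


@dataclass
class Email:
    subject: str = ""
    body: str = ""
    type: int = EMAIL


@dataclass
class RssArticle:
    uri: str = ""
    title: str = ""
    type: int = RSS_ARTICLE


# Field id -> attribute name, mirroring the ids in the definition file
FIELD_IDS = {
    Email: {1: "subject", 5: "body", TYPE_FIELD_ID: "type"},
    RssArticle: {1: "uri", 2: "title", TYPE_FIELD_ID: "type"},
}
STRUCT_FOR_TYPE = {EMAIL: Email, RSS_ARTICLE: RssArticle}


def serialize(obj):
    names = FIELD_IDS[type(obj)]
    return encode_fields({fid: getattr(obj, name) for fid, name in names.items()})


def deserialize(data):
    # Pass 1: the "SimpleObject" read, which loads only the type tag.
    tag = decode_fields(data, wanted={TYPE_FIELD_ID}).get(TYPE_FIELD_ID, UNKNOWN)
    cls = STRUCT_FOR_TYPE.get(tag)
    if cls is None:
        raise ValueError(f"Unknown type tag: {tag}")
    # Pass 2: decode the full message into the struct the tag points at.
    names = FIELD_IDS[cls]
    values = decode_fields(data, wanted=set(names))
    return cls(**{names[fid]: value for fid, value in values.items()})


if __name__ == "__main__":
    blobs = [
        serialize(Email(subject="hi", body="hello there")),
        serialize(RssArticle(uri="http://example.com/feed/1", title="A post")),
    ]
    for blob in blobs:
        print(deserialize(blob))
```

The trade-off is the same as with the real thing: the type tag costs a few bytes per message, and every struct you want to store this way has to reserve the same field ID for it.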

jake programming, thrift, thrudb