Since I’ve been working on Thrudb I’ve become a big twitter user. So naturally I’ve created a Thrudb twitter account for people who are interested in tracking progress on the project. I also wanted to have the changeset tweeted whenever I commit code into subversion.
Subversion has a post-commit hook that runs a script whenever you commit something, and I ended up finding twitvn, which does what I wanted but only for Trac-based projects. Thrudb uses googlecode, so it gets more complicated since we can’t install scripts on Google hardware :) but they do support a nifty URL callback that will post the commit info to you.
So off I went and whipped up a script just for googlecode projects that want to tweet their commits. Just put this in as a cgi on your webserver and tell googlecode the location (under Administration->Source tab). Here is an example of the output.
#!/usr/bin/perl
use strict;
use warnings;
use Net::Twitter;
use JSON::Any;
use Digest::HMAC_MD5 qw(hmac_md5_hex);
use LWP::Simple;
#Just update these
use constant SECRET_KEY => "SECRET_KEY_FROM_GOOGLE";
use constant TWITTER_USER => "TWITTER_USER";
use constant TWITTER_PASS => "TWITTER_PASS";
#check for defined digest from google
my $remote_digest = $ENV{HTTP_GOOGLE_CODE_PROJECT_HOSTING_HOOK_HMAC};
die("missing hmac digest") unless defined $remote_digest;
#fetch json message up to 100k
my $length = $ENV{CONTENT_LENGTH} || 0;
die("missing message body") unless $length > 0;
die("message too long") if $length > 100000;
my $json_str = '';
read(STDIN, $json_str, $length);
#calc local digest and verify
my $digest = hmac_md5_hex($json_str, SECRET_KEY);
die("digests don't match") unless $digest eq $remote_digest;
#construct tweet and shorten changeset url
my $obj = JSON::Any->jsonToObj($json_str);
my $comment = $obj->{revisions}[0]->{message};
my $url = get "http://is.gd/api.php?longurl=".
"http://code.google.com/p/".$obj->{project_name}.
"/source/detail?r=".$obj->{revisions}[0]{revision};
die("problem shortening $url") unless defined $url && $url !~ "Error";
my $tweet = "svn: ".$comment." ".$url;
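#shorten the comment so that prefix, comment, "... " and url fit in 140 chars
if(length($tweet) > 140){
    my $room = 140 - length("svn: ... ") - length($url);
    $tweet = "svn: ".substr($comment,0,$room)."... ".$url;
}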
#tweet!
Net::Twitter->new(username=>TWITTER_USER(), password=>TWITTER_PASS() )
->update($tweet);
print "Content-Type: text/plain\n\n";
print "OK\n";
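To try the CGI before pointing googlecode at it, the callback can be faked locally. This is a minimal sketch and not part of the script above: it signs a made-up payload the same way the script verifies it (hex HMAC-MD5 of the raw request body). The payload only carries the fields the script reads, the secret is a placeholder, and `sign_payload` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;
use Digest::HMAC_MD5 qw(hmac_md5_hex);

# Placeholder; must match SECRET_KEY in the CGI script
my $secret = "SECRET_KEY_FROM_GOOGLE";

# Hypothetical helper: returns the hex digest the CGI expects in
# HTTP_GOOGLE_CODE_PROJECT_HOSTING_HOOK_HMAC for a given request body
sub sign_payload {
    my ($body, $key) = @_;
    return hmac_md5_hex($body, $key);
}

# Made-up payload with only the fields the CGI reads
my $body = JSON::PP->new->canonical->encode({
    project_name => "thrudb",
    revisions    => [ { revision => 42, message => "test commit" } ],
});

my $digest = sign_payload($body, $secret);

# Print what the CGI needs: two environment variables and the body for STDIN
print "CONTENT_LENGTH=", length($body), "\n";
print "HTTP_GOOGLE_CODE_PROJECT_HOSTING_HOOK_HMAC=$digest\n";
print "$body\n";
```

With those two variables set in the environment and the body on STDIN, the digest check in the CGI should pass. Keep in mind the script would then go on to call is.gd and twitter for real.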
jake googlecode, subversion, thrudb, twitter
Earlier this week I showed how to create a twitter search with thrudb.
The service has been running now for about a week and it’s collected over 8 million tweets. I’ve run some stats on the lucene db and these are the top 100 words with more than 3 letters :)
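For anyone who wants to run the same kind of count on their own data, here is a rough sketch in plain Perl. It is not how the stats above were pulled from the lucene db; it just counts words longer than a given length over a list of strings. The sample tweets are made up and `top_words` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Count words longer than $min_len letters across a list of texts and
# return up to $n [word, count] pairs, most frequent first
# (ties are broken alphabetically)
sub top_words {
    my ($texts, $n, $min_len) = @_;
    my %count;
    for my $text (@$texts) {
        for my $word (lc($text) =~ /[a-z]+/g) {
            $count{$word}++ if length($word) > $min_len;
        }
    }
    my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    $#sorted = $n - 1 if @sorted > $n;
    return [ map { [ $_, $count{$_} ] } @sorted ];
}

# Made-up sample tweets
my @tweets = (
    "Just committed thrudb changes",
    "thrudb search demo is live",
    "Search twitter with thrudb and lucene",
);

# Top 3 words with more than 3 letters
printf "%-10s %d\n", @$_ for @{ top_words(\@tweets, 3, 3) };
```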
jake lucene, thrudb, twitter
I’ve been using twitter quite a bit lately and really like the simplicity of the service and api. One thing that’s missing though is search, but there are some great sites like tweetscan and summize that let you search public tweets in close to real-time.
I decided indexing twitter is a great application for thrudb, specifically the thrudex service. Thrudex is essentially a Thrift service for CLucene with some special sauce added. If you’d like to read about the inner workings read this.
Anyway, I whipped up a demo (in perl) for a realtime twitter search and have indexed a few days of tweets (over 3 million!). Check it out here.

One of our regular contributors, Thai Duong, was kind enough to port it to python+django for you new school folks.
*Note* this is running on a single dev box, so be forgiving… It’s currently polling the public timeline feed, so it’s not going to catch every tweet, but it captures ~85%.
We’ve added the code as a tutorial for thrudb here. Take it and build your own service… Any takers on building a ruby version or a cross-site social search aggregation à la friendfeed?
jake thrudb, twitter
Wanted to officially announce that we’ve moved from
http://thrudb.googlecode.com to http://thrudb.org
The code is now located under http://svn.thrudb.org/thrudb
We have made a lot of big changes to the code.
1. We moved all the services to top level projects, each with their own
configure script. This way you can install just the thrudoc service if you don’t want to
install the others. There is also a shared library (thrucommon).
2. The configure scripts are a lot more forgiving of missing libraries.
3. Ross has taken over thrudoc and now there is a layered backend approach.
Supported backends include:
- Disk Backend
- MySQL Backend
- BerkeleyDB Backend
- Memcache Backend
- Spread Backend
- Log Backend
- Bloom Filter Backend
- Stats Backend
4. Thrucene has been renamed to thrudex. The api is cleaned up and the code
is structured such that we can support multiple storage engines.
I’ve attempted to squeeze as much performance as possible out of the CLucene
backend under a live index scenario. Will be posting perf numbers soon.
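The layered backend idea from item 3 can be pictured as backends that wrap one another and pass calls inward. The sketch below only illustrates that pattern in Perl; the class names and the get/put interface are made up for the example and are not thrudoc's actual code or api.

```perl
use strict;
use warnings;

# Hypothetical innermost layer: stores values in a plain hash
package MemoryBackend;
sub new { my $class = shift; return bless { data => {} }, $class }
sub put { my ($self, $k, $v) = @_; $self->{data}{$k} = $v; return }
sub get { my ($self, $k) = @_; return $self->{data}{$k} }

# Hypothetical wrapping layer: counts calls, then delegates inward
package StatsBackend;
sub new {
    my ($class, $inner) = @_;
    return bless { inner => $inner, gets => 0, puts => 0 }, $class;
}
sub put { my ($self, $k, $v) = @_; $self->{puts}++; return $self->{inner}->put($k, $v) }
sub get { my ($self, $k) = @_; $self->{gets}++; return $self->{inner}->get($k) }

# Hypothetical wrapping layer: answers repeat reads from a local cache
package CacheBackend;
sub new {
    my ($class, $inner) = @_;
    return bless { inner => $inner, cache => {} }, $class;
}
sub put {
    my ($self, $k, $v) = @_;
    $self->{cache}{$k} = $v;
    return $self->{inner}->put($k, $v);
}
sub get {
    my ($self, $k) = @_;
    $self->{cache}{$k} = $self->{inner}->get($k)
        unless exists $self->{cache}{$k};
    return $self->{cache}{$k};
}

package main;

# Stack the layers: cache -> stats -> storage
my $stats   = StatsBackend->new(MemoryBackend->new);
my $backend = CacheBackend->new($stats);

$backend->put("doc1", "hello");
print $backend->get("doc1"), "\n";      # answered by the cache layer
print "inner gets: $stats->{gets}\n";   # the read never reached the stats layer
```

Because every layer exposes the same two calls, layers can be stacked in whatever order suits a given install.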
This is just a quick summary. Ross and I will be working on building out the
trac wiki, updating the tutorials and providing some pretty charts, graphs
and performance stats.
jake thrudb
Facebook is looking to move thrift to the apache incubator. They have submitted a proposal to apache and are awaiting approval. This is pretty much a shoo-in since thrift really has no dependencies besides boost. Plus the hadoop team is very keen on integrating thrift into hbase and probably hadoop as a whole…
Looks like I’m on their list of initial committers, which would be a nice plus, as I will continue contributing features and target languages as thrudb develops.
jake thrift, thrudb