Archive for the ‘thrudb’ Category

100 Most Popular Words Twittered This Week

Thursday, April 24th, 2008

Earlier this week I showed how to create a twitter search with thrudb.

The service has now been running for about a week and has collected over 8 million tweets. I’ve run some stats on the Lucene index, and these are the top 100 words with more than 3 letters :)

http tinyurl.com just quot have from what like about good twitter time your work some today going back more blog don’t think know when home post need love still really here people getting last over been great want night much well would 2008 morning should thanks only right para can’t after make again working watching first down very than nice i’ve tonight them week better trying playing i’ll tomorrow done video could looking take news next long listening where even live cool something come another little watch before lunch thing free doing stuff happy yeah does being having you’re life anyone
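If you want to produce this kind of list yourself, here is a minimal Python sketch of the counting step. It is not how the stats above were generated (those came from the Lucene index). It assumes a plain list of tweet texts and a simple regex tokenizer chosen for illustration, which splits some terms (e.g. tinyurl.com) differently than Lucene’s analyzers would.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of lowercase letters, digits and apostrophes.
# Lucene's analyzers tokenize differently (e.g. hostnames stay whole).
TOKEN_RE = re.compile(r"[a-z0-9']+")


def top_words(tweets, n=100, min_len=4):
    """Return the n most common words that have at least min_len characters."""
    counts = Counter()
    for text in tweets:
        for word in TOKEN_RE.findall(text.lower()):
            if len(word) >= min_len:  # "more than 3 letters"
                counts[word] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "Going back to work, good morning twitter",
        "good night twitter, work was good today",
    ]
    for word, count in top_words(sample, n=3):
        print(word, count)
```

Counting over raw text like this is fine for a sample; over millions of tweets, reading term frequencies straight from the index (as done above) avoids re-tokenizing everything.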
Written by jake

Roll your own real-time twitter search with thrudb

Monday, April 21st, 2008

I’ve been using twitter quite a bit lately and really like the simplicity of the service and its API. One thing that’s missing, though, is search, but there are some great sites like Tweetscan and Summize that let you search public tweets in close to real time.

I decided that indexing twitter is a great application for thrudb, specifically the thrudex service. Thrudex is essentially a Thrift service for CLucene with some special sauce added. If you’d like to read about the inner workings, read this.
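To give a feel for what a full-text service does with each tweet, here is a toy in-memory inverted index in Python. It only illustrates the general technique; it is not thrudex’s API or CLucene’s internals. The class name TinyIndex and its add/search methods are hypothetical, and there is no ranking, deletion or persistence.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9']+")  # assumed tokenizer, for illustration


class TinyIndex:
    """Toy inverted index: maps each term to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        # No update support: re-adding an id with new text leaves old postings.
        self.docs[doc_id] = text
        for term in TOKEN_RE.findall(text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term (AND)."""
        terms = TOKEN_RE.findall(query.lower())
        if not terms:
            return []
        # .get() does not trigger the defaultdict factory, so lookups stay read-only.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


if __name__ == "__main__":
    idx = TinyIndex()
    idx.add(1, "Eating lunch at work")
    idx.add(2, "Work is done, going home")
    print(idx.search("work home"))
```

Intersecting posting sets is the core of an AND query; a real engine like Lucene adds scoring, compressed postings and on-disk segments on top of the same idea.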

Anyway, I whipped up a demo (in Perl) of a real-time twitter search and have indexed a few days of tweets (over 3 million!). Check it out here.

[Screenshot: tweetsearch.gif]

One of our regular contributors, Thai Duong, was kind enough to port it to Python + Django for you new-school folks.
*Note:* this is running on a single dev box, so be forgiving… It’s currently polling the public timeline feed, so it won’t catch every tweet, but it captures ~85%.
We’ve added the code as a tutorial for thrudb here. Take it and build your own service… Any takers on building a Ruby version, or a cross-site social search aggregator à la FriendFeed?
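Polling a public feed means the same tweet can show up in consecutive responses, and anything that scrolls off between polls is lost. Here is a sketch of one way to handle the first problem by de-duplicating on status id. The fetch and index callables, the (id, text) tuple shape and the poll interval are all assumptions standing in for the real feed client and the thrudex indexing call; this is not the tutorial’s code.

```python
import time
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical status shape: (numeric status id, tweet text).
Status = Tuple[int, str]


def poll_once(fetch: Callable[[], Iterable[Status]], seen: Set[int]) -> List[Status]:
    """Fetch one batch and return only the statuses not seen in earlier batches."""
    fresh = []
    for status_id, text in fetch():
        if status_id not in seen:
            seen.add(status_id)
            fresh.append((status_id, text))
    return fresh


def poll(fetch: Callable[[], Iterable[Status]],
         index: Callable[[Status], None],
         rounds: int,
         interval: float = 5.0) -> None:
    """Poll `rounds` times, handing each new status to `index` exactly once.

    Statuses pushed off the feed between two polls are simply missed, which
    is why polling a public timeline cannot catch every tweet. `seen` grows
    without bound here; a long-running poller would need to trim it.
    """
    seen: Set[int] = set()
    for i in range(rounds):
        for status in poll_once(fetch, seen):
            index(status)
        if i < rounds - 1:
            time.sleep(interval)
```

Shortening the interval raises coverage at the cost of more requests and more duplicate-filtering work.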

Written by jake

thrudb.org - it’s official

Thursday, February 7th, 2008

We wanted to officially announce that we’ve moved from http://thrudb.googlecode.com to http://thrudb.org
The code is now located under http://svn.thrudb.org/thrudb
We have made a lot of big changes to the code:

1. We moved all the services into top-level projects, each with its own configure script. This way you can install just the thrudoc service if you don’t want to install the others. There is also a shared library (thrucommon).

2. The configure scripts are a lot more forgiving of missing libraries.

3. Ross has taken over thrudoc, and it now uses a layered backend approach. Supported backends include:
- Disk Backend
- MySQL Backend
- BerkeleyDB Backend
- Memcache Backend
- Spread Backend
- Log Backend
- Bloom Filter Backend
- Stats Backend

4. Thrucene has been renamed to thrudex. The API has been cleaned up, and the code is structured so that we can support multiple storage engines. I’ve tried to squeeze as much performance as possible out of the CLucene backend under a live-index scenario. I’ll be posting performance numbers soon.

This is just a quick summary. Ross and I will be building out the Trac wiki, updating the tutorials, and providing some pretty charts, graphs and performance stats.
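A layered backend boils down to stacking backends that share one interface, each layer adding a behavior and delegating to the one below it. Here is a minimal Python sketch of that pattern under assumed names: Backend, DictBackend, CacheBackend and StatsBackend are hypothetical, thrudoc itself is C++ with a different API, and a dict stands in for the disk/MySQL/BerkeleyDB storage layer.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Backend(ABC):
    """Common interface every layer implements (hypothetical)."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...


class DictBackend(Backend):
    """Bottom layer: a dict standing in for real disk/MySQL/BerkeleyDB storage."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value


class PassthroughBackend(Backend):
    """Base for layers: forwards every call to the next backend down."""

    def __init__(self, inner: Backend) -> None:
        self.inner = inner

    def get(self, key: str) -> Optional[str]:
        return self.inner.get(key)

    def put(self, key: str, value: str) -> None:
        self.inner.put(key, value)


class CacheBackend(PassthroughBackend):
    """Read-through, write-through cache layer (the role memcache plays)."""

    def __init__(self, inner: Backend) -> None:
        super().__init__(inner)
        self.cache: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        value = self.inner.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        self.inner.put(key, value)
        self.cache[key] = value


class StatsBackend(PassthroughBackend):
    """Counts operations, then delegates."""

    def __init__(self, inner: Backend) -> None:
        super().__init__(inner)
        self.counts = {"get": 0, "put": 0}

    def get(self, key: str) -> Optional[str]:
        self.counts["get"] += 1
        return super().get(key)

    def put(self, key: str, value: str) -> None:
        self.counts["put"] += 1
        super().put(key, value)


if __name__ == "__main__":
    store = StatsBackend(CacheBackend(DictBackend()))
    store.put("doc1", "hello")
    print(store.get("doc1"), store.counts)
```

Because every layer speaks the same interface, layers can be reordered or dropped in configuration without touching the code that calls get/put.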

Written by jake

Thrift moving to Apache

Thursday, January 24th, 2008

Facebook is looking to move Thrift to the Apache Incubator. They have submitted a proposal to Apache and are awaiting approval. This is pretty much a shoo-in, since Thrift has essentially no dependencies besides Boost. Plus, the Hadoop team is very keen on integrating Thrift into HBase, and probably into Hadoop as a whole…

Looks like I’m on their list of initial committers, which would be a nice plus, as I will continue contributing features and target languages as thrudb develops.

Written by jake

Video: Thrift Technical Discussion

Monday, January 21st, 2008

Mark Slee and David Reiss from Facebook explain the design and implementation details of the Thrift project.

One interesting point from this talk… Mark mentions a soon-to-be-open-sourced service called “Scribe”, based on their news feed architecture, which sounds like a distributed work queue for processing Thrift message logs… sweet.


This talk was given at Seneca College on Oct 25, 2007 as part of the Free Software & Open Source Symposium. Original video link

Written by jake

Rails ODM for Thrudb

Saturday, January 12th, 2008

Wow.

The Rails community really knows how to contribute. I’m not personally a Rails guy (yet), but I’m very impressed with the open-mindedness of Rails developers (except for Zed).

There are now three (public) efforts to create ORM (or in this case ODM) packages for thrudb.

http://www.pauldix.net/2008/01/thrudb-orm-for.html
http://www.notsostupid.com/blog/2008/01/10/thrudb-for-rails-activedocument/
http://weblog.techno-weenie.net/2008/1/12/activedocument

Matt Knox introduced Thrudb last week at the nyc.rb group, which got things rolling (presentation link).

I’m planning on attending the next meeting to offer my thanks and see how I can help.

For those on the West Coast, Ross will be presenting at the next SuperHappyDevHouse: http://superhappydevhouse.com/

Please use the thrudb Google group so we can help out.

Written by jake

Thrudb Logo

Thursday, January 10th, 2008

Rich just whipped this up for Thrudb. What do you think? I love it!

[Image: Thrudb logo draft]

Update: After some discussion we’ve decided on this…

[Image: thrudb_22.png, the final Thrudb logo]

Written by jake

OSS: Open Source Stress

Monday, January 7th, 2008

One thing I wasn’t expecting when I open-sourced Thrudb was the stress I now feel.

It’s great to receive interest, requests and help. But now I have people waiting and watching for the next fix or feature. This isn’t something I worried about when it was just me on my couch at night. Now I want folks who take the time to try my software to get what they expect… a great piece of software. I have big plans for Thrudb and even bigger plans for what I’m building with it.

Stress is one of those double-edged swords: it’s motivating but also distracting.

Family, Health, Job, Startup and now Thrudb… Let the juggling continue.

Written by jake

Thrudb: Not just me anymore

Sunday, January 6th, 2008

Thrudb has gotten a lot of attention recently (thanks Ilya).

I’m glad people are taking the time to install all of the dependencies and kick the tires. As a result I’ve been fielding a lot of questions about Thrudb’s design and implementation. Please keep them coming!

What I didn’t realize was that ex-Amazon employee Ross McFarland, who architected their digital media content backend, was altering the thrudoc code base to create a similar yet enhanced thrudoc called diststore, which he had been thinking about for some time. When Ross contacted me, he had already added a MySQL backend and altered the API to add document collections.

Since then he’s added a BerkeleyDB backend and ported the thrudoc disk and S3 backends over.

I’m really impressed with the quality of Ross’s code and his clear vision for a document storage service. I’m happy to announce we’ve decided to join forces and will soon be releasing a substantial update to thrudoc incorporating Ross’s changes.

I’m glad to have his support and will soon be focusing my efforts on thrucene and throxy services.

Welcome Ross!

Written by jake

Announcing Thruqueue: Persistent message queue for Thrudb

Friday, December 28th, 2007

I’ve just checked in a new Thrudb service called Thruqueue that I’ve been working on for the past few days. I’m sure you can guess by the name that it’s yet another message queue service. But this one has some great features that I think make it stand out.

No hard limits - Create as many queues as you like, send messages as large as you like, and send as many messages as you like.

Persistent queues - Under the hood, Thruqueue exploits Thrift’s powerful redo-logging capabilities, so queues are really managed logs, one log per queue. At specified intervals the logs are pruned to reclaim disk space. Only a few items from each queue live in memory at any given time, so Thruqueue’s memory profile stays small.
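The persistent-queue idea (one append-only log per queue, read from a saved offset and periodically pruned) can be sketched in a few lines of Python. This does not use Thrift’s logging at all; the newline-delimited JSON record format, the sidecar .offset file and the LogQueue name are assumptions for illustration.

```python
import json
import os
import tempfile


class LogQueue:
    """Toy persistent queue: an append-only log file plus a saved read offset.

    Messages stay on disk, so memory use is small; prune() rewrites the log
    without already-consumed records to reclaim disk space. Not crash-safe:
    a crash between replacing the log and saving the offset is not handled.
    """

    def __init__(self, path):
        self.path = path
        self.offset_path = path + ".offset"
        open(self.path, "ab").close()  # create the log if it does not exist
        self.offset = 0
        if os.path.exists(self.offset_path):
            with open(self.offset_path) as f:
                self.offset = int(f.read() or 0)

    def enqueue(self, message):
        # json.dumps escapes newlines, so each record occupies exactly one line.
        line = json.dumps(message).encode("utf-8") + b"\n"
        with open(self.path, "ab") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())

    def dequeue(self):
        """Return the next message, or None if nothing complete is left."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            line = f.readline()
        if not line.endswith(b"\n"):  # empty, or a partially written record
            return None
        self.offset += len(line)
        self._save_offset()
        return json.loads(line)

    def _save_offset(self):
        with open(self.offset_path, "w") as f:
            f.write(str(self.offset))

    def prune(self):
        """Drop consumed records from the front of the log."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            rest = f.read()
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(rest)
        os.replace(tmp, self.path)
        self.offset = 0
        self._save_offset()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        q = LogQueue(os.path.join(d, "tweets.log"))
        q.enqueue({"id": 1, "text": "hello"})
        print(q.dequeue())
        q.prune()
```

Persisting the offset after each dequeue is what lets a restarted process resume where it left off; pruning is deferred so that most dequeues are just a seek and a read.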

Unique Queues - I’ve also added the ability to create unique queues, which essentially means no duplicate messages can exist in the queue at the same time.
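A unique queue needs only a membership set kept in step with the FIFO. Here is a minimal in-memory Python sketch (UniqueQueue is a hypothetical name, not Thruqueue’s API). As described above, a message is rejected only while an identical one is still queued, so it can be enqueued again once it has been consumed.

```python
from collections import deque
from typing import Optional


class UniqueQueue:
    """FIFO queue that rejects a message while an identical one is still queued."""

    def __init__(self) -> None:
        self._items = deque()
        self._present = set()  # mirrors the contents of _items for O(1) lookups

    def enqueue(self, message: str) -> bool:
        """Append message; return False if it is already in the queue."""
        if message in self._present:
            return False
        self._items.append(message)
        self._present.add(message)
        return True

    def dequeue(self) -> Optional[str]:
        """Pop the oldest message, or return None if the queue is empty."""
        if not self._items:
            return None
        message = self._items.popleft()
        self._present.discard(message)
        return message
```

The set costs extra memory per queued message but keeps the duplicate check constant-time instead of scanning the queue.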

Fast! - I’ve done almost no performance optimization, but my initial tests look very promising: in 1 second I can write and then read ~1200 small messages.

Thrift - Want a client in your favorite language? Just run: thrift -favlanguage Thruqueue.thrift

What’s missing:

Replication - Do you really need this? I could hook this puppy up to Spread, but I’m not sure I see the benefit.

Redundancy - Throxy? TBD

I know there are certainly a lot of message queues out there, but all of them are either non-persistent, require an underlying RDBMS, or cost money. Let me know what you think.

Written by jake