Towards an even more open API

Since we first released the Lingr API, by far the most frequent request we've had has been to allow embedding a chatroom in a web page.  Some of our developers have actually built such things, but they all suffered from one drawback: because our API was fully REST compliant, it wasn't possible to do many interesting things purely from Javascript.  The reason is that, by REST convention, any operation that changes state on the server should be an HTTP POST.  That makes perfect sense; however, it rules out a purely Javascript/JSON implementation.  Javascript can certainly send POST requests, but a script running on a page outside lingr.com cannot, in practice, send one to our servers asynchronously and read the result, because the browser's same-origin policy gets in the way.  The solution most developers have taken is to proxy the POST calls through their own servers.  This works fine, but it seems like an unnecessary complication.

We were faced with a conundrum.  On the one hand, we want to be REST compliant, because we're engineers and every engineer loves a nice, cut-and-dried specification.  On the other hand, we want to allow pure Javascript clients to access our full API, because, as we've learned, the most interesting Lingr API clients don't originate from the lingr.com domain :-)

So today, I'm happy to announce that we've opened up all of our API methods to HTTP GET requests.  Even though the documentation for a particular method may still state that POST is required, we've actually disabled HTTP method checking for API calls on the backend, so GET will work too.
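To make the change concrete, here is a minimal Python sketch showing how the same API call can be expressed either as a form-encoded POST or as a GET with the parameters moved into the query string. The base URL, the method path (`room/say`) and the parameter names (`ticket`, `message`) are hypothetical placeholders, not the documented Lingr API; check the Lingr Developer Wiki for the real method names.

```python
from urllib.parse import urlencode

# Hypothetical base URL, for illustration only.
API_BASE = "http://www.lingr.com/api"


def as_post(method, params):
    """Return (url, body) for a form-encoded POST to `method`."""
    return f"{API_BASE}/{method}", urlencode(params)


def as_get(method, params):
    """Return the equivalent GET URL: same parameters, moved into the query string."""
    return f"{API_BASE}/{method}?{urlencode(params)}"


# Hypothetical method and parameter names.
params = {"ticket": "abc123", "message": "hello world"}
print(as_post("room/say", params))
print(as_get("room/say", params))
```

With method checking disabled, the server should treat both forms the same way; the GET form is the one a pure Javascript client can use without a proxy.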

In addition, we've published lingr.js, a full read-only Lingr chatroom client in Javascript.  While it is missing some obvious functionality (the ability to say something, for example), it is a great starting point for embedding a chatroom in your web pages.  You can find documentation on lingr.js over at the Lingr Developer Wiki.  We'll continue to add new functionality to lingr.js; this is just the beginning.

If you use lingr.js, please let us know, and add your creation to the Showcase.  We're eager to see how people take this ball and run with it.

- Danny

Update: we have added "say" support to lingr.js :-)

A new plugin

As promised over at Ruby Forum, I am proud to say that we have released our multilingual Ferret analyzer as open source.  It's available now in our public Subversion repository, packaged as a Rails plugin.  Enjoy, and do let us know if you find it useful!

- Danny

We heart ferrets

We recently added full archive search to Lingr, and I thought I'd take a moment to talk about the technical details of that, for those who are interested.

At Lingr, everything said in the chatrooms is saved to our database.  This lets you browse a room's archives to recall a recent conversation, or to find out what someone else said about something.  So we've got around three million user utterances sitting in our database, and we thought: why not unlock those and let people search them?

Our first thought was to just use MySQL's fulltext indexing.  But, as it turns out, fulltext indexes only work on MyISAM tables, and our utterance table is InnoDB, so that was out the window.

So we started looking for a text indexing system that could work for us.  What we found was Ferret, a Ruby port of Apache Lucene.  Combined with the excellent acts-as-ferret (AAF) plugin for ActiveRecord, we were able to integrate Ferret/AAF into Lingr in about two weeks.

The issue that complicated matters most is that Lingr hosts conversations in many different languages (English, Japanese, Farsi, etc.).  This makes tokenizing the utterances before they are indexed a real challenge.  Ferret provides a very nice tokenizer for most languages based on the Latin alphabet, but languages such as Japanese, which do not delimit words with whitespace, are problematic.  Also consider that it is quite common for a single utterance to mix languages (I guess you can thank the global ubiquity of English for that).  And to put icing on the cake, for a given utterance, we have no idea which language (or languages) it is in; all we have are Unicode codepoints.

Of the two weeks required for integration, much of the time was spent writing and tuning our own tokenizer.  Our tokenizer basically spots transitions between Latin and non-Latin text (based on codepoint value), then applies Ferret's existing Latin tokenizer to the Latin parts and a simple per-character tokenizer to the non-Latin parts.  Because Ferret uses the same tokenizer when searching for utterances as it does when indexing them, your search terms can contain a mix of Latin and non-Latin "words", and we should handle that just fine.

For the metrics-obsessed among you, our Ferret index currently consists of approximately 3 million "documents" (user utterances), and its on-disk size is currently 909 megabytes.  The index is updated once per minute by a cron job, so a new utterance may take up to a minute to become searchable.
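One simple way to structure a once-a-minute indexing job is to keep a high-water mark: each run indexes every utterance newer than the last one it saw, then remembers the newest id. The Python sketch below assumes utterances carry increasing integer ids and takes the index as a plain callback; the names are hypothetical, and this is not necessarily how our cron job (or acts-as-ferret) does it.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    id: int
    text: str


def index_new_utterances(utterances, last_indexed_id, add_to_index):
    """One cron run: index utterances with id > last_indexed_id, oldest first,
    and return the new high-water mark."""
    newest = last_indexed_id
    for u in sorted(utterances, key=lambda u: u.id):
        if u.id > last_indexed_id:
            add_to_index(u)
            newest = u.id
    return newest


indexed = []
batch = [Utterance(3, "see you"), Utterance(1, "hi"), Utterance(2, "hello")]
mark = index_new_utterances(batch, 1, indexed.append)
print(mark, [u.text for u in indexed])  # 3 ['hello', 'see you']
```

Anything spoken just after a run waits for the next one, which is where the up-to-a-minute delay comes from. One caveat with an auto-increment high-water mark: a row whose transaction commits after a higher id has already been seen can be skipped, so a production job might re-scan a small overlap window and de-duplicate.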

Finally, I'd like to thank Jens Kraemer and everyone over at the Ferret Forum for their help and advice.  If you are thinking about using Ferret, you can get some great information there.  It helped us tremendously.

If you have any other questions about the implementation, feel free to ask them here in comments, or through our Feedback form.

Cheers,

Danny

DOS? Or just a mistake?

This morning, our web servers started receiving hundreds of requests per second from two separate IP addresses: 71.39.13.57 and 65.102.12.225.  The cumulative effect of these requests was degraded service for other users, so I have blocked those IPs at our firewall.

If these requests came from you, and this was unintentional, please let us know, and we'll remove the blocks once you fix your software.  Otherwise, the IP blocks will remain in place.

- Danny

Going to ETech

The Lingr team (well, all of us except for Satoshi) will be attending ETech next week.  If you'll be there too, please let us know and let's get together!

I have also organized an informal Birds of a Feather session about Comet.  I'd love to meet any of you who are interested in the topic and learn how to make Lingr even better.

Lastly, we've opened up an ETech room for discussion.  I'll be live-blogging in that room during the conference, so you might want to check that out too.

- Danny

Brain Upgrade!

We will be upgrading our database server tonight.  This will result in some downtime, but if everything goes as planned, the total downtime should be only 15 minutes or so.  During the downtime, you'll see a beautiful maintenance page (thanks Chris!).

- Danny

The Lingr API is born

Today we are proud to announce a new release of Lingr that includes the new Lingr API.  The Lingr API is a simple, HTTP-based REST protocol that enables anyone to interact with, extend, and mash up Lingr in whatever wacky way they want.

You could use the Lingr API to write an application to monitor activity in chatrooms and notify you when your friends are chatting (whoops, we already did that), to create a chat-bot that automatically responds to other chatters, or anything else you can think of.  Really, the possibilities are endless!

In conjunction with the API's release, we have also established the Lingr Developer Wiki.  There you'll find full documentation on the API, along with tutorials, sample code, and more.  Since it's a wiki, we hope that Lingr API developers will add their own content and create a community where people can find interesting new Lingr API applications, as well as get help in writing their own.

Finally, we're also publishing a Ruby Lingr API toolkit in our public Subversion repository, along with some sample Javascript code demonstrating how to use the Lingr API from within a web page.  Using the Ruby Lingr API toolkit, a Ruby programmer can be up and talking to our API in just a few minutes' time.

We hope you enjoy the Lingr API and we're looking forward to many unexpected and wonderful applications to grow up around it!

- Danny

Scheduled Downtime

We will be deploying a major release tonight, with some really exciting new features (Lingr API, anyone?).  While our deployments normally take only a few minutes, this one might take a bit longer due to its complexity. 

For that reason, please expect Lingr to be down from 8pm to 9pm PST tonight (04:00 to 05:00 UTC).  We hope the deployment will not take nearly that long, but we like to be overly cautious with our estimates.

UPDATE

We have delayed the release until 10pm-11pm PST tonight (06:00 to 07:00 UTC).

Our first downtime, sort of

Yesterday, certain pages on Lingr were returning "Page Not Found" errors for a period of about four hours.

Chatting was working fine during this period, so if you entered a room directly from your bookmarks or from a link on another site, you wouldn't even have noticed the problem.  However, you wouldn't have been able to visit some pages, such as the homepage or the hot rooms page.

We apologize for this unscheduled downtime!  We have found and corrected the problem, so downtime due to this specific issue shouldn't occur again.

We're proud of our uptime record so far, and continuously strive to improve the reliability of our site!

- Danny

Finally- giving back

As I mentioned previously, we are greatly indebted to many people who have contributed to such great tools as Ruby on Rails, Jetty, and others that power Lingr.

Today, it's our turn to start giving back.  We are happy to release our first Rails plugin, versioned_urls.

This plugin makes it easy to improve the cache efficiency of a Rails website, completely eliminating repeat HTTP requests for things like Javascript files and stylesheets until they actually change.  This can make a very noticeable improvement in the way a site feels to the user.
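The general technique is to put a version token into each asset's URL, so the URL changes exactly when the file changes; the file can then be served with a far-future cache lifetime, and browsers never need to re-request it until it is edited. Here is a minimal Python sketch of that idea, assuming a content hash as the version token and a one-year `Cache-Control` lifetime. It is not the versioned_urls plugin's code, and the plugin's own scheme may differ.

```python
import hashlib

# Assumption: headers to send with versioned assets; one year is a common
# "far future" value.
FAR_FUTURE_HEADERS = {"Cache-Control": "public, max-age=31536000"}


def versioned_url(path, content):
    """Append a short digest of the file's bytes to its URL.

    The URL changes only when the content changes, so browsers can cache it
    for a long time and will fetch a new copy only after an edit."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{path}?v={digest}"


print(versioned_url("/javascripts/application.js", b"alert('hi');"))
print(versioned_url("/javascripts/application.js", b"alert('bye');"))
```

Putting the token in the query string keeps the server side trivial, since the file on disk keeps its name; some caching proxies have been configured not to cache URLs containing a query string, which is why other schemes embed the token in the file name instead.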

For complete details, see the post on my personal blog.

While this represents Lingr's first contribution back to the Rails community, we don't plan for it to be the last :-)

Cheers-

Danny