The Opportunity in RSS Overload

Dare Obasanjo has an interesting post, with a good comments thread, on overflowing feed readers. He’s quoting from a post by Farhad Manjoo on Slate:

You know that sinking feeling you get when you open your e-mail and discover hundreds of messages you need to respond to…

Well, actually Dare’s post is from two weeks ago. The reason I got to read it only now is exactly that…

Yes, I know I don’t really need to ‘respond’ to subscriptions, and the answer should be – unsubscribe, or go on a feeds (or ‘follow’ edges) social diet. But these binary decisions are not always optimal, as I have plenty of feeds I subscribed to after hitting one or two posts I really liked, but that were not on that author’s main subject (if such exists at all). Thus I have to skim through many uninteresting (for me!) posts, many of which somehow always end up discussing Twitter. In fact, that’s what most of my feeds look like (including the Twitter part).

We need shades of grey between subscribed and unsubscribed. It would be great to have a feed reader that learns from how you use it. It should be quite clear which posts interest me – ones I took time to read, scroll through, press a link etc. – and which did not. Now train a classifier on that data, preferably per-feed (in addition to a general one), and get some sense of what I’m really looking for.
Mark All As Read Day - flickr/sidereal

Now, I don’t need this smart reader to delete the uninteresting ones; let’s not assume too much about its classification accuracy. Just find the right UI to mark the predicted-to-be-interesting items (or even assign them to a special virtual folder). Then I can read these first, and only if and when I have time, read the rest.
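To make the idea a bit more concrete, here is a minimal sketch of such a reader in Python. Everything in it is an assumption made for illustration: the class names (`InterestModel`, `SmartReader`), the choice of a tiny Naive Bayes classifier over post words, and the simple sum of the general and per-feed scores. A real reader would need better features and a lot more data.

```python
import math
from collections import Counter, defaultdict


class InterestModel:
    """Tiny Naive Bayes over post words (hypothetical sketch, not a real product)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.post_counts = {True: 0, False: 0}

    def train(self, text, engaged):
        # `engaged` is the implicit label: True if I read, scrolled or clicked.
        self.post_counts[engaged] += 1
        self.word_counts[engaged].update(text.lower().split())

    def score(self, text):
        """Log-odds that a post is interesting; positive means 'probably yes'."""
        yes, no = self.word_counts[True], self.word_counts[False]
        vocab = len(set(yes) | set(no)) or 1
        yes_total, no_total = sum(yes.values()), sum(no.values())
        # Laplace-smoothed prior, then one smoothed likelihood ratio per word.
        log_odds = math.log((self.post_counts[True] + 1) / (self.post_counts[False] + 1))
        for word in text.lower().split():
            p_yes = (yes[word] + 1) / (yes_total + vocab)
            p_no = (no[word] + 1) / (no_total + vocab)
            log_odds += math.log(p_yes / p_no)
        return log_odds


class SmartReader:
    """Keeps one general model plus one model per feed."""

    def __init__(self):
        self.general = InterestModel()
        self.per_feed = defaultdict(InterestModel)

    def observe(self, feed, text, engaged):
        self.general.train(text, engaged)
        self.per_feed[feed].train(text, engaged)

    def prioritize(self, posts):
        """Sort (feed, text) pairs, predicted-interesting first. Nothing is deleted."""
        def combined(post):
            feed, text = post
            return self.general.score(text) + self.per_feed[feed].score(text)
        return sorted(posts, key=combined, reverse=True)


reader = SmartReader()
reader.observe("blogA", "search ranking evaluation", True)
reader.observe("blogA", "twitter followers news", False)
reader.observe("blogA", "search relevance metrics", True)
reader.observe("blogA", "twitter launch party", False)
ranked = reader.prioritize([("blogA", "twitter news today"),
                            ("blogA", "search evaluation paper")])
```

Note that `prioritize` only reorders the posts, in line with the point above: the reader should suggest, not delete.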

I assign this to be my pet project in case I win the lottery next week and go into early retirement. Alternatively, if someone saw this implemented anywhere – let me know!

Update: a related follow-up post on a new filtering product I started using.

Google Labs is now Google

Quick, name this search engine!

public-google-labs

No, not Kumo. That’s Google’s recent launch, trying to compete with Twitter search (“Recent results”), to preempt Microsoft (clustering result types), to show a different, though quite ugly, UI metaphor (“wonder wheel”), and generally to roll out a whole bunch of features that should have been Google Labs features before making (or not) their way into a public product. So what’s next? Buttons next to search results moving them up or down, with no opt-out?? Ah, wait, that waste of real estate is already there.

Flash Gordon Gets the Drop on Arch-Enemy Ming the Merciless - Flickr/pupleslog

Someone is panicking. OPEN FIRE! ALL WEAPONS!!! DISPATCH WAR ROCKET AJAX!!! The same spirit that brought us the failure of knols is bringing us yet more unnecessary novelty, but this time it’s a cacophony of features, each deserving a long Google Labs quarantine by itself.

I noticed that many of my recent blog posts have to do with Google criticism :-) . I wrestle with that; there really ought to be more interesting stuff to blog about in the IR world, and there is also great stuff coming from Google (can you imagine the fantastic similar images feature is still in Labs? Can Google please apply this to the ridiculously useless “similar pages” link in main web search results??), but I truly think we see a trend. Google is dropping the ball, losing the clear and spotless logic we have seen in the past, and the sensible slow graduation of disruptive features from Google Labs. Sadly, though, it’s not clear if anyone is there, ready to pick up that ball…

Google converts the converted

I love Google Chrome. It’s super fast, its default home page (showing most visited websites) and searching from the URL box are great, and the JavaScript experiments really knocked me out.

So Google must know this, as Chrome does talk to the mothership quite often. Then why-oh-why, whenever Google embarks on a “Get Chrome” campaign and I happen to use IE (say for one of those sites that renders well only in IE), do they not spare us the converted? Is it really that hard to put a flag on the Google uber-cookie that Chrome is already installed here?…

get-chrome

BTW – all you Firefox users are considered too sophisticated to buy it – this promotion is not shown to FF users, only IE! :-)

Mechanical Hype, revisited

As I wrote previously, I really like the idea behind Aardvark (previously known as Mechanical Zoo) and it’s a great social Q&A tool, but it simply is not “social search” (and unlike TechCrunch, RWW realizes that). The Aardvark team still pushes that terminology, I guess for a good reason given the financial climate, and spreads more of it in a white paper. Once they actually start searching in their aggregated Q&A repository to provide you with an available answer without bothering your network – that would become more of a search solution, rather than Q&A.

Having played with the product a bit, I also see an inherent flaw in the social premise here. Aardvark provides me with answers from friends, or friends-of-friends. Now, it’s more likely I’ll get answers from friends-of-friends, as there are simply a lot more of them. However, these would be people who don’t know me, and will not provide a personal answer that is tailored to my own individual needs.

Still, it’s a great way to make new friends. Not kidding – Aardvark strongly drives conversations, as Danny Sullivan also pointed out, and since this friend-of-friend was the one who responded to my question, I’d feel more comfortable discussing further. Presumably Aardvark will also track this, and practically add this person to my direct social graph.

Update: Max Ventilla of Aardvark commented on my previous post that indexing your graph and finding the right person to answer your query has, in fact, the ingredients of social search. He has a point there, but still that search ends in finding a person, not information, so it’s more of a people search. Still, I agree that in executing this task, the varkers face similar difficulties to those we faced in Delver, albeit on a much smaller scale.

Building Blocks of Creativity

Long ago I read an interesting book that tried to teach how to actually engineer creativity. One of the simple methods it proposed was – take an existing device, and strip it of a main characteristic. A TV set without a screen, for example. Not good for anything, you say? Well, if you’re a real soap opera freak, frustrated that they always run when you’re driving back from work, you could imagine installing this in your car and listening to your TV while driving, rather than watching…

So let’s take an iPhone and strip it of its… phone. What do you get besides an eye? Siftables. Got this shared from Oren:

Seeing this, my mind immediately runs to how my kids could use it. This is surely a creative Human-Computer Interface, but does that automatically make the applications creative? See the one with the kid injecting characters into a scene played on TV. That’s great, but it’s really limited to the scenarios programmed into this app: the sun can rise, the tractor enters, the dog says hello to the cat – ok, got it. Now what?

My kids and I actually have a non-Siftables version of this, where we took some game that includes plastic blocks with various images on them, and turned it into a storytelling game. Each player stacks a bunch of these blocks and tries to tell a continuous story by picking a block and fitting it into a story he’s improvising as he goes along. That’s a real creative challenge, and it is so because you have nobody to rely on but your own imagination.

Another example is the Lego themed sets, where there’s really just one way to assemble them right, and imagination is out of the equation. As an educational tool, standard plain old Lego blocks are far superior. The fewer rules and options, the more creatively challenged we are, and the more a Siftables app follows that principle, the more educational it may actually become.

In any case, Siftables are a great idea, and will surely be a great challenge to the creativity of programmers of Siftables apps…

Clustering Search (yet again)

Microsoft is rolling out an internal test for a search experience upgrade on Live (codenamed Kumo) that clusters search results by aspect. See internal memo and screenshots covered by Kara Swisher.

As usual, the immediate reaction is – regardless of the actual change, how feasible is it to assume you could make users switch from their Google habit? But let’s try to put that aside and look at the actual change.

Search results are grouped into clusters based on the aspects of this particular search query. This idea is far from being new, and was attempted in the past by both Vivisimo (at Clusty.com) and by Ask.com. One difference, though, is that Microsoft pushes the aspects further into the experience, by showing a long page of results with several top results from each aspect (similar to Google’s push with spelling mistakes).

At least judging by the (possibly engineered) sample results, the clustering works better than previous attempts. Most search engines take the “related queries” twist on this, while Kumo includes related queries as a separate widget:

kumo-comparison

Clusty.com’s resulting clusters, on the other hand, are far from useful for a serious searcher with enquiry/purchase intent.

At least based on these screenshots, it seems like Microsoft succeeded in distilling interesting aspects better, while maintaining useful labels (e.g. “reviews”). Of course, it’s possible this is all done as a “toy”, limited example, e.g. using some predefined ontology. But together with other efforts, such as the “Cashback” push and the excellent product search (including reviews aggregation and sentiment analysis), it seems like Microsoft may be in the process of positioning Live as the search engine for ecommerce. Surely a good niche to be in…

live-products1

We’re sorry… but we ran out of CAPTCHAs

Sometimes I want to check the exact number of pages indexed in Google for some query. You know how it goes – you enter a query, it says “Results 1 – 10 of about 2468 gazillions”, then when you page forward enough, the number goes slightly down to, say, 37 results. Trouble is, very quickly Google thinks I’m a bot and blocks me:

were-sorry

Now, it’s quite clear Google has to fight tons of spammers and SEO people who bomb them with automatic queries. But that’s what CAPTCHAs are for, isn’t it? Well, for some reason Google often saves on them, and instead provides you with the excellent service of referral to CNET to get some antivirus software. Dumb.

The amazing part is that you can get this from a single, well-defined, world-peace-disrupting query for allintitle:"design". Booh!

In Authority We Trust (Not)

Product reviews are a great thing.

Fake reviews suck.

In the most recent example, an employee solicited paid reviews for his company’s products on Amazon’s Mechanical Turk – got to appreciate the progress.

How can you tell which reviews to trust? Trust is built out of relationship. You trust a site, a person, a brand, after your interactions have accumulated enough positive history to earn that trust. With review sites, you may learn to trust a specific site, but that still doesn’t mean you trust a specific reviewer. I usually try to look at the reviewer’s history, and to look for the “human” side of them – spelling mistakes, topic changes, finding flaws and not just praising. But naturally, the adversary here is also informed, and will try to imitate these aspects…

"Trust us, we're experts" by flickr/phauly

Review sites attempt to bestow trust of their own on their members, to assist us. Amazon uses badges, and encourages users to provide their real name, using a credit card as the identity proof. Midrag is an Israeli service provider ratings site I recently used, which ties identity to a cellular phone, with a login token sent over SMS. But when you want to attract a large number of reviews, you want to allow unvalidated identities too. Epinions, for example, builds a “web of trust” model based on reviewers trusting or blocking other reviewers. But with Epinions (and similarly Amazon) keeping their trust calculation formula secret, how can users be convinced that this metric fits their needs?

In reality, my model of trust may be quite different from yours. Two Italian researchers published a paper in AAAI-05 titled “Controversial Users demand Local Trust Metrics”, where they experimented with Epinions’ data on the task of predicting users’ trust scores, based on existing trust statements. Their findings show that for some users, trust is not an average quantity, but a very individual one, and therefore requires local methods.

Trust metrics can be classified into global and local ones (Massa & Avesani 2004; Ziegler & Lausen 2004). Local trust metrics take into account the subjective opinions of the active user when predicting the trust she places in unknown users. For this reason, the trust score of a certain user can be different when predicted from the point of view of different users. Instead, global trust metrics compute a trust score that approximates how much the community as a whole trusts a specific user.

Have you spotted a familiar pattern?… Just exchange “trust” with “relevance”, and the paragraph will all of a sudden describe authority-based search (PageRank) versus socially-connected search (Delver). Local metrics were found to be more effective for ranking controversial users, meaning users that are assigned individual trust scores that highly deviate from their average score. The search equivalent can be considered queries that are for subjective information, where opinions may vary and an authority score may not be the best choice for each individual searcher.
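To see the difference on a toy example, here is a rough sketch in Python. It is not the metric used in the paper: the one-hop weighted average in `local_trust`, the function names and the made-up users below are all assumptions chosen just to illustrate how a controversial user gets one global score but very different local ones.

```python
def global_trust(statements, target):
    """One score for everyone: the plain average of all statements about `target`."""
    scores = [s for (_, trustee), s in statements.items() if trustee == target]
    return sum(scores) / len(scores) if scores else None


def local_trust(statements, active_user, target):
    """Trust in `target` from the point of view of `active_user`.

    Uses the direct statement if there is one; otherwise averages what others
    said about `target`, weighted by how much `active_user` trusts each of them
    (a single propagation hop, an illustrative simplification).
    """
    if (active_user, target) in statements:
        return statements[(active_user, target)]
    weighted, total = 0.0, 0.0
    for (truster, trustee), score in statements.items():
        if trustee != target:
            continue
        weight = statements.get((active_user, truster), 0.0)
        weighted += weight * score
        total += weight
    return weighted / total if total > 0 else None


# (truster, trustee) -> trust score in [0, 1]; dave is the controversial one.
statements = {
    ("alice", "bob"): 1.0, ("alice", "carol"): 0.0,
    ("erin", "bob"): 0.0, ("erin", "carol"): 1.0,
    ("bob", "dave"): 1.0, ("carol", "dave"): 0.0,
}
everyone = global_trust(statements, "dave")         # the community average
for_alice = local_trust(statements, "alice", "dave")  # alice listens to bob
for_erin = local_trust(statements, "erin", "dave")    # erin listens to carol
```

The global score sits in the middle and fits neither alice nor erin, which is the situation where a local metric would be expected to do better.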

To read more about trust metrics, see here: trustlet.org

Why Search Innovation is Dead

We like to think of web search as quite a polished tool. I still find myself amazed at the ease with which difficult questions can be answered just by googling them. Is there really much further to go from here?

"Hasn't Google solved search?" by asmythie/Flickr

Brynn Evans has a great post on why social search won’t topple Google anytime soon. In it, she shares some yet to be published results showing that difficulty in forming the query is a major cause of failed searches. That resonated well with some citations I’m collecting right now for my thesis (on concept-based information retrieval). It also reminded me of a post Marissa Mayer of Google wrote some months ago, titled “The Future of Search”. One of the main items on that future of hers was natural language search, or as she put it:

This notion brings up yet another way that “modes” of search will change – voice and natural language search. You should be able to talk to a search engine in your voice. You should also be able to ask questions verbally or by typing them in as natural language expressions. You shouldn’t have to break everything down into keywords.

Mayer gives some examples of questions that were difficult to query or formulate by keywords. But clearly she has the question in her head, so why not just type it in? After all, Google does attempt to answer questions. Mayer (and Brynn too) mentions the lack of context as one reason. Some questions, if phrased naively, refer to the user’s location, activities or other context. It’s a reasonable, though somewhat exaggerated, point. Users aren’t really that naive or lazy: if instead of using search they called up a friend, they wouldn’t ask “can you tell me the name of that bird flying out there?”. The same info they would provide verbally, they can also provide to a natural-language search engine, if properly guided.

The more significant reason, in my eyes, revolves around habits. Search is usually a means, rather than a goal. So we don’t want to think where and how to search, we just want to type something quickly into that good old search box and fire it away. It’s no wonder that the search engine most bent on sending you away ASAP has the most loyal users coming back for more. That same engine even has a button that hardly anyone uses, and that supposedly costs it over $100M a year in revenue, which sends users away even faster. So changing this habit is a tough task for any newcomer.

But these habits go deeper than that. Academic researchers have long studied natural-language search and concept-based search. A combination of effective keyword-based search, together with a more elaborate approach that kicks in when the query is a tough one, could have gained momentum, and some attempts were made at commercial products (most notably Ask, Vivisimo and Powerset). They all failed. Users are so used to the “exact keyword match” paradigm, the total control it provides them with, and its logic (together with its shortcomings) that a switch is nearly impossible, unless Google drives such a change.

Until that happens, we’ll have to continue limiting innovations to small tweaks over the authorities…

Evaluating Search Engine Relevance

Web search engines must be the most useful tools the Web brought us. We can answer difficult questions in seconds, find obscure pieces of information and stop bothering about organizing data. You would expect that systems with such impact on our lives would be measured, evaluated and compared, so that we can make an informed decision on which one to choose. Nope, nothing there.

Some years ago, search engines competed in size. Danny Sullivan wrote angry pieces on that, and eventually they stopped, but still six months ago Cuil launched and made a fool of itself by boasting size again (BTW – Cuil is still alive, but my blog is not indexed, not much to boast about coverage there).

Now, academic research on search (Information Retrieval, or IR in academic jargon) does have a very long and comprehensive tradition of relevance evaluation methodologies, TREC being the best example. IR systems are evaluated, analyzed, and compared across standard benchmarks, and TREC researchers carry out excellent research into the reliability and soundness of these benchmarks. So why isn’t this applied to evaluating web search engines?

One of the major problems is, yes, size. Many of the challenges TREC organizers face come down to scaling the evaluation methods and measurements to web scale. One serious obstacle was the evaluation measure itself. Most IR research uses Mean Average Precision (MAP), which proved to be a very reliable and useful measure, but it requires knowing stuff you just can’t know on the web, such as the total number of relevant documents for the evaluated query. Moreover, with no use case reasoning behind it, there was no indication that it indeed measures true search user satisfaction.
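A quick Python sketch shows where that unknown sneaks in. The function name and the 0/1 judgment lists are my own illustrative choices; the point is the `total_relevant` argument, which is exactly the number you can’t get for a web-scale collection. MAP is then simply the mean of this value over a set of queries.

```python
def average_precision(judgments, total_relevant):
    """Average precision for one query.

    judgments: 0/1 relevance of the returned documents, in rank order.
    total_relevant: number of relevant documents in the WHOLE collection,
                    including those the engine never returned.
    """
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(judgments, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / total_relevant


ranking = [1, 0, 1, 0]
# Same ranking, different guesses about how many relevant documents exist:
ap_if_two_exist = average_precision(ranking, total_relevant=2)
ap_if_four_exist = average_precision(ranking, total_relevant=4)
```

The same result list gets a score twice as high under the first guess as under the second, so without a trustworthy count of relevant documents the measure is not well defined.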

Luckily, the latest volume of TOIS journal (Transactions on Information Systems) included a paper that could change that picture. Justin Zobel and Alistair Moffat, two Australian key figures in IR and IR evaluation, with Zobel a veteran of TREC methodology analysis, suggest a new measure called “Rank-Biased Precision” (RBP). In their words, the model goes as follows:

The user has no desire to examine every answer. Instead, our suggestion is that they progress from one document in the ranked list to the next with persistence (or probability) p, and, conversely, end their examination of the ranking at that point with probability 1 − p… That is, we assume that the user always looks at the first document, looks at the second with probability p, at the third with probability p^2, and at the ith with probability p^(i−1). Figure 3 shows this model as a state machine, where the labels on the edges represent the probability of changing state.

The user model assumed by rank-biased precision

They then go on to show that the RBP measure, derived from this user model, does not depend on any unknowns, behaves well with real life uncertainties (e.g. unjudged documents, queries with no relevant documents at all), and is comparable to previous measures in showing statistically significant differences between systems.
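Following the user model quoted above, the measure comes out as RBP = (1 − p) · Σ r_i · p^(i−1), where r_i is the relevance of the document at rank i. Here is a small Python sketch of it. The function name, the use of `None` for unjudged documents and the default p = 0.8 are my own assumptions for illustration; the persistence p is a parameter you have to pick.

```python
def rank_biased_precision(judgments, p=0.8):
    """Rank-biased precision for one ranked list.

    judgments: relevance per rank, in order: 1, 0, or None if unjudged.
    p: the user's persistence, i.e. the probability of moving on to the next result.
    Returns (score, residual): the score counts only judged documents, and the
    residual is the most that unjudged or unseen documents could still add.
    """
    score = 0.0
    unjudged_weight = 0.0
    for i, relevance in enumerate(judgments):  # i is rank - 1
        weight = (1 - p) * p ** i
        if relevance is None:
            unjudged_weight += weight
        else:
            score += weight * relevance
    # All ranks beyond the end of the list together weigh p ** len(judgments).
    residual = unjudged_weight + p ** len(judgments)
    return score, residual


score, residual = rank_biased_precision([1, 0, 1], p=0.5)
partial_score, partial_residual = rank_biased_precision([1, None], p=0.5)
```

Nothing here needs the total number of relevant documents, and an unjudged result just widens the residual instead of breaking the computation.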

Ultimately, beyond presenting an interesting web search user model, RBP also eliminates one more obstacle to true comparison of search engine relevance. The sad reality, though, is that with Yahoo’s and Live’s current poor state of results relevance, such a comparison may not show us anything new, but an objective, visible measurement could at least provide an incentive for measurable improvements on their part. Of course, then we’ll get to the other major issue, of what constitutes a relevant result…

Update: I gave a talk on RBP in my research group, slides are here.