Building Blocks of Creativity

Long ago I read an interesting book that tried to teach how to actually engineer creativity. One of the simple methods it proposed was to take an existing device and strip it of a main characteristic. A TV set without a screen, for example. Not good for anything, you say? Well, if you're a real soap opera freak, frustrated that your shows always run while you're driving back from work, you could imagine installing this in your car and listening to your TV while driving, rather than watching…

So let's take an iPhone and strip it of its… phone. What do you get, besides an eye? Siftables. Oren shared this with me:

To me, seeing this makes my mind immediately run to how my kids could use it. This is surely a creative Human-Computer Interface, but does that automatically make the applications creative? See the one with the kid injecting characters into a scene played on TV. That's great, but it's really limited to the scenarios programmed into this app: the sun can rise, the tractor enters, the dog says hello to the cat – OK, got it. Now what?

My kids and I actually have a non-Siftables version of this: we took a game that includes plastic blocks with various images on them and turned it into a storytelling game. Each player stacks a bunch of these blocks and tries to tell a continuous story, picking up a block at a time and fitting it into the story he's improvising as he goes along. That's a real creative challenge, precisely because you have nobody to rely on but your own imagination.

Another example is the themed Lego sets, where there's really just one way to assemble them correctly and imagination is out of the equation. As an educational tool, standard plain old Lego blocks are far superior. The fewer rules and options we have, the more creatively challenged we are, and the more a Siftables app follows that principle, the more educational it may actually become.

In any case, Siftables are a great idea, and will surely be a great challenge to the creativity of programmers of Siftables apps…

Clustering Search (yet again)

Microsoft is rolling out an internal test of a search experience upgrade on Live (codenamed Kumo) that clusters search results by aspect. See the internal memo and screenshots covered by Kara Swisher.

As usual, the immediate reaction is: regardless of the actual change, how feasible is it to assume you could make users switch from their Google habit? But let's try to put that aside and look at the actual change.

Search results are grouped into clusters based on the aspects of the particular search query. This idea is far from new, and was attempted in the past by both Vivisimo (at Clusty.com) and Ask.com. One difference, though, is that Microsoft pushes the aspects further into the experience, by showing a long page of results with several top results from each aspect (similar to Google's push with spelling mistakes).
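Just to make the idea concrete, here is a toy sketch of aspect-style clustering (my own illustration, not Kumo's actual pipeline), grouping result snippets by TF-IDF similarity with k-means:

```python
# Toy illustration only (not Kumo's actual pipeline): group result snippets
# into "aspects" by clustering their TF-IDF vectors with k-means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

snippets = [
    "Nikon D90 review: image quality and autofocus tests",
    "Canon EOS 50D review and sample photos",
    "Buy Nikon D90 body only, free shipping",
    "Best price on Canon EOS 50D, compare offers",
    "Nikon D90 manual and firmware downloads",
    "Canon 50D user guide PDF download",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(snippets)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for cluster in range(3):
    print(f"Aspect {cluster}:")
    for snippet, label in zip(snippets, labels):
        if label == cluster:
            print("  " + snippet)
```

The hard parts a real engine faces are exactly the ones this sketch skips: choosing the number of aspects per query and attaching human-readable labels such as "reviews" or "prices".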

At least judging by the (possibly engineered) sample results, the clustering works better than previous attempts. Most search engines take the “related queries” twist on this, while Kumo includes related queries as a separate widget:

[Screenshot: Kumo vs. Clusty results comparison]

Clusty.com's resulting clusters, on the other hand, are far from useful for a serious searcher with enquire/purchase intent.

At least based on these screenshots, it seems like Microsoft succeeded in distilling interesting aspects better than before, while maintaining useful labels (e.g. "reviews"). Of course, it's possible this is all a limited, "toy" example, e.g. driven by some predefined ontology. But together with other efforts, such as the "Cashback" push and the excellent product search (including reviews aggregation and sentiment analysis), it seems like Microsoft may be in the process of positioning Live as the search engine for ecommerce. Surely a good niche to be in…

[Screenshot: Live product search results]

We’re sorry… but we ran out of CAPTCHAs

Sometimes I want to check the exact number of pages indexed in Google for some query. You know how it goes – you enter a query, it says "Results 1 – 10 of about 2468 gazillions", then when you page forward enough, the number drops to, say, 37 results. Trouble is, very quickly Google decides I'm a bot and blocks me:

[Screenshot: Google's "We're sorry…" block page]

Now, it's quite clear Google has to fight tons of spammers and SEO people who bomb them with automatic queries. But that's what CAPTCHAs are for, aren't they? Well, for some reason Google often skimps on them, and instead provides you with the excellent service of a referral to CNET to get some antivirus software. Dumb.

The amazing part is that you can get this from a single, well-defined, world-peace-disrupting query such as allintitle:"design". Booh!

In Authority We Trust (Not)

Product reviews are a great thing.

Fake reviews suck.

In the most recent example, an employee solicited paid reviews for his company's products on Amazon's Mechanical Turk – you've got to appreciate the progress.

How can you tell which reviews to trust? Trust is built out of relationships. You trust a site, a person, a brand, after your interactions have accumulated enough positive history to earn that trust. With review sites, you may learn to trust a specific site, but that still doesn't mean you trust a specific reviewer. I usually try to look at the reviewer's history, and to look for the "human" side of them – spelling mistakes, topic changes, finding flaws rather than just praising. But naturally, the adversary here is also informed, and will try to imitate these traits…

"Trust us, we're experts" by flickr/phaulyReview sites attempt to bestow trust of their own on their members, to assist us. Amazon uses badges, and encourages users to provide their real name, using a credit card as the identity proof. Midrag is an Israeli service provider ratings site I recently used, that attaches identity to a cellular phone, with a login token sent over SMS. But when you want to attract a large number of reviews, you want to allow unvalidated identities too. Epinions, for example, builds a “web of trust” model based on reviewers trusting or blocking other reviewers. But with Epinions (and similarly Amazon) keeping their trust calculation formula secret, how can users be convinced that this metric fits their needs?

In reality, my model of trust may be quite different from yours. Two Italian researchers published a paper at AAAI-05 titled "Controversial Users demand Local Trust Metrics", in which they experimented with Epinions' data on the task of predicting users' trust scores based on existing trust statements. Their findings show that for some users, trust is not an average quantity but a very individual one, and therefore requires local methods.

Trust metrics can be classified into global and local ones (Massa & Avesani 2004; Ziegler & Lausen 2004). Local trust metrics take into account the subjective opinions of the active user when predicting the trust she places in unknown users. For this reason, the trust score of a certain user can be different when predicted from the point of view of different users. Instead, global trust metrics compute a trust score that approximates how much the community as a whole trusts a specific user.

Have you spotted a familiar pattern?… Just exchange "trust" with "relevance", and the paragraph all of a sudden describes authority-based search (PageRank) versus socially-connected search (Delver). Local metrics were found to be more effective for ranking controversial users, meaning users whose individual trust scores deviate strongly from their average score. The search equivalent would be queries for subjective information, where opinions vary and an authority score may not be the best choice for each individual searcher.
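To make the global/local distinction concrete, here is a tiny sketch (my own illustration, not the algorithm from the paper) contrasting the two over a handful of made-up trust statements:

```python
# Toy example: a global trust metric vs. a local one.
# trust[a][b] = how much user a trusts user b, on a 1-5 scale (made-up data).
trust = {
    "alice": {"carol": 5, "dave": 1},
    "bob":   {"carol": 5, "dave": 5},
    "erin":  {"dave": 1},
}

def global_trust(target):
    """Community-wide average: the same score no matter who is asking."""
    scores = [edges[target] for edges in trust.values() if target in edges]
    return sum(scores) / len(scores)

def local_trust(observer, target):
    """Subjective view: prefer the observer's own statement if there is one,
    otherwise fall back to the community average."""
    return trust.get(observer, {}).get(target, global_trust(target))

print(global_trust("dave"))          # 2.33... - "dave" is controversial
print(local_trust("bob", "dave"))    # 5 - bob's local view differs sharply
print(local_trust("alice", "dave"))  # 1
```

For a user like "dave", whose incoming scores vary wildly, the average hides exactly the information each individual observer cares about, which is the paper's point.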

To read more about trust metrics, see here: trustlet.org

Why Search Innovation is Dead

We like to think of web search as quite a polished tool. I still find myself amazed at the ease with which difficult questions can be answered just by googling them. Is there really much further to go from here?

"Hasn't Google solved search?" by asmythie/Flickr

Brynn Evans has a great post on why social search won't topple Google anytime soon. In it, she shares some yet-to-be-published results showing that difficulty in formulating the query is a major cause of failed searches. That resonated well with some citations I'm collecting right now for my thesis (on concept-based information retrieval). It also reminded me of a post Marissa Mayer of Google wrote some months ago, titled "The Future of Search". One of the main items in that future of hers was natural language search, or as she put it:

This notion brings up yet another way that “modes” of search will change – voice and natural language search. You should be able to talk to a search engine in your voice. You should also be able to ask questions verbally or by typing them in as natural language expressions. You shouldn’t have to break everything down into keywords.

Mayer gives some examples of questions that were difficult to formulate as keywords. But clearly she has the question in her head, so why not just type it in? After all, Google does attempt to answer questions. Mayer (and Brynn too) mentions the lack of context as one reason: some questions, if phrased naively, refer to the user's location, activities or other context. It's a reasonable, though somewhat exaggerated, point. Users aren't really that naive or lazy; if, instead of searching, they called up a friend, they wouldn't ask "can you tell me the name of that bird flying out there?". The same information they would provide verbally, they can also provide to a natural-language search engine, if properly guided.

The more significant reason, in my eyes, revolves around habits. Search is usually a means rather than a goal, so we don't want to think about where and how to search; we just want to type something quickly into that good old search box and fire it away. It's no wonder that the search engine most bent on sending you away ASAP has the most loyal users coming back for more. That same engine even has a button that hardly anyone uses, that supposedly costs them over $100M a year in revenue, and that sends users away even faster. So changing this habit is a tough task for any newcomer.

But these habits go deeper than that. Academic researchers have long studied natural-language search and concept-based search. A combination of effective keyword-based search with a more elaborate approach that kicks in when the query is a tough one could have gained momentum, and some attempts were made at commercial products (most notably Ask, Vivisimo and Powerset). They all failed. Users are so used to the "exact keyword match" paradigm, the total control it provides them with, and its logic (together with its shortcomings), that a switch is nearly impossible, unless Google drives such a change.

Until that happens, we’ll have to continue limiting innovations to small tweaks over the authorities…

Evaluating Search Engine Relevance

Web search engines must be the most useful tools the Web has brought us. We can answer difficult questions in seconds, find obscure pieces of information, and stop bothering about organizing data. You would expect that systems with such impact on our lives would be measured, evaluated and compared, so that we can make an informed decision on which one to choose. Nope, nothing there.

Some years ago, search engines competed on size. Danny Sullivan wrote angry pieces on that, and eventually they stopped; but still, six months ago Cuil launched and made a fool of itself by boasting about size again (BTW, Cuil is still alive, but my blog is not indexed – not much to boast about coverage there).

Now, academic research on search (Information Retrieval, or IR in academic jargon) does have a very long and comprehensive tradition of relevance evaluation methodologies, TREC being the best example. IR systems are evaluated, analyzed, and compared across standard benchmarks, and TREC researchers carry out excellent research into the reliability and soundness of these benchmarks. So why isn't this applied to evaluating web search engines?

One of the major problems is, yes, size. Many of the challenges TREC organizers face involve scaling the evaluation methods and measurements to web scale. One serious obstacle was the evaluation measure itself. Most IR research uses Mean Average Precision (MAP), which has proved to be a very reliable and useful measure, but it requires knowing things you just can't know on the web, such as the total number of relevant documents for the evaluated query. Moreover, with no user model behind it, there was no indication that it indeed measures true search user satisfaction.
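As a reminder of why that matters, here is the standard textbook definition (not specific to the paper discussed below):

\[
\mathrm{AP}(q) = \frac{1}{R_q} \sum_{k=1}^{n} P@k \cdot \mathrm{rel}(k),
\qquad
\mathrm{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}(q)
\]

where rel(k) is 1 if the document at rank k is relevant, P@k is the precision of the top k results, and R_q is the total number of documents relevant to query q – exactly the quantity nobody can know at web scale.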

Luckily, the latest volume of TOIS journal (Transactions on Information Systems) included a paper that could change that picture. Justin Zobel and Alistair Moffat, two Australian key figures in IR and IR evaluation, with Zobel a veteran of TREC methodology analysis, suggest a new measure called “Rank-Biased Precision” (RBP). In their words, the model goes as follows:

The user has no desire to examine every answer. Instead, our suggestion is that they progress from one document in the ranked list to the next with persistence (or probability) p, and, conversely, end their examination of the ranking at that point with probability 1 − p… That is, we assume that the user always looks at the first document, looks at the second with probability p, at the third with probability p^2, and at the i-th with probability p^(i−1). Figure 3 shows this model as a state machine, where the labels on the edges represent the probability of changing state.

The user model assumed by rank-biased precision

They then go on to show that the RBP measure, derived from this user model, does not depend on any unknowns, behaves well under real-life uncertainties (e.g. unjudged documents, queries with no relevant documents at all), and is comparable to previous measures in showing statistically significant differences between systems.
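For concreteness, the measure that falls out of this geometric user model (as I read the paper) is:

\[
\mathrm{RBP} = (1 - p) \sum_{i=1}^{d} r_i \, p^{\,i-1}
\]

where r_i (between 0 and 1) is the relevance of the document at rank i, d is the evaluation depth, and p is the persistence parameter. Note that the total number of relevant documents appears nowhere in the formula.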

Ultimately, beyond presenting an interesting web search user model, RBP also eliminates one more obstacle to a true comparison of search engine relevance. The sad reality, though, is that with Yahoo's and Live's current poor state of results relevance, such a comparison may not show us anything new; but an objective, visible measurement could at least provide an incentive for measurable improvements on their part. Of course, then we'll get to the other major issue of what constitutes a relevant result…

Update: I gave a talk on RBP in my research group, slides are here.

Tantek Çelik’s True Identity Revealed!

This morning I came across a nice little people search demo by Martin Atkins. It's mainly a wrapper over the Social Graph API (SGAPI), but it helps illustrate the public social graph. Now, Tantek Çelik is one of the main advocates of Microformats, which in turn generate a lot of the XFN data that feeds SGAPI. So it was quite a surprise to feed in his name and see this:

[Screenshot: people search results for Tantek Çelik]

The horror! The horror!!

What happened here? A quick check on SGAPI led to some strange findings. It turns out that Robert Scoble's old blog at scoble.weblogs.com is listed as strongly connected to Tantek's blog identity. I then went on to check out that blog – no XFN or FOAF link to Tantek there. So where did that come from?

A deeper dive into SGAPI's detailed output showed that Scoble was listed as referencing Tantek with both the XFN attributes "me" and "met". In plain English, this means that Robert Scoble said "I am Tantek Çelik, plus I also met him in person!". So what could cause this, except for some serious case of schizophrenia?
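For readers unfamiliar with XFN: these relations are just rel attributes on ordinary HTML links, which crawlers then harvest. A minimal sketch of such harvesting (my own toy code, nothing to do with the real SGAPI crawler) could look like this:

```python
# Toy XFN harvester: collect (href, rel-values) pairs from <a> tags,
# using only Python's standard library.
from html.parser import HTMLParser

class XFNParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.relations = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if rels and "href" in attrs:
            self.relations.append((attrs["href"], rels))

page = '<a href="http://tantek.com/" rel="met">Tantek</a>'
parser = XFNParser()
parser.feed(page)
print(parser.relations)  # [('http://tantek.com/', ['met'])]
```

A rel="me" link claims "this is another page of mine", while rel="met" only claims an in-person acquaintance, which is why harvesting both for the same pair produces the identity mix-up above.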

My humble guess is that Scoble, some time ago, listed Tantek as a "met" contact on his old blog, but with a magnificent little typo, left out the 't'. He then discovered the mistake and fixed it. But the Googlebot caught both cases and added them both as relations. Now why would they do that? Shouldn't new data replace old data? Well, that's what other users of SGAPI are asking – see the discussion over at the group. It turns out the SGAPI data is not yet as timely as the main index, and Brad Fitzpatrick promises this will improve soon enough.

Considering the upcoming social diet, it had better…

Update: Hadar pointed out another example, where Chris Messina gets identified with TechCrunch UK… It's indeed reflected in SGAPI, and I tracked it down to erroneous XFN tagging in an obscure 2006 TCUK post. This indeed demonstrates a weakness of the unmoderated, inherently decentralized XFN-based graph building. Still, for now it's the only open standard we have, until some higher, post-processing open layer emerges.

Did Facebook just drop Live Search… again?

Exactly 3 months ago, Facebook and Microsoft announced the integration of Live Search into Facebook. The search functionality was up, down, then up again.

Today, it doesn't seem to be available anymore; the web tab is simply gone.

[Screenshot: Facebook search with the web tab missing]

 

There doesn't seem to be any buzz about it so far. Is it just a temporary or local glitch?…

Update: OK, note to self – when they say "Now Facebook users in the U.S. have the option to 'Search Facebook' or 'Search the Web'", they probably mean it. Oh well. It is strange, though, that 3 months after the integration, Live Search is still not rolled out in Facebook's main growth segment, which is outside the US. Surely not a technical difficulty.

New Year’s Resolution: Social Diet

‘Tis the season for predictions (and Schadenfreude over last year’s).

One of the most popular predictions for the social web seems to be a diet.
MY DIET COKE (flickr/wools)

One talks about "Social graph shrinkage", another about "Social Media Indigestion" (both taken from Peter Kim's collection of Social Media Predictions 2009), and ReadWriteWeb adds "Friend List Sanitizers" to the whirlwind of diet buzz.

The reason I see sense in this prediction comes down to one thing: Facebook Connect. So far, we knew what to expect from having too many Facebook friends: there was a certain volume of activity stream, and we managed to live with it. With significant adoption of Facebook Connect (which is the main "if" here), we'll soon start seeing many external activities pushed into the stream – comments, locations, recommendations, purchases – and this wave of added content (and clutter) will then result in the removal of noisy and unwanted sources, just as any email marketing campaign brings with it a major bunch of unsubscribes.

im in ur computerz wit 5,000 faceb00k frenz!!!!!! (flickr/debs)

I doubt we'll see any social graph shrinkage any time soon; so many new profiles are generated every second that this will by far offset any such filtering (done mainly by long-time users). But we'll probably start seeing a major wave of edge removal, which has not been at all common so far.

Facebook Connect is definitely an excellent move by Facebook to continue dominating and de facto owning the social graph, with marketing agencies the first to realize and point out the value beyond single-sign-on convenience. With no open alternative offering the same value, this trend will only accelerate, unless the new OpenID Foundation board members start moving from enabler technologies to an active push for an equivalent value proposition.

IBM IR Seminar Highlights (part 2)

The seminar's third highlight for me (in addition to IBM's social software and Mor's talk) was the keynote speech by Human-Computer Interaction (HCI) veteran Professor Ben Shneiderman of UMD. Ben's presentation was quite an experience, not in the sophisticated Lessig way (which Dick Hardt adopted so well for Identity 2.0), but through the sheer amount of positive energy and passion streaming out of this 60-year-old.

[Warning – this post turned out longer and heavier than I thought…]

Ben Shneiderman in front of Usenet Treemap (flickr/Marc_Smith)

Ben is one of the founding fathers of HCI, and the main part of his talk focused on how visualization tools can serve as human analysis enhancers, just as the web as a tool enhances our access to information.

He presented tools such as ManyEyes (IBM's), SpotFire (his own high-tech exit), TreeMap (with many examples of trend and outlier spotting) and others. The main point was what the human eye can do with these tools that no predefined automated analysis can, especially in fields such as genomics and finance.

Then the discussion moved to how to apply such an approach in search, which, like those tools, is also a power multiplier for humans. Ben described today's search technology as adequate mainly for "known item finding". The more difficult tasks, which today's search engines can't answer well, usually involve work that is not a "one-minute job", such as:

  • Comprehensive search (e.g. Legal or Patent search)
  • Proving negation (Patent search)
  • Finding exceptions (outliers)
  • Finding bridges (connecting two subsets)

The clusters of current and suggested strategies to address such tasks are:

  • Enriching query formulation – non-textual, structured queries, results preview, limiting of result type…
  • Expanding result management – better snippets, clustering, visualization, summarization…
  • Enabling long-term effort – saving/bookmarking, annotation, notebooking/history-keeping, comparing…
  • Enhancing collaboration – sharing, publishing, commenting, blogging, feedback to search provider…

So far, pretty standard HCI ideas, but then Ben took this into the second part of the talk. A lot of the experimentation employed in these efforts by web players has built up an entire methodology that is quite different from established research paradigms. Controlled usability tests in the lab are no longer the tool of choice; instead, A/B testing on user masses, with careful choice of system changes, is. This is how Google/Yahoo/Live modify their ranking algorithms, how Amazon/NetFlix recommend products, and how the Wikipedia collective "decides" on article content.
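As a rough illustration of the statistics sitting behind such large-scale experiments (my own simplification, not any particular company's procedure), here is how one might check whether variant B's click-through rate beats variant A's by more than chance:

```python
# Two-proportion z-test for an A/B experiment, standard library only.
from math import sqrt, erf

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical numbers: 10,000 users in each bucket.
z, p = two_proportion_z(clicks_a=800, n_a=10_000, clicks_b=900, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p-value suggests a real effect
```

The real methodological questions, of course, are which system changes to test and how to avoid drowning users in experiments, which is exactly where the web players' accumulated experience lies.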

This is where the term "Science 2.0" pops up. Ben's thesis is that some of society's great challenges today have more to learn from Computer Science than from traditional Social Science. "On-site" and "interventionist" approaches should take over from controlled laboratory approaches when dealing with large social challenges such as security, emergency, health and others. You (government? NGOs? web communities?) could make careful, real-life changes to how specific social systems work, then measure the impact, and repeat.

This may indeed sound like a lot of fluff, as some think, but the collaboration and decentralization demonstrated on the web can be put to real-life uses. One example from HCIL is the 911.gov project for emergency response, since emergency is a classic case where centralized systems collapse. Decentralizing the report and response circles can leverage the power of the masses beyond the twitter-journalism effect.