
Web(MD) 2.0

Just when I thought that the uses for recommendation systems were already exhausted…

CureTogether is a site that lets you enter your medical conditions (strictly anonymously; only aggregated data is made public) and get recommendations for… other “co-morbid” conditions you may have. In other words: “people who have your disease usually also have that one, so perhaps you have it too?”
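The “people who have X usually also have Y” idea boils down to co-occurrence counting over anonymized profiles. Below is a minimal sketch under stated assumptions: the profiles are made up, and the scoring (the share of people with one of your conditions who also report another) is one simple choice, not CureTogether's actual algorithm, which the site doesn't describe.

```python
from collections import Counter
from itertools import combinations

# Hypothetical anonymized profiles: each set is one user's self-reported conditions.
profiles = [
    {"migraine", "allergies", "insomnia"},
    {"migraine", "allergies"},
    {"migraine", "back pain"},
    {"allergies", "asthma"},
    {"migraine", "allergies", "asthma"},
]

def co_occurrence_counts(profiles):
    """Count single conditions and ordered pairs that appear in the same profile."""
    single_counts = Counter()
    pair_counts = Counter()
    for conditions in profiles:
        single_counts.update(conditions)
        for a, b in combinations(sorted(conditions), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return single_counts, pair_counts

def recommend(user_conditions, profiles, top_n=3):
    """Score unreported conditions by the share of people with each of the user's
    conditions who also report them, summed over the user's conditions."""
    single_counts, pair_counts = co_occurrence_counts(profiles)
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a in user_conditions and b not in user_conditions:
            scores[b] += n / single_counts[a]
    return [condition for condition, _ in scores.most_common(top_n)]

# recommend({"migraine"}, profiles) ranks "allergies" first: 3 of the 4 migraine profiles report it.
```

The same counting works unchanged for treatments and causes: just add them to the profile sets.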

Beyond the obvious jokes, this truly has potential. You get “recommended” not only conditions, but also treatments and causes. We all know that sometimes we have our own personal treatment that seems to work only for us. What if it also works for people with a similar profile, and sharing that profile anonymously could help them too? So far this direction is not explicit enough in how the site works, possibly for lack of sufficient data, but you can infer it as you go through the questionnaires.

The data-mining value of a resource such as CureTogether’s database is naturally enormous, and CureTogether’s founders share some of their findings on their blog. The power of applying computer science analytics and experimentation methodologies – sharpened by web-derived needs – to the social sciences and other fields reminded me of Ben Shneiderman’s talk on “Science 2.0”. The idea that computer science can contribute methodologies that stretch beyond the confines of computing machines is a mind-boggling one, at least for me.

But would you trust collaborative filtering with your health? It’s no wonder that the most popular conditions on the site are far from life-threatening, and tend to be ones with unclear causes and treatments, such as migraines, back pain and allergies. Still, the benefit for these alone will probably be enough for most users to justify signing up.

Facebook account is down. Is the Internet down?

My Facebook account was “temporarily unavailable due to site maintenance” today.

Seems like I’m far from the first person this has happened to; it’s common enough to make it into Facebook’s FAQ.

So – no big deal, right? I just had to wait a while before uploading photos from today’s trip with the kids; a little annoyance, nothing more. Then I wanted to check what’s up on another site in the meantime. Guess what I used as a login there? Yep, Facebook Connect. No login for you!

Facebook may be getting away with it for now, as these “maintenance” downtimes don’t seem to have created negative buzz around Facebook Connect’s position as the identity of choice for many avid FB users. But as more of these incidents occur, watch awareness grow of the implications of relying on Facebook as an identity provider. We’ll then realize it’s another point of failure on the way to our favorite sites, and one with no simple workaround.

Truth be told, this is not a Facebook issue; it’s an issue with centralized identity providers in general. If WordPress.com were down, my OpenID identity would be down just the same. With unified identity comes a unified point of failure…

Yahoo Gives Up on Social Search

In an interview that strangely made headlines only in Indian tech blogs, Yahoo Research Labs chief Prabhakar Raghavan declared that Yahoo will not replace its search with Bing. OK, the Yahoo-Microsoft deal is not really off, but the deal details turn out to imply that Yahoo will use Microsoft’s search technology only as the back-end, and keep building its own smart front-end on top of it that makes use of Yahoo’s content assets. Raghavan says:

“Yahoo will not use Bing. Bing is a branded search engine that Microsoft is building on top of its search back-end and we will build our own search front-end on that same Microsoft back-end. It (using Bing) is not the case, at least as envisioned at the moment”

This actually makes perfect sense. Stop spending tons of resources on crawling and ranking in a futile war with Google, and focus on building the user experience on top of it, leveraging Yahoo’s advantage: content. Raghavan mentions scenarios that sound a lot like Yahoo shortcuts (that’s really old news) as one example of how to deliver a more complete experience over commodity search results.

The article then goes on to discuss Yahoo’s second focus, social applications, and mentions Microsoft’s tie-up with Facebook for access to its social graph. Raghavan is quoted as saying:

“Social networks are not just a place to hang out, but to get things done. It predates the web.. I’m not sure where the sweet spot is, we’re still doing research on it”

This also makes perfect sense. With Google as a common enemy, and Microsoft a Facebook partner, Yahoo may be better positioned to deliver social applications that leverage the de facto standard of the Facebook graph, rather than push its own failed networks.

So why is my post title suggesting what it’s suggesting??


There is one catch in sub-contracting your search results: you are now limited in what you can do with search ranking. The best you can do is rerank the set of results Microsoft’s technology supplied before presenting it to the user. As I’ve pointed out in the past when talking about Delver’s technology, social (graph-based) search is a game that cannot be played by reranking, since it’s a classic long-tail problem: the socially relevant results are rarely among the generic top results to begin with. So when you can’t interfere with how search results are ranked, you also can’t deliver true social search, as Google recently did. One less social application Yahoo can build…
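To make the long-tail argument concrete, here is a toy sketch under stated assumptions: the outsourced back-end returns only its own top-k document IDs, and the front-end may only reorder them. The functions `backend_search` and `social_rerank`, the index and the document IDs are all hypothetical.

```python
def backend_search(query, k=3):
    """Stand-in for the outsourced back-end: returns only its own top-k document IDs."""
    index = {
        "jaguar": ["wiki/jaguar", "cars/jaguar-xf", "zoo/jaguar", "blog/friend-jaguar-trip"],
    }
    return index.get(query, [])[:k]

def social_rerank(results, socially_connected):
    """Move results from the searcher's connections to the front.
    sorted() is stable, so the back-end's order is kept within each group."""
    return sorted(results, key=lambda doc: doc not in socially_connected)

friends_docs = {"blog/friend-jaguar-trip"}
reranked = social_rerank(backend_search("jaguar"), friends_docs)
# The friend's post sits at rank 4 in the back-end's list, so reranking the top 3
# can never surface it; only a larger k (i.e. control over retrieval) would help.
```

However clever the reranker, it can only permute what it was handed; the socially relevant page has to be retrieved first.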

Searching for Faceted Search

Just finished reading Daniel Tunkelang’s recently published book on Faceted Search. I read Daniel’s blog (“The Noisy Channel”) regularly, and enjoy his good mix of IR practice with an emphasis on Human-Computer Interaction (HCI). With faceted search tasks on the roadmap at work, I wanted to better educate myself on the topic, and this one looked like a good read, with the cover promising:

“… a self-contained treatment of the topic, with an extensive bibliography for those who would like to pursue particular aspects in more depth”

At 70 pages, the book reads quickly and smoothly. Daniel provides a fascinating intro to faceted search, from early taxonomies, to facets, to faceted navigation and on to faceted search. He adds an introductory chapter on IR that is worth reading even for IR professionals, thanks to some interesting insights. One is how ranked retrieval, which we have all grown so accustomed to, blurred the once-clear border between relevant and non-relevant that set retrieval enforced. Daniel suggests that this issue is significant for faceted search, being a set-retrieval-oriented task, and a pingback on his blog led me to a fascinating elaboration on this pain in another fine search blog (recommended read!).
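The blurred border is easy to see in a toy example. In this sketch (made-up documents; raw term-frequency scoring chosen for brevity, not any real ranking function), set retrieval draws a hard line around documents matching every query term, while ranked retrieval also returns partial matches further down the list, with no explicit cut-off between relevant and non-relevant.

```python
from collections import Counter

docs = {
    "d1": "faceted search interfaces",
    "d2": "faceted navigation and search",
    "d3": "search engines rank documents",
    "d4": "library taxonomies",
}

def set_retrieval(query, docs):
    """Boolean AND: a document is either in the result set or it is not."""
    terms = query.split()
    return {doc_id for doc_id, text in docs.items() if all(t in text.split() for t in terms)}

def ranked_retrieval(query, docs):
    """Score by raw query-term frequency; any partial match gets a nonzero score."""
    terms = query.split()
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(text.split())
        score = sum(tf[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties keep the documents' original order.
    return sorted(scores, key=scores.get, reverse=True)

# For "faceted search": the set is {d1, d2}, but the ranked list also includes d3.
```

A faceted interface counts facet values over a result *set*, so it has to decide where the ranked list ends, which is exactly the tension described above.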

With such elaborate introductory chapters and more on faceted search history, though, not much room is left for the actual chapters on research and practice, and as a reader I felt there could be a lot more there. But then, it is reasonable to just give a taste of the challenges and leave the rest for curious readers to explore via the bibliography.

However, that promise of an extensive bibliography somewhat disappointed me… With 119 references, and only about a quarter of them academic publications from the past 5 years, I felt a bit back at square one. I was hoping for more of a literature survey, with pointers for the tough issues, such as how to choose the most informative facets for a given query or how to extract facets from unstructured fields. Daniel provides some useful tips on those, but reading more on these topics will require doing my own literature scan.
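For the first of those problems, one simple heuristic (my own illustration here, not necessarily what the book recommends) is to prefer facets whose values split the current result set most evenly, measured by Shannon entropy. The field names and records below are hypothetical.

```python
import math
from collections import Counter

# Hypothetical result set for some query, as structured records.
results = [
    {"brand": "Acme", "color": "red", "size": "M"},
    {"brand": "Acme", "color": "blue", "size": "M"},
    {"brand": "Acme", "color": "green", "size": "M"},
    {"brand": "Acme", "color": "red", "size": "S"},
]

def facet_entropy(results, facet):
    """Shannon entropy (bits) of the facet's value distribution within the result set."""
    counts = Counter(r[facet] for r in results if facet in r)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def rank_facets(results, facets):
    """Highest entropy first: such facets split the results most evenly."""
    return sorted(facets, key=lambda f: facet_entropy(results, f), reverse=True)

# "brand" has one value for every result (entropy 0), so it tells the user nothing here.
```

Pure entropy over-rewards near-unique fields such as product IDs, so a practical version would cap the number of distinct values or penalize them.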

In any case, for a newcomer with little background in search in general and faceted search in particular, this book is an excellent introduction. Those more versed in classic IR who are moving into faceted search will find the book an interesting read, but probably not sufficient as a full reference.

The (Filtered) Web is Your Feed

A few months ago I was complaining here about my RSS overload. A commenter suggested that I take a look at my6sense, a browser extension (now also an iPhone app) that acts as a smart RSS reader, emphasizing the entries you should be reading. I wanted to give my6sense a go then, but the technical experience was lousy, and moreover, I was expected to migrate my RSS reading to it. It was too much of a hassle, so I gave up.

In the past few weeks I’ve been test-driving a new player, Genieo, which takes the basic my6sense idea a few steps further. Genieo installs an actual application, not just an extension, that plugs into your browser. It tracks your RSS feeds automatically, simply by looking for feeds in the pages you browse, and learns your feeds without any setup work.
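Finding feeds in browsed pages can lean on the RSS autodiscovery convention, where a page advertises its feed with a `<link rel="alternate" type="application/rss+xml" href="...">` element. Here is a minimal sketch using Python's standard `html.parser`; it is an assumption about how such a tool could work, not Genieo's implementation, and `discover_feeds` is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used for feed autodiscovery links.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)  # valueless attributes map to None
        rels = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and attrs.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attrs["href"]))

def discover_feeds(page_url, html):
    """Return the feed URLs a page advertises, in document order."""
    parser = FeedLinkParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.feeds
```

Run over every page in the browsing history, this yields the pool of candidate feeds, subscribed or not.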

Genieo then goes further, discovering feeds on pages you visit even if you’re not subscribed to them, turning your entire browsing history into one big RSS feed. Finally, it filters this massive pool of content using a semantic profile of your interests, built by analyzing the text you’ve read so far.

For IR people this may sound a lot like Watson, Jay Budzik’s academic project turned contextual search engine turned advertising technology acquisition. Watson approached this as a search problem: how do I formulate search queries that run in the background and fetch the documents most relevant to the user’s current context? The problem is, users are not constantly searching, and would quickly get annoyed by general search results they didn’t ask for.

The good thing about an RSS feed is that it explicitly says “this is a list of content items to be consumed from this source”, and its temporal nature provides a natural preference ranking (prefer recent items). So a heuristic of “users would be interested in recent and relevant items from feeds on pages they visited” works around the general search difficulty pretty well. Genieo circumvents the expected privacy outcry by running all of its logic on the client side: none of the analyzed data leaves your PC (privacy warriors would probably run sniffers to validate that).
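The “recent and relevant” heuristic can be sketched as relevance to a reading profile multiplied by a recency decay. Everything here is an assumption for illustration: a bag-of-words profile with cosine similarity stands in for Genieo's undisclosed “semantic profile”, and the 24-hour half-life is an arbitrary choice. Note that it needs only local data.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words term counts; a real profile would need stemming, stop words, etc."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_item(item_text, age_hours, profile, half_life_hours=24.0):
    """Relevance to the reading profile, halved for every half_life_hours of age."""
    recency = 0.5 ** (age_hours / half_life_hours)
    return cosine(term_vector(item_text), profile) * recency

# Hypothetical profile built from text the user has already read, kept on the client.
profile = term_vector("social search ranking social graph search engines")

# Hypothetical feed items: (title, age in hours).
items = [
    ("Social search and the graph", 2.0),
    ("Social search and the graph", 72.0),
    ("Celebrity gossip roundup", 1.0),
]
ranked = sorted(items, key=lambda it: score_item(it[0], it[1], profile), reverse=True)
```

The fresh, on-topic item ranks first; the same item three days old drops to an eighth of its score; the brand-new but off-topic item scores zero.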

In my personal experience, the quality of most results is excellent: they are almost always posts that interest me. Genieo quickly picked up my feed subscriptions from clicks I made in my reader through to the full article in a browser window (from which it extracted the RSS feed), and after a while I could see it gradually picking up on my favorite memes (search, social and others). I haven’t given up my RSS reader for Genieo yet, and I still have many little annoyances with it, but overall, for an initial version, it works surprisingly well.

However, the audience even better suited to Genieo is not RSS-savvy users like me, but the masses out there who don’t know and don’t care about reading feeds. They just want interesting news, and they don’t mind missing out on the full list (à la Dave Winer’s “River of News” concept). Such users will find tools like Genieo as useful as a personal news valet can be.

What is Facebook’s Endgame with Open Graph API?

On Thursday, Facebook outlined some of its platform roadmap plans for developers. One of the items on the long list was called the “Open Graph API”, and with such a name it was sure to raise some interest.

Details were scarce, but the general message coming out of Facebook is that the Open Graph API will allow any site to embed a Facebook page in it, allowing the site owner to set status messages, share links etc., without visiting Facebook itself, and more importantly without sending its visitors to Facebook.

That sounded like a feature aimed primarily at brands, or as Ethan Beard of Facebook presented it: “This will be good for brands like Coke.” Makes perfect sense, as these brands are already using Facebook as part of their social media efforts, but would prefer to have it done on their site rather than on Facebook itself.

Thinking deeper about where Facebook is heading, though, I suspect there is a bigger endgame to all this. We already know that Facebook wants us to consider it our online identity, so it lets you reuse that Facebook identity on other websites and sign in using Facebook Connect. That’s one side of the coin. The other side is that you have your own website or blog where you may publish thoughts, links and photos that you didn’t publish on Facebook. Facebook would clearly want to bridge that gap as well.


Half a year ago, Facebook adopted the emerging Activity Streams standard, promoted by open standards evangelist Chris Messina, for publishing an individual’s lifestream events and consuming them in lifestreaming frameworks. That piece now fits nicely into the puzzle: wouldn’t it be nicer if you could publish all this non-Facebook activity into your Facebook page, which would now be embedded in your personal website, courtesy of the Open Graph API?

The API, then, is just the funnel through which your activity stream is published back into Facebook. You get to leverage the social graph you already defined and came to like on Facebook, and Facebook gets tighter integration with your life outside of Facebook, if you still have any. Smart move for Facebook.

Google Nails Down Social Search

Google’s Social Search is walking the walk; all the rest are just talking the talk. As soon as I activated the Social Search experiment, my next search yielded a social result. No setup needed: it showed how I am connected to that result (including friends of friends), right within the standard web results…

Contrast this with Microsoft’s poor attempt at “social search”, which indexes tweets and status messages and shows them regardless of who the actual searcher is (example search; you’ve got to be on the “United States” locale on Bing to see it).

Then also contrast it with Facebook’s announcement back in August of its implementation of searching within friends’ posts: a less grandiose announcement that nonetheless delivered a far more social experience than Bing’s. Still, it’s a very limited experience and far from being a true information source for any serious search need.

So how does Google overcome the main obstacle – collecting your connections?

Google relies on its own sources and on open sources it can obtain by crawling the social graph. That is the true reason why Facebook is not part of Google’s graph (there is no XFN/FOAF markup on Facebook’s public pages). Google may be counting on Facebook eventually opening up, and with Gmail’s rising popularity, Gmail becomes a reasonable alternative source of connections even for Facebook users like me.
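Crawling an open social graph through XFN markup can be sketched as: read the `rel` values on links, then walk outward breadth-first to get first- and second-degree connections (the “friends of friends” shown next to results). This is a minimal sketch under stated assumptions: `pages` stands in for already-crawled HTML, only a subset of XFN relationship values is used, and `rel="me"` links (one person's own profiles) are ignored for brevity. Google's actual pipeline is not public.

```python
from collections import deque
from html.parser import HTMLParser

# A subset of XFN 1.1 relationship values treated as social connections.
XFN_CONNECTION_RELS = {"contact", "acquaintance", "friend", "co-worker", "colleague"}

class XFNLinkParser(HTMLParser):
    """Collects hrefs of <a> tags whose rel attribute contains an XFN connection value."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rels = set((attrs.get("rel") or "").lower().split())
        if rels & XFN_CONNECTION_RELS and attrs.get("href"):
            self.links.append(attrs["href"])

def connection_degrees(start, pages, max_depth=2):
    """Breadth-first walk over XFN links; returns {url: degrees of separation}."""
    degrees = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if degrees[url] >= max_depth:
            continue  # don't expand beyond friends of friends
        parser = XFNLinkParser()
        parser.feed(pages.get(url, ""))
        parser.close()
        for link in parser.links:
            if link not in degrees:
                degrees[link] = degrees[url] + 1
                queue.append(link)
    return degrees

# Hypothetical crawled pages: URL -> HTML.
pages = {
    "http://me.example/": '<a rel="friend met" href="http://alice.example/">Alice</a>',
    "http://alice.example/": ('<a rel="contact" href="http://bob.example/">Bob</a>'
                              '<a href="http://ads.example/">Ad</a>'),
    "http://bob.example/": '<a rel="friend" href="http://carol.example/">Carol</a>',
}
degrees = connection_degrees("http://me.example/", pages)
```

A search front-end could then annotate any result hosted on one of these URLs with its degree of separation; a site that publishes no XFN/FOAF markup simply never enters the map.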

Sadly, all this great news gave zero credit to Delver, where it all happened first.