The Alter Egozi

Entries tagged as ‘IBM’

IBM IR Seminar Highlights (part 2)

December 19, 2008

The seminar’s third highlight for me (in addition to IBM’s social software and Mor’s talk) was the keynote speech by Human-Computer Interaction (HCI) veteran Professor Ben Shneiderman of UMD. Ben’s presentation was quite an experience, not in the sophisticated Lessig way (which Dick Hardt adopted so well for Identity 2.0), but through the sheer amount of positive energy and passion streaming out of this 60-year-old.

[Warning - this post turned out longer and heavier than I thought...]

[Image: Ben Shneiderman in front of Usenet Treemap - flickr/Marc_Smith]

Ben is one of the founding fathers of HCI, and the main part of his talk focused on how visualization tools can serve as human analysis enhancers, just like the web as a tool enhances our information.

He presented tools such as ManyEyes (IBM’s), SpotFire (which was his own hi-tech exit), TreeMap (with many examples of trend and outlier spotting using it) and others. The main point was what the human eye can do with those tools that no predefined automated analysis can, especially in fields such as genomics and finance.

Then the discussion moved to how to put such an approach to work in search, which, like those tools, is also a power multiplier for humans. Ben described today’s search technology as adequate mainly for “known item finding”. The more difficult tasks, which today’s search can’t answer well, are usually those that are not a “one-minute job”, such as:

  • Comprehensive search (e.g. Legal or Patent search)
  • Proving negation (Patent search)
  • Finding exceptions (outliers)
  • Finding bridges (connecting two subsets)

The clusters of current and suggested strategies to address such tasks are:

  • Enriching query formulation – non-textual, structured queries, results preview, limiting of result type…
  • Expanding result management – better snippets, clustering, visualization, summarization…
  • Enabling long-term effort – saving/bookmarking, annotation, notebooking/history-keeping, comparing…
  • Enhancing collaboration – sharing, publishing, commenting, blogging, feedback to search provider…

So far, pretty standard HCI ideas, but then Ben took this into the second part of the talk. The experimentation that web players employ in these efforts has built up into an entire methodology that is quite different from established research paradigms. Controlled usability tests in the lab are no longer the tool of choice; they have given way to A/B testing on masses of users, with a careful choice of system changes. This is how Google/Yahoo/Live modify their ranking algorithms, how Amazon/NetFlix recommend products, and how the Wikipedia collective “decides” on article content.
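
To make the A/B idea concrete, here is a minimal sketch of how one such experiment could be evaluated. It was not shown in the talk; the function name, the choice of a two-proportion z-test and the click counts are all my own assumptions for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare success rates of variant A and variant B.

    Hypothetical helper: returns (z, two-sided p-value) for the
    difference between the two observed rates.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up numbers: 1000 users saw each variant of a results page.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the change made a real difference; the “careful choice of system changes” is what keeps each such test interpretable.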

This is where the term “Science 2.0” pops up. Ben’s thesis is that some of society’s great challenges today have more to learn from Computer Science than from traditional Social Science. “On-site” and “interventionist” approaches should take over from controlled laboratory approaches when dealing with large social challenges such as security, emergency, health and others. You (government? NGOs? web communities?) could make actual careful changes to how specific social systems work, in real life, then measure the impact, and repeat.

This may indeed sound like a lot of fluff, as some think, but the collaboration and decentralization demonstrated on the web can be put to real-life uses. One example from HCIL is the 911.gov project for emergency response, as emergency is a classic case in which centralized systems collapse. Decentralizing the report and response circles can leverage the power of the masses beyond the Twitter journalism effect.


IBM IR Seminar Highlights (part 1)

December 17, 2008 · 3 Comments

[Image: IBM Haifa Research Labs]

Yesterday’s seminar was also packed with some very interesting talks from a wide range of social aspects to IR and NLP.

Mor Naaman of Rutgers University and formerly at Yahoo! Research gave an excellent talk on using social inputs to improve the experience of multimedia search. The general theme was about discovering metadata for a given multimedia concept from web 2.0 sites, then using those to cluster potential results and choose representative ones.

In one application, this approach was used to identify “representative” photos of a certain landmark, say the Golden Gate Bridge; see World Explorer for an illustration. First, you’d find all flickr photos geotagged and/or flickr-tagged with the location and name of the bridge (or any given landmark). Next, image processing (SIFT) is applied to those images to cluster them into subsets that are likely to show the same section and/or perspective of the bridge. Finally, relations between the images in each cluster are formed based on their visual similarity, and link analysis is employed to find a “canonical view”. The result is what we see on the right sidebar in World Explorer, and is described in this WWW’08 paper.

[Update: Mor commented that the content-based analysis part is not yet deployed in World Explorer. Thanks Mor!]
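
The last step (link analysis over visually related images) can be sketched roughly as below. This is only my illustration, not the method from the paper: I assume a PageRank-style iteration, and the image ids and similarity weights are invented (in the real pipeline they would come from SIFT matching).

```python
def canonical_view(similarity, damping=0.85, iterations=50):
    """Pick the most 'central' image in one cluster.

    similarity: dict mapping image id -> {neighbour id: weight}.
    Assumes a symmetric graph in which every image has a neighbour.
    Returns (best image id, dict of scores).
    """
    nodes = list(similarity)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_total = {u: sum(similarity[u].values()) for u in nodes}
    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            # Each neighbour passes on rank in proportion to edge weight.
            incoming = sum(
                rank[u] * similarity[u][v] / out_total[u]
                for u in nodes
                if v in similarity[u] and out_total[u] > 0
            )
            new_rank[v] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return max(rank, key=rank.get), rank

# Invented cluster: photo "a" looks similar to all the others.
cluster = {
    "a": {"b": 0.9, "c": 0.8, "d": 0.7},
    "b": {"a": 0.9, "c": 0.2},
    "c": {"a": 0.8, "b": 0.2},
    "d": {"a": 0.7},
}
best, scores = canonical_view(cluster)
print(best)
```

The intuition is that a photo many other photos resemble is a good candidate for the representative shot of that perspective.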


Another example applied this approach to concerts on YouTube, and the purpose was to find good clips of the concert itself, rather than videos discussing it etc. Metadata describing the event (say, an Iron Maiden concert) was collected from both YouTube and sites such as Upcoming.org, and Audio Fingerprinting was employed to detect overlapping video sections, as it’s quite likely the concert itself would have the most overlap. Note that in both cases, the image/audio processing is a heavy task, and applying it only to a small subset filtered by social tags makes the work involved more feasible.
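
Here is a toy sketch of the “most overlap wins” idea. It is heavily simplified and entirely my own assumption: each clip is reduced to a set of fingerprint hashes (real audio fingerprinting also aligns the hashes in time), and the clip names, hash values and threshold are made up.

```python
def rank_by_overlap(fingerprints, min_shared=2):
    """Rank clips by how many other clips share audio with them.

    fingerprints: dict mapping clip id -> set of fingerprint hashes.
    Two clips 'overlap' if they share at least min_shared hashes.
    Returns clip ids, most-overlapping first (ties broken by id).
    """
    scores = {}
    for clip, hashes in fingerprints.items():
        scores[clip] = sum(
            1
            for other, other_hashes in fingerprints.items()
            if other != clip and len(hashes & other_hashes) >= min_shared
        )
    return sorted(scores, key=lambda c: (-scores[c], c))

# Only clips pre-filtered by social tags would reach this (costly) step.
clips = {
    "live_1": {1, 2, 3, 4},
    "live_2": {3, 4, 5, 6},
    "live_3": {2, 3, 4, 7},
    "review": {90, 91},
}
print(rank_by_overlap(clips))
```

Clips recorded at the concert share audio with each other and rise to the top, while a video merely discussing the show shares nothing and sinks.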

I’ll talk about the keynote (by Prof. Ben Shneiderman) in another post, as this one is already way too long… Here are soundbites from some other talks:

Emil Ismalon of Collarity referred to personalized search (e.g. Google’s) as a form of overfitting, not letting me learn anything new as it trains itself only on my own history. That, of course, served as the motivation for community-based personalization.

Ido Guy of IBM talked about research they did comparing social networks extracted from public and private sources. The bottom line is that some forms of social relations are stronger, representing collaboration (working on projects together, co-authoring papers or patents), and others are weaker, revolving more around socializing activities (friending/following on SN, commenting on blogs etc.). Of course, that would be relevant for the enterprise social graph, not necessarily personal life…

Daphne Raban of Haifa University summarized her (empirical) research into the motivations of participants in Q&A sites. The main bottom lines were: 1) money was less important to people who participate very often, but it’s a catalyst, 2) being rewarded with gratitude and conversation is the main factor driving people to become more frequent participants, and 3) in a quality comparison, paid results ranked highest, free community results (Yahoo! Answers) ranked close, and unpaid single experts ranked lowest.


IBM IR seminar talk on Socially Connected Search

December 16, 2008

I had the pleasure today of presenting Delver in a talk I gave at IBM Haifa Research Labs IR seminar. My slides are over here.

The seminar’s focus this year was on social search, and there were quite a few other talks I found very interesting; I’ll blog about those later on too. One of the positive surprises for me was the amount of work carried out at IBM-HRL on social/web 2.0 tools such as SONAR. Impressive social product work for a non-consumer player; I plan to read more of their published work on that.
