Microsoft Israel ReCon 2015 (or: got to start blogging more often…)

Yes, two consecutive posts on the same annual event are not a good sign for my virtual activity level… point taken.

So two weeks ago, Microsoft Israel held its second ReCon conference on Recommendations and Personalization, turning its fine 2014 start into a tradition worth waiting for. This time it was more condensed than last year (good move!) and just as interesting. So here are three highlights I found worth reporting on:

Uri Barash of the hosting team gave the first keynote on Cortana integration in Windows 10, talking about the challenges and principles involved. Microsoft places a high emphasis on the user's trust, hence Cortana does not use any interests that are not explicitly written in Cortana's notebook and validated by the user. If indeed correct, that's somewhat surprising, as it limits the recommendation quality and moreover the discovery experience for the user – picking up potential interests from the user's activity. I'd still presume that all these implicit interests are used behind the scenes, to optimize the content served for the explicit interests.

IBM Haifa Research Labs has been working for some years now on enterprise social networks, and on mining connections and knowledge from such networks. At ReCon this year, Roy Levin presented a paper to be published in SIGIR'15, titled "Islands in the Stream: A Study of Item Recommendation within an Enterprise Social Stream". In the paper, they discuss a personalized newsfeed feature included in IBM's enterprise social network "IBM Connections", and provide some background and the personalized ranking logic for the feed items.

They then move on to describe a survey they conducted among users of the product, to analyze their opinions on specific items recommended for them in their newsfeed, similar to Facebook's newsfeed surveys. Through these surveys, the IBM researchers attempted to identify correlations between various feed-item factors, such as post and author popularity, the post's personalization score, how surprising an item may be to a user, how likely a user is to want such serendipity, and so on. The actual findings are in the paper, but what may be even more interesting is the paper's deep dissection of the internal workings of the ranking model.

Another interesting talk was by Roy Sasson, Chief Data Scientist at Outbrain. Roy delivered a fascinating talk about learning from lack of signals. He began with an outline of general measurement pitfalls, demonstrating them on Outbrain widgets when analyzing low numbers of clicks on recommended items. Was the widget visible to the user? Where was it positioned on the page (areas of blindness)? What items were next to the analyzed item? Were they clicked? And so on.

Roy then proceeded to talk about what we may actually learn from a lack of sharing to social networks. We all know that content that gets shared a lot on social networks is considered viral, driving a lot of discussion and engagement. But what about content that gets practically no sharing at all? And more precisely, what kind of content gets a lot of views but no sharing? Well, if you hadn't guessed already, that will likely be content users are very interested in seeing but would not admit to, namely provocative and adult material. So in a way, leveraging this reverse correlation helped Outbrain automatically identify porn and other sensitive material. This was not used to filter all of this content out – after all, users do want to view it… but it was used to make sure that the recommendation strip includes only 1-2 such items, so they don't take over the widget and make it seem like this is all Outbrain has to offer. Smart use of data indeed.
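For concreteness, here is a minimal sketch of how such a reverse-correlation signal could be computed. The thresholds, field names, and data shapes are entirely my own illustration, not Outbrain's actual pipeline:

```python
# Hypothetical sketch: flag items with many views but near-zero shares.
# Thresholds and field names are illustrative assumptions.

def flag_sensitive(items, min_views=10_000, max_share_rate=0.0001):
    """Return ids of items whose share rate is anomalously low given high viewership."""
    flagged = []
    for item in items:
        views, shares = item["views"], item["shares"]
        if views >= min_views and shares / views <= max_share_rate:
            flagged.append(item["id"])
    return flagged

catalog = [
    {"id": "a", "views": 50_000, "shares": 2},    # viewed a lot, never shared
    {"id": "b", "views": 60_000, "shares": 900},  # ordinary engaging content
]
print(flag_sensitive(catalog))  # ['a']
```

The recommendation layer could then cap flagged items at one or two slots per widget, rather than filter them out, matching the behavior described above.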

Microsoft Israel ReCon 2014

Microsoft Israel R&D Center held their first Recommendations Technology conference today, ReCon. With an interesting agenda and a location that’s just across the street from my office, I could not skip this one… here are some impressions from talks I found worth mentioning.

The first keynote speaker was Joseph Sirosh, who leads the Cloud Machine Learning team at Microsoft, having recently joined from Amazon. Sirosh may have aimed low, not knowing what his audience would be like, but as a keynote this was quite a disappointing talk, full of simplistic statements and buzzwords. I guess he lost me when he stated quite decisively that the big difference about putting your service on the cloud is that it will get better the more people use it. Yeah.

Still, he made some interesting observations worth mentioning:

  • If you're running a personalization service, benchmarking against most popular items (e.g. top sellers for commerce) is the best non-personalized option; a minimal sketch of such a baseline follows this list. Might sound trivial, but coming from an 8-year Amazon VP, that's good validation
  • "You get what you measure": what you choose to measure is what you're optimizing; make sure it's indeed your weakest links and the parts you want to improve
  • Improvement depends on being able to run a large number of experiments, especially when you’re in a good position already (the higher you are, the lower your gains, and the more experiments you’ll need to run to keep gaining)
  • When running these large numbers of experiments, good collaboration and knowledge sharing becomes critical, so different people don’t end up running the same experiments without knowing of each other’s past results
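On the first point, here is a minimal non-personalized baseline of the kind Sirosh described: recommend the globally best-selling items the user hasn't already seen. The data shapes are my own illustration:

```python
from collections import Counter

def top_sellers(transactions, user_history, k=10):
    """Most-popular baseline.

    transactions: iterable of (user_id, item_id) purchase events.
    user_history: set of item_ids the target user already has.
    """
    counts = Counter(item for _, item in transactions)
    ranked = [item for item, _ in counts.most_common()]
    return [item for item in ranked if item not in user_history][:k]

txns = [("u1", "book"), ("u2", "book"), ("u2", "mug"), ("u3", "pen")]
print(top_sellers(txns, user_history={"pen"}, k=2))  # ['book', 'mug']
```

Any personalization algorithm worth deploying should beat this benchmark; if it doesn't, the added complexity isn't paying for itself.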

Elad Yom-Tov from Microsoft Research described work his team did on enhancing Collaborative Filtering using browse logs. They experimented with adding user browse logs (visited URLs) and search queries to the CF matrix in various ways, to help bootstrap users with little data and to better identify the short-term (recent) intent of these users.

An interesting observation they reached was that using the raw search queries as matrix columns worked better than trying to generalize or categorize them, although intuitively one would expect the latter to reduce the sparsity of such otherwise very long-tail attributes. It seems that the potential gain in reducing sparsity is offset by the loss of the specificity and granularity of the original queries.
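A toy illustration of the idea as I understood it: treat each raw query string as just another column of the user-item matrix, next to the items themselves. All names and shapes here are mine, not from the talk:

```python
import numpy as np

# Toy user x column matrix where columns mix items and raw search queries.
users = ["u1", "u2"]
columns = ["item:laptop_x", "item:mouse_y", "query:best gaming laptop 2014"]
col_idx = {c: i for i, c in enumerate(columns)}

M = np.zeros((len(users), len(columns)))
M[0, col_idx["item:laptop_x"]] = 1.0
M[0, col_idx["query:best gaming laptop 2014"]] = 1.0  # browse/search signal
M[1, col_idx["query:best gaming laptop 2014"]] = 1.0  # new user, queries only

# Even with no item history, u2's query column overlaps u1's, so a
# neighborhood CF step can bootstrap recommendations for u2.
norms = np.linalg.norm(M, axis=1, keepdims=True)
sim = (M @ M.T) / (norms @ norms.T)
print(np.round(sim, 2))  # u1-u2 cosine similarity ~0.71 despite no shared items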


Another related talk, which outlined an interesting way to augment CF, was by Haggai Roitman of IBM Research. Haggai suggested "user uniqueness" – the extent to which the user follows the crowd or deliberately looks for esoteric choices – as a valuable signal in recommendations. This uniqueness would then determine whether to serve the user results that are primarily popularity-based (e.g. CF) or personalized (e.g. content-based), or a mix of the two.
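A back-of-the-envelope sketch of how such a uniqueness signal might gate the mix. The linear blend is my guess at the spirit of the idea, not Roitman's actual model:

```python
def blended_score(popularity_score, personalized_score, uniqueness):
    """uniqueness in [0, 1]: 0 = follows the crowd, 1 = seeks the esoteric.

    Crowd-followers get mostly popularity-based (CF) scores; unique users
    get mostly personalized (content-based) scores. Illustrative only.
    """
    return (1 - uniqueness) * popularity_score + uniqueness * personalized_score

print(blended_score(0.9, 0.4, uniqueness=0.1))  # crowd-follower: ~popularity
print(blended_score(0.9, 0.4, uniqueness=0.9))  # esoteric user: ~personalized
```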

The second keynote was by Ronny Lempel of Yahoo! Labs in Haifa. Ronny talked about multi-user devices, in particular smart TVs, and how recommendations should take into account the user currently in front of the device (although this information is not readily available). The heuristic his team used was that the audience usually doesn't change between consecutive programs watched, so using the last program as context for recommending the next program helps model that unknown audience.

Their results indeed showed a significant improvement in recommendation effectiveness when using this context. Another interesting observation was that using a random item from the history, rather than the last one, actually made the recommendations perform worse than no context at all. That's an interesting result, as it validates the assumption that approximating the right audience is valuable: if you make recommendations to the parent watching in the evening based on the children's programs from the afternoon, you are likely to do worse than with no context at all.
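A minimal sketch of the heuristic: condition next-program scores on co-occurrence with the last program watched on that device. The data and names are hypothetical:

```python
from collections import Counter, defaultdict

def build_transitions(watch_logs):
    """watch_logs: dict device_id -> chronological list of program ids.

    Counts consecutive (previous -> next) pairs, on the assumption that
    the audience rarely changes between consecutive programs.
    """
    transitions = defaultdict(Counter)
    for programs in watch_logs.values():
        for prev, nxt in zip(programs, programs[1:]):
            transitions[prev][nxt] += 1
    return transitions

def recommend(transitions, last_program, k=3):
    return [p for p, _ in transitions[last_program].most_common(k)]

logs = {"tv1": ["cartoon_a", "cartoon_b", "news", "drama"],
        "tv2": ["news", "drama", "late_show"]}
t = build_transitions(logs)
print(recommend(t, last_program="news"))  # ['drama'] - evening-audience context
```

Note how conditioning on "news" steers away from the afternoon cartoons, exactly the audience-approximation effect described above.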


The final presentation was by Microsoft's Hadas Bitran, who presented and demonstrated Windows Phone's Cortana. Microsoft goes out of its way to describe Cortana as friendly and non-creepy, and yet the introductory Microsoft video Hadas presented somehow managed to include a scary robot (from Halo, I presume), dramatic music, and Cortana saying "Now learning about you". Yep, not creepy at all.

Hadas did demonstrate Cortana's context-keeping within a session, which looks pretty cool: follow-up questions that referred to previous questions and answers were followed through nicely by Cortana (all in a controlled demo, of course). Interestingly, this even seemed to work too well: after getting Cortana's list of suggested restaurants, Hadas asked Cortana to schedule a spec review, and Cortana insisted again and again on booking a table at the restaurant instead… Nevertheless, I can say the demo actually made the option of buying a Windows Phone pass through my mind, so it does do the job.

All in all, it was an interesting and well-organized conference, with a good mix of academia and industry, a good match to IBM’s workshops. Let’s have many more of these!

The Great Managers' Balancing Act

With so many approaches to management – and to managing software development in particular – there are plenty of authors writing about it. I don't intend to join that fray. Personally, I enjoy the "What" much more than the "How", but recently this piece of insight dawned on me.

To be helpful, a good middle manager does two things:

  1. Up: Make decisions and be held accountable for their outcome.
  2. Down: Remove obstacles from his team’s path.

Where it gets interesting is where #1 and #2 collide, and how the manager deals with that. Great managers find the right balance. Mediocre managers can only handle the collision by serving one at the expense of the other.

For example, say a certain middle manager gets a directive handed down from above while the team is already at full capacity. Rather than trading off another high-priority task and facing a tough conversation with higher management, he prefers to push the requirement down to his team, asking them to "make an extra effort." He even considers it his decision, so he feels he lives up to #1. But sadly for his team, not only did he not remove obstacles, he just added more.

Alternatively, such managers try to execute #2 and help their team by making the tough decisions that remove an obstacle. But because they do not realize they're the ones held accountable for these decisions, they prefer not to communicate them upward, to protect their political standing, thus violating #1. This eventually results in the team losing credibility and being considered poor at execution, despite all their hard work.

Of course, how to successfully balance #1 and #2 and still keep your job and sanity as a manager is a separate topic, one I’ll leave to the management experts to discuss…

"life is a great balancing act..."

Mining Wikipedia, or: How I Learned to Stop Worrying and Love Statistical Algorithms

I took my first AI course during my first degree, in the early 90s. Back then it was all about expert systems, genetic algorithms, and search/planning (A* anyone?). It all seemed clear, smart, intuitive, intelligent…

Then, by the time I got to my second degree in the late 00s, the AI world had changed a lot. Statistical approaches had taken over by storm, and massive amounts of data seemed to trump intuition and smart heuristics every time.

It took me a while to adjust, I admit, but by the time I completed my thesis I had come to appreciate the power of big data. I can now better see this as an evolution, with heuristics and intuitions driving how we choose to analyze and process the data, even if afterwards it's all "just" number-crunching.

So on this note, I gave a talk today at work on extracting semantic knowledge from Wikipedia, relating it also to our work on ESA, and illustrating the evolution described above. Enjoy!


The secret to Facebook’s growth?

Alteregozi.com has recently also been hit by the wondrous Facebook-profile spam comments (I kept two specimens here and here, but deleted many dozens more in the past weeks). At first, I was amused by this new type of spam comment, but after running a few searches I felt rather disgraced for being so late to the party, seeing mentions of these from more than a year ago.

So what's the deal with these comments? They usually don't include any links, aren't selling anything, and some are really good comments. If you look at the above two, you'll have a very hard time figuring out they are not real comments. It looks like spammers harvest comments from legit blogs, then classify your post to find the most similar comment to paste. So what is the motivation?
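If the harvest-and-match theory is right, the matching step could be as simple as nearest-neighbor text similarity. A hypothetical reconstruction of the suspected technique, using scikit-learn:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Given a target post, pick the harvested comment most textually similar
# to it. Example strings are made up; this is a guess at the mechanism.
harvested = ["Great analysis of recommendation engines, thanks!",
             "I had the same issue with my router firmware."]
post = "Some thoughts on personalization and recommendation algorithms."

vec = TfidfVectorizer()
X = vec.fit_transform(harvested + [post])
sims = cosine_similarity(X[-1], X[:-1]).ravel()
print(harvested[sims.argmax()])  # the comment that best 'fits' the post
```

This would explain why the comments read so naturally: they are real human comments, just relocated.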

I don’t have the answers myself, but two thoughts:


  1. One spam-fighting blog claims that the motivation is to establish the credibility of these accounts, so that they can later be used to sell likes on Facebook itself. The plot thickens…
  2. I've never seen an account repeat. The number of fake FB accounts being created is probably huge. How much of Facebook's recent continued growth is attributable to such fake accounts? Nothing you would hear about in Facebook's earnings calls.


fakebook

Amazon, Apple, and Application Platforms

Apple is known for keeping a bustling legal department. Steve Jobs reportedly swore to "destroy Android", the results of which Samsung has felt all too well.

But Apple has more enemies to fight. It holds a complicated relationship with Amazon, which now produces the second best-selling tablet after the iPad, claiming it already owns 22% of the US tablet market. That's a lot of iPads that Apple isn't selling, so Apple is readying its own iPad Mini in response.

A less familiar front in this battle is Apple's "false advertising" suit against Amazon over the latter's use of "App Store" for its Android-based application market. Amazon's response ridiculed the claim, but it does raise the question – what exactly is Amazon's app store all about?

Amazon's Kindle store is one strange beast. Kindle apps are in fact re-purposed Android apps with some added functionality. However, Amazon took care to clearly differentiate the Kindle's UX and app store from the general Android market. So what is the justification for developing a separate Kindle app?

Every application development platform has its unique core capabilities, which developers can leverage for their own applications. Developers get to apply their creative ideas to these assets, while the platform owner enjoys increased engagement from their users, with apps taking these capabilities to places the platform owner did not even imagine. Facebook's application platform revolved around the social graph, a unique and very valuable data asset, and Apple provided access to the iPhone's (at the time) unique features, such as its accelerometers, gyroscope, GPS, and camera.

Visiting the Amazon Kindle SDK site shows where Amazon feels it has the advantage: 1-click purchasing. This patented Amazon feature (a patent Apple has actually licensed) can appeal to developers who feel their application has premium features worth paying for, if only the payment were frictionless. Initial results seemed to validate that, showing excellent revenue per user on Amazon's platform.

And so, Amazon's platform says a lot about where Amazon feels its strength lies with the Kindle. Unlike Apple, Amazon builds its success in the tablet market on selling content much more than on selling devices. Hence, expect the Kindle to continue beating the iPad on price even after the iPad Mini launches.

Out of Context

Sponsored Stories are a brilliant advertising model by Facebook. Just like AdWords in 2000, it's an example of a model that leverages the core value of the company for advertising, without compromising that value's authenticity. If your friends liked Starbucks, they did so of their own free will and in a public forum, so having Starbucks pay to show this more prominently and to other users can only make sense.

So why is it, then, that a simple amusing case of a 55-gallon drum of lubricant made so many bad headlines for Facebook?

And Facebook has more fronts in its battle to transform into a revenue-driven company. Timeline may be great for brands, but it's a magnet for popular revolt. Besides resenting the no-alternative approach Facebook took, why are users so upset about the actual Timeline view, which is surely more visually appealing than the boring wall?

I find the answer to both relates to context.


For the Sponsored Stories it seems pretty clear. "Yes, I linked to a 55-gallon lubricant product, but I did so as a joke." Well then, sentiment analysis still has a long way to go with sarcasm, despite some recent advances right here at the Hebrew University. Sarcasm is one extreme example, but the missing context could be as simple as no longer being a fan of that company you liked a month ago, having not gotten around to unliking it yet.

And what about Timeline? Isn't it great that all your previous statuses and photos are there, organized along your timeline and telling your story? Well, it is, but only if you care to ensure that it tells the story you really want to tell. The context of that story may depend on where we were, what we were up to at the time, who our friends were… some of this may not even be possible to reconstruct in the Timeline.

In addition, we are used to our stories dropping off the cliff of the page fold and disappearing into oblivion, so we don't really bother to update them or remove the ones we no longer feel so proud of. Suddenly, with Timeline, they come back to haunt us, and we have to scramble to adjust.

And a final associative thought: the tiled UX of Timeline does remind me of the Pinterest mania that has taken hold of every new social curation site. So why does this look like so much fun on Pinterest? Context again. Pinterest has none of it; it's a pure fun/discovery experience, where each tile is independent and you're not really trying to follow a thread or cover all you've missed since your last visit. For a social network, though, that would be, well, out of context.

Thoughts on Plus

So what's the deal with Google+? Is Google really taking on Facebook? Is this a classic "me too" play, or something smarter?

It took me a while to form my opinion, but several interesting articles got the stars aligned just right for a split second to make some sense (until new developments soon de-align them again :-)).

Take a deep breath. OK, here it comes:

Google+ is Google’s take on Social.

Yes, I know, who would have thought?…
It’s just that Google’s definition of Social is a bit different.

At Facebook (and really, for most of us), Social is about conversations with people you actually know.
At Google, Social is the new alias for Personalization.

It's pretty simple: Google's business model has always meant that the more it knows about you, the better it can monetize through more targeted ads. At first, it was all about the search engine being where you always start your surfing, and Google was well seated. As traffic to social networks grew, culminating in Facebook overtaking Google in March 2010, it became increasingly clear that a growing portion of our information is first served to us from social networks. Google was left out.

Why was that so important? Google still had tons of searches, an ever-growing email market share, and a successful news aggregator and RSS reader, among other assets. That's quite a lot to know about us, isn't it?

It turned out that the missing link was often the starting point. You would learn about the new thing, the new trend, the new gadget you want to get, while you were out of Google's reach. By the time you got into the Google web, you may have already made up your mind about what to get and even where, making Google's ads a lot less effective.

The Follow-versus-Friend model is also a huge issue. It means that G+ is about self-publishing and positioning yourself, not about conversations. That suits Google very well, and is not just a differentiation from FB. This model drives you to follow based on interest, building an interest graph rather than a social graph – one that is a lot more useful for profiling you than your social connections are.

That interest graph, in turn, makes sure your first encounter with the things that make you tick happens inside the Google web. It also links back well to the fine assets Google holds today, from your docs to your publishing tools. So when Google News announces those funny badges, and you think "Heh, who would want to put these stinking badges on their profile…" – think again. Their private nature is just fine for Google. They're a way to ask you to validate your inferred interests: "So tell us, is that interest in US politics we inferred from your news reading a real, inherent interest, or just a transient one that will melt away after the election?". Again – a big difference for profiling.

Finally, Google+ is positioned to be a professional network. Focusing on interests, and letting anyone follow you, will keep away the teens and lure the self-proclaimed professionals. In that sense, LinkedIn may have more reason for concern, at least as the content network it now tries to be. It's quite likely that G+ does not even aim to unseat Facebook, only to dry up its professional appeal and leave it with what we started with – party and kids photos, and keeping track of what those old friends are up to.

I guess I already know what network I’ll be posting a link to this post to…

Farewell Academia

My Master's thesis (presented here) was finally published in the April issue of TOIS. A good time to recap my second academic adventure.

Six years ago, when I considered graduate studies (10 years after completing my B.Sc.), I was CTO at a company that was at a crossroads, which drove very short-term product and technology thinking. Looking for a change, I felt the academic world offered a space where deep, broad thinking was preferred over nearsighted goals. So I reduced my position at work and took up studies back at the Technion.

I finished the required courses in a year and a half, but the thesis took much longer. Friends warned me that it's difficult to context-switch between work and research, not to mention family, and they were indeed right. Still, I wanted to experience academic life again, and figure out whether I wanted to pursue it full time and continue to a PhD.

The conclusion gradually distilled into a resounding No. I'll stop at the Master's. One reason was my allergic reaction to too much math, so prevalent at the Technion, but there was also something deeper. I realized that user experience is where I'm at, and core computer science research is far from it, except perhaps in HCI departments.

There is a significant gap between the cutting edge in academia and in practice. A paper may be worth publishing due to a statistically significant improvement of 5% in relevance (see the major interest around the Netflix Prize), whereas actual users will barely feel the difference. On the other hand, work considered "commodity" in the academic world can make big waves if implemented well in industry, and for good reason. Companies have built a major user following (and a fortune…) just by doing excellent, usable implementations of basic CS algorithms.

So if I have to choose between making the research community happy and making end users happy, I definitely choose the latter. Perhaps I'll go back for my PhD in another 10 years, but until then, it's Farewell Academia!

Evaluating algorithms’ quality

As part of a "creativity dojo" we had at work, I finally got to implement something I've long felt was missing in our QA – a framework for evaluating algorithms' quality.

Living on the seam between algorithm development and product management over the past few years, I've come to appreciate the need to evaluate not just that an algorithm works, but that it works well. A search engine may return results that contain the keywords, but are these the most relevant ones? A recommendation algorithm may return products that relate to the user in some way, but can they be considered "good" recommendations?

During my master's studies I came to know the work done at TREC, and was fascinated by the strong emphasis on what we developers often skim over – evaluating result quality statistically, and moreover analyzing the evaluation method itself to ensure it is sound. So with that approach in mind, I teamed up with our talented QA team to create a working framework in two days. Here are some lessons and tips learned along the way that could be useful for others attempting a similar feat:

  1. Create a generic tool. TREC is mostly about search; however, with some imagination, most AI algorithms can be reduced to similar building blocks. Search, recommendation, classification – all can eventually be reduced to taking an input and returning a ranked list of results, to which the same quality metric can be applied. Code-wise, we used a generic scoring class, with a wrapping interface that has different implementations for different algorithms to provide the varying context.
  2. Use large data. This may sound trivial in the academic world, but when you're in a QA state of mind, you sometimes tend to get used to creating small worlds that are easy to control. Not here. It's very important to simulate real-life user scenarios with data that's similar to production, so we used our integration environment, which replicates production data.
  3. Facilitate judging. Obtaining relevance judgments is crucial to getting useful tests. The customer here is a business owner / product manager, who may not appreciate the tedious task of rating results. We created a browser plugin that allows rating from within the actual results page, and accumulates those ratings in a per-test relevance file.
  4. Measure test staleness. The downside of using non-controlled data is that it pulls the rug out from under your feet: the data may change over time, and your test may become less relevant. We used Buckley's Binary Preference (bPref) measure, which functions well with incomplete judgments (a sketch of it follows this list), and also introduced a weighted measure of how many unjudged results are found, to trigger a test failure when results become too unreliable (requiring another judging round).
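Since bPref carries most of the weight in point 4, here is a compact sketch of one common formulation of the measure. The weighted unjudged-results staleness trigger mentioned above is left out, and this is illustrative rather than the exact code we shipped:

```python
from typing import Dict, List

def bpref(ranked: List[str], judgments: Dict[str, bool]) -> float:
    """One common formulation of Buckley & Voorhees' binary preference:

        bpref = (1/R) * sum over relevant retrieved docs r of
                (1 - min(#judged-nonrelevant ranked above r, R) / min(R, N))

    where R and N are the counts of judged relevant and nonrelevant docs.
    Unjudged documents are skipped rather than assumed nonrelevant, which
    is what makes the measure robust to incomplete judgments.
    """
    R = sum(1 for rel in judgments.values() if rel)
    N = len(judgments) - R
    if R == 0:
        return 0.0
    score, nonrel_above = 0.0, 0
    for doc in ranked:
        if doc not in judgments:
            continue  # unjudged: ignore
        if judgments[doc]:
            penalty = min(nonrel_above, R) / min(R, N) if N else 0.0
            score += 1.0 - penalty
        else:
            nonrel_above += 1
    return score / R

ranked = ["d1", "dX", "d2", "d3"]                  # dX is unjudged
judgments = {"d1": True, "d2": True, "d3": False}
print(bpref(ranked, judgments))                    # -> 1.0
```

Because unjudged documents simply drop out of the computation, a partially judged result list still yields a stable score, which is exactly why the measure suits tests running over live, shifting data.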