Microsoft Israel ReCon 2014

Microsoft Israel R&D Center held their first Recommendations Technology conference today, ReCon. With an interesting agenda and a location that’s just across the street from my office, I could not skip this one… here are some impressions from talks I found worth mentioning.

The first keynote speaker was Joseph Sirosh, who leads the Cloud Machine Learning team at Microsoft and recently joined from Amazon. Sirosh may have aimed low, not knowing what his audience would be like, but as a keynote this was quite a disappointing talk, full of simplistic statements and buzzwords. I guess he lost me when he stated quite decisively that the big difference about putting your service on the cloud is that it will get better the more people use it. Yeah.

Still, he did point out some interesting observations worth mentioning:

  • If you’re running a personalization service, benchmark against the most popular items (i.e. top sellers for commerce), as that is the best non-personalized option. Might sound trivial, but coming from an 8-year Amazon VP, that’s a good validation
  • “You get what you measure”: what you choose to measure is what you’re optimizing, so make sure it’s indeed your weakest links and the parts you want to improve
  • Improvement depends on being able to run a large number of experiments, especially when you’re in a good position already (the higher you are, the lower your gains, and the more experiments you’ll need to run to keep gaining)
  • When running these large numbers of experiments, good collaboration and knowledge sharing become critical, so that different people don’t end up running the same experiments without knowing of each other’s past results

Elad Yom-Tov from Microsoft Research described work his team did on enhancing Collaborative Filtering using browse logs. They experimented with adding users’ browse logs (visited URLs) and search queries to the CF matrix in various ways, to help bootstrap users with little data and to better identify short-term (recent) intent for these users.

An interesting observation they reached was that using the raw search queries as matrix columns worked better than trying to generalize or categorize them, although intuitively one would expect generalization to help by reducing the sparsity of such otherwise very long-tail attributes. It seems that the potential gain in reducing sparsity is offset by the loss of specificity and granularity of the original queries.
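To make the idea concrete, here is a minimal sketch of what “raw queries as matrix columns” could look like. This is my own toy illustration, not the team’s implementation: the function names (`build_rows`, `cosine`, `recommend`), the binary weights, the user-to-user cosine similarity, and the sample data are all assumptions.

```python
from collections import defaultdict
from math import sqrt


def build_rows(item_events, query_events):
    """Return {user: {column: weight}} with item and raw-query columns side by side."""
    rows = defaultdict(dict)
    for user, item in item_events:
        rows[user][("item", item)] = 1.0
    for user, query in query_events:
        # Each raw query string is its own column; no categorization is applied.
        rows[user][("query", query.strip().lower())] = 1.0
    return dict(rows)


def cosine(a, b):
    """Cosine similarity between two sparse rows stored as dicts."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def recommend(user, rows, top_n=3):
    """Score unseen items by the similarity of the users who have them."""
    seen = {c for c in rows[user] if c[0] == "item"}
    scores = defaultdict(float)
    for other, row in rows.items():
        if other == user:
            continue
        sim = cosine(rows[user], row)
        if sim <= 0:
            continue
        for col in row:
            if col[0] == "item" and col not in seen:
                scores[col[1]] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical data: "new_user" has no item history at all, only one search query.
item_events = [("ann", "camera"), ("ann", "tripod"), ("bob", "novel")]
query_events = [
    ("ann", "best dslr lens"),
    ("bob", "booker prize 2013"),
    ("new_user", "best dslr lens"),
]
rows = build_rows(item_events, query_events)
print(recommend("new_user", rows))
```

The new user shares only a query column with an existing user, and that alone is enough to get item recommendations, which is the bootstrapping effect described above. Replacing the raw query with a broad category would make more users overlap, but on a much weaker signal.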


Another related talk, which outlined an interesting way to augment CF, was by Haggai Roitman of IBM Research. Haggai suggested “user uniqueness”, the extent to which the user follows the crowd or deliberately looks for the esoteric choices, as a valuable signal in recommendations. This uniqueness would then determine whether to serve the user with results that are primarily popularity-based (e.g. CF) or personalized (e.g. content-based), or a mix of the two.
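As a rough sketch of how such a signal could be wired in (the talk’s actual definition of uniqueness may well differ): below, an item’s popularity is the share of users who consumed it, a user’s uniqueness is one minus the mean popularity of their items, and that value is used as the mixing weight between the two kinds of scores. All names and data here are hypothetical.

```python
def item_popularity(histories):
    """Share of users who consumed each item."""
    n_users = len(histories)
    counts = {}
    for items in histories.values():
        for item in set(items):
            counts[item] = counts.get(item, 0) + 1
    return {item: c / n_users for item, c in counts.items()}


def uniqueness(user, histories, popularity):
    """Assumed definition: 1 minus the mean popularity of the user's items."""
    items = set(histories[user])
    if not items:
        return 0.0  # no data yet: treat the user as following the crowd
    return 1.0 - sum(popularity[i] for i in items) / len(items)


def blended_score(u, popularity_score, personalized_score):
    """Mix the two scores; a more unique user leans on the personalized one."""
    return (1.0 - u) * popularity_score + u * personalized_score


# Hypothetical consumption histories.
histories = {
    "mainstream": ["hit_a", "hit_b"],
    "fan": ["hit_a", "hit_b"],
    "explorer": ["obscure_x", "obscure_y"],
    "casual": ["hit_a"],
}
pop = item_popularity(histories)
for user in ("mainstream", "explorer"):
    print(user, uniqueness(user, histories, pop))
```

A linear mix is only one option; a hard threshold that switches between the two recommenders would fit the description just as well.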

The second keynote was by Ronny Lempel of Yahoo! Labs in Haifa. Ronny talked about multi-user devices, in particular smart TVs, and how recommendations should take into account the user who is currently in front of the device (although this information is not readily available). The heuristic his team used was that the audience usually doesn’t change between consecutive programs watched, so using the last program as context for recommending the next program helps model that unknown audience.

Their results indeed showed a significant improvement in recommendation effectiveness when using this context. Another interesting observation was that using a random item from the history, rather than the last one, actually made the recommendations perform worse than no context at all. That’s an interesting result, as it validates the assumption that approximating the right audience is valuable: if you make recommendations to the parent watching in the evening based on the programs the children watched in the afternoon, you are likely to do worse than with no such context at all.
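The last-program heuristic can be sketched with simple transition counts. This is a toy stand-in under my own assumptions, not the model presented in the talk: `fit_transitions` and `recommend_next` are hypothetical names, and the viewing logs are made up.

```python
from collections import Counter, defaultdict


def fit_transitions(device_logs):
    """Count how often each program directly follows another on the same device."""
    transitions = defaultdict(Counter)
    for programs in device_logs:
        for prev, nxt in zip(programs, programs[1:]):
            transitions[prev][nxt] += 1
    return dict(transitions)


def recommend_next(last_program, transitions, top_n=1):
    """Rank candidates by how often they followed the last watched program."""
    counts = transitions.get(last_program, {})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [program for program, _ in ranked[:top_n]]


# Hypothetical per-device viewing sequences, in watch order.
device_logs = [
    ["cartoon_a", "cartoon_b", "news", "drama"],
    ["cartoon_a", "cartoon_b", "cartoon_c"],
    ["news", "drama", "talk_show"],
]
transitions = fit_transitions(device_logs)
print(recommend_next("news", transitions))
print(recommend_next("cartoon_a", transitions))
```

Conditioning on the last program keeps the evening viewer away from the afternoon cartoons. Conditioning on a random program from the same device’s history would mix the two audiences, which is consistent with the degradation reported above.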


The final presentation was by Microsoft’s Hadas Bitran, who presented and demonstrated Windows Phone’s Cortana. Microsoft go out of their way to describe Cortana as friendly and non-creepy, and yet the introductory Microsoft video that Hadas showed somehow managed to include a scary robot (from Halo, I presume), dramatic music, and Cortana saying “Now learning about you”. Yep, not creepy at all.

Hadas did present Cortana’s context-keeping session, which looks pretty cool: questions she asked that related to previous questions and answers were followed through nicely by Cortana (all in a controlled demo, of course). Interestingly, this even seemed to work too well. After getting Cortana’s list of suggested restaurants, Hadas asked Cortana to schedule a spec review, and Cortana insisted again and again on booking a table at the restaurant instead… Nevertheless, I can say the demo actually made the option of buying a Windows Phone pass through my mind, so it does do the job.

All in all, it was an interesting and well-organized conference, with a good mix of academia and industry, a good match to IBM’s workshops. Let’s have many more of these!

The Great Managers Balancing Act

With so many approaches to management, and to managing software development in particular, there is no shortage of authors writing about it. I don’t intend to join that fray, as I enjoy the ‘what’ much more than the ‘how’, but recently this piece of insight dawned on me.

To be helpful, a good middle manager does one of these two:

  1. Up: make decisions, and be held accountable to their outcome
  2. Down: remove obstacles from his team’s path

Where it gets interesting is where #1 and #2 collide, and how this manager deals with it. Great managers find the right balance. Mediocre managers can only handle this by screwing one at the expense of the other.

For example, a certain middle manager gets some directive handed down from above, while the team is already at full capacity. Rather than trading off another highly prioritized task and facing a tough time with higher management, he prefers to push the requirement down to his team, asking them to “make an extra effort”. He even considers it his own decision, so he feels that he lives up to #1, but sadly for his team he has not only failed to remove obstacles, he has just added more.

Alternatively, such managers try to execute #2 and help their team by making the tough decisions that remove an obstacle, but because they do not realize they are the ones held accountable for these decisions, they prefer not to communicate them upwards in order to keep their political standing, thus violating #1. This eventually results in the team losing credibility and being considered low on execution, despite all of their hard work.

Of course, how to balance #1 and #2 well and still keep your job and sanity as a manager is a separate topic, which I’ll leave to the management experts to discuss…

"life is a great balancing act..."

Mining Wikipedia, or: How I Learned to Stop Worrying and Love Statistical Algorithms

I took my first AI course during my first degree, in the early 90s. Back then it was all about expert systems, genetic algorithms, search/planning (A* anyone?). It all seemed clear, smart, intuitive, intelligent…

Then, by the time I got to my second degree in the late 00s, the AI world had changed a lot. Statistical approaches had taken over by storm, and massive amounts of data seemed to trump intuition and smart heuristics every time.

It took me a while to adjust, I admit, but by the time I completed my thesis I came to appreciate the power of big data. I can now better see this as an evolution, with heuristics and intuitions driving how we choose to analyze and process the data, even if afterwards it’s all “just” number-crunching.

So on this note, I gave a talk today at work on the topic of extracting semantic knowledge from Wikipedia, touching also on our work on ESA as an illustration of the above. Enjoy!

 

The secret to Facebook’s growth?

Alteregozi.com has recently also been attacked by the wondrous Facebook profile spam comments (I kept two specimens here and here, but deleted many dozens more in the past weeks). At first, I was amused at this new type of spam comment, but after running a few searches I mostly felt embarrassed for being so late to the party, seeing mentions of these from more than a year ago.

So what’s the deal with these comments? They usually don’t include any links, don’t sell anything, and some are really good comments. If you look at the above two, you’ll have a very hard time figuring out they are not real comments. It looks like some spammers harvest comments from legit blogs, and then classify your post to find the most similar comment to stick on it. What is the motivation?

I don’t have the answers myself, but two thoughts:

 

  1. One spam fighting blog claims that the motivation is to establish the credibility of these accounts, so that they can later be used to sell likes on Facebook itself. The plot thickens…
  2. I’ve never seen an account repeating. The number of fake FB accounts being created is probably huge. How much of Facebook’s recent continued growth can be attributed to such fake accounts? Nothing you would hear about in Facebook’s earnings calls.

 


Amazon, Apple, and Application Platforms

Apple is known for keeping a bustling legal department. Steve Jobs reportedly swore to “destroy Android”, the results of which Samsung has felt all too well.

But Apple has more enemies to fight. It holds a complicated relationship with Amazon, which now produces the second best-selling tablet after the iPad and claims it already owns 22% of the US tablet market. That’s a lot of iPads that Apple isn’t selling, and so it readies its own iPad Mini in response.

A less familiar front in this battle is Apple’s “False Advertising” suit against Amazon with regard to the latter’s use of “App Store” for its Android-based application market. Amazon’s response ridiculed this claim, but this does raise the question – what exactly is Amazon’s app store all about?

Amazon’s Kindle store is one strange beast. Kindle apps are in fact re-purposed Android apps, with some added functionality. However, Amazon took care to clearly differentiate the Kindle’s UX and app store from the general Android market. So what is the justification for developing an extra Kindle app?

Every application development platform has its unique core capabilities, which developers can leverage for their own application. Developers get to apply their creative ideas on these assets, while the platform owner enjoys increased engagement for their users, with apps taking these capabilities to places the platform did not even imagine. Facebook’s application platform revolved around the social graph, a unique and very valuable data asset, and Apple provided access to the iPhone’s unique (at the time) features such as its accelerometers and gyroscope, GPS and camera.

Visiting the Amazon Kindle SDK site shows where Amazon feels it has the advantage: 1-click purchasing. This patented Amazon feature (a patent which Apple has actually licensed) can appeal to application developers who feel their application has premium features worth paying for, if only the payment were frictionless. Initial results seemed to validate that, and show excellent revenue per user on Amazon’s platform.

And so, Amazon’s platform says a lot about where Amazon feels its strength lies with the Kindle. Unlike Apple, Amazon builds its success in the tablet market on selling content, much more than on selling devices. Hence, expect the Kindle to continue beating the iPad on price even when the iPad Mini launches.

Out of Context

Sponsored Stories are a brilliant advertising model by Facebook. Just like AdWords in 2000, it’s an example of a model that leverages the core value of the company for advertising, without compromising that value’s authenticity. If your friends liked Starbucks, it was of their own free will and in a public forum, so having Starbucks pay to show this more prominently and to other users can only make sense.

So why is it, then, that a simple amusing case of a 55-gallon drum of lubricant made so many bad headlines for Facebook?

And Facebook has more fronts to fight in its battles for transformation into a revenue-driven company. Timeline may be great for brands, but it’s a magnet for popular revolt. Besides resenting the no-alternative approach Facebook took, why are users so upset about the actual Timeline view, which is surely more visually appealing than the boring wall?

I find the answer to both relates to context.


For the Sponsored Stories it seems pretty clear. “Yes, I linked to a 55-gallon lubricant product, but I did so as a joke”. Well then, Sentiment Analysis still has a long way to go with sarcasm, despite some recent advances right here at the Hebrew University. Sarcasm is one extreme example, but that missing context could even just be that you’re no longer a fan of that company you liked a month ago, and just didn’t get around to unliking it yet.

And what about Timeline? Isn’t it great that all your previous statuses and photos are there, organized along your timeline and telling your story? Well, it is, but only if you care to ensure that it tells the story that you really want to tell. The context of that story may depend on where we were, what we were up to at the time, who our friends were… some of this may not even be possible to reconstruct in the Timeline.

In addition, we are used to our stories dropping off the cliff of the page fold and disappearing into oblivion, so we don’t really care to update them or remove those we don’t feel so proud of anymore. Suddenly, they come back to haunt us with Timeline, and we have to scramble to adjust.

And in a final associative thought: the tiled UX of Timeline does remind me of the Pinterest-mania that has taken hold of every new social curation site. So why does this look like so much fun on Pinterest? Context again. Pinterest has none of it; it’s a pure fun/discovery experience, each tile is independent, and you’re not really trying to follow a thread or cover all that you’ve missed since your last visit. For a social network though, that would be, well, out of context.

Thoughts on Plus

So what’s the deal with Google+? Is Google really taking on Facebook? Is that a classic “me too” play, or something smarter?

It took me a while to figure out my opinion, but several interesting articles got the stars aligned just right for a split second to make some sense (until some new developments de-align them again :-)).

Take a deep breath. OK, here it comes:

Google+ is Google’s take on Social.

Yes, I know, who would have thought?…
It’s just that Google’s definition of Social is a bit different.

At Facebook (and really, for most of us), Social is about conversations with people you actually know.
At Google, Social is the new alias for Personalization.

It’s pretty simple: Google’s business model has always meant the more I know about you, the better I can monetize through more targeted ads. At first, it was all about the search engine being where you always start your surfing, and Google was well seated. As traffic to social networks grew, culminating with Facebook overtaking Google in March 2010, it became increasingly clear that a larger portion of our information was being served to us from social networks. Google was left out.

Why was that so important? Google still had tons of searches, an ever-growing email market share, and a successful news aggregator and RSS reader, among other assets. That’s quite a lot to know about us, isn’t it?

It turned out that the missing link often was the starting point. You would learn about the new thing, the new trend, the new gadget you want to get, while you were out of Google’s reach. By the time you got into the Google web, you may have already got your mind set on what you want to get and even where, making the Google ads a lot less effective.

The Follow versus Friend model is also a huge issue. It means that G+ is about self-publishing and positioning yourself, and not about conversations. That suits Google very well, and is not just a differentiation from FB. This model drives you to follow based on interest, building an interest graph rather than a social graph, which is a lot more useful for profiling you than your social connections.

That interest graph, in turn, makes sure your first encounter with those things that make you tick is inside the Google web. It also links back well to the fine assets that Google holds today, from your docs to your publishing tools. So when Google News announces those funny badges, and you may have thought “Heh, who would want to put these stinking badges on their profiles…” – think again. Their private nature is just fine for Google. It’s a way to ask you to validate your inferred interests: “So tell us, is that interest of yours in US politics that we have inferred from your news reading a real inherent interest, or was it just a transient interest that will melt away after the election?”. Again – big difference for profiling.

Finally, Google+ is positioned to be a professional network. Focusing on interests and letting anyone follow you will keep away the teens and lure the self-proclaimed professionals. In that sense, LinkedIn may have more of a reason for concern, at least as the content network it now tries to be. It’s quite likely that G+ does not even aim to unseat Facebook, only to drain it of its professional appeal, and leave it with what we started with – party/kids photos and keeping track of what those old friends are up to.

I guess I already know what network I’ll be posting a link to this post to…