Tag Archives: Data Mining

Amazon Go isn’t about no-checkout. Here’s what we missed.

After long anticipation and a whole year of delay, Amazon Go finally launched to the general public with much fanfare three weeks ago. Well, that is, if you call opening one small store on the ground floor of your own giant corporate building a “launch”.

The prototype Amazon Go store at Day One, Seattle. By SounderBruce – CC BY-SA 4.0


The move has reignited the debate about the impact this technology will have on the 2.3 million cashiers in the U.S., whose jobs might be eliminated, and on the retail industry as a whole.

But the real question runs deeper. It’s clear that operating an Amazon Go store comes at a major cost. If the only saving is the paychecks of a few cashiers, while the price is installing and maintaining a very large number of advanced sensors – not to mention the upfront cost of developing the technology – the bottom line is clearly not a profitable one.

Furthermore, if all Amazon wants is to remove the need for cashiers, self-checkout has existed for quite some time and is likely much cheaper to install. Walmart took care to announce that it was expanding its “scan and go” app just ahead of the Amazon Go launch – yet another alternative. Why is it so critical for Amazon to eliminate the explicit checkout step altogether?

Is it all perhaps just a publicity stunt? Amazon is not known for pulling such stunts; when it launches something big, it’s usually a genuine strategic move in that direction.

To better understand Amazon’s motivations, we need to go back to how we, the target audience, have been looking at this story, and in particular – the user data story.

If no cashier or RFID reader scans your (virtual) shopping bag, it inherently means that Amazon must have a different way of knowing what you put in it. Using cameras means it will need even more than that – it will need to know which items you picked up merely to look at, and which you eventually decided to purchase.

So, Amazon Go means Amazon should be watching every move you make in the store. In fact, it means Amazon must watch everything you do, or else the whole concept will not work. This requirement that we accept so naturally, this seemingly so obvious conclusion… this is the heart of Amazon’s strategy.

Just think about it: You walk into an Amazon store; your body and face are being scanned and tracked to the point that you can be perfectly recognized; your every move is being tracked and monitored, and all are mapped to a personally identifiable profile tied to your credit card, managed by a powerful private corporation. In any other context, this would trigger a firestorm of privacy and security charges, but for Amazon Go – well, that’s what it takes to deliver on its promise, isn’t it?

What does Amazon gain from this data?

What’s fascinating to notice is that this data enables transferring an entire stack of technologies and methodologies from online to offline, from the app or site to brick-and-mortar – a dramatic gain. Think of the parallels to what we already came to expect online – browse sessions, recommendations, abandoned-cart flows… That item you were considering? Amazon now knows you considered it. Data scientists all over the retail world would love to get their hands on such physical, in-store behavioral data.
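To make the online–offline parallel concrete, here is a minimal sketch of how “considered but not bought” could be derived from an in-store event stream – the physical analogue of an abandoned cart. The event schema and data here are entirely invented for illustration; Amazon has published nothing about its internal data model.

```python
from collections import defaultdict

# Hypothetical in-store event stream: (shopper_id, item, action),
# where the camera/sensor stack emits "pick_up", "put_back",
# or "exit_with_item" events. All names and data are invented.
events = [
    ("alice", "olive oil", "pick_up"),
    ("alice", "olive oil", "put_back"),
    ("alice", "pasta", "pick_up"),
    ("alice", "pasta", "exit_with_item"),
    ("bob", "wine", "pick_up"),
    ("bob", "wine", "exit_with_item"),
]

def considered_but_not_bought(events):
    """Items a shopper handled but ultimately left behind --
    the brick-and-mortar analogue of an abandoned online cart."""
    picked = defaultdict(set)
    bought = defaultdict(set)
    for shopper, item, action in events:
        if action == "pick_up":
            picked[shopper].add(item)
        elif action == "exit_with_item":
            bought[shopper].add(item)
    return {shopper: picked[shopper] - bought[shopper] for shopper in picked}

print(considered_but_not_bought(events))
# → {'alice': {'olive oil'}, 'bob': set()}
```

A signal like Alice’s put-back olive oil is exactly what online retailers already feed into retargeting and recommendation flows.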

For now, the technology may be limited to groceries as a first step. But we can expect Amazon to work to expand it – not just to more locations, but to further verticals. Just think of the personalized recommendations and subscription services such a technology could drive in high-end wine stores, for example.

One indication that Amazon is truly after the data rather than the stores themselves would be if it licenses Go to other retailers or smaller players, which would immediately position it as a data broker. Either way, retailers have yet another good reason to keep a close eye on Amazon’s disruptive moves.

Mining Wikipedia, or: How I Learned to Stop Worrying and Love Statistical Algorithms

I took my first AI course during my first degree, in the early ’90s. Back then it was all about expert systems, genetic algorithms, and search/planning (A*, anyone?). It all seemed clear, smart, intuitive, intelligent…

Then, by the time I got to my second degree in the late ’00s, the AI world had changed dramatically. Statistical approaches had taken the field by storm, and massive amounts of data seemed to trump intuition and smart heuristics every time.

It took me a while to adjust, I admit, but by the time I completed my thesis I had come to appreciate the power of big data. I can now better see this as an evolution, with heuristics and intuitions driving how we choose to analyze and process the data, even if afterwards it’s all “just” number-crunching.

So on this note, I gave a talk at work today on extracting semantic knowledge from Wikipedia, touching also on our work on ESA – itself an illustration of the above. Enjoy!


Web(MD) 2.0

Just when I thought that the uses for recommendation systems were already exhausted…

CureTogether is a site that lets you enter your medical conditions (strictly anonymously – only aggregated data are public) and get recommendations for… other “co-morbid” conditions you may have. In other words: “people who have your condition usually also have that one too – perhaps you have it as well?”

Beyond the obvious jokes, this truly has potential. You don’t only get “recommended” conditions, but also treatments and causes. We all know that sometimes we have our own personal treatment that works only for us. What if it actually works for people with a profile like ours, and sharing that profile anonymously would help similar people as well? So far this direction is not explicit enough in how the site works, possibly for lack of sufficient data, but you can infer it as you go through the questionnaires.
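The “recommendation” at the heart of this is simple co-occurrence counting. Here is a toy sketch of the idea – the profiles and condition names are invented, and CureTogether’s actual algorithm is not public:

```python
from collections import Counter

# Invented toy data: each anonymous profile is a set of reported conditions.
profiles = [
    {"migraine", "back pain"},
    {"migraine", "allergies"},
    {"migraine", "back pain", "insomnia"},
    {"allergies"},
]

def co_reported(profiles, condition, top_n=3):
    """Rank conditions that co-occur with the given one --
    'people who have your condition usually also have...'."""
    counts = Counter()
    for profile in profiles:
        if condition in profile:
            counts.update(profile - {condition})
    return counts.most_common(top_n)

print(co_reported(profiles, "migraine"))
# back pain co-occurs twice, allergies and insomnia once each
```

Real systems would normalize by base rates (otherwise common conditions dominate every list), but even this raw count captures the “perhaps you have it too” suggestion.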

The data mining potential of a resource such as CureTogether’s database is naturally extremely valuable. CureTogether’s founders share some of their findings on their blog. The power of applying computer science analytics and experimentation methodologies – sharpened by web-derived needs – to the social sciences and beyond reminded me of Ben Shneiderman’s talk on “Science 2.0”. The idea that computer science can contribute methodologies that stretch beyond the confines of computing machines is a mind-boggling one, at least for me.

But would you trust collaborative filtering with your health? It’s no wonder that the most popular conditions on the site are far from life-threatening, and tend to be those with unclear causes and treatments, such as migraines, back pain and allergies. Still, the benefit from these alone will probably be enough for most users to justify signing up.

IBM IR Seminar Highlights (part 2)

The seminar’s third highlight for me (in addition to IBM’s social software and Mor’s talk) was the keynote speech by Human-Computer Interaction (HCI) veteran Professor Ben Shneiderman of UMD. Ben’s presentation was quite an experience – not in the sophisticated Lessig way (which Dick Hardt adopted so well for Identity 2.0), but through the sheer amount of positive energy and passion streaming out of this 60-year-old.

[Warning – this post turned out longer and heavier than I thought…]

Ben Shneiderman in front of a Usenet treemap – flickr/Marc_Smith

Ben is one of the founding fathers of HCI, and the main part of his talk focused on how visualization tools can serve as human analysis enhancers, just as the web as a tool enhances our access to information.

He presented tools such as ManyEyes (IBM’s), Spotfire (his own high-tech exit), Treemap (with many examples of spotting trends and outliers) and others. The main point was what the human eye can do using those tools that no predefined automated analysis can, especially in fields such as genomics and finance.

The talk then moved to how to put such an approach to work in search, which, like those tools, is a power multiplier for humans. Ben described today’s search technology as adequate mainly for “known item finding”. The more difficult tasks that today’s search can’t answer well are usually those that are not a “one-minute job”, such as:

  • Comprehensive search (e.g. Legal or Patent search)
  • Proving negation (Patent search)
  • Finding exceptions (outliers)
  • Finding bridges (connecting two subsets)

The clusters of current and suggested strategies to address such tasks are:

  • Enriching query formulation – non-textual, structured queries, results preview, limiting of result type…
  • Expanding result management – better snippets, clustering, visualization, summarization…
  • Enabling long-term effort – saving/bookmarking, annotation, notebooking/history-keeping, comparing…
  • Enhancing collaboration – sharing, publishing, commenting, blogging, feedback to search provider…

So far, pretty standard HCI ideas, but then Ben took this into the second part of the talk. The experimentation employed in these efforts by web players has built an entire methodology, quite different from established research paradigms. Controlled usability tests in the lab are no longer the tool of choice, but rather A/B testing on masses of users with careful choices of system changes. This is how Google/Yahoo/Live modify their ranking algorithms, how Amazon/Netflix recommend products, and how the Wikipedia collective “decides” on article content.
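At its statistical core, such an A/B test is often just a two-proportion z-test on conversion rates. A minimal sketch, with entirely invented numbers (none of the companies mentioned publish their actual procedures):

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates --
    the statistical workhorse behind large-scale A/B testing."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)                  # pooled rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))    # std. error
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF, via the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Invented example: ranking variant B converts 2,300 of 100,000 sessions
# vs. variant A's 2,000 of 100,000.
z, p = two_proportion_z(2000, 100_000, 2300, 100_000)
print(f"z = {z:.2f}, p = {p:.6f}")
```

With samples this large, even a 0.3-point lift is decisively significant – which is exactly why A/B testing at web scale can detect changes no lab study ever could.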

This is where the term “Science 2.0” pops up. Ben’s thesis is that some of society’s great challenges today have more to learn from computer science than from traditional social science. “On-site” and “interventionist” approaches should take over from controlled laboratory approaches when dealing with large social challenges such as security, emergency, and health. You (government? NGOs? web communities?) could make careful changes to how specific social systems actually work, in real life, then measure the impact, and repeat.

This may indeed sound like a lot of fluff, as some think, but the collaboration and decentralization demonstrated on the web can be put to real-life uses. One example from HCIL is the 911.gov project for emergency response – emergency being a classic case where centralized systems collapse. Decentralizing the report and response circles can leverage the power of the masses beyond the Twitter-journalism effect.