Vanity Press or Monopoly Busters?

A few months ago, I got an email from “iConcept Press” inviting me to write a book chapter in their IR journal based on my AAAI paper. I ignored it, like I ignored another email in a similar vein from another “publishing house”, and found at least one blogger who was just as suspicious of this seemingly mass solicitation.

You see, in the academy we are conditioned to believe that the lower the chances of acceptance, the better the venue for publishing, so if you’re willing to accept me to your club right from the start – huh, forget it!

A couple of weeks ago I got another mail from them. This time, the happy bunch invited me to be a reviewer on one of their books. Now, that was really amusing – if not a writer, then I’d be a reviewer? Pathetic, I thought. But is the picture really this simple?

It was interesting, first, to see that they do actually use a peer-review system, even if perhaps not a super-duper double-blind system. And then I started wondering, is that conditioning for favoring low-acceptance publications really still relevant in the self-publishing era?

I remember that when I published my first paper at AAAI, I was quite outraged at the idea that you have to pay, then give away all copyrights, and then be used as money bait for readers, since the publication terms meant I could not give my own readers free access unless I paid again. At a time when publishing your words on the web is such a common privilege, that seems plain wrong.

Back in the times when publishing was a costly process, high selectivity guaranteed that subscribers wouldn’t waste their money sponsoring the printing of low-quality papers. Furthermore, anything not printed had a very low chance of getting read by other researchers, not to mention cited, and so readers relied on editors to indeed include only the best. Nowadays, papers are read mostly online, and if your paper is accessible to search engines, that suffices – whoever finds your research useful will read and cite it. This Wikipedia entry has the whole story in a nutshell.

So as for myself – I still have not published or reviewed with iConcept Press, but I am now less dismissive of this somewhat disruptive industry; not because it will win over the established venues, but because it will accelerate the move towards decentralized and online publishing, a better fit for our era.

Death of a News Reader

Dave Winer says I don’t read his posts. He’s right, I admit. I skim.

I’m overloaded. So in the past few months I’ve gradually reduced my subscription list from over 50 feeds to around a dozen, and at the same time increased my reliance on Genieo, which claims to be already tracking 537 feeds for me (though not all are ones I would really fully subscribe to, but that’s the beauty of it…)

When trying to understand what had happened, I came to realize my reader subscription list was made up of two types of feeds:

  1. Feeds that are generally on topics I’m interested in
  2. Blogs where I thought the author was interesting or smart

Type #1 is, being practical, simply not scalable. There are just too many good sources out there, and not all posts in them are really read-worthy for me, even if just to skim through. So I let Genieo discover those feeds (just clicking through to some posts) and then removed them from my subscription list. It’s amazing how good it feels to safely eliminate a feed from your reader (“…yes, I am sure I want to delete!” :))

Type #2 is more tricky as I would usually be interested in all of the posts even if not in my topics of interest. These include blogs by friends, and blogs by smart people I stumbled upon who seemed worth following. I also wouldn’t want Genieo (or any other learning reader for that matter) to think I’m generally interested in those more random topics and clutter my personalized feed. So I still kept this much shorter list in my reader, but I know I can visit them a lot less frequently and not lose anything.

This combination has been working well for me in recent months. Social diet hurray!

Web(MD) 2.0

Just when I thought that the uses for recommendation systems were already exhausted…

CureTogether is a site that lets you enter your medical conditions (strictly anonymous, only aggregated data are public), and get recommended for… other “co-morbid” conditions you may have. In other words, “people who have your disease usually also have that one, perhaps you have it too?”

Beyond the obvious jokes, this truly has potential. You don’t only get “recommended” for conditions, but also for treatments and causes. We all know that sometimes we have our own personal treatment that works only for us. What if it works for people with our profile, and sharing that profile, anonymously, will help similar people as well? So far this direction is not explicit enough in how the site works, possibly for lack of sufficient data, but you can infer it as you go through the questionnaires.
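To make the “people who have your disease usually also have that one” idea concrete, here is a minimal co-occurrence sketch. I have no idea how CureTogether actually computes its suggestions; the profiles below are made up, and `cooccurrence_counts` and `suggest` are hypothetical names for illustration only.

```python
from collections import Counter
from itertools import combinations

# Made-up data: each set is one anonymous user's reported conditions.
profiles = [
    {"migraine", "insomnia"},
    {"migraine", "insomnia", "allergy"},
    {"migraine", "back pain"},
    {"allergy", "asthma"},
]

def cooccurrence_counts(profiles):
    """Count how many profiles contain each ordered pair of conditions."""
    pair_counts = Counter()
    for conditions in profiles:
        for a, b in combinations(sorted(conditions), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts

def suggest(condition, profiles, top_n=3):
    """Rank other conditions by the share of `condition` reporters who also report them."""
    pair_counts = cooccurrence_counts(profiles)
    base = sum(1 for p in profiles if condition in p)
    if base == 0:
        return []
    scores = {b: n / base for (a, b), n in pair_counts.items() if a == condition}
    # Highest share first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

suggestions = suggest("migraine", profiles)
```

The same counting would work for treatments instead of conditions, which is roughly the “what works for people with my profile” direction, assuming enough people fill in the questionnaires.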

The data mining aspect of having a resource such as CureTogether’s database is naturally extremely valuable. CureTogether’s founders share some of their findings on their blog. The power of applying computer science analytics and experimentation methodologies – sharpened by web-derived needs – to social sciences and others, reminded me of Ben Shneiderman’s talk on “Science 2.0”. The idea that computer science can contribute methodologies that stretch beyond the confines of computing machines is a mind-boggling one, at least for me.

But would you trust collaborative filtering with your health? It’s no wonder that the most popular conditions on the site are far from life-threatening, and are ones with unclear causes and treatments, such as migraines, back pains and allergies. Still, the benefit on these alone will probably be sufficient for most users to justify signing up.

Facebook account is down. Is the Internet down?

My Facebook account was “temporarily unavailable due to site maintenance” today.

Seems like I’m far from the first person this happened to. It’s common enough to make it into Facebook’s FAQ.

So – no big deal, right? Just had to wait a while with uploading photos from today’s trip with the kids, a little annoyance, nothing more. Then I wanted to check in the meantime what’s up on another site. Guess what I used as a login there? Yep, Facebook Connect. No login for you!

Facebook may be getting away with it for now, as it seems like these “maintenance” downtimes didn’t create negative buzz for Facebook Connect’s position as the identity of choice for many avid FB users. But watch as more of these incidents start raising awareness of the implications of relying on Facebook as an identity provider. We’ll then realize it’s another point of failure on our way to our favorite sites, and one with no simple workaround.

Truth be told, this is not a Facebook issue, it’s an issue for centralized identity providers. If WordPress.com were down, my OpenID identity would be down just the same. With unified identity comes a unified point of failure…

Yahoo Gives Up on Social Search

In an interview that strangely made headlines only in Indian tech blogs, Yahoo Research Labs’ Chief Prabhakar Raghavan declared that Yahoo will not replace its search with Bing. OK, the Yahoo-Microsoft deal is not really off, but the deal details turn out to imply that Yahoo will only use Microsoft search technology as the backend, and keep building its own smart front-end to it that will make use of Yahoo’s content assets. Raghavan says:

“Yahoo will not use Bing. Bing is a branded search engine that Microsoft is building on top of its search back-end and we will build our own search front-end on that same Microsoft back-end. It (using Bing) is not the case, at least as envisioned at the moment”

This actually makes perfect sense. Stop spending tons of resources on crawling and ranking in a futile war with Google, and focus on building the user experience over it, leveraging Yahoo’s advantage – content. Raghavan mentions scenarios that sound a lot like Yahoo shortcuts (that’s really old news) as one example of how to deliver a more complete experience over commodity search results.

The article then goes on to discuss the second focus for Yahoo, social applications, and mentions Microsoft’s tie-up with Facebook for access to the social graph. Raghavan is quoted as saying:

“Social networks are not just a place to hang out, but to get things done. It predates the web.. I’m not sure where the sweet spot is, we’re still doing research on it”

Also makes perfect sense. With Google as a common enemy, and Microsoft a Facebook partner, Yahoo may be better positioned to deliver social applications that leverage the de-facto standard of the Facebook graph, rather than push its own failed networks.

So why is my post title suggesting what it’s suggesting??


There is one catch in sub-contracting your search results: you are now limited in what you can do in search ranking. The best you can do is re-rank the set of results Microsoft’s technology supplied you with before presenting it to the user. As I’ve pointed out in the past when talking about Delver’s technology, social (graph-based) search is a game that cannot be played by re-ranking, since it’s a classic long tail problem. So when you can’t interfere with how search results are ranked, you also can’t deliver true social search, as Google recently did. One less social application Yahoo can build…
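A toy sketch of why re-ranking is not enough, under loudly stated assumptions: the index, the authors and the cut-off of 10 are all invented, and `backend_search` and `social_rerank` are hypothetical stand-ins, not anything Yahoo or Microsoft actually exposes.

```python
def backend_search(ranked_index, k):
    """Stand-in for the licensed backend: it returns only its own top-k documents."""
    return ranked_index[:k]

def social_rerank(results, friend_authors):
    """Front-end re-ranking: move results by friends to the top, keep order otherwise."""
    # sorted() is stable, and False sorts before True, so friends' docs come first.
    return sorted(results, key=lambda doc: doc["author"] not in friend_authors)

# Invented global ranking of 1000 documents, all by strangers...
index = [{"id": i, "author": f"stranger{i}"} for i in range(1000)]
# ...except one friend's post near the head and another deep in the long tail.
index[1] = {"id": 1, "author": "alice"}
index[742] = {"id": 742, "author": "bob"}
friends = {"alice", "bob"}

# What a front-end can do: re-rank the 10 results the backend hands over.
top10 = social_rerank(backend_search(index, 10), friends)

# What it could do with control over ranking across the whole index.
full_access = social_rerank(index, friends)[:10]
```

Here `top10` promotes alice’s post but can never contain bob’s, which sits at position 742 of the backend ranking, while `full_access` surfaces both. Most of your friends’ content lives in that tail.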

Searching for Faceted Search

Just finished reading Daniel Tunkelang’s recently published book on Faceted Search. I read Daniel’s blog (“The Noisy Channel”) regularly, and enjoy his good mix of IR practice with emphasis on Human-Computer Interaction (HCI). With faceted search tasks on the roadmap at work, I wanted to better educate myself on the topic, and this one looked like a good read, with the cover promising:

“… a self-contained treatment of the topic, with an extensive bibliography for those who would like to pursue particular aspects in more depth”

With 70 pages, the book reads quickly and smoothly. Daniel provides a fascinating intro to faceted search, from early taxonomies, to facets, to faceted navigation and on to faceted search. He adds an introductory chapter on IR, which is a worthwhile read even for IR professionals, with some interesting insights. One is how ranked retrieval, which we all grew so accustomed to, blurred the once clear border of relevant vs. non-relevant that set retrieval enforced. Daniel suggests that this issue is significant for faceted search, being a set-retrieval oriented task, and a pingback on his blog led me to a fascinating elaboration on this pain in another fine search blog (recommended read!).

With such elaborate introductory chapters and more on faceted search history, not much is left though for the actual chapters on research and practice, and as a reader I felt there could be a lot more there. But then, it is reasonable to leave a lot to the reader and just give a taste of the challenges, to be later explored by the curious reader from the bibliography.

However, that promise of an extensive bibliography somewhat disappointed me… With 119 references, and only about a quarter being academic publications from the past 5 years, I felt a bit back to square one. I was hoping for more of a literature survey and pointers when discussing the techniques for those tough issues, such as how to choose the most informative facets for a given query or how to extract facets from unstructured fields. Daniel provides some useful tips on those, but reading more on these topics will require doing my own literature scan.

In any case, for a newcomer with little background in search in general and faceted search in particular, this book is an excellent introduction. Those more versed in classic IR who are moving into faceted search will find the book an interesting read but probably not sufficient as a full reference.

The (Filtered) Web is Your Feed

A few months ago I was complaining here about my rss overload. A commenter suggested that I take a look at my6sense, a browser extension (now also iPhone app) that acts as a smart RSS reader, emphasizing the entries you should be reading. I wanted to give my6sense a go then, but the technical experience was lousy, and moreover – I was expected to migrate my rss reading to it. Too much of a hassle, I gave it up.

In the past few weeks I’ve been test-driving a new player – Genieo, which takes the basic my6sense idea a few steps further. Genieo installs an actual application, not just an extension, that plugs into your browser. It tracks your rss feeds automatically, simply by looking for rss feeds in the pages you’re browsing, and learns your feeds without any setup work.

Genieo then goes further to discover feeds on pages you visit even if you’re not subscribed to them, turning your entire browsing history into one big rss feed.  It finally filters this massive pool of content using a semantic profile it builds for your interests, based on analyzing the text you’ve read so far.
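I don’t know how Genieo finds feeds internally, but the usual way a page advertises its feed is a `<link rel="alternate">` tag in the HTML head, so a minimal discovery sketch could look like this. `FeedLinkParser`, `discover_feeds` and the sample page are hypothetical names and data, used here for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            # Resolve relative hrefs against the URL of the visited page.
            self.feeds.append(urljoin(self.base_url, attr["href"]))

def discover_feeds(html, base_url):
    """Return the feed URLs a page declares, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds

sample_page = """
<html><head>
  <link rel="alternate" type="application/rss+xml" href="/feed/">
  <link rel="stylesheet" href="/style.css">
</head><body>A post</body></html>
"""
found = discover_feeds(sample_page, "http://example.com/blog/some-post")
```

Run over every page you visit, something like this is enough to turn a browsing history into a pool of candidate feeds, with no subscription step.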

For IR people this may sound a lot like Watson, Jay Budzik’s academic project turned contextual search turned an advertising technology acquisition. Watson approached this problem as a search problem: how would I formulate search queries that would run in the background, fetching me the most relevant documents that match the user’s current context? Problem is, users are not constantly searching, and would quickly get annoyed at being shown general search results they did not ask for.

The good thing about an rss feed is that it explicitly says “this is a list of content items to be consumed from this source”, and its temporal nature provides a natural preference ranking (prefer recent items), so a heuristic of “users would be interested in recent and relevant items from feeds in pages they visited” works around the general search difficulty pretty well. Genieo circumvents the expected privacy outcry by running the entire logic on the client side; nothing of the analyzed data leaves your PC (privacy warriors would probably run sniffers to validate that).
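The “recent and relevant” heuristic can be sketched in a few lines. This is my own toy scoring, not Genieo’s: relevance here is plain term overlap with an interest set (a crude stand-in for a real semantic profile), recency is an exponential decay with an assumed 24-hour half-life, and all names and items are invented.

```python
from datetime import datetime, timedelta

def score(item, interest_terms, now, half_life_hours=24.0):
    """Relevance (share of title words in the profile) times a recency decay."""
    words = item["title"].lower().split()
    if not words:
        return 0.0
    relevance = sum(1 for w in words if w in interest_terms) / len(words)
    age_hours = max((now - item["published"]).total_seconds() / 3600.0, 0.0)
    recency = 0.5 ** (age_hours / half_life_hours)  # halves every half_life_hours
    return relevance * recency

def rank(items, interest_terms, now):
    """Order feed items from most to least worth reading."""
    return sorted(items, key=lambda it: score(it, interest_terms, now), reverse=True)

now = datetime(2009, 11, 1, 12, 0)
interests = {"social", "search"}
items = [
    {"id": "old-relevant", "title": "Social search update", "published": now - timedelta(hours=48)},
    {"id": "fresh-offtopic", "title": "Cooking pasta tonight", "published": now - timedelta(hours=1)},
    {"id": "fresh-relevant", "title": "Social search update", "published": now - timedelta(hours=2)},
]
ranked_ids = [it["id"] for it in rank(items, interests, now)]
```

With this data the fresh on-topic item comes first, the two-day-old one second, and the fresh off-topic one last. Multiplying the two factors means an item needs both to rank well; a weighted sum would be the obvious alternative if you wanted very fresh items to show up regardless.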

In my personal experience, the quality of most results is excellent, and they are almost always posts that would interest me. Genieo quickly picked up my feed subscriptions from clicks I made in my reader to the full article in a browser window (from which it extracted the rss feed), and after a while I could see it gradually picking up on my favorite memes (search, social and others). I did not give up my rss reader for Genieo yet, and I also still have many little annoyances with it, but overall for an initial version, it works surprisingly well.

However, the target audience that is even more suited for Genieo is not rss-savvy users like me, but the masses out there who don’t know and don’t care about reading feeds. They just want interesting news, and they don’t mind missing out on the full list (a-la Dave Winer’s “River of News” concept). Such users will find tools like Genieo as useful as a personal news valet can be.

What is Facebook’s Endgame with Open Graph API?

On Thursday, Facebook outlined some of its platform roadmap plans for developers. One of the items on the long list was called the “Open Graph API”, and with such a name it was sure to raise some interest.

Details were scarce, but the general message coming out of Facebook is that the Open Graph API will allow any site to embed a Facebook page in it, allowing the site owner to set status messages, share links etc., without visiting Facebook itself, and more importantly without sending its visitors to Facebook.

That sounded like a feature aimed primarily at brands, or as Ethan Beard of Facebook presented it: “This will be good for brands like Coke.” Makes perfect sense, as these brands are already using Facebook as part of their social media efforts, but would prefer to have it done on their site rather than on Facebook itself.

Thinking deeper into where Facebook is heading, though, I would think there is a bigger endgame to all this. We already know that Facebook wants us to consider it as our online identity. So it allows you to reuse that Facebook identity on other websites and sign in using Facebook Connect. That’s one side of the coin. And then the other side of it is, you have your own website or blog where you may publish thoughts, links and photos that you didn’t publish on Facebook. Facebook would clearly want to bridge that gap as well.

Half a year ago, Facebook adopted the emerging Activity Streams standard for publishing and consuming an individual’s lifestream events to lifestreaming frameworks, a standard promoted by open standards evangelist Chris Messina. So that fits nicely into the puzzle now: wouldn’t it be nicer if you could publish all this non-Facebook activity into your Facebook page, which will now be embedded into your personal website, courtesy of the Open Graph API?

The API then is just the funnel through which your activity stream is published back into Facebook. You get to leverage the social graph you already defined and came to like on Facebook, and Facebook gets tighter integration with your life outside of Facebook, if you still had any. Smart move for Facebook.

Google Nails Down Social Search

Google’s Social Search is doing the walk, all the rest are just doing the talk. As soon as I activated the Social Search experiment, my next search yielded a social result. No setting up, showing how I am connected to that result (including friends of friends), showing as part of the standard web results…

Contrast this with Microsoft’s poor attempt at “social search” by indexing tweets and status messages and showing them regardless of the actual searcher (example search, you’ve got to be on “United States” locale on Bing to see it).

Then also contrast it with Facebook’s announcement back in August of its implementation of searching within friends’ posts – a less grandiose announcement that nevertheless delivered a far more social experience than Bing’s. Still, it’s a very limited experience and far from being a true information source for any serious search need.

So how does Google overcome the main obstacle – collecting your connections?

Google relies on its own sources and on open sources it can obtain by crawling the social graph. That is the true reason why Facebook is not part of Google’s graph (no XFN/FOAF marking on Facebook’s public pages). Google may be counting on Facebook’s inevitable opening up, and with Gmail’s rising popularity it becomes a reasonable alternative even for Facebook users like me.

Sadly, all this great news gave zero credit to Delver, where it all happened first.

To Tweet or Not to Tweet (hint: that’s not the question)

I was catching up on my RSS overload the other day, when this side note in a post by Naaman on Social Media Multitaskers caught my attention:

“I find that I now blog thoughts that are too long to fit in a tweet; so feel free to follow my tweets…”

"I am the man. I suffered. I was there." CC by 'Kalense Kid'/Flickr

I’m not too much of a media multitasker myself, so I don’t experience this duality first hand, but I can imagine it: you get an interesting thought or experience, then you think, is this major enough to develop into a blog post, for which I’ll go over here, or is it not that heavy / can’t be bothered, in which case I flutter my wings over there. Actually I do experience this, just that in the other case I simply drop it (and excuse me for not considering Facebook status updates an option, that’s stuff for another post…)

This should not have been a dilemma at all, had blogging platforms evolved to accommodate microblogging, which today is somehow seen as the centralized domain of a single commercial company. You really should be able to hop on your publishing platform, write that thought down, regardless of length, and fire it out. No need to figure out which channel to use, and whether the intended readers are indeed following you there. Similarly, your friends/readers should not have to subscribe to your feeds on different platforms but rather consume only one, and rely on a powerful set of rules to filter your stream as they see fit.

Posterous is a great (and fast growing) example of how easy it can be from the blogger’s perspective. Just post it (or rather, email it) and it will get published as needed (e.g. shortened for twitter). But it does not make it any easier on the consumer, who still needs to decide where to best follow this blogger (does he perhaps write additional blog posts directly on his blog that won’t show up on his twitter? or vice versa?) and reduces the basic filtering capability that may have existed when different post types were distributed into the different services.

No need to reinvent the wheel here, blogging platforms are abundant, decentralized and perfectly fit to remain our publishing hub, with their developed CMS and the loose but well-defined social networks. What blogging platforms should do – heck, what Automattic should do to evolve, is:

  1. Conversation – support the realtime conversational nature of short posts, with the right UI and notification mechanisms. The “P2” microblogging-optimized theme released almost two years ago was a good start; sadly it still followed the line of thought of “blog or microblog, not both”. To move forward, Automattic needs to realize that Twitter is not a personality, it’s a state of mind, hence P2 also can’t be a permanent theme, it should be a contextual theme.
  2. Publishing – acquire Posterous. As simple as that. These guys got their fame by understanding the pains of publishing anytime anywhere, they know a thing or two about usability and persuasion, and they have great buzz. The latter is not a luxury – a buzzed-up acquisition makes it very clear that this is a major strategy for you, a lot more than if you’d develop the same changes yourself.
  3. Consuming – that’s the tricky part… how do you embed Twitter and WordPress into the same stream, when each consumer has their own desired blend of it? We don’t want to invent a new technology, RSS is here to stay. We do want better ways of filtering our floods using better tagging coupled with more clever feed options. How exactly – I do hope there’s an entire team at Automattic working exactly on that…