Tantek Çelik’s True Identity Revealed!

This morning I came across a nice little people search demo by Martin Atkins. It’s mainly a wrapper over Social Graph API, but helps illustrate the public social graph. Now, Tantek Çelik is one of the main advocates for Microformats, which in turn generate a lot of the XFN data that feeds SGAPI. So it was quite a surprise to feed his name and see this: 

[Screenshot: tantek-search]

The horror! the horror!!

What happened here? A quick check on SGAPI led to some strange findings. It turns out that Robert Scoble’s old blog at scoble.weblogs.com is listed as strongly connected to Tantek’s blog identity. I then went on to check out that blog – no XFN or FOAF pointing to Tantek there. So where did that come from?

A deeper dive into SGAPI’s more detailed output showed that Scoble was listed as referencing Tantek with both of the XFN attributes “me” and “met”. In plain English, this means that Robert Scoble said “I am Tantek Çelik, plus I also met him in person!”. So what could cause this, other than a serious case of schizophrenia?
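To see how close those two values are, here is a minimal sketch in Python that pulls the rel values out of links. The markup and the tantek.example URL are made up for illustration, and xfn_relations is a hypothetical helper, not anything SGAPI exposes.

```python
from html.parser import HTMLParser


class XFNLinkParser(HTMLParser):
    """Collect (href, set of rel values) for every <a> tag that carries a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        rel = attr_map.get("rel")
        if rel:
            # rel is a space-separated list of values, e.g. "friend met"
            self.links.append((attr_map.get("href"), set(rel.lower().split())))


def xfn_relations(html):
    """Hypothetical helper: return the rel-annotated links found in an HTML fragment."""
    parser = XFNLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up markup: the intended link, and the same link with the 't' dropped.
intended = '<a href="http://tantek.example/" rel="met">Tantek</a>'
typo = '<a href="http://tantek.example/" rel="me">Tantek</a>'

print(xfn_relations(intended))
print(xfn_relations(typo))
```

One missing letter turns “I have met this person” into “this page is also me”, and a parser has no way to tell that it was a slip.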

My humble guess is that Scoble, some time ago, listed Tantek as a “met” contact on his old blog, but with a magnificent little typo, left out the ‘t’. He then discovered the mistake and fixed it. But the Googlebot caught both cases, and added them both as relations. Now why would they do that? Shouldn’t new data replace old data? Well, that’s what other users of SGAPI are asking; see the discussion over at the group. It turns out the SGAPI data is not yet as timely as the main index, and Brad Fitzpatrick promises this will improve soon enough.
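The difference between the two behaviours can be sketched in a few lines of Python. This only illustrates my guess; it is not a description of how Google's pipeline works, and the snapshot format, the host names and both function names are assumptions.

```python
def merge_accumulate(snapshots):
    """Keep the union of every rel value ever seen on each (source, target) edge."""
    edges = {}
    for snapshot in snapshots:
        for edge, rels in snapshot.items():
            edges.setdefault(edge, set()).update(rels)
    return edges


def merge_replace(snapshots):
    """Let each crawl of a source page overwrite what was known about that page."""
    edges = {}
    for snapshot in snapshots:
        recrawled = {source for source, _ in snapshot}
        # Drop stale edges from pages that were crawled again in this snapshot.
        edges = {e: r for e, r in edges.items() if e[0] not in recrawled}
        edges.update({e: set(r) for e, r in snapshot.items()})
    return edges


# Hypothetical crawls of the same page, before and after the typo was fixed.
edge = ("scoble.example", "tantek.example")
crawls = [{edge: {"me"}}, {edge: {"met"}}]

print(merge_accumulate(crawls))
print(merge_replace(crawls))
```

With accumulation the edge ends up carrying both “me” and “met”, which matches what SGAPI shows; with replacement only the corrected “met” survives.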

Considering the upcoming social diet, it had better…

Update: Hadar pointed out another example, where Chris Messina gets identified with TechCrunch UK… it’s indeed reflected in SGAPI, and I tracked this down to erroneous XFN tagging in an obscure 2006 TCUK post. This indeed demonstrates a weakness of the unmoderated, inherently decentralized XFN-based graph building. Still, for now it’s the only open standard we have, until some higher, post-processing open layer emerges.

6 responses to “Tantek Çelik’s True Identity Revealed!”

  1. 🙂 Great post.

    I recently discovered this people search tool under the name of “itswhoyouknow” on Google Code.
    Cool thing, but how can you avoid these horrors?
    Another example – searching “techcrunch” will show Chris Messina in the results as well.

  2. Hi Hadar, thanks.
    Indeed not simple at all. Have a look here for more such potential horrors. What you mention on TechCrunch happens when people mistakenly use rel=”me” for general links in their blogroll (for example WordPress or Google), but what query did you use? I don’t see Messina’s result there…

  3. The query was simply “techcrunch”
    There are probably more examples…
    IMHO XFN is too sensitive and error prone.
    The more it will be specified by humans (and not
    machines), the more we’ll get such mistakes.

    Also, the slideshow looks interesting, and I read Karmona’s post about the SG challenges…
    Thanks for the info.
    I wonder, apart from these great tips, do you (Delver) have an API for developers? Help us overcome the mighty graph 🙂

    Thanks,
    Hadar

  4. I see it now, excellent spotting, I narrowed it down to a real page in TCUK and updated the post. Thanks!

    I agree that XFN has its shortcomings but most of the mistakes I see come from those darn humans, who misinterpret what it means, not the machines. The answer to this is to provide them in a context where they immediately become useful, thus mistakes will immediately harm your experience, so you go and fix it.
    As for API, well we’re not there yet, still so much to process before we can offer something useful, but it will come. In the meantime, SGAPI is a fantastic resource, perhaps you want to share where it doesn’t answer your need and perhaps I can offer an idea.

  5. FWIW FOAF aggregators run into very similar problems. In FOAF we uniquely identify people through any of several properties that are marked ‘inverse functional’ in the schema, eg. homepage, personal mailbox, weblog, etc. Inevitably these are sometimes published incorrectly, screwing up data merging. Pretty much inevitable, but I hope better validators / checking tools, and in general having more code that does stuff with the data, will help to keep things reasonably tidy.
