Over The Counter Culture

Staring at the sun

The semantic elephant in the room - Google will settle the "top down vs. bottom up" debate for us

Here is a useful primer on what some people (perhaps not the best advised) are calling Web3.0.

The fundamental principle of semantifying data is that information becomes more easily found and understood by computers. Mix that with AI and you’ve got some very, very powerful, useful tools for information gathering, processing and decision making!

So why is Google - the information linchpin of the Internet, and thus of modern society - not THE focus of attention in all this hype about Web3.0?

This is a company with around five THOUSAND(1) computer scientists devoted to improving its search engine (~35,000 man-hours a day). SURELY they’re building some amazing semantic IP that will help cement their dominance.

A big debate in the semantic field at the moment is whether the best approach is ‘top-down’ or ‘bottom-up’:

  1. Bottom-up: when information is created, it is annotated with machine-readable tags. Technologies like RDF, OWL and microformats (and, to a basic extent, XML) do this. Bottom-up semantics got a big boost this week when Yahoo announced it was adding RDF descriptors to its pages.
  2. Top-down: when a Google machine finds a document on the web, it reads it and understands the information. That’s very, very advanced computer science (according to my housemate), but that way, when a machine reads a page about Gash, it figures out whether the page is talking about a physical injury, a woman, or a vagina. That’s important if your kid is using Google to learn about first aid… An example of a top-down semantic tool is Dapper.net.
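
To make the contrast concrete, here is a minimal Python sketch of both approaches. Everything in it is an assumption made up for illustration: the HTML snippet, the `fn`/`org` class names (borrowed from the hCard microformat) and the hand-written cue-word lists are toy stand-ins, not how Yahoo, Google or Dapper actually work.

```python
from html.parser import HTMLParser

# Bottom-up: the publisher has annotated the page, so a machine only has
# to read the tags. This markup is a hypothetical hCard-style snippet.
ANNOTATED_PAGE = (
    '<div class="vcard"><span class="fn">Jane Doe</span> works at '
    '<span class="org">Example Ltd</span></div>'
)


class MicroformatReader(HTMLParser):
    """Collects the text of elements whose class is listed in WANTED."""

    WANTED = {"fn", "org"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        hits = [c for c in classes if c in self.WANTED]
        self._current = hits[0] if hits else None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def read_annotations(html):
    reader = MicroformatReader()
    reader.feed(html)
    return reader.fields


# Top-down: nobody annotated the page, so the machine has to infer meaning
# from context. Real systems use statistical models; this toy version just
# counts hand-picked cue words for each sense of an ambiguous term.
SENSE_CUES = {
    "injury": {"wound", "bleeding", "bandage", "stitches"},
    "slang": {"insult", "slang", "offensive", "vulgar"},
}


def guess_sense(text):
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(read_annotations(ANNOTATED_PAGE))
print(guess_sense("Clean the gash, apply a bandage and check the bleeding."))
```

The bottom-up half is trivial to write, but only works if the site owner added the tags. The top-down half works on any page, but all the difficulty moves into the model that does the guessing.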

Bottom-up requires everyone on the Web to ‘play ball’ and change their site. There are big discussions about what format to use, etc. But Google’s withdrawal from these debates suggests that it’s working on top-down semantics and doesn’t need to weigh in on what people do to their sites. Three reasons why:

  1. Google knows that humans are frankly crap at describing and organising things. That’s why Google search worked in the first place, and human-edited directories (like DMOZ, which I was once an editor for, or early-days Yahoo) didn’t. It went out and found pages, and decided their relative importance, so humans don’t have to. Likewise, with Gmail, it pioneered the folder-less email service - you just search for the email you need, rather than sorting every email into folders in case you want it later.
  2. For all this talk of Web3.0, Google is actually quite far down the road with understanding the closeness of a website’s content to what you searched for, and discarding irrelevant results. Incorporating better top-down semantics into its search algorithm doesn’t require it to change a THING about the Internet, or the way Internet users behave. Google.com will still look the same; the only difference is that you will be able to use full sentences when you search, to better describe what you want it to find; e.g. “pages about animals like my goldfish” (which would return results about angel fish, clown fish, etc.)
  3. If Google encourages bottom-up, it means each website does the heavy lifting; and any jackass coder can build a tool to leverage that, without too much difficulty. But with top-down, Google retains scarcity/monopoly power, because nobody (except Microsoft) can match the manpower needed to build that kind of IP. Top-down semantics are a technical challenge for Google. But bottom-up semantics would challenge Google’s business. It has the workforce to deal with technical challenges better than anyone. But marketplace evolution? Trickier.
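
The goldfish query in point 2 can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `IS_A` taxonomy is hand-written and hypothetical, and the " like my " pattern match is a crude stand-in for real language understanding. Nothing here reflects how Google actually parses queries.

```python
# Hypothetical is-a taxonomy: term -> direct parent category.
IS_A = {
    "goldfish": "fish",
    "angel fish": "fish",
    "clown fish": "fish",
    "fish": "animal",
    "labrador": "dog",
    "dog": "animal",
}


def siblings(term):
    """Return the other terms that share the same direct parent as `term`."""
    parent = IS_A.get(term)
    if parent is None:
        return []
    return sorted(t for t, p in IS_A.items() if p == parent and t != term)


def expand_query(query):
    """Turn '... like my X' into search terms for things similar to X."""
    q = query.lower().strip()
    _, _, subject = q.partition(" like my ")
    if not subject:
        return [q]  # no pattern found: fall back to plain keyword search
    return siblings(subject)


print(expand_query("pages about animals like my goldfish"))
# ['angel fish', 'clown fish']
```

The user types a sentence and the search box looks the same as ever; all of the semantic work happens on the search engine's side, which is the point of the top-down argument.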

If you take it as given that Google will succeed at whichever semantic approach it chooses, and you accept my reasoning that it can only opt for top-down semantics, and you accept that Google is a major Internet trendsetter (e.g. what Gmail did for inbox storage allowances), you reach the following plausible conclusion:

Google will settle the semantic web debate once and for all, kill bottom-up initiatives dead in the water, and build a top-down semantic web search engine that will cement the big G’s position as a market leader in web search.

That’s a warning to investors and coders who are interested in any bottom-up (and even, to an extent, top-down) semantic web startup. And if it settles the debate, perhaps man-hours won’t be wasted on the wrong approach to organising information on the web. Far better the Dapper approach.

(1) 16,805 total employees (source: http://www.google.com/press/pressrel/revenues_q407.html) times “We’re so serious about improving search that more than a third of our people are working on it” (http://graemethickins.typepad.com/graeme_blogs_here/2008/03/googles-annual.html)
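
The footnote's arithmetic, and the ~35,000 man-hours figure in the post, can be checked in a few lines of Python. The seven-hour working day is an assumption (the post doesn't state one); it is simply the value that makes 5,000 people come out at 35,000 hours.

```python
TOTAL_EMPLOYEES = 16_805   # from the Q4 2007 press release cited in the footnote
SEARCH_SHARE = 1 / 3       # "more than a third", so treat this as a lower bound
HOURS_PER_DAY = 7          # assumption: not stated in the post


def estimate(total_employees, search_share, hours_per_day):
    """Return (people working on search, man-hours per day), both rounded."""
    staff = total_employees * search_share
    return round(staff), round(staff * hours_per_day)


print(estimate(TOTAL_EMPLOYEES, SEARCH_SHARE, HOURS_PER_DAY))  # (5602, 39212)
print(5_000 * HOURS_PER_DAY)  # 35000, the rounded-down figure used in the post
```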

del.icio.us Tags: semantic web,web 3.0,dapper,google,search engines


This entry was posted on Wednesday, March 26th, 2008 at 8:03 pm and is filed under Musings. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.


    Viewing 6 Comments

      Stephen Paul Weber 7 months ago 1 point


      One problem with your argument -- Google supports microformats (hAtom in Blogger, SGAPI) and APP
      Pete 7 months ago 1 point


      Philippe

      Most of the development on the web is bottom-up and will be regardless of Google or even the money involved. There are over a billion people on the web, and even if only a small percentage of hobbyists are working on bottom-up semantic search, this dwarfs Google. I am currently adding SPARQL to my own project that has yet to turn a dime:

      http://localhero.biz/

      Semantic search kills one of Google's advantages (accuracy) but it doesn't really have trust or relevance built in, so time will tell...
      Philippe Bradley 6 months ago 1 point


      what do you class as being different between accuracy and relevance?
      Pete 6 months ago 1 point


      Maybe precision is a better word. An example should suffice:

      If someone is looking for rugby teams in Melbourne, a search of Pete's semantic store would probably return the website of the Old Xavs rugby team here:

      http://shawcup.localhero.biz/

      This is 100% precise in that I do assert that the Old Xavs are a rugby team in Melbourne. It's probably not relevant to most people, nor is my assertion necessarily trustworthy (I could be lying). A perfect search mechanism would nail all three aspects.
      Philippe Bradley 6 months ago 2 points


      ah, I understand now, thanks Pete.
      Q dub 6 months ago 1 point


      Sharp point about Google's competitive advantage.

      Top-down and bottom-up can work together in clever ways. By inserting non-disruptive prompts when content is being generated (e.g. "WordPress thinks this is an event. Yes/No?") we can create microformatted content for the basics and leave the heavy lifting, such as the "Gash" problem, to the AI geniuses.

    Trackbacks


    • Semantic Web « Richard Eskins

      March 27, 2008 at 2:27 pm

      [...] Semantic Web Patterns: A Guide to Semantic Technologies by Alex Iskold http://www.readwriteweb.com/archives/semantic_web_patterns.php#comments-open The semantic elephant in the room - ...
