Over The Counter Culture

Staring at the sun

The semantic elephant in the room – Google will settle the "top down vs. bottom up" debate for us

Here is a useful primer on what some people (perhaps not the best advised) are calling Web3.0.

The fundamental principle of semantifying data is that information becomes more easily found and understood by computers. Mix that with AI and you’ve got some very, very powerful, useful tools for information gathering, processing and decision making!

So why is Google – the information lynchpin of the Internet, and thus, of modern society – not THE focus of attention in all this hubbub about Web3.0?

This is a company with around five THOUSAND(1) computer scientists devoted to improving their search engine (~35,000 man hours a day). SURELY they’re building some amazing semantic IP that will help cement their dominance.

A big debate in the semantic field at the moment is whether the best approach is ‘top-down’ or ‘bottom-up’:

  1. Bottom-up: when information is created, it is annotated with machine-readable tags. Technologies like RDF, OWL and microformats (and, to a basic extent, XML) do this. Bottom-up semantics got a big boost this week when Yahoo announced it was adding RDF descriptors to its pages.
  2. Top-down: when a Google machine finds a document on the web, it reads it and understands the information. That’s very, very advanced computer science (according to my housemate), but that way, when a machine reads a page about Gash, it figures out whether the page is talking about a physical injury, a woman, or a vagina. That’s important if your kid is using Google to learn about first aid. An example of a top-down semantic tool is Dapper.net.
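To make the two approaches concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the HTML snippet, the class names (borrowed loosely from the hCalendar microformat), the sense labels and the cue-word lists. The keyword-overlap guesser is a toy stand-in for top-down analysis, not a description of how Google or Dapper actually work.

```python
from html.parser import HTMLParser


# Bottom-up: the publisher has annotated the page, so the reader
# only has to pick out the tagged fields.
class AnnotationReader(HTMLParser):
    """Collect text from elements whose class is a known property name.

    Simplification: assumes well-nested markup with no void tags (<br> etc).
    """

    def __init__(self, properties):
        super().__init__()
        self.properties = set(properties)
        self.fields = {}
        self._stack = []  # class attribute of each currently open element

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current in self.properties and data.strip():
            self.fields[current] = data.strip()


def read_annotations(html, properties=("summary", "dtstart")):
    reader = AnnotationReader(properties)
    reader.feed(html)
    return reader.fields


# Top-down: nothing is annotated, so the machine has to guess the meaning
# of an ambiguous word from the words around it.
SENSE_CUES = {  # hypothetical cue words for each sense of "gash"
    "injury": {"wound", "bleeding", "bandage", "stitches", "cut"},
    "slang": {"insult", "vulgar", "slur", "offensive"},
}


def guess_sense(text):
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


ANNOTATED_PAGE = (
    '<div class="vevent">'
    '<span class="summary">First aid course</span> on '
    '<span class="dtstart">2008-04-02</span>'
    "</div>"
)

print(read_annotations(ANNOTATED_PAGE))
print(guess_sense("Clean the gash, stop the bleeding and apply a bandage."))
```

The bottom-up half is a few lines that anyone can write once the tags exist; the top-down half is where the real difficulty sits, and a production system would need far more than word overlap.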

Bottom-up requires everyone on the Web to ‘play ball’ and change their site. There are big discussions about what format to use, etc. But Google’s withdrawal from these debates suggests that it’s working on top-down semantics and doesn’t need to weigh in on what people do to their sites.

  1. Google knows that humans are frankly crap at describing and organising things. That’s why Google search worked in the first place, and human-edited directories (like DMOZ, which I was once an editor for, or early-days Yahoo) did not. Google went out and found pages, and decided their relative importance, so humans don’t have to. Likewise, Gmail pioneered the folder-less email service: you just search for the email you need, rather than sorting every message into folders so you can find it later.
  2. For all this talk of Web3.0, Google is actually quite far down the road with understanding the closeness of a website’s content to what you searched for, and discarding irrelevant results. It doesn’t have to change a THING about the Internet, or the way Internet users behave, to incorporate better top-down semantics into its search algorithm. Google.com will still look the same; the only difference is that you will be able to use full sentences when you search, to better describe what you want it to find, e.g. “pages about animals like my goldfish” (which would return results about angel fish, clown fish, etc).
  3. If Google encourages bottom-up, it means each website does the heavy lifting; and any jackass coder can build a tool to leverage that, without too much difficulty. But with top-down, Google retains scarcity/monopoly power, because nobody (except Microsoft) can match the manpower needed to build that kind of IP. Top-down semantics are a technical challenge for Google. But bottom-up semantics would challenge Google’s business. It has the workforce to deal with technical challenges better than anyone. But marketplace evolution? Trickier.

If you take it as given that Google will succeed at whichever semantic approach it chooses, and you accept my reasoning that it can only opt for top-down semantics, and you accept that Google is a major Internet trendsetter (e.g. what Gmail did for inbox storage allowances), you reach the following plausible conclusion:

Google will settle the semantic web debate once and for all, kill bottom-up initiatives dead in the water, and build a top-down semantic web search engine that will cement the big G’s position as a market leader in web search.

That’s a warning to investors and coders who are interested in any bottom-up (and even, to an extent, top-down) semantic web startup. And if it settles the debate, perhaps man hours won’t be wasted on the wrong approach to organising information on the web. Far better the Dapper approach.

(1) 16,805 total employees (source: http://www.google.com/press/pressrel/revenues_q407.html) times “We’re so serious about improving search that more than a third of our people are working on it” (http://graemethickins.typepad.com/graeme_blogs_here/2008/03/googles-annual.html)
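
The footnote's arithmetic, spelled out as a quick sketch. The seven-hour working day is an assumption inferred from the post's own figures (35,000 hours divided by 5,000 people); neither source states it.

```python
total_employees = 16_805       # from the Q4 2007 press release cited above
search_share = 1 / 3           # "more than a third", so this is a lower bound
search_staff = total_employees * search_share

rounded_staff = 5_000          # the post's "around five thousand"
hours_per_day = 7              # assumption: implied by 35,000 / 5,000

print(round(search_staff))             # lower bound on people working on search
print(rounded_staff * hours_per_day)   # man hours per day
```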

del.icio.us Tags: semantic web,web 3.0,dapper,google,search engines

This entry was posted on Wednesday, March 26th, 2008 at 8:03 pm and is filed under Musings.

  • http://singpolyma.net/ Stephen Paul Weber

    One problem with your argument — Google supports microformats (hAtom in Blogger, SGAPI) and APP

  • http://localhero.biz/ Pete

    Philippe

    Most of the development on the web is bottom up and will be regardless of Google or even the money involved. There are over a billion people on the web and even if a small percentage of hobbyists are working on bottom up semantic search this dwarfs Google. I am currently adding SPARQL to my own project that has yet to turn a dime:

    http://localhero.biz/

    Semantic search kills one of Google's advantages (accuracy) but it doesn't really have trust or relevance built in so time will tell...

  • http://www.overthecounterculture.com Philippe Bradley

    What do you class as the difference between accuracy and relevance?

  • http://localhero.biz Pete

    Maybe precision is a better word. An example should suffice:

    If someone is looking for rugby teams in Melbourne, a search of Pete's semantic store would probably return the website of the Old Xavs rugby team here:

    http://shawcup.localhero.biz/

    This is 100% precise in that I do assert that the Old Xavs are a rugby team in Melbourne. It's probably not relevant to most people, nor is my assertion necessarily trustworthy (I could be lying). A perfect search mechanism would nail all three aspects.

  • http://www.overthecounterculture.com Philippe Bradley

    ah, I understand now, thanks Pete.

  • http://qwang.net Q dub

    Sharp point about Google's competitive advantage.

    Top-down and bottom-up can work together in clever ways. By inserting non-disruptive prompts when content is being generated (e.g. “WordPress thinks this is an event. Yes/No?”) we can create microformatted content for the basics and leave the heavy lifting, such as the “Gash” problem, to the AI geniuses.
