The semantic elephant in the room – Google will settle the "top down vs. bottom up" debate for us
Here is a useful primer on what some people (perhaps not the best advised) are calling Web3.0.
The fundamental principle of semantifying data is that information becomes more easily found and understood by computers. Mix that with AI and you’ve got some very, very powerful, useful tools for information gathering, processing and decision making!
So why is Google – the information linchpin of the Internet, and thus of modern society – not THE focus of attention in all this hubbub about Web3.0?
This is a company with around five THOUSAND(1) computer scientists devoted to improving their search engine (~35,000 man hours a day). SURELY they’re building some amazing semantic IP that will help cement their dominance.
A big debate in the semantic field at the moment is whether the best approach is ‘top-down’ or ‘bottom-up’:
- Bottom-up: when information is created, it is annotated with machine-readable tags. Technologies like RDF, OWL and microformats (and, to a basic extent, XML) do this. Bottom-up semantics got a big boost this week when Yahoo announced it was adding RDF descriptors to its pages.
- Top-down: when a Google machine finds a document on the web, it reads it and understands the information. That’s very, very advanced computer science (according to my housemate), but it means that when a machine reads a page about Gash, it figures out whether the page is talking about a physical injury, a woman, or a vagina. That’s important if your kid is using Google to learn about first aid. An example of a top-down semantic tool is Dapper.net.
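To make the difference concrete, here is a toy sketch in Python. It is not how Google, Yahoo or Dapper actually work; the `property` attribute is borrowed from RDFa-style markup, and the page snippets, the `SENSE_CUES` word lists and the function names are all made up for illustration. In the bottom-up case the publisher has already labelled the page and the machine just reads the label; in the top-down case the machine has to guess the meaning from the surrounding words.

```python
from html.parser import HTMLParser


class AnnotationReader(HTMLParser):
    """Bottom-up: collect values the publisher tagged with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.facts = {}
        self._current = None  # property name of the element being read, if any

    def handle_starttag(self, tag, attrs):
        self._current = dict(attrs).get("property")

    def handle_data(self, data):
        if self._current and data.strip():
            self.facts[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def read_annotations(html):
    """Return the publisher-supplied facts found in an annotated page."""
    reader = AnnotationReader()
    reader.feed(html)
    return reader.facts


# Hypothetical cue words per sense; a real system would learn these from data.
SENSE_CUES = {
    "injury": {"wound", "bleeding", "bandage", "stitches", "aid"},
    "slang": {"insult", "offensive", "vulgar", "slur"},
}


def guess_sense(text):
    """Top-down: pick the sense whose cue words overlap most with the page text."""
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    scores = {sense: len(cues & words) for sense, cues in SENSE_CUES.items()}
    return max(scores, key=scores.get)


# Made-up example pages: one annotated by its author, one plain text.
annotated = '<p>A <span property="topic">injury</span> guide: treating a gash.</p>'
plain = "Clean the gash, apply a bandage, and seek first aid if bleeding continues."

print(read_annotations(annotated))
print(guess_sense(plain))
```

The bottom-up reader is trivial to write but only works if the site owner added the markup. The top-down guesser needs nothing from the site owner, but even this toy version needs a hand-built word list, which hints at why doing it properly at web scale is the hard part.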
Bottom-up requires everyone on the Web to ‘play ball’ and change their site. There are big discussions about what format to use, etc. But Google’s withdrawal from these debates suggests that it’s working on top-down semantics and doesn’t need to weigh in on what people do to their sites.
- Google knows that humans are frankly crap at describing and organising things. That’s why Google search worked in the first place, and human-edited directories (like DMOZ, which I was once an editor for, or early-days Yahoo) didn’t. It went out and found pages, and decided their relative importance, so humans don’t have to. Likewise, with Gmail, it pioneered the folder-less email service – you just search for the email you need; you don’t have to sort every message into folders first.
- For all this talk of Web3.0, Google is actually quite far down the road with understanding the closeness of a website’s content to what you searched for, and discarding irrelevant results. It doesn’t have to change a THING about the Internet, or the way Internet users behave, by incorporating better top-down semantics into its search algorithm. Google.com will still look the same; the only difference is that you will be able to use full sentences when you search, to better describe what you want it to find; e.g. “pages about animals like my goldfish” (which would return results about angel fish, clown fish, etc).
- If Google encourages bottom-up, it means each website does the heavy lifting; and any jackass coder can build a tool to leverage that, without too much difficulty. But with top-down, Google retains scarcity/monopoly power, because nobody (except Microsoft) can match the manpower needed to build that kind of IP. Top-down semantics are a technical challenge for Google. But bottom-up semantics would challenge Google’s business. It has the workforce to deal with technical challenges better than anyone. But marketplace evolution? Trickier.
If you take it as given that Google will succeed at whichever semantic approach it chooses, and you accept my reasoning that it can only opt for top-down semantics, and you accept that Google is a major Internet trendsetter (e.g. what Gmail did for inbox storage allowances), you reach the following plausible conclusion:
Google will settle the semantic web debate once and for all, kill bottom-up initiatives dead in the water, and build a top-down semantic web search engine that will cement the big G’s position as a market leader in web search.
That’s a warning to investors and coders who are interested in any bottom-up (and even, to an extent, top-down) semantic web startup. And if it settles the debate, perhaps man hours won’t be wasted on the wrong approach to organising information on the web. Far better the Dapper approach.
(1) 16,805 total employees (source: http://www.google.com/press/pressrel/revenues_q407.html) times “We’re so serious about improving search that more than a third of our people are working on it” (http://graemethickins.typepad.com/graeme_blogs_here/2008/03/googles-annual.html)