The semantic elephant in the room - Google will settle the "top down vs. bottom up" debate for us
Here is a useful primer on what some people (perhaps not the best advised) are calling Web3.0.
The fundamental principle of semantifying data is that information becomes more easily found and understood by computers. Mix that with AI and you’ve got some very, very powerful, useful tools for information gathering, processing and decision making!
So why is Google - the information lynchpin of the Internet, and thus, of modern society - not THE focus of attention in all this hype about Web3.0?
This is a company with around five THOUSAND(1) computer scientists devoted to improving their search engine (~35,000 man hours a day). SURELY they’re building some amazing semantic IP that will help cement their dominance.
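For anyone who wants to check my back-of-envelope maths, here is a rough sketch in Python. The headcount and the "more than a third" share come from footnote (1); the 7-hour working day is purely my assumption (it is what the 35,000 figure implies for 5,000 people), and the source says "people", not specifically computer scientists.

```python
# Rough check of the headcount arithmetic, using the figures in footnote (1).
TOTAL_EMPLOYEES = 16_805   # Q4 2007 headcount cited in footnote (1)
SEARCH_SHARE = 1 / 3       # "more than a third of our people", taken as exactly a third
HOURS_PER_DAY = 7          # ASSUMPTION: working hours per person per day

search_staff = TOTAL_EMPLOYEES * SEARCH_SHARE
rounded_staff = 5_000      # the post rounds the estimate down to "around five thousand"

print(round(search_staff))            # 5602
print(rounded_staff * HOURS_PER_DAY)  # 35000
```

So "around five thousand" is, if anything, on the low side.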
A big debate in the semantic field at the moment is whether the best approach is ‘top-down’ or ‘bottom-up’:
- Bottom-up: when information is created, it is annotated with machine-readable tags. Technologies like RDF, OWL and microformats (and, to a basic extent, XML) do this. Bottom-up semantics got a big boost this week when Yahoo announced it was adding RDF descriptors to its pages.
- Top-down: when a Google machine finds a document on the web, it reads it and understands the information. That’s very, very advanced computer science (according to my housemate), but that way, when a machine reads a page about Gash, it figures out whether the page is talking about a physical injury, a woman, or a vagina. That’s important if your kid is using Google to learn about first aid… An example of a top-down semantic tool is Dapper.net.
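To make the difference between the two approaches concrete, here is a toy sketch in Python. It is not how Google, Yahoo or Dapper actually work. The `data-topic` attribute is a made-up stand-in for real annotation formats like RDF or microformats, and the clue-word lists and the two sense labels are invented for illustration. Real top-down systems are vastly more sophisticated than counting keywords.

```python
import re
from html.parser import HTMLParser


class AnnotationReader(HTMLParser):
    """Bottom-up: collect topics the publisher has already tagged.

    'data-topic' is a hypothetical attribute, standing in for RDF/microformats.
    """

    def __init__(self):
        super().__init__()
        self.topics = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-topic" and value:
                self.topics.append(value)


def bottom_up_topic(html):
    """Return the first publisher-supplied topic, or None if the page has none."""
    reader = AnnotationReader()
    reader.feed(html)
    return reader.topics[0] if reader.topics else None


# Top-down: no help from the publisher, so guess the sense from context words.
# These clue lists are invented for this example.
SENSE_CLUES = {
    "injury": {"wound", "bleeding", "bandage", "stitches", "aid"},
    "slang": {"slang", "insult", "vulgar", "offensive"},
}


def top_down_topic(text):
    """Pick the sense whose clue words overlap most with the page text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & clues) for sense, clues in SENSE_CLUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


annotated = '<p data-topic="injury">Treating a gash</p>'
plain = "How to clean a gash: stop the bleeding, then apply a bandage."

print(bottom_up_topic(annotated))  # injury
print(bottom_up_topic(plain))      # None
print(top_down_topic(plain))       # injury
```

Notice that the bottom-up reader is trivial to write but useless on the unannotated page, while the top-down guesser needs nothing from the publisher but has to carry all the intelligence itself.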
Bottom-up requires everyone on the Web to ‘play ball’ and change their site. There are big discussions about what format to use, etc. But Google’s withdrawal from these debates suggests that it’s working on top-down semantics and doesn’t need to weigh in on what people do to their sites.
- Google knows that humans are frankly crap at describing and organising things. That’s why Google search worked in the first place, and human-edited directories (like DMOZ, which I was once an editor for, or early-days Yahoo) did not. It went out and found pages, and decided their relative importance, so humans don’t have to. Likewise, with Gmail, it pioneered the folder-less email service - you just search for the email you need, rather than sorting mail into folders so you can find it later.
- For all this talk of Web3.0, Google is actually quite far down the road with understanding the closeness of a website’s content to what you searched for, and discarding irrelevant results. It doesn’t have to change a THING about the Internet, or the way Internet users behave, to incorporate better top-down semantics into its search algorithm. Google.com will still look the same; the only difference is that you will be able to use full sentences when you search, to better describe what you want it to find; e.g. “pages about animals like my goldfish” (which would return results about angel fish, clown fish, etc).
- If Google encourages bottom-up, it means each website does the heavy lifting; and any jackass coder can build a tool to leverage that, without too much difficulty. But with top-down, Google retains scarcity/monopoly power, because nobody (except Microsoft) can match the manpower needed to build that kind of IP. Top-down semantics are a technical challenge for Google. But bottom-up semantics would challenge Google’s business. It has the workforce to deal with technical challenges better than anyone. But marketplace evolution? Trickier.
If you take it as given that Google will succeed at whichever semantic approach it chooses, and you accept my reasoning that it can only opt for top-down semantics, and you accept that Google is a major Internet trendsetter (e.g. what Gmail did for inbox storage allowances), you reach the following plausible conclusion:
Google will settle the semantic web debate once and for all, kill bottom-up initiatives dead in the water, and build a top-down semantic web search engine that will cement the big G’s position as a market leader in web search.
That’s a warning to investors and coders who are interested in any bottom-up (and even, to an extent, top-down) semantic web startup. And if it settles the debate, perhaps man hours won’t be wasted on the wrong approach to organising information on the web. Far better the Dapper approach.
(1) 16,805 total employees (source: http://www.google.com/press/pressrel/revenues_q407.html) times “We’re so serious about improving search that more than a third of our people are working on it” (http://graemethickins.typepad.com/graeme_blogs_here/2008/03/googles-annual.html)