Technology | 18 Nov 2005 01:33 am

Back in 2000, in an article titled “Not Your Father’s Internet”, Bill Gates wrote:

In many respects, today’s Internet actually mirrors the old mainframe model, with the browser playing the role of “dumb terminal.” All the information you want is located in centralized databases, and served up a page at a time (from a single Web site at a time) to individual users. Web pages are simply an HTML “picture” of the data you need, not the underlying data itself.

What Gates is describing here is the fundamental difference between the Internet infrastructure, which stores and exchanges raw information, and the Web, whose purpose is to convey this information to humans.

Currently, for any given type of information, there are often multiple sites, each with its own database of that information. Let’s take a simple type of information like classifieds, specifically auto classifieds. There are several sites on the Web that have auto classifieds listings: AutoTrader.com, Craigslist, Cars.com, and many others. Now, if you need to search these classifieds to find a 2001 Honda Civic in your area, you will need to go to each site and perform a separate search. Horrible.

To be more efficient, you could try a classifieds meta-search site like Oodle, which automates the process of searching several classifieds sites and returns a single aggregated result. Sure, this is a time saver, but meta-search engines have inherent limitations. Meta-engines do not, of course, have access to AutoTrader’s database or Cars.com’s database; all they can do is crawl and scrape these sites, which is an imperfect process. No matter how much intelligence you build into the scraper, it will never provide a perfectly accurate, comprehensive, or up-to-date set of results. There are other limitations too, like only being able to search the common denominator of information (if Cars.com differentiates between transmission types but AutoTrader.com does not, then Oodle can’t offer transmission-type search refinement).
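The common-denominator limitation can be sketched in a few lines of Python. This is only an illustration: the listings, field names, and the `common_fields` and `meta_search` helpers are all made up here, and nothing below reflects how Oodle or any real site actually works.

```python
# Hypothetical listings; field names and values are invented for illustration.
cars_com = [
    {"make": "Honda", "model": "Civic", "year": 2001, "transmission": "manual"},
    {"make": "Honda", "model": "Civic", "year": 2001, "transmission": "automatic"},
]
autotrader = [
    {"make": "Honda", "model": "Civic", "year": 2001},
    {"make": "Honda", "model": "Accord", "year": 2001},
]


def common_fields(*sources):
    """Return the fields that appear in every listing of every source."""
    field_sets = [set(listing) for source in sources for listing in source]
    return set.intersection(*field_sets)


def meta_search(sources, **criteria):
    """Filter the merged listings, but only on fields all sources share."""
    searchable = common_fields(*sources)
    unsupported = set(criteria) - searchable
    if unsupported:
        raise ValueError(f"cannot refine by {sorted(unsupported)}")
    return [
        listing
        for source in sources
        for listing in source
        if all(listing[key] == value for key, value in criteria.items())
    ]


civics = meta_search([cars_com, autotrader], model="Civic", year=2001)
```

Searching by model and year works across both sources, but asking `meta_search` to refine by `transmission` raises an error, because one of the two sources never recorded that field.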

This same auto classifieds example can be applied to many types of information: product data, job listings, news articles, etc. Is it a coincidence that these are the same information types found on Google Base? Of course not.

Ultimately, what we humans want is the perfect set of information matching a given search. Any search engine limited to searching human-readable documents (e.g. HTML, PDF, etc.) will never be able to provide perfect information. A better search engine would have access to raw, unadulterated, structured information.
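To make the contrast concrete, here is a rough Python sketch of the same listing seen as an HTML “picture” versus a structured record. The markup for both sites, the `PATTERN` regex, and the `scrape` helper are all hypothetical; real scrapers are more elaborate, but they face the same basic problem.

```python
import re

# A pattern tuned to one hypothetical site's markup.
PATTERN = re.compile(r"<b>(\d{4}) (\w+) (\w+)</b> - \$([\d,]+)")


def scrape(html):
    """Try to recover a structured record from an HTML fragment."""
    match = PATTERN.search(html)
    if match is None:
        return None  # the page layout is not one the scraper understands
    year, make, model, price = match.groups()
    return {
        "year": int(year),
        "make": make,
        "model": model,
        "price": int(price.replace(",", "")),
    }


# The same car as two invented sites might render it.
site_a = "<li><b>2001 Honda Civic</b> - $6,500</li>"
site_b = "<li><span class='t'>Honda Civic (2001)</span> asking 6500 USD</li>"

# The underlying data, as it might sit in the site's own database.
structured = {"year": 2001, "make": "Honda", "model": "Civic", "price": 6500}

from_a = scrape(site_a)
from_b = scrape(site_b)
```

The scraper recovers the record from the first layout and gets nothing from the second, even though both pages describe the same car. With direct access to the structured record there is nothing to guess at.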

Google Base is simply an attempt to unify the data found in the databases of the world. It’s not sexy, but raw information isn’t sexy. While you and I can add our own data to Google Base, the real power is in the bulk data upload. Imagine if the major classifieds sites continuously uploaded their data to Google Base. Google Base would then become the ultimate classifieds search. Now, of course, that’s not going to happen so easily. A site like Craigslist, whose value comes almost entirely from the information in its database (some would argue that Craigslist has other significant value-adds, like its user community and simple interface), would effectively be putting itself in the fast lane towards extinction.

However, if eBay were to upload auction listings to Google Base, that would be great for eBay because it would allow Google to more effectively search eBay auction listings. Unlike in Craigslist’s case, it would not threaten eBay’s existence. That’s because for eBay, the auction data is just one part of the puzzle in the auction process. eBay still owns the surrounding processes, like bidding and payment, which are necessary for the auction data to be significant. I doubt Google really has any desire to get into the auction vertical. Google just wants to organize information, not build verticals around this information.
