It's Rishi

Thought streams on the future of tech and media

Archive for the ‘Google’ tag

Google PR5?!

with 2 comments

In the past 24 hours, I’ve suddenly experienced a huge influx of spam comments on this blog. At first I was very annoyed. My e-mail inbox was flooded with new comment notifications from WordPress. I had to log in and delete (and subsequently blacklist) them all. After about an hour of being annoyed, a thought occurred to me: “Why am I getting attention from spammers all of a sudden?” In a way, it was actually kind of flattering. In the past, I have gotten just a handful of spam comments, presumably because the spammers were busy spamming other sites that were more worth their (bots’) time. Then I realized that maybe… just maybe… my Google PageRank had gone up and thus made my blog more appealing to spammers. I checked and, in fact, www.itsrishi.com is now marked as PR5. Now, to be honest, I’m not sure why I have a PageRank as high as 5. My backlinks don’t really seem to merit it. I’ve had some big (PR7-ish) blogs and aggregators point to some of my posts, but those sites also have a large number of outgoing links, which dilutes the rank that each link passes along. PR5 really does seem high. Is it a glitch? Possibly.
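To make the point about outgoing links concrete, here is a rough sketch of the PageRank idea from Page and Brin’s 1998 paper: a page splits the rank it passes on evenly among its outgoing links. This is a simplified, normalized variant and not Google’s production algorithm. The page names and link counts are made up for illustration, and the 0-10 toolbar score is a much coarser scale that this sketch does not model.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: a big blog links to my blog AND 49 other pages,
# while a small blog links to exactly one other page.
fillers = [f"filler{i}" for i in range(49)]
links = {
    "big_blog": ["my_blog"] + fillers,
    "small_blog": ["other_blog"],
    "my_blog": [],
    "other_blog": [],
}
links.update({name: [] for name in fillers})

ranks = pagerank(links)
print(ranks["my_blog"] < ranks["other_blog"])
```

In this toy graph, the single link from the small blog is worth more than the one-in-fifty link from the big blog, even though both linking pages start out with the same rank.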

I’ll check my PR again later and monitor my metrics to see whether I get any boost in referrals from Google search result pages. If itsrishi.com is, in fact, a PR5, I should notice a boost in search referrals, especially for popular keywords for which I used to be shoved waaayyy to the bottom, even if my post had good relevance.

Written by Rishi

April 5th, 2006 at 2:47 am

Posted in Uncategorized

Google gives BMW.de the “death penalty”

with one comment

According to this article in Google Blogoscoped, BMW’s German site, BMW.de, got banned from Google’s search index because the site’s webmasters apparently tried to fool Google’s crawler, Googlebot, into assigning it better relevancy than it deserves for several terms. In other words, their site detects whether the visitor is Googlebot (I’m assuming by just checking the User-Agent request header) and, if it is, returns a page packed with tons of keywords. If it’s a human visitor, the normal homepage is returned.
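Here is a minimal sketch of the kind of User-Agent check I’m guessing at, plus the obvious way to catch it: request the same page as a crawler and as a browser, then compare. This is only my assumption about the mechanism, not BMW’s actual code. The function names and page labels are hypothetical, and the browser User-Agent string is just an example value.

```python
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # example value only


def choose_page(user_agent: str) -> str:
    """Hypothetical cloaking logic: pick a page based on the User-Agent header."""
    if "googlebot" in user_agent.lower():
        return "keyword_stuffed_page"
    return "normal_homepage"


def looks_cloaked(fetch) -> bool:
    """Flag a site whose response differs between crawler and browser.

    `fetch` is any callable that takes a User-Agent string and returns
    the page content the site serves for it.
    """
    return fetch(GOOGLEBOT_UA) != fetch(BROWSER_UA)


print(looks_cloaked(choose_page))
```

Because the User-Agent header is trivial to fake, a check like this is just as trivial to expose, which may be why getting caught was only a matter of time.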

The bottom line is, BMW.de tried to cheat, got caught, and now they pay the price: “A search for BMW Germany, which only days ago yielded BMW.de as a top result, now doesn’t show any sign of BMW.de at all.”
You can bet that some BMW IT guys over in Deutschland are looking for a new job. =)

I found this story interesting because it was the first time I’ve heard of a major, A-list site getting the boot from Google for using shady SEO techniques.

Food for thought: Google penalizing sites that try to game its system does help to ensure relevant search results for users. However, when Google removes BMW.de (and other legitimate, useful sites) from its search results, does the user really win? In this example, a Googler looking for BMW’s German site will be at a loss. Ideally, Google needs to use a penalty which hurts only the website, not the user.

Written by Rishi

February 4th, 2006 at 3:30 pm

Posted in Uncategorized

A sobering picture of Internet censorship in China

with one comment

So the big news of the day is that Google has agreed to censor its search results in China. Because Google had not previously complied with China’s policies, the Chinese government either blocked or severely crippled access to Google. China’s a huge, growing market and being shut out of that market was obviously not good for Google. Immediately after the announcement was made, tons of negative headlines sprouted up in both the mainstream media and the blogosphere. The announcement caught everyone by surprise since agreeing to censor its results is so seemingly un-Google-like. It’s kinda hard to live by “don’t be evil” if you’re implicitly helping the Chinese government violate the human rights of its citizens. Google’s counterargument is that it hopes to influence a loosening of Chinese policies over the long term.

To be honest, while I was surprised by the announcement, I wasn’t planning on losing sleep over the matter. After all, what do I care, right? I’ve got the beautiful 1st Amendment by my side (although we do unfortunately have the FCC). Well…

Other search engines, including Y! and MSN, already censor results in China. Out of curiosity, I searched for ‘tiananmen square’ on both Yahoo! and Yahoo! China. The result was sickening.

‘tiananmen square’ search on Yahoo!
1,790,000 hits. Pages of links describing the 1989 protest/massacre.

‘tiananmen square’ search on Yahoo! China
4 hits. Some junk spam.

Unbelievable. I’m finding it difficult not to care. It seems like Google’s need to satisfy its shareholders is corrupting the company’s values again.

UPDATE:

Earlier, when I was writing this post, it seemed like Google China had not yet enabled censoring. However, I just checked again and it looks like it now has.

‘tiananmen square’ search on Google
1,810,000 hits. Pages of links describing the 1989 protest/massacre.

‘tiananmen square’ search on Google China
10 hits. Chinese tourist guides.

I can’t read Chinese but it does look like Google China is in fact printing an alert indicating to the user that the results have been censored.

Written by Rishi

January 25th, 2006 at 10:37 pm

Posted in Uncategorized

How search engines build advertising space

with 3 comments

In my prior post, I briefly noted how search-engines/aggregators profit off of content that publishers create for them. I shared my thoughts on this topic with a couple friends today and while nothing discussed was enlightening, I figured I might as well throw up a quick post. Since Google is the largest search engine, I’ll use them as an example.

Google is so profitable because they control lots of Web advertising space. They build control of advertising space 3 ways (listed in order of descending profitability):
Inherit – Google inherits ad space from every person who has ever published any sort of Web content and made it accessible. The easy way to think about it is that the more content there is on the Web, the more possible search result pages exist. Each search result page has advertising space. Cost: Developing and operating the search engine which is expensive. But, on a per (ad space) unit basis, the cost is very tiny fractions of a penny.
Create – Google creates ad space by building its own applications and sticking AdSense on them. The goal is to build applications/services which generate a disproportionately large # of page views and whose pages are likely to contain content that advertisers would be interested in. E.g. Search, GMail, Maps (for the purpose of Local info and advertising), Froogle. Cost: Same logic as above, but the per-unit cost is not quite as small because page view volume is not nearly as high as it is for Search. Also, AdSense ad space is not as lucrative as AdWords because it is less targeted.
Buy – Offering AdSense to publishers. Cost: % of ad revenue. Much less profitable compared to the prior two methods.

While each of these three methods has a different profit margin, they all share one thing in common: making ad space out of content that other people create. It’s the exact opposite of traditional media companies. It’s such a great business to be in, and that’s why entrepreneurs get excited about the notion of structured data on the Web: it means they can build great aggregators and get rich. =)

Quick blabber about Google:
Google began life as a search company. Google built a fantastic rapport with its users by offering a tremendously useful, free service. Since then, it has evolved into an advertising company. When Google added advertising, it was able to convince users that the ads actually helped them because the ads were contextual. All advertising is contextual to some extent, but people aren’t thanking TV networks for airing pizza commercials during football games. The commercials help you figure out what to eat while watching the game… right? Google has become the only advertising company in the world that people love. It’s both brilliant and fascinating.

Written by Rishi

December 16th, 2005 at 3:26 am

Posted in Uncategorized

Google increases font size for AdWords ads. Ugh.

with 3 comments

Back in 1998, Google founders Page and Brin published The Anatomy of a Large-Scale Hypertextual Web Search Engine, describing the fundamentals behind their new Web search engine. While I would certainly recommend that everyone read the paper, for the purpose of this post, Appendix A: Advertising and Mixed Motives is particularly worth a read. The section describes how search engines whose business model relies on advertising revenue are likely to be conflicted. They describe some situations where search results might be altered to please advertisers. As a result, they write, “we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers”.

Fast forward a few years and, as it turns out, Page and Brin’s Google dominates the search scene and is massively profitable. Wait… what was that? Profit? Where’s this money coming from? Oh, that’s right: advertisers. Well, to Google’s credit, they have not (as far as we know) allowed advertisers to directly influence their search results. Instead, AdWords ads (paid results) show up off to the right side of the (organic) search results page, clearly marked under a “Sponsored Links” section. Google’s position on this is that the ads don’t interfere with organic search results, but if you do want to see the ads, they are clearly marked, and since the ads are targeted to your search, there’s a good chance that they may actually help your search. Fair enough.

As of last week, something changed. The font size used for AdWords advertisements on Google search results pages got larger. The size is now the same size that is used for organic search results. While many users may not even notice the change, I personally find it very dubious. The motivation for increasing font size is to make ads more noticeable and thus more likely to be clicked. This increase in overall CTR brings more advertising dollars to Google and that’s certainly good news for the business.

However, is this the last move Google will make to increase ad CTR? I doubt it. Shareholder pressure to maintain profit growth may force the company to employ more tactics to drive ad revenue. Furthermore, what is the message that Google is sending to web publishers? Until now, Google (and of course all the SEO guys) has emphasized that most users click on the organic results and not the paid results (statistics show that the ratio of organic clicks to paid clicks is somewhere between 4:1 and 5:1), and so to generate traffic to your site, the most effective approach is to produce relevant, quality content and structure it effectively. Google prides itself on rewarding publishers who create good content. But will Google succumb to advertisers over the long term? I guess we’ll have to wait and see if Brin and Page’s comments back in 1998 hold true.

Written by Rishi

December 14th, 2005 at 4:03 am

Posted in Uncategorized

Google Base: the process of unifying data on the Internet

with one comment

Back in 2000, in an article titled “Not Your Father’s Internet”, Bill Gates wrote:

In many respects, today’s Internet actually mirrors the old mainframe model, with the browser playing the role of “dumb terminal.” All the information you want is located in centralized databases, and served up a page at a time (from a single Web site at a time) to individual users. Web pages are simply an HTML “picture” of the data you need, not the underlying data itself.

What Gates is describing here is the fundamental difference between the Internet infrastructure which stores and exchanges raw information and the Web whose purpose is to convey this information to humans.

Currently, for any type of information, there are often multiple sites each with their own database containing information of that type. Let’s take a simple type of information like classifieds, specifically auto classifieds. There are several sites on the Web that have auto classifieds listings: AutoTrader.com, Craigslist, Cars.com, and many others. Now, if you need to search these classifieds to find a 2001 Honda Civic in your area, you will need to go to each site and perform a search. Horrible.

To be more efficient, you could try a classifieds meta-search site like Oodle, which automates the process of searching several classifieds sites for you and returns a single aggregated result. Sure, this is a time saver, but there are inherent limitations to meta-search engines. Meta-engines do not, of course, have access to AutoTrader’s database or Cars.com’s database; all they can do is crawl and scrape these sites, which is an imperfect process. No matter how much intelligence you build into the scraper, it will never provide a superbly accurate, comprehensive, or up-to-date set of results. There are other limitations, like being able to search only the common denominator of information (if Cars.com differentiates between transmission types but AutoTrader.com does not, then Oodle can’t offer transmission-type search refinement).
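The common-denominator problem is easy to see in a few lines of code. In this sketch, the field names for each site are invented for illustration; I don’t know what either site actually exposes. A meta-search engine can only offer refinements on fields that every source supports, which is just a set intersection.

```python
# Hypothetical searchable fields per site (illustrative, not the sites' real schemas).
SITE_FIELDS = {
    "Cars.com": {"make", "model", "year", "price", "transmission"},
    "AutoTrader.com": {"make", "model", "year", "price"},
}


def common_search_fields(site_fields):
    """Return the fields that every site supports, i.e. what a meta-search can refine on."""
    field_sets = list(site_fields.values())
    if not field_sets:
        return set()
    return set.intersection(*field_sets)


refinable = common_search_fields(SITE_FIELDS)
print(sorted(refinable))
print("transmission" in refinable)
```

Every additional source can only shrink that intersection, so the more sites a meta-engine covers, the blunter its search refinements tend to get.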

This same auto classifieds example can be applied to many types of information: product data, job listings, news articles, etc. Is it a coincidence that these are the same information types found on Google Base? Of course not.

Ultimately, what we humans want is the perfect set of information matching a given search. Any search engine that is limited to searching human-readable documents (e.g. HTML, PDF, etc.) will never be able to provide perfect information. A better search engine will have access to raw, unadulterated, structured information.
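As a toy illustration of the difference, compare matching keywords against page text with querying structured records. The listings, field names, and page text below are all made up, and this is not how Google Base or any real classifieds site stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    make: str
    model: str
    year: int


# Hypothetical structured records, as a site's database might hold them.
LISTINGS = [
    Listing(make="Honda", model="Civic", year=2001),
    Listing(make="Honda", model="Accord", year=1999),
]


def search(listings, **criteria):
    """Return the listings whose fields exactly match every given criterion."""
    return [
        listing
        for listing in listings
        if all(getattr(listing, field) == value for field, value in criteria.items())
    ]


# Keyword matching on human-readable text: this ad is for an Accord,
# yet it contains every word of the query "2001 Honda Civic".
page_text = "Selling my 1999 Honda Accord. I used to own a 2001 Civic."
keyword_hit = all(word in page_text for word in ("2001", "Honda", "Civic"))

structured_hits = search(LISTINGS, make="Honda", model="Civic", year=2001)
print(keyword_hit, structured_hits)
```

The keyword match reports a hit on a page that is not a 2001 Civic listing at all, while the structured query returns only the record whose fields actually match.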

Google Base is simply an attempt to unify the data found in the databases of the world. It’s not sexy, but raw information isn’t sexy. While you and I can add our own data to Google Base, the real power is in the bulk data upload. Imagine if the major classifieds sites continuously uploaded their data to Google Base. Google Base would then become the ultimate classifieds search. Now, of course, that’s not going to happen so easily, because a site like Craigslist, whose value comes entirely from the information in its database (some would argue that Craigslist has other significant value-adds like its user community and simple interface), would effectively be putting itself in the fast lane towards extinction.

However, if eBay were to upload auction listings to Google Base, that would be great for eBay because it would allow Google to more effectively search eBay auction listings. Unlike in Craigslist’s case, it would not threaten eBay’s existence. That’s because for eBay, the auction data is just one part of the puzzle in the auction process. eBay still owns the surrounding processes, like bidding and payment, which are necessary for the auction data to be significant. I doubt Google really has any desire to get into the auction vertical. Google just wants to organize information, not build verticals around this information.

Written by Rishi

November 18th, 2005 at 1:33 am