Archive for the ‘aggregators’ tag
Finding Alpha on the Web
Alpha, in the world of asset management, is the difference between a portfolio’s actual returns and its expected performance, given its level of risk as measured by beta (beta is basically a measure of a portfolio’s volatility relative to the market, e.g. the S&P 500 index). For alternative investment funds, such as hedge funds, alpha is how one fund is compared to another. Furthermore, a significant chunk of the fees that a fund manager earns depends on the alpha that his fund returns. (Let’s ignore the fact that much of the “alpha” was probably unaccounted-for beta and that fund managers were likely overcompensated.) Thus, alpha generation is critical to a fund manager’s success.
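To make the definition concrete, here is a minimal sketch of the usual CAPM-style calculation (often called Jensen's alpha). The function name and the sample figures are made up for illustration, and the risk-free rate is an input that the paragraph above doesn't mention but the standard formula needs.

```python
def jensens_alpha(portfolio_return, market_return, beta, risk_free_rate):
    """Return alpha: the actual return minus the return expected for this beta."""
    expected_return = risk_free_rate + beta * (market_return - risk_free_rate)
    return portfolio_return - expected_return


# Hypothetical figures: a fund returns 12% in a year when the market returns 8%,
# with a beta of 1.2 and a 3% risk-free rate.
alpha = jensens_alpha(0.12, 0.08, 1.2, 0.03)
print(f"alpha = {alpha:.2%}")
```

A fund that simply tracks the market (beta of 1, same return as the index) comes out with an alpha of zero, which is the whole point of the measure.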
So, where does alpha come from? Benjamin Graham, the father of value investing, found excess returns by buying attractive stocks that were very underpriced: priced far below their tangible book value with no warning signs to justify it. Back then (in the early to mid 20th century), analyzing securities was a challenge. Access to financial statements and timely market news was difficult. Thus for people like Graham and his disciples who 1) put in the effort to obtain the relevant data, 2) crunched the numbers (remember, no Excel!), 3) had the proper analytical framework (courtesy of Graham) and 4) were patient, there were bona fide bargains to be had.
Over the course of the 20th century, access to market information improved substantially and the number of “sophisticated” investors skyrocketed. This trend continued to the point where, toward the end of the century, all investors both big and small were essentially on an even playing field. Obvious bargains in the equity markets dried up. Given that classic Graham-esque bargains were sparse and that his assets under management were ever increasing, Warren Buffett, Graham’s best-known disciple, has described how his investing philosophy evolved over time to focus on concepts such as intrinsic value and economic goodwill.
Today, with ubiquitous access to real-time market information, markets are efficient. An important note here is that I define efficient to mean that the price of a security on the open market is an accurate, without-time-lag reflection of investors’ collective opinion on where the price should be, based on news and forecasts as well as emotions/irrationalities. I am not using the word efficient to imply that market prices are always rational. Humans, for reasons only some of which are scientifically understood, are susceptible to biases and irrational behavior. The bottom line here is that investing, even for the most disciplined strategies, has become difficult. Wouldn’t it be nice to rewind the clock a hundred years to Benjamin Graham’s day, when access to information was difficult?
I think so, and this leads me to wonder whether investors need to dig deeper to find new sources of information that aren’t obvious to other investors. As I’ve discussed many times before on this blog, the Web has brought an explosion of news publishing in many forms, the most obvious perhaps being blogs, both professional and personal, and the less obvious being mediums such as Twitter and message boards.
One concrete example that comes to mind is a blog post from January 2008 by Markus Frind (founder of plentyoffish.com, one of the largest dating web sites in the world). The post talked about how, because of a subtle design change by Google, his AdSense CTR (click-thru rate) dropped by 60%. The change was that in late ‘07, Google changed AdSense so that only the link in the ad is clickable, not the whole ad area. Google had presumably done this to reduce accidental clicks. In the long term this is a good thing for everyone because it reduces click-fraud issues. However, in the near term CTR drops, thus the number of clicks drops and thus ad revenue drops. Now, ultimately, advertisers should see a rise in their conversion rates (since the quality of clicks goes up) and thus be willing to pay more for each click, evening out the ad revenue. However, there is a lead time for that to happen. Near-term ad revenue drops and there should, in theory, be a negative impact on Google’s quarter. TechCrunch had actually reported on this change on November 11, 2007, but the post came and went without much drama.
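The arithmetic behind that argument is simple enough to sketch. Ad revenue is roughly impressions times CTR times the price per click, so a CTR drop has to be offset by a proportional rise in the price per click. The traffic and price figures below are made up; only the 60% drop comes from Markus's post.

```python
def ad_revenue(impressions, ctr, cost_per_click):
    """Revenue from an ad unit: impressions x click-through rate x price per click."""
    return impressions * ctr * cost_per_click


def break_even_cpc_multiplier(ctr_drop):
    """How many times higher the price per click must be to offset a CTR drop."""
    return 1.0 / (1.0 - ctr_drop)


# Hypothetical publisher: 1M impressions, 2% CTR, $0.50 per click.
before = ad_revenue(1_000_000, 0.02, 0.50)
after = ad_revenue(1_000_000, 0.02 * (1 - 0.60), 0.50)
multiplier = break_even_cpc_multiplier(0.60)
print(f"before: ${before:,.0f}  after: ${after:,.0f}  break-even CPC: x{multiplier:.1f}")
```

Under these assumptions, advertisers would have to pay two and a half times as much per click before the publisher is made whole, which is why the lead time matters.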
Now, fast forward to February 26, 2008: comScore publishes a report indicating that Google’s CTR may be dropping, and GOOG drops 4% to close at $471, down from the mid-$600 range in January. On February 29, 2008, TechCrunch posts “Google CTR Down Due to Click Area Changes”, referencing the earlier post in January by Markus Frind.
I remember thinking in January that I should take a short position in GOOG. Of course, I didn’t really take myself seriously, but when the comScore report came out and GOOG dropped sharply, my jaw dropped. At that moment I realized that my idea of finding these nuggets of gold on the Web isn’t crazy. In fact, I’m not the only one with the idea. Roger Ehrenberg, a Wall Street veteran, had co-founded a company named Monitor110. The company has since gone under, but here’s an excerpt from a TechCrunch article on the company:
Monitor110 gathers information from 40 million sources of various types (100 million by the end of next year they say), ranked by financial market knowledge through a proprietary algorithm that takes 50 factors into account – inbound links being just one reputation metric. Users can chose between top sources preselected for their market sector and subscribe to sources of their own. Static sites can be monitored for changes with good granularity. Premium subscription and other deep web sources, blogs, forums, news and regulatory filings are among the sources included.
Here’s an image that used to be on Monitor110’s homepage. It elucidates the concept very well:

Think about an engineer who blogs about how he’s working like crazy because his project at work is behind schedule. If we know that he works at THQ, this information could be valuable to an investor. That’s precisely what Monitor110’s business model was: sell to hedge funds and other folks desperate for alpha. Is this scalable? I’m not sure. Probably? Maybe too early for its time? It was for Monitor110.
A Detailed Review of Recommendation Systems on the Web
People Who Read This Article Also Read… by Greg Linden of Microsoft Live Labs (and formerly of Findory.com) is a comprehensive review of the uses of recommendation systems on the Web and their implementations. Recommendation systems are a topic that I love, and Greg’s descriptions of systems such as that of Google News were very educational.
I’m a huge proponent of the idea that the newspaper, with its one-size-fits-all news, is dead. I discussed this in my prior post, Ok, I admit it. One-size-fits-all news will die. There, I discussed the fact that I consume most of my news today using my RSS reader. I’ve added several news feeds that I respect and enjoy, from many topic areas, to my reader and I check it every few hours. I have found that over the past couple of years, my awareness of current events in topic areas that I am interested in has risen considerably.
However, there are limitations to the RSS reader. “Rolling” your own news feed takes time to create and maintain. I don’t expect that many will do this. More importantly, though, the scope of the news that is available to me is bounded by the content of those news feeds which I have explicitly included. I don’t doubt that every day I miss news stories that would be of high interest to me because they originate from news sources that I am not following. A news application that can show me news from my explicitly chosen news sources as well as stories surfaced by recommendation technologies (e.g. “Story X is similar to news stories which Rishi typically reads” and “Story X is being read by many people who have similar news tastes to Rishi”) will be the ultimate solution for me. What’s exciting is that I expect such a news application to be available very soon…
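The second kind of recommendation (“Story X is being read by many people who have similar news tastes to Rishi”) can be sketched in a few lines. This is a toy version of user-based collaborative filtering using Jaccard overlap between reading histories; the reader names, story IDs, and scoring rule are all assumptions of mine, not how Google News or Findory actually work.

```python
def jaccard(a, b):
    """Overlap between two sets of story IDs, from 0.0 (none) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(reader, reading_history, top_n=3):
    """Rank unseen stories by how similar their readers are to this reader."""
    mine = reading_history[reader]
    scores = {}
    for other, theirs in reading_history.items():
        if other == reader:
            continue
        similarity = jaccard(mine, theirs)
        for story in theirs - mine:
            scores[story] = scores.get(story, 0.0) + similarity
    # Highest score first; ties broken alphabetically so the result is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [story for story, score in ranked[:top_n] if score > 0]


# Hypothetical reading histories.
history = {
    "rishi": {"goog-ctr", "friendfeed-launch", "seeqpod-suit"},
    "ann": {"goog-ctr", "friendfeed-launch", "monitor110"},
    "bob": {"celebrity-gossip", "box-office"},
}
print(recommend("rishi", history))
```

Ann shares two stories with Rishi, so the one story she read that he hasn't gets recommended; Bob shares nothing, so his stories never make the list.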
Seeqpod Gets Sued: I knew this was coming
Seeqpod is a music search engine that crawls the web and finds music files. I have used it a few times recently and was pleasantly surprised with the results. Many of the songs that I was looking for were found. Full DRM-free mp3s. Where does Seeqpod find these files? From what are often called “open directories”. Open directories are typically user directories on web servers that have inadvertently been made public. They often aren’t publicly available for long since once they are found, they are leeched like crazy by users, which drives up bandwidth usage on the user account (which eventually leads to the account being suspended).
Savvy users have been finding open directories for years. With the right search parameters, Google is a great tool for finding such open directories. However, Seeqpod is an ideal tool for this. Not only is it laser-focused on finding music, it mashes up relevant discography data and can even stream the search results so you can listen before you download.
The problem is that Seeqpod is essentially a Napster for the Web. Whereas the real Napster searched people’s own local computers for music, Seeqpod searches the Web for music that people have uploaded to servers. While there may be some legitimate content that Seeqpod is crawling, I think it will be very difficult for the company to defend itself against a new lawsuit from Warner Music which claims that Seeqpod directly contributes to copyright infringement by helping people locate pirated content.
As usual, I think the record labels are picking the wrong battles and need to focus their resources on figuring out how they can add value for, and build closer relationships with, music listeners. The recent developments at Last.FM make me hopeful that the record labels are in fact seeing the light.
Ok, I admit it. One-size-fits-all news will die.
The goal of any news delivery medium is to provide maximum signal-to-noise ratio to its target audience. “Signal” is the set of news items that is of interest to a person. “Noise” is everything else. The reality is that an infinitesimally small percentage of news is interesting to any given person. And that percentage is shrinking every day because more news is being created on a daily basis: more people are documenting, more frequently, more people who are doing more newsworthy stuff.
In order to keep SNR high, news mediums need to focus on the news interests of their audiences more intensely than ever before. However, trying to create a single focus for a group of individuals, each of whose interests differ somewhat, is not a long-term solution. Sites like PerezHilton.com, a leading Hollywood gossip blog, and TechMeme, a leading (especially here in the SV) tech news aggregator, provide a certain segment of the news to an audience specifically interested in that segment. However, over time, the amount of news created in the news segment grows and the segment bulges. The news publisher must either narrow its segment further, which will alienate some of its existing audience, or publish a higher volume of news, which ultimately lowers the SNR for any given audience member. Neither of these options is a good choice.
Long-term, the only news delivery medium which is viable is the roll-your-own news concept. Geeks hear this and start throwing out terms like RSS and OPML, but the bottom line is that you don’t have to know technology in order to determine whether a piece of news is interesting to you. Over the past months, I’ve found myself going to news sites, including TechMeme, less and instead refreshing Google Reader more. I’ve added many feeds and the news that arrives is astonishingly interesting to me. Most importantly, my Reader is astonishingly uninteresting to most other people. This kind of relevance is ultimately impossible to achieve by any news publisher that tries to appeal to more than a handful of people.
I don’t want you to conclude from this that I think the ultimate solution is the RSS reader. The concept of explicitly adding feeds to a reader is just not going to fly with mainstream folks. So what is the perfect news medium that allows you to roll your own news but doesn’t require any tech savvy? Attempts have been made (NewsVine, etc.) but I think we have yet to see the killer news app.
My impressions of FriendFeed
For a long time I have been fascinated by the idea of a friend activity feed for the web. With the explosion of social/UGC websites in recent years, the web is increasingly a 2-way conversation between a website and a user. At the same time, as the rate of growth of content on the web continues to skyrocket, the need to filter new content by relevance is ever greater. One bucket that defines relevance is the bucket which contains the content (I’m using content somewhat loosely here to include activity that doesn’t necessarily generate meaningful content) created by a person’s friends and other important contacts. Finally, it’s impossible to ignore the emergence of the feed – whether it be RSS, pseudo-RSS, email or whatever – in mainstream products. A lot of people “get” the idea of a feed. Put all of these trends together and the result is, essentially, what I refer to as “life streams”: a stream that represents the activity of a person’s life. To be a bit more specific, the activity of a person’s online life. Every person who does anything interactive on the Web implicitly has such a feed, and the aggregate of our friends’ streams keeps us up to date with what our friends are up to on the Web.
Now, any Facebook user is quite familiar with the concept of a friend activity feed. The Mini-Feed/News Feed feature launched back in the fall of ‘06. The Mini-Feed is a log of a user’s activity on Facebook and the News Feed is an intelligently filtered aggregate of all your friends’ Mini-Feeds. Although these feeds were met with much initial controversy, a Facebook without them now seems impossible. For me, the primary entry point into Facebook is the News Feed. I can see what’s going on with my friends and click deeper into what I find interesting. I can’t imagine having to click on each of my friends’ profile pages to check for updates. Because the News Feed allows a user to easily discover fresh content in their networks, engagement metrics on Facebook increased dramatically.
Amidst the incessant Facebook buzz, it can be easy to forget that there exists a social Web outside of the facebook.com domain. Yes, outside Facebook is a glorious and interesting world, a world with countless social websites where hundreds of millions of people interact. These social websites collectively face the same problem that faced the pre-feed Facebook: in order to find out what my friends are doing I need to go to each of my friends’ pages. Except on the Web the problem is an order of magnitude worse! It’s not just a matter of pulling up my friend’s page; I first need to navigate to a different website. That’s a huge pain in the ass. So much so that I don’t recall in recent weeks going to YouTube, Flickr,

Enter FriendFeed. I first heard of FriendFeed when it was written up on TechCrunch. Basically, FriendFeed brings Facebook’s News Feed-like functionality to the Web. I immediately requested an invite and to my surprise was granted one in a few hours. Setting up FriendFeed is a two-step process. First you add all the services that you use. Of course they don’t support every website out there, but they mostly support the ones I use. Adding a service involves clicking the icon for that service and entering either your username or your personal url for that website. I added all my services, from del.icio.us to LinkedIn to this blog, in less than five minutes. It really could not have been easier. The second step is adding your friends. Of course, a service such as FriendFeed faces a classic chicken-and-egg problem and its growth depends on users inviting (and even compelling) their friends to join.
Pros:
1) Easy to set up – Like I said, I had all my services added in just a few minutes and it all worked perfectly.
2) You don’t have to change your behavior for it to work. Unlike other services in the past which have attempted to do similar things, there is nothing special that you have to do to have your activity published to your feed. FriendFeed grabs the RSS feed of your activity that the website publishes. Many services in the past have followed the bookmarking paradigm and forced the user to install and use a browser plugin or bookmarklet to make the service work. And, because of this nuisance, (surprise!) they didn’t work. FriendFeed takes advantage of the fact that every website worth its salt publishes an RSS feed for each user.
3) Social websites will love this and want to be included. FriendFeed helps people discover fresh, relevant content (following the assumption that relevance correlates with proximity on a social graph). The more you can push relevant content to users, the more they will engage with your site. This has been proven in many shapes and forms.
4) Privacy from the get go, a lesson learned from the Facebook News Feed launch. Even if it is the case that few users will really fine-tune their privacy settings, FriendFeed’s legitimate privacy controls will prevent it from receiving damning reviews from users, bloggers and the media.
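Point 2 above is the technically interesting one, so here is a rough sketch of the idea: take the RSS feed each service already publishes for a user and merge them into one newest-first stream. I have no idea how FriendFeed actually implements this; the function names and the sample feeds below are invented, and a real version would fetch the feeds over HTTP instead of using inline strings.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_rss(service, rss_xml):
    """Turn one RSS 2.0 document into (timestamp, service, title) tuples."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        entries.append((published, service, item.findtext("title")))
    return entries


def merge_feeds(feeds):
    """Combine the entries from every service into one stream, newest first."""
    merged = []
    for service, rss_xml in feeds.items():
        merged.extend(parse_rss(service, rss_xml))
    return sorted(merged, key=lambda entry: entry[0], reverse=True)


# Made-up sample feeds standing in for what each site would publish.
SAMPLE_FEEDS = {
    "blog": """<rss version="2.0"><channel>
        <item><title>New blog post</title>
        <pubDate>Mon, 03 Mar 2008 09:00:00 GMT</pubDate></item>
    </channel></rss>""",
    "bookmarks": """<rss version="2.0"><channel>
        <item><title>Saved a link</title>
        <pubDate>Mon, 03 Mar 2008 12:30:00 GMT</pubDate></item>
        <item><title>Saved another link</title>
        <pubDate>Sun, 02 Mar 2008 18:00:00 GMT</pubDate></item>
    </channel></rss>""",
}

for published, service, title in merge_feeds(SAMPLE_FEEDS):
    print(published.date(), service, title)
```

The user never does anything special here: the activity is already in the feeds, and the aggregator just reads and sorts it.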
Cons:
1) FriendFeed.com is not my homepage and may never be. This is possibly the biggest reason why FriendFeed won’t catch on. A key reason why the News Feed is so effective on Facebook is that when you go to www.facebook.com, you get the page with the News Feed. As I said earlier, it’s a starting point on Facebook. However, FriendFeed is not my starting point on the Web. I suppose it could be if I changed my browser’s settings, but it’s not yet. I suppose FriendFeed can start by developing widgets for the popular homepages, but I doubt the effectiveness of that strategy for a variety of reasons.
2) Adding your friends to FriendFeed feels a bit creepy. “Hey join this service called FriendFeed so I can stalk what you’re doing on the Web..k thanks!”
3) Content may not be just a click away. On Facebook, feed events from applications are only visible to users who have that application installed. On FriendFeed, that concept is not currently present. I see all feed items for each user regardless of whether I have added that service. Right now, I’m looking at my feed and I see a bunch of Last.fm entries. The headlines sound moderately interesting but I noticed I was hesitant to click because I’m not a Last.fm user and I feel like once I click I’ll eventually be nagged about registering. “Not worth the hassle,” my unconscious thinks.
My bottom line assessment of FriendFeed is fantastic product execution (great site usability and the product “just works” without requiring the user to change their behavior) on a concept that is sorely missing from the Web. However, I find it difficult to be super bullish because of the homepage issue. It’s going to take a while for the average user to warm up to the idea of making FriendFeed.com their homepage and without this presence, I’m not sure if it will grab the mindshare necessary to demonstrate the same success for the social Web as the News Feed did for Facebook.
A couple interviews worth reading
Danny Sullivan interviews Gabe Rivera – Gabe is the creator of memeorandum.com/TechMeme, one of the premier blog/news aggregators on the Web. Ever since I first found memeorandum a couple years ago, I have been a multiple-times-per-day reader. My routine is: open laptop, check e-mail, check RSS reader, check TechMeme. You may also notice that Memeorandum is the only company besides Google and Yahoo that has its own tag on my blog! It’s indispensable for me. I’ve also had the opportunity to meet Gabe at various geek social events and he’s always struck me as someone who is purely focused on methodically building the perfect product. He keeps a low profile and is easy to approach. On more than one occasion I’ve rambled off my ideas to him and he’s always been kind enough not to interrupt and beg me to stop boring him. =)
The second interview worth reading today is TorrentFreak’s interview of Bram Cohen, the creator of BitTorrent, the now ubiquitous P2P file distribution protocol, and the founder of BitTorrent, Inc (recently in the news for purchasing popular BitTorrent client, uTorrent). If you are a frequent torrent user, you owe it to yourself to learn more from the man who brought about the current revolution of P2P file-sharing.
How news aggregators might filter out discussion noise
Memeorandum and other news aggregators are focused on what’s going on NOW, which isn’t always super interesting. However, geek bloggers know that news aggregators are a great way to build traffic to their own blogs. A simple trackback post to the news item will likely result in publicity for the blog in the form of a link in the discussion for that news item on the aggregator’s site. Wise to this, many bloggers are quick to post such follow-ups. Sometimes the blogger adds some great insight to the news item, in which case their presence in the discussion is certainly merited, but often the follow-up post is light on value to the reader and thus adds little to the discussion.
Gabe, if you’re somehow reading this, my suggestion would be to track outgoing clicks: if many users click a link to a post in a discussion and then, within a few seconds, click browser-back and return to Memeorandum, treat that as a negative importance vote for that link (post). If a high proportion of the users who click thru to a post quickly return to Memeorandum, and the post also has 0 comments, just go ahead and boot it from the discussion links for the news item. Hopefully that would encourage bloggers to only post follow-ups if they truly have something meaningful to add. Furthermore, it would help keep a high signal/noise ratio for Memeorandum readers.
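Here is roughly what I have in mind, as a sketch. Every number in it is a guess of mine (what counts as “a few seconds”, how many quick returns make a “high frequency”, how many clicks you need before judging a link), and I have no idea what data Memeorandum actually collects.

```python
from dataclasses import dataclass, field

QUICK_RETURN_SECONDS = 5     # "within a few seconds"; the exact cutoff is a guess
MAX_QUICK_RETURN_RATE = 0.7  # hypothetical threshold for "high frequency"
MIN_CLICKS = 20              # hypothetical: don't judge a link on a handful of clicks


@dataclass
class DiscussionLink:
    url: str
    comment_count: int
    # Seconds each visitor stayed away before returning to the aggregator.
    away_times: list = field(default_factory=list)

    def quick_return_rate(self):
        """Fraction of click-throughs that bounced straight back."""
        if not self.away_times:
            return 0.0
        quick = sum(1 for seconds in self.away_times if seconds <= QUICK_RETURN_SECONDS)
        return quick / len(self.away_times)


def should_drop(link):
    """Drop a link when most readers bounce straight back and nobody commented."""
    return (
        len(link.away_times) >= MIN_CLICKS
        and link.quick_return_rate() >= MAX_QUICK_RETURN_RATE
        and link.comment_count == 0
    )


# Two made-up discussion links: one "me too" post and one with real analysis.
thin = DiscussionLink("http://example.com/me-too", 0, [2] * 18 + [60] * 2)
meaty = DiscussionLink("http://example.com/analysis", 0, [3] * 4 + [120] * 16)
print(should_drop(thin), should_drop(meaty))
```

Requiring both signals (quick returns and zero comments) keeps the filter conservative: a short post that sparks a conversation survives even if most readers skim it.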
In my own experience, I have noticed that the most insightful and mind-tingling posts are those where the author/blogger has clearly spent some time composing their thoughts and not just trying to garner some quick attention. For some good reads, I suggest you check out some of the blogs listed over on the right column under “Some Feeds I Read”. Happy reading!
Structured Blogging. If only the answer was that simple…
First of all, what is Structured Blogging? Right now, blog posts are physically just free-form text entries in plain English paragraphs. But logically speaking, a blog post might be a movie review, an editorial on a recent news bit, a description of an upcoming event, etc. While plain old English prose is the optimal mode of comprehension for us humans, machines have a tough time figuring out what the heck you’re talking about unless the content of the entry is tagged or categorized in some way. Structured Blogging is all about incorporating microformats into blog posts in order to structure them (aka tag them, though not tagging in the folksonomy sense but in the tagged-data XML sense). Basically, let’s say I posted:
“I saw Syriana last night and it was thrilling and thought-provoking. Go see it this weekend.”
From these two sentences, you likely had no problem understanding that:
1) Syriana is a movie currently in theaters.
2) I saw Syriana and my review of it is: “thrilling and thought-provoking”
3) I am recommending that people go see it.
For a machine to correctly recognize these exact two sentences as a review for a movie named Syriana is difficult. Furthermore, for the machine to find meaning in what I wrote is another problem in itself. Instead, if I published my post using the hReview microformat, a machine could easily recognize that my post is a review for an item – in this case a movie named Syriana – and know exactly what my review of the item is – “thrilling and thought-provoking”. Structured Blogging has partnered (it’s not clear how deep these partnerships really are) with all the major blogging tool companies, presumably to integrate these formats into the popular blogging software so that the blogger need not know the exact syntax and tags of each format. Tagging your movie review post with the hReview format shouldn’t take more than a few button clicks.
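To show how little work the machine has to do once the markup is there, here is a small sketch that pulls a few hReview properties out of a post using Python's built-in HTML parser. The class names (hreview, item, fn, description, rating) come from the hReview format; the sample markup, the 4-out-of-5 rating, and the parser itself are my own illustration, and it ignores things a real parser would handle, such as void tags like img or br.

```python
from html.parser import HTMLParser

FIELDS = ("fn", "rating", "description")


class HReviewParser(HTMLParser):
    """Collect the text of a few hReview properties from a post's HTML."""

    def __init__(self):
        super().__init__()
        self.open_classes = []  # one entry per open tag: the class names on it
        self.review = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.open_classes.append(classes)

    def handle_endtag(self, tag):
        if self.open_classes:
            self.open_classes.pop()

    def handle_data(self, data):
        # Only look at text that sits somewhere inside an hreview element.
        if not any("hreview" in classes for classes in self.open_classes):
            return
        for classes in self.open_classes:
            for name in FIELDS:
                if name in classes:
                    self.review[name] = self.review.get(name, "") + data


# Hypothetical markup a blogging tool might generate for the post above.
POST = """
<div class="hreview">
  <span class="item"><span class="fn">Syriana</span></span>
  <span class="description">thrilling and thought-provoking</span>
  Rating: <span class="rating">4</span> out of 5
</div>
"""

parser = HReviewParser()
parser.feed(POST)
print(parser.review)
```

No natural-language understanding is needed: the aggregator learns what the item is and what I said about it purely from the class names.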
Will bloggers use this? Let’s take a minute to understand the motivation of the blogger.
Currently bloggers publish their blogs as a medium for building and expressing their self-identity on the Web. When you write something on your blog, it stays with you in one centralized place and becomes part of your e-identity. If I write a product review on Amazon, sure it will get read (in fact it would probably get way more readership than it would on my blog) but that’s not the point. I’m sort of giving away my content. The world doesn’t know who I am on Amazon. Right now, people’s online identities are so fragmented. Pieces of their online expression are happening on many different sites. They might publish some product reviews on Amazon, list some items for sale on Craigslist or eBay, write movie reviews on IMDb, regularly comment on news items on various blogs, chat on various message boards… the list goes on. Sure, all these forms of expression come from me, but because they are completely decentralized they do not form any sort of identity for me. Someone reading my Amazon review of a DVD I bought has no idea about the movie reviews that I’ve written on IMDb. Without a doubt, the ability to keep the content I create on the Web in one spot, published in the way I want, is compelling. But blogging already offers this. Why do I need to adopt structured blogging?
The reason is so others can better find the content I produce. If someone is searching for reviews of Syriana and I have properly tagged my review as such, then there’s a higher chance that the user will find my review. The reason is that the aggregators of the future, while sucking up my blog content, will be able to recognize and precisely record my post as a Syriana movie review. Without this tagging, the only way my content will be located is by search relevancy for the term ‘Syriana’. That’s pretty much hopeless. Besides, someone searching for ‘Syriana review’ isn’t even likely to be given my blog post because I didn’t put the word ‘review’ anywhere in it. Okay, so if I use Structured Blogging, people will be able to better find my content. Sweet! Well, it’s not really that perfect.
These aggregators of the future are going to want to aggregate the content they suck up. You can imagine a movie review aggregator that sucks up all the reviews in the blogosphere, and provides an uber MetaCritic. So users looking for reviews for Syriana will conveniently see “average 4 star rating based on 35 bloggers”. And then of course this aggregator will have advertising and sell movie tickets and essentially be making money off of my and others’ reviews. Is this aggregator compensating me? Nope. They’re just leeching my content and making a buck. The only thing the aggregator can possibly offer me is increased traffic if, in this case, the user wanted to actually read individual reviews of the movie. Is this a fair tradeoff? If I am posting something like a classified ad, where it absolutely benefits me to increase its visibility and there is real monetary value in it for me, then the answer is yes. For other situations, the answer becomes tricky. Note: This discussion is very similar to the relationship between web publishers and web search engines.
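The “uber MetaCritic” summary is the easy part once the reviews are structured. A toy sketch, with made-up ratings and a function name of my own:

```python
from collections import defaultdict
from statistics import mean


def summarize(reviews):
    """Group (item, rating) pairs by item and report the average and the count."""
    by_item = defaultdict(list)
    for item, rating in reviews:
        by_item[item].append(rating)
    return {
        item: f"average {mean(ratings):.1f} star rating based on {len(ratings)} bloggers"
        for item, ratings in by_item.items()
    }


# Hypothetical ratings harvested from structured blog posts.
reviews = [("Syriana", 4), ("Syriana", 5), ("Syriana", 3), ("King Kong", 4), ("King Kong", 2)]
for item, summary in summarize(reviews).items():
    print(f"{item}: {summary}")
```

Notice that nothing in the summary points back to the individual bloggers, which is exactly the compensation problem described above.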
Finally, this topic of structuring content was in the news recently thanks to our friends at Google. A few weeks back GoogleBase launched. Read my post about it. The concept with GoogleBase is very similar: structure data so that it can be better aggregated. Right now, the only way to input into GoogleBase is directly via a web form (they have different forms for different data types) or via a feed. Either way it’s the content creator actively submitting it to Google. But, if structured blogging takes off, doesn’t it make a lot of sense for GoogleBase to suck up structured content from the blogosphere? Sure. If there’s structured content anywhere out on the Web, it makes tons of sense for Google to go fetch it. The problem is that right now there is little, if any.
Need to sync/share files? Download FolderShare!
Last month, Microsoft acquired a file-synchronization software company called FolderShare. What’s so great about it? It’s a very simple tool to transparently synchronize files amongst many computers. Even though there are many software solutions out there that solve this problem (most of which cost $$$), most people still use relatively cumbersome ways to accomplish file transfer: e-mail, IM client, FTP, uploading to file-serving websites, etc. Furthermore, it’s always surprised me how much trouble most people (even myself on occasion) have just setting up file sharing between Windows PCs at home. Clearly, a free and easy solution is needed by the masses.

I tried FolderShare out and it definitely worked as advertised. You just need to download their Satellite program which, once installed, runs in the background. Within a couple minutes of playing around with it I had sync’d up a folder between my laptop and desktop, shared my music folder with my sister, and can now securely access my hard drive via the web. The great thing is that it’s all peer-to-peer so the process is very efficient.
FolderShare used to charge $50/year for the service. The first thing Microsoft did after acquiring the company was to make it free. Very cool.
Microsoft has definitely been on a bit of a hot streak in the geek blogosphere with the debut of Live, the release of SSE under a Creative Commons license, and now this.

