The battle of attention vs conversation in the blogosphere
I was composing an e-mail reply to someone (the person reading this will know who he is), and what I intended to be a short e-mail on the topic of conversation in the blogosphere grew into this long ramble. I decided to post it on my blog for anyone who might find it interesting:
On any given Monday morning, thousands of people gather around water coolers at offices around the country to chat about the “Desperate Housewives” episode that aired the night before. Every day, radio shows around the country host discussions covering the same events in news and politics. In these examples, physical limitations cap the number of people who can take part in any one of these conversations. That’s why many people flock to the Internet to discuss the same topics with a much wider group of people. Ultimately, the “perfect” conversation is one in which everyone interested in a topic can take part in a single, dynamic conversation. In the blogosphere, however, many blogs are often covering the exact same topic at the same moment. The result is many duplicate conversations running in parallel, just like what happens in the offline world, as I described above.
A new service that made a stir over the weekend, coComment, helps facilitate conversation on a single blog. Services like Memeorandum help connect blog posts by finding memes through backlinks and trackbacks. While services like these help connect opinions, the problem is far from solved. Person a, who comments on blog A, may be stimulated by a comment that person b left on blog B, but the two are likely never to read each other’s thoughts.
As we all know, blogs compete for attention. More specifically, each conversation competes for attention with other conversations about the same item. Every blogger would rather see a comment posted on his own blog, and watch his own comment thread grow, than see that happen on another blog.
The blogosphere isn’t the only form of discussion on the Web. Far from it, actually. There are many other discussion environments, most of which are centralized. The best example of centralization is the message board (and if you think message boards are on the decline, do yourself a favor and check out some stats at Big Boards), where the entire conversation, both people and content, is centralized.
Message board culture is very different from blogosphere culture. People visit and post on message boards because they like being part of a community and because they enjoy intelligent conversation about things that interest them. The blogosphere is in a sense the opposite of message boards, because its people and content are decentralized, and it has a somewhat different culture. For many bloggers, their online identity is their blog and the content they publish on it. Most bloggers blog to promote their identity, whether social or professional. The key advantage of a blog in building an identity is that everything a blogger writes is explicitly connected to his blog, and thus to his identity. This is not to say that bloggers don’t care about intelligent conversation. It’s just that bloggers have the additional motive of building identity.
By building our identity, we increase the attention we get from our peers, and thus our value among them. This is true in both the message board case and the blogosphere case. The difference is that on a message board the community of peers is small and isolated, so building identity there has, in a sense, limited value.
So the big question is: does blogger greed inhibit the unification of conversation? Are bloggers so hung up on attracting attention that they’d rather own their conversations than join them together for the benefit of everyone involved?
Anyway, I’m not sure I tied my points together as well as I wanted, but it’s 5AM and I need to sleep. Still, I’m really hoping you read this post and tell me your thoughts.
Information overload
It’s approaching 3AM right now and I’m not asleep. In fact, over the past year my bedtime has gotten later and later. Why, you ask? Partly it’s because I’ve been busy working on my startup Dontbuyjunk, and I often work late into the night until I’m satisfied with the progress I’ve made for the day. But I’m increasingly finding that what really keeps me from getting to bed is information overload, courtesy of the Internet. Let me explain.
I’ve been spending hours a day on the Internet for several years now. The big difference, though, is that recently that time has been shifting away from entertainment (mindless chatting on message boards, gaming, etc.) toward information exchange, such as reading and writing in the blogosphere. Every night after I’m done working, I do one last catch-up with my RSS reader, and almost without fail I end up spending a couple of hours bouncing from one blog to the next and then on to aggregators like del.icio.us and memeorandum.
Today, publishing on the Web is essentially free. By “free” I mean that it has no cost and is without rules or barriers. Furthermore, the second you publish your content, it is instantly accessible to a billion people. Because of all this, the rate at which information is created and disseminated is astonishing. So this is a good thing, right?
Well… sure. Enabling people to express and share both knowledge and opinions benefits society in countless ways. The problem is that with so much publishing going on, how can I keep track of the tiny subset of information that is relevant, unique (remember that most content published every day is either syndicated or essentially a duplicate), and valuable in my world? It’s getting harder by the day. Making things worse, I want not just to read the facts behind a topic or news item, but also to read the opinions and take part in the many insightful discussions that branch from it.
So what’s the solution to my problem? Lunesta? Maybe. The next generation of aggregators? Bingo.
One big trend that we are starting to see develop, and that I believe will be a major area of focus in the years to come, is information filtering and aggregation. Search engines like Google and centralized information sources like ESPN and Wikipedia let me pull in specific pieces of information when I am actively looking for them. Their limitation, however, is that most of the information I absorb each day is new and could not have been searched for. In other words, if I didn’t know the information existed, how could I have searched for it? Instead, I must rely on a set of trusted sources to push this new information to me. Information aggregators, whether human-driven (digg, reddit, del.icio.us) or algorithmic (memeorandum, blogniscient, Google News), are a step in the right direction. But aggregators have a long way to go before they become truly accurate and comprehensive information tools.
Anyway, it’s now 4:30AM and I’m basically just babbling. Aggregation is an area I’m becoming increasingly interested in, and I have some ideas brewing about what the perfect aggregator would be and how it would work. I’ll be thinking and blogging about it in the coming weeks.
For more discussion of aggregators, check out this post about memeorandum that I read earlier and found insightful:
http://mashable.com/2005/11/08/hacking-memeorandum-more-proof-that-algorithms-dont-work/
Be sure to read the comments thread.
Google Base: the process of unifying data on the Internet
Back in 2000, in an article titled “Not Your Father’s Internet”, Bill Gates wrote:
In many respects, today’s Internet actually mirrors the old mainframe model, with the browser playing the role of “dumb terminal.” All the information you want is located in centralized databases, and served up a page at a time (from a single Web site at a time) to individual users. Web pages are simply an HTML “picture” of the data you need, not the underlying data itself.
What Gates is describing here is the fundamental difference between the Internet infrastructure, which stores and exchanges raw information, and the Web, whose purpose is to convey that information to humans.
Today, for almost any type of information, there are multiple sites, each with its own database of that information. Take a simple type of information like classifieds, specifically auto classifieds. Several sites on the Web carry auto classifieds listings: AutoTrader.com, Craigslist, Cars.com, and many others. If you want to find a 2001 Honda Civic in your area, you have to go to each site and run a separate search. Horrible.
To be more efficient, you could try a classifieds meta-search site like Oodle, which automates searching several classifieds sites and returns a single, aggregated set of results. Sure, this saves time, but meta-search engines have inherent limitations. They do not, of course, have access to AutoTrader’s or Cars.com’s databases. All they can do is crawl and scrape these sites, which is an imperfect process. No matter how much intelligence you build into the scraper, it will never deliver perfectly accurate, comprehensive, or up-to-date results. There are other limitations too, such as being able to search only the lowest common denominator of information: if Cars.com distinguishes between transmission types but AutoTrader.com does not, then Oodle can’t offer search refinement by transmission type.
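To make the “lowest common denominator” point concrete, here is a minimal Python sketch. It assumes each source site exposes a fixed set of searchable fields. The site schemas below are hypothetical and are not the real data models of Cars.com or AutoTrader.com. A meta-search can only filter reliably on the intersection of those field sets, so any field that just one site offers drops out.

```python
# Hypothetical listing schemas: field names are illustrative assumptions,
# not the actual data models of these sites.
SOURCE_FIELDS = {
    "cars.com": {"make", "model", "year", "price", "zip", "transmission"},
    "autotrader.com": {"make", "model", "year", "price", "zip"},
}


def common_search_fields(source_fields):
    """Return the fields a meta-search can filter on across every source.

    This is the set intersection of each source's fields. With no
    sources, there is nothing to search, so the result is empty.
    """
    field_sets = iter(source_fields.values())
    common = set(next(field_sets, set()))
    for fields in field_sets:
        common &= fields  # keep only fields this source also supports
    return common


if __name__ == "__main__":
    print(sorted(common_search_fields(SOURCE_FIELDS)))
```

Run as a script, it prints `['make', 'model', 'price', 'year', 'zip']`. `transmission` is gone because only one of the two hypothetical sources supports it. A service that received the raw, structured data directly would not be forced to throw that field away.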
The same auto classifieds example applies to many types of information: product data, job listings, news articles, and so on. Is it a coincidence that these are the same types of information found on Google Base? Of course not.
Ultimately, what we humans want is the perfect set of information matching a given search. A search engine limited to human-readable documents (e.g. HTML, PDF) will never be able to provide perfect information. A better search engine needs access to raw, unadulterated, structured information.
Google Base is simply an attempt to unify the data found in the world’s databases. It’s not sexy, but raw information isn’t sexy. While you and I can add our own data to Google Base, the real power is in bulk data upload. Imagine if the major classifieds sites continuously uploaded their data to Google Base. Google Base would then become the ultimate classifieds search. Of course, that won’t happen easily. A site like Craigslist, whose value comes almost entirely from the information in its database (some would argue that Craigslist adds significant value through its user community and simple interface), would by doing so be putting itself on the fast track to extinction.
However, if eBay were to upload its auction listings to Google Base, that would be great for eBay, because Google could then search eBay auction listings more effectively. Unlike Craigslist, eBay would not be threatening its own existence. For eBay, the auction data is just one piece of the auction process. eBay still owns the surrounding processes, such as bidding and payment, without which the auction data would mean little. I doubt Google has any real desire to get into the auction vertical. Google just wants to organize information, not build verticals around it.
