How Network Statistics Can Make Search Better And More Relevant

This week I read a fascinating article by Joe Weinman that was published in Business Communications Review. In it, he proposes an innovative concept that could initiate a paradigm shift in Internet search, fixing what may be its biggest problem: too many results, many of which are of limited relevance.

He argues that the search engine portals have been ignoring a wealth of information that can be gathered from the network layer. Search engines, he says, could use deep packet inspection, sampling, and other techniques to collect network traffic statistics, analyze and parse the statistics to better understand usage and popularity of sites and the pages within those sites — then add this knowledge to their algorithms to return more relevant results.

As Weinman explains:

Network traffic statistics such as unique visitors, interval between visitor arrival at a page or site and departure from a page or site, packets transferred, subsequent clicks from a page vs. reloads of prior pages, clicks leading to other pages within a site, and similar types of measures could be an excellent indicator of average user interest in a page or site, which in turn is a proxy for relevance.
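To make the idea concrete, here is a minimal sketch of how measures like the ones Weinman lists might be folded into a ranking. Everything in it is an assumption for illustration: the `PageTraffic` fields, the weights, the 300-second dwell cap, and the `rerank` blend are hypothetical and do not come from Weinman's article or from any real search engine.

```python
import math
from dataclasses import dataclass


@dataclass
class PageTraffic:
    """Aggregated traffic counters for one page (hypothetical fields)."""
    url: str
    unique_visitors: int      # distinct visitors in the sampling window
    avg_dwell_seconds: float  # mean interval between arrival and departure
    onward_clicks: int        # clicks leading to other pages within the site
    back_reloads: int         # reloads of the prior page (quick bounces)


def engagement_score(t: PageTraffic) -> float:
    """Squash the counters into one score in [0, 1). Weights are illustrative."""
    if t.unique_visitors == 0:
        return 0.0
    popularity = math.log1p(t.unique_visitors)        # dampen very large sites
    dwell = min(t.avg_dwell_seconds, 300.0) / 300.0   # cap dwell at 5 minutes
    exits = t.onward_clicks + t.back_reloads
    onward_ratio = t.onward_clicks / exits if exits else 0.5
    raw = popularity * (0.5 * dwell + 0.5 * onward_ratio)
    return raw / (1.0 + raw)


def rerank(results, traffic, weight=0.3):
    """Blend text relevance with engagement.

    results: list of (url, text_relevance) pairs, relevance in [0, 1].
    traffic: dict mapping url -> PageTraffic; pages without data score 0.
    """
    def blended(item):
        url, relevance = item
        stats = traffic.get(url)
        engagement = engagement_score(stats) if stats else 0.0
        return (1.0 - weight) * relevance + weight * engagement

    return sorted(results, key=blended, reverse=True)


# Two pages the text-based ranking cannot tell apart.
packed = PageTraffic("example.com/packed", 1000, 120.0, 800, 200)
empty = PageTraffic("example.com/empty", 10, 5.0, 1, 9)
traffic = {p.url: p for p in (packed, empty)}
ranked = rerank([(empty.url, 0.6), (packed.url, 0.6)], traffic)
```

In this toy case both pages have the same text relevance, so the traffic signal alone decides the order, much like looking through the restaurant window.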

Weinman provides a great analysis of today’s Internet search technologies and builds a very credible argument, citing numerous examples of ways in which network traffic statistics can be used to aid search relevancy. One example asks: If you were choosing among several restaurants, would you count the number of reviews written about each one (a simplified version of Google’s (GOOG) PageRank algorithm), or would you look in each front window to see which were empty and which were packed with happy diners?

Although refining search relevance is the main thrust of the article, it is clearly in the interest of large global telecommunications providers to find a way to add value for their customers beyond providing commodity bandwidth and connectivity services. One logical assumption is that telcos and ISPs would want to gather network traffic statistics and sell the results to Internet search companies for inclusion in their algorithms.

The technology involved sounds suspiciously like a domestic surveillance program that was exposed last year and, as such, could raise privacy concerns. But this may not be an issue if the data were gathered anonymously and aggregated to show generic network traffic flows rather than personal information. After all, you don’t need to know anything about the diners in a restaurant to evaluate its popularity.
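As a rough illustration of what "gathered anonymously and aggregated" could mean in practice, the sketch below reduces raw visit records to per-page visitor counts. The record format, the `aggregate_visits` name, and the `min_visitors` threshold are all assumptions made for this example. Note that salted hashing on its own is not a privacy guarantee; the point is only that counts, not identifiers, leave the function.

```python
import hashlib
import secrets
from collections import defaultdict


def aggregate_visits(records, min_visitors=5):
    """Reduce (client_id, url) pairs to {url: unique visitor count}.

    Only per-URL counts are returned; client identifiers are not.
    """
    salt = secrets.token_bytes(16)  # fresh for each batch, never stored
    seen = defaultdict(set)
    for client_id, url in records:
        # Hash the identifier so raw addresses are not kept while counting.
        digest = hashlib.sha256(salt + client_id.encode("utf-8")).digest()
        seen[url].add(digest)
    # Drop pages with very few visitors, where a count could point to individuals.
    return {url: len(ids) for url, ids in seen.items() if len(ids) >= min_visitors}


# Six distinct clients visit one page (one of them twice); two visit another.
records = [(f"10.0.0.{i}", "example.com/menu") for i in range(6)]
records += [("10.0.0.1", "example.com/menu")]
records += [("10.0.0.1", "example.com/private"), ("10.0.0.2", "example.com/private")]
counts = aggregate_visits(records)
```

Here the lightly visited page is suppressed entirely, so the output says how busy the restaurant is without saying who is sitting at which table.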

After reading Weinman’s article, I believe that it is technically possible to make search results more relevant using network traffic statistics. Such an approach could give birth to a whole new search algorithm, driven by network traffic statistics, which could in turn shift the strategic balance between the companies that run search engines and the network service providers that provide commodity bandwidth. But it raises the question: Are you comfortable with your service provider helping to provide this relevancy by using your network traffic data?