Summary:

Google added some features to Google News, including the ability to choose to see less news from blogs. But how does the search giant define the term “blog”? There’s no easy answer, which reinforces why the distinction doesn’t really make sense anymore.

Google recently rolled out some enhancements to its Google News site, including settings that allow users to say whether they want to see more or less news from “blogs.” But how does the search giant define the term “blog”? After all, the lines between traditional media and the blogosphere have blurred a lot over the past few years, with traditional media entities launching blogs and some blog sites becoming major media entities. Google says it looks at a number of factors that it won’t specify, but the main one is whether a source calls itself a blog. That only reinforces the point that drawing a distinction between blogs and non-blogs is a mug’s game when it comes to the news.

Google first started drawing a distinction between regular news sources and blogs in 2009, but it was never really clear how the search company was defining the term “blog,” or why it included some obvious blogs but not others in that category. According to the Google blog post announcing the change, readers complained that it wasn’t clear whether something was a blog or not when they did a news search. Was this important because readers felt that blog sources were less newsworthy or less reliable? Who knows. Google still hasn’t said.

Zachary Seward, who is now at the Wall Street Journal, wrote a post for the Nieman Journalism Lab at the time the original changes were made, arguing (persuasively, I think) that it didn’t make much sense to draw a line between what was a blog and what wasn’t for news purposes, either from a technical standpoint or a philosophical one. As Seward put it:

Dividing content along these lines is like classifying brownies based on whether they were baked in aluminum or glass pans. There’s no difference, and it obscures what you really want to know: if they contain chocolate chips.

In other words, the only real criterion that should matter when it comes to searching Google News is whether something actually, you know, contains news. M.G. Siegler makes a similar point in his recent post about the changes at Google News, which he notes has never been very good at surfacing actual technology news. What it tends to do, he says, is give precedence to mainstream news sites that report the same thing a blog reported, but several days after the fact. Is that what a news aggregator should really be doing?

According to a spokesman for Google, the search company “examines a variety of signals” from websites to determine whether they are blogs or not, but for the purposes of Google News, “we primarily rely on self-identification.” In other words, if a site has the word “blog” in its name, then Google News defines it as a blog. So since GigaOM and TechCrunch don’t use the term blog, they aren’t designated as blogs in Google News, but the New York Times Bits blog and the Wall Street Journal On Media blog are. If sites want to be reclassified, the spokesman said, they can contact the Publisher Support team.

That’s not all, though. As Danny Sullivan at Search Engine Land notes, Google also classifies blogs for the purposes of what it calls Google Blog Search — which you can get either from the dedicated blog-search site or by choosing “blogs” in the left-hand navigation menu on the main search page. In those results, GigaOM and TechCrunch are both classified as blogs. Why? Apparently, because they publish their content via RSS feeds, which (as Sullivan notes) means Google should really change the name of the search to Google Feed Search instead of Google Blog Search.

Presumably an RSS feed is also one of the “signals” that Google looks at when classifying blogs for Google News purposes, although that’s not explicitly stated anywhere. On the help page for the news site under blogs, it says:

Blogs typically identify themselves as such, and adhere to standard blog formatting by displaying regular entries in order from newest to oldest. In many instances, blog posts are excerpted on the blog’s homepage instead of summarized by an editor or author. Finally, websites that organize their articles in a more editorial fashion and employ a complex layout are generally not considered blogs.

So from the sounds of it, Google treats things as blogs if they either identify themselves as such, or if they have a certain design — i.e., posts in reverse chronological order and a lack of a “complex layout.” This makes no sense whatsoever. Not only are many news websites adopting a distinctly cleaner and blog-inspired layout, but some things that are clearly blogs have moved away from the chronological ranking of posts as well, such as Nick Denton’s Gawker Media network. Some, such as The Huffington Post, have a mish-mash of both news-type pages and blog pages. And why would someone expect the NYT and WSJ blogs to show up in a blog search, but not actual blog sites like GigaOM or TechCrunch?

On its help page, Google says it acknowledges “the difficulty in characterizing blogs and the rapidly changing publishing landscape,” but that it is trying to help readers choose which sources they want to read. So why not just include everything that actually contains news in the site called Google News, and let readers sort out what to call it?

Post and thumbnail photos courtesy of Flickr users Wesley Fryer and jphilipg

  1. Michael Dorausch Wednesday, May 18, 2011

    After having a news site I managed indexed by Google News since soon after they launched in 2002, I stopped paying attention after we got mysteriously dropped in 2009. This recent turn has piqued my interest since it appears many more sites will get canned in the ‘filter’ of news vs. blog.

    Just as I mentioned on Search Engine Land, I see GigaOM as an industry news source, even though it runs on a WordPress platform.

    If you ask me, Google has increasingly bowed to the pressure of traditional media, killing off opportunity for ‘independent’ news sources to appear on the homepage. I have years of news homepage screen shots showing the changing trend. Oh well, I’ll continue to get my news from Twitter.

  2. Osma Ahvenlampi Wednesday, May 18, 2011

    It’s quite amazing that a company which prides itself on taking an algorithmic approach to pretty much everything under the sun isn’t able to solve this, when TechMeme showed the way years ago – posts which get a lot of referrals fast are obviously news, and sites which carry them (repeatedly) are obviously news sites. Chocolate chip cookies tend to more reliably come from where they’ve come before. I don’t know what could be clearer.

    1. Totally agree, Osma — thanks for the comment.

  3. Davis Freeberg Wednesday, May 18, 2011

    Being able to tell the difference doesn’t take fancy algorithms; all you have to do is measure the number of outgoing links that they publish. If they make the reader stay on the site and just “trust them” for the news, then clearly they are traditional media. If they allow their readers to see the original source of ideas by crediting others, then they are clearly blogs and should be blacklisted from Google’s search results entirely.

    1. Good one, Davis :-)

  4. lurid tales of doom Wednesday, May 18, 2011

    finally, quality control.
