Summary:

Google is making changes to its search algorithms that will penalize websites hit with copyright-removal claims, but the company is saying very little about what criteria it will use to determine who gets hit and who doesn’t — can we trust it to make the right decision?

As we reported earlier, Google recently announced that it will start filtering its search results based in part on the number of copyright-takedown requests that have been filed against a site: according to a blog post from the search giant, it will tweak its algorithms to rank a website lower if it has a large number of “valid copyright-removal notices.” And how does Google know whether a copyright-removal notice is valid or not? The short answer is that it doesn’t — which is part of the reason why YouTube in particular sees so many bogus takedowns. So how do we know that this filtering of search results won’t adversely affect some websites that are perfectly legitimate? Google’s response so far seems to be “trust us.” But should we?

The search company says it decided to make the change because it will help users “find legitimate, quality sources of content more easily,” but some critics of the move have a different theory: they figure Google has essentially caved in to pressure from media and content companies — the same kind of pressure that led the U.S. government to push for legislation such as SOPA and PIPA, which would have allowed copyright holders to remove offending websites from the internet completely based on just an allegation of infringement. While Google’s changes won’t do this, being pushed down in the search results of the web’s dominant search player can have a serious impact.

Google’s criteria are completely unknown

As the Electronic Frontier Foundation points out in a blog post criticizing the move, Google’s search algorithms are opaque by design, and so there is no way of knowing what kind of criteria they will be using to decide which sites to penalize and which to leave untouched. What does a “high number of copyright-removal notices” mean? We don’t know. And while Google provides a “counter-notice” process for those whose content has been removed from search altogether, it’s not clear whether there will be any method of appeal if you think your website has been downgraded in search results because of bogus copyright claims. Says the EFF:

“Without details on how Google’s process works, we have no reason to believe they won’t make similar, over-inclusive mistakes, dropping lawful, relevant speech lower in its search results.”

Danny Sullivan of Search Engine Land noted in a post that if the sheer number of copyright notices against a site is the defining factor in whether Google drops it lower in results, then YouTube will be in grave danger, since it gets a vast number of them (although the site doesn’t appear in Google’s public list of sites where it has been asked to take down content). And there have been repeated examples of bogus claims that have led to the removal of lawful content from YouTube, including one recent incident in which several different media companies launched claims of ownership over a NASA video involving the Mars landing.

That kind of behavior isn’t likely to fill anyone with confidence in Google’s ability to differentiate between a valid copyright claim and an invalid one. And the company’s response to Sullivan’s post muddies the waters even further: Google said that YouTube — and other user-generated content sites such as Facebook, Tumblr and Twitter — won’t be penalized (or at least not very much) by the new algorithm changes, because of “nuances” in the new algorithm. What kind of nuances? The company isn’t saying. According to Sullivan:

“Google told me today that the new penalty will look beyond just the number of notices. It will also take into account other factors, specifics that Google won’t reveal, but with the end result that YouTube — as well as other popular sites beyond YouTube — aren’t expected to be hit.”

Is Google trying to curry favor with content companies?

Is it just large user-generated content sites that will get some kind of free pass? We don’t know. Is there some kind of white list of protected sites? Unknown. According to Sullivan, the company simply told him that the algorithm “automatically assesses various factors or signals to decide if a site with a high number of copyright infringement notices against it should also face a penalty.” What these various factors and signals are seems to be a secret — just as everything else about the company’s search algorithms is kept secret.

While this is presumably done to prevent people from gaming the system (or competitors from copying features), it makes it a lot harder to determine whether Google is unfairly penalizing websites for bogus copyright notices. And as the EFF points out, “false positives” are a huge problem — not just for Google but for the internet as a whole, with some websites and domains being seized by the government based merely on allegations of copyright infringement. While Google’s search penalty may not be as bad as that, it still feels like the search giant is taking action against websites that should be innocent until proven guilty.

Why would the company decide to do this? For one thing, it is being investigated by the Federal Trade Commission for antitrust activity, and it may see moves like the algorithm change as a way of showing that it is a beneficial force for society. Google is also trying to do more content-related deals with traditional media and entertainment players through YouTube, and that may have increased the pressure to come up with a response to piracy that provides at least a watered-down version of the penalties that those companies were pushing for with SOPA and PIPA.

The bottom line is that Google is essentially asking users to trust it to decide what to do with websites that have been accused of copyright infringement. But we have already seen that Google is prepared to engineer its search results for its own benefit rather than that of its users, with features such as “Search Plus Your World,” which was designed to promote Google’s social network. That kind of thing makes it harder to rely on blind faith in Google’s value judgments, especially when it comes to crucial questions around copyright and freedom of speech.

Post and thumbnail images courtesy of Flickr user Stefan

  1. Reblogged this on Tech Bytes Xpress and commented:
    A very good read.

  2. For my part, I still trust Google. I mean, what company would you trust nowadays? The companies I trust: Google, Amazon, Apple. The companies I do not trust: Microsoft, AT&T, Oracle, Dell, HP and Comcast.

    1. That’s a very coherent argument exhibiting strong logic and critical thinking skills.

  3. This is such an incredibly bad move on Google’s part, I can’t believe they went through with this. There are so many things wrong with this:

    1. This goes beyond core search functionality. Now Google is acting not just as a search engine; it’s going to be a copyright guard.

    2. Google has always claimed that its search results are based on what is good for the users. This is why it filters out obvious phishing, malicious and content-farm type sites from the results. This is okay because those sites do reduce the quality of the search results and leaving them out of results does help users. But no user asked for Google to be a copyright guard.

    3. If this type of filtering had been in place when YouTube was just a small fledgling company, it most probably would not have been able to grow in popularity to what it is today. This filtering will most adversely affect young, up-and-coming companies which may not be doing anything illegal.

    4. Copyright infringement detection is highly inaccurate. The technology, policies and processes are still evolving and being debated. Fair use practices are frequently questioned. Google itself has been at the receiving end of such questioning countless times, with the Google Books lawsuit being one of the prime examples. In such a scenario, how can Google use this highly inaccurate signal as an input for search results ranking?

    5. Google may yet live to regret this decision because one of the worst consequences of this practice might affect Google’s core Search product. By doing this, Google has opened itself to lawsuits by websites that may claim that they have been unfairly ranked down in search results due to bogus copyright issues. Such lawsuits would seek to force Google to reveal the internal workings of its search algorithm, something that goes to the core of Google, something it has been loath to reveal and can never reveal if the search product is to continue functioning successfully. This may well turn out to be Google’s Achilles’ heel.

  4. Heaven forfend if a site was unjustly dinged!

    And of course you feel exactly the same level of indignation about the possibility that an offending site would go unpunished.

    Tiring, very tiring.

  5. Trust Google? No.

    This corporation has turned into a bloated monstrosity.

  6. Everything from Chrome and Gmail to their Android phones and tablets will from now on be boycotted by me.
