A New York court issued a major ruling that limits the amount of content an internet scraping service can take without paying for it. Here’s a plain English explanation.

photo: Pixelbliss

A federal court has sided with the Associated Press and the New York Times in a closely watched case involving a company that scraped news content from the internet without paying for it.

The case has important implications for the news industry and for the ongoing debate about what counts as “fair use” under copyright law. Here’s a plain English explanation of what the case is all about and what it means for content creators and free speech.

Fair use or a free ride? The facts of the case

The defendant in the case is Norway-based Meltwater, a service that monitors the internet for news about its clients. Its clients, which include companies and governments, pay thousands of dollars a year to receive news alerts and to search Meltwater’s database.

Meltwater sends its alerts to clients in the form of newsletters that include stories from AP and other sources. Meltwater’s reports include the headline, the first part of the story known as the “lede,” and the sentence in which a relevant keyword first appears. The Associated Press demanded that Meltwater buy a license to distribute the story excerpts and, when the service refused, the AP sued it for copyright infringement.
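To make that excerpt format concrete, here is a minimal Python sketch of how a clipping service might assemble one from a story. It is not Meltwater’s actual code: the names `Excerpt`, `split_sentences` and `build_excerpt`, the naive sentence splitter, and the case-insensitive substring match are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Excerpt:
    headline: str
    lede: str
    keyword_sentence: Optional[str]  # None if the keyword never appears


def split_sentences(text: str) -> list[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_excerpt(headline: str, body: str, keyword: str) -> Excerpt:
    """Hypothetical excerpt: headline + lede + first sentence mentioning the keyword."""
    sentences = split_sentences(body)
    lede = sentences[0] if sentences else ""
    needle = keyword.lower()
    # The first matching sentence may turn out to be the lede itself.
    hit = next((s for s in sentences if needle in s.lower()), None)
    return Excerpt(headline=headline, lede=lede, keyword_sentence=hit)


story = ("Ford unveiled a new pickup truck on Monday. Analysts were cautious. "
         "Sales of Ford trucks rose last year.")
print(build_excerpt("Ford unveils new truck", story, "sales"))
```

In this sketch the lede is included no matter where the keyword appears, and the lede is the part of the excerpt that Judge Cote singled out as taking “the heart” of the work.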

Meltwater responded by saying it can use the stories under copyright’s “fair use” rules, which create an exception for certain activities. Specifically, Meltwater said its service is akin to a search engine: in the same way that it’s fair use for Google to show headlines and snippets of text in its search results, Meltwater said, it’s fair use to clip and display news stories.

The case has divided the tech and publishing communities. The influential Electronic Frontier Foundation filed a brief in support of Meltwater, arguing that a win for the AP on its copyright claim could inhibit innovation and free expression. On the other side, the New York Times and other news outlets filed in support of the AP, claiming that Meltwater was simply free-riding and was undermining the ability to produce the sort of journalism on which a free society depends.

A clean win for the AP

In a decision published Thursday in New York, U.S. District Judge Denise Cote shot down Meltwater in blunt language. While much of the 90-page ruling covers procedural issues and other defenses put forth by Meltwater, the heart of the decision is about fair use.

To decide if something is fair use, courts apply a four-part test that turns in large part on whether the defendant is using the copyrighted work for something new or unrelated to its original purpose. Famous examples of fair use include a parody rap song of “Pretty Woman” and Google’s display of thumb-size pictures in its image search. In the AP case, however, Meltwater’s fair use defense failed.

Judge Cote rejected the fair use claim in large part because she didn’t buy Meltwater’s claim that it’s a “search engine” that makes transformative use of the AP’s content. Instead, Cote concluded that Meltwater is more like a business rival to AP: “Instead of driving subscribers to third-party websites, Meltwater News acts as a substitute for news sites operated or licensed by AP.”

Cote’s rejection of Meltwater’s search engine argument was based in part on the “click-through” rate of its stories. Whereas Google News users clicked through to 56 percent of excerpted stories, the equivalent rate for Meltwater was 0.08 percent, according to figures cited in the judgment. Cote’s point was that Meltwater’s service doesn’t give people a way to discover the AP’s stories, as a search engine does, but instead serves as a substitute for them.
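The gap between those two figures is easier to see as arithmetic. This small Python sketch uses the two rates cited in the ruling; the batch of 10,000 excerpts and the helper name `expected_clicks` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Click-through rates cited in the ruling, kept as exact fractions.
GOOGLE_NEWS_CTR = Fraction(56, 100)   # 56 percent
MELTWATER_CTR = Fraction(8, 10_000)   # 0.08 percent


def expected_clicks(excerpts: int, ctr: Fraction) -> Fraction:
    """Expected click-throughs for a given number of excerpts shown (hypothetical helper)."""
    return excerpts * ctr


ratio = GOOGLE_NEWS_CTR / MELTWATER_CTR
print(f"Google News click-through is {ratio}x Meltwater's")

# Hypothetical batch of 10,000 excerpts shown on each service.
for name, ctr in [("Google News", GOOGLE_NEWS_CTR), ("Meltwater", MELTWATER_CTR)]:
    print(f"{name}: {expected_clicks(10_000, ctr)} clicks per 10,000 excerpts")
```

On those figures, a Google News excerpt was 700 times as likely as a Meltwater excerpt to send a reader on to the original story.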

The judgment also points to the amount of content that Meltwater replicated. While reproducing a headline and a short snippet may be fair use, Cote suggested that Meltwater took “the heart” of the copyrighted work by also reproducing the “lede” and other sentences:

“A lede is a sentence that takes significant journalistic skill to craft. [It shows] the creativity and therefore protected expression involved with writing a lede and the skill required to tweak a reader’s interest.”

The ruling added that Meltwater had taken more of the story than was necessary for a search engine and that its economic harm to AP also weighed against finding fair use. And, in a line that likely had news agencies clicking their heels, the judge wrote:

Paraphrasing James Madison, the world is indebted to the press for triumphs which have been gained by reason and humanity over error and oppression [...] Permitting Meltwater to take the fruit of AP’s labor for its own profit, without compensating AP, injures AP’s ability to perform this essential function of democracy.

These are, in my view, just some of the most important points of a very long decision. You can read it for yourself below; I have underlined key passages.

Common sense or a chill on free expression?

The decision has already caused concern on the part of internet freedom advocates. Techdirt’s Mike Masnick, for instance, says the ruling has “a ton of problems” and that Cote misapplied the four-part fair use test.

Meanwhile, Meltwater has vowed to appeal, and its CEO claims to be “especially troubled by the implications of this decision for other search engines and services that have long relied on the fair use principles for which Meltwater is fighting.”

Meltwater is likely to face an uphill battle on appeal, however. Cote’s ruling is exhaustive, and many lawyers regard the Second Circuit Court of Appeals as sympathetic to the hometown publishing community.

The ruling’s broader impact will depend on how far it ripples beyond Meltwater. Since all of the clipping service’s competitors have already paid AP for a license, the ruling could have little effect on anyone but Meltwater while boosting the AP’s resources for gathering news.

On the other hand, the ruling could embolden the AP and other news outlets to file more lawsuits. While this could bring more licensing revenue for journalism, it may also produce a phenomenon like what is occurring in France and Germany where publishers are treating copyright like a tax to protect outdated industries — and chilling online innovation in the process.

Meltwater AP Ruling



  1. Reblogged this on ProvocativelE and commented:
    “A New York court issued a major ruling that limits the amount of content an internet scraping service can take without paying for it.”
    Where does one draw the line when it comes to New Media? What about feeds, Flipboard, StumbleUpon, Pinterest, Facebook, etc.? Every blog would then be in jeopardy, right? Even this post is using content that was published elsewhere. Am I committing a crime by reblogging it?

    1. The judge seems to say that one sentence and a headline could be acceptable for fair use. However, taking the entire first paragraph, which summarizes the story, is not fair use.

      I think that in previous court cases, one paragraph was acceptable fair use, so this judge might be misapplying the legal basis for the decision.

        1. I’m still awaiting someone’s opinion about RSS feed reposting. Is that fair use or not? Because it’s WAY more than just a headline and first paragraph…

      1. I noticed that the AP license for RSS feeds is for personal, noncommercial use only. I guess that means that they don’t want anyone reading the news at work.

        However, the license also says: “You further agree not to frame or otherwise control the browser window (if any) in which the AP content opens, including limiting the size or position of such window.” (http://hosted.ap.org/dynamic/fronts/RSS_FEEDS?SITE=AP)

        These RSS terms are completely ridiculous. If you were accessing the RSS feed for personal, noncommercial use only, why would there be any requirement about framing or controlling your own browser? Basically, if you access the AP RSS feeds, you agree not to move or resize your browser in any way. Even scrolling the feed up or down is prohibited, because that’s a form of control as well.

        The AP RSS feed is pretty useless, because they don’t even provide a complete sentence for most news stories. Previously AP’s license fee was: “The licenses start at $12.50 for quotations of 5-25 words.” If less than 1% of the articles are interesting enough to actually click on, Meltwater would be paying over $200 per article that got clicked through.

  2. I don’t understand this ruling. Isn’t Meltwater a B2B service for media monitoring? I was under the impression that they sell companies a service to monitor brands and company names in the media. Meaning, Ford would pay them to monitor news stories about Ford and how Ford is being covered in the media. I don’t think the Meltwater model was to crawl and display or republish news content for public consumption, right? There are dozens of media monitoring companies doing the same thing, right? Critical Mention? TVEyes? In my experience, companies like this never grant a license to republish. And does this case set an interesting precedent for these types of companies… can they stay in business? This case just seems ridiculous to me. Why does the AP even worry about this kind of service? It just seems silly.

    1. The problem is, to use your example, Meltwater isn’t sending Ford to the original AP articles to read. They’re essentially copying and pasting almost everything relevant about the story and displaying it on a Meltwater page for the people at Ford to read.

      This is helpful to Ford because they don’t have to click around and read long articles (or see the ads on the original site) to get to the 2 or 3 paragraphs about Ford that are relevant.

      It is for public consumption. Are not the people who work at Ford part of the public?

      1. The headline, lede plus one sentence is hardly “everything relevant”. If the search was done on the clients’ computers, there would be absolutely no issue here. The AP makes their articles accessible for free. The problem is only that the scraping is done on the servers that Meltwater pays for.

        The job of the lawyers should simply be to convince the judge that the exact same work can be done in any location. Meltwater is only performing the processing that is requested by the clients, they don’t care about the sources of the news and they are not running a newspaper where they let millions of readers read the articles, they are simply providing access to a free service to a single client.

        If the AP article is indeed valuable to the client, the client would obviously not be satisfied with the snippet. The only value that the client receives from the snippet is the ability to ignore fairly useless articles and home in on the really useful ones. Meltwater should just provide the scraper as a program that runs on the clients’ servers, or convince the judge that the identical process is occurring when they let the clients run the same process on the Meltwater servers.

    2. I don’t know if there are dozens of media monitoring companies doing the same thing. What I see most often is a blog post or aggregator that has headlines and a URL. The URL leads either straight to the Associated Press story (as an example) or somewhere like here, GigaOM/PaidContent. If here, I read an original article that describes what happened, synthesizes and sometimes has a statement of editorial opinion, which is fine with me, as it is clear to me which is fact and which is opinion because the writer makes certain to delineate that.

      Then there are search engines, of which Google is dominant but, in fact, one of many. There are many specialty search engines that are, and hopefully will remain, separate from Google. They provide a headline and a single sentence, sometimes only a phrase of a sentence. I don’t know if it is legal precedent under fair use, but it is an accepted standard. So far, so good.

      What Meltwater is doing is different though. If it is B2B, it is especially annoying. Meltwater shouldn’t get a free ride off of Associated Press, then directly profit from it.

  3. I don’t know about this case, but AP seems to have an exaggerated sense of ownership. At one time AP is reputed to have wanted a license fee for the use of five words in a row. There are relatively few five-word combinations that others have not used before AP. So presumably AP would be willing, in all fairness, to pay others for the use of their words. I just Googled the last five words of the previous sentence, “the use of their words,” as a straight quote. I got about 7 million hits. Would AP be willing to license the use of those words from the previous users? How would they contact them, contract with them and pay them? Who should be paid first? Should AP be allowed to use the phrase first, and then settle on payment? Or should it have to get a license from all previous users, and then be able to use it? Writing a story could take years, especially when you consider that there are 495 five-word combinations in a 500-word article. Now we would be looking at 4.5 billion licenses to be verified. Consider the use of 6-, 7-, 8-word quotes, and so on. More billions and billions of licenses.

    This current suit may or may not be reasonable. I don’t know. But according to a wide variety of sources, AP has previously attempted to gain control of any sentence of over 4 words that has been used before — just about all of them. I would not trust anything that AP wants.


    1. Of course BoingBoing hates the ruling because they’re some of the biggest offenders. They rarely do any original work at all and fill their blog with huge “quotes” from someone else.

      And I see nothing wrong with the AP feeling like they have what you call “an exaggerated sense of ownership.” They paid the salaries and the health care of the people who created the stories. They deserve to earn something from the work. Why don’t we let Meltbrains hire some people and pay for them to do some reporting.

  4. Chill of free expression? Come on! These aggregators wouldn’t have an opinion if they couldn’t lift it from someone else. This won’t chill the expression of their own opinion, it just stops them from regurgitating someone else’s. They’re just lazy punks who want to make a fast dollar off of the backs of the real workers.

  5. It’s about time the fair use excuse gets cracked. Artful theft of content, while certainly pioneered by Google and others, is now in epidemic form.

    Taking the lede, a thumbnail, and two lines of text from an article is not fair use; it’s a micro-article. It is simply republishing, redistributing… or stealing.

    It is a bit of a tell when the site claiming fair use doesn’t even report any news of its own. Aggregating news as a sole means of delivering news is not indicative of a company that is in the news business. It’s indicative of a company that is in the stolen goods business.

    Funny how nobody is claiming fair use of music or movies (remember music file sharing?). Why? Because those industries already beat everyone in court. The same is coming around for written and pictorial content.

    The internet is certainly the wild wild west, but eventually the sheriff comes to town.

  6. “In the same way that it’s fair use for Google to show headlines and snippets of text in its search results…”

    It is actually not fair use (see 17 USC § 107) for Google to “show headlines and snippets of text” and then append ads around them without a license for each and every one. Google is getting away with it, currently, because they settle out of court, have defense judges omit text of statutes, and have “fair used” everyone all at once, stealing $137 Billion USD, a large chunk of it in off-shore tax evasion accounts (not avoidance, because of infringement). Google Images’ “May be subject to copyright” mark (which has grown smaller as of late) is not fair use. They know they have no license, then post the “may be subject” mark to use the images nevertheless until a court says otherwise. Ultra-criminal.

    There is no de minimis non curat lex protection. See Playboy Enterprises, Inc. v. Frena, 839 F. Supp. 1552 at Pg. 11 (M.D. Fla. 1993) (“The detrimental market effects coupled with the commercial-use presumption negates the fair use defense”; hence, denying the principle).


    1. Is there a Google employee out there who would publish some of the Google search code – the actual code? If so, then we could have an interesting experiment.

      Now, their code would actually be an example of one of the few pieces of intellectual property that Google actually creates. So if some people on the web wrote articles about Google and quoted and linked the Google search code in them, it would certainly make the articles more interesting, and we could hide behind the fair use excuse. It might be fun to try.

      Just claim fair use, and let Google hash it out in the courts to see what is fair about using their dear intellectual property.

      So, if we can get the shoe on the other foot, then there might be an interesting hashing out of this issue.

      Don’t you think so?

      1. @CfC: Yes. I do.

        I didn’t think that was true in the recent past. Google provided many free services, and seemed like something of a Robin Hood at times. That was flawed thinking, however. Ultimately, rule of law and acknowledgement of intellectual property is essential. The lack of balanced*, consistently applied IP law, THAT is what will stymie innovation, I fear.
        * fair and balanced as in “doesn’t over reach”, nor demands excessively broad scope

        I’m sorry to be cryptic, without a basis for my opinion. There is a character limit on comments, and I don’t want to drone on here. Instead, I’ll respond on my own blog.

  7. Chris Boulanger Saturday, March 23, 2013

    The whole thing seems to hinge on the use of the lede and the low ctr. I think curating services will be alright if they limit their snippets and can show their users clicking back to the source at a high rate. But more traditional aggregator sites and scrapers are going to be paying license fees or facing lawsuits.

    1. “The whole thing seems to hinge on the use of the lede and the low ctr.”

      Agreed – but I am really, really wondering where anybody got that 56% click through rate on Google News.

      I find it very hard to believe that over half of the AP stories on GN are clicked through over half of the time.

      I wonder if the judge got suckered by some shady definitional games or just out and out lying about the data.

      I just downloaded the ruling – it will be interesting to see the details of this 56% business.

      1. Man oh man, I just traced the heritage of the 56% factoid and it looks like it is rife with potential conflicts of interest all down the line.

        Basically, the court cites a NY Times friend-of-the-court brief (cough cough HACK!) which in turn cites a TechCrunch post (cough cough HACK!) which in turn cites a 2009 Outsell Report (retail cost $995) as the primary source of the data point.

        That’s a lengthy game of Telephone going on there…with not a lot of disinterested players.

        Anybody want to guess if the judge actually saw the primary source material (Outsell report) – I strongly doubt it.

        In fact, the judge basically goes along because Meltwater didn’t bother to offer up competing click-through data for Google News (how could it?).

        Ironically, the whole tone of the NYT/TechCrunch/Outsell 56% data chain is that *Google* (with its alleged 56% ctr) is really ripping off NYT, etc.

        So how long is Google News safe? (Insanely inflated 56% ctr in particular…)

        This has been a bad week for the free internet – first Google Reader gets the axe (pretty mysteriously in my book – does Google own Twitter shares?) and now the ground work is laid for giving Google News the boot.

        Something wicked this way comes.

  8. Constant Geographer Sunday, March 24, 2013

    I don’t see where Meltwater is re-publishing the articles as its own self-authored news service. Without knowing exact details, Meltwater seems to be searching through newsfeeds and looking for bits of information related to clients. Then, Meltwater repackages those bits of news and sends what it finds on to customers. Not even the entire article, just enough tease to get the client interested.

    And the clients know they have contracted with a news aggregator, and that the actual news source lies outside of Meltwater, so it’s not as if any fraud or deception has been committed. Meltwater is merely saying, “Here is what the world is saying about you today. Read up, or don’t.”

    So, in a sense, one would think Meltwater would drive more traffic to AP, helping improve AP’s reach and readership, as Meltwater is driving specific organizations and people to AP’s content. AP’s financial bottom line is better with Meltwater driving traffic to AP, traffic the AP might not otherwise have gotten.

    In a similar vein, then, it’s almost as if the AP took Meltwater to court to stop them from driving traffic to the AP.

    Perhaps Meltwater should either change its business model, slightly, or, even better, ask to be bought by AP. AP could then use the service to help drive its own content to specific companies, people, etc.

  9. The difference between Meltwater News service and search engines like Google is that search engines don’t charge their users for usage of their tool.

    The real value people get from Meltwater’s service is the content that is freely and publicly published on the Web by its owners. Hence, Meltwater shouldn’t charge users for the same content. This is similar to open source software where it provides free and public access to its code. Anyone can make use of the code and modify it to add more value to the software but must likewise distribute it freely and not charge for it.

  10. Meltwater tried to sell their services to my company recently as a PR tool for media monitoring. We didn’t go for it because the basic plan was so similar to what you get from Google Alerts. I guess that would be the reason it’s not legal, since it competes with a free service. The more you pay, the better it gets, though, and it could potentially be a great monitoring tool like Cision. Overall, I’m surprised that the CTR would show it’s being used for something different than any other monitoring tool out there.

Comments have been disabled for this post