Wholesale Blog Plagiarism … Alert

105 Comments

Updated: December 27, 2005: Wholesale blog plagiarism is a much wider problem than most of us realize. In the past few days, as we tried to get one site shut down, many more examples have come to light of sites that are simply ripping off content and repurposing it to make ad dollars. I could list quite a few names here, but why give them the traffic? Some suggest no more full feeds, which has sparked off a whole different debate. I am not cutting off the full feeds just because some people are not doing the right thing. That doesn’t mean I am not worried about this whole trend. I turned to Dick Costolo, CEO of FeedBurner, and asked him if he could do something. Dick replies…

We do have the ability to throttle these kinds of things IF they are identifying themselves. Frequently, these kinds of sites use a tool that masks their identity by just requesting the feed with a blank user-agent string (for example, instead of sending up “FeedDemon/1.0” or “Googlebot”, they just send up nothing). The problem with banning a blank string is that there are a bunch of perfectly valid home-grown RSS readers out there that also send blank strings. And finally, these guys usually bounce around from IP address to IP address.
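To make Dick's point concrete, here is a minimal sketch in Python of a user-agent check. It is not FeedBurner's actual code. The blocklist entry `ExampleScraper` and the function name `classify_feed_request` are made up for illustration. It shows why a blank user-agent cannot simply be banned: the request lands in an "unknown" bucket rather than a "throttle" bucket.

```python
# Hypothetical blocklist of scraper tools that identify themselves.
# "ExampleScraper" is an invented name, not a real product.
BLOCKED_AGENTS = {"ExampleScraper"}


def classify_feed_request(user_agent):
    """Return 'throttle', 'allow', or 'unknown' for one feed request."""
    agent = (user_agent or "").strip()
    if not agent:
        # Blank strings also come from valid home-grown RSS readers,
        # so they cannot be banned outright.
        return "unknown"
    # "FeedDemon/1.0" -> "FeedDemon"
    product = agent.split("/", 1)[0]
    if product in BLOCKED_AGENTS:
        return "throttle"
    return "allow"
```

Note that nothing here keys on the IP address, since, as Dick says, these sites tend to move from one address to another.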

Clearly, these sites ONLY exist because they can make money from Google AdSense. The issue is important enough for Google to step in and do something. Everett says that in the short term it may not be a bother for Google. Jeremy Pepper points out that while the splog sites are doing this to make a quick buck, what about the aggregators and others who are repurposing the content and making money off that?

One commenter writes, “Scraper sites may soon become the Achilles heel of the Google AdSense program and trigger massive advertiser withdrawal, like what happened to banner advertisements of the Web 1.0 era, when many sites started to reload the page every few seconds to get billions of ad displays and advertisers lost millions.” Meanwhile there has been a lot of behind-the-scenes conversation that cannot be blogged right now.

Original Post…

Last week, Mike over at Crunch Notes was complaining about Josh Stomel, who was making slight changes to Mike’s posts and reposting them as his own writing. Well, at least Stomel made some effort. This morning, Andy Abramson sent me an email about this website which is lifting and reposting the posts from GigaOM wholesale, images and everything.

These guys, who call themselves a magazine network, are so dumb they even took the categories. Apparently, these people are not just ripping off my content, but also the content of other bloggers. The design seems to be inspired by “Weblogs Inc” and clearly, this site was created to make money off other people’s work. Think of this as a new kind of splog. All right folks, I need some suggestions on how to make this shit stop. The domain is registered to someone in Texas, and the email address on the domain registration information goes to RezGlobal, a wholesale luxury travel agency. It is a global company, which has a website, but no executives.

Update: Thanks to reader suggestions, some aggressive reporting by Dave Burstein and Andy Abramson, along with a quick response by the site’s host, xb90.com has been shut down. Thank you all for the moral and technical support.

105 Comments

It’s a win-win situation. Google makes money, scrapers make money, AdWords publishers get targeted traffic.
But it’s also a trade-off for Google: it needs to maintain the quality of its results, but also be as profitable as possible. So while other search engines aren’t doing anything to improve their SERPs and ban scrapers, why would Google care?

blogs.us

The answer is DRM, not DMCA. You need to proactively protect your content using technical means, not reactively throw lawyers at the problem. This would mean changing the way everyone publishes and consumes content. Unfortunately, that’s a huge change, but it is the only way we can ensure correct attribution and compensation.

Eoin

Make sure to get a cached copy of one of their offending webpages, including the date and time. You can get one from Google by searching for “cache:” followed by the offending website’s URL. Archive.org’s Wayback Machine may include more than one copy, so you can see how long they have been copying you.
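A small Python sketch of the bookkeeping described above, for anyone collecting evidence on several pages. The function name `evidence_links` is made up. The `web.archive.org/web/*/` listing address is an assumption about the Wayback Machine's public URL layout, so check it in a browser before relying on it.

```python
from datetime import datetime, timezone


def evidence_links(page_url, now=None):
    """Build the lookups to run, plus a UTC timestamp of when you checked."""
    now = now or datetime.now(timezone.utc)
    return {
        # Paste this into the Google search box.
        "google_cache_query": "cache:" + page_url,
        # Assumed layout of the Wayback Machine's list of captures.
        "wayback_listing": "https://web.archive.org/web/*/" + page_url,
        # Record the date and time alongside any copy you save.
        "checked_at_utc": now.strftime("%Y-%m-%d %H:%M:%S"),
    }
```

Save the pages themselves as well. The links only tell you where to look.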

David Knight

I repackage many rss feeds on my site and I have google ads. I don’t see any problem with it. I’m offering a service by aggregating various feeds together. An RSS feed is meant to be redistributed, is it not? (BTW I give clear attribution and links to the original rss feed and web page. )

Sean P.

This site: laptop-notebook.blogspot.com ripped off two of my reviews, word for word (along with ripping off a lot of other sites, including Trusted Reviews, Ziff-Davis Net, PC World, PC Magazine, LAPTOP Magazine, etc).

It’s hosted by Google. I have contacted Google, both the Adsense and the Blogger sides. Nothing, not a word.

How do these people get away with it? They don’t even change the wording of any of the reviews they lift (and they do a damn fine job, including lifting all relevant images).

Adam

Sorry, Duncan, and others suggesting that bloggers drop their full-text feeds… that’s throwing the baby out with the bathwater.

What next? Gee, I think I’ll stop sending and receiving e-mail and just IM people with a link to view a note to them on my Web site?

Spammers and other thugs on the Internet shouldn’t force us to make content distribution and accessibility a pain in the ass for the 99% of the people with ethics.

Instead, people should indeed lobby Google to institute policies and procedures that — while not, unfortunately, likely to increase their revenues — will at least largely make the blogosphere a better, less-scraped place.

The problem, though, is that Google AND its advertisers have no *economic* incentive to clean this stuff up. Look at how things currently are with splogs:

1) Asshat creates a splog featuring, say, Viagra links.

2) Some floppy fella searches Google for the Big V.

3) He lands on a scraper page and sees a big bold ad for Viagra, along with some scraped info from a medical blog about the topic. He’s happy. He clicks through on the ad to a real Viagra site.

4) Google’s happy. They just got paid.

5) The real Viagra site’s happy… they just got a new, valid customer (someone truly interested in their product), albeit in a slightly round-about way (one extra click).

6) And the scraper site’s happy. They just got AdSense money.

So, basically, in this typical scenario, EVERYONE is happy except for the folks whose content is being scraped. And unless they’re uber-geeks who check for their links regularly, they probably don’t even KNOW their stuff is being scraped, so no harm no foul, right?

And those being scraped, so to speak? They’re typically not Google’s customers. Their unhappiness currently isn’t any sort of a liability for Google. Worse yet, making them happy (serving as a copyright policeman) is likely to LOSE Google money.

I sincerely believe that Google’s engineers are trying to figure out algorithmic ways to blast the sploggers to hell and kill their AdSense revenue largely BECAUSE it’s the right thing to do. But given the lack of economic incentive, I sincerely doubt this is a top-priority project over there.

duncan

Drop your full feeds, Om. It doesn’t give you 100% protection, but it does limit the scrapers’ ability to rob from you.

Keith

While people are discussing this issue, I thought I should check the opinion of legal/ethical experts on http://sf.getvendors.com (check out the news & views section). We need to polish it and fix a number of issues (the final version will look quite different and load fast), but looking at this discussion, I am wondering what you folks think about the approach. Feel free to take shots.

Andi

I have to agree with AGoToGuy here–this was news 10 months ago. Since then Technorati notifications on my domain name have been running 10-20 a week, all splogs. Wikipedia has had it defined for months.

http://en.wikipedia.org/wiki/Splog

If you click on the “Ads by Goooogle” link you can report the offending site, but that’s playing whack-a-mole, you’ll waste more time reporting than the offender does generating the splog.

Darren Campbell

If you have ever checked out a “hire a freelancer” website (you know, the ones where people post their Internet and related technology projects for so-called “professionals” to bid on), you’d find that many, many projects are to “clone” or “scrape” another website. It makes me sick that people bid on these illegal projects!!! I’m not saying that all the buyers and sellers are crooks, but I am saying that this activity is going on in plain sight, yet nothing is being done about it.

Julio Alonso

MIT Dude, I don’t think your analogy is quite right. The main issue here is these guys making money out of your content, which is not something you do when you download music (unless you copy it to a CD and then sell it on the street).

The worst thing here is plagiarism: when someone takes your content and pretends he is the author (something you are unlikely to do with a Britney Spears song).

Om Malik

hey MIT Dude, i think it was in the napster days. its simpler to just buy now. thank you itunes. still your point is very valid.

MIT Dude

Well now you know, how the artistes felt when you downloaded their music using Napster or BitTorrent.

awww…don’t tell me you never ripped off a song in your life huh?

Gopi

I hate to post this, Om, but stopping the scrapers is an almost impossible task from the webmaster’s end. You kill one and hundreds will appear!

IMHO, the best way to stop this is to eliminate their financial incentive. Almost 90% of these scrapers monetize with AdSense. If only Google had a strict AdSense spam policy and acted on the spam reports fast enough (meaning they would have to deploy more warm bodies), this problem wouldn’t be this big!

AGoToGuy

Sorry to hear about this Om but I have one question. Have you been hiding under a rock or something? The amazing part to me is that someone who is supposedly so net savvy JUST realized that scraper sites steal blog content for the purpose of displaying ads next to it. Just seems odd you didn’t know about this a long time ago. In the online marketing and advertising business this kind of thing is old news.

Brian

Om, this is nothing new – webmasters have been complaining of their sites being scraped for short-term Google AdSense monetization for some time now.

By the way, your blog entries don’t seem to show for this Firefox user:

Error: [Exception… "Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIXMLHttpRequest.status]" nsresult: "0x80040111 (NS_ERROR_NOT_AVAILABLE)" location: "JS frame :: https://gigaom.com/wp-content/themes/gigaom/javascript/giga.js :: stateHandler :: line 32" data: no]
Source File: https://gigaom.com/wp-content/themes/gigaom/javascript/giga.js
Line: 32

jason

i know you don’t want to do this, but in august i killed our full text feed because of the text theft. our feeds now just display the summary. i didn’t want to go that route either, but i was getting sick of our content being copied/pasted and placed on a page with adsense ads (we don’t even have adsense on our page, so why should they make money where we don’t?). as soon as i killed the full text feed, they completely stopped. i haven’t seen any new content stolen since august.

General Public

Scraper sites may soon become the Achilles heel of the Google AdSense program and trigger massive advertiser withdrawal, like what happened to banner advertisements of the Web 1.0 era, when many sites started to reload the page every few seconds to get billions of ad displays and advertisers lost millions…

Fazal Majid

Talk about a nasty Christmas surprise…

If, as seems likely, the ripoff artists are mirroring your posts wholesale using some kind of bot, it is very easy to add some JavaScript code to your page that defaces the plagiarized site big time.

Jay Currie

Publishers using the summary feature of RSS feeds is one side, aggregators using the “excerpt” capacity of most of the RSS parsing software is the other.

Having twenty words of Om’s content as Memeorandum does at the moment means people will come to this site; having the whole article is theft.

It is an issue which is going to become much more important as mashups and aggregations try to add value to original content.

Havagan

Looks like the site’s been taken down…

“Account for domain voip.xb90.com has been suspended”

Paul

Jacob Levy

This is why I’m against full text in feeds. Set your feed to include only the first N words and you’re done. I can’t see how you can complain if your feed contains the full text, you’re practically giving it all away.
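For what it is worth, the “first N words” idea is only a few lines in most languages. Here is a rough Python sketch. The function name `excerpt` and the default of 20 words are assumptions for illustration, not a setting from any particular blogging tool.

```python
def excerpt(text, max_words=20):
    """Return the first max_words words of text, marking any cut with '...'."""
    words = text.split()
    if len(words) <= max_words:
        # Short posts go out whole; there is nothing to trim.
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."
```

Whether a summary-only feed is worth the inconvenience to honest readers is the argument running through the comments above.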
