Twitter details its anti-spam system, BotMaker

As with most things Twitter, fighting spam messages is not as easy as it is in other parts of our online lives. The traditional machine learning models and other techniques used to learn what spam looks like and classify messages don’t always work when you’re dealing with content that users expect to see in real time. However, Twitter has developed a system called BotMaker to address its unique situation — a system the company claims has resulted in a 40 percent reduction in spam since it was rolled out.

Engineer Raghav Jeyaraman explained BotMaker in a blog post on Thursday. As with many other systems in place at web companies, Twitter included, the trick to BotMaker is breaking the work down into real-time, near-real-time and batch jobs. Essentially, a tool called Scarecrow tries to stop spam messages before they’re written to Twitter by spotting problem account names or URLs, for example. Next, a tool called Sniper constantly scours written messages for things Scarecrow missed, possibly because it didn’t have enough time to analyze certain features. Finally, batch jobs periodically analyze large amounts of offline data to uncover long-term behavior patterns that can help make the online models smarter.
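To make the three-stage split concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is an illustration only, not Twitter's implementation: the class names, the blocklists, the repeated-text heuristic and the threshold are all assumptions invented for this example, and the near-real-time sweep simply reuses the write-path check, whereas the real Sniper presumably looks at richer features.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Tweet:
    account: str
    text: str


@dataclass
class SpamPipeline:
    # Hypothetical state; the article does not describe BotMaker's rule format.
    blocked_accounts: set = field(default_factory=set)
    blocked_urls: set = field(default_factory=set)
    timeline: list = field(default_factory=list)  # tweets that were written
    removed: list = field(default_factory=list)   # tweets pulled after the fact

    def write_path_check(self, tweet):
        """Real-time stage (Scarecrow's role): cheap checks before the write."""
        if tweet.account in self.blocked_accounts:
            return False
        return not any(url in tweet.text for url in self.blocked_urls)

    def post(self, tweet):
        """Write the tweet only if it passes the real-time check."""
        if not self.write_path_check(tweet):
            return False
        self.timeline.append(tweet)
        return True

    def sweep(self):
        """Near-real-time stage (Sniper's role): re-check written tweets."""
        kept = []
        for tweet in self.timeline:
            if self.write_path_check(tweet):
                kept.append(tweet)
            else:
                self.removed.append(tweet)
        self.timeline = kept

    def batch_update(self, history, threshold=3):
        """Batch stage: flag accounts that repeat identical text in offline data."""
        counts = Counter((t.account, t.text) for t in history)
        for (account, _text), n in counts.items():
            if n >= threshold:
                self.blocked_accounts.add(account)
```

In this sketch, whatever the batch job learns feeds back into the blocklists, so the next `sweep()` removes tweets that slipped through and later `post()` calls from the same account are rejected up front.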

Source: Twitter

Aside from the 40 percent overall spam reduction Twitter has seen from BotMaker, Jeyaraman notes that the ability to detect spam in the write path has been particularly beneficial.

This is not the first attempt Twitter has made to combat spam using machine learning. It’s not clear whether BotMaker uses techniques from this research, but Twitter did team up with University of California, Berkeley, researchers in 2012 to develop a system that can detect spambots based on characteristics such as email addresses or the time it takes them to fill out a registration page. One of the researchers, Chris Grier, told Gigaom last year that while the resulting algorithm had been used to periodically purge Twitter’s rolls of bots, it could also be turned into an online system that could spot spam accounts in real time.

1 Comment

Slavon Smartmil

To my mind, Twitter stories inevitably invite comments from people who say things like “I don’t want to know what kind of sandwich someone ate!” And if you don’t want to use Twitter, by all means don’t!

But if you’ve been avoiding it you should know that it’s become a much more interesting tool than just sandwich watching, or for that matter duplicating an RSS feed.

The amount of raw and super useful information I’ve pulled off Twitter on Ferguson the last week alone makes it an amazing tool. The media filters everything, and you have to of course apply your own filters to what you read from people, but getting your information straight from the sources in real time is really tough to beat.

William Gibson (the cyberpunk author, @greatdismal on Twitter) alone is an awesome resource. He’s a retweet machine, and just from his feed you can sample a huge variety of fascinating information. I follow fewer than 50 people myself, and a lot of those are specialty accounts that don’t even tweet often. But due to the networked nature of things I still access a very broad swath of data.

You don’t have to follow people who tweet their lunch.
