Catching pedophiles with text mining and game theory


Earlier this year, I wrote about how ill-intentioned data scientists could use their skills to build highly sophisticated spambots that fool us by looking and acting like real people. It turns out those same techniques can be used for good, too. Take the case of a group of Spanish researchers who have developed a chatbot designed to lure and catch pedophiles in online forums such as social networks and chatrooms.

In practice, the researchers’ chatbot, which they call a “Negobot,” should act a lot like an advanced spambot on a service such as Twitter, using techniques email spammers already rely on to evade spam filters. The Negobot initiates a discussion with a target and then decides its next move based on how the target reacts. It is also designed to time its replies to match how long a child would take to respond.

Going forward, the researchers note, they’d like the system to speak more like a young person would on the internet and less like a machine acting like a teenage girl. Other planned improvements include amping up the bot’s abilities to discern sentiment, answer specific questions and transition naturally from one topic to another.

These seem like critical steps given how heavily the system relies on game theory to capture its prey. Negobot is designed to extract as much information as possible from the person on the other end of the discussion, and doing so requires an accurate understanding of what he’s saying so the bot can raise its “aggression” level or change tactics accordingly. When the system senses its target getting more interested in sexual activity, it engages in a more explicit discussion; if the system senses the target getting less interested (or perhaps playing hard to get in order to avoid raising suspicion), it brings up topics such as family problems or a need for attention in order to appear more vulnerable.

Aside from having a laudable goal, the Negobot research also highlights the growing number of tools available to anyone trying to build advanced data-analysis products. It uses Google Translate to convert discussions to English, Lucene to rank the “slimyness” of conversations against a database of other conversations, and the Artificial Intelligence Markup Language (AIML) to make the bot a more realistic participant in the discussions. For their training dataset, the researchers used 377 real-life conversations between pedophiles and decoys posing as children, taken from Perverted Justice, a sort of vigilante pedophile-baiting site.

(One has to assume a lot of corporate R&D resources have been poured into replacing social media monitors and live-chat customer-service agents with machines, too, but many of those techniques might be under lock and key.)

I’m actually kind of torn on whether the chosen dataset helps or hurts Negobot’s reliability, though. On the one hand (assuming they’re all real conversations), it’s a clever way to get access to data that might otherwise be difficult to obtain. On the other hand, pedophilia seems like one of those areas where the more controlled a study is, the better, and where expert guidance on how to interpret the data could be important.

A bunch of computer scientists inferring the behavioral traits of pedophiles from chatroom conversations between pedophiles and online vigilantes could uncover some interesting patterns, but it could also lead to incomplete assumptions. The researchers actually use Perverted Justice’s “slimyness” scale as a baseline against which they rank future conversations.

Even if the approach isn’t ideal, however, it is hard to argue with the idea behind the Negobot research. I’m guessing that chatting up lechers online isn’t anyone’s idea of a good time, so offloading that work to computers that can do it well should be at least a small victory.

Feature image courtesy of Shutterstock user Feng Yu.
