
Summary:

Twitter’s use of actual human intelligence to make sense of its search results points to a mundane reality: even with machine learning and lots of data, humans are sometimes the best source of insights, something those trying to create artificial intelligence should remember.


Twitter has done a lot to change the way people share news, watch live TV and even how they protest. Along the way it has had to deal with scaling problems as well as the challenge of tracking how users connect to each other and how to then instantly — or at least very quickly — deliver the right tweets to the right streams.

To do this, it has built some impressive tools such as Scalding, Storm and FlockDB. Now, as it attempts to sell ads and make money operating its popular service, it faces a problem that has vexed people who deal with computers for decades: how do you get a machine to understand what people are saying or typing? Not just recognizing the words (although in speech recognition programs that can still be tough), but understanding that when they talk about “cancer,” they might be referring to a disease or an astrological sign.

Many people view this as a problem for machines — another step in the path to true artificial intelligence — and have created elaborate efforts to solve the problem of translating human thought into something a machine can parse. On Twitter, where an inane comment during a presidential debate can result in a #bindersfullofwomen hashtag, how does a computer recognize what that means and what type of ads to then place against it?

The answer is that it doesn’t. So Twitter instead turns to real people, via an automated process it described in a blog post published Tuesday: it has essentially automated a query to Amazon’s Mechanical Turk service, which bids out jobs to real people.

From the Twitter post:

Suppose that our Storm topology has detected that the query [Big Bird] is suddenly spiking. Since the query may remain popular for only a few hours, we send it off to live humans, who can help us quickly understand what it means; this dispatch is performed via a Thrift service that allows us to design our tasks in a web frontend, and later programmatically submit them to Mechanical Turk using any of the different languages we use across Twitter.

On Mechanical Turk, judges are asked several questions about the query that help us serve better ads.
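The dispatch Twitter describes can be imagined as a small sketch. This is purely illustrative: the function, payload fields and question wording below are assumptions for the sake of the example, not Twitter’s actual Thrift service or Mechanical Turk task schema.

```python
# Hypothetical sketch of packaging a spiking query as a task for human
# judges. Field names, reward and lifetime values are illustrative only.

def build_judgment_task(query, questions):
    """Wrap a trending query in a task payload for human judges."""
    return {
        "title": f"Interpret the trending search query [{query}]",
        "reward_usd": 0.05,        # small per-task payment, typical of Turk work
        "lifetime_seconds": 3600,  # trends fade fast, so tasks expire quickly
        "questions": [q.format(query=query) for q in questions],
    }

# Example: the [Big Bird] spike from the Twitter post.
task = build_judgment_task(
    "Big Bird",
    [
        "What is the query [{query}] about?",
        "Is [{query}] referring to a person, an event or a product?",
        "What kinds of ads would fit searches for [{query}]?",
    ],
)
```

In a real pipeline, a payload like this would be submitted through the Mechanical Turk API and the judges’ answers fed back into ad selection.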

There has been a lot of effort to mimic the human brain in computers, but perhaps the best way to take advantage of people is to recognize what they do well and find cheap ways to tap the brain’s computational powers: not by replicating them in silicon, but by using computers to outsource tasks to whichever people are most appropriate, cheapest or nearest. Twitter has done this, but it’s not alone.

Douglas Merrill of ZestFinance at Structure:Data 2012. (c) 2012 Pinar Ozger

For example, ZestFinance, a company that uses data analysis to offer credit, tweaked its credit-scoring model to include humans who help determine which variables really matter for a particular person. All told, about 25 percent of the variables the company analyzes are the result of human intervention, and it’s the mix of humans and the existing data analytics that makes the combination so powerful.

Another example of this combination is Gravity Labs, which uses people plus machine learning to construct interest graphs. When you combine people with better databases, faster computers or task-optimized systems like Siri or Watson, you have a more realistic version of artificial intelligence than some self-learning, thinking robot. It also drives home one of the more subtle aspects of the big data revolution that my colleague Derrick Harris pointed out earlier this month: data analytics in many cases is more about automation than insights.

Some of the best uses of advanced databases and data visualizations are in narrowing down what might be thousands or millions of variables into something that can be assessed by a person and then acted on. In Twitter’s case, the computers handle recognizing a hashtag’s spiking popularity, but a quick call to a person can explain why, and what that hashtag means, far more quickly and cheaply than a computer could. A person, however, can’t filter through billions of tweets to see what’s spiking and what isn’t.
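This division of labor can be sketched in a few lines. The detection logic and thresholds below are made up for illustration; the Twitter post attributes the real trend detection to a Storm topology, not to anything this simple.

```python
from collections import Counter

def spiking_hashtags(recent, baseline, min_count=50, ratio=5.0):
    """Flag hashtags whose frequency in the current time window far
    exceeds their frequency in the previous window.

    min_count and ratio are illustrative thresholds, not real values.
    """
    now, before = Counter(recent), Counter(baseline)
    return [
        tag for tag, n in now.items()
        if n >= min_count and n / max(before[tag], 1) >= ratio
    ]

# A machine can scan millions of tags this way; explaining *why*
# #bindersfullofwomen is suddenly everywhere is the human's job.
recent = ["#bindersfullofwomen"] * 60 + ["#debate"] * 10
baseline = ["#bindersfullofwomen"] * 2 + ["#debate"] * 40
print(spiking_hashtags(recent, baseline))  # -> ['#bindersfullofwomen']
```

The machine’s output here is exactly the compact artifact a human judge needs: not billions of tweets, just the handful of terms worth asking about.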

Thus perhaps the most practical AI — for now — is recognizing what humans can do, and getting them the best, most compact information that allows them to make their decisions. Why replicate the brain if you don’t have to?


  1. > Thus perhaps the most practical AI — for now — is recognizing what humans can do, and getting them the best, most compact information that allows them to make their decisions.

    Yes, and intuition is a pattern-recognition process!

  2. Great post. We have been planning to discuss this topic at #ideachat on Twitter, this Saturday, at 9 am ET: “Intuition + Creativity = Insights”. Would love to have you join us.

    @blogbrevity

    1. I’m on PST, so #ideachat on Twitter this Sat at 9 am ET is a bit early; however, I’m now following you on Twitter.

  3. One Word: Context

    You seem to underestimate MI (machine intelligence). While AI doesn’t care about context, MI is built on top of it. AI can’t learn math; MI learns it, and also the fun of “no”: not only recognizing the word, but grasping one of the earliest concepts we teach.

    There’s also a difference between exposure learning and directed learning. With directed learning, one can stuff a few billion records into a machine and it will recognize a pattern. With exposure learning, one has to expose the machine to differentials and let it create a pattern itself, which it then abstracts so it can be applied in the most adaptable way.

    The state of AI:
    This summer, Google built the largest pattern recognizer of them all, a system running on sixteen thousand processor cores that analyzed ten million YouTube videos and managed to learn, all by itself, to recognize cats and faces—which initially sounds impressive, but only until you realize that in a larger sample (of twenty thousand categories), the system’s overall score fell to a dismal 15.8 per cent.

    http://www.newyorker.com/online/blogs/books/2012/11/ray-kurzweils-dubious-new-theory-of-mind.html#ixzz2HWPurLf4

  4. I have been a big believer in judicious use of humans in building intelligent systems. Let’s see recent success stories in this regards.

    During 1999-2002 we built a semantic search engine [1,2] which got its semantics from ontologies/domain-specific knowledge bases (we covered a series of domains [3]). The process involved a small amount of human involvement (just about two people), who would use our toolkit to write software agents (focusing on one domain at a time) that would collect knowledge from multiple high-quality sources of factual information, then disambiguate, integrate and enrich it to build these domain-specific ontologies/KBs. When we sought a second round of funding, VCs asked how we could compete with Google, which did everything automatically (and in those days it was a bit hard to make the case for “semantic search” when Web search seemed to be giving users what they wanted). So it was nice to see Google recently take a major new step with the Knowledge Graph, which started with Freebase, a resource that took a good bit of human effort to build (and it is good that more semiautomated processes, after the Metaweb acquisition, are speeding up the building of Google’s KB).

    The discussion in this story about humans in the loop for improving Twitter search is further proof that good use of humans is critical in building higher-quality systems.

    [1] http://slidesha.re/sw-ib
    [2] http://knoesis.org/amit/Taalee-Seamtic-Search-Engine-Interview.pdf
    [3] http://bit.ly/taalee-patent

  5. Another great day for the world of value-extraction sharecropping: community content appropriated via TOS is then leveraged for more value by low-paid workers… paid with the revenue generated from the appropriated content.

  6. Bravo. Cross-posted to Phi Beta Iota the Public Intelligence Blog.

  7. Charlie Thompson Thursday, January 10, 2013

    Perhaps what we should be doing is allowing people to determine what they would like to gain more knowledge of, versus trying to decide for them. Think about the consequences of making people mindless by telling them what a company or computer thinks they should know, instead of letting them learn things they may not already be aware of.
    There is a company doing just that, called Darwin Ecosystem, and they have a product that lets you do the opposite of semantic search on the web: it correlates and shows trends across masses of tweets on any topic. It’s called tweetzup.
    These are true new innovations.

