Seen those features before

Researchers build pattern-recognition model that acts like a human

A trio of MIT researchers has developed a machine learning model that might help humans make better sense of big data by making the patterns it discovers easier to interpret. Its creators call it the Bayesian Case Model, but a simpler description might be the example-creator.

The thinking behind the research is that humans tend to think about things and make decisions based on previous experiences or examples we’ve seen. Children, for example, might overhear just a few words of their parents’ conversation and know they’re talking about summer camp because they went last year and they know that words like “month,” “lake” and “counselors” are primarily used together only in that context.

If, however, we have limited or no experience in a particular field, a little help might be necessary, which is where the Bayesian Case Model comes into play. Given a set of data such as recipes (one type the researchers used in their work), the model groups them into clusters based on their most prominent ingredients and on their similarity to a representative example, or prototype, which the model itself selects for each cluster.
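To make the idea of a prototype concrete, here is a minimal Python sketch. It is not the Bayesian Case Model itself: it swaps in a much simpler, non-Bayesian stand-in (a k-medoids-style loop over Jaccard similarity), and the recipe data, function names and seed choices are all invented for illustration.

```python
from collections import Counter

# Toy recipes (hypothetical data, not taken from the MIT paper).
RECIPES = {
    "chili_a": {"beer", "chili powder", "tomato", "beef", "onion"},
    "chili_b": {"beer", "chili powder", "tomato", "beans", "onion"},
    "chili_c": {"chili powder", "tomato", "beef", "onion"},
    "cookie_a": {"flour", "sugar", "butter", "egg", "chocolate"},
    "cookie_b": {"flour", "sugar", "butter", "egg", "vanilla"},
    "cookie_c": {"flour", "sugar", "butter", "chocolate"},
}


def jaccard(a, b):
    """Share of ingredients two recipes have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_prototype(members, recipes):
    """Pick the member with the highest total similarity to its cluster."""
    return max(
        sorted(members),
        key=lambda m: sum(jaccard(recipes[m], recipes[o]) for o in members),
    )


def cluster(recipes, seeds, rounds=10):
    """Assign each recipe to its nearest prototype, then re-pick prototypes."""
    prototypes = list(seeds)
    for _ in range(rounds):
        groups = {p: [] for p in prototypes}
        for name in sorted(recipes):
            nearest = max(
                prototypes, key=lambda p: jaccard(recipes[name], recipes[p])
            )
            groups[nearest].append(name)
        updated = [pick_prototype(ms, recipes) for ms in groups.values() if ms]
        if updated == prototypes:
            break  # prototypes stopped changing
        prototypes = updated
    return {pick_prototype(ms, recipes): ms for ms in groups.values() if ms}


def prominent_ingredients(members, recipes, top=3):
    """Most frequent ingredients in a cluster, ties broken alphabetically."""
    counts = Counter(i for m in members for i in recipes[m])
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top]]


def classify(ingredients, clusters, recipes):
    """Return the prototype whose recipe is most similar to the new one."""
    return max(sorted(clusters), key=lambda p: jaccard(ingredients, recipes[p]))


clusters = cluster(RECIPES, seeds=["chili_c", "cookie_b"])
for prototype, members in clusters.items():
    print(prototype, members, prominent_ingredients(members, RECIPES))
```

Calling classify({"beer", "chili powder", "tomato"}, clusters, RECIPES) then matches an unlabeled recipe to the closest prototype, which is roughly the task the human testers performed with the model's output in front of them.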


For example, even if I didn’t know that beer, chili powder and tomato were common ingredients in chili, I might be able to deduce that a recipe containing them is chili after seeing what the model has deemed the prototypical chili recipe. Indeed, the MIT researchers, Been Kim, Cynthia Rudin and Julie Shah, found that not only was their model more accurate than previous approaches, but human testers were also able to correctly categorize recipes at a significantly higher rate using output from the Bayesian Case Model than using output from earlier approaches.

The approach should work with more difficult types of data in more specialized fields, as well.

This type of work, even if not this model itself, could become more useful as datasets continue outgrowing people’s abilities to analyze them. Unsupervised machine learning or artificial intelligence models, from software like Ayasdi to Google’s famous cat-recognizing deep learning system, can already churn through lots of data and identify similar things. But such tools are only as useful as they are accurate, and as easy as they make it for humans to decipher what they’ve found.

The full MIT paper is available here.