Big data meets the connected car: Researchers tackle the vehicular network



Soon our cars will be the most connected devices we own. Consequently they could generate the most expensive monthly data bills of any device we own.

Cars will have built-in Wi-Fi, allowing them not only to share data but quite possibly to act autonomously on that information. If carriers like Verizon get their way, every car will have embedded LTE, allowing it to grab any manner and any quantity of content from the airwaves. But not all radio connections are created equal.

Wi-Fi is essentially free, while cellular data is expensive. The seeming liberation of an always-connected vehicle could easily be constrained by the shackles of an enormous cellular bill. Is there a way we can maximize the “free” connectivity of Wi-Fi while minimizing the costs of mobile data?

That’s the question distributed computing researchers from MIT, Georgetown University and the National University of Singapore are trying to answer, and it’s a doozy of a problem. A distributed network of cars is by definition ad hoc – the vehicles are constantly moving in relation to one another. They’re forming new Wi-Fi connections while breaking old ones, changing their positions within the network or leaving the network entirely. Trying to get these mobile and unpredictable nodes to cooperate is going to be difficult.

But MIT graduate student Alex Cornejo said math can be used to wrestle just such a network out of freeway chaos. He and his colleagues have developed an algorithm that would allow hundreds of different cars to aggregate their internet-bound data and send it compressed over a single cellular connection, thus reducing bandwidth costs for all the vehicles participating.

The process starts with two cars in Wi-Fi range, both hoping to establish an internet connection to download content, send email, upload documents or perform some other action. One car passes its data along to the other. Which car hands off its data is initially determined randomly, but as the vehicles move throughout the network, patterns start emerging. Those patterns determine which vehicles become aggregation nodes for the network, Cornejo said.

“We bias the coin toss,” Cornejo explained. “Cars that have already aggregated a lot will start ‘winning’ more and more, and you get this chain reaction. The more people you meet, the more likely it is that people will feed their data to you.”
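The biased coin toss Cornejo describes can be illustrated with a small simulation. This is only a rough sketch and not the researchers' algorithm: the article does not say how the coin is weighted, so the code assumes a car's chance of winning is proportional to the data it already holds. The function names, the one unit of data per car and the random pairing of cars (which ignores real traffic movement) are all illustrative assumptions.

```python
import random


def meet(loads, a, b, rng):
    """Two cars in Wi-Fi range toss a biased coin; the loser hands its data to the winner."""
    total = loads[a] + loads[b]
    if total == 0:
        return  # neither car is holding any data, so there is nothing to hand off
    # Assumed bias: the chance of winning is proportional to the data already held.
    if rng.random() < loads[a] / total:
        winner, loser = a, b
    else:
        winner, loser = b, a
    loads[winner] += loads[loser]
    loads[loser] = 0


def simulate(num_cars=100, meetings=2000, seed=1):
    """Pair cars at random (a stand-in for real traffic) and let them aggregate."""
    rng = random.Random(seed)
    loads = [1] * num_cars  # assumption: every car starts with one unit of data
    for _ in range(meetings):
        a, b = rng.sample(range(num_cars), 2)
        meet(loads, a, b, rng)
    return loads


loads = simulate()
holders = sum(1 for load in loads if load > 0)
print(f"{holders} of {len(loads)} cars still hold data; the largest load is {max(loads)}")
```

Under these assumptions the first toss between two fresh cars is a fair 50/50, and every win tilts later tosses further toward the same car, which is the chain reaction described in the quote.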

When any given car has aggregated enough data, it establishes its cellular connection, uploading aggregated data to the internet or downloading data, which it then distributes back through the same ad hoc network, Cornejo said. The amount of time spent aggregating is determined by the type of data, he added. Data with a longer shelf life, like email, could be passed back and forth between hundreds of vehicles before it exits the network. Real-time applications would have far less tolerance for delay, but he said it would be possible for two vehicles making VoIP calls or video chat sessions to share a single cellular connection.
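One way to picture how the type of data could govern aggregation time is a buffer that uploads either when it is full or when its most urgent item runs out of patience. The sketch below is purely illustrative: the article gives no actual thresholds, so the delay budgets, the size limit and the `Aggregator` class are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical delay budgets in seconds; the article gives no actual figures.
MAX_DELAY = {"voip": 0.1, "web": 5.0, "email": 600.0}


@dataclass
class Aggregator:
    """Data held by an aggregation node until it opens its cellular link."""
    size_limit: int = 1_000_000  # assumed upload threshold in bytes
    items: list = field(default_factory=list)  # (deadline, size) pairs

    def add(self, kind, size, now):
        # Each item may wait only as long as its data type tolerates.
        self.items.append((now + MAX_DELAY[kind], size))

    def should_upload(self, now):
        if not self.items:
            return False
        buffered = sum(size for _, size in self.items)
        earliest_deadline = min(deadline for deadline, _ in self.items)
        # Upload when enough data has piled up or the most urgent item is due.
        return buffered >= self.size_limit or now >= earliest_deadline


agg = Aggregator()
agg.add("email", 20_000, now=0.0)
print(agg.should_upload(now=60.0))  # the email can keep waiting
agg.add("voip", 200, now=60.0)
print(agg.should_upload(now=60.2))  # the voice packet cannot
```

In this toy version an email can ride along for minutes while a single voice packet forces the link open almost immediately, which mirrors the difference in delay tolerance Cornejo describes.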

In theory, a fleet of 1,000 cars could see all of their data aggregated into just five cellular links, even accounting for cars that suddenly break from the network, taking their stored-up data with them, Cornejo said. The key is for the algorithm to define distinct clusters of cars among seemingly random traffic patterns. If the distinctions between those clusters start breaking down, such as when one platoon of traffic crosses paths with another, then the whole system breaks down.

That’s the paradox of connectivity, Cornejo said. If you have 1,000 cars in a single big cluster, data can be aggregated. If you have 10 well-defined clusters of 100 cars each, again data can be aggregated. But if you have two clusters of 500 cars in the vicinity of one another – with data occasionally being passed back and forth between each cluster – then aggregation becomes impossible.

Traffic Image courtesy of Flickr user


Calgary Drivers Ed

I keep seeing blog posts about car connectivity, but I still don’t understand the real-world application for it, even after all these blogs.


Why would I want to share my cell connection with others on the road and get stuck with the bill??

Even with some form of service that has willing/paying customers, what are the realistic chances there are enough customers in Wi-Fi range to make this work?

Kevin Fitchard

Hi Orion, I think the idea is that if everyone shares each other’s connections then everyone benefits equally. You may get stuck as the aggregation node in one instance, but your neighbor would be designated next time.


It sounds like a new parameter is needed to solve the cross-cluster problem, which is to be selective about which nodes can join a particular cluster. You need to gauge the direction of travel and even lane position. If I’m on the 405 headed North, 25 feet away from me is another pack of cars headed South. Sometimes we are closing on each other at 130 MPH (rare), so it’s obvious who’s who, but sometimes we are all sitting in gridlock, so it isn’t obvious which vehicle is going which way. The vehicle DOES know which direction it’s traveling, so if it could include that bit of info when forming a cluster it would keep the opposing directions in distinct groups. Same holds for cross traffic. If positional/directional information could even detect that you’re in an exit lane or waiting to make a left turn, etc. it could reduce the occurrences of orphaned packets.

Sam Moore

Great article, Kevin! If you ever do a follow-up story, you could talk to a semiconductor company.


Sloppy and spelling errors:

“Soon our cars will the most connected…”

“If carriers like Verizon get there way…”


What are the chances of two drivers out of 1000 listening to the same Pandora channel? Slim to none. Ad-hoc vehicular networks are good to share local data – such as traffic patterns 20 cars ahead, or GPS, if the poor soul does not have one and so on.

Kevin Fitchard

Hi ZoobiZoo (love the handle).

I asked that of Alex as well and he acknowledged it would be very rare that cars would be able to take advantage of multicasted data — the exceptions being live sporting event streams, etc…

But he said the point of the system isn’t necessarily to aggregate the same data, just like data, since a single radio transmitting a lot of compressed info is much more efficient than a bunch of radios transmitting small payloads independently.
