“The best minds of my generation are thinking about how to make people click ads … That sucks.”
- Jeff Hammerbacher, co-founder and chief scientist, Cloudera
Well, something has to pay the bills. Thankfully, there’s also a sweeping trend in the data science world right now around bringing those skills to bear on some really meaningful problems, from the effects of tree pruning to mapping humanitarian crises around the world. I don’t know about you, but I’m willing to sacrifice a little digital privacy if it means saving some lives.
We’ve already covered some of these efforts, including the SumAll Foundation’s work on modern-day slavery and its planned work on child pornography. Closely related is the effort, backed by Google.org’s deep pockets, to create an international hotline network for reporting human trafficking and collecting data. Microsoft, in particular Microsoft Research’s danah boyd, has been active in helping fight child exploitation using technology.
This week, I came across two new efforts on different ends of the spectrum. One is ActivityInfo, a tool developed by Unicef and a consulting firm called BeDataDriven. Its website describes it as “an online humanitarian project monitoring tool” that “helps humanitarian organizations to collect, manage, map and analyze indicators.” That partnership actually seems fairly well established (the ActivityInfo website claims it’s used by more than 75 organizations across more than 15,000 sites), although I came across it via a blog post about why BeDataDriven decided to build the database on Google’s cloud.
The other effort I came across is DataKind, specifically its work helping the New York City Department of Parks and Recreation, or NYC Parks, quantify the benefits of a strategic tree-pruning program. Founded by renowned data scientists Drew Conway and Jake Porway (who’s also the host of the National Geographic channel’s The Numbers Game), DataKind exists for the sole purpose of helping non-profit organizations and small government agencies solve their most pressing data problems. It accomplishes this goal by hosting weekend-long DataDives, essentially hackathons for data scientists, as well as by facilitating longer-term engagements between organizations and volunteer data scientists or DataKind staff.
Saving money by proving what every landscaper knows
One of those volunteers is Brian Dalessandro, VP of data science for display advertising platform Media6Degrees. He met Porway at a data-industry function in New York in late 2012, was sold on DataKind’s vision (“[Jake's] very convincing that you should be passionate about it, too,” Dalessandro said) and got involved with his first DataDive shortly thereafter. The beneficiary organization: NYC Parks, which wanted help quantifying the benefits of tree pruning and identifying the neighborhoods most at risk of tree damage from storms.
The benefits of mapping the neighborhoods in peril are pretty obvious, but doesn’t everyone already know that pruning keeps trees healthier and reduces the risk of falling limbs and other accidents? Kind of, Dalessandro explained. Up to this point, all of the evidence had been anecdotal, which isn’t always enough to justify new expenditures in tight city budgets.
“They knew what they wanted to solve,” Dalessandro recalled, “they just didn’t know if they had the right ingredients to solve it.”
NYC Parks came to the DataDive with three datasets it hoped would do the trick — a census of every public tree in the city; a log of every work order on those trees; and a log of when each city block’s trees were pruned. After scraping some weather data and figuring out a working definition of “risk” that was both quantifiable and satisfied the department’s needs, Dalessandro and some others were able to solve the storm-prediction problem. Quantifying the effects of pruning turned out to be a hairier problem, though.
So, for the next four months, Dalessandro worked on it in his spare time. Most of the effort went into reformatting the datasets so they could be combined and analyzed together. This is a common issue with government agencies and non-profits, Porway noted, because they usually collect data for accounting or reporting purposes rather than for statistical analysis.
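To make the reformatting step concrete, here is a minimal sketch of combining a pruning log and a work-order log into one table with a row per city block per year, using the census to define which blocks exist. Everything about the data shape is an assumption: the field names (`block`, `year`, `hazardous`), the helper names `normalize_block_id` and `build_block_years`, and the idea that block IDs differ only in case and whitespace are all hypothetical, not NYC Parks’ actual schema or Dalessandro’s code.

```python
def normalize_block_id(raw):
    """Hypothetical cleanup: make block IDs from different logs comparable."""
    return str(raw).strip().upper()


def build_block_years(blocks, years, work_orders, prunings):
    """Combine pruning and work-order logs into one row per (block, year).

    blocks: block IDs, e.g. taken from the tree census
    work_orders: dicts with hypothetical keys "block", "year", "hazardous"
    prunings: dicts with hypothetical keys "block", "year"
    """
    # Start every census block-year as unpruned with zero hazardous orders.
    table = {
        (normalize_block_id(b), y): {"pruned": False, "hazardous_orders": 0}
        for b in blocks
        for y in years
    }
    # Mark block-years that appear in the pruning log.
    for p in prunings:
        key = (normalize_block_id(p["block"]), p["year"])
        if key in table:
            table[key]["pruned"] = True
    # Count only work orders flagged as hazardous.
    for w in work_orders:
        key = (normalize_block_id(w["block"]), w["year"])
        if key in table and w["hazardous"]:
            table[key]["hazardous_orders"] += 1
    return table


# Toy records, invented for illustration; note the inconsistent ID formatting.
census_blocks = ["b1", " B2 "]
prunings = [{"block": "B1", "year": 2011}]
work_orders = [
    {"block": "b2", "year": 2012, "hazardous": True},
    {"block": "B2 ", "year": 2012, "hazardous": False},
    {"block": "B1", "year": 2012, "hazardous": True},
]
table = build_block_years(census_blocks, [2011, 2012], work_orders, prunings)
print(table[("B1", 2011)])  # {'pruned': True, 'hazardous_orders': 0}
print(table[("B2", 2012)])  # {'pruned': False, 'hazardous_orders': 1}
```

Seeding the table from the census, rather than from the logs, keeps block-years with no activity at all, which matter as a baseline when comparing pruned and unpruned blocks.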
Once the data was ready, though, Dalessandro was able to adapt code he had previously written to estimate whether ads actually caused people to buy products, and use it to run the analysis. “Instead of people converting, there’s trees and limbs falling off,” he analogized.
In the end, he found that pruning a block reduced hazardous work orders on that block the following year by 22 percent. The next step is to put his results into a business context, presumably to make the case for a better-planned and more comprehensive pruning program. If pruning is cheaper than sending out crews to fix damage, that’s probably not a bad idea.
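A figure like “22 percent fewer hazardous work orders” is a relative reduction. The sketch below shows the simplest version of that arithmetic: one minus the ratio of average next-year hazardous orders on pruned blocks to the average on unpruned blocks. This is a naive comparison, not Dalessandro’s method, which is only described as adapted from ad-attribution code; a plain difference in averages ignores why particular blocks were chosen for pruning, which is exactly the problem causal techniques from ad measurement try to address. The function name and the toy data are hypothetical.

```python
def relative_reduction(rows):
    """Naive relative reduction in next-year hazardous work orders.

    rows: (pruned_this_year, hazardous_orders_next_year) pairs, one per block.
    Returns 1 - mean(pruned) / mean(unpruned). Not a causal estimate.
    """
    pruned = [n for was_pruned, n in rows if was_pruned]
    unpruned = [n for was_pruned, n in rows if not was_pruned]
    if not pruned or not unpruned:
        raise ValueError("need at least one pruned and one unpruned block")
    mean_pruned = sum(pruned) / len(pruned)
    mean_unpruned = sum(unpruned) / len(unpruned)
    if mean_unpruned == 0:
        raise ValueError("unpruned blocks had no hazardous orders to compare against")
    return 1 - mean_pruned / mean_unpruned


# Toy data, invented for illustration (not NYC Parks figures).
rows = [(True, 1), (True, 2), (False, 2), (False, 4)]
print(f"{relative_reduction(rows):.0%}")  # 50%
```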
Can you solve bigger problems without targeting a few ads?
As easy as it is to rip on data science done in the name of advertising, it seems like that high-pressure business experience actually helps with data volunteerism. One of SumAll’s missions is to teach the non-profits it works with to think like businesses, in terms of which key performance indicators they want to track. Porway said DataKind is similarly focused on teaching organizations to think like data scientists, even if that just means structuring their data consistently so they can analyze it when they need to.
For his part, Dalessandro is excited to volunteer again, in part because he likes putting his well-honed technological skills to work in the name of the greater good. At previous jobs, he said, volunteering meant spending eight hours at the park pulling weeds or something equally mundane. However, he said, if someone needs a type of predictive model that he could build in his sleep, he could deliver truly meaningful results in just a couple hours.
If there’s a dark lining to this silver cloud, it’s that there will always be more problems than people to solve them. That doesn’t dissuade Porway, though. He sees a growing movement every time hundreds of people show up at a DataKind event, in new chapters popping up overseas and in the work being done by his peers at other organizations. Besides, he said, while some people are tackling difficult problems, there are lots of organizations that could benefit even from simple things like visualizations.
And free help is probably a better option than trying to bring those skills inside an organization. “Trying to hire data scientists to do this,” Porway said, “would be a Herculean task given how rare they are.”