Making every developer a Big Data developer

What does it mean to be a Big Data developer?  Does it mean you’re a data analyst who knows how to code?  Does it mean you’re a hero developer who also knows about statistics and BI?  Does it mean you’re a MapReduce developer?  Perhaps all of the above?

Honestly, I don’t know what it means to be a Big Data developer today.  And I don’t believe anyone else knows — not with great certainty, anyway.  What I believe quite strongly, however, is that the distinction shouldn’t even exist.  Every developer should be a Big Data developer.

Ubiquity of data skills
Today, as has been true since at least the early nineties, most developers have basic SQL database skills.  Even someone completely unfamiliar with database internals, administration and design knows how to write code that queries a database, presents the data, permits it to be updated by the user, and then posts those changes back to the database.  It’s a basic skill, an assumed prerequisite.
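
To make that baseline concrete, here is a minimal sketch of the query, present, update and post-back cycle, written in Python with the standard sqlite3 module.  The customers table, its columns and the run_demo function are hypothetical names chosen only for illustration, and the "user edit" is hard-coded rather than collected from a real interface.

```python
import sqlite3


def run_demo():
    # In-memory database so the sketch is self-contained.
    # The table and column names are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (name, city) VALUES (?, ?)",
        [("Ada", "London"), ("Grace", "New York")],
    )

    # 1. Query the database.
    rows = conn.execute(
        "SELECT id, name, city FROM customers ORDER BY id"
    ).fetchall()

    # 2. Present the data.
    for row_id, name, city in rows:
        print(f"{row_id}: {name} ({city})")

    # 3. Apply a change the user made (hard-coded here), then
    # 4. post it back to the database.
    conn.execute("UPDATE customers SET city = ? WHERE id = ?", ("Arlington", 2))
    conn.commit()

    updated = conn.execute(
        "SELECT city FROM customers WHERE id = 2"
    ).fetchone()[0]
    conn.close()
    return rows, updated
```

Calling run_demo() prints the two rows and returns both the original rows and the updated city.  The point is how little ceremony the whole round trip requires.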

When I first started presenting to developers at conferences twenty years ago, and for many years after, I was fond of saying “every business software application is a database application.”  Back then, that seemed a bit bold, even arrogant.  Today it’s just a statement of the obvious.

Soon, I will need to modify that mantra to say that every business software application is a Big Data analytics application.  That isn’t yet the case, but it will be.  Every user will want analytics features in their software (including the apps on their phones and tablets), and developers will have programming languages, APIs, object models and tools inside their integrated development environments that make it as routine to implement analytics functionality as it is to provide basic data access functionality today.

Big Data dev shouldn’t be a big deal
Querying a relational database is easy; so is visualizing the data that gets returned.  Why should data ingestion, processing of streaming data, building machine learning models or performing quick aggregations on semi-structured data still be specialty tasks?

Microsoft has provided some rudimentary tools for its .NET developers, mostly tied to Apache Hive.  Oracle has done likewise for its PL/SQL developers.  But developers need more, and they need it faster.  Java, C# and even JavaScript (including Node.js) developers should have libraries that abstract away the difficulty of so many labor-intensive Big Data tasks.

The breakthrough, and its opportunity
But dev tools vendors aren’t providing these facilities, and developers aren’t demanding them.  What will break the logjam?

The likely answer is user demand, and anticipation of that demand by astute developers and development managers.  Once users insist on analytics functionality in their line-of-business applications, and once lots of developers experience the full-blown pain of implementing this functionality without good tools and libraries, the market need will be clear.

But right now there’s a window for companies to be proactive, and take an early lead.  Those companies can win developer mindshare, and they can capture the attention of the incumbent vendors.  They will need to educate and evangelize, too.  That’s not for the faint of heart, of course.

But one day these capabilities will be commonplace and commoditized; adding them to a developer stack will be mere necessity.  Today, there’s a chance to establish leadership.  And that will pay dividends for a very long time.