Developers are always on the lookout for new, better, faster, cooler tools, languages and compilers. And the popularity of these toolsets ebbs and flows. One week Scala is at the top; the next it’s Go.
Silvius Rus, director of big data platforms for Quantcast, gave Sawzall a shout-out during a Structure Data Guru panel last week. “It’s a lightweight language developed by Google that bridges procedural and interpretive languages,” Rus said.
Michael Driscoll, CEO of Metamarkets and moderator of the panel, later explained why that’s important. With a declarative language, the programmer tells the computer what to do in sentences that read almost like English. To tell the computer to draw a circle, a declarative programmer might say “draw.circle with a size attached,” Driscoll said.
Procedural languages, on the other hand, call for much more detailed, step-by-step instructions; they read more like math. A procedural approach would “define the actual pointer and tell it to move one degree to the left and one degree up and the square root of 2 up to the diagonal and repeat X times,” Driscoll said.
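To make Driscoll’s contrast concrete, here is a minimal Python sketch. The function names `circle_points_procedural`, `circle_spec` and `render` are hypothetical, invented for illustration only. The procedural version walks a point around the circle one step at a time; the declarative version only states the shape and its size, and a small interpreter supplies the steps.

```python
from __future__ import annotations

import math


# Procedural style (hypothetical helper): spell out every step. Walk a point
# around the circle in fixed angular increments and record each position.
def circle_points_procedural(radius: float, steps: int = 360) -> list[tuple[float, float]]:
    points = []
    angle = 0.0
    increment = 2 * math.pi / steps  # one degree when steps == 360
    for _ in range(steps):
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
        angle += increment
    return points


# Declarative style (hypothetical helper): describe *what* is wanted,
# a circle of a given size, and say nothing about how to draw it.
def circle_spec(radius: float) -> dict:
    return {"shape": "circle", "radius": radius}


# A tiny interpreter that turns the declarative description into the
# step-by-step work, hiding those steps from whoever wrote the spec.
def render(spec: dict) -> list[tuple[float, float]]:
    if spec["shape"] == "circle":
        return circle_points_procedural(spec["radius"])
    raise ValueError(f"unknown shape: {spec['shape']}")


if __name__ == "__main__":
    points = render(circle_spec(2.0))
    print(len(points), points[0])
```

The person writing `circle_spec(2.0)` never touches angles or coordinates; that is the trade-off Driscoll describes, convenience at the cost of control over each step.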
Sawzall strikes a balance between a declarative language, which might be too high-level to do everything the programmer really wants, and a procedural one, “which is way too in the weeds” to be fully productive, Driscoll said. More broadly, Sawzall is a powerful and compact language for log data aggregation and transformation. And, he added, it plays well with Hadoop MapReduce.
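Sawzall’s model is that a program runs once per record and can only emit values into aggregator tables (such as sum tables), which the runtime then combines across records and machines. The Python sketch below imitates that split under stated assumptions: it is not Sawzall syntax, and the log format, table names and functions (`process_record`, `aggregate`) are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable


# Per-record step (hypothetical): like a Sawzall program, this sees exactly
# one log line and only "emits" (table, key, value) triples; it keeps no
# state between records.
def process_record(line: str) -> list[tuple[str, str, int]]:
    # Hypothetical log format: "<status_code> <url> <bytes>"
    status, _url, size = line.split()
    return [
        ("requests", "total", 1),
        ("bytes_by_status", status, int(size)),
    ]


# Aggregation step: emulate "sum" tables by adding up every emitted value
# per (table, key). In a real deployment this combining is spread across
# many machines; here it is a single loop.
def aggregate(lines: Iterable[str]) -> dict[str, Counter]:
    tables: dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        for table, key, value in process_record(line):
            tables[table][key] += value
    return dict(tables)


if __name__ == "__main__":
    logs = ["200 /index.html 512", "404 /missing 0", "200 /about.html 256"]
    print(aggregate(logs))
```

Because `process_record` is stateless, records can be handed to any number of workers and their emissions summed afterward, which is why this style maps naturally onto MapReduce.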
New toolsets for webscale computing
Yarn, the resource-management layer that ships with Hadoop 2, was built to “just think about mass-produced jobs.” Continuity is building a real-time streaming engine called Big Flow and using Yarn for all the resource deployment and management.
He also gave kudos to Weave, a higher-level framework. Weave “allows you to build a much wider class of applications on top of Yarn. So, Yarn is … something that we will be going forward with for at least the next half a decade [and] Weave allows you to actually build more wide scale applications on top of that.”
Bhaskar Ghosh, senior director of engineering at LinkedIn, touted Helix, a generic cluster-management framework for distributed systems that was developed at LinkedIn and is now an Apache Incubator project. According to LinkedIn, Helix simplifies distributed-system development by separating cluster management from the core component tasks of a distributed system.
Kafka, Storm slake the thirst for real-time frameworks
Driscoll also sees traction for Kafka, a real-time framework for ingesting and managing data streams, and for Storm, out of Twitter, which processes those streams. “Think of Kafka and Storm as the HDFS and MapReduce analogs but for real time — Kafka for storage and Storm for compute,” Driscoll said.
Kafka, which was developed at LinkedIn and is also now an Apache project, is described on LinkedIn’s blog as a distributed publish-subscribe messaging system. Twitter and Square use Kafka for log aggregation, queuing, real-time monitoring and event processing.
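The publish-subscribe idea can be pictured with a toy, single-process broker. This is not Kafka’s API: the `ToyBroker` class and its `publish` and `poll` methods are hypothetical, and the sketch leaves out partitioning, replication and persistence, which are what make Kafka distributed. It shows only the core pattern: producers append messages to a named topic, and each subscriber reads that topic independently from its own position.

```python
from __future__ import annotations

from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a publish-subscribe log.

    Each topic is an append-only list. The broker remembers a separate read
    offset for every (consumer, topic) pair, so many subscribers can consume
    the same messages at their own pace without removing them.
    """

    def __init__(self) -> None:
        self._topics: dict[str, list[str]] = defaultdict(list)
        # (consumer_id, topic) -> offset of the next message to read
        self._offsets: dict[tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> int:
        """Append a message to the topic and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, consumer: str, topic: str, max_messages: int = 10) -> list[str]:
        """Return up to max_messages unread messages and advance the offset."""
        start = self._offsets[(consumer, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(consumer, topic)] = start + len(batch)
        return batch


if __name__ == "__main__":
    broker = ToyBroker()
    broker.publish("page-views", "user1 /home")
    broker.publish("page-views", "user2 /jobs")
    print(broker.poll("monitoring", "page-views"))
    print(broker.poll("analytics", "page-views", max_messages=1))
```

Keeping messages in an append-only log rather than deleting them on delivery is what lets a monitoring consumer and an analytics consumer read the same stream independently.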
This list is by no means complete. When I spoke with GitHub co-founder Tom Preston-Werner a few weeks ago, he said Clojure, heretofore a rather obscure dynamic programming language, is gaining momentum. “It’s getting a lot of buzz around on the enterprise side,” Preston-Werner said.
The continued popularity of the Java Virtual Machine has breathed new life into languages like Clojure and Scala, he added. Indeed, the JVM remains nearly ubiquitous, and that is a huge advantage for languages that run on it. If you’re a developer, you want the widest possible audience.
“The JVM is still the modern foundation that lets you run everywhere and Clojure has benefited from that,” Driscoll agreed. “It’s certainly gained steam among an elite set of programmers in Silicon Valley.”