Web companies like Google and Facebook invest incredible resources in making sure they know everything about their infrastructures and how each piece is performing. Why? Because the applications running on that infrastructure represent those companies’ entire businesses. If something at the server level is affecting application performance, that something must be fixed. The rest of the business world is now catching on.
Big banks, big data
One of the biggest converts (not surprisingly) is the financial services industry. According to Bryan Clark, CEO of Scotland-based systems-analytics provider Sumerian, his company’s high-end banking clients want to analyze the data coming from their expansive computing infrastructures for two major reasons. One is to tune their electronic-trading systems for maximum performance, and the other is to adapt their infrastructure to the large-scale business-model changes the banks themselves are undergoing as they adjust to a new economic climate.
Those two reasons actually converge, he said, in the form of community cloud computing platforms. As more banks co-locate their trading systems on shared platforms with shared data sets — an attempt to minimize latency between themselves and trading centers without building their own data centers nearby — they need to make sure everything is running optimally. Clark said that means helping customers architect their cloud-based systems, then monitoring them after deployment with specialized algorithms to ensure that nothing — including another tenant’s system — is messing with their trading applications.
Banks are notoriously skilled in the IT department, but even they’re running up against tough obstacles with today’s new deployment models. “Systems are so complex these days that you can’t just do it by thinking about it,” Clark said. “You really do need to do some deep analysis.”
Cloud infrastructure is complex
Especially when it comes to the cloud, Sumerian isn’t alone in trying to help companies get insight into how activity inside their servers and switches affects their prized applications. I recently spoke with Vikas Aggarwal, CEO of Zyrion, about his company’s new predictive analytics module that’s designed specifically for monitoring and analyzing cloud-computing and other next-generation infrastructure. Essentially, he said, it’s a matter of being able to assure application performance through network intelligence.
With Zyrion’s software, he explained, companies can learn how their systems behave over the course of weeks, days or hours and plan resource allocation accordingly. If behavior varies too greatly from the expected norm, the software will send an alert or, if permitted, automatically adjust resources as necessary. Over time, companies can track how their infrastructure usage and the behavior of the components has changed, and can better plan for future deployments.
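The general idea of learning a norm and alerting on deviation can be illustrated with a minimal Python sketch. This is not Zyrion's method, which the article does not describe; it assumes a simple per-hour baseline (mean and standard deviation) and a hypothetical threshold of three standard deviations, and the function names and sample readings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(samples):
    """Group (hour_of_day, value) samples by hour and return
    {hour: (mean, standard deviation)} as the learned "norm"."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}


def is_anomalous(baseline, hour, value, threshold=3.0):
    """Return True if value is more than `threshold` standard deviations
    from the learned mean for that hour (threshold is an assumed default)."""
    if hour not in baseline:
        return False  # no history for this hour, so nothing to compare against
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu  # flat history: any change is a deviation
    return abs(value - mu) / sigma > threshold


# Hypothetical CPU-utilization history (percent) at 9:00 and 14:00.
history = [(9, 40.0), (9, 42.0), (9, 38.0), (9, 41.0),
           (14, 70.0), (14, 72.0), (14, 68.0)]
baseline = build_baseline(history)

for hour, reading in [(9, 41.5), (9, 70.0), (14, 70.0)]:
    status = "ALERT" if is_anomalous(baseline, hour, reading) else "ok"
    print(f"hour={hour} cpu={reading} -> {status}")
```

In this sketch the same 70 percent reading is flagged at 9:00 but not at 14:00, because the baseline is kept per hour. A real product would presumably use richer models and could trigger resource adjustments instead of only printing an alert.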
Software vendors and service providers such as Virtela are pushing predictive analytics products for customer data centers, and cloud startups such as New Relic and Boundary Networks are also helping customers make the connection between application behavior and infrastructure-level problems. Using its own service to analyze its own system, New Relic was able to create a time-series graph for disk utilization and spot performance trouble spots that needed attention.
Build your own with Hadoop
Some systems-savvy companies, including the aforementioned Facebook and Google as well as smaller sites, are building their own tools for collecting and analyzing system data. Unstructured data platform Hadoop is proving particularly useful in these efforts. Even Splunk, the software vendor riding its ability to analyze machine data toward an IPO, has built an integration with Hadoop to let users take their analytics efforts to the next level. E-commerce marketplace Etsy, for example, uses both Hadoop and Splunk to help it keep track of what’s going on across its infrastructure in the face of more than a billion page views a month. (I’ll be speaking on stage with Splunk Founder and CTO Erik Swan at our Structure: Data conference this week in New York on the topic of mining machine data.)
But don’t let all this talk about trading platforms, cloud computing and massive-scale web sites fool you into thinking systems analytics is only for the big guys. Sumerian’s Clark says it’s the natural evolution as companies of all types look to get more from their management software and operate more efficiently because of it. Even if you have a monitoring system showing CPU utilization, he asked, “How does that relate to people being able to enter timesheets on a Friday afternoon?”
Feature image courtesy of Switch Communications.