Job one for anyone operating a data center is keeping it online. At the Structure Data event in New York City, three women responsible in some way for keeping the data centers of Microsoft, Goldman Sachs and Facebook online discussed how they gathered the information needed to prepare their data centers and hardware for Hurricane Sandy.
By far the biggest challenge was faced by Tamara Budec, VP of critical systems and engineering at Goldman Sachs & Co., who had to deal not only with tracking the weather but also with preparing to move data and software, some of which runs on decades-old legacy systems. In the aftermath of Sandy she faced a different challenge, namely wondering whether the stock exchange would open or whether banks would end up trading electronically from their back offices.
Whatever they decided would affect the capacity she needed to plan for that day. Facebook, for its part, was on alert, but its capacity planning is tied more to the next 18 months and to supporting growth in users and new products. Heather Marquez, manager of asset strategy and optimization at Facebook, said, “Granted we are surprised by last-minute product launches,” but in general she and the Facebook team work hard to stay ahead of demand.
Massive scale means bigger data and more opportunities
The panelists also addressed the new challenges associated with massive scale. Amaya Souarez, director of data center services at Microsoft, started off philosophically: “When you’re building a cloud scale infrastructure you need to start with acceptance … and build a cloud software that is resilient in and of itself.”
She quickly got practical when the topic turned to the data and metrics Microsoft tracks in order to charge business units for their use of the data centers. She said the company now charges business units based on the kilowatts of power they use. “We track all performance on a per watt per performance basis, and so we look at this data and come up with more efficient models,” said Souarez.
She added that Microsoft is charging carbon emissions back to the business as well, with the goal of using the data about its operations to improve the world. For example, she said, “We still have older systems and want to transition them to the most effective overall platform.”
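To make the chargeback idea concrete, here is a minimal sketch of how a per-kilowatt bill, a carbon figure and a performance-per-watt number could be computed for each business unit. It is an illustration only, not Microsoft's method: the unit names, the rate per kilowatt, the emissions factor and the `work_units` throughput measure are all hypothetical assumptions.

```python
from dataclasses import dataclass

# All constants below are hypothetical placeholders, not Microsoft figures.
RATE_PER_KW = 150.0        # assumed monthly charge per kilowatt of average draw
CARBON_KG_PER_KWH = 0.4    # assumed grid emissions factor (kg CO2 per kWh)
HOURS_PER_MONTH = 730      # approximate hours in a month


@dataclass
class UnitUsage:
    name: str          # business unit being billed (hypothetical)
    avg_kw: float      # average power draw over the month, in kilowatts
    work_units: float  # any throughput measure the operator chooses to track


def chargeback(usage: UnitUsage) -> dict:
    """Return the power charge, carbon estimate and performance per watt."""
    energy_kwh = usage.avg_kw * HOURS_PER_MONTH
    return {
        "unit": usage.name,
        "power_charge": usage.avg_kw * RATE_PER_KW,
        "carbon_kg": energy_kwh * CARBON_KG_PER_KWH,
        "perf_per_watt": usage.work_units / (usage.avg_kw * 1000),
    }


usages = [
    UnitUsage("search", avg_kw=120.0, work_units=6.0e6),
    UnitUsage("mail", avg_kw=80.0, work_units=2.0e6),
]
report = [chargeback(u) for u in usages]
for row in report:
    print(row)
```

In this sketch a unit that does more work for the same power draw shows a higher performance-per-watt figure, which is the kind of comparison that could flag older systems as candidates for migration.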
Data is also helping Microsoft reduce spending on unneeded equipment. For example, Souarez said that the firm’s Boydton, Va. data center expansion doesn’t come with backup generators, because a look at the preceding six years of data showed that the power didn’t go out often enough to justify them and that the services hosted in that area could be switched over to another site in case of failure.
Data isn’t everything
Yet the data can’t help in every situation. Budec noted that the firm is interested in pursuing the flexibility and resiliency offered by software defined networks and software defined data centers, but that she is having a tough time coming up with return on investment metrics. She knows the transition would have advantages for the company, but still isn’t sure which metrics she needs to prove it.
“Basically you’ll let an OS run your facility,” Budec said. “It’s coming to be more of a trend … but engineers are not ready to let go with manually running that equipment.”
She also said, “With a scale out distributed compute model, which is becoming more and more how we operate, we’re abstracting the physical from the infrastructure, so capacity and asset tracking is harder and more challenging.” For example, she said, the number of servers increases even as the number of physical machines decreases.
Beyond the increasing complexity and the growing volume of data generated by their operations, the three panelists ended with the observation that data isn’t everything. Souarez noted that while data will help you in your discussions, it also takes a lot of personal commitment and interaction. “Even though you may have the data that doesn’t mean you will always win the argument … It does take personal influence as well.”
Check out the rest of our Structure:Data 2013 live coverage here, and a video embed of the session follows below: