How NASA battle-tested its Mars rover live stream

Curiosity Rover

Updated: Anyone excited to watch NASA’s Mars rover, Curiosity, land on the surface of the red planet on Sunday night can be all but assured that heavy demand won’t kill the stream. NASA teamed with an application-testing specialist called SOASTA to ensure the world can keep watching even if demand spikes or servers fail, and their test showed that a single implementation of NASA’s application stack can handle 25 gigabits per second of web traffic.

SOASTA tests how much traffic applications can handle by spinning up cloud computing resources that mimic the load generated by potentially millions of simultaneous real-world users. The company also recently tested London2012.com, the official Olympics web site, which organizers predict will have to handle more than a billion visits over the course of this year’s event.

According to an e-mail explanation sent to me by NASA and SOASTA, here’s how the two groups put Curiosity’s streaming infrastructure, which is hosted on the Amazon Web Services cloud, through its paces:

  • They built a test infrastructure composed of a single origin server (a Mac Pro housed at NASA’s Jet Propulsion Laboratory) serving four bitrates (250, 500, 750 and 1,000) to a single Flash Media Server. Output was cached by a single “tier 1” Nginx server, fronted by 40 “tier 2” load-balanced Nginx servers running on Amazon EC2.
  • SOASTA generated load from six Amazon EC2 regions across the world, generating more than 25 Gbps of traffic and pounding the application for nearly 40 minutes.
  • After 20 minutes, they terminated 10 instances (see Arrow 1 on the chart) to see if their stack and Amazon’s cloud could handle the failure. This temporarily reduced the amount of traffic the system could handle, but Amazon’s Elastic Load Balancer service had the failed instances back up and handling 25 Gbps in about 5 minutes.
  • When the team terminated 20 instances (see Arrow 3), the remaining servers’ traffic-handling rate dropped to 12 Gbps and servers started showing signs of being overloaded. Once again, Elastic Load Balancer brought the instances back up (see Arrow 4) and the traffic rate returned to its initial 25 Gbps.
  • All told, SOASTA’s load-testing servers downloaded 68 TB of video (see Arrow 2) from NASA’s cache during the nearly 40-minute test.
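The termination figures above can be sanity-checked with some quick arithmetic. The sketch below is only a back-of-the-envelope model, not anything NASA or SOASTA published: it assumes the 25 Gbps was spread evenly across the 40 tier 2 servers and that each surviving server keeps delivering that same share. The function names are made up for illustration.

```python
# Back-of-the-envelope model of the tier 2 fleet described above.
# Assumption (not from the test report): traffic is spread evenly across
# the tier 2 servers, and each survivor keeps carrying its original share.

TIER2_SERVERS = 40      # tier 2 Nginx servers in the test stack
AGGREGATE_GBPS = 25.0   # aggregate traffic reported at full strength


def per_server_gbps(total_gbps: float = AGGREGATE_GBPS,
                    servers: int = TIER2_SERVERS) -> float:
    """Average throughput each tier 2 server carried at full strength."""
    return total_gbps / servers


def capacity_after_termination(terminated: int) -> float:
    """Aggregate throughput the surviving servers sustain at the same per-server rate."""
    if not 0 <= terminated <= TIER2_SERVERS:
        raise ValueError("terminated must be between 0 and the fleet size")
    return (TIER2_SERVERS - terminated) * per_server_gbps()


if __name__ == "__main__":
    for terminated in (0, 10, 20):
        remaining = capacity_after_termination(terminated)
        print(f"{terminated:2d} terminated -> {remaining:.2f} Gbps")
```

Under these assumptions, losing 20 of the 40 servers leaves about 12.5 Gbps of capacity, which lines up with the roughly 12 Gbps the team reported.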

In the end, the team concluded:

Load on the primary FMS server and the tier 1 cache remained very low for the entirety of the test; we should have no problem running dozens of stacks during the live event. Anecdotal evaluation of the NASA live stream during testing showed no buffering or bitrate drops.

We are confident that the results of this test suggest that an aggregate of these stacks will be able to deliver the required streaming delivery for the Curiosity landing event.

Overall cost and flexibility benefits aside, the ability to test the effectiveness of an application’s infrastructure relatively easily and inexpensively is turning out to be one of the big benefits of cloud computing. NASA’s Curiosity test is just the latest example of this. Video-rental giant Netflix has built an army of simian-named services (such as Chaos Monkey) that simulate everything from the failure of a single server to the failure of an Availability Zone in Amazon’s cloud, where Netflix runs almost all of its IT operations.
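To make the idea of failure injection concrete, here is a minimal sketch of it as an in-memory simulation. It is not Netflix’s code and does not call any cloud API; the `Fleet` class, the zone names and the instance IDs are all hypothetical stand-ins chosen for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Fleet:
    """Hypothetical in-memory stand-in for cloud instances grouped by zone."""
    zones: dict[str, set[str]] = field(default_factory=dict)

    def add(self, zone: str, instance_id: str) -> None:
        self.zones.setdefault(zone, set()).add(instance_id)

    def healthy_count(self) -> int:
        return sum(len(ids) for ids in self.zones.values())

    def kill_random_instance(self, rng: random.Random) -> str:
        """Simulate the failure of a single, randomly chosen server."""
        candidates = sorted(
            (zone, iid) for zone, ids in self.zones.items() for iid in ids
        )
        if not candidates:
            raise RuntimeError("no instances left to terminate")
        zone, instance_id = rng.choice(candidates)
        self.zones[zone].remove(instance_id)
        return instance_id

    def kill_zone(self, zone: str) -> int:
        """Simulate losing a whole zone; returns how many instances went down."""
        return len(self.zones.pop(zone, set()))


def build_fleet(zone_names: list[str], per_zone: int) -> Fleet:
    """Create a fleet with the same number of made-up instances in each zone."""
    fleet = Fleet()
    for zone in zone_names:
        for n in range(per_zone):
            fleet.add(zone, f"{zone}-i{n}")
    return fleet


if __name__ == "__main__":
    fleet = build_fleet(["zone-a", "zone-b", "zone-c"], per_zone=4)
    victim = fleet.kill_random_instance(random.Random())
    print(f"terminated {victim}, {fleet.healthy_count()} instances still healthy")
    lost = fleet.kill_zone("zone-a")
    print(f"zone-a lost {lost} instances, {fleet.healthy_count()} still healthy")
```

In a real test, the interesting part is what happens next: whether the remaining servers absorb the traffic and how quickly replacements come online, which is what NASA’s termination exercise measured.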

Tune in tonight at 10:31 p.m. Pacific Daylight Time to see if NASA’s Curiosity streaming infrastructure really can hold up.

Update: Amazon Web Services has posted a blog post detailing NASA’s production architecture for the Curiosity live stream. You can read the details there, but here’s a diagram of the architecture that shows how the test architecture scaled:

[Diagram: NASA’s production architecture for the Curiosity live stream]
