
Summary:

With millions of viewers expected to watch history unfold Sunday night, NASA couldn’t afford to let the live stream of its Mars rover Curiosity’s landing go untested. Here’s how NASA put its Amazon Web Services-based infrastructure through its paces to ensure it keeps up with demand.

Curiosity Rover

Updated: Anyone excited to watch NASA’s Mars rover, called Curiosity, land on the surface of the red planet on Sunday night can all but rest assured that too much demand won’t kill the stream. NASA teamed with an application-testing specialist called SOASTA to ensure the world can keep watching even if demand spikes or servers fail, proving that a single implementation of its application stack can handle 25 gigabits per second (Gbps) of web traffic.

SOASTA tests the traffic load applications can handle by generating cloud-computing-based resources that mimic the traffic generated by potentially millions of simultaneous real-world users. The company also recently tested London2012.com, the official Olympics website, which organizers predict will have to handle more than a billion visits over the course of this year’s event.
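
To put that in perspective, here is a minimal sketch of what that kind of load generation looks like in code: many concurrent simulated viewers repeatedly fetching the same URL while you measure aggregate throughput. The URL, client count and request count below are placeholders for illustration, not SOASTA’s tooling or NASA’s endpoints.

```python
# Minimal concurrent load-generation sketch (illustration only; a real test
# runs from thousands of cloud-hosted agents rather than a single machine).
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

STREAM_URL = "http://example.com/live/segment.ts"  # placeholder, not a NASA endpoint
CLIENTS = 50               # simulated simultaneous viewers
REQUESTS_PER_CLIENT = 20   # segment fetches per simulated viewer

def simulated_viewer(_viewer_id: int) -> int:
    """Repeatedly fetch the stream segment and return total bytes received."""
    received = 0
    for _ in range(REQUESTS_PER_CLIENT):
        with urlopen(STREAM_URL, timeout=10) as resp:
            received += len(resp.read())
    return received

start = time.time()
with ThreadPoolExecutor(max_workers=CLIENTS) as pool:
    per_viewer_bytes = list(pool.map(simulated_viewer, range(CLIENTS)))
elapsed = time.time() - start

total = sum(per_viewer_bytes)
print(f"Moved {total / 1e6:.1f} MB in {elapsed:.1f} s "
      f"(~{total * 8 / elapsed / 1e6:.1f} Mbps aggregate)")
```

The real tests differ mainly in scale and distribution: SOASTA drives the load from agents spread across cloud regions (six EC2 regions in this case) rather than a single box.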

According to an e-mail explanation sent to me by NASA and SOASTA, here’s how the two groups put Curiosity’s streaming infrastructure, which is hosted on the Amazon Web Services cloud, through its paces:

  • They built a test infrastructure composed of a single origin server (a Mac Pro housed at NASA’s Jet Propulsion Laboratory) serving four bitrates (250, 500, 750 and 1,000 Kbps) to a single Flash Media Server (FMS). Output was cached by a single “tier 1” Nginx server, fronted by 40 “tier 2” load-balanced Nginx servers running on Amazon EC2.
  • SOASTA generated load from six Amazon EC2 regions around the world, pushing more than 25 Gbps of traffic and pounding the application for nearly 40 minutes.
  • After 20 minutes, they terminated 10 instances (see Arrow 1 on the chart) to see if their stack and Amazon’s cloud could handle the failure. This temporarily reduced the amount of traffic the system could handle, but Amazon’s Elastic Load Balancing service had replacement instances up and handling 25 Gbps within about 5 minutes.
  • When the team terminated 20 instances (see Arrow 3), the remaining servers’ traffic-handling rate dropped to 12 Gbps and the servers started showing signs of being overloaded. Once again, Elastic Load Balancing brought the capacity back online (see Arrow 4) and the traffic rate returned to its initial 25 Gbps.
  • All told, SOASTA’s load-testing servers downloaded 68TB of video (see Arrow 2) from NASA’s cache during the nearly 40-minute test. (A rough sanity check of these figures follows this list.)
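
For context, here is the back-of-the-envelope math implied by the figures above (25 Gbps aggregate, four bitrates, 40 tier-2 servers, a roughly 40-minute run). These are sanity-check numbers derived from the stated figures, not output from the test itself:

```python
# Back-of-the-envelope numbers implied by the figures above (not test output).
AGGREGATE_GBPS = 25
TIER2_SERVERS = 40
TEST_MINUTES = 40
BITRATES_KBPS = [250, 500, 750, 1000]

# Concurrent viewers a sustained 25 Gbps supports at each bitrate.
for kbps in BITRATES_KBPS:
    print(f"{kbps:>5} Kbps stream: ~{AGGREGATE_GBPS * 1e6 / kbps:,.0f} concurrent viewers")

# Average share of the load on each of the 40 tier-2 Nginx servers.
print(f"Per tier-2 node: ~{AGGREGATE_GBPS * 1000 / TIER2_SERVERS:.0f} Mbps")

# Total data moved if 25 Gbps is sustained for the whole run.
terabytes = AGGREGATE_GBPS / 8 * TEST_MINUTES * 60 / 1000  # Gbps -> GB/s -> TB
print(f"{AGGREGATE_GBPS} Gbps for {TEST_MINUTES} min is roughly {terabytes:.1f} TB")
```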

In the end, the team concluded:

Load on the primary FMS server and the tier 1 cache remained very low for the entirety of the test; we should have no problem running dozens of stacks during the live event. Anecdotal evaluation of the NASA live stream during testing showed no buffering or bitrate drops.

We are confident that the results of this test suggest that an aggregate of these stacks will be able to deliver the required streaming delivery for the Curiosity landing event.
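
That low origin load is exactly what the cache hierarchy is designed to buy: when caching works, each tier pulls each bitrate roughly once from the tier above it, so the origin streams about the sum of the four bitrates no matter how many viewers the tier-2 edge is feeding. Here is a rough sketch of that fan-out, assuming near-perfect cache hit rates (an assumption for illustration, not a published number):

```python
# Fan-out across the caching tiers, assuming near-perfect cache hit rates
# (an illustrative assumption; actual hit rates were not published).
BITRATES_KBPS = [250, 500, 750, 1000]
TIER2_SERVERS = 40
EDGE_EGRESS_MBPS = 25_000  # the 25 Gbps served to simulated viewers

# Each consumer only has to pull each bitrate once from the tier above it.
origin_to_fms_mbps = sum(BITRATES_KBPS) / 1000                    # single FMS
tier1_to_tier2_mbps = sum(BITRATES_KBPS) / 1000 * TIER2_SERVERS   # all 40 edge nodes

print(f"Origin -> FMS:     ~{origin_to_fms_mbps:.1f} Mbps")
print(f"Tier 1 -> tier 2:  ~{tier1_to_tier2_mbps:.0f} Mbps")
print(f"Tier 2 -> viewers: {EDGE_EGRESS_MBPS:,} Mbps")
print(f"Edge-to-origin fan-out: ~{EDGE_EGRESS_MBPS / origin_to_fms_mbps:,.0f}x")
```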

Overall cost and flexibility benefits aside, the ability to test the effectiveness of an application’s infrastructure relatively easily and inexpensively is turning out to be one of the big benefits of cloud computing. NASA’s Curiosity test is just the latest example of this. Video-rental giant Netflix has built an army of simian-named services (such as Chaos Monkey) that simulate everything from the failure of a single server to the failure of an Availability Zone in Amazon’s cloud, where Netflix runs almost all of its IT operations.
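
For the curious, here is roughly what that style of failure injection looks like in code. This is not Netflix’s actual Chaos Monkey or NASA’s test harness, just a minimal sketch using AWS’s boto3 SDK, with a hypothetical instance tag and an example region:

```python
# Chaos-Monkey-style sketch: terminate one instance at random from a target
# group and watch whether the rest of the stack absorbs the failure.
# Illustration only; not Netflix's Chaos Monkey or NASA's actual harness.
import random
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # example region

def pick_random_victim(tag_key: str = "role", tag_value: str = "tier2-cache"):
    """Return the ID of one running instance carrying a (hypothetical) role tag."""
    response = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        inst["InstanceId"]
        for reservation in response["Reservations"]
        for inst in reservation["Instances"]
    ]
    return random.choice(instance_ids) if instance_ids else None

victim = pick_random_victim()
if victim:
    print(f"Terminating {victim}; load balancing and auto scaling should recover.")
    ec2.terminate_instances(InstanceIds=[victim])
```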

Tune in tonight at 10:31 p.m. Pacific Daylight Time to see if NASA’s Curiosity streaming infrastructure really can hold up.

Update: Amazon Web Services has posted a blog detailing NASA’s production architecture for the Curiosity live stream. You can read the details there, including a diagram that shows how the test architecture scaled up for the live event.

  1. Reblogged this on Pallino1021…The Blog and commented:
    NASA is one of our favorite examples of collaboration. In Collaboration and Co-Creation, we talk about NASA’s commitment to involving people within everything NASA, as a means of creating lasting, very emotional connections for their future. Tonight is truly an opportunity to inspire “Curiosity”!

  2. Will NASA stream all the video live throughout the world, or is it just the landing that will be live? If it’s U.S. tax dollars being spent, why can’t we watch everyday operations of the rover? Why does NASA hide all the details of the other rover live video? Is Mars top secret or just secret? Come on, this is a joke. Wow, a live landing with pics NASA says are OK to put out. Thank you, tax dollars. 2+ billion in secrets.

    1. You are very right. And a live stream to Germany too, please, because Germany’s taxpayers pay more than just Hundefutter (meaning: peanuts) for a radiation detector onboard the rover.

  3. > All told, SOASTA’s load-testing servers downloaded 68TB of video (see Arrow 2) from NASA’s cache during the nearly 40-minute test.

    Are you sure about 68TB for 40mins? The screenshot shows 6TB.

    Also, you’d need a 200+ Gbps pipe to pass 68TB in 40 minutes.

    > SOASTA tests the traffic load applications can handle by generating cloud-computing-based resources that mimic the traffic generated by potentially millions of simultaneous real-world users.

    Are you sure about millions of simultaneous users? I see just 10K simultaneous users, not even a million.

    Also, it’s not clear how 10K simultaneous users with a 1 Mbps bit rate can generate 20 Gbps of traffic. It should be 10 Gbps.
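
    For reference, the arithmetic behind both points, using only the figures quoted in the post (68TB, ~40 minutes) and in the screenshot (6TB, 10K users at 1 Mbps):

    ```python
    # Quick check of both points, using only the figures quoted above.
    TEST_MINUTES = 40

    # Point 1: what sustained pipe would each claimed transfer actually require?
    for terabytes in (68, 6):
        gbps_needed = terabytes * 1000 * 8 / (TEST_MINUTES * 60)
        print(f"{terabytes} TB in {TEST_MINUTES} min needs ~{gbps_needed:.0f} Gbps sustained")

    # Point 2: aggregate traffic from 10K simulated users at 1 Mbps each.
    users, mbps_each = 10_000, 1
    print(f"{users:,} users x {mbps_each} Mbps = {users * mbps_each / 1000:.0f} Gbps")
    ```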

    1. Derrick Harris Monday, August 6, 2012

      Re: Point 1: That’s the number they gave me. It could be missing a decimal point.

      Re: Point 2: I’m talking about SOASTA, in general. It can run some mighty large tests.

      1. Derrick, re #2, I’m talking about the screenshot in the article. It has some strange data that I failed to understand. It’d be nice to get the SOASTA guys’ comments on the actual figures.

        You see, such inconsistency gives a wrong impression of the whole story.

      2. Moreover, they managed to generate just 22 Gbps / 40 = ~500 Mbps from each frontend node. This is a very low figure for such a workload. I’d expect 5x-10x more traffic from each node. Perhaps this is just an Amazon ELB/EC2 issue or limitation.

  4. good information

