
Summary:

Hopes were high leading into Saturday’s Comic-Con ticket-sale launch that TicketLeap’s cloud-based ticketing platform would be an availability superhero after two failed attempts in November, but those hopes were dashed nearly immediately. The problem was a MySQL bug, and the solution was scaling down cloud servers.


Updated: Hopes were high leading into Saturday’s Comic-Con International ticket sale launch that TicketLeap and its cloud-based ticketing platform would be an availability superhero after two failed rounds of ticket sales in November. Those hopes were dashed nearly immediately, however, as would-be buyers were greeted with over-capacity error messages. Despite speculation that the issue was caused by TicketLeap running too few web servers in its Amazon Web Services infrastructure, I have confirmed with TicketLeap that a known issue with the MySQL database is to blame. The news doesn’t exactly wash away the stain on TicketLeap’s reputation — especially among thrice-scorned Comic-Con fans — but it actually goes a long way toward confirming the wisdom of TicketLeap’s decision to utilize cloud computing.

TicketLeap Vice President of Engineering Keith Fitzgerald explains the issue in great detail in a blog post that will go live at 8 p.m. EST, but the gist is that under heavy Comic-Con load, nearly all of TicketLeap’s database connections got tied up doing DNS resolution. Update: The post-mortem post is live at http://ticketleaptech.wordpress.com/2011/02/07/well-that-happened/. As Fitzgerald explains in his post, DNS lookup is a “blocking task” that can slow performance during heavy traffic periods, but TicketLeap uses security features of AWS’s Relational Database Service that negate the need to perform the lookups. Unfortunately for TicketLeap, RDS does not support the standard workaround, the “skip-name-resolve” flag, which is used to avoid DNS resolution when it isn’t necessary. Fitzgerald believes the issue might be resolved in MySQL version 5.5, for which AWS just announced support. TicketLeap was using MySQL version 5.1.
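To make the role of the “skip-name-resolve” flag concrete, here is a minimal Python sketch of the general idea. It is an illustration only, not MySQL's actual implementation: the function names and the `skip_name_resolve` parameter are hypothetical, and the only real API used is Python's standard `socket.gethostbyaddr`, which blocks the caller until the reverse DNS lookup finishes or fails.

```python
import socket
from typing import Callable, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Blocking reverse DNS lookup; returns None if the address has no name."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


def identify_client(
    ip: str,
    skip_name_resolve: bool,
    resolver: Callable[[str], Optional[str]] = reverse_lookup,
) -> str:
    """Return the identity a (hypothetical) server would use for a new client.

    With skip_name_resolve=True the server never calls the resolver, so no
    connection waits on DNS. Otherwise every new connection pays for one
    blocking lookup before it can proceed.
    """
    if skip_name_resolve:
        return ip
    return resolver(ip) or ip
```

The point of the sketch is the early return: when access is already controlled by IP address or by another mechanism, the hostname is never needed, so the lookup is pure overhead on every new connection.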

The real kicker of Saturday’s failure is that scalability wasn’t an issue at all and, in fact, just exacerbated the problem. Fitzgerald explains:

As it turns out, the issue was exacerbated by the number of servers. We decided at 9:13 AM PST to drop the number of web servers to 4 and orders began to flow at that time. This worked because the number of DNS lookups MySQL had to perform were reduced and we were able to process ~200 tickets a minute under extremely heavy load. This is certainly not our ideal level of throughput, but we were thrilled to start selling tickets to Comic-Con.

As I reported on Friday, TicketLeap scaled its AWS infrastructure up to 64 web servers in preparation for Saturday’s sale, and a test run in December led to the successful sale of 1,000 tickets in a minute against a traffic load of 50,000 buyers. Demand was so high on Saturday, however, that a decision to add more servers would have meant more DNS lookups and an even slower experience for customers. The ability to scale down on demand actually saved the day, and tickets sold out despite the performance issues.
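A rough back-of-the-envelope model shows why fewer web servers could mean more throughput. The Python sketch below assumes, purely for illustration, that each new database connection triggers one DNS lookup and that lookups are effectively handled one at a time. The per-server connection rate and the lookup latency are made-up numbers; only the server counts (64 and 4) come from the article.

```python
def resolver_utilization(
    web_servers: int,
    new_conns_per_server_per_sec: float,
    lookup_ms: float,
) -> float:
    """Fraction of each second spent on DNS lookups under the toy model.

    A value above 1.0 means lookups arrive faster than they can be served,
    so new connections queue up behind them.
    """
    lookups_per_sec = web_servers * new_conns_per_server_per_sec
    return lookups_per_sec * lookup_ms / 1000.0


# Hypothetical inputs: 5 new connections per server per second, 20 ms per lookup.
before = resolver_utilization(64, 5, 20)  # the 64 servers TicketLeap started with
after = resolver_utilization(4, 5, 20)    # the 4 servers it dropped to

print(f"64 servers: utilization {before:.1f}")
print(f" 4 servers: utilization {after:.1f}")
```

Whatever the real per-server rate and lookup latency were, the lookup load in this model scales linearly with the number of web servers, so going from 64 servers to 4 cuts it by a factor of 16.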

Assuming TicketLeap is able to upgrade successfully to MySQL 5.5 on Amazon RDS and put this issue to rest, the question then will be whether its reputation can recover. Foursquare and Digg didn’t suffer much lasting damage after their decisions to use NoSQL databases MongoDB and Cassandra, respectively, led to lengthy outages last year. But the big difference in this case is that events rely on TicketLeap for serious business. Of course, it’s also arguable that sticking with the tried-and-true MySQL database on the proven AWS platform was hardly an imprudent decision. In fact, that decision looks a lot better now that cloud computing saved the day by letting TicketLeap scale down its infrastructure as an ad hoc fix while remaining operational.

For its sake, I hope TicketLeap gets another chance to prove that it can handle a Comic-Con-scale launch, and that it does its homework in advance to make sure nothing goes wrong.


Image courtesy of Flickr user permanently scatterbrained.


  1. It’s a shame, there are plenty of case studies on how to get this kinda thing rolling in the cloud, not something that I’d want to reinvent the wheel on.

  2. “TicketLeap Vice President of Engineering Keith Fitzgerald explains the issue in great detail in a blog post that will go live at 8 p.m. EST.” His blog post explains nothing at all, especially not an issue with MySQL, and certainly not an “issue” that will be fixed in the next version. Please disclose your relationship with TicketLeap, as it seems TicketLeap simply neglected to take DNS into account in overall system design. That’s a shame for a cloud service provider.

    1. Derrick Harris Tuesday, February 8, 2011

      1. I inadvertently linked to the TicketLeap Blog instead of the TicketLeapTech blog. His explanation is available here: http://ticketleaptech.wordpress.com/2011/02/07/well-that-happened/.

      2. I have no relationship with TicketLeap other than my coverage of its partnership with Comic-Con. Regarding this post, I opted to let any configuration errors or oversights on the part of TicketLeap speak for themselves — it’s too easy to take the TicketLeap-screwed-up stance and ignore the bigger picture.

      In this case, IMHO, the bigger picture is that despite the DNS situation, tickets ultimately were sold because of the flexibility enabled by the cloud. It isn’t that removing web servers is impossible without the cloud, but TicketLeap was able to do so easily and without having to eat the cost of overprovisioning in the first place. Save for the DNS oversight, those 64 servers would have come in handy.

  3. ^anonymous

    /sigh. first and last response to anonymous trolls:

    I linked to the MySQL source code where the bug seems to lie and why I think this issue has been resolved in the 5.5 build. I also asked the community to comment on whether I am correct. You can’t really get more detailed than that. I also went into technical detail regarding why DNS lookups shouldn’t even happen in the first place.

    Moving forward, I’m absolutely happy to engage the community on what happened Saturday, but please include your professional credentials.

  4. I would never support TicketLeap in any future endeavor again. Especially for something the scale of SDCC.

  5. Chris Albrecht Monday, February 7, 2011

    Too bad they didn’t scale down in time for me to get my ticket.

    1. hey chris –

      we’re really sorry too.

  6. I think the link anonymous is looking for is:
    http://ticketleaptech.wordpress.com/2011/02/07/well-that-happened/

  7. Sadly, there was too much in play, and given that Comic-Con continued to use TicketLeap after last December’s all-but-failed “test,” it was bound to happen again.

    Comic-Con is far too big to try to save money with vendors that don’t have experience handling the demand required for this event. As we saw in countless comments, customers would be absolutely willing to pay Ticketmaster’s fees and UX shenanigans like bypassing offers. To date, I have yet to witness an event where Ticketmaster was shuttered as badly as TicketLeap’s mess this past Saturday and the test in December.

    In the end, independent media employees and companies like myself will likely be unable to attend Comic-Con; TicketLeap’s systems /actively/ played a direct role in preventing them from doing so.

    /Some/ of us weren’t afforded the luxury to sit in front of our computers for hours on Saturday retrying as in life, there are many things more important than trying to get tickets. There shouldn’t be a “trying” when a company like TicketLeap charges a service fee on top of every registration.

    If TicketLeap truly wanted to try to make things right, they would start by refunding these service fees right away- since customers can’t charge them for the time they wasted trying to see it to completion. Instead, what we’re seeing are a bunch of apologies and blame shifting from the staff when all it comes down to is one thing: poor planning.

    Unfortunately for TicketLeap and Comic-Con, no amount of apologizing will make things right for the people who were already making plans and now look to cancel them. They screwed up big time, and that’s all it comes down to. They just weren’t prepared, couldn’t handle it, and everything that followed was in vain.

    History will show us that Comic-Con (and if TicketLeap is even involved with them in the future- despite the objections of many) will announce a block of available tickets for a specific time in the future, getting everyone ready again for an attack floodgate of hits at the exact moment- causing all of this to happen again. More frustration, more complaining, more blame shifting.

    Here’s a clue: Don’t announce WHEN tickets will be available since you clearly can’t handle it. Just make them available, and those who are diligent in checking the site for updates will see and have a fair chance.

    It’s a shame really, because it’s just not getting better in my personal history of Comic-Con registration over the past few years- it’s getting worse.

    We all hoped for the best, given the past vendor’s inability to provide this service, but we were all, again, horribly disappointed.

    [TicketLeap, if you'd like to contact me personally and see what you can do to fix things, my e-mail is below.]

    Chris
    chris[at]mysterything.com

  8. [...] TicketLeap’s exciting announcement last week that it would be ticketing ComicCon, the world’s largest comic book conference, took a turn for the worse this weekend as the site crashed from 9 to 4 on Saturday, after receiving up to 400,000 page requests per minute. Despite preliminary tests, the site just couldn’t handle the traffic load, and as TicketLeap explained in a statement, “In 2009, it sold out after 6 months. In 2010, it sold out in 2 months. On Saturday, Comic-Con International 2011 sold out in 7 HOURS (200x faster than last year if you’re keeping track).” The technological details of the mishap are here. If there’s good news, it’s that the company learned (the hard way) how to handle an incredible influx of sales, which could be a feather in its cap. Or as GigaOM put it, the company still has some homework to do. [...]

  9. [...] need lots of capacity from specific geographies in order to simulate real-world user load (see the recent Comic-Con ticket-sales snafu for evidence of how important this can be). If Enomaly and/or users can unearth a few other killer [...]

  10. [...] another recent situation, online ticketing startup TicketLeap suffered a database failure in its attempt to carry out opening-day sales for Comic-Con International atop an AWS-hosted [...]

