As anyone who has ever uploaded a high-definition video to the web can attest, it can be a painfully long process even with a high-speed connection. Now, imagine uploading videos far larger, well into the gigabyte range, all day, every day. You might get an idea of the challenges facing the companies responsible for getting us the endless supply of streaming content that we consume every day. I recently spoke with Shane Russell, a lead engineer with Microsoft’s VidLab division, who shared with me the technology that his team uses to handle all the digital content that it is responsible for delivering to Microsoft devices via the Zune Marketplace and the Xbox LIVE service.
The general workflow, he explained, is that content providers send video to VidLab; VidLab ingests, processes and compresses the files for download on various types of devices; and then VidLab uploads the files to its content delivery network providers, who serve the content to consumers. As one might expect, however, VidLab doesn’t rely solely on IP transport and some hard drives from the local Fry’s to do its job.
Across the Internet
Because VidLab receives video from partners in uncompressed form, high-speed transport is key to ensuring that the video arrives in a timely manner. According to Russell, 90 minutes of uncompressed video can be around 80GB in size, which would take at least several hours to upload over an ordinary connection. And VidLab doesn’t just get 90-minute movies: Russell said it gets everything from 30-second trailers to 2.5-hour Ultimate Fighting Championship broadcasts.
That’s part of the reason why most of VidLab’s content providers deliver files using Aspera’s high-speed file-transport technology, which Aspera claims can boost upload speeds into the 700-800 Mbps range. Russell added that Aspera has the security features required by Hollywood studios, which don’t like to risk their content being compromised over the network.
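To put those numbers in perspective, here is a back-of-the-envelope sketch of the arithmetic. It assumes decimal units (1GB = 8,000 megabits) and ignores protocol overhead, and the 10 Mbps and 50 Mbps rates are hypothetical stand-ins for ordinary connections, not figures Russell cited.

```python
def transfer_time_seconds(size_gb: float, rate_mbps: float) -> float:
    """Ideal time to move size_gb gigabytes at rate_mbps megabits per second.

    Assumes decimal units and a fully saturated link with no overhead.
    """
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal)
    return megabits / rate_mbps


if __name__ == "__main__":
    # 80 GB is the article's figure for 90 minutes of uncompressed video.
    # 10 and 50 Mbps are hypothetical ordinary uplinks; 700 and 800 Mbps
    # bracket the range Aspera claims.
    for rate in (10, 50, 700, 800):
        hours = transfer_time_seconds(80, rate) / 3600
        print(f"80 GB at {rate} Mbps: {hours:.2f} hours")
```

Under these assumptions, the 80GB file needs roughly three and a half hours at 50 Mbps, but only about 13 to 15 minutes at the 700-800 Mbps that Aspera claims.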
On the way out to CDNs, VidLab uses a company called Signiant, which provides what it calls “content supply chain management.” Russell described the technology as automating workflows as files migrate between VidLab’s various systems and ultimately to the CDN. This lets Russell’s team focus on transforming video files without worrying so much about whether the files are where they’re supposed to be at any given time. He said VidLab compresses most files to between 5GB and 10GB and sends them to CDNs at about 200 Mbps, which takes about 20 minutes for a standard file.
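For readers who want to check the outbound figures, the sketch below relates file size, line rate and elapsed time. It assumes decimal units and an idealized link with no overhead, and the function names are illustrative only.

```python
def ideal_minutes(size_gb: float, rate_mbps: float) -> float:
    """Minutes to send size_gb gigabytes at a sustained rate_mbps (no overhead)."""
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    return size_gb * 8 * 1000 / rate_mbps / 60


def effective_rate_mbps(size_gb: float, minutes: float) -> float:
    """Average throughput implied by sending size_gb gigabytes in `minutes`."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return size_gb * 8 * 1000 / (minutes * 60)


if __name__ == "__main__":
    for size in (5, 10):
        print(f"{size} GB at 200 Mbps: {ideal_minutes(size, 200):.1f} min (ideal)")
        print(f"{size} GB in 20 min: {effective_rate_mbps(size, 20):.1f} Mbps average")
```

At a sustained 200 Mbps, a 5GB to 10GB file would ideally take only a few minutes, so the roughly 20 minutes Russell cited implies a lower average throughput. That gap may reflect workflow steps or real-world overhead beyond raw line rate, though Russell didn't specify.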
But latency isn’t just an issue over the Internet. As I’ve reported before, network latency is also a huge issue when talking about moving big data within a company’s own data center. To tackle this issue, VidLab keeps a huge pool of high-performance storage systems, primarily from digital-media specialist DataDirect Networks. Russell explained that every piece of video his team receives is immediately sent to an 800TB SAS disk that’s connected to more than 200 processing servers. Once the servers are done converting the file, it moves on down the line. DataDirect claims transfer speeds of more than 240 GBps for digital media files.
VidLab also maintains a 3.5-petabyte data warehouse that Russell said is growing exponentially. Interestingly, Russell said that its original data warehouse was only 300TB, but that now its smallest volume of any type is 100TB. VidLab needs such a voluminous storage setup not only because it houses so much high-definition video, but also because it wants everything available to convert to new encoding standards as they come along.
Latency matters when delivering content to consumers, too. Russell said VidLab utilizes a number of CDN providers for redundancy and stability, and explained how the streaming process is designed for an optimal end-user experience. Essentially, he said, any given part of a file is transient on users’ devices, so as long as they’re getting a steady stream from the CDN, they don’t have to wait for an entire download to complete in order to watch their content.
Going forward, Russell said he sees cloud computing playing a big role in VidLab’s operations, although the cloud needs to overcome a couple of issues first. One is security because, as noted above, Hollywood wants to keep its content safe, and the other is the general lack of high-performance hardware and workflow processes in the cloud. However, he said, cloud computing is very appealing because it would let video-on-demand providers such as VidLab manage elastically scalable systems and distribute processing tasks across as many or as few servers as needed.
Speed limit image courtesy of Flickr user jpctalbot.