Web Infrastructure

Mesosphere raises $10.5M to push virtualization à la Google

Inspired by Google’s famous approach to resource management, Apache Mesos is the open source software that manages the large pools of servers and cloud instances at companies such as Twitter and Airbnb. Mesosphere, a company trying to commercialize it, has raised $12.75 million since launching.

Twitter details its Manhattan real-time database

Twitter has released some details on its Manhattan database system, which was built to power a wide variety of applications that existing technologies can no longer handle. Twitter handles thousands of tweets per second, which means speed and scale are critical.

Netflix open sources its data traffic cop, Suro

Netflix has open sourced a tool called Suro that collects event data from disparate application servers before sending it to other data platforms such as Hadoop and Elasticsearch. It’s more big data innovation that hopefully finds its way into the mainstream.

Google spent a billion on infrastructure last quarter

Google spent more than a billion dollars on infrastructure in the fourth quarter, representing the company’s second-biggest quarterly expenditure ever. As it competes against Facebook, Apple, Yelp and Amazon, the company can’t afford to stop building data centers now.

Can a new database help get Zynga back on track?

Zynga has deployed nearly 100 nodes of MemSQL, the hot new database from two former Facebook engineers. It might not be a magic pill for Zynga’s woes, but it could help the company boost revenue and even build new types of games.

A peek inside China’s internet giants and their massive scale

China’s big four internet companies are big — huge, in fact — but they’re not yet technological innovators like their American counterparts. However, scalability is an issue that knows no borders, which has spurred some cross-continental cooperation. Will it also inspire a Chinese tech awakening?

Hacking hardware isn’t just cool — it’s also good business

When companies such as Google and Facebook design their own servers, switches and data centers, it’s more a business decision than it is a test of their hardware-hacking skills. Custom gear means lower power bills, better performance and the flexibility to adapt to unforeseen situations.

For the future of big data startups, look to Facebook

Facebook knows something about big data — it collects more data and has built more tools than almost anybody else. Here, Facebook’s Jay Parikh and Accel Partners’ Ping Li talk about what lessons big data startups can take from Facebook to build businesses that can succeed.

Quantcast releases bigger, faster, stronger Hadoop file system

It’s not for everyone, but if you’re storing petabytes of data in Hadoop, Quantcast thinks it has the cure to your woes. Its newly open sourced Quantcast File System promises smaller clusters and better performance, and it has proven itself over exabytes of data inside Quantcast.

Balancing Oracle and open source at Orbitz

Orbitz has transitioned a major system off of Oracle’s Coherence database and onto the NoSQL Couchbase Server, but the database giant still has a significant footprint in Orbitz’s data centers. It’s all part of being a big company trying to roll with the IT punches.

GoDaddy: ‘We weren’t attacked.’

Hosting giant GoDaddy has completed its investigation of Monday’s outage and determined it was not the result of a DDoS attack as originally rumored, but rather the result of network failures within GoDaddy’s system. The outage crippled hundreds of thousands of web sites.

Etsy unveils its infrastructure (and its Supermicro love)

Etsy shared the details of its hardware architecture on Friday, showing the world a whole lot of Supermicro servers running everything from web servers to Hadoop. At this point, software is the name of the game at webscale, so hardware openness is just welcome community service.

Netflix open sources cloud-testing Chaos Monkey

Netflix has open sourced Chaos Monkey, a service designed to terminate cloud computing instances in a controlled manner so companies can ensure their applications keep running when a virtual server dies unexpectedly. In the past year, Chaos Monkey has terminated more than 65,000 of Netflix’s instances.
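The core idea — killing a randomly chosen instance so that failures happen on your schedule instead of the cloud’s — fits in a few lines. Below is a minimal sketch of that idea, not Netflix’s actual implementation: `chaos_monkey`, `terminate` and the opt-out list are hypothetical stand-ins for calls a real deployment would make to a cloud provider’s API on a timed schedule.

```python
import random

def chaos_monkey(instances, terminate, opt_out=()):
    """Pick one eligible instance at random and terminate it.

    In a real deployment, `instances` would come from the cloud
    provider's API and `terminate` would issue a kill request there;
    `opt_out` lets critical groups exclude themselves from the chaos.
    """
    candidates = [i for i in instances if i not in opt_out]
    if not candidates:
        return None  # nothing eligible to kill
    victim = random.choice(candidates)
    terminate(victim)
    return victim

# Illustrative run: record which instance was "terminated".
killed = []
victim = chaos_monkey(["i-1", "i-2", "i-3"], killed.append, opt_out=("i-3",))
```

The point of running this continuously, as Netflix does, is cultural as much as technical: if any instance can die at any moment, engineers are forced to build services that tolerate it.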

Why Twitter should open up about its infrastructure

If Twitter wants to remain opaque about its practices, that’s fine — but it shouldn’t expect any slack from upset users or investors. Blaming a two-hour outage on an “infrastructural double-whammy” after remaining mum on even where its data centers are located doesn’t exactly inspire confidence.

Netflix: ‘We’re bullish on the cloud’ despite outage

Last week’s AWS outage might have outsmarted Netflix’s Chaos Monkeys, but the content-distribution giant isn’t about to turn its back on cloud computing. It was a relatively small blip in what has been improved availability since the company moved entirely to the cloud.

AOL building refrigerator-sized data centers

AOL is taking its infrastructure strategy to a whole new level of flexibility by building data centers that are about the size of French door refrigerators. Now, AOL will be able to deploy infrastructure where needed with little more than an electrical outlet required.

Why Netflix’s CDN should scare the storage industry

Lest storage vendors think they’re immune to the disruption that open source hardware is wreaking on the server industry, Netflix’s new Open Connect content-delivery network might make them think again. It’s inspired by open source storage designs first released by Backblaze almost three years ago.

With $42M more, 10gen wants to take MongoDB mainstream

10gen, the creator and commercial entity behind the popular MongoDB database, has raised another $42 million and wants to take the technology to an application near you. The money will help 10gen double down on research and development to make MongoDB live up to its hype.

Why 900M isn’t the only number that matters to Facebook

Facebook’s hyperinflated valuation heading into its IPO has everything to do with its promise, and very little to do with its actual profits. Here are some numbers we know about Facebook’s infrastructure that speak to its promise perhaps as much as its 900 million users.

Did Yahoo sow the seeds of its own demise with Hadoop?

As the world once again starts analyzing Yahoo’s myriad woes after Sunday morning’s ouster of embattled CEO Scott Thompson, I’m left wondering if its investment in Hadoop didn’t aid in the company’s demise, even if it’s a way down the long list of Yahoo’s mistakes.

Facebook’s delicate balance between profits and privacy

If Facebook really is overvalued leading up to its IPO, privacy might be the underlying cause of the company’s missed expectations. As it turns out, pleasing both investors and users isn’t easy for a company that relies heavily on advertising and personal data.

Worried you’ll outgrow the cloud? You’re not alone.

If you think about it, Netflix’s metamorphosis into a company that runs its infrastructure completely atop cloud-based resources is truly remarkable. For many companies, such as site-optimization and CDN provider Yottaa, the bigger they get, the harder it is to justify the cloud’s cost and performance.

Why Instagram is likely moving on from Amazon’s cloud

Instagram is pretty proud of the infrastructure it built atop the Amazon Web Services cloud, but does an acquisition by Facebook mean goodbye to Amazon? If I were a betting man, I’d say there are already some engineers working really hard to make that happen.

How OMGPOP scaled to 36 million users in three weeks

OMGPOP can thank the cloud for its acquisition by Zynga on Wednesday. The gaming startup’s Draw Something iPhone app used cloud computing and a NoSQL database to scale from zero (relatively speaking) to more than 35 million downloads in three weeks without missing a beat.

How Facebook made it possible to geo-tag everything

It all seems so easy: You log into Facebook, update your status, tell everyone where you are and — voila! — your Timeline is geospatial. Only, while it’s just one extra step for you to add location, building that capability was a tad more complicated for Facebook.

How Twitter is doing its part to democratize big data

Twitter has been on a tear lately when it comes to open sourcing big-data tools. The latest two are Cassie, a client for managing Cassandra clusters, and Scalding, a MapReduce framework for simplifying the creation of Hadoop jobs. Big data won’t be black magic forever.
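Scalding’s appeal is that it hides the boilerplate of the MapReduce model underneath, and that model itself is simple to sketch. The word-count example below is purely illustrative and in Python (Scalding itself is a Scala library); it shows the map and reduce phases that a framework like Scalding generates Hadoop jobs for:

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

result = reduce_phase(map_phase(["to be or not", "to be"]))
# result == {"to": 2, "be": 2, "or": 1, "not": 1}
```

On a real cluster the map and reduce phases run in parallel across many machines, with Hadoop handling the shuffle in between — that distribution, not the logic, is the hard part frameworks like Scalding abstract away.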

Dropbox bought Cove to help it grow like Facebook

With more than 45 million users already connected to his company’s cloud storage service, Dropbox CEO Drew Houston knows he has an infrastructure challenge ahead of him. That, Houston says, is a big reason Dropbox bought Cove, which brings with it Facebook engineering cred.

Investors and users beware: Facebook is all about IT

Facebook’s S-1 filing shows the company is all about infrastructure. The ad revenue and user experience it relies on to exist mean Facebook can’t afford to take it easy on IT, which means shareholders and users will both find plenty of reasons to get upset.

Amazon’s DynamoDB shows hardware as means to an end

Somewhat lost in the greater story of Amazon Web Services’ new DynamoDB NoSQL database is that the new service runs atop a solid-state storage system. By abstracting those SSDs behind a NoSQL service, AWS is trying to prove that hardware presents greater opportunities than Infrastructure-as-a-Service alone.

NoSQL’s great, but bring your A game

At last week’s MongoSV conference in Santa Clara, Calif., a number of users shared their experiences with the MongoDB NoSQL database. One common theme: NoSQL is necessary for a lot of use cases, but it’s not for companies afraid of hard work.

Facebook speeds PHP development again with HipHop VM

Never content with good enough when it comes to speed, Facebook has taken its open-source PHP-boosting HipHop technology to the next level for programmers. With the new HipHop Virtual Machine, Facebook claims it has improved upon HipHop interpreter performance by 60 percent.

Facebook shares some secrets on making MySQL scale

Facebook held a Tech Talk on Monday night explaining how it built a MySQL environment capable of handling everything the company needs in terms of scale, performance and availability. Based on what I heard, it looks like critics of Facebook’s MySQL environment might be wrong.

How Etsy handcrafted a big data strategy

E-commerce site Etsy has grown to 25 million unique visitors and 1.1 billion page views per month, and it’s generating the data volumes to match. Using tools such as Hadoop and Splunk, Etsy is turning terabytes of data per day into a better product.

Could Facebook be your next software vendor?

Over the past couple years, Facebook has released details on a number of its internal efforts to automate and simplify the management of its massive infrastructure. As reliance on web applications and cloud services becomes more common, Facebook’s tools and technologies could be a cash cow.

Quarterly Wrap-up

Infrastructure Q3: OpenStack and flash step into the spotlight

Last quarter we highlighted the fast maturation of the Platform-as-a-Service and big data spaces. Those two trends only picked up speed during the third quarter of 2011. Joining them on the cusp of IT greatness, though, are the OpenStack project and flash storage.

How business taught scientists about big data

Traditionally, scientists and researchers develop the latest and greatest techniques in computing, which trickle down to corporate data centers where they’re relevant. But with big data — the process of analyzing voluminous quantities of data in new, unique ways — it’s industry that’s driving the innovation ship.

Twitter’s ever-changing infrastructure story

Earlier this year, rumors swirled about whether Twitter had actually moved into a new Utah data center, or if it was forced to move its operations to a different facility. Now there are reports that Twitter is leasing more data center space, this time in Atlanta.

How FBAR keeps Facebook online automagically

When you’re running a large web infrastructure, automation is critical to ensure that administrators aren’t spending their every waking second dealing with downed servers. Google, Yahoo and other pioneers had to figure out how to automate failover in their data centers. Now it’s Facebook’s turn.

10gen raises $20M for MongoDB in maturing NoSQL space

MongoDB-based startup 10gen has raised $20 million in a Series D funding round. The latest round speaks to the popularity of the MongoDB document database among large companies, even though the hype around NoSQL has lessened considerably over the past year.

Cotendo using Equinix data centers to host CDN services

Cotendo is leveraging Equinix’s global data center footprint to give itself 30 points of presence, letting Cotendo focus on differentiating elsewhere. The companies released details of their partnership Thursday morning, including a quadrupling of Cotendo’s customer base to 400 from 100 in the past two years.

5 strange but true concerns for keeping Google online

Google, which serves about 7 percent of the world’s overall web traffic, isn’t any ordinary company. Google Research Director Peter Norvig recently shared some of the considerations that Google takes into account when designing its infrastructure and systems to operate at Internet scale.

How Facebook moved 30 petabytes of Hadoop data

For anyone who didn’t know, Facebook is a huge Hadoop user, and it does some very cool things to stretch the open source big data platform to meet Facebook’s unique needs. Today, it detailed how it migrated its 30-petabyte cluster from one data center to another.

Nginx creator launching company based on popular web server

Nginx creator Igor Sysoev is planning a company based around the wildly popular open-source web server. Sysoev announced the decision on the Nginx blog Monday morning, writing that the commercial entity’s primary goals will be better support and more consistent feature releases.

Why Google spent almost a billion on infrastructure in Q2

Google spent $917 million on infrastructure during the second quarter, continuing an upward trend that helps ensure new services like Google+ keep running. It’s the eighth consecutive quarter of increased capital expenditures for Google, which is now spending at near-record levels.

The server architecture debate rages on

Big processors or little processors, scale-up or scale-out, on-premises or in the cloud: the answers might not be as easy as one would think. Web-style, scale-out architectures, low-power server processors and cloud computing are getting more attention by the day, but they have their limits.

Facebook trapped in MySQL ‘fate worse than death’

According to database pioneer Michael Stonebraker, Facebook is operating a huge, complex MySQL implementation equivalent to “a fate worse than death.” It’s actually a predicament all too common among web startups, for which the solution might be a class of databases referred to as NewSQL.

Meet the next big programming star: Node.js

Sometimes, the geeks come out of the shadows and hit the mainstream consciousness. Remember the early ’00s and the rush of publicity for Ruby and Ajax as they became the calling cards of Web 2.0? Node.js looks like the next candidate for such mainstream elevation.

GenieDB designs around CAP to scale cloud apps

Among the biggest problems with developing applications for the cloud is scaling the database layer. GenieDB, a competitor in our recent Structure 2011 LaunchPad competition, wants to give its customers the benefits of both SQL and NoSQL to scale across data centers.

Hadoop may be hot, but it needs to be useful

Hadoop is a very valuable tool, but it’s far from perfect. While Apache, Cloudera, EMC, MapR and Yahoo focus on core architectural issues, there is a group of vendors trying to make Hadoop a more-fulfilling experience by focusing on business-level concerns such as applications and utilization.

Can Google App Engine compete in the enterprise?

At Google’s I/O event last month, the company announced new features and a new pricing model for its App Engine PaaS offering, and now the web giant thinks it’s prepared to compete with companies like Red Hat and Salesforce.com in bringing enterprise users to its platform.

The web’s watchful eye fixes on Apple’s cloud gear

When Steve Jobs flashed inside images of Apple’s new cloud data center during his WWDC keynote on Monday, he ignited a mini firestorm of speculation about just what kind of hardware is filling its immense surface area. Everyone seems to agree that HP and Teradata were big winners.

Apple launches iCloud; here’s what powers it

Apple officially launched its much-hyped iCloud suite of services at its Worldwide Developer Conference today, and although the capabilities are sure to be the talk of the town, it’s Apple’s cloud infrastructure that makes it all work. Steve Jobs said as much during his WWDC keynote.

Facebook’s PHP Codebase: It’s Complex

Facebook today published an interesting visualization of just how complex its codebase is. Actually, the visualization is part of an application within the company, but it gets the point across: Making code changes is no small feat when every module is dependent on so many others.

How Facebook Brings a New Data Center Online

According to a post today on the Facebook Engineering blog, the social networking leader undertook an effort called “Project Triforce,” which involved provisioning a replica production region from an existing cluster, to ensure the site could run smoothly across three regions without falling on its face.

Can Groupon Compete with Facebook in Hadoop Hiring?

The most interesting part about yesterday’s announcement that Groupon is using the Cloudera Distribution of Hadoop wasn’t the actual use but, rather, the insight that Groupon is “building a world-class infrastructure” of which Hadoop will be a key part. But recruiting big-data-savvy talent is getting rather pricey.

Nutanix Gets $13.2M for Google-like Storage Architecture

Nutanix, a startup developing an appliance that combines computing and storage on the same server nodes, has raised $13.2 million. It’s a story that should resonate with customers concerned with scalability and performance.

Node.js and the JavaScript Age

We decided to rebuild our dashboard framework in server-side JavaScript, using Node.js. This decision was driven by a realization: the LAMP stack is dead. In the two decades since its birth, there have been fundamental shifts in the web’s make-up of content, protocols, servers, and clients.

DataStax Shakes Up Hadoop with NoSQL-Based Distro

NoSQL startup DataStax officially entered the pantheon of Hadoop providers today, introducing its own distribution called “Brisk.” Brisk utilizes the open source NoSQL database Cassandra as a replacement for Apache’s Hadoop Distributed File System, as well as Cassandra’s built-in MapReduce engine and Hive.

Can Data Centers Help Cure Urban Decay?

Is it possible that the ever-increasing demand for data center space could be the cure for vacant commercial real estate plaguing cities? In some areas, data center operators are buying up vacant real estate to house new operations and helping revitalize those areas as a result.

NoSQL Startup Basho Raises $7.5M for Riak

Lost in the wake of Membase and CouchOne merging to form Couchbase, and far away from Silicon Valley, Boston-based NoSQL startup Basho has raised $7.5 million for its efforts to commercialize the Riak NoSQL database, according to a report in Mass High Tech.

NoSQL Consolidation Begins: Membase Buys CouchOne, Forms Couchbase

NoSQL database startups Membase and CouchOne have merged to create Couchbase, a company that will combine Membase’s memcached-based Membase Server and CouchOne’s CouchDB-based products into a family of NoSQL products. Other NoSQL vendors need to broaden their scope if they want to compete against Couchbase.

Is Massive Infrastructure Always an Asset?

Myspace’s gradual decline and a recent blog post have me wondering what the flip side is of rapid scaling. I wonder what social-media sites do with their expansive infrastructures once they no longer need them to meet high demand. They can’t just scale them back, right?

Infrastructure Key to Google’s No-Downtime Guarantee

Google blogged this morning about a new no-planned-downtime guarantee for Google Apps, a promise it’s able to make because of its globally distributed infrastructure estimated at more than 1 million servers. Google’s expansive infrastructure gives it multiple options for migrating workloads during planned downtime.

8 Cloud Companies to Watch in 2011

In the case of the following companies (and one open-source project) — ranging from Cisco to Twitter — I think that although they made lots of headlines in the past year, the true effects of their actions won’t be realized until later this year.

For Facebook, Now Is the Time for Infrastructure Spending

Depending on how Facebook intends to evolve, both performance considerations and data privacy laws might make additional infrastructure investment a good idea. Regardless of its rationales, however, the time to do so is now — before the company goes public and must answer for every dollar spent.

9 Companies That Drove Cloud Computing in ’10

From ARM Holdings to Facebook to VMware, and whether via acquisitions, innovation or challenging the status quo, many vendors were able to effect paradigmatic shifts in computing or otherwise leave indelible marks on enterprise IT by what they did in 2010.

Dec. 20: What We’re Reading About the Cloud

Today’s links focus on the importance of infrastructure in building reliable services. We have Tumblr investing in a new data center, KT building a cost-efficient cloud and Citrix’s Simon Crosby telling why private clouds could have helped prevent the Wikileaks debacle on all fronts.

Clustrix Gets $12M More for Scalable SQL

Scalable SQL startup Clustrix has closed a $12 million Series B round of funding, bringing its total to $30 million. The new money came from existing investors U.S. Venture Partners, Sequoia Capital and ATA Ventures. Considering Clustrix’s steady momentum, this funding shouldn’t take anybody by surprise.

Nov. 29: What We’re Reading About the Cloud

Today’s news underscores my feeling that 2011 will be a huge year for cloud computing. Aside from CloudBees’ funding, we have Mellanox buying Voltaire, rumors of Oracle buying Salesforce.com, Enomaly continuing to push cloud brokerages, and questions about whether Intel’s Open Data Center Alliance can succeed.

Nov. 23: What We’re Reading About the Cloud

Today’s links demonstrate that there’s a long way to go before we have issues like cloud computing and web infrastructure figured out, but also that we’re making progress: Twitter teaches lessons on scaling, Google runs test queries, and IBM Research is tackling cloud privacy.

Yahoo Open-Sources Real-Time MapReduce

Yahoo has open-sourced its S4 project for developing real-time MapReduce applications. As we’ve seen with Google’s new Caffeine infrastructure for its Instant Search features, there is a growing trend of unchaining large-scale data analysis from its batch-processing roots.

Quarterly Wrap-up

In Q3, Big Data Meant Big Dollars

If the third quarter told us anything, it was that IT M&A activity is alive and well, particularly in the big data space. When a company can store or analyze large amounts of data with any degree of innovation, a larger vendor is likely to eye it for an acquisition — even if that means paying a premium. 3PAR, Netezza, Greenplum, Storwize, Ocarina Networks and ParaScale all found new homes during the past few months.

Hey Shareholders, Capex Means Cash in the Cloud!

Om’s post about Google’s spending got me thinking about the hypocrisy in the way we assess web companies’ decisions to splurge on infrastructure. Startups are praised for spending on more infrastructure, while public companies feel the wrath of financial analysts when they do the same.

Needed: Infrastructure to Make the Web Personal

The web is becoming more dynamic, context-aware and personalized by the day, and the amount of information consumed by each person is increasing exponentially. But software infrastructure is not keeping pace. We need to develop data processing architectures that go beyond technologies like memcached, MapReduce and NoSQL.


The New Net-Neutrality Debate: What’s the Best Way to Discriminate?

Supporters and opponents of the Federal Communications Commission’s proposed net neutrality rules achieved a rare moment of agreement Thursday. Speaking at a panel discussion organized by the Washington, D.C.-based think tank Arts + Labs, Public Knowledge Legal Director Harold Feld, a strong proponent of the regulation, acknowledged that complete neutrality toward all bits on a network can never be achieved and should not be policymakers’ goal. Where Feld parted company with opponents of regulation was on the question of how to decide which bits are more equal than others. That question has quickly emerged as the crux of the debate since the FCC made clear that its definition of net neutrality includes allowances for “reasonable network management.”

Gear6’s Web Cache Makes Web Scalability Easier

Gear6 today released Web Cache in an effort to commercialize the Internet’s predominant (de facto, for Linux) distributed caching protocol, memcached. Every…
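The pattern memcached popularized — the cache-aside lookup — is what makes it such a scalability workhorse. Here’s a hedged sketch of that pattern, using a plain dict in place of a real memcached client (real clients such as pymemcache expose a similar get/set interface); `get_user` and `slow_db` are illustrative names, not part of any library:

```python
# A plain dict stands in for the memcached client.
cache = {}

def get_user(user_id, fetch_from_db):
    key = f"user:{user_id}"
    value = cache.get(key)              # 1. try the cache first
    if value is None:
        value = fetch_from_db(user_id)  # 2. on a miss, hit the database
        cache[key] = value              # 3. populate the cache for next time
    return value

# Illustrative "database" that records how often it's queried.
calls = []
def slow_db(uid):
    calls.append(uid)
    return {"id": uid, "name": "demo"}

get_user(42, slow_db)
get_user(42, slow_db)   # second call is served from cache; the DB is hit once
```

In production the same shape applies, with the dict swapped for a pool of memcached servers and keys hashed across them — which is exactly the layer products like Gear6’s Web Cache aim to commercialize.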