Voices in Data Storage – Episode 30: A Conversation with Andres Rodriguez


Enrico Signoretti and Andres Rodriguez discuss how data storage is evolving in the cloud computing era.


Andres Rodriguez is CTO and cofounder of Nasuni, where he is responsible for refining and communicating Nasuni’s technology strategy.

Andres was previously Founder and CEO at Archivas, creator of the first enterprise-class cloud storage system. Acquired by Hitachi Data Systems, Archivas is now the basis for the Hitachi Content Platform (HCP). After supporting the worldwide rollout of HCP as Hitachi’s CTO of File Services and seeing the Archivas team and technology successfully integrated, Andres turned his attention to his next venture, Nasuni (NAS Unified). Delivering value-added enterprise file services on top of cloud object storage was the natural progression of Andres’ cloud storage vision.
Before founding Archivas, Andres was CTO at the New York Times, where his ideas for digital content storage, protection, and access were formed. He joined The Times through its acquisition of Abuzz, the pioneering social networking company Andres co-founded.

Andres has a Bachelor of Science degree in Engineering and a Master of Physics degree from Boston University. He holds numerous patents and is an avid swimmer.


Enrico Signoretti: Ciao, everybody. Welcome to a new episode of Voices in Data Storage brought to you by GigaOm. I am Enrico Signoretti and today we will talk about how data storage is evolving in the cloud computing era. I mean, there are a lot of things happening, not just in the cloud per se, but with distributed enterprises and with the fact that we don’t have the same kind of organization that we had a few years ago. Branch offices are becoming more like edge locations, and things like that. [Fewer] people [are] working in IT, even though IT serves more purposes than ever. Today, with me, I have Andres Rodriguez, CTO and co-founder of Nasuni. Hi, Andres. How are you?

Andres Rodriguez: Very well, Enrico. Very good to be on your show.

Andres, thank you very much for joining me today. Maybe we can start with a little bit of background about you and your company.

Sure thing. I’m technical; I have a background in distributed systems. I started my career as the CTO of The New York Times. Then I was a CTO at Hitachi Data Systems. I started my own company in object storage, which today we call ‘cloud storage,’ and then decided we needed a file system for object storage. I created Nasuni around the idea of building a global file system that would be native to object storage, which gives you some pretty formidable capabilities, specifically around the things you were talking about: organizations have changed, they are now global, and so they need infrastructure that is also global.

Before the cloud, but even at the beginning of the cloud, we have to remember that the first service launched by Amazon, in 2006 if I remember well, was object storage. Object storage was totally different from any other storage that we had in the past. I mean block and file: block is very good for your databases, and file systems, NAS systems, are good if you access them locally, in a local area network, because the protocols they use are not optimized for long distances.

Object storage, on the other hand, is accessible from everywhere, because the S3 protocol in particular, which is now considered the standard in this industry, is something that you can access over HTTP, so from everywhere. It’s easy. You don’t need specific firewall rules and things like that. Theoretically, it’s the perfect storage for the internet, but, Andres, you know better than me that there are some limitations to object storage.

Yes, absolutely, Enrico. I’ll tell you a good story. I built an object storage company, and Hitachi, first [as an] OEM and then as Hitachi Data Systems, eventually bought the company. When they made me the CTO there, they asked, what do you want to do? I said, “Look, object storage is the future of storage.” There is no question in my mind that, given its scale and its ability to protect data in a distributed way, all enterprise data will eventually end up in object storage, but without file systems, IT can really do very little with it. It’s great if you’re a developer. It’s great if you’re writing a website, or supporting a website like Facebook. All those pictures of cats are stored in object storage, but it’s not good if you need access control, versions, consistency, or performance. NAS solves all of that. When Amazon launched S3, we were all sitting there trying to convince a very advanced engineering, but traditional, storage company that the future was: ‘Let’s build giant object storage data centers, build a file system around that, and deliver the whole thing as a service to our clients.’

Once we saw Amazon launch S3, we pretty much all looked at each other. We read the Dynamo paper at the time and we said, “This is pretty much identical to the stuff we just built, because if you give engineers the same problem constraints, they’ll come up with some pretty similar stuff.” We had a REST API. We had many of the things that S3 broadcast to the world at that time. We said, “More big players are going to follow suit. They’re unlikely to be the traditional storage guys,” and it’s proven to be that way.

If you look at the leaders in this market, they’re really Amazon, Microsoft, and Google. None of them were in the storage or infrastructure market before. We said, “We are going to build the enterprise-class global file system that can be portable across those three systems, so that we can bring the benefits of object storage into the gnarly world of file systems in the enterprise.”

Yeah. There is another interesting fact. You mentioned the major service providers, and what is really interesting to me is that I think we only started seeing file services in the cloud two or three years ago. It was like an afterthought for many of these providers; they thought that object storage was going to rule the world. Yes, they added block storage because it was needed for their virtual machines in the cloud, but something was missing.

It’s true that when enterprises started to adopt the cloud, they started bringing their workloads, and most of their workloads are still based on files. Not only office workloads, but also application workloads. This created a few issues. Everybody started building file systems, but many of them look like something bolted on top of the object store: an object storage interface on the back-end, but with a lot of limitations in scalability, performance, everything.

Absolutely. We actually tried that at the company we ended up selling to Hitachi. You can put NFS or CIFS as a protocol on top of object storage and get by, in the sense that you can put files through those protocols, but the resulting file system is going to be pretty lousy. Like you mentioned, it’s going to be slow. It’s not going to have real versioning. It’s not going to have the atomicity and high performance that you expect from real file systems.

The concept for our design is really to start with the object store and build a file system inside the object store, where the inodes, the metadata structures that hold the file system, are native objects in the object storage. Once you have that image in the object store, synchronizing changes back and forth becomes a much more elegant, much more streamlined process. The other side of that synchronization is a separate edge appliance that does both the protocol conversion and the transformation back and forth whenever a sync happens. At that point, you get to match the performance levels of traditional file systems in data centers, what we call NAS arrays, but you get the benefits of object storage. Those benefits are what you mentioned before.
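
As a rough illustration of what “inodes as native objects” can mean, here is a minimal Python sketch. It is not Nasuni’s implementation: `ObjectStore` is a hypothetical in-memory stand-in for a service such as S3 or Azure Blob Storage, and the JSON inode layout and content-addressed keys are assumptions chosen for readability. File data and every inode, for files and directories alike, are stored as ordinary objects, and a path lookup walks inode objects down from a root key.

```python
import hashlib
import json


class ObjectStore:
    """Hypothetical in-memory stand-in for an object store (not a real S3/Azure client)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Content-addressed key: identical bytes always map to the same key.
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


def put_inode(store: ObjectStore, inode: dict) -> str:
    """Serialize an inode (metadata) and store it as an ordinary object."""
    return store.put(json.dumps(inode, sort_keys=True).encode())


def write_file(store: ObjectStore, contents: bytes) -> str:
    data_key = store.put(contents)
    return put_inode(store, {"type": "file", "size": len(contents), "data": data_key})


def write_dir(store: ObjectStore, entries: dict) -> str:
    # A directory inode maps child names to the keys of their inode objects.
    return put_inode(store, {"type": "dir", "entries": entries})


def read_path(store: ObjectStore, root_key: str, path: str) -> bytes:
    """Walk directory inodes from the root object down to a file's data object."""
    inode = json.loads(store.get(root_key))
    for name in filter(None, path.split("/")):
        inode = json.loads(store.get(inode["entries"][name]))
    return store.get(inode["data"])


store = ObjectStore()
readme = write_file(store, b"hello")
docs = write_dir(store, {"readme.txt": readme})
root = write_dir(store, {"docs": docs})
print(read_path(store, root, "/docs/readme.txt"))  # b'hello'
print(len(store))  # 4 objects: file data, file inode, two directory inodes
```

Because an inode here is just another object, the number of inodes is bounded by the object store’s capacity rather than by a fixed inode table on a single array.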

If you have a file system built into object storage, it will scale forever. You can build protection into that file system as well. In the end, what you’re trying to do is remove the biggest limitation in file systems, which has been a limited pool of inodes. If you look at what happened when we went from monolithic arrays to distributed file systems, it was all really about trying to bring more capacity and more inodes into the file system.

Object storage can give you an infinite pool of objects, so if you map the inodes to the objects, you get infinite inodes. You get every scale constraint removed from the file system. That means you can go from millions to billions of files. You can go from terabytes to petabytes in capacity. Just as importantly, you can have an infinite number of snapshots or versions of the file system. That means the file system can protect itself, which means you can get rid of all that junk backup that really can’t support file systems once they get to a certain size.
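
The idea that the file system can “protect itself” with unlimited snapshots follows from immutability. The Python sketch below assumes, for illustration only, a flat single-directory file system over a hypothetical in-memory object store: objects are never modified, each write creates new objects plus a new root, and a snapshot is simply an old root key that is kept around. This is a generic copy-on-write technique, not a description of Nasuni’s on-disk format.

```python
import hashlib
import json


class ObjectStore:
    """Hypothetical in-memory object store; objects are immutable once written."""

    def __init__(self):
        self.objects = {}

    def put(self, obj: dict) -> str:
        data = json.dumps(obj, sort_keys=True).encode()
        key = hashlib.sha256(data).hexdigest()
        self.objects[key] = data
        return key

    def get(self, key: str) -> dict:
        return json.loads(self.objects[key])


class VersionedFS:
    """Flat, single-directory file system where every write yields a new root object."""

    def __init__(self, store: ObjectStore):
        self.store = store
        self.snapshots = [store.put({"entries": {}})]  # snapshot 0: empty root

    def write(self, name: str, text: str) -> int:
        # Copy-on-write: build a new root from the latest one; old objects are untouched.
        latest = self.store.get(self.snapshots[-1])
        entries = dict(latest["entries"])
        entries[name] = self.store.put({"text": text})
        self.snapshots.append(self.store.put({"entries": entries}))
        return len(self.snapshots) - 1  # id of the new snapshot

    def read(self, name: str, snapshot: int = -1) -> str:
        root = self.store.get(self.snapshots[snapshot])
        return self.store.get(root["entries"][name])["text"]


fs = VersionedFS(ObjectStore())
v1 = fs.write("plan.txt", "draft")
v2 = fs.write("plan.txt", "final")
print(fs.read("plan.txt"))      # final
print(fs.read("plan.txt", v1))  # draft
```

Restoring a file from an earlier version is just a read against an older root, and keeping a snapshot costs only the objects that changed after it, which is why this style of design can take over much of what traditional backup does.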

From my point of view, though, it’s not only scalability. We have changed the way we create, consume, and distribute data. Teams are now distributed across huge distances. Sometimes you have teams working on the same project on different continents. Even if you have an object storage back-end that is theoretically accessible from everywhere, without a file system that is also accessible from everywhere, you miss part of the story.

Very good, very good. That’s actually the third property, which is brand new. That is the difference between a distributed file system, which is just a very scalable file system, something like Isilon, and a global file system. A global file system is not only scalable, but you can distribute it geographically.

That changes the whole equation, from DR and business continuity, because all of a sudden you can fail over from any data center or branch office to any other geographic location and have the same synchronized file system, to enabling collaboration with end users by offering global shares: CIFS shares that are just like ordinary CIFS shares, except they behave globally. They exist everywhere. The use cases range from everyday shares on edge appliances all the way to very heavy workloads like media or game development, things that require hundreds of terabytes or petabytes of data to be synchronized around the world through this global file system.

That’s what’s exciting: not only have we resolved some of the issues that plague IT around file system management, backup, and all that stuff, but with global file systems we’re enabling a whole new array of capabilities that are important to the line of business.

If you can collaborate geographically with heavy file payloads, it means you can move videos around. It means you can move very large data sets around so that multiple groups can work on it. Now you have infrastructure that’s global which is what the companies are. I can’t tell you the number of very large global companies that come to us with their heads of infrastructure, the CIOs essentially saying, “Look, we’ve become global. We’re very successful, but our infrastructure still feels provincial. It works really well in one or two places around the world, but we have dozens of locations around the world where we have important projects going on and they just get the fumes of the infrastructure.” It’s slow infrastructure. It’s hard to get resources. It’s hard to plan, hard to scale. The idea of the cloud is that you can make every location around the world equally resourced without a huge amount of effort and cost.

Also, as you mentioned, with this kind of file system, having everything synced in the back-end, in the object store, means that the front-end is no longer critical. I mean, you can lose it. It becomes easier to back up the system or plan disaster recovery strategies, these kinds of things. Especially now that we have these very distributed teams. It’s not only that two teams work on the same project at a distance from each other; sometimes they are in very small offices in the middle of nowhere, and you don’t have IT people managing their infrastructure.

That’s exactly right. That is the general benefit of SaaS. It’s one of the reasons why companies love to deploy SaaS across their vast organizations: it’s the same level of service for everyone. In the past, SaaS has been limited to software applications that don’t require a whole lot of interaction with the end users. The cloud is changing all of that.

It’s not just Workday and Salesforce that are now SaaS offerings. Infrastructure is now a SaaS offering, and applications, full application stacks, can be SaaS offerings as well. That gives you a lot more flexibility. There are a couple of important trends that typically go along with a cloud architecture when organizations begin to change.

Over the last ten years, we’ve seen a real evolution: from ‘Cloud, what? We just want to be educated. We don’t really want to do much with it, maybe put some backups in it,’ to ‘Give me a cloud option for everything I want to do,’ which makes the cloud the first option we consider before buying more hardware and is where a lot of organizations are today, to ‘cloud only.’ In other words, we want to take ourselves out of the data center business altogether.

Yeah. I totally agree with you. In this whole dream of the cloud and having a file system distributed all across the world, there is a small issue. For example, I live in Europe. We have a lot of regulation on data privacy, GDPR for example, but not only that: in some countries you can’t move the data outside the country, so there is data sovereignty and these kinds of things. Everything sounds great, but if you don’t have the tools to manage all of this, with policies and the necessary rules to manage your data, then it becomes a nightmare.

You’re absolutely right that you need tools to be able to manage compliance. There are two misunderstandings about cloud that are important to clarify when you’re setting up your cloud strategy, because I think a lot of people understand the benefits: scale, cost, global reach. What I think is misunderstood is that the cloud is not some ethereal, magical place that exists in actual clouds. The cloud is in data centers all around the world, physically in countries and locations, that are just being run as services by these giant service providers.

Let me give you an example when it comes to compliance. We have many clients that have a private object store where they deploy the large majority of their data. They’ll have petabytes in this object store. They’ll be running all their file systems happily on private object stores that happen to be in North America.

Then they go to Europe, and they want to provision storage and be compliant in Europe where, say in Germany, you can’t move the data out of Germany. Rather than set up a complete object storage stack in Germany themselves, they’ll go to Azure or to Amazon AWS, use a locally resident data center from those providers in those countries, and meet the country requirements with public cloud.

It’s very important to understand that public cloud just gives you a very large menu to choose from in terms of how you actually localize the data. Like you say, Enrico, it’s very important to have the tools that let you manage that, but the physical plant is already there. As an organization, as a customer, you don’t have to build it.
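
The kind of residency tooling discussed here can be pictured as a policy check that runs before a file system is placed on a storage back-end. In the Python sketch below, the file system names, back-end names, and policy table are all hypothetical; only the idea of choosing a locally resident data center per country comes from the conversation. `aws-eu-central-1` alludes to AWS’s Frankfurt region and is used purely as an example of an in-country public cloud option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str     # hypothetical back-end label
    country: str  # ISO 3166-1 alpha-2 code for where the data physically lives


# Hypothetical residency policy: countries where each file system's data may live.
RESIDENCY_POLICY = {
    "engineering-na": {"US", "CA"},
    "hr-germany": {"DE"},
    "legal-france": {"FR"},
}

# Candidate back-ends in order of preference (private object store first).
BACKENDS = [
    Backend("private-object-store-us", "US"),
    Backend("aws-eu-central-1", "DE"),
    Backend("azure-uk-south", "GB"),
]


def place(file_system: str) -> Backend:
    """Return the first back-end whose location satisfies the file system's policy."""
    allowed = RESIDENCY_POLICY[file_system]
    for backend in BACKENDS:
        if backend.country in allowed:
            return backend
    raise ValueError(f"no compliant back-end for {file_system!r}")


print(place("engineering-na").name)  # private-object-store-us
print(place("hr-germany").name)      # aws-eu-central-1
```

Failing loudly when no compliant back-end exists (as with `legal-france` here) is a deliberate choice: silently falling back to another region is exactly what residency rules forbid.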

The other misunderstanding about cloud is that you need incredible network infrastructure to get to the cloud. The opposite is true. You typically need a lot of network infrastructure when you’re doing, say, traditional backups. That’s why you see a lot of customers saturate their MPLS networks with backup streams. When you go to the cloud, the cloud systems are designed to work in streaming modes, which is far more efficient than the batch-type transfers that typical backup or typical replication attempts to do. WAN acceleration, say: those are horrible. CIFS or NFS over the WAN is just a nightmare for end users. By adopting a cloud architecture, you can actually go much farther, to places that have much smaller pipes and only public pipes, because the cloud is designed to be secure from the edge to the cloud without needing all these additional pieces of infrastructure to provide security.

We see a lot of clients going from MPLS to SD-WAN and benefiting from it. They want it because they want to get to the cloud faster from any location they’re in around the world, and there’s a lot more availability of straight internet pipes than of complicated, expensive, hard-to-manage MPLS networks.

We have clients that are mining for aluminum, and they will be in the Amazon jungle with one- or two-megabit pipes. As you can imagine, with that kind of infrastructure, the file server backups were crushing them and were impossible to actually complete. With a cloud infrastructure, you’re just synchronizing all day long. Even though it’s infrastructure, it feels a lot more like the way Dropbox feels on your laptop. As soon as it gets a little bit of air, it reaches out and synchronizes, and everything is always up-to-date at your edge location.
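
A quick back-of-the-envelope calculation shows why batch transfers over such links fail while continuous synchronization can keep up. The Python snippet below assumes an ideal 2 Mbit/s link with no protocol overhead, a hypothetical 5 TB file server, and a hypothetical 2 GB of changed data per day; only the pipe size comes from the conversation.

```python
def transfer_hours(num_bytes: float, link_mbps: float) -> float:
    """Hours to move num_bytes over a link of link_mbps megabits per second (ideal, no overhead)."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return num_bytes / bytes_per_second / 3600


LINK_MBPS = 2        # the "one- or two-megabit" pipe from the example
DATASET = 5e12       # hypothetical 5 TB file server
DAILY_CHANGE = 2e9   # hypothetical 2 GB of changed data per day

full_copy_days = transfer_hours(DATASET, LINK_MBPS) / 24
daily_sync_hours = transfer_hours(DAILY_CHANGE, LINK_MBPS)
print(f"full copy: {full_copy_days:.0f} days")         # full copy: 231 days
print(f"daily changes: {daily_sync_hours:.1f} hours")  # daily changes: 2.2 hours
```

Moving the whole data set in one batch would take months, whereas a day’s changes fit in a couple of hours, so a system that streams changes whenever the link is free can stay current. The initial copy of the data still has to cross the link once in either design.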

Those are the two concepts that I think are very important. The cloud is everywhere, physically almost anywhere in the world you need it. And the cloud can run with whatever degree of network infrastructure can reach it.

Let me play devil’s advocate here for a moment. We’re talking about file systems, and we know that you have a global file system, but why not use sync and share instead? Some of the features look similar. At the end of the day, you have your users accessing files remotely, and they share the same volume somehow.

Right. There are two reasons. One is that there are tremendous architectural advantages to having a local cache distributed out to the sites where a lot of users are accessing the same files. By localizing the storage closer to where the users have all their interactions, you gain a tremendous performance advantage. That’s the benefit of bringing the cloud into the on-premises sites where the actual end users are sitting and doing their work.
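
The local-cache argument can be made concrete with a small least-recently-used (LRU) cache in front of a remote store. The Python sketch below is an assumption for illustration: the conversation does not say which eviction policy Nasuni uses, and `fetch_from_cloud` is a hypothetical callable standing in for a read from the object store. Repeated reads of the same files by users at one site are served locally; only misses cross the WAN.

```python
from collections import OrderedDict


class EdgeCache:
    """LRU cache of files at an edge site, backed by a remote object store (sketch)."""

    def __init__(self, fetch_from_cloud, capacity: int):
        self.fetch_from_cloud = fetch_from_cloud  # hypothetical callable: key -> bytes
        self.capacity = capacity                  # max number of cached items
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self.items:
            self.hits += 1
            self.items.move_to_end(key)   # mark as most recently used
            return self.items[key]
        self.misses += 1
        data = self.fetch_from_cloud(key)  # slow path over the WAN
        self.items[key] = data
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
        return data


cloud = {f"file{i}": b"x" * 10 for i in range(100)}
cache = EdgeCache(cloud.__getitem__, capacity=2)
for name in ["file1", "file1", "file2", "file1", "file3", "file2"]:
    cache.read(name)
print(cache.hits, cache.misses)  # 2 4
```

In a real deployment the hit rate depends on how much of the working set fits at the edge, which is why the appliance only needs to hold the active data rather than the whole file system.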

The other reason is one that organizations come to realize: they’ll move their home directories to OneDrive or Box or something like that, and everyone is super happy. Then they’ll try to do what they do with shares with those SaaS applications, and everyone immediately gets very cranky. What happens is, there is a ton of glue, a ton of infrastructure in terms of how applications talk to each other: links in Excel documents, for example, that are all predicated on paths through a shared file system that all the users see as the same file system. All that breaks.

There is also the need in the enterprise to scale well beyond the capacity of what you can put on any one user’s workstation or laptop. The moment you get into hundreds of terabytes, you’re not going to have that accessible on any end user’s device, and you can’t do that at scale. What happens is, you really benefit from having file servers, where all of that power, all of that scale, is aggregated so that the users can consume it in a shared mode.

It’s very important not to confuse a SaaS product, which is essentially what Dropbox, Box, and OneDrive are, with infrastructure. For everything we’ve come to hate about infrastructure, we depend on it. Why do people hate infrastructure? Because it’s hard to plan for and complex to run. There are many things that are difficult about running it, whether it’s NAS or SAN storage or anything else at the infrastructure level. However, we depend on it.

There is no way that you can run an organization without an organizational file system, or NAS, any more than you can run virtual machines without a SAN infrastructure of some sort, a block infrastructure of some sort. Infrastructure is necessary and is not to be confused with SaaS.

Let’s take a few moments to talk a little bit more about Nasuni, then. So far, we’ve been talking about a global file system with an object storage back-end, so we already have an idea of what you do. Maybe you can go into a little more detail and explain how it actually works.

Sure thing. Let’s start at the edge. At the edge, you deploy these Nasuni edge appliances, and they are, for all intents and purposes, very similar to any enterprise-class NAS. It has CIFS. It has NFS. It plugs into AD. The goal of those edge appliances, on the front-end, is to not change anything about the way IT delivers file services to the organization. Again, that’s because of the need to keep the links, the need to keep the infrastructure the same, so that everything that plugs into it today can continue to plug into it.

There [are] two massive differences with these appliances, though. First of all, they are compact. Even if you are handling a file system that’s in the hundreds of terabytes or even petabytes, the appliance itself can be just a handful of terabytes. Now, it needs to be high-performance terabytes, because that’s how you deliver the high level of IOPS that your end users have come to expect. These IOPS can be delivered from a virtual machine, because people know how to run high-performance storage from VMs and deliver it out to VMs. That means that everything you have come to depend on for virtualization, you can leverage with this model at the NAS layer.

The second piece is that the appliances themselves are all integrated into a common control plane that gives you central monitoring and central configuration for all of them. As I mentioned, they can be deployed anywhere you have a hypervisor. You can deploy them at all your on-premises locations. We have many clients that deploy on UCS around the world, but they can also be deployed in the cloud, which means you can have, say, a disaster recovery site somewhere far away from your headquarters.

Say you want a recovery site for Bangladesh. You can have a local recovery site provided by AWS or Azure because, like I said before, the cloud is physically in many, many places around the world, which makes it very convenient for you to deploy access to your file systems there. The result of all this is that, as the appliances are all working away, the file system is being formed in the object storage layer. If it’s Amazon, it’s in S3, as we mentioned before. If it’s Microsoft, it’s going to be in Azure Blob Storage. You’re going to get access to those same files, that same file system, from all of the locations.

Just like today, you’re not going to have one file system; you’re going to have many file systems, whether to cater to compliance needs or just for management reasons. You’ll want to partition it, but unlike what you have to do today, you won’t have to back it up. You’re never going to run out of space, and you’re never going to have to deploy another file server because you’re out of inodes or out of resources for what one file server can take, because you’re consuming the unlimited resources of the cloud, of the Amazon, Microsoft, or Google object store core.

In the market, we call that an edge-core architecture, because you’ve basically moved all of the problems with management, scale, and availability into the core, and you’re leaving the edge to just deliver high performance and edge availability, nothing else.

In this architecture, I mean, you remove all the complexities at the edge, but actually these devices become expendable in the sense that if you…

Exactly. Yes. We like to think of it almost like a smartphone. You lose your iPhone, and yes, you’re sorry you lost it. In terms of the data…

The iPhone's cost today, it’s a B-plus.

That’s an interesting analogy. Think about how we used to think about phones, especially when phones started having data in [them]. Every time you got a new phone, it was a hassle because you had to get the data somehow from the old phone to the new phone. That’s the situation most organizations find themselves in when they’re trying to migrate from the NetApp array that was sold to them three or five years ago to this year’s version. You have to do this bulk migration, professional services, and all this nonsense.

In the cloud model, the moment you want to replace that appliance at the edge, or you want to spin up a new appliance, you just resynchronize to the core services, like you do with your phone today. Everything is handled behind the scenes. There is no ‘state’ in the edge appliance. There is nothing to go get from the edge appliance. It’s actually the same thing that allows the global file system to exist, because the state of the global file system is in the cloud core object store. The appliances are just constantly synchronizing against that shared common image of the file system.
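
The “no state at the edge” property can be sketched in a few lines of Python. `Core` and `EdgeAppliance` below are hypothetical classes, not Nasuni APIs: the core holds the authoritative file system image, and an appliance holds only a synchronized view plus writes it has not yet pushed. Replacing an appliance is then just creating a new one and syncing it, much like restoring a phone from a cloud account.

```python
class Core:
    """Authoritative file system image in the cloud (hypothetical, in-memory)."""

    def __init__(self):
        self.files = {}
        self.version = 0

    def commit(self, changes: dict) -> int:
        self.files.update(changes)
        self.version += 1
        return self.version


class EdgeAppliance:
    """Keeps only a synced view of the core plus local writes not yet pushed."""

    def __init__(self, core: Core):
        self.core = core
        self.view = {}
        self.pending = {}
        self.synced_version = 0

    def write(self, path: str, data: str) -> None:
        self.pending[path] = data
        self.view[path] = data  # visible locally right away

    def read(self, path: str) -> str:
        return self.view[path]

    def sync(self) -> None:
        # Push local changes first, then pull the latest shared image.
        if self.pending:
            self.core.commit(self.pending)
            self.pending = {}
        self.view = dict(self.core.files)
        self.synced_version = self.core.version


core = Core()
site_a = EdgeAppliance(core)
site_a.write("/projects/plan.docx", "v1")
site_a.sync()

# The site A appliance fails; a replacement starts empty and simply syncs.
replacement = EdgeAppliance(core)
replacement.sync()
print(replacement.read("/projects/plan.docx"))  # v1
```

In this sketch, anything still in `pending` when an appliance is lost would never reach the core, which is why a design like this depends on synchronizing continuously rather than in nightly batches.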

Another advantage here is that it’s much easier to migrate data, or enable data mobility, between on-prem and the cloud, because you just deploy one of these appliances in the cloud and you can migrate data there to do whatever you need.

Yes. I’ll tell you one of the very cool things that we’re doing. Remember, the file system ends up in the cloud. That happens because that’s a place where you can scale, protect the data, and distribute the data geographically. But once the file system is in the cloud, it’s logically centralized in the cloud. That is, you have a logical handle on your corporate file systems in the cloud.

You mentioned GDPR. We have a new series of services coming out from Nasuni that basically allow you to plug in, for instance, GDPR engines that look at the file system in the cloud with your own encryption keys and your own access control, and plug directly into, say, the AWS GDPR compliance engine. What that does is basically scan: it could be scanning billions of files, hundreds of terabytes or petabytes of data, in the cloud, outside your infrastructure, at a speed that is unimaginable with traditional array infrastructure. That’s because you’re looking directly at a cloud file system.

This is one of our goals as a company. For years, we’ve sold this basic infrastructure, which companies have used for backup, moving their files around, DR, and business continuity. Our customers are now asking for better insights into their data. Instead of trying to give them some kind of analytics tool within the Nasuni infrastructure (we’re plumbers; we’re a file system company), we are giving them connectors so that they can bring their data to the ‘best in class’ analytics tools, essentially transforming what has been traditional NAS and file systems in the enterprise into big data that you can access with cloud analytics tools.

Yeah. That’s very clever. The last question I have is around licensing, because we’re talking about cloud, about everything as a service, as a subscription. How does it work for Nasuni? You have two components: the software part and the hardware appliance.

More and more, we are all in on the software appliance business, but our clients have always wanted to have an option that… by the way, we support every hypervisor in the market. And the other trend, alongside the one I talked about before, the other change that companies are typically undergoing when they bring in Nasuni, is hyper-convergence.

If you’re thinking about how to deploy a simple stack across all your locations where you just want to run VMs, the last thing you want is a very large VM full of files. So they’ll deploy a compact Nasuni virtual machine on top of their Nutanix or UCS and run their NAS, their file services, that way. However, in some situations, you don’t have IT staff. You don’t have any way to support a hypervisor that’s far away. We have a special OEM program that gives our clients access to an appliance with no hypervisor; it’s just bare-metal Nasuni code, and it runs that way. The virtual machine, though, gives you a lot of advantages, a lot of things for free, including dynamic resizing and high availability.

All of our advanced features are really meant to run as software only. We’ve hardly touched on this, Enrico, but I think one of the major trends happening, and the cloud is a big part of it, is that ‘software is eating the data center.’ It’s eating the world, and it’s eating the data center. Every single part of the infrastructure stack is being transformed into just software with no hardware dependencies. The cloud is the ultimate ocean of resources for deploying software tools, because you basically have no limitation on the hardware; that’s all being managed behind the scenes. Our clients all want to do software. They all want to do orchestration. They want to automate. They want to access everything through APIs.

The last thing they want is to run into any physical limitation or dependency when it comes to their infrastructure. What you see happening is that the more conservative side of the house is doing private hyper-converged infrastructure deployments, very large ones. Then the guys who are thinking five years into the future are already going to the cloud and deploying cloud only, putting their entire data center, virtualized, in the cloud, which is very aggressive by today’s standards, but it’s absolutely the way things are going to go. You want to pick tools that make software infrastructure possible. We are a software-defined NAS, and as such, we give you a strategy that allows you to go all the way from that virtual machine on-premises today to that virtual machine in a pool of virtual machines in the cloud tomorrow.

This conversation was very useful, I think, and thank you again for the time you spent with me. The last thing I want to ask you, to wrap up the episode, is where we can find more about Nasuni, find some documentation, and maybe follow up on Twitter or other social media to continue this conversation.

Absolutely. Nasuni.com has everything you want to learn about us, and it’s wonderful. By the way, the name comes from NAS, N-A-S, and UNI, which means unify: bringing all those headaches and all the potential of object storage into one integrated system. That’s where the name comes from. Yes, Nasuni.com. You can find everything there, technical papers and our blog, because I blog there. There’s lots of very informative content.

Okay. Thank you very much, Andres. Bye-bye.
