Today's leading minds talk Data Storage with Host Enrico Signoretti.
Iikka Niinivaara has worked the past eight years with RELEX Solutions in various roles, and is currently Head of the Scale-Out Architecture Group. As a Systems Architect, Iikka uses RELEX’s proprietary in-memory analytics platform to provide scale-out features and modernization of customers’ systems. Iikka has been an avid programmer for almost two decades, handling a variety of languages and projects, and has played a vital role in developing new features in RELEX’s proprietary analytical in-memory database. He currently resides in Frankfurt, Germany.
Enrico Signoretti: Welcome everybody. This is Voices in Data Storage brought to you by GigaOm. I’m your host, Enrico Signoretti, and my guest for this episode is Iikka Niinivaara. Iikka is a distributed systems architect at RELEX Solutions. RELEX is a leader in retail planning solutions with customers all across the globe. In this role Iikka is responsible for the vision and execution of scale-out features in the RELEX proprietary in-memory analytics platform. He identified a path to upgrade the existing product with minimal disruption and successfully helped to implement the company's private cloud platform designed around these new scale-out features.
Now he is overseeing the transition and the finalization of the initial feature set and first production rollout. Today we will talk about RELEX and its two-tier storage system that integrates in-memory and object storage together, quite a radical architecture, I would say, and you probably already know how much I love your storage architecture. Hi Iikka and welcome to the show.
Iikka Niinivaara: Hi Enrico. I'm really happy to be here.
Me too. Thank you very much for taking the time for recording this episode.
I'm quite excited about your work at RELEX, but maybe I missed something in the introduction. So if you can give us a little bit more info about you, your job and RELEX, that would be very helpful for our listeners.
Yeah sure, why not? I've been with RELEX for… well, this summer it's going to be nine years, which is quite a long time in this industry. At RELEX basically what we do is forecasting and replenishment for retailers, which, in a nutshell, means for example making sure that the grocery store chain doesn't run out of milk, but also doesn't need to throw any of it in the trash. I work here at RELEX as a distributed systems architect like you mentioned. Nine years ago I started as a full stack developer and then moved on to working mainly on our in-memory platform and now into this architectural role.
RELEX as you mentioned is in forecasting and planning for retailers, and you have customers all across the globe right?
Correct. We have customers of all sizes. Last year in 2018 we really broke through into Tier 1 largest retailers. We signed with the top five retailers in the world. So that's where the focus is. When you build a company you often have to start small and not go for the big fish right away. We still care for our older, smaller customers. They are really important as references even for these bigger ones.
Oh yes, and indeed, knowing that your infrastructure can support both the smaller and the bigger customers is very, very important, I would say.
So what about your infrastructure? Can you describe how RELEX deploys its solution to customers?
We primarily deploy in what we call our RELEX private cloud. Basically we rent colocation space in two data centers here in Finland and two in the United States, where we deploy on our own hardware and then offer the software as a service to the customers. We are also able to do ‘on prem’ installations for customers that really require it. It's still important, well with big enterprises in certain markets.
So it's mainly a SaaS application, but for a customer that requires it on their premises, you can do that as well?
Okay. And as we mentioned, at the core of your solution there is an in-memory application. Can you go a little bit deeper on its architecture and how it works?
Yeah absolutely. If you're used to web scale kind of things, our architecture is a bit different—atypical if you will. We deploy a single process per customer that is completely in-memory. For our largest customers, the heap sizes for this process range up to 4 terabytes, so in a way it's more like high performance computing than typical SaaS if you will.
Everything for that customer runs in this one process, so all the computation, all the hot data is in memory. And this gives us the ability to have the computation always close to the data, giving us the highest possible performance, avoiding any data transfer or serialization overhead.
As you said you're doing this for improving performance, meaning that there is a lot of computing involved and a lot of back and forth from the CPU to the database to get things done, but don't you need some historical data to make the planning for the future?
Yes absolutely. Forecasting is primarily based on historical data. Our architecture allows us to keep history for many years hot in memory, and still be able to access it with the speed of memory. Put another way, our problem is much more data intensive than, say, CPU intensive. So we need a lot of data accessible fast, but the actual algorithms themselves are fairly simple.
At the end of the day and with the number of customers that you have, probably your infrastructure is pretty large anyway. So even if a single customer allocates a small amount of memory… which is not that small actually, because four terabytes is quite a lot in a single process.
That of course depends a lot on the customer's size how much memory they need. Four terabytes is really for the biggest retailers in the world. I think currently we are running around 15,000 CPU cores and 250 terabytes of RAM across all our data centers.
Okay, and just to understand better: this is the in-memory part, so the real-time analytics that you do on the data and the high performance tier. But then we mentioned that this is a two-tier architecture. So how does it work?
At the core of the in-memory analytical platform we've built here at RELEX is our own in-memory analytics database. And this database works in an ‘append-only’ fashion: whenever there is a new transaction committed into the database, it writes a new immutable file that only contains the changes from that transaction. In that sense it's an ever increasing log of changes that also allows us to travel in time to see if some parameter has changed, while we always keep the latest state of the database hot in memory.
OK. So every time you commit something it ends up in the second tier.
Correct. Yes exactly.
And the second tier is an object…it's not even a file or a block storage. So it's not what we usually think about the second tier after memory?
Yes that's right. Of course historically when we started building this solution, we built it to use local disks on each server. But when it came time to expand to more scale-out features, we realized that object storage fits very well with our on-disk format, where we only write immutable new files and never change existing files.
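[A minimal sketch of the append-only commit log described here, written for illustration and not taken from RELEX's code. The names `AppendOnlyStore`, `InMemoryDatabase`, `commit` and `state_as_of` are hypothetical, the store is a plain dictionary standing in for an object storage system, and commits are assumed to be small JSON documents.]

```python
import json
from typing import Dict, List


class AppendOnlyStore:
    """Hypothetical stand-in for the second tier: objects are written once, never changed."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if key in self._objects:
            raise ValueError(f"object {key!r} already exists and is immutable")
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> List[str]:
        # Zero-padded commit numbers make lexical order equal commit order.
        return sorted(self._objects)


class InMemoryDatabase:
    """Keeps the latest state hot in memory; every commit also becomes one new object."""

    def __init__(self, store: AppendOnlyStore) -> None:
        self.store = store
        self.state: Dict[str, object] = {}
        self.next_commit = 0

    def commit(self, changes: Dict[str, object]) -> int:
        commit_id = self.next_commit
        # Persist only the changes of this transaction, as a new immutable object.
        self.store.put(f"commit-{commit_id:08d}", json.dumps(changes).encode())
        self.state.update(changes)
        self.next_commit += 1
        return commit_id

    def state_as_of(self, commit_id: int) -> Dict[str, object]:
        """Time travel: rebuild the state by replaying commits 0..commit_id."""
        past: Dict[str, object] = {}
        for key in self.store.keys()[: commit_id + 1]:
            past.update(json.loads(self.store.get(key)))
        return past


db = InMemoryDatabase(AppendOnlyStore())
db.commit({"milk.safety_stock": 10})
db.commit({"milk.safety_stock": 12, "bread.safety_stock": 5})
print(db.state_as_of(0))  # state after the first commit only
print(db.state)           # latest state, served from memory
```

[Because existing objects are never modified, the log needs nothing from the storage layer beyond "write a new object" and "read an object", which is the property the answer above points to.]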
At the end of the day, as we mentioned, in-memory on one side and object storage on the other. And I'm very curious about that. It's quite interesting to associate these two types of storage system. Before going on to the object storage side, I'm quite curious to understand the memory part better. So with all these in-memory databases that are available out there, why didn't you buy one of those instead of developing your own solution?
Well we started the development of this in memory solution nine years ago, and at the time the market for in-memory databases was much smaller. And of course RELEX at the time also was much smaller. So Oracle licensing at the time wasn't really feasible for us.
Oh yeah, I can understand that. And do you think that having control of your own application is also a competitive advantage now or it's more about cost savings especially at the beginning?
In the beginning I think it was more about cost savings and also the fact that there weren't very many suitable solutions in the market. That has really changed in the past couple of years, so probably if we started over right now we would build on top of some existing in-memory solution. But as it is now I think that having our own in-memory database solution really gives us a big competitive advantage. It allows us to implement domain specific [solutions], so in our case supply chain specific features, directly into that database, which provides humongous performance advantages.
So back to the object store, is the object store a commercial solution or is it something that you developed on your own?
For the object storage we decided that developing storage solutions is not really in any way our core competence. So we ran a very thorough market research project of all the available object storage solutions. A couple of years ago I think we spoke to about 40 different object storage vendors.
Wow, 40 architectures practically covers most of the market.
I think we found even the weirdest smallest ones that don't exist anymore.
What were you looking for? I [would] think some performance at least and the ability to cope with small objects, because I think that when you commit something to the database this is not that huge.
Well, specifically for the commit sizes, those actually range from very small, like some kilobytes, to very big, like tens of gigabytes per commit. So that was really one of the key features we required: this ability to support a range of different sized objects in the same system. Otherwise we were looking for the most technically flexible solution that in extreme cases would allow things like converging the object storage layer on the same hardware as the compute. Beyond that we focused a lot on the feeling of working with the object storage vendors. We were trying to really find a partner, not just somebody who sells us the system, but somebody with whom we feel we're having very good cooperation, somebody we really enjoy working with, and somebody we feel we can trust, making sure that our customers stay safe.
I know the name of the vendor because I worked for them in the past, and I have to agree with you: technically speaking it's a very good solution, and it's also a good team, both from the developer side and from the human side. Maybe we can also mention that we are talking about OpenIO, which is a French company. I'm curious about the process that brought you to select this object store. So did you test more solutions or did you try this one and it just worked?
We did two small POCs with a couple of other vendors as well. One of them was Formation Data Systems, which promised a unified storage solution that can take care of object, file and block in the same system. But they went bankrupt during our POC and their POC servers got stranded in our offices because we had no place to send them back to anymore, so that didn't go too well. We also did some testing with EMC's object storage system, I think it's ECS. For our hardware we primarily used Dell machines. This was right after the merger between Dell and EMC, so it would have been in that sense a logical choice, but the organization felt too big and slow and the technology wasn't there for supporting on-premises installations.
So yeah, because you mentioned at the beginning that you can do your installation both at your datacenter and on customer premises, so you need something that can scale from a very large installation like yours down to a single customer installation plan.
Yes, that's a very tough thing to solve.
OpenIO is an open source solution. Did it play any role in your decision?
It did play some role. I think we would have chosen it even if it wasn't open source. But from a business continuity or risk management perspective, having it open source was really a big plus. OpenIO frankly isn't a very large or well established company, so there is some risk in that and the open source version helped with that quite a lot.
Yeah. And how do you manage data protection and disaster recovery for your infrastructure? Do you delegate it to the object store or do you have an external mechanism to do that?
We use a multi-tiered solution where we use OpenIO's replication to replicate between data centers for hot recovery. But we also take ‘old school’ backups of our data. We store them somewhere offsite. We don't use tapes or anything like that for the backups. The backups only contain the latest state of the database. So if we have to recover from a cold backup we lose all the historical data. So that's really the last resort recovery mechanism.
So you can recover pretty quickly from the object store and repopulate your memory portion of the database. And then if something really, really bad happens to your data center you have offsite tapes too, as a last resort right?
Correct. For example, here in Finland we have two data centers and we have OpenIO replication between those two sites. So if both of them went down for some reason then we could recover into a third place from that backup. But if either of those sites is loading the data into memory from object storage, it takes about 10 minutes for that instance to become completely functional again.
Which is a good RTO, right?
Yeah it's not perfect. It's good for disaster recovery but it's not good enough for like interactive use, which is why the first scale-out feature we built on top of this storage system is a passive replication mechanism for high availability in our analytical platform.
Okay. So you have replication of data between in-memory instances that gives you the ability to respond almost in real time to any sort of single data center or single failure in the infrastructure. And then if something really bad happens, you can restore data in 10 minutes?
Which at the end is pretty awesome. So you provide a very high availability for the application.
Yeah, that has historically been one of the downsides of the RELEX approach of single big in-memory instances on separate servers; and as we've moved on to bigger and bigger customers, they have higher and higher SLA requirements. So this is something we have worked hard on over the past couple of years.
What's the next step for this architecture in your perspective?
Right now we're working hard on adding more real scale-out features into that solution. Previously we have been mostly memory limited by the nature of our architecture. But with the latest batch of really large customers, we are becoming CPU limited in some cases. So now we are working on allowing computation to happen on multiple servers at the same time in the same database, and all of this replication between the instances on different servers happens through the object storage system.
Okay, so the object storage becomes even more critical for your installation?
Yes. The object storage system is really at the core of our distributed features. All the data between servers in a cluster is transferred through the object storage system. Whenever a new transaction is committed, that new file is uploaded to the object storage system, and from there it's distributed to the other nodes. This gives us persistence for the file at the same time as replication to all of the nodes.
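[A minimal sketch of replication through a shared object store, as described in this answer. It is an illustration under stated assumptions, not RELEX's or OpenIO's implementation: `SharedObjectStore`, `Node`, `commit` and `sync` are hypothetical names, the store is an in-process dictionary, and a single node is assumed to do all the writing.]

```python
import json
from typing import Dict, List


class SharedObjectStore:
    """Hypothetical shared store that every node in the cluster can reach."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if key in self._objects:
            raise ValueError(f"object {key!r} already exists and is immutable")
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_after(self, last_key: str) -> List[str]:
        # Keys are zero-padded, so lexical order equals commit order.
        return sorted(k for k in self._objects if k > last_key)


class Node:
    """One in-memory instance; it learns about other nodes' commits only via the store."""

    def __init__(self, name: str, store: SharedObjectStore) -> None:
        self.name = name
        self.store = store
        self.state: Dict[str, object] = {}
        self.last_applied = ""  # the empty string sorts before every commit key

    def _apply(self, key: str, changes: Dict[str, object]) -> None:
        self.state.update(changes)
        self.last_applied = key

    def sync(self) -> int:
        """Download and apply every commit this node has not seen yet."""
        applied = 0
        for key in self.store.list_after(self.last_applied):
            self._apply(key, json.loads(self.store.get(key)))
            applied += 1
        return applied

    def commit(self, changes: Dict[str, object]) -> str:
        """Upload the commit; that single upload both persists and publishes it."""
        self.sync()  # catch up first so the next sequence number is correct
        seq = int(self.last_applied.split("-")[1]) + 1 if self.last_applied else 0
        key = f"commit-{seq:08d}"
        self.store.put(key, json.dumps(changes).encode())
        self._apply(key, changes)
        return key


store = SharedObjectStore()
primary = Node("primary", store)
replica = Node("replica", store)

primary.commit({"milk.forecast": 120})
primary.commit({"milk.forecast": 130, "bread.forecast": 40})
replica.sync()                    # the passive replica pulls both commits
restarted = Node("restarted", store)
restarted.sync()                  # a fresh instance rebuilds its memory from the store
print(replica.state == primary.state, restarted.state == primary.state)
```

[In this sketch the same `sync` path serves both the passive replica and a restarted instance reloading its memory, which mirrors how the conversation describes one storage system providing persistence and replication together.]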
And do you think that in the future you will need an all-flash object store to do that?
That's a great question. Right now we are fairly positive that the performance of spinning disks in the object storage clusters should be good enough. So far in the old architecture we've been using spinning disks on the local servers as well, and that has been fast enough. OpenIO's architecture is also unique in the way that it really scales when you add more machines. So at least in theory, we should be able to just keep adding spinning disks if it looks like we're running out of bandwidth on the disks, and save the day like that. In other words we currently don't have any plans to add flash into our architecture at all.
Iikka, that was a very, very nice conversation and I really loved how you explained how you do things at RELEX. Just to close this episode, it would be nice to give to our listeners a few links about RELEX, where we can find a website, as well as a few social media links [if listeners/readers wish] to continue this conversation.
Yeah sure, the website is www.relexsolutions.com. There you can find, well, mostly things interesting to people in the retail business, [like] case studies and customer references. But there are also a couple of white papers on our database and analytical architecture for the interested. RELEX is also on LinkedIn, as RELEX Solutions, and on Twitter as @relexsolutions. I myself am only on LinkedIn; you can find me there by searching for my name.
Yes people will find how to spell your name on the ViDS episode web page because it's quite complicated for a non-Finnish guy.
It's a very typical Finnish name with a lot of double letters. So if you're interested, go check out the website. Find me on LinkedIn and do get in touch.
Very good. Iikka thank you again for the time you spent with me today and the information you shared about your architecture and how you do things at RELEX.
Thank you Enrico. It was really fun to chat with you.