
Summary:

In-memory, SQL, NoSQL and graph databases were on display in a feisty discussion about databases that don't involve Hadoop. The distinctions stand out amid growing interest in specialized databases in a big-data age.

Structure Data 2013: Ryan Garrett (MemSQL), Emil Eifrem (Neo Technology/Neo4j), Andrew Cronk (TempoDB), Damian Black (SQLstream)
photo: Albert Chau


FOUR FOR THE FUTURE: UPCOMING DATABASE TECHNOLOGIES THAT ARE NOT HADOOP

Speakers:
Announcer
David Linthicum
Damian Black
Andrew Cronk
Emil Eifrem
Ryan Garrett
Audience Member 1
Audience Member 2

ANNOUNCER 00:02
Thank you Barb. Here we are at the last one. Make it a good one. It's going to be a good one. We've got Four For the Future: Upcoming Database Technologies that are Not Hadoop. It's going to be moderated by David Linthicum, Analyst with GigaOM Research. We've got Damian Black, CEO of SQLstream; Andrew Cronk, CEO TempoDB; Emil Eifrem, Founder and CEO Neo Technology and Ryan Garrett, VP Product MemSQL. Please welcome our last panel of the show [applause].
DAVID LINTHICUM 00:37
I'm David, with the hardcore session of the GigaOM show. Let's talk about databases that are not Hadoop, of which, believe it or not, there are many out there, in many different flavors of technology. That's what we'll consider today. So I'm going to introduce our panel here. First, Damian Black, CEO SQLstream. Tell us a bit about your technology Damian, just to kick us off.
DAMIAN BLACK 01:03
So SQLstream is the opposite of a database. Instead of saving the data and running the queries, we save the queries and run the data through them, through rich time series analytics. So we stream out the real-time results continuously, and they can stream into other queries and so on: massively parallel, continuous visibility into streaming real-time data.
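[Editor's note: a minimal Python sketch of the "save the queries, run the data" inversion Black describes. The class, the sliding window, and the error-burst threshold are illustrative assumptions for this transcript, not SQLstream's actual product or API.]

    from collections import deque

    class ContinuousQuery:
        # A standing query: keeps a sliding window and is evaluated on every record.
        def __init__(self, name, window_size, predicate):
            self.name = name
            self.window = deque(maxlen=window_size)
            self.predicate = predicate

        def on_record(self, record):
            self.window.append(record)
            if self.predicate(self.window):
                return (self.name, record)  # a continuous, real-time result
            return None

    # "Save the queries" once, up front...
    queries = [
        ContinuousQuery("error_burst", 100,
                        lambda w: sum(r["level"] == "ERROR" for r in w) > 10),
    ]

    # ...then "run the data" through them as it streams in. Results can feed
    # other standing queries or downstream systems.
    def run_stream(records):
        for record in records:
            for q in queries:
                hit = q.on_record(record)
                if hit is not None:
                    yield hit

    # A burst of errors in a log stream raises an alert while it is happening.
    print(next(run_stream({"level": "ERROR"} for _ in range(20))))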
DAVID LINTHICUM 01:28
What's the typical use case for that?
DAMIAN BLACK 01:29
Often processing log file data, operational analytics where you want to take action sooner rather than later. If you let the problem go on, it's going to cost you a lot of money: security break-ins, or a customer you want to get hold of right then, to make that promotion while they're still using your service and still active.
DAVID LINTHICUM 01:50
Okay, and Andrew, tell us a bit about your database.
ANDREW CRONK 01:55
So TempoDB is the time series database service. We store, analyze, monitor and eventually predict on time series data generated from sensors, smart meters, servers, and things like that. We're really useful for developers who want to store lots and lots of data, where in the past you might have been able to store only maybe hundreds of millions of rows. What we know about time series is that it's these little timestamp/value pairs, and there are billions and billions of them. We're only generating more with all the sensors we're putting out in the world. So we'll be able to store all that, and we deliver this as a service for our customers.
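[Editor's note: a small Python sketch of the data shape Cronk describes, timestamp/value pairs that pile up fast and usually get consumed as rollups. The smart-meter numbers and function names are invented for illustration, not TempoDB's client API.]

    from datetime import datetime, timedelta

    # A smart meter reporting every five minutes: one day is 288 (timestamp, kWh) pairs.
    start = datetime(2013, 3, 20)
    readings = [(start + timedelta(minutes=5 * i), 0.40 + (i % 12) * 0.01)
                for i in range(288)]

    def hourly_average(series):
        # Roll raw timestamp/value rows up into coarser buckets for analysis.
        buckets = {}
        for ts, value in series:
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets.setdefault(hour, []).append(value)
        return {h: sum(vs) / len(vs) for h, vs in sorted(buckets.items())}

    print(len(readings), "raw rows ->", len(hourly_average(readings)), "hourly rows")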
DAVID LINTHICUM 02:25
Great. What's the use case typically for your technology?
ANDREW CRONK 02:29
So a use case for us is anywhere people are measuring more things than they used to. A great example is the smart grid, right? I used to have 12 measurements on my home per year; now there are multiple meters measuring every five minutes. It's an explosion of data, and those customers are having trouble storing it.
DAVID LINTHICUM 02:42
Awesome, awesome. So Emil Eifrem, Founder and CEO Neo Technology; tell us about your technology?
EMIL EIFREM 02:48
Sure. So I work for Neo Technology, which is the commercial sponsor of Neo4j. Neo4j is the world's leading graph database. A graph database is part of the NoSQL family, if you will, and the real differentiation of a graph database versus other NoSQL databases is that the building blocks it exposes are nodes, typed relationships between nodes, and then key/value properties that you can attach to both the nodes and the relationships. You build up a large graph. It's inspired by how the human brain works, with neurons and synapses, and it's great for storing any kind of connected data. It's an OLTP system, unlike some of the other products here on the panel, which means that it can be used to back your application at runtime. Typical use cases: social is the obvious one. Most of you in here probably think that social graph equals graph database or graph database equals social graph, and that's very true. But there are actually a lot of horizontal use cases: recommendations, fraud detection, network management, master data management, etc.
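[Editor's note: a plain-Python sketch of the three building blocks Eifrem lists: nodes, typed relationships between nodes, and key/value properties attached to both. It illustrates the data model only; it is not Neo4j's API or storage format.]

    nodes = {
        1: {"name": "Alice"},                  # key/value properties on nodes
        2: {"name": "Bob"},
        3: {"name": "Acme Corp"},
    }
    relationships = [
        # (from, type, to, properties): properties live on relationships too
        (1, "KNOWS",     2, {"since": 2010}),
        (1, "WORKS_FOR", 3, {"role": "engineer"}),
        (2, "WORKS_FOR", 3, {"role": "analyst"}),
    ]

    def neighbors(node_id, rel_type):
        # Traversing typed relationships is the core graph-database operation.
        return [nodes[dst] for src, rtype, dst, _props in relationships
                if src == node_id and rtype == rel_type]

    print(neighbors(1, "WORKS_FOR"))  # [{'name': 'Acme Corp'}]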
DAVID LINTHICUM 03:56
Awesome. So last but not least, Ryan Garrett, VP Product MemSQL. Tell us a bit about what you have?
RYAN GARRETT 04:04
Thanks David. MemSQL offers a real-time analytics platform that's built for big data. At the heart of our solution is an in-memory distributed database: standard SQL, scales out on commodity hardware, and some code generation under the covers to speed up your analytics. Typical use cases are a lot of the things that we've been talking about here at the conference, whether that's smart grid monitoring, network security, [inaudible] analytics, a lot of the things that we've been discussing here.
DAVID LINTHICUM 04:38
I've got a question for you guys. How many people in the next couple of years are going to use databases that are not Hadoop related? And ultimately, will those databases be traditional databases, Oracle, IBM, those sorts of things, or will it be the new technologies such as the ones you see up here? Good. Ultimately, this technology has value because requirements differ as we rebuild some of these larger systems; in many instances Hadoop-based technology may be a fit, and in many other instances, as consultants find out there, Hadoop technology ultimately may not be a fit. So let me start with you Ryan. How are you describing non-Hadoop database technology today? Ultimately, what do you consider non-Hadoop database technology?
RYAN GARRETT 05:29
Well, I think it really gets down to the use cases that you're trying to solve for. Hadoop's great for batch processing, batch loading, but there are a lot of, at least in our particular case, real-time use cases where you want to analyze data as it's being generated rather than batch loading it into a system and analyzing it later.
DAVID LINTHICUM 05:49
So what would you say, and this question is for you Damian, are the limitations of Hadoop that people need to consider before they jump feet first into that technology?
DAMIAN BLACK 05:58
I think a lot of people think that Hadoop will sort of magically solve all of your problems, and the reality is that it takes a lot of work to build applications, particularly using MapReduce, and to manage the infrastructure and to debug those applications. I've heard that often people will spend more time and effort building the tooling to debug the app than on the actual app they're building. But I also see now a renewed interest in SQL, as people want to be able to get the value out of the information that's stored in HBase, stored in HDFS. They want to get the results out, and in our case, it's also a matter of re-streaming data out, so you can do scenario analysis over complex time series analytics.
DAVID LINTHICUM 06:45
Good answer. So Emil, we're kind of at the point where we have hundreds and hundreds of databases we're tracking right now. We talked a bit about this in the speaker lounge. What advice would you give to people who are trying to match their requirements up to a database, and what kind of requirements would typically lend themselves well to your technology?
EMIL EIFREM 07:04
That's a super complex question, so thanks for shooting it my way. I appreciate that.
DAVID LINTHICUM 07:08
Okay.
EMIL EIFREM 07:10
I think the first thing you should lead with is: look at the shape of your data. The role of the data architect used to be deciding which of the commercial relational databases to purchase. Should I give my money to Larry, or to IBM, or to Bill? That has basically been the role before. I think the role in the future is going to be: look at my huge data set, because all data sets will be huge, and identify the parts of it that have different shapes. So for example, you're going to have parts of your data set which are relational, which are tabular, which fit really well in a relational database. You're going to have a part that's really like tall, skinny tables, which is very key/value oriented; often you put that in a key/value store. And then you're going to have a part that's very complex and connected and graph oriented, and you should put that in a graph database. So it's a very complex question, and there are so many facets, obviously, but I like to lead off with matching the shape of the data to an appropriate data model, because all of a sudden we now have different data models, whereas we used to have only one data model. I think that's the high order bit.
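[Editor's note: a hedged illustration, in Python literals, of the three "shapes" Eifrem contrasts. The field names are made up; the point is only that facts can lean tabular, key/value, or graph-like, and each shape suggests a different store.]

    # 1. Tabular/relational shape: fixed columns, at home in a relational database.
    order_row = ("order-42", "alice", "2013-03-20", 99.95)

    # 2. Tall, skinny key/value shape: an opaque value behind a key, at home
    #    in a key/value store.
    kv_pair = ("session:alice:last_order", "order-42")

    # 3. Connected/graph shape: entities and relationships, at home in a
    #    graph database, because the connections themselves carry the meaning.
    graph_edges = [
        ("alice",    "PLACED",   "order-42"),
        ("order-42", "CONTAINS", "sku-7731"),
        ("alice",    "FRIEND",   "bob"),
    ]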
DAVID LINTHICUM 08:20
It's a good answer. So Andrew, here's another tough one. How would you differentiate your technology from Hadoop? What are the three or four things you would list?
ANDREW CRONK 08:31
So I sort of view Hadoop as a bigger, better mousetrap, but why are we still trying to catch mice? I think we should give our developers better abstractions, and so we have picked one very specific use case: time series. If you want to measure something, here's the tool you should use, right? If you want to measure something, it's going to be lots of data; sure, you can put it in HDFS, but then what? I think we need to do more for our developers to push them further down the line, like you said, building on top. If you know what the data looks like, you can actually build tools so the developers can do less work and move beyond just plain Hadoop, so you don't get the use case of: okay, I purchased Hadoop, now what? I think we need to do more to build on top of that for our developers.
DAVID LINTHICUM 09:10
So Damian, back to you. In-memory databases seem to be all the rage right now, as the price of memory has fallen greatly. What would you consider the advantage of in-memory databases and how does it relate to your technology? And, to make it more complex, what advice would you give people who are considering in-memory databases?
DAMIAN BLACK 09:30
I think it's absolutely clear that memory is getting very cheap, relative to how it was before. You can now hold very large amounts of data in main memory. Oracle's currently announced strategy, according to Larry, is that it's all going to be held in RAM, everything, not that you're going to bother with flash; it's going to hold everything in RAM in the X series. So, from our perspective, we're holding the data in memory long enough to be able to do the continuous analysis and then feed other systems and other databases. But we complement that with in-memory technology to be able to enhance the data and enrich it very quickly. Again, it's obviously going to be faster if you're pulling it from memory, and very often what you want to do is a real-time analysis where you're continuously looking for something. It might be a pattern of fraud, it might be a good prospective customer or a problem with the service, but you want to complete the analysis with a very high-performance historical query, and that's where you need to pull the history in. You get a potential candidate for fraud, or a candidate customer, a prospective customer, and you want to see what they have bought in the past; in the case of fraud, what damage they have done in the past as well. You bring all those together, and if you want to do that in real time, you need to have everything readily available in main memory.
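[Editor's note: a hypothetical Python sketch of the pattern Black describes, a continuous check on the live stream completed by an in-memory historical lookup before acting. The fraud threshold, field names, and profile store are assumptions made up for the illustration.]

    # Historical profiles held in RAM so enrichment does not touch slow storage.
    history = {
        "card-123": {"past_chargebacks": 4},
        "card-456": {"past_chargebacks": 0},
    }

    def on_transaction(txn):
        # Continuous analysis on the live stream flags a candidate...
        if txn["amount"] > 1000:
            # ...and a high-performance historical query completes the picture.
            profile = history.get(txn["card"], {"past_chargebacks": 0})
            if profile["past_chargebacks"] > 2:
                return ("BLOCK", txn["card"])
        return ("OK", txn["card"])

    print(on_transaction({"card": "card-123", "amount": 2500}))  # ('BLOCK', 'card-123')
    print(on_transaction({"card": "card-456", "amount": 2500}))  # ('OK', 'card-456')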
DAVID LINTHICUM 10:50
Makes sense. So Ryan, same question. I think it's also related to your technology, isn't it?
RYAN GARRETT 10:56
I think, like you said, you're seeing memory get much cheaper. The use cases that we serve the most are people that have real-time data coming in that they need to act on instantly, to what you were speaking to, Damian. If you think about the time value of data, your most valuable data tends to be the data that was most recently created. So when you need to act on something instantly, whether it's the virality of a mobile app or whether it's a network attack that you need to know about immediately, anything that's going to affect your user experience negatively, you want to be able to act on that right away, to sort of prevent any losses that you might incur.
EMIL EIFREM 11:35
Actually, can I jump in there?
DAVID LINTHICUM 11:36
Absolutely.
EMIL EIFREM 11:37
One of the things with being on a panel: I hate panels where everyone agrees. Those are super boring panels. So I'm going to take this opportunity to disagree.
DAMIAN BLACK 11:48
It certainly looks like your stuff is in-memory, or not. I don't know.
EMIL EIFREM 11:51
Well, that's what I'm getting at.
DAMIAN BLACK 11:52
Thank you.
EMIL EIFREM 11:53
So I never understood the whole in-memory thing, right? In Neo4j, we're a database. We store shit on disk, and that's what we do; we're a database, right? But–
DAMIAN BLACK 12:06
Swear jar.
EMIL EIFREM 12:07
What's that?
DAMIAN BLACK 12:08
I said swear jar.
EMIL EIFREM 12:09
Okay. Thank you for keeping me in line. The point is that if we get deployed on a box with one gig of RAM, we're going to use that RAM. If we get deployed on a box with four gigs of RAM, we're going to use that RAM. If we get deployed on a box with 100 gigs of RAM, we're going to use that RAM. We have a customer who has more than half of Facebook's graph in four instances of Neo4j. We also run on Android; same code, same product. So I've never really understood the whole point of in-memory. We're also in-memory if needed. If you have enough memory, we're going to be in memory. If not, we're going to be on disk, because we're a database.
DAVID LINTHICUM 12:50
Isn't it fair that you're going to say something down the line?
DAMIAN BLACK 12:54
Well, there are different designs. I mean, if you know it's going to be in main memory, you can do things differently. You can index the data differently. There are certain architectural advantages; you can take better advantage of the level 1, 2, 3 caching. So, there are differences.
DAVID LINTHICUM 13:09
Are you guys developing your stuff so it's native, down to the metal? So you're coding not to the platform APIs but down to the ultimate native APIs, aware of exactly what kind of hardware systems you're using?
EMIL EIFREM 13:23
Are we?
DAVID LINTHICUM 13:23
I'm again looking at you.
EMIL EIFREM 13:26
So we build on the JVM. So we work across any hardware that works with the JVM. We do some things; for example, if you use Fusion-io, and we like SSDs a lot, then we try to do better with that, but generally speaking, we're going to be as horizontal as possible.
DAVID LINTHICUM 13:44
In doing some of the benchmark testing that I've been involved with over the last year, we find that sequential writes to mechanical drives are not that bad, versus random writes to mechanical drives, which are horrible, versus random writes to memory, which are not that bad. So should the database be designed to figure out how to lay things down in I/O, and if so, how is the decision made by the database providers whether to put data in memory or on a mechanical disk? Should it be their decision at all?
ANDREW CRONK 14:17
So that's actually something that we, as a time series database service, went through last year, that exact routine. Should we go with SSDs, should we go with spinning disks, should our customers care about which it is? The stance we take as a service is no, our customers shouldn't care; that's our problem to figure out. And where we ended up is: spinning disk we find is maybe 10% slower than SSD, but also three times cheaper. So it's the kind of thing that we go through internally, but when we talk about offering abstractions to developers, it's something that we want to abstract away.
DAVID LINTHICUM 14:51
So this is a question, and I'm going to go back to Damian on this one. Your product seems to be middleware related, is that accurate? So we get into integration between various systems, basically putting technology between endpoints, and the ability to become very good at that. And there seems to be an advantage from a performance aspect to making that happen, really a kind of data integration. Give me your perspective in terms of your technology and how you deal with data integration, communicating with other systems, things like that.
DAMIAN BLACK 15:19
Whether we're middleware or database technology depends on your perspective. What we are doing is querying the future continuously versus polling the past, and we need to do both, from the database perspective. From the middleware perspective, what we offer is subscribing to relational views of information, and underneath those relational views there can be a deep pipeline of analytics that is processing all of the information, doing the sifting, the joining, the aggregation. So it's really a great way of solving your real-time, continuous data integration challenges. It's sort of raising the level of middleware to data analytics and data management, so that you can reuse streams of information coming from machine data, log files, sensors, whatever it might be, and assemble just those dynamic streams, those views of information, that each application wants to consume: in the format that they want, in the time that they can process it, in the manner that they can accept it. So for us, it's the kind of holy grail: really trying to, finally, declaratively (which means automatically optimizing) provide very high-level languages to describe the information processing, to be able to solve that problem of dynamic data integration while at the same time solving the problem of real-time analytics.
DAVID LINTHICUM 16:46
So you basically have built a system that approaches the problem and has a systematic way to solve it?
DAMIAN BLACK 16:54
We think it's a fundamentally different way of looking at it. It's incredibly simple when you think about it: save the queries and run the data. When you start thinking that way, it just changes your perspective on how you look at information. The biggest problem that remains out there, period, is real-time data integration. It saved IBM when the mainframe fell off the cliff. It's just been extremely labor intensive; people are prepared to pay literally billions of dollars on projects that may or may not succeed, but there are great software infrastructure solutions that can make this stuff incredibly simple. One example: rather than the sort of two-million-line traffic analytics application that you can get from an Inrix or Google, not to particularly pick on those ones, for us it ended up being 20 pages of high-level SQL queries running continuously, integrating and analyzing data. That kind of thing makes a big difference in terms of the economics and the value of the systems that you deliver.
DAVID LINTHICUM 17:50
It does indeed.
ANDREW CRONK 17:51
I actually had a question about that. So, save the queries, run the data. What if you want to ask a different question later? Would you use something like a permanent store, like TempoDB, to then be your source to run the data again, or how do you think about that?
DAMIAN BLACK 18:03
That's a great question. I think it's really important that you are able to materialize the views that you want. So you're driving your real-time views of integrated information; stream it into an HBase, stream it into TempoDB, so that you can pull the information out to enrich your continuous analysis. At the same time you're using databases, in-memory or otherwise, because you have queries that are running continuously looking for some important pattern, something that's going to really hurt you if you don't act now, but to complete your analysis, you need to bring in the historical information. Historical information processing is absolutely critical. What people are starting to think about now is the continuous processing as well, and up until now, there hasn't been a lot of attention on that. It's been done in programs only.
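[Editor's note: a minimal Python sketch of the replay pattern Cronk asks about and Black describes: materialize the stream into a permanent store as it arrives, then run the stored data back through a question you only think of later. The archive list stands in for a store like HBase or TempoDB; none of the names are a vendor's API.]

    archive = []  # stands in for the permanent store

    def ingest(stream):
        # Materialize records as they arrive, alongside any live analysis.
        for record in stream:
            archive.append(record)

    def replay(new_query):
        # Answer a question that did not exist when the data first streamed in.
        return [r for r in archive if new_query(r)]

    ingest({"host": h, "latency_ms": ms}
           for h, ms in [("web1", 12), ("web2", 480), ("web1", 9)])

    # Months later: which hosts ever exceeded 400 ms?
    print(replay(lambda r: r["latency_ms"] > 400))  # [{'host': 'web2', 'latency_ms': 480}]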
DAVID LINTHICUM 18:48
So, a question for Emil and then the same question for Ryan. How would you solve the data integration problem? Is it built into your core technology, or are you depending on other technologies to solve that problem for you?
EMIL EIFREM 19:01
So we’re completely not solving that problem. WeÕre saying, ÒHey, letÕs try to expose this good APIs as possible and letÕs try to be as good a citizen is possibly in what we consume,Ó but the data integration problem is someone elseÕs problem.
DAVID LINTHICUM 19:17
But Emil, you are populating your graph databases from social networking sources right? So–
EMIL EIFREM 19:22
Amongst others, yes.
DAVID LINTHICUM 19:24
So is that not a data integration solution? Integrating Foursquare data with Twitter data.
EMIL EIFREM 19:33
Well, since the graph model is so flexible, you can certainly take a lot of data from very many disparate sources and put it into a graph. If that is the data integration problem, then we are tackling it. But in terms of data exchange, where I have data over here and I need to communicate it over to this other database over there, and I need to take this old mainframe DB2 installation and integrate it with this stuff over here? We don't really do that.
DAVID LINTHICUM 20:02
Okay, Ryan same question.
RYAN GARRETT 20:04
So what we're focused on is both your real-time data and your recent historical data. That's where we play. We obviously want to be able to work with the tools that people are familiar with, but it's not a heavy data integration focus.
DAVID LINTHICUM 20:21
Makes sense. Okay, I've got a question for all four of you: raise your hands if you think you have the best performing database.
S? 20:28
The bigger, the better.
DAVID LINTHICUM 20:30
Best performing database.
S? 20:30
On what dimension?
S? 20:31
What does that mean?
DAVID LINTHICUM 20:34
Best performing database out of all the databases here. So what about performance?
S? 20:42
I want to disagree, I want to argue, but even I can't raise my hand to that.
DAVID LINTHICUM 20:45
So another instance of the consultant thing: it depends. It depends on the size of the data, and things like that.
ANDREW CRONK 20:49
So I think that's the point. At least, I view the future as a series of purpose-built tools to solve the problems that we are encountering, right? To me, big data is anything that breaks the stuff that came before. Therefore, purpose-built tools have arisen to solve it.
DAVID LINTHICUM 21:01
Right. I will change my question then, and you will go first.
ANDREW CRONK 21:03
Okay.
DAVID LINTHICUM 21:05
What is the scenario where your database is the best performing database?
ANDREW CRONK 21:09
A developer wants to measure something, store it all forever, query it really fast–
DAVID LINTHICUM 21:13
Okay.
ANDREW CRONK 21:13
–over time.
DAVID LINTHICUM 21:14
Damian, same question for you?
DAMIAN BLACK 21:15
It's very simple. It's where you want to process queries, process the data continuously. So you're not polling; you're running the queries continuously. Then, we have the best performing solution in the market.
DAVID LINTHICUM 21:25
Emil?
EMIL EIFREM 21:26
Anytime you're working with data that is connected, data elements that relate to one another. Anytime you would use a foreign key in a relational database or invent some sort of identifier in a document, it's a graph, and we kick ass.
DAVID LINTHICUM 21:40
Ryan?
RYAN GARRETT 21:42
Anytime you want to be able to analyze both your real-time streaming data and data that is recent historical, so real-time minus a day, real-time minus a week, that's where we excel.
DAVID LINTHICUM 21:53
Okay, and if the Oracle guy were here, he would say, "How much money do you have?"
S? 21:56
We can scale that on commodity hardware or something.
DAVID LINTHICUM 22:01
So where do we find the talent to implement this stuff? One of the things: I'm a consultant, and I'm out there working with these larger enterprises, and I'm like, "Well, you need to use this database in this case and this database in this case." We are assembling these very complex solutions, and they're like, "I've got a couple of guys who know the Oracle stuff, a couple of guys who know IBM DB2," and you're naming five new technologies they have to find talent to implement. So what should I advise them in terms of finding talent to leverage your technology? I'm going to start with you Damian. Will that work?
DAMIAN BLACK 22:30
Well, I think it’s something that everyone on this panel will agree with, but weÕre based on SQL standards. We took a lot of effort and time to implement the SQL standard and itÕs been around for years. WeÕre all speaking English on this panel and I know that Emil is Swedish, like a brotherly [inaudible] but we are not speaking Swedish here, weÕre actually speaking English. SQL is the lingua franca for data management. ThatÕs huge and there are literally millions of users that understand SQL at some levels, thereÕs a huge investment. So our message is, donÕt learn something new, stick with what has been proven and understood and then you know that youÕre going to be safe and itÕs going to work. And you know what the clarity of it is high level, itÕs auto-optimizing, itÕs proven.
DAVID LINTHICUM 23:19
Andrew?
ANDREW CRONK 23:20
So everything we do is marketed towards developers. We see the future as developers using services, and that cuts out a lot of the traditional DBA or IT type of roles, but we're offering a specific abstraction. It's very easy to reason about: okay, it's a time series stream; I want to store it all. It's very easy to understand, so a lot of times when people are evaluating us, they say, "Okay, how many distributed systems engineers do I need to put on this?" We say, "Well, do you have a web developer who understands APIs? Great, here we go." So for us, lowering the barrier to actually getting the power out of these big data systems is what we focus on all the time. It's all in the API design.
DAVID LINTHICUM 23:55
Emil?
EMIL EIFREM 23:57
I have to agree with Andrew. I think it's all about providing easy-to-use APIs, and I think we're now entering a phase, where middleware has been for a while, where it's not a crazy thing to have to learn something new. And to Damian's point, I actually think we're now getting to a point where SQL doesn't solve everything. So we have a problem that we can't solve with SQL, the query language, or relational databases, the implementation. So then we build a great new implementation, and then we slap SQL onto it. So we had a problem; now we have two problems.
DAMIAN BLACK 24:33
In some sense, that's right, because there are a bunch of things that you can't express with SQL, or it's just awkward. Obviously, the benefits are big in terms of the existing install base, but I just think that modern developers don't have a problem learning a new query language or learning new APIs. They are forced to do that all over the place in the stack anyway. So that's the approach we take.
DAVID LINTHICUM 24:58
Makes sense, Ryan?
RYAN GARRETT 25:00
I have to second what Damian said. Every enterprise has a team that knows SQL, so we go with SQL. There's something important here, a trend that's maybe not clear to everybody: developers like NoSQL because it has low impedance; it's easy for them to store the information in the way that's most convenient for them. Enterprises like SQL because you want to be able to get the information out. It's all about sharing and reusing the information in ways that were not originally envisaged, and that makes it more difficult for developers. It's true, there's more work on them. So if you have a problem where you want to build an application around its own processed information, then NoSQL is a really good way to go. But if you want information that's going to be reused across applications and across the organization, with standard reporting and report generation and so on, then clearly there are lots of advantages with SQL. That's why I think there are two sets of technologies: it's been the revenge of the developers coming back saying, "Give me control back again," versus the people saying, "Look, this is an important corporate asset; we want to get some kind of competitive advantage, we want to extract value from the data and turn it into useful information." That's really where the two different perspectives come from. But there's also, I think, a tight–
EMIL EIFREM 26:23
Sorry, the graph stuff is different.
S? 26:26
I think it is.
S? 26:26
I agree with that.
S? 26:28
I think it is. I think the difference is one of timeline, right? Early on in a process, when we're building something completely new, a new product category, I think premature standardization truly is the root of all evil. It will kill innovation. If I were to go to all the–
DAVID LINTHICUM 26:44
Should there be a standardized graph query language?
EMIL EIFREM 26:47
Yes. Eventually, but not now. Right now, it would kill everything. Well, not everything. Everything that I care about.
DAVID LINTHICUM 26:52
Thanks, you guys. Anybody have a question in the audience? Please come up to the microphone; we'd love to hear from you. Looks like there's a fight about to happen. Stand in line.
AUDIENCE MEMBER 1 27:04
This is fine, I really appreciate your comments. The history of databases is that there are a few core technologies that soak up 98% of the dollars, and there are lots of specialized engines around the edges that serve really niche problems. Could you guys comment a little on whether you think that sort of very strong monolithic core, one solution taking most of the dollars and the rest being scraps for niche players, is really breaking down, and whether you see much more of a landscape of a diversity of database engines suitable to different types of problems, where there are actually reasonable market sizes of a billion dollars and more supporting a rich ecosystem?
S? 27:55
I'll start briefly. I think you'll see the traditional Oracle-for-everything break down when the business drives it. So the use case for us is insurance, right? They want to measure your car, how fast you're driving, what's going on in your home, all sorts of stuff like that. It's actuaries who have a business need that requires more data, and that breaks the current database. Then a purpose-built solution will come in, and so what we see across many industries is that there are these individual business drivers that are making it happen. I could go through all sorts of examples, but that's just one where we see it changing purchasing behavior.
S? 28:28
I unfortunately agree. The relational database market has grown at a healthy 5% to 12% for the past umpteen years, right? I think it's going to continue doing that for many years going forward, but I think we're going to see triple-digit growth in these new markets. So I think by the end of the decade, there are going to be several substantial companies with alternative data models out there.
S? 28:56
I think we're getting more complex and therefore more distributed, and I guess that's the word: it's going to get more complex before it gets more simple. But the reality is we're going to hit a huge wall with the relational stuff. That's where we're seeing the divergence away from relational databases today, and I think it's going to be a good thing. But it is very difficult for me to explain the market even to somebody who's been in the database market for a long time.
DAVID LINTHICUM 29:18
Next question.
AUDIENCE MEMBER 2 29:22
Can you guys hear me?
DAVID LINTHICUM 29:23
Yeah.
AUDIENCE MEMBER 2 29:24
So my question is directed to all four of you. I come from a mega corporation, and for us, we can do all the prototypes you want. We can download all four of your software products today, but management will never accept it without the seal of approval. So how do you guys propose to deal with that part?
S? 29:46
It's part of the reason why I think it's important to embrace standards where they exist, rather than inventing your own new approach. And I think it's really important, to Emil's point as well: for the graph stuff to take off, you need to have a standard, because who wants to start investing in a proprietary language that may not be there in 12, 18, or 24 months' time? You're investing more than ever now. People's time is so expensive relative to the hardware investments, and even the software infrastructure investments, that you have to protect that kind of investment. That again is why we think SQL is more important than ever, and if you have some new technology that requires something SQL can't handle, then in those situations I think you need to have a standard so that people can feel freer to make those investments.
DAVID LINTHICUM 30:39
Any more questions from the audience? Would you guys want to address that or follow up on that one? Sorry, any other questions from the audience? Okay, going back to the panel: 30 seconds or less, where is this technology going to be in five years? I'm going to start with you Damian, looking right at you now.
DAMIAN BLACK 30:54
Well, I think this has been a very interesting show compared to last year. Everybody now is talking about streaming, where before it was kind of the red-headed stepchild; the redheads here can feel that one. We have been marketing streaming technology for a while now, but it seems this is the year when it's all happening; people are suddenly getting it. All of the Hadoop distributions are talking about streaming technology. Often it's more marketing than actual technology, but it does show that there's a big awareness now of what's happening, and we are seeing that everybody wants to move to real time and get on top of the fire hose of data that's gushing out, and it's only going to grow even faster.
S? 31:43
It kind of makes my data sound gross, but go ahead.
DAMIAN BLACK 31:45
So we think we're seeing a fantastic market, and everybody now seems to be looking at continuous execution of streams, operational analytics. And we're seeing technologies like Splunk that just really don't have any real power in querying, where SQL can provide the rich time series analytics that you really need.
DAVID LINTHICUM 32:09
Andrew?
ANDREW CRONK 32:10
So if you believe GE’s marketing of industrial Internet or CiscoÕs internet of everything, if you believe there would be more connected devices measuring more stuff about ourselves and environment, itÕs going to be explosion of connected devices and time series data. In five years, I think it will all be in TempoDB [laughter].
DAVID LINTHICUM 32:30
Emil?
EMIL EIFREM 32:33
So I first think that we need to look at OLTP separately from OLAP. I think that in both of these we've seen a Cambrian explosion of projects and products, but I do feel like on the OLTP side, the one that's typically called NoSQL, we're starting to see consolidation. There are really only five major players left now in NoSQL: there's Mongo, there's Couch, there's Cassandra, there's Neo4j and there's Basho, whereas there used to be 20 just a year ago. So I think five years from now, it's going to be shaken out a little bit between them. I think the key/value stores and the document databases will merge and compete with one another, so there are going to be maybe two to three left five years from now. On the OLAP side, where Hadoop plays and some of the guys on the panel play, I think we're still earlier; it's still a Cambrian explosion. So there may be more players alive five years from now.
DAVID LINTHICUM 33:30
Okay.
RYAN GARRETT 33:31
I think in our case there's going to be much wider adoption: memory is going to continue to get cheaper, the use cases are going to be more widely recognized, and data volumes are going to continue to grow. So–
DAVID LINTHICUM 33:42
Absolutely. Those are great positions. Great predictions, guys. So I would like to thank our panel: Damian Black, CEO SQLstream; Andrew Cronk, CEO TempoDB; Emil Eifrem, Founder and CEO Neo Technology; and Ryan Garrett, VP Product MemSQL. And also, thank you very much, audience, for coming and listening to our panel today. Take care [applause].
ANNOUNCER 34:14
We're almost done. But before we go, it's been a lot of fun, but let's bring out our conference chair, Derrick Harris, who helped put this all together along with Om Malik, Founder of GigaOM, without whom there would be no GigaOM Structure Data. Let's welcome him out for the closing remarks [applause].
