Summary:

Financial institutions have a lot of data (multiple petabytes of it), and storing that data for use in new products and for regulatory compliance will increasingly happen in the public cloud.

Pictured: Ron Bodkin, Think Big Analytics, and Ann Neidenbach, The NASDAQ OMX Group (photo: Albert Chau)


Session Name: Analytics At NASDAQ Scale.

Speakers: S1 Announcer; S2 Barb Darrow; S3 Ron Bodkin; S4 Ann Neidenbach; S5 Audience member 1; S6 Audience member 2

ANNOUNCER 00:00

… Think Big Analytics. And Ann Neidenbach, she’s the SVP of Global Technology Products and Services at the NASDAQ OMX Group, and they’re going to be talking about analytics at a NASDAQ scale. Please welcome our next panel.

BARB DARROW 00:21

The diehard crowd. Thanks for sticking around. We’ve got a great panel for you. We’re not going to be talking about frivolous Big Data applications or analytics; we’ll be talking about analytics that affect your dough, so it’s not frivolous at all. I’d like you guys to introduce yourselves, then I’ll kick off with a few questions, and I would love for the audience to ask questions. You just have to make yourselves known; we’ll leave time at the end. I’m Barb Darrow, senior writer for GigaOM.

RON BODKIN 00:48

Hi, I’m Ron Bodkin, founder and CEO of Think Big Analytics. We’re a consultancy working with NASDAQ. We help enterprises get value out of Big Data with engineering and data science services.

ANN NEIDENBACH 00:58

Hi there, I’m Ann Neidenbach – wow, [laughs], I’m speaking loudly – and I’m Senior Vice President of Technology. I run all of the development teams that support not only our internal market systems but also the technology we provide to markets all over the world. You guys are probably familiar with NASDAQ, the stock market in the US, where we have equities and options. A few years ago we acquired OMX, and we actually have 13 European exchanges, mostly in the Nordics and the Baltics. But we also have this incredible technology company in which we provide technology solutions to exchanges, clearing houses and compliance regulators: trading systems, clearing systems and surveillance systems. We have over 84 clients all over the world that we provide this technology for. We have a 20-year heritage of doing that, and that technology business is a very big part of our overall corporate revenues.

BARB DARROW 02:01

So from that introduction, you can get an idea of the scale NASDAQ is dealing with, but maybe you can talk a little bit about the sheer amount of data that you’re handling.

ANN NEIDENBACH 02:08

So the US markets alone – I’m sure you guys are very familiar – as the index keeps going up and down and the volatility – the VIX – moves, there’s a lot of trading that goes on. I mean, we’re talking about billions of rows a day, terabytes of data a day, that we’re collecting, whether it’s market data or whether it’s orders and executions. We, running the business, look at that data and try to figure out what the execution quality and the fill rates are as we’re going out and talking to our customers, trying to attract more and more of that market share to our markets, but also looking at our competition. What is NYSE doing? What is BATS doing? What are the other options exchanges doing? How can we tweak the pricing to get more people to post their liquidity on our markets or to incentivize some of the takers? We’re constantly looking at that data; we have a large amount of analytics in that. That’s on the running-the-business side.

ANN NEIDENBACH 03:05

What we’ve also been doing in the technology space that I talked about is we’ve been acquiring a lot of different companies. We have a company called SMARTS, which provides surveillance and compliance-type tools to, again, regulators and exchanges and brokers around the world. We also have another risk product called BWise, which we sell to corporates for handling their risk management. We have another company called Glide that is doing social network trending. We also sell trading and clearing solutions to 70-plus exchanges around the world. We have a lot of data. What we’re doing now is asking what we can build on top of that, what kind of business intelligence and offerings we can create. What if we take the trading data and we blend in the social media? Right now, we can look and see if we think there’s insider trading activity, or we can look at it from a business perspective – execution quality – but let’s layer in some of that social media and the trending on Twitter, and we’ve got sentiment analysis. So what can we do to start to blend these disparate spaces or data types together? That’s where we really look to Think Big, to help us look at all of our various data products and see how we can start to bring them together.
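
What might that blending look like in practice? A minimal sketch is below, joining a trade feed with a social-sentiment feed by symbol and time bucket; the pandas schema, column names and one-minute windows are illustrative assumptions, not NASDAQ’s actual data model.

```python
# Minimal sketch of blending a trade feed with a social-sentiment feed by
# symbol and one-minute bucket. Column names and schema are assumptions for
# illustration, not NASDAQ's actual data model.
import pandas as pd

def blend_feeds(trades: pd.DataFrame, sentiment: pd.DataFrame) -> pd.DataFrame:
    """trades: columns [ts, symbol, shares]; sentiment: columns [ts, symbol, mentions, score]."""
    # Bucket both feeds into one-minute windows per symbol.
    trades = trades.assign(minute=trades["ts"].dt.floor("1min"))
    sentiment = sentiment.assign(minute=sentiment["ts"].dt.floor("1min"))

    volume = trades.groupby(["symbol", "minute"])["shares"].sum().rename("volume")
    chatter = sentiment.groupby(["symbol", "minute"]).agg(
        mentions=("mentions", "sum"), avg_sentiment=("score", "mean")
    )

    # Outer alignment keeps minutes where only one of the two feeds is active.
    return pd.concat([volume, chatter], axis=1).fillna(0).reset_index()
```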

RON BODKIN 04:30

One of the things we see is this interesting trend where a lot of these technologies first grew up at technology companies, internet companies, web-scale companies, so we’re actually in this interesting position of taking some of those patterns and practices for agile analytics and data science and bringing them into the financial industry, saying, ‘look at how you can take advantage of some of these capabilities to drive value for your customers with your unique data sets.’

ANN NEIDENBACH 04:57

What we have, again, is legacy analytics tools and databases internally. We also – and I think you guys may have heard about this – have a partnership with Amazon, with AWS, and we have FinQloud. Again, this gives the financial community the ability to have compliant data storage, WORM-type storage, in the cloud, with a real big focus on information security and making sure that the storage of that information is secure. We’re going from leveraging Hadoop- and Cassandra-type technologies and public cloud technologies to older, traditional databases. How can we take all of this disparate data and make something of it? That’s really where we were asking these guys to come in and help us with a reference architecture to build on top of that, and business opportunities on top of that.

BARB DARROW 05:51

This lets you preserve your investment, and it’s not a rip-and-replace thing necessarily because–

ANN NEIDENBACH 05:56

No. We didn’t know what to do, to be honest with you. My experience at NASDAQ is you buy companies, you look for the synergies, figure out, okay, which trading system lives, which one goes away. So I did think, when I brought in Ron and Rick from Think Big, okay, which of these little databases is going away, because we’re going to put everything in FinQloud. I have to tell you, honestly, that was my expectation. We will be leveraging FinQloud because the economics are so phenomenal, but we also see that we can leverage and layer on top of some of the core investments that we already have in our existing infrastructure.

RON BODKIN 06:38

This is indeed what we see. People are bringing in Big Data technologies to complement their existing systems. We’re not seeing high-frequency trading and nanosecond-latency applications moving to Big Data architectures, at least not yet. But there’s a whole range of surveillance, risk-monitoring and intraday applications where low-latency big data is valuable, as well as deep analytics: when you’ve got those terabytes of data every day and you’re trying to aggregate them together, looking back over the history to be smarter is super valuable. Stepping back and building that roadmap, we have what we call a ‘Think Big and Imagine’ service where we build roadmaps for customers, and the first step of that is starting with ‘What are the business opportunities?’ NASDAQ OMX, as Ann talked about, has all these different business units that create value in the enterprise, and each of them had different areas where they could get value from data. It might be an internal use case, it might be a supply chain or customer collaboration use case, it might be creating new offerings around analytics in the marketplace. We started by talking to the different units, understanding their needs, and matching that to a roadmap of how to roll out the capabilities in FinQloud and beyond for Big Data.

BARB DARROW 07:47

That’s a huge theme from this show, throughout the two days. There’s been a really interesting meme about how you need to talk to the people before you roll out the solutions; you need to know the problem you’re addressing. We’re beyond the hype stage, where things are getting deployed, so you have to know the problem. It sounds like that’s what you did – came in and interviewed people and–

RON BODKIN 08:11

Absolutely. Our method is we want to help organizations get measurable value from these technologies and have a good sense of what they need to invest and what they’ll get back. That definitely means talking to the business units, understanding business priorities, and coming up with a strategy for how to use them so that you get organizational buy-in, you have access to the data, and you’ve got a plan that can be executed quickly to get results.

ANN NEIDENBACH 08:40

I would say one of the interesting parts about the financial community is – and it depends on the country – in this country you have to save everything for seven years. Could you imagine saving the OPRA feed from all the US exchanges and the options data? Over a five-year period, that’s probably ten petabytes’ worth of data. It’s a ridiculous amount of data. If I look at just our tier-four storage, just what we have at NASDAQ, that’s at least three and a half petabytes’ worth of data, and we’re talking about back-ups. We’re not even into the whole email side and what you have to capture in terms of instant messaging. You start to multiply that across every single bank and financial institution that has SEC oversight, and that’s a tremendous amount of data. That’s why we launched into this whole partnership with AWS. Obviously – there are some storage vendors out here, and they probably don’t like to hear this – but you start to leverage the public cloud and you’re getting a scale where storage is so much cheaper per gigabyte. Now, let’s go into the cloud. I’ve got my emails, I’ve got my IM, I’ve got my trade data, I’ve got my customer account information – how do I make sure that that’s protected? It’s one thing to have Netflix on it. It’s another thing to have customer account information. There’s a new initiative – it was just announced last month and there’s an RFP out on the street – for the consolidated audit trail, in which the SEC has mandated that all the SROs – NASDAQ, NYSE, BATS, Direct Edge and all the options exchanges – provide a data repository for all of the trading data, all the order data – the life of the order, from me buying shares in a stock through execution – as well as my account information. That’s just going to be a massive effort. It’s a journey, not just a project. We’ll be doing that over the next couple of years. It’s just an incredible amount of data. Trading is good, it’s prolific. There are over 50 lit and dark pools here in the US. So the amount of data is incredible.
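
As a rough sanity check on those retention figures, here is the back-of-envelope math; the 8 TB/day input is an assumption chosen only to make the arithmetic concrete, since the panel says just “terabytes of data a day.”

```python
# Back-of-envelope check on the retention figures quoted above. The 8 TB/day
# figure is an assumption to make the math concrete, not a published NASDAQ
# number; the panel only says "terabytes of data a day".
TB_PER_DAY = 8.0
TRADING_DAYS_PER_YEAR = 252   # approximate US trading days per year
YEARS = 5

total_pb = TB_PER_DAY * TRADING_DAYS_PER_YEAR * YEARS / 1000
print(f"~{total_pb:.1f} PB of market and order data over {YEARS} years")  # ~10.1 PB
```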

RON BODKIN 10:52

Having the ability to have that in a public cloud where the different industry participants can have access to that data and that value is a big difference – a big value-add – versus having it locked away somewhere in some private infrastructure where none of the other participants can do anything useful with the data.

ANN NEIDENBACH 11:08

That’s really the tipping point where we are, at least in the financial industry. We have a lot of data; we haven’t quite figured out what to do with it yet. Yes, we have our analytic tools, whether we’re doing pre-trade analytics or post-trade, looking at transaction cost analysis and what-not, but we can do so much more with the data. When I think back to some of the products that we have – the trading systems, the clearing systems – we have risk tools on those, but they’re monitoring real-time risk, if I can say that, pre-trade and post-trade risk. We’re not even taking advantage of that data and looking at trends. With our surveillance tool, we’ve got great graphics, a great front-end, heat maps and all that, but a lot of our customers want to do a lot of what-if analysis, and we haven’t really built the tool ecosystem around that to give them the ability to look at the trends. I was at an event the other night – the opening event – and I met a couple of folks; we were talking about analytics versus visualization and the trending questions, and I think that’s where the financial industry can start to leverage those types of technologies: looking at the data, what kind of questions can be asked?

BARB DARROW 12:22

Can you talk a little bit more about the Amazon partnership? The thread you always hear is that, in financial services, you can’t put anything in a public cloud. Clearly it’s not a hard-and-fast rule. Talk about what goes into this cloud, what doesn’t, and when other things can go.

ANN NEIDENBACH 12:39

When I look back – I’ve been in the financial industry for years; I started when I was 10. No, I’m joking. We had the boxes that went over into the mountain, let’s just say that. You saved every piece of paper and you saved it for seven years. As technology evolved, we went to tape drives and then, to be honest with you, into different tiers of storage, from really cheap storage to expensive. What the cloud has really opened up for us is the opportunity to leverage an infrastructure that is there for other reasons and that we can harness in the financial community. What do we have to store? We have to store the running-the-business stuff, the back-ups; that’s kind of easy. Any kind of syslog information. We have to store emails. Seven years. Emails. I worked at Citi. Could you imagine? 330,000 employees’ worth of emails on a daily basis. That’s crazy. All IMs have to be stored.

ANN NEIDENBACH 13:43

Then we have the trading data. You have to keep not only the trading data – what’s been executed – but all of the market data, and we’re talking about terabytes’ worth of data when you think of all of the options depth feeds as well as the equities feeds. Layer on top of that the account information, which is really the PII type of data, very sensitive data that we have to make sure is protected. With AWS and our partnership there, we, as a financial infrastructure, have looked at it from an InfoSec perspective and vetted it, to where we’re comfortable putting our data in it, and then we’re working with our customers and with our trading partners so that they have a level of comfort putting that data in. One of the requirements from the SEC is that it has to be WORM-compliant. As part of this, we have an R3 offering with which we meet that SEC requirement while putting that data in the cloud.
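
For readers wondering what WORM storage on AWS can look like, here is a minimal sketch using Amazon S3 Object Lock in compliance mode. Object Lock is a real AWS feature, but it is shown purely as an illustration of the write-once requirement; it is not necessarily how the FinQloud R3 offering is implemented, and the bucket and key names are hypothetical.

```python
# Sketch of WORM-style retention using Amazon S3 Object Lock in compliance
# mode. Shown only as an illustration of the SEC write-once requirement; not
# necessarily how FinQloud R3 works. The bucket must have Object Lock enabled
# at creation time, and the names below are hypothetical.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")
retain_until = datetime.now(timezone.utc) + timedelta(days=7 * 365)  # roughly the seven-year rule

with open("orders.parquet", "rb") as body:
    s3.put_object(
        Bucket="example-regulatory-archive",       # hypothetical bucket
        Key="trades/2013-03-22/orders.parquet",    # hypothetical key
        Body=body,
        ObjectLockMode="COMPLIANCE",               # cannot be overwritten or deleted until the date below
        ObjectLockRetainUntilDate=retain_until,
    )
```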

RON BODKIN 14:47

By having that – going back to the earlier question around the new analytics – you’ve got the ability to store that data in a way that meets the requirements around appropriate governance and access to the data, and it can open this world of better analytics where, instead of having to pre-design specific questions, you can allow people to ask questions of the data and spend a lot more time getting insights into problems that weren’t–

BARB DARROW 15:11

Doing what-if analysis.

[cross-talk]

RON BODKIN 15:16

… and ultimately building predictive models.

BARB DARROW 15:18

I just want to say, if anyone has any questions please come to the mics because we’d love to have questions, don’t be shy, but we’ll keep on here until we see someone. Can you talk a little bit about surveillance again, and also whether the Glide acquisition plays into that, the social media information along with your other data and–

ANN NEIDENBACH 15:37

Sure. Surveillance in its traditional form is you take in the market data, whether it’s direct feeds from the exchanges or a market data vendor, and then you also have all of the transactions. We sell SMARTS Broker. We have probably 60 different broker clients all over the world, where the compliance officers are monitoring that trading and looking for insider trading, any kind of misbehavior, wash trades, anything like that. It’s really just the trading data that we’re looking at. Now that we have this product called Glide – our company called Glide – that’s doing a lot of the trending analysis of what’s happening in social media, on Facebook, on Twitter, we can start to put that in. That’s just another data feed for us. Oh, we know that there is something going on at, let’s just say, Apple – was there a lot of Twitter activity, and, oh, by the way, this trader on the desk was having some unusual trading activity before the announcement? Our tools are only as good as the data that we get, and we love data. Whether it’s electronic media, machine-readable news – which is another investment that we have – or sentiment analysis from this social media monitoring, these are all data feeds. We bring that in and it makes our compliance tools smarter.
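
A toy version of that kind of rule, flagging accounts that traded unusually heavily in the hour before an announcement that also saw a social-media spike, might look like the sketch below; the 5x threshold, field names and inputs are illustrative assumptions, not SMARTS logic.

```python
# Toy surveillance rule in the spirit of the discussion above: flag accounts
# whose buying spiked in the hour before an announcement that also saw a
# spike in social-media mentions. Thresholds and fields are assumptions.
from datetime import timedelta

def flag_pre_announcement_trading(trades, announcement_ts, social_spike, hourly_baseline):
    """trades: iterable of dicts with keys 'account', 'ts', 'shares'."""
    if not social_spike:                       # only fire when the chatter was unusual too
        return []
    window_start = announcement_ts - timedelta(hours=1)
    bought = {}
    for t in trades:
        if window_start <= t["ts"] < announcement_ts:
            bought[t["account"]] = bought.get(t["account"], 0) + t["shares"]
    # Flag anyone who traded more than five times their normal hourly volume.
    return [acct for acct, vol in bought.items() if vol > 5 * hourly_baseline.get(acct, 0)]
```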

BARB DARROW 17:07

I think we have a question right here.

AUDIENCE MEMBER 1 17:10

What would you say are the biggest challenges and biggest opportunities for the financial market in this sphere?

ANN NEIDENBACH 17:16

The large data sets are daunting. I was at a bank for a few years, and they were talking about the massive amount of storage that is required to retain trade data that you may never see again unless the regulators come in and do an audit. It’s an expensive proposition. It’s a large cost to a lot of these banks, and it’s a large cost to the SROs as well. What we see as very attractive are these public cloud offerings; we’re able to leverage relatively inexpensive options. What’s scary about that is that they’re public cloud offerings, and this is very confidential data, highly regulated data, data that we need to feel is protected. So I think it’s the balance between trying to find the most cost-effective way to manage all this data while making sure we have a secure solution. That’s the background and the proposition with FinQloud.

RON BODKIN 18:22

These capabilities in the cloud range from something like Amazon Glacier, which is cold, cheap storage, up to HDFS, where we might have a Hadoop system that gives you relatively quick access to data in batch, even up to Cassandra or Storm for real-time access. You’ve got this range of technologies which give you different trade-offs for how to create value, whether you’re in a real-time, instant-response mode or more of an analytic, analysis phase.
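
One concrete way to express that cold-versus-warm trade-off on AWS is an S3 lifecycle rule that ages older objects into Glacier; the bucket name, prefix and 90-day cutoff below are assumptions for illustration.

```python
# Sketch of the cold-versus-warm trade-off: keep recent objects in S3 for
# quick access and let older ones age out to Glacier. The bucket name, prefix
# and 90-day cutoff are assumptions for illustration only.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-market-data-archive",            # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "age-out-market-data",
                "Filter": {"Prefix": "market-data/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"}   # cheap, cold tier
                ],
            }
        ]
    },
)
```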

BARB DARROW 18:54

One more question.

AUDIENCE MEMBER 2 18:57

There’s this idea from Eric Raymond: given enough eyeballs, all bugs are shallow. That’s the Linux principle. With all this data in a public cloud – set aside the stuff that’s confidential – do you think the customers you’re going to have, or the use cases, or – is there a long tail to this data that might be unlocked? I hear you saying that, okay, we’re making this acquisition, we’re taking this initiative, we have an ecosystem, but it sounds like the players are the same players they’ve always been. For example, the different participants in this ecosystem sound like they’re established financial firms. Is there space, once you have this data in a public cloud, for the quote-unquote ‘public’? Is there some opportunity there?

ANN NEIDENBACH 19:51

That’s where I think Ron was going, and I’ll let him jump in. I do think that there’s ample room for innovation in this space, in terms of looking at that data and seeing what the trends are, what the opportunities are, what the data is telling us. We’ve been maybe a little old-fashioned in that we’ve captured the data and we look at it for our own needs, but a lot of that data is public. Once it’s pushed out on a data feed it’s very public and anybody can have at it. How can I use that for backtesting, for my algos? And it’s not just the market data. Blend it in with the social media trends, blend it in with machine-readable news and all the different types of feeds, and I think you’re going to see different trends that people can take advantage of in that data.

RON BODKIN 20:44

It really does democratize access. Having a common infrastructure in a cloud environment, where best-in-class tools are available and different data sets can be accessed in raw form or in processed form, changes the equation, so that smaller players have more access to data. They don’t have to own a supercomputer in order to crunch the data – they have access to one, rented by the hour.

BARB DARROW 21:09

I’m sorry, we’re out of time. Thank you very much. Have a great last session.

  1. Kazuya Mishima, Friday, March 22, 2013

    With many utilities facing the task of storing petabytes of smart meter data for as long as seven years in order to satisfy regulatory requirements, the ability to house and leverage the massive load of data accumulating from the smart grid is a significant IT challenge… http://bit.ly/YHCpQp

