Forget in-memory — SiSense raises $10M for in-chip analytics


While the rest of the world is agog about big data and in-memory analytics, SiSense is taking a different tack. It’s rethinking business intelligence with higher-speed analysis on smaller (relatively speaking) data sets by taking advantage of multicore, 64-bit processors. The approach has been paying off with some impressive customer uptake, and on Wednesday SiSense announced a $10 million series B funding round from Battery Ventures along with Opus Capital and Genesis Partners.

Technologically, SiSense is trying to split the difference between just about everyone else doing analytics: expensive full-stack business intelligence vendors such as Oracle, Microsoft, IBM and SAP; big data and data warehouse vendors pushing massive-scale databases; and next-generation, visualization-centric vendors such as Tableau and QlikView. It’s fast, it has its own columnar database and HTML5 visualization technologies, it can scale comfortably up to about 100 terabytes, and it is designed for business users rather than advanced data analysts.

SiSense’s secret sauce is a processing architecture built for speed even on small machines such as laptops. According to CEO Amit Bendov, the company’s product, called Prism, can handle a terabyte of data on a machine with 8GB of RAM because it relies primarily on disk for storage. Data is only moved to RAM as necessary, and then Prism uses vectorization and optimized instructions (that do one thing only, but do it across all the data that fit the query) to handle as much work as possible in parallel on the processor.
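
The chunk-at-a-time pattern described above can be pictured with a short sketch. This is not Prism's code: the function name `sum_where_greater`, the `CHUNK` size and the sample column are all assumptions made for illustration, and plain Python will not issue SIMD instructions. The sketch only shows the shape of the loop, in which one simple operation is applied to a whole block of a column before moving to the next block.

```python
from array import array

# Hypothetical chunk size: how many values are handled per step. A real engine
# would pick this so a block fits in the CPU cache; 4096 is only an assumption.
CHUNK = 4096


def sum_where_greater(column, threshold, chunk=CHUNK):
    """Sum the values above `threshold`, one fixed-size block at a time."""
    total = 0
    for start in range(0, len(column), chunk):
        block = column[start:start + chunk]  # one contiguous slice of the column
        # A single simple operation applied across every value in the block.
        total += sum(v for v in block if v > threshold)
    return total


# A column stored as contiguous 8-byte integers, as a columnar store would keep it.
sales = array("q", range(10_000))
print(sum_where_greater(sales, 9_997))  # 9998 + 9999 = 19997
```

In a compiled engine the inner step would be a tight loop over the block that the compiler or hand-written intrinsics can turn into vector instructions; here it is an ordinary Python generator.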

Source: SiSense

“We say in-memory is not the future, it’s the past,” said Bendov. “We’re already two steps ahead.” Using Hadoop or Teradata for a handful of terabytes, he added, is overkill, “like driving a Humvee to the grocery store.”

Eldad Farkash, SiSense’s co-founder and CTO, uses a different analogy — that of buying beer — to explain the technology’s underlying rationale. Latency to the CPU from the processor’s L1 cache is like grabbing a beer from the refrigerator, whereas using the L2 or L3 cache is like riding a bicycle to the corner store. RAM is the equivalent of driving a car to the grocery store, and accessing data from disk is like going to the brewery itself. Prism knows it will have to go to the grocery store, but it gets as much beer as possible from the fridge and corner store first.

Once users are actually in the product and analyzing data, it’s a drag-and-drop experience to connect various data sources and points (although custom SQL is allowed, too). The actual analysis window features a canvas that can display numerous widgets (e.g. pivot tables, charts or dashboards) at once.

Bendov said SiSense’s revenue grew 520 percent in 2012 and its notable customers include Target, Merck, Samsung and Cisco. The new investment will be used primarily to bolster the company’s sales and marketing efforts — which thus far have been largely relegated to in-bound inquiries — and to support customers in different geographies (the company is based in Redwood Shores, Calif.). “Now’s the time to add oil to the fire,” he explained.

As impressive as it all sounds, though, SiSense’s biggest challenge might well be getting noticed above the fray that is the analytics space right now, especially among more well-known and arguably future-proof vendors and technologies. That said, being a low-cost option that users like and that actually works has proven remarkably effective in an era of cloud computing and bring-your-own-device, and SiSense appears to be racking up users at a pretty rapid clip.

Any product that can prove its worth initially with the people who have to use it stands a good chance of sticking around and becoming a permanent part of IT budgets for years.

Feature image courtesy of Shutterstock user Iscatel.


Vectorwise User

How is SiSense different from Actian Vectorwise? They have been using Vector Processing and In-Chip / In-Cache for years and have the benchmarks to prove the performance benefits. Seems like SiSense is using the same features and analogies (e.g. the beer analogy). BTW, it is a LOT faster and less expensive than SAP HANA.

B Glazman

Sounds like hocus-pocus. The speed of the disk will never be fast enough to permit users to interactively analyze the data.

Rob Klopp

There are some marketing semantics here… The amount of cache on a Xeon processor is small… 30MB of L3, 2.5MB of L2, and 256K of L1 on the highest end Xeon E7-8870 (see

Since a single analytics user drinks gallons of beer per second… and 100 users drink hundreds of gallons per second… the refrigerator is hardly adequate as is the bicycle… and you end up driving all of the time… in other words Sisense is really an in-memory DBMS (after all… every DBMS and every program ultimately gets data from the L1 cache/refrigerator every time).

Further, you need to go to the brewery for 100TB every time… and you need a big truck or lots of trucks running in-parallel to keep up. So the idea that this is in-chip and comfortably supporting 100TB is hard to rationalize unless Sisense is also a multi-node, shared-nothing, scale-out DBMS.

Finally, HANA works hard to load data into cache in full cache lines and to use SIMD instructions (vectorized and optimized)… so Sisense feels like HANA R01. This is very cool… but the “Forget in-memory… in-cache” headline is pure marketing.

Rob Klopp

Could you change the last sentence from “in-cache” to “in-chip”, please? I cannot edit it in moderation… Thanks – Rob

Eldad Farkash

Hi Rob,

since you used a technical hypothesis on a marketing article, I’ll try to give some details on why “in-chip” is precisely the term to describe what we do:

With “in-chip”, we mean 4 things:

1. Having a query kernel that is both vectorized & cache-aware
2. Having a JIT compiler that converts SQL into C/asm code
3. Decompressing data in-cache to save memory bandwidth and avoid roundtrips between RAM & CPU
4. Applying columnar format not just for storage, but for re-using intermediates between query operators
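
To make the third point concrete, here is a minimal sketch of aggregating a compressed column by decoding it in small blocks and consuming each block immediately, so the full decompressed column never has to exist in memory. Run-length encoding is used only because it is easy to show; the comment does not say which compression scheme SiSense uses, and the names `decode_runs`, `blocks` and `sum_compressed` are hypothetical.

```python
from itertools import islice


def decode_runs(runs):
    """Yield the values of a run-length-encoded column, given (value, count) pairs."""
    for value, count in runs:
        for _ in range(count):
            yield value


def blocks(values, size):
    """Group an iterator into lists of at most `size` items."""
    it = iter(values)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def sum_compressed(runs, block_size=1024):
    """Sum a compressed column, decoding one small block at a time."""
    total = 0
    for block in blocks(decode_runs(runs), block_size):
        # Each decoded block is used right away and then discarded.
        total += sum(block)
    return total


# Hypothetical column: 5, 5, 5, 7, 7, 1, 1, 1, 1 stored as three runs.
runs = [(5, 3), (7, 2), (1, 4)]
print(sum_compressed(runs))  # 15 + 14 + 4 = 33
```

The block size stands in for "whatever fits in cache". In Python it has no effect on cache behavior, but it shows why the compressed form is the only version of the column that needs to travel from RAM.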

finally, a few comments of my own:

1. There is nothing “hard” about loading “full cache lines”.
You always load full “cache lines” into the cache; that’s just how CPU caches work

2. SIMD is trivial; it’s cache-awareness that’s complex
The way you slice arrays into vectors inside the CPU cache is extremely complex because it must happen in real time; it’s based on the actual query expression, the available memory bandwidth, and the instruction set version of your CPU.

3. You can use SIMD intrinsics as much as you like, but you’ll still get lousy performance out of your in-memory database because of the excessive branching and software abstractions that DBs are so hooked on. If your CPU can’t predict, it can’t prefetch, and your “fridge” will be empty most of the time, waiting for the trucks to arrive with “yet another single bottle of beer”.

4. SiSense is in-memory because it utilizes the RAM as much as it can, but it also uses compression & vertical fragmentation to keep ALL data on disk. Unlike most databases, we don’t use a page buffer but rather memory mapping. Aside from the raw performance in streaming data from disk, for us, keeping the CPU on actual query operators instead of buffer management is crucial.

5. Every program gets data from L1 cache, but most of them are not cache-aware. Waiting for data to be fetched from RAM is 50 to 100 times slower than having it in cache when you need it.
The beer analogy was used to describe exactly that…
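
The memory-mapping approach in point 4 can be sketched with Python's standard `mmap` module. This is an illustration under assumptions, not SiSense's implementation: the file layout (one column stored as raw 8-byte native integers), the file name `sales.col` and the function `sum_mapped_column` are all hypothetical. The point it shows is that the program reads the column as if it were an in-memory array, and the operating system decides which pages to bring in from disk, so there is no application-level page buffer to manage.

```python
import mmap
import os
import tempfile
from array import array


def sum_mapped_column(path, start, stop):
    """Sum the 8-byte integers in positions [start, stop) of a column file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # View the mapped bytes as int64 values without copying them.
            # The views are released (in reverse order) before the mapping closes.
            with memoryview(mapped) as raw, raw.cast("q") as column, \
                    column[start:stop] as part:
                return sum(part)  # only the pages actually touched are read


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales.col")  # hypothetical column file
    with open(path, "wb") as f:
        array("q", range(1000)).tofile(f)  # write the values 0..999
    print(sum_mapped_column(path, 10, 20))  # 10 + 11 + ... + 19 = 145
```

A design with its own buffer pool would instead copy pages from the file into buffers it allocates and evicts itself. The mapping hands that bookkeeping to the operating system, which is the trade-off the comment describes.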

I hope this gave you some more details into the abstraction we used to explain the technology.

– “Software abstraction is beautiful, but there’s nothing beautiful about slow software”
