How Would a Distributed SIEM Look?

SIEMs have been the main workhorse for security operations centers, constantly scaled up throughout the years to accommodate the increased volume of security data. But instead of buffing a single horse to cope with this workload, can we distribute it across multiple horses?

At GigaOm we’ve been following this space for several years now, and while researching the third iteration of the Radar Report, I kept encountering the same challenges and narratives from vendors, which boil down to “do more with less.”

That is: more logs, more threats, and more integrations, with less time to resolve incidents, less tolerance for undetected events or false positives, and fewer analysts required to investigate incidents. This trend will continue: IT systems are only getting more complex, and the attack surface keeps growing.

An IBM study found that it took an average of 277 days (about 9 months) to identify and contain a breach. Because threat hunters need to trace an attack back to its initial foothold, SIEMs need to retain data for roughly one year to support threat hunting activities.

As a first, obvious response, vendors are providing more storage capacity. Cloud data lakes are a cheap and scalable way to do this, and they appear to be increasingly common.

A second, equally obvious response is for SIEM vendors to make their solutions more efficient, detecting threats faster and automating as many workflows as possible. To do this natively, vendors must bring adjacent capabilities into the SIEM. The low-hanging fruit are SOAR (security orchestration, automation, and response), UEBA (user and entity behavior analytics), and XDR (extended detection and response). SOAR, for example, emerged essentially as a response to SIEM’s inefficiencies, so SOAR capabilities within a SIEM make sense: they automate response processes inside the box.

However, log ingestion and alert curation are still core SIEM functions, regardless of how many features you cram under one roof. Integrating other tools’ capabilities into the SIEM is a good solution for now, but processing billions or even trillions of logs in one place, with or without ML, will simply become inefficient from a compute, networking, and storage point of view. It will become virtually impossible to manage a distributed environment with a centralized solution.

Historically, when solutions have become too large and bulky to manage, we’ve seen them improve by moving toward distributed architectures that support horizontal scalability.

Can we do the same for a SIEM? How would it look? I imagine it as follows: a centralized management plane, or orchestrator, controls lightweight, distributed SIEM agents deployed across different log sources. Each agent collects and stores data locally, correlates events to identify suspicious activity, and applies alarm rules defined specifically for the types of logs it analyzes.
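
To make the idea more concrete, here is a minimal Python sketch of that split, under heavy assumptions: the `SiemAgent`, `Rule`, and `Orchestrator` names are hypothetical, "local storage" is an in-memory list, and alarm rules are simple predicates. What it illustrates is the division of labor: raw logs stay on the agent, the orchestrator pushes rules only to agents handling the matching log type, and only alerts flow back to the central plane.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical event schema: a plain dict of log fields.
Event = Dict[str, object]


@dataclass
class Rule:
    """An alarm rule scoped to a single log type (hypothetical structure)."""
    name: str
    source_type: str
    predicate: Callable[[Event], bool]


@dataclass
class SiemAgent:
    """Lightweight agent: stores raw events locally and evaluates only its own rules."""
    agent_id: str
    source_type: str
    rules: List[Rule] = field(default_factory=list)
    local_store: List[Event] = field(default_factory=list)  # stand-in for local storage
    pending_alerts: List[dict] = field(default_factory=list)

    def ingest(self, event: Event) -> None:
        self.local_store.append(event)  # raw logs never leave the agent
        for rule in self.rules:
            if rule.predicate(event):
                self.pending_alerts.append(
                    {"agent": self.agent_id, "rule": rule.name, "event": event}
                )


class Orchestrator:
    """Central management plane: distributes rules and collects alerts, not raw logs."""

    def __init__(self) -> None:
        self.agents: Dict[str, SiemAgent] = {}

    def register(self, agent: SiemAgent) -> None:
        self.agents[agent.agent_id] = agent

    def deploy_rule(self, rule: Rule) -> int:
        """Push a rule to every agent handling its log type; return how many received it."""
        targets = [a for a in self.agents.values() if a.source_type == rule.source_type]
        for agent in targets:
            agent.rules.append(rule)
        return len(targets)

    def poll(self) -> List[dict]:
        """Drain pending alerts from all agents."""
        collected: List[dict] = []
        for agent in self.agents.values():
            collected.extend(agent.pending_alerts)
            agent.pending_alerts.clear()
        return collected


# Usage: one agent for authentication logs, one for network traffic.
orch = Orchestrator()
orch.register(SiemAgent("auth-01", "auth"))
orch.register(SiemAgent("fw-01", "network"))
deployed = orch.deploy_rule(
    Rule("failed_login", "auth", lambda e: e.get("outcome") == "failure")
)
orch.agents["auth-01"].ingest({"user": "alice", "outcome": "failure"})
orch.agents["fw-01"].ingest({"dst_port": 22})
alerts = orch.poll()
```

In a real deployment the agents would run on separate hosts and `poll` would be a network call; the in-process version only shows which data crosses the boundary.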

OpenText’s ESM first announced a Distributed Correlation feature as far back as 2018. In essence, enterprises can add multiple correlator and aggregator instances that run as individual services and distribute the correlation workload across them.
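
The article doesn't describe how ESM assigns work to its correlator instances, so the sketch below is not that implementation. It shows one generic way to distribute correlation: hash-partitioning events by a correlation key (here a hypothetical `user` field) so that all events for the same key reach the same instance, which can then keep its own state. `Correlator`, `CorrelationCluster`, `partition`, and the threshold rule are illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional


def partition(key: str, n_instances: int) -> int:
    """Map a correlation key to an instance index with a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings, so every router agrees on the mapping.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_instances


class Correlator:
    """One correlator service; holds state only for the keys routed to it."""

    def __init__(self, name: str, threshold: int) -> None:
        self.name = name
        self.threshold = threshold
        self.counts: Dict[str, int] = defaultdict(int)

    def process(self, key: str) -> Optional[str]:
        # Toy rule: alert once when a key reaches `threshold` events.
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            return f"{self.name}: {key} reached {self.threshold} events"
        return None


class CorrelationCluster:
    """Spreads the correlation workload over several Correlator instances."""

    def __init__(self, n_instances: int, threshold: int) -> None:
        self.instances: List[Correlator] = [
            Correlator(f"correlator-{i}", threshold) for i in range(n_instances)
        ]

    def route(self, event: dict, key_field: str = "user") -> Optional[str]:
        key = str(event[key_field])
        owner = self.instances[partition(key, len(self.instances))]
        return owner.process(key)


# Usage: three failed logins for the same user land on the same instance.
cluster = CorrelationCluster(n_instances=4, threshold=3)
results = [cluster.route({"user": "alice", "outcome": "failure"}) for _ in range(3)]
```

Plain modulo partitioning remaps most keys whenever an instance is added or removed; consistent hashing is a common alternative that limits that reshuffling.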

Instead of distributing only the correlation engine, we can imagine the whole solution and its components, including log ingestion, storage, filtering, and alert rules, in lighter deployments, perhaps even specialized for a specific type of event source. For example, we could have SIEM agents solely responsible for employee devices, network traffic, server logs, end-user web applications, and so on. Or we could have agents dedicated to cloud environments, on-premises deployments, or colocation facilities.

Let’s not forget that one of the main selling points of SIEMs is the aforementioned correlation capability: making obvious or non-obvious connections across multiple data sources. Here, the orchestrator can coordinate correlations by pairing only relevant information from different agents. This pairing can be based on something as basic as timestamps, guided by pre-trained ML algorithms, or informed by common patterns from the MITRE ATT&CK framework.
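
As an illustration of the simplest case, timestamp-based pairing, the sketch below assumes a hypothetical `Alert` record that agents send to the orchestrator (timestamp, source type of the originating agent, affected entity, rule name). The orchestrator joins alerts from two sources on the same entity when the second occurs within a time window after the first, using a sorted per-entity index and binary search rather than comparing every pair of alerts. ML-guided or ATT&CK-based pairing would replace the simple window check.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alert:
    ts: float     # epoch seconds
    source: str   # source type of the agent that raised it, e.g. "auth" or "network"
    entity: str   # host or user the alert refers to
    rule: str


def correlate(
    alerts: List[Alert], first_source: str, second_source: str, window: float
) -> List[Tuple[Alert, Alert]]:
    """Pair each first_source alert with second_source alerts on the same
    entity that occur within `window` seconds after it (inclusive)."""
    # Build a per-entity index of second-source alerts, sorted by timestamp.
    grouped: Dict[str, List[Alert]] = defaultdict(list)
    for a in alerts:
        if a.source == second_source:
            grouped[a.entity].append(a)
    index: Dict[str, Tuple[List[float], List[Alert]]] = {}
    for entity, items in grouped.items():
        items.sort(key=lambda x: x.ts)
        index[entity] = ([x.ts for x in items], items)

    pairs: List[Tuple[Alert, Alert]] = []
    for a in alerts:
        if a.source != first_source or a.entity not in index:
            continue
        times, items = index[a.entity]
        # Binary search for the slice of timestamps in [a.ts, a.ts + window].
        lo = bisect_left(times, a.ts)
        hi = bisect_right(times, a.ts + window)
        pairs.extend((a, later) for later in items[lo:hi])
    return pairs


# Usage: a login alert followed by an outbound-traffic alert on the same host.
alerts = [
    Alert(100, "auth", "host-a", "failed_login_burst"),
    Alert(250, "network", "host-a", "outbound_spike"),
    Alert(1000, "network", "host-a", "outbound_spike"),
    Alert(120, "network", "host-b", "outbound_spike"),
]
pairs = correlate(alerts, "auth", "network", window=300)
```

Only the alert at 250 s on `host-a` falls inside the 300-second window; the later alert and the one on `host-b` are never paired, which is the "only relevant information" filtering described above.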

There’s a lot of engineering and ingenuity involved in scaling systems, and all vendors are scaling up, in one way or another, to accommodate hundreds of thousands of events per minute. While current developments help scale SIEM systems incrementally, a new architecture could help accommodate future ingestion requirements. When centralized systems can no longer keep up, perhaps a distributed one should be considered.