Dependence on Log Data | The Limitations, Hidden Costs, and Additive Nature of SIEM

Christian Wiens, Director of Marketing

Christian Wiens is Director of Marketing at MixMode. He has 10+ years of experience as a cybersecurity professional. He has his BA from The University of California, Berkeley and resides in Austin, TX.

This is the second installment of a three-part series on the vulnerabilities of log data dependence. Read part one here.

SIEM and Proprietary Log Data

You may be surprised to learn that log data is proprietary to each security platform vendor. There is no standard format or even a standard labeling mechanism, so your data only has context within the parameters of your SIEM vendor. That’s a big problem when, for example, an endpoint’s log information has to be in a standard format before analysts can examine it.

SIEM developers built out entire infrastructures around mapping their own log output to other vendors’ log output, purportedly to marry API information and rules-based information from various systems of record into one comprehensive log output. It makes sense on the surface: automating this process reduces manual labor for individual team members.

What that approach ignores, Coulehan says, is that even when there is predefined mapping, it’s “akin to a traditional extraction, transformation, and load routine.” Analysts still need to manually massage, standardize, correlate, and supplement the data to make the intelligence usable. A critical and often intentionally ignored consequence of the log-based correlative analysis that SIEM vendors rely on is that true real-time analysis is simply not possible. Hence you’ll see selective marketing language like “real-time context” or “real-time search” used to distract buyers from the reality that the log data they are analyzing is outdated.
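To make the ETL comparison concrete, here is a minimal sketch of what mapping two vendors’ log formats into one common schema can look like. The vendor names, field names, and timestamp styles below are invented for illustration and do not describe any real product’s log format.

```python
from datetime import datetime, timezone

# Hypothetical mappings: vendor-specific field name -> common schema name.
# Both vendors and all field names are made up for this example.
FIELD_MAPS = {
    "vendor_a": {"src": "source_ip", "dst": "dest_ip", "ts": "timestamp"},
    "vendor_b": {
        "SourceAddress": "source_ip",
        "DestinationAddress": "dest_ip",
        "EventTime": "timestamp",
    },
}


def parse_timestamp(vendor, raw):
    """Convert a vendor-specific timestamp into a timezone-aware UTC datetime."""
    if vendor == "vendor_a":
        # Assumption: vendor_a writes epoch seconds as a string.
        return datetime.fromtimestamp(int(raw), tz=timezone.utc)
    # Assumption: vendor_b writes ISO 8601 strings that include a UTC offset.
    return datetime.fromisoformat(raw).astimezone(timezone.utc)


def normalize(vendor, record):
    """Rename mapped fields, drop unmapped ones, and standardize the timestamp."""
    mapping = FIELD_MAPS[vendor]
    out = {mapping[key]: value for key, value in record.items() if key in mapping}
    out["timestamp"] = parse_timestamp(vendor, out["timestamp"])
    out["vendor"] = vendor
    return out


record_a = normalize(
    "vendor_a", {"src": "10.0.0.1", "dst": "10.0.0.2", "ts": "1704067200"}
)
record_b = normalize(
    "vendor_b",
    {
        "SourceAddress": "10.0.0.1",
        "DestinationAddress": "10.0.0.2",
        "EventTime": "2024-01-01T00:00:00+00:00",
        "Severity": "low",  # no mapping exists, so this field is silently dropped
    },
)
```

Even in this toy version, every new vendor or format change means another mapping to write and maintain, and anything without a mapping is lost before an analyst ever sees it.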

Because log-based approaches require analysts to extract, transform, and marry information from disparate systems into a centralized common repository, your security systems are by definition post hoc and delayed. They are not performing real-time analysis. This directly limits how well threat intelligence platforms can perform their primary duty: protecting networks against attacks and cyber threats.
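A rough way to picture that delay: a correlation across several log sources cannot run until the slowest source has finished its extract, transform, and load steps. The sketch below assumes each source has a single fixed pipeline delay; the source names and delay values are hypothetical placeholders, not measurements.

```python
from datetime import datetime, timedelta, timezone


def earliest_correlation(event_time, pipeline_delays):
    """Return the slowest source and the earliest time all sources are loaded.

    pipeline_delays maps a source name to its total extract/transform/load delay.
    """
    # Correlation needs every source, so the slowest pipeline sets the floor.
    slowest_source = max(pipeline_delays, key=pipeline_delays.get)
    ready_at = event_time + pipeline_delays[slowest_source]
    return slowest_source, ready_at


# Hypothetical event time and per-source delays, for illustration only.
event = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
delays = {
    "firewall": timedelta(seconds=30),
    "proxy": timedelta(minutes=2),
    "endpoint": timedelta(minutes=5),
}

slowest, ready = earliest_correlation(event, delays)
lag = ready - event  # how stale the data is when analysis can first begin
```

However fast the search layer is, the analysis can only start once that lag has passed.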

SIEM and Hidden Costs

It’s a given that SIEM cannot function without log data. No company is surprised when the initial cost outlay includes fees for data storage and normalization. What many vendors fail to disclose from the outset, however, is that these costs can skyrocket as enterprises scale, because the same data must be stored in multiple repositories, in proprietary formats, to enable even the most basic search and investigation functionality. Enterprise data is constantly expanding, which leaves savvy customers wondering: why would I pay to duplicate my data and store it in a vendor’s proprietary format?

Vendors rarely clarify specifics about the costs of hot and warm storage, both of which are required to keep data on hand to feed the SIEM. Typically, SIEM pricing proposals involve an initial licensing fee that seems reasonable. By the end of the first year, however, companies often find they’ve spent three or four times the base license cost.
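To see how storage can come to dominate the bill, consider a back-of-the-envelope model. Every number below is a made-up placeholder rather than real vendor pricing, and the model simplifies by assuming retention windows are full for the entire year.

```python
def first_year_cost(base_license, daily_gb, hot_days, warm_days,
                    hot_rate, warm_rate, copies=1):
    """Estimate first-year spend: base license plus hot and warm storage.

    hot_rate and warm_rate are prices per GB per month.
    copies is how many repositories hold the same data.
    Simplification: retention windows are treated as full all year.
    """
    hot_gb = daily_gb * hot_days * copies
    warm_gb = daily_gb * warm_days * copies
    monthly_storage = hot_gb * hot_rate + warm_gb * warm_rate
    return base_license + 12 * monthly_storage


# Hypothetical inputs, chosen only to illustrate the shape of the problem.
params = dict(
    base_license=100_000,
    daily_gb=200,
    hot_days=30,
    warm_days=90,
    hot_rate=0.5,
    warm_rate=0.25,
)

single_copy = first_year_cost(**params, copies=1)
two_copies = first_year_cost(**params, copies=2)
multiple_of_license = two_copies / params["base_license"]
```

In this model the license fee is fixed, while the storage term grows with data volume, retention, and the number of duplicate repositories, which is why the total can drift far from the number in the original proposal.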

The Additive Nature of SIEM

SIEM platforms are inherently limited by how much data they can examine. As we’ve explained, the only data a SIEM, on its own, can examine is the data recorded by historical logs.

Vendors recognize this limitation and how it fails to meet even the most basic network security needs for most organizations. The solution, they assure clients, is to sign up for supplemental products that can feed and normalize data from other sources into the SIEM. To address SIEM’s fundamental failings, vendors bolt on additional products to perform network detection and response (NDR) and enterprise network visibility functions.

It becomes increasingly difficult for organizations to break away from an endless cycle where data storage keeps growing and analysts spend more and more time sifting through mounting piles of false positive alerts. To add insult to injury, expensive additive solutions are still inadequate when it comes to true real-time, comprehensive network monitoring.

As Ritu Jyoti, Industry AI and Automation Analyst, explains, “Billions are spent on products like a SIEM or SOAR that do not operate efficiently because they are ingesting too much data and an overwhelming number of false positives. ‘Garbage in; garbage out.’”

Can SIEM Even Deliver Real-Time Analysis? (Spoiler Alert: No)

From the start, SIEM systems weren’t designed to deliver real-time insights or to give users a holistic view of how applications interoperate. SIEM delivers information about specific applications after the fact. Analysts can use SIEM-generated data reporting to understand what happened in the past, but only within narrowly defined parameters.

Without a supportive context around the data, one that paints a picture of how data is actually moving through a network, including how it is used and accessed by all the various endpoints that might be connected, network security analysis will always be incomplete. Log-based platforms cannot meet the real-world realities of twenty-first century enterprise networking.

Still, the assumption that log files represent the best sources of information is widely accepted across the cybersecurity marketplace. Vendors market their products by claiming log files are the only way to deliver network insights, when the reality is that some network traffic is out of reach of their platforms: for example, encrypted data and much of the data flowing through ultra-high-volume environments.

Log files cannot match the granularity and visibility of the full network visibility, anomaly detection at the raw signal level, and real-time detection made possible with third-wave AI solutions like MixMode.