Deep Dive: How Much Time Do Security Teams Spend Labeling with Supervised Learning?

Virtually every industry was affected when the coronavirus started to spread across the world in early 2020. Within a few months, the situation had devolved into a full-blown pandemic, and the adverse effects of the virus were coming into focus.

For the cybersecurity industry, the impact was immediate and severe. All at once, a vast portion of the world’s workforce switched to remote work arrangements. Suddenly, networks everywhere were hit with thousands of unexpected remote connections from thousands of unknown devices. 

It was a security analyst’s nightmare and a hacker’s dream come true.

What Went Wrong?

Conflicting Priorities and a Lack of Cybersecurity Talent

First, there was the ever-pressing cybersecurity talent shortage. Many organizations did not have enough people on hand to quickly implement contingency plans and mitigate new network vulnerabilities. The cherry on top? Budgets were being slashed quickly and drastically.

Many CISOs and SecOps teams were faced with a gut-wrenching choice: addressing the operational challenges of keeping workers connected, or shoring up vulnerabilities before hackers exploited them. Both options involved time-consuming, repetitive, manual work.

With sufficient staff, these would be equal priorities. Ultimately, companies can’t survive without focusing on both operations and cybersecurity. But without enough staff, they had to prioritize one or the other. 

To no one’s surprise, management at most organizations directed security teams to worry about operations now, security later. It’s par for the course. A recent Gartner survey revealed that as few as 30% of organizations take cross-organization steps to drive a business-led approach to digital risk. 

By mid-summer, a few companies were back on track, relieved to have sidestepped a significant security event. Many were not so lucky. In a VMware Carbon Black survey conducted in June, a shocking 91 percent of organizations reported an increase in cyber attacks since their workforce shifted to telework.

Compounding the issue of conflicting priorities between cybersecurity and operations is the widening economic downturn. Expensive, reactionary measures are not realistic for many companies. They remain vulnerable to phishing schemes and to malware, zero-day, or ransomware attacks that could destroy their businesses.

Security Platform Failure

Incredibly, the multi-million dollar security platforms used by many organizations simply failed in the face of the rapidly changing security environment. The problem, it turns out, was foundational: These “supervised machine learning” platforms were designed around expected network behavior. 

Why Supervised Machine Learning Failed

In the past, supervised machine learning was the gold standard for enterprise networks. The concept is straightforward:

1.   Security teams “teach” the platform what normal baseline behavior should look like by sorting and labeling various data classifications. 

2.   When the platform encounters network behavior that doesn’t match the labels, it triggers an alert or flags data for review.

With this setup, the network is only as protected as the platform is informed. Anything unlabeled or unexpected gets flagged, whether or not it turns out to be a threat.
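The two steps above can be illustrated with a deliberately simplified sketch. It is not how any particular security platform is implemented: the `Connection` type, the `flag_unlabeled` helper, and the "office"/"remote" origins are all hypothetical names chosen for illustration. It only shows why a fixed, hand-labeled baseline flags everything once the whole workforce changes how it connects.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    """Hypothetical, minimal description of a network connection."""
    device_id: str
    origin: str  # assumed values: "office" or "remote"


def flag_unlabeled(events, labeled_normal):
    """Return the events that match no analyst-labeled 'normal' pattern."""
    return [event for event in events if event not in labeled_normal]


# Step 1: analysts label what normal looks like (100 laptops, all in the office).
labeled_normal = {Connection(f"laptop-{i}", "office") for i in range(100)}

# Step 2: the platform compares incoming traffic against those labels.
traffic_before = [Connection(f"laptop-{i}", "office") for i in range(100)]
traffic_after = [Connection(f"laptop-{i}", "remote") for i in range(100)]

flags_before = flag_unlabeled(traffic_before, labeled_normal)
flags_after = flag_unlabeled(traffic_after, labeled_normal)

print(len(flags_before), len(flags_after))  # prints: 0 100
```

In this toy setup the same 100 employees generate zero flags before the shift and 100 afterward. None of those flags is an attack, yet each one would need review or relabeling.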

Remember that lack of security professionals? Even in the best-case scenario, where the flagged data is not a genuine threat, security teams still have to spend many precious hours on relabeling and threat hunting.

To be safe, they need to check every single flag. That’s a big problem when the platform suddenly flags hundreds or thousands of unusual behavior instances (like the whole company abruptly switching the way they access the network). 

The only people happy in this scenario are the bad actors who have been lurking in the shadows waiting for an opportunity like this for years. 

While security teams are battling a mountain of false positives and relabeling baseline data, many of these cybercriminals are sneaking into networks from all sides. By the time their handiwork is discovered, it’s too late for teams to react in a useful way.

In short, labeling takes far too much time, effort, and human resources for a sector already struggling with understaffing and insufficient resources. By some estimates, security teams spend 80 percent of the time it takes to set up a security system on labeling tasks. 

“The number of people and the amount of time it takes to label everything is massive and just doesn’t make sense when some big change happens,” said Dr. Igor Mezic, MixMode CTO. “By then, the label doesn’t matter anymore because you’re having to relearn everything. With an unsupervised system, it is constantly adapting and learning.” 

To recap, here’s the bad news: All that time spent on labeling can turn out to be utterly worthless in the face of an event like the pandemic. 

Here’s the good news: There is a better way, and it’s called unsupervised machine learning. 

Who Got It Right?

While many enterprises continue cleaning up from their coronavirus-induced cybersecurity disasters, not all organizations wound up in this ongoing, overwhelming scenario. Future-minded enterprises that had evolved their security systems to unsupervised models before the pandemic have fared far better. 

Unsupervised Machine Learning

A platform that uses unsupervised, or self-supervised, AI can predict constantly evolving network behavior, and therefore detect behavioral anomalies, in real time by continually updating the network traffic baseline, free from the limitations of manual data labeling.
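As a rough illustration of a baseline that updates itself without labels, the sketch below tracks an exponentially weighted moving average and variance of a single metric and flags values that deviate sharply from it. This is a stand-in technique chosen for simplicity, not MixMode's algorithm. The `AdaptiveBaseline` class, its parameters, and the hourly connection counts are assumptions made up for the example.

```python
class AdaptiveBaseline:
    """Toy self-updating baseline (EWMA mean and variance); illustrative only."""

    def __init__(self, alpha=0.2, threshold=3.0, warmup=5):
        self.alpha = alpha          # how quickly the baseline follows new data
        self.threshold = threshold  # deviations beyond this many std devs are flagged
        self.warmup = warmup        # observations required before flagging starts
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def observe(self, value):
        """Return True if value is anomalous, then fold it into the baseline."""
        if self.mean is None:
            self.mean = float(value)
            self.seen = 1
            return False
        deviation = value - self.mean
        # Floor the std dev at 1.0 so tiny fluctuations are not flagged.
        std = max(self.var ** 0.5, 1.0)
        is_anomaly = self.seen >= self.warmup and abs(deviation) > self.threshold * std
        # Update the baseline with every observation; no manual labels involved.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.seen += 1
        return is_anomaly


# Hypothetical remote connections per hour: office era, then sudden telework.
office_hours = [10, 12, 9, 11, 10] * 4
telework_hours = [500, 505, 495, 502, 498] * 4

baseline = AdaptiveBaseline()
flags = [baseline.observe(count) for count in office_hours + telework_hours]

print(flags[20], flags[-1])  # prints: True False
```

The jump to telework is flagged when it happens, but by the end of the series the baseline has moved to the new level and the same traffic no longer raises a flag. A production system would model far more than one metric; the point here is only the continual, label-free update.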

When a workforce abruptly switches to telework, or a power outage takes out a server, or there’s a big layoff, or any other unexpected activity happens, unsupervised machine learning can keep up. Security teams become more valuable per hour and can focus their energies on other priorities.

“It can save you an enormous amount of money to eliminate tedious jobs, like sifting through false positives, that take such a long time for human workers,” said Dr. Mezic. “These tasks can be processed almost instantaneously by unsupervised AI.”

Modernize Your Cybersecurity Approach

Stop wasting time and money on inadequate cybersecurity. Trusting outdated technology to meet the challenge of today’s chaotic environment is a major security event waiting to happen. 

MixMode’s self-supervised AI takes seven days to study the network and develop an initial baseline for regular network traffic. Reach out to our friendly client service team and set up a demo today.
