Generative Unsupervised Learning vs. Discriminative Clustering Technology: Which Prevents Zero-Day Attacks?

Ana Mezic

Unsupervised Learning is becoming more widely known among cybersecurity professionals as the way of the future, as the limitations of Supervised Learning leave it too vulnerable to the advanced hacking methods we’re starting to see.

MixMode is leading the pack on the Unsupervised Learning front, according to conversations with several analysts from firms in the security and IT space who recently interviewed CTO Dr. Igor Mezic.

However, other companies claim to do advanced Unsupervised Learning when all they’re really using is discriminative models, not the generative method that makes network anomaly detection possible.

Knowing the difference between Discriminative and Generative Unsupervised Learning can tell you a lot about the effectiveness of a cybersecurity solution’s artificial intelligence. For example, it can tell you whether that solution can identify and stop a zero-day attack.

Generative and Discriminative Models of Unsupervised Learning

There are two types of Unsupervised Learning: discriminative models and generative models.

A discriminative model can only find the probability of Y given a specific X, whereas a generative model can tell you the total probability that you’re going to see X and Y at the same time.

So the difference is as follows:

A discriminative model assigns labels to inputs and has no predictive capability. If you gave it a different X that it has never seen before, it can’t tell what the Y is going to be because it simply hasn’t learned that.

With a generative model, once you set it up and find the baseline, you can give it any input, ask it for an answer, and it will provide that answer. Thus, it has predictive ability.

For example, it can generate a possible network behavior that has never been seen before.
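
To make the two quantities concrete, here is a minimal sketch that computes both from a small table of counts. The data, the function names, and the choice of X as hour of day and Y as a transfer-size bucket are illustrative assumptions, not MixMode's model.

```python
from collections import Counter

# Hypothetical observations: (x, y) = (hour of day, transfer-size bucket).
observations = [
    (9, "small"), (9, "small"), (9, "large"),
    (12, "small"), (12, "small"), (12, "small"),
    (18, "large"), (18, "small"),
]

joint_counts = Counter(observations)
x_counts = Counter(x for x, _ in observations)
total = len(observations)


def joint_probability(x, y):
    """P(X = x, Y = y): how often the pair occurs among all observations."""
    return joint_counts[(x, y)] / total


def conditional_probability(y, x):
    """P(Y = y | X = x): how often y occurs among observations with this x."""
    if x_counts[x] == 0:
        raise ValueError(f"x={x!r} was never observed, so P(Y | X) is undefined")
    return joint_counts[(x, y)] / x_counts[x]
```

In this toy setup, the joint estimate still returns an answer (zero) for an hour that never appeared in the data, while the conditional estimate has nothing to condition on and raises an error. That mirrors the point above about inputs the model has never seen.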

Let’s say it’s noon and a person within the network sends a 30 megabyte file. What is the probability that they would do that? And would the discriminative model or the generative model know whether that was anomalous?

If you asked a discriminative model whether this is normal, it would check to see if the person had ever sent such a file at noon before… but only specifically at noon. A generative model would look at the context of the situation and check if they had ever sent a file like that at 11:59 a.m. and 12:30 p.m. too, and base its conclusions on a number of other surrounding circumstances in order to be more accurate with its predictions.
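
The noon example can be sketched in a few lines. The code below contrasts an exact lookup with a score that also credits transfers at nearby times and sizes. The Gaussian kernel, the history data, and the scale parameters are assumptions chosen for illustration; they are a stand-in for contextual scoring, not the method MixMode uses.

```python
import math

# Hypothetical history of one user's transfers: (minutes after midnight, size in MB).
history = [
    (11 * 60 + 59, 28.0),
    (12 * 60 + 30, 31.0),
    (15 * 60 + 10, 2.0),
]


def seen_exactly(minute, size_mb):
    """Exact lookup: true only if this exact time and size were seen before."""
    return (minute, size_mb) in history


def contextual_score(minute, size_mb, time_scale=30.0, size_scale=5.0):
    """Gaussian-kernel similarity to past transfers, averaged over history.

    Values near 1 mean the event resembles past behavior; values near 0
    mean nothing similar has been observed. The scales are assumed values.
    """
    total = 0.0
    for past_minute, past_size in history:
        dt = (minute - past_minute) / time_scale
        ds = (size_mb - past_size) / size_scale
        total += math.exp(-0.5 * (dt * dt + ds * ds))
    return total / len(history)
```

With this history, a 30 MB transfer at 12:00 fails the exact lookup but still gets a substantial contextual score, because similar transfers happened at 11:59 and 12:30. The same transfer at 03:00 scores close to zero.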

Among the many Network Security Monitoring tools available today that tout “Unsupervised Learning,” almost all use the discriminative model. This limits their ability to think contextually and can cause issues beyond missed threats, such as an overabundance of security alerts.

Generative Unsupervised Learning

The Artificial Intelligence that we use at MixMode is in the class of generative models in Unsupervised Learning, which gives it this predictive ability. It collects data to form a baseline of the network and can predict what will happen over time because it knows what each day of the week looks like for the network.

If anything strays from this baseline, the platform alerts the security team that oversees it that an irregularity has been detected in network behavior that should be adhering to the baseline.

For example, it collects data as it goes and then it says, ‘I know what’s going to happen on Monday at 09:00. People are going to come in and network volume will grow, then at 12:00 they will go for lunch so the network level will drop a bit, then they’ll continue working until 18:00 and go home, and the network level will go down to the level it is during the night.’

Because of its predictive power, the Generative Unsupervised Learning model is capable of preventing zero-day attacks, which makes it the best security method out there, with the fastest response time to any breach.
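
As a rough illustration of the baseline idea, the sketch below groups traffic volume by weekday and hour and flags values that stray too far from what that slot normally looks like. The function names, the three-standard-deviation threshold, and the per-slot mean are simplifying assumptions; the actual platform's model is not described here.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(samples):
    """Group traffic volumes by (weekday, hour) and summarize each slot.

    samples: iterable of (weekday, hour, volume), weekday 0 = Monday.
    Returns {(weekday, hour): (mean, standard deviation)}.
    """
    slots = defaultdict(list)
    for weekday, hour, volume in samples:
        slots[(weekday, hour)].append(volume)
    return {
        slot: (mean(values), stdev(values))
        for slot, values in slots.items()
        if len(values) >= 2  # stdev needs at least two samples
    }


def is_irregular(baseline, weekday, hour, volume, threshold=3.0):
    """Flag a volume more than `threshold` standard deviations from the slot mean."""
    if (weekday, hour) not in baseline:
        return True  # no baseline for this slot: treat as unknown and flag it
    slot_mean, slot_std = baseline[(weekday, hour)]
    if slot_std == 0:
        return volume != slot_mean
    return abs(volume - slot_mean) / slot_std > threshold
```

A volume that is ordinary for Monday at 09:00 can be flagged at 03:00, because each slot is judged against its own history rather than a single network-wide average.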

Advances from Clustering and the Bayesian Method

Clustering is a procedure in which the data is sorted into different groups. Let’s say you measure heights. In an unsupervised way, without actually saying whether a person is short or tall, you can cluster them into groups just by looking at the numbers. You’re not looking at anything but the data, so when the next person comes along you can assign them to a group without applying a short or tall label. However, this is inadequate for network security, because there are too many attributes and the data changes dynamically.

Just because network traffic may be really low at night doesn’t mean that it is completely harmless. So the way we pursue this is not really clustering. We have a generative model which knows the dynamics that could be happening in the next five minutes. There is no clustering involved but there is a separation of different modes of behavior. We have the normal, day-to-day behavior of coming in at 09:00 and leaving at 17:00, and we have the random element of emails sent arbitrarily throughout the day.
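
The height example can be sketched with a basic two-group k-means loop. The measurements and function names below are hypothetical, and k-means is just one common clustering procedure used here for illustration.

```python
def two_means(values, iterations=20):
    """Split 1-D values into two groups with a basic k-means loop (k = 2)."""
    low, high = min(values), max(values)  # initial centers: the extremes
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_high:  # all values identical, nothing to split
            break
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return low, high


def assign(value, centers):
    """Return the index of the nearest center; no 'short' or 'tall' label needed."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


heights_cm = [152, 155, 158, 160, 181, 184, 188, 190]  # hypothetical measurements
centers = two_means(heights_cm)
```

A new height is placed by calling assign(new_height, centers), which returns group 0 or group 1 purely from the numbers. Note that the centers are fixed once computed, which hints at why this approach struggles when the data keeps changing.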

“Our AI is able to adaptively learn that, which is very different from the Bayesian Learning other companies are doing while they are claiming to do Unsupervised Learning,” said Dr. Igor Mezic. “Our approach is quite new. It has never been deployed in this form of network security, and there is really no one out there that utilizes a generative model of unsupervised learning like we do.”

The basic Bayesian idea is that you base the probability of something happening on an observed history of that thing. However, that doesn’t account for random elements in network security, like arbitrary email times, so it is an ineffective method.

At MixMode, we have an equation that, instead of just looking at the past, uses the context of the past to create an image of what the next 5 minutes should look like, accounting for all those random elements too.
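
One simple reading of this idea, assumed here purely for illustration, is a Beta prior updated with counts from the observed history. The function name and the uniform prior are hypothetical choices, not a description of any particular vendor's product.

```python
def posterior_mean(event_count, trial_count, prior_events=1.0, prior_non_events=1.0):
    """Estimate an event probability from history under a Beta prior.

    event_count: how many times the event was observed.
    trial_count: how many opportunities there were to observe it.
    The default prior (1, 1) is uniform, so with no history the estimate is 0.5.
    """
    if event_count > trial_count:
        raise ValueError("event_count cannot exceed trial_count")
    return (event_count + prior_events) / (
        trial_count + prior_events + prior_non_events
    )
```

The estimate depends only on how often the event occurred, not on when it occurred or what else was happening around it.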

This makes it a much more accurate picture of whether what was supposed to happen is actually happening.
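
The equation itself is not given here, so the sketch below substitutes a much simpler stand-in: exponential smoothing over 5-minute intervals, with a tolerance band built from past forecast errors to allow for the random elements. The function names, the smoothing factor, and the band width are all assumptions.

```python
import math


def forecast_next(volumes, alpha=0.3):
    """One-step-ahead forecast with a tolerance band.

    volumes: past traffic volumes, one per 5-minute interval, oldest first.
    Returns (predicted, spread): an exponentially weighted average and the
    root-mean-square of past one-step forecast errors.
    """
    if len(volumes) < 2:
        raise ValueError("need at least two past intervals")
    level = volumes[0]
    squared_errors = []
    for observed in volumes[1:]:
        squared_errors.append((observed - level) ** 2)  # error of the prior forecast
        level = alpha * observed + (1 - alpha) * level  # update with the new value
    spread = math.sqrt(sum(squared_errors) / len(squared_errors))
    return level, spread


def matches_forecast(volumes, actual, width=3.0):
    """True if `actual` falls within `width` spreads of the forecast."""
    predicted, spread = forecast_next(volumes)
    return abs(actual - predicted) <= width * spread
```

The comparison runs in the direction described above: the forecast for the next interval is made first, and the observed value is then checked against it.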

“We have the ability of prediction while they don’t,” Mezic said. “We have a generative model, it’s actually going to tell you what the network should do, because it can see the future. It’s not just going to guess what the future should be because it has seen the past.”
