Now more than ever, adequate cybersecurity protection must be an organizational priority for the financial services industry. Cyber attacks, including ransomware, are becoming more sophisticated, and industry regulations increasingly require precise reporting on how data is used and stored, detailed information about data breach response, stronger data retention and protection, and enhanced network security. Each of these high-impact issues can benefit significantly from an unsupervised machine learning cybersecurity platform.
Dr. Igor, Chief Scientist and CTO at MixMode, explains:
Over the last decade, advancements in machine learning, including supervised and reinforcement learning, have transformed the technology behind everything from photo recognition to self-driving cars.
Still, supervised learning is limited in network security tasks such as threat detection because it only looks for specifics it has seen or been given labels for, whereas unsupervised learning constantly searches the network for anomalies.
Supervised learning is simply inadequate for the complex, interconnected networks common to financial services enterprises. These networks often include a mix of on-premises and cloud-based infrastructure, legacy equipment and physical hard drives that contain extremely sensitive financial and personal data. The right unsupervised machine learning platform can seamlessly monitor network traffic across complicated networks.
Labeling vs. learning
Supervised learning relies on a process of labeling in order to “understand” information.
The machine learns from large amounts of labeled data and can “recognize” something only after someone, most likely a security professional, has already labeled it; it cannot do so on its own.
This is beneficial only when you know exactly what you’re looking for, which is rarely the case when monitoring financial services networks. Attackers often use a method that the security program has not seen before, in which case a supervised system offers little protection.
The benefit of unsupervised learning
This is where unsupervised learning comes in. Unsupervised learning actually draws inferences from datasets without labels. It is best used if you want to find patterns but don’t know exactly what you’re looking for.
This makes it useful in cybersecurity, where attackers constantly change methods. Rather than looking for a specific label, it flags any pattern that falls outside the norm as potentially dangerous.
Unsupervised learning first creates a baseline for your network that shows what activity should look like on a regular day. If a file transfer then breaks the pattern of regular behavior by being too large or being sent at an odd time, the unsupervised system flags it as possibly dangerous.
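To make the baseline idea concrete, here is a minimal sketch in Python. It is an illustration only, not MixMode's method: it assumes the baseline is nothing more than the mean and standard deviation of past transfer sizes, and the names `build_baseline` and `is_anomalous`, the sample sizes, and the threshold of 3 standard deviations are all hypothetical. It covers only the "too large" case; the time of day is not modeled.

```python
from statistics import mean, stdev

def build_baseline(sizes_mb):
    """Summarize past transfer sizes as (mean, standard deviation)."""
    return mean(sizes_mb), stdev(sizes_mb)

def is_anomalous(size_mb, baseline, threshold=3.0):
    """Flag a transfer that lies more than `threshold` standard
    deviations away from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # Every past transfer had the same size: any change is unusual.
        return size_mb != mu
    return abs(size_mb - mu) / sigma > threshold

# Hypothetical history of ordinary transfer sizes, in megabytes.
history = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
baseline = build_baseline(history)

flag_small = is_anomalous(5.1, baseline)   # close to the usual size
flag_large = is_anomalous(30.0, baseline)  # far larger than usual
```

No labels are involved at any point: the 30 MB transfer is flagged only because it is far from what the history says is normal.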
One primary example of a financial services breach where unsupervised machine learning might have thwarted disaster is the 2019 Capital One data breach, in which information from roughly 100 million credit card applications was compromised. In this case, a former Amazon Web Services (AWS) employee illegally accessed an AWS server storing Capital One data and stole the applications. While the FBI caught the perpetrator quickly, she had already posted about the stolen data on GitHub.
A modern unsupervised machine learning platform would likely have recognized the intruder’s unusual network behavior, which no doubt deviated significantly from the expected norm.
A supervised learning program will miss an attack it has never seen before because that activity has not yet been labeled as dangerous. An unsupervised program only has to know that the action is abnormal in order to flag it as a potential threat.
Generative and discriminative models of unsupervised learning
There are two types of unsupervised learning: discriminative models and generative models. A discriminative model can only tell you that if you give it X, the consequence is Y, whereas a generative model can determine the total probability that you will see X and Y at the same time, that is, their joint probability.
So the difference is as follows: the discriminative model assigns labels to inputs and has no predictive capability. If you give it a different X that it has never seen before, it cannot tell what Y is going to be because it simply has not learned that.
With a generative model, once you set it up and establish the baseline, you can give it any input and ask it for an answer. Thus, it has predictive ability: for example, it can generate a possible network behavior that has never been seen before.
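The two quantities can be illustrated with a small count-based sketch in Python. This is a toy example under stated assumptions, not how any production platform works: the observations are made up, X is a time-of-day bucket, Y is a transfer-size bucket, and `joint` and `conditional` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical observations of (time of day, transfer size).
observations = [
    ("morning", "small"), ("morning", "small"), ("morning", "large"),
    ("noon", "small"), ("noon", "small"), ("noon", "small"),
    ("evening", "large"), ("evening", "small"),
]

pair_counts = Counter(observations)
x_counts = Counter(x for x, _ in observations)
total = len(observations)

def joint(x, y):
    """Estimate P(X = x, Y = y): how often x and y occur together."""
    return pair_counts[(x, y)] / total

def conditional(y, x):
    """Estimate P(Y = y | X = x); None when x was never observed."""
    if x_counts[x] == 0:
        return None
    return pair_counts[(x, y)] / x_counts[x]

p_joint = joint("noon", "small")            # 3 of 8 observations
p_cond = conditional("small", "morning")    # 2 of 3 morning transfers
p_unseen = conditional("small", "night")    # "night" never observed
```

The conditional estimate has nothing to say about an X it has never seen, which mirrors the limitation described above. A raw count-based joint estimate is equally blind to unseen combinations, so a practical generative model also needs some form of smoothing across similar inputs.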
So let’s say a person sends a 30-megabyte file at noon. What is the probability that they would do that? If you asked a discriminative model whether this is normal, it would check whether the person had ever sent such a file at noon before, but only specifically at noon.
A generative model would look at the context of the situation, checking whether they had also sent a file like that at 11:59 a.m. or 12:30 p.m., and base its conclusions on the surrounding circumstances in order to make more accurate predictions. Again, the 2019 Capital One breach (among many others) might have been caught far earlier with a generative network security platform.
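One simple way to let nearby times count as evidence is a kernel density estimate, sketched below in Python. This is a swapped-in textbook technique chosen for illustration, not something the article attributes to MixMode. The send times, the 20-minute bandwidth, and the name `gaussian_kde` are assumptions, and the sketch ignores the wrap-around at midnight.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function built by placing one Gaussian
    kernel of width `bandwidth` on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical past send times, in minutes after midnight:
# 11:59, 12:30, 11:45 and 12:10.
send_times = [719, 750, 705, 730]
density = gaussian_kde(send_times, bandwidth=20.0)

exact_match_at_noon = 720 in send_times  # exact lookup finds nothing
density_at_noon = density(720)           # supported by nearby sends
density_at_3am = density(180)            # no nearby sends at all
```

An exact-match lookup reports that this person has never sent a file at 12:00, while the density at noon is high because of the transfers at 11:59 and 12:10. A transfer at 3 a.m. gets a density close to zero and would stand out.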
How MixMode uses generative unsupervised learning
The artificial intelligence we use at MixMode falls within the class of generative models in unsupervised learning, which gives it this predictive ability. It collects data to form a baseline of expected network behavior and, because it knows what each day of the week looks like for the network, predicts what will happen over time.
If anything strays from this baseline, the platform alerts the security team that oversees it that network behavior has deviated from the expected standard.
For example, it collects data as it goes and then says: I know what’s going to happen on Monday at 9 a.m. People are going to come in and network volume will grow. At noon they will go for lunch, so the network level will drop a bit. Then they’ll continue working until 6 p.m. and go home, and the network level will fall to where it sits during the night.
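A weekly rhythm like this can be approximated by averaging past volume for each weekday-and-hour slot, as in the Python sketch below. It is a deliberately simple stand-in for the platform's model: the sample volumes are invented, weekday 0 is taken to mean Monday, and `hourly_baseline` and `deviation` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean

def hourly_baseline(samples):
    """Average volume per (weekday, hour) slot, computed from
    (weekday, hour, volume) samples."""
    slots = defaultdict(list)
    for weekday, hour, volume in samples:
        slots[(weekday, hour)].append(volume)
    return {slot: mean(values) for slot, values in slots.items()}

def deviation(baseline, weekday, hour, observed):
    """Relative difference between the observed volume and the
    volume expected for that slot."""
    expected = baseline[(weekday, hour)]
    return (observed - expected) / expected

# Hypothetical volumes from three past Mondays (weekday 0).
history = [
    (0, 3, 10), (0, 3, 12), (0, 3, 11),     # night: quiet
    (0, 9, 100), (0, 9, 110), (0, 9, 90),   # 9 a.m.: people arrive
    (0, 12, 70), (0, 12, 80), (0, 12, 75),  # noon: lunch dip
]
baseline = hourly_baseline(history)

normal_morning = deviation(baseline, 0, 9, 105)  # slightly above expected
busy_night = deviation(baseline, 0, 3, 99)       # eight times above expected
```

The same volume of about 100 is unremarkable at 9 a.m. on a Monday but a large deviation at 3 a.m., which is the point of comparing against a time-aware baseline rather than a single fixed threshold.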
Because of its predictive power, the generative unsupervised learning model is capable of preventing zero-day attacks, which we believe makes it the strongest security method available, with the fastest response time to a breach.
Active learning is the future
MixMode plans to add semi-supervised learning, or active learning, to the platform in the near future. It combines the best of unsupervised and supervised learning to make predictions about how a network should behave.
Active learning starts with unsupervised learning, looking for any patterns on a network that deviate from the norm. Once it finds one, it can label it as a threat, which is the supervised learning portion.
An active learning platform will be extremely useful because it not only scans constantly for deviations on the network but also labels and adds metadata to the abnormalities it finds, which makes it a very strong detection and response system.
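The loop described here, flag first and label second, can be sketched in a few lines of Python. This is a rough illustration of the idea rather than a description of the planned feature: the events, the threshold of 2 standard deviations, the `review_round` function, and the `ask_analyst` callback (which stands in for a human reviewer) are all hypothetical.

```python
from statistics import mean, stdev

def review_round(events, labels, ask_analyst, threshold=2.0):
    """Flag outlying events without using labels, then request a
    label only for flagged events that have none yet.
    Returns the ids of the events sent for review."""
    sizes = [size for _, size in events]
    mu, sigma = mean(sizes), stdev(sizes)
    queried = []
    for event_id, size in events:
        is_outlier = sigma > 0 and abs(size - mu) / sigma > threshold
        if is_outlier and event_id not in labels:
            labels[event_id] = ask_analyst(event_id, size)
            queried.append(event_id)
    return queried

# Hypothetical transfer events: (event id, size in megabytes).
events = [
    ("t1", 5), ("t2", 5), ("t3", 6), ("t4", 5), ("t5", 4),
    ("t6", 5), ("t7", 6), ("t8", 4), ("t9", 5), ("t10", 50),
]

def ask_analyst(event_id, size):
    # Stand-in for a human decision.
    return "threat" if size > 40 else "benign"

labels = {}
first_round = review_round(events, labels, ask_analyst)
second_round = review_round(events, labels, ask_analyst)
```

Only the one unusual transfer is sent for review, and once it carries a label it is not queried again. The stored labels are the metadata that a supervised component could later learn from.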