Using Machine Learning to Catch Attackers

November 3, 2018 | by Seiji Armstrong


Attackers use increasingly sophisticated tools. A growing class of attacks routinely bypasses the web application firewall (WAF), the first line of defense at most internet sites, because those attacks appear perfectly legitimate to a WAF. So how do you catch a sophisticated web attacker posing as a legitimate client? One answer is to look for inconsistencies in their story.

Each web, mobile, or API transaction tells a story, with the details buried in the transaction's many layers. For a legitimate actor, every examinable aspect and attribute of that story will be consistent. A malicious actor, however, will inevitably fall short somewhere when put under enough scrutiny. Like a seasoned bouncer at a nightclub, if you know where to look you can often spot the teenagers with fake IDs.

Doing so requires analyzing many details of the presented story and making exhaustive comparisons across all available dimensions, which is arduous with traditional methods.

Cequence Security tackles this by employing algorithms that leverage Machine Learning (ML). We generate a range of ML models trained on a diverse collection of web, mobile, and API requests. A simplified explanation of the underlying mechanism is that the models are trained to recognize patterns; more than that, they are designed to learn inductively and to reason. The huge advantage of this approach is its general visibility and scalability: as our models are incrementally exposed to more traffic in the field, they strengthen their reasoning abilities. Each attribute of a transaction is then automatically cross-examined by the models, using logic learned and extrapolated from exposure to all previous traffic.
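To make the idea of attribute cross-examination concrete, here is a minimal sketch using a generic scikit-learn classifier. The request attributes, labels, and toy data are invented for illustration; this is not a description of Cequence's actual models or feature set.

```python
# Hypothetical sketch: learn which combinations of request attributes normally
# co-occur, so that a client whose attributes contradict each other stands out.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

# Each record holds a few attributes pulled from one HTTP transaction.
# A real system would examine far more dimensions (TLS details, header order,
# timing, payload shape, and so on).
training_requests = [
    {"claimed_browser": "chrome",  "accepts_gzip": 1, "header_count": 14, "tls_fingerprint": "chrome_like"},
    {"claimed_browser": "firefox", "accepts_gzip": 1, "header_count": 13, "tls_fingerprint": "firefox_like"},
    {"claimed_browser": "chrome",  "accepts_gzip": 0, "header_count": 4,  "tls_fingerprint": "script_like"},
    {"claimed_browser": "safari",  "accepts_gzip": 0, "header_count": 3,  "tls_fingerprint": "script_like"},
]
# 1 = the story is consistent (legitimate client), 0 = the attributes
# contradict what the client claims to be.
labels = [1, 1, 0, 0]

# DictVectorizer one-hot encodes the categorical attributes; the forest then
# learns which attribute combinations hang together in legitimate traffic.
model = make_pipeline(
    DictVectorizer(sparse=False),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(training_requests, labels)
```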

The ultimate benefit comes when the models make accurate predictions about what a client really is (not what it superficially advertises) without ever having seen that client before. While this may appear remarkable, and counter-intuitive compared with traditional algorithms, what is happening is akin to a child telling you with confidence that the neighbor's new pet is not only a dog but a Labrador, and by the way she's a puppy. The child has never met this particular dog, but has been exposed to many examples of dogs and not-dogs, and has internally generalized the many features of animals. Our models do the same with internet traffic.
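Continuing the hypothetical sketch above, the same toy model can score a client it has never seen before, flagging a request whose advertised browser does not line up with its other (invented) attributes:

```python
# A never-before-seen client: it advertises Chrome, but its other attributes
# look like an automated script.
unseen_request = {"claimed_browser": "chrome", "accepts_gzip": 0,
                  "header_count": 5, "tls_fingerprint": "script_like"}

verdict = model.predict([unseen_request])[0]
confidence = model.predict_proba([unseen_request])[0]
print("legitimate" if verdict == 1 else "inconsistent story", confidence)
```

The point of the sketch is the generalization: the model was never shown this exact client, only many examples of consistent and inconsistent stories, and it extrapolates from those.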
