Browser/OS Stats from Half a Billion Blocked Exploit Attempts

The need to make better sense of markets is paramount to the way businesses are run and decisions are made. We see this with the proliferation of online services that allow us to better gauge and understand our respective markets.

From an engineering perspective, one example is deciding which browsers we should plan to support. One way to answer that is to consult a service like W3Schools, which publishes browser usage statistics. Is IE6 dead yet?

Interestingly enough, there is very little data on web attacks and exploits, specifically on things like the browsers they claim to be or the operating systems they run on. Since this information is valuable to us, I decided to dive into our data.

Attack Browser Statistics

We parsed through 30 days of traffic metadata, analyzing over 480,000,000 blocked requests via the Sucuri Firewall.

Note: Browser statistics can never be 100% reliable since the user agent is easily spoofed. We combined the user agent with p0f (passive OS fingerprinting), which analyzes the TCP stack, to try to filter out spoofed values.
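To make the bucketing step concrete, here is a minimal Python sketch of how user agents could be grouped into families and turned into percentages. It is not the code we ran: the rule list, the function names `ua_family` and `ua_distribution`, and the "(none)" and "other" labels are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical family rules, checked in order; the first match wins.
UA_RULES = [
    ("Googlebot", re.compile(r"googlebot", re.I)),
    ("BingBot", re.compile(r"bingbot", re.I)),
    ("WordPress", re.compile(r"wordpress", re.I)),
    ("Perl", re.compile(r"libwww-perl", re.I)),
    ("MSIE/6", re.compile(r"MSIE 6\.")),
]

def ua_family(user_agent):
    """Map a raw User-Agent header to a coarse family label."""
    # A missing or blank header gets its own bucket.
    if user_agent is None or not user_agent.strip():
        return "(none)"
    for name, pattern in UA_RULES:
        if pattern.search(user_agent):
            return name
    return "other"

def ua_distribution(user_agents):
    """Return {family: percentage of requests}, most common family first."""
    counts = Counter(ua_family(ua) for ua in user_agents)
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.most_common()}
```

The rules only look at what the client claims, so the output describes the claimed browser, not the real one.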

Top Browser versions:

Sucuri-WebAttackBrowserDistribution

These top eight user agents account for 80% of all malicious traffic we blocked during the 30 days we analyzed. That traffic includes SQL injection, brute force attempts, and a variety of other exploit attempts.

What’s really interesting from this data is that nearly a third of the exploit tools make no effort to set the user agent (i.e., 29% of the attacks had no user agents set). That is followed by MSIE/6, which is also a common browser “emulated” (faked) by exploit tools. When you combine these two, you have close to 50% of the user agents used by attackers and their exploit tools.

GoogleBot is also relatively high, but that makes sense: attackers use it to mislead webmasters by making a malicious request look like a legitimate one from Google.
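One common way to tell a real Googlebot from a faked one is the reverse-then-forward DNS check that Google documents for verifying its crawlers. The sketch below is an illustration of that check, not a description of how the Sucuri Firewall does it. The function name `is_real_googlebot` and the injectable `reverse` and `forward` resolver parameters are assumptions added so the logic can be exercised without network access.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_real_googlebot(ip, reverse=None, forward=None):
    """Return True if `ip` reverse-resolves to a Google host that resolves back to `ip`."""
    # Default resolvers use the standard library; callers may inject their own.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
        # Step 1: the PTR record must be under a Google crawler domain.
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: the forward lookup of that host must include the original IP.
        return ip in forward(host)
    except OSError:
        # Lookup failures (no PTR record, unknown host) count as not verified.
        return False
```

A request that claims to be Googlebot in its user agent but fails this check is a good candidate for the "faked" bucket.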

Other less popular mentions:

WordPress: 0.9%
Java: 0.8%
BingBot: 0.8%
PHP: 0.5%
Perl: 0.4%

I did want to mention WordPress, Java, PHP, Bingbot and Perl (libwww-perl), as they were close enough to the 1% mark. Most of these happen when the exploit tool is not modified to change the user-agent or when it is using a specific platform as a middleman (in the case of WordPress). While they account for a small percentage overall, it does show how attackers could use out-of-the-box exploits.

Attack Operating Systems Statistics

The operating system specified in the user agent is also something we can look at to gain more insights. In our review, a very large subset (approx. 45%) were set to Windows-based devices.

Sucuri-OperatinsSystemDistribution

Only a small percentage were set to Linux, Mac, and iOS devices. Just under 50% of the requests did not specify an operating system at all.

Interestingly enough, when we look at the passive operating system fingerprinting, it paints a very different story:

Sucuri-OS-Distribution-Passive

We see a big jump in the use of the Linux OS, which I presume accounts for a very large percentage of the “undefined” market above. In fact, when analyzing via TCP fingerprinting, Linux is right up there with Windows OS devices.
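The comparison between the claimed and the fingerprinted operating system can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `claimed_os` is a crude substring heuristic I made up for the example, and each record is assumed to already carry an OS label produced by a passive fingerprinting tool such as p0f. The sketch does not call p0f itself.

```python
from collections import Counter

def claimed_os(user_agent):
    """Rough OS guess from the User-Agent string (easily spoofed)."""
    ua = (user_agent or "").lower()
    # Order matters: iOS user agents also contain "Mac OS X".
    if "iphone" in ua or "ipad" in ua:
        return "iOS"
    if "windows" in ua:
        return "Windows"
    if "mac os x" in ua or "macintosh" in ua:
        return "Mac"
    if "linux" in ua:
        return "Linux"
    return "undefined"

def compare_os(records):
    """Count (claimed OS, fingerprinted OS) pairs.

    `records` is an iterable of (user_agent, fingerprinted_os) tuples, where
    fingerprinted_os is a label from a passive fingerprinting tool (assumed input).
    """
    return Counter((claimed_os(ua), observed) for ua, observed in records)

def mismatch_count(records):
    """Number of requests whose claimed OS contradicts the fingerprint."""
    return sum(
        n
        for (claimed, observed), n in compare_os(records).items()
        if claimed != "undefined" and claimed != observed
    )
```

Pairs such as ("undefined", "Linux") are the ones that would explain the jump described above, while pairs such as ("Windows", "Linux") point to a spoofed user agent.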

Attack Geolocation Statistics

We can’t talk about attacks without spending some time on where those attacks came from. There are misconceptions that attacks only come from red-flag countries and by blocking them you’re now safe, or that one can quickly identify and block based on location. The data below provides better insight.

Let’s look at the stats:

Sucuri-DistributionAttacks-Geolocation

The majority of attacks come from the United States, followed by Indonesia, China, and Canada. In fact, California alone accounts for 11% of all attacks, more than any country other than the United States itself.

So, even though partial geo-blocking may be effective as a noise reducer, it won’t really stop most attacks unless you are willing to block the USA.
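If you want to check this against your own logs, the arithmetic is simple. The sketch below assumes you already have a mapping of country code to blocked-request count. The function name `blocked_share` and the input shape are assumptions for illustration, and no figures from this post are baked in.

```python
def blocked_share(attacks_by_country, blocked_countries):
    """Percentage of attack requests that a country block list would have stopped.

    attacks_by_country: dict mapping a country code to its attack request count.
    blocked_countries: set of country codes on the hypothetical block list.
    """
    total = sum(attacks_by_country.values())
    if total == 0:
        return 0.0
    blocked = sum(
        n for country, n in attacks_by_country.items() if country in blocked_countries
    )
    return round(100.0 * blocked / total, 1)
```

Running it with a block list that leaves out your largest source country shows how much attack traffic would still get through.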

What Does the Data Teach Us?

First, this data shows us that attacks are very diverse. You can’t just block attackers using one specific bit of data without looking at the complete picture.

Second, user-agents cannot be trusted and can be deceiving. Do not base decisions solely on that data.

Lastly, sometimes what we think we know is further from reality than we realize.

3 comments

Michael Starks says:
July 27, 2016 at 6:59 pm
This is interesting data. In my organization, I have profiled the legitimate user agents in our company and defined alerts for those that fall out of the norm. This can lead to identifying stuff like cryptoware. Of course, it’s a lot easier when you have control of the environment. The Internet is another story.
  Daniel Cid says:
  July 28, 2016 at 2:49 pm
  Oh yes, looking inside->out is very useful. I used to do that with squid/proxy logs and find tons of interesting stuff (from malware to compromised servers).
MarkSeifert says:
August 2, 2016 at 5:56 am
It’d be interesting to be able to contrast the os/browser profile of attacks coming from within the US vs attacks coming from other locations such as the locs noted as the top “attack countries” in the cloudproxy settings. Come to think of it, it’d be useful to see some stats on attacks as a percentage of total traffic volume from each country (or more practically, a top list of that).
Historically, geo was always more about spam control than it was about mitigating other (and typically more malicious) types of attacks, though I have a general sense that patterns in that area have shifted dramatically over the years for several reasons (not the least of which being the idea that you’re more likely to use a facebook group or other 3rd party service rather than a vbulletin install these days if you want to build up some community discussion).
In any case, interesting post.