The need to make better sense of markets is paramount to the way businesses are run and decisions are made. We see this with the proliferation of online services that allow us to better gauge and understand our respective markets.
From an engineering perspective, one example is deciding which browsers we should plan to support. One way to answer that is to use a service like W3Schools, which publishes browser usage statistics. Is IE6 dead yet?
Interestingly enough, there is very little data on web attacks and exploits, specifically things like the browsers attackers use or the operating systems they leverage. Since this information is of value to us, I decided to dive into our data.
Attack Browser Statistics
Note: Browser statistics can never be 100% reliable since the user agent is easily spoofed. We combined it with p0f (passive OS fingerprinting), which analyzes the TCP stack, to try to filter the results better.
Top Browser versions:
These top eight user agents account for 80% of all malicious traffic we blocked during the 30 days we analyzed. That traffic includes SQL injection, brute force attempts, and a variety of other exploit attempts.
What’s really interesting from this data is that nearly a third of the exploit tools make no effort to set the user agent (i.e., 29% of the attacks had no user agents set). That is followed by MSIE/6, which is also a common browser “emulated” (faked) by exploit tools. When you combine these two, you have close to 50% of the user agents used by attackers and their exploit tools.
Googlebot is also relatively high, which makes sense: attackers use it to mislead webmasters by making a malicious request look like a legitimate one from Google.
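Since a Googlebot user agent proves nothing on its own, one way to check a claim is the reverse-then-forward DNS lookup that Google documents for verifying its crawler. Below is a minimal sketch of that check. The function name, the two-suffix list, and the injectable lookup parameters are my own assumptions for illustration, not something from our tooling.

```python
import socket

# Assumption: hostnames of genuine Googlebot crawlers end in one of these suffixes.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google host that resolves back to `ip`.

    The lookup functions are injectable (hypothetical parameters) so the logic
    can be exercised without touching the network.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        # Step 1: the PTR record must point at a Google-owned domain.
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: the forward lookup must lead back to the same IP,
        # otherwise anyone controlling their own PTR record could lie.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

Called with only an IP address, it performs live DNS queries, so in practice you would want to cache the results rather than resolve on every request.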
Other less popular mentions:
I also want to mention WordPress, Java, PHP, Bingbot, and Perl (libwww-perl), as each was close to the 1% mark. Most of these appear when the exploit tool is not modified to change its default user agent, or when the attack uses a specific platform as a middleman (as in the case of WordPress). While they account for a small percentage overall, they do show that some attackers run exploit tools straight out of the box.
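For anyone who wants to run the same kind of breakdown on their own logs, here is a rough sketch of how user agents can be bucketed and tallied. The bucket names and substring rules are assumptions I picked to mirror the families mentioned above. They are simple heuristics, not the rules we used to produce these numbers.

```python
from collections import Counter

# Hypothetical (label, lowercase substring) rules, checked in order.
RULES = [
    ("Googlebot", "googlebot"),
    ("Bingbot", "bingbot"),
    ("MSIE 6", "msie 6."),
    ("WordPress", "wordpress"),
    ("libwww-perl", "libwww-perl"),
    ("Java", "java/"),
    ("PHP", "php"),
]


def classify_user_agent(ua):
    """Map a raw User-Agent header to a coarse bucket."""
    # Missing, blank, or "-" (a common log placeholder) all count as "no user agent".
    if ua is None or not ua.strip() or ua.strip() == "-":
        return "(none)"
    lowered = ua.lower()
    for label, needle in RULES:
        if needle in lowered:
            return label
    return "other"


def user_agent_shares(user_agents):
    """Return {bucket: percentage of requests}, largest bucket first."""
    counts = Counter(classify_user_agent(ua) for ua in user_agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.most_common()}
```

Feeding it the user agent column from the blocked-request log gives a table you can compare against the percentages discussed here.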
Attack Operating Systems Statistics
The operating system specified in the user agent is also something we can look at to gain more insights. In our review, a very large subset (approx. 45%) were set to Windows-based devices.
Only a small percentage were set to Linux, Mac, and iOS devices. Just under 50% of the requests did not specify an operating system at all.
Interestingly enough, when we look at the passive operating system fingerprinting, it paints a very different story:
We see a big jump in the use of Linux, which I presume accounts for a very large percentage of the requests that did not specify an operating system above. In fact, when analyzing via TCP fingerprinting, Linux is right up there with Windows.
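The comparison between the two views can be sketched in a few lines: pull the operating system a request claims in its user agent and check it against what the TCP fingerprint suggests. The function names and labels below are hypothetical, and the sketch assumes the fingerprint result has already been normalized to the same labels ("Windows", "Linux", "Mac", "iOS").

```python
# Checked in order: iOS user agents also contain "like Mac OS X",
# so the more specific family has to come first.
OS_RULES = (
    ("iOS", ("iphone", "ipad", "ipod")),
    ("Windows", ("windows",)),
    ("Mac", ("mac os x", "macintosh")),
    ("Linux", ("linux",)),
)


def os_from_user_agent(ua):
    """Rough OS family claimed by a User-Agent string, or 'unspecified'."""
    if not ua:
        return "unspecified"
    lowered = ua.lower()
    for label, needles in OS_RULES:
        if any(needle in lowered for needle in needles):
            return label
    return "unspecified"


def compare_os(ua, fingerprint_os):
    """Classify a request as 'match', 'mismatch', or 'unspecified'.

    `fingerprint_os` is assumed to be the passive-fingerprint result,
    already mapped onto the same labels used in OS_RULES.
    """
    claimed = os_from_user_agent(ua)
    if claimed == "unspecified":
        return "unspecified"
    return "match" if claimed == fingerprint_os else "mismatch"
```

A request that claims MSIE 6 on Windows but fingerprints as Linux comes back as a mismatch, which is exactly the kind of spoofed traffic the two charts disagree about.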
Attack Geolocation Statistics
We can’t talk about attacks without spending some time on where those attacks came from. There are misconceptions that attacks only come from red-flag countries and by blocking them you’re now safe, or that one can quickly identify and block based on location. The data below provides better insight.
Let’s look at the stats:
The majority of attacks come from the United States, followed by Indonesia, China, and Canada. In fact, California alone accounts for 11% of all attacks, more than any country other than the United States.
So, even though partial geo-blocking may be effective as a noise reducer, it won't really stop most attacks unless you are willing to block the USA.
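If you are weighing a geo-blocking rule, it helps to measure how much of your own attack traffic it would actually remove. This is a small sketch of that arithmetic. The function name and the country codes are illustrative assumptions, and the counts in the example are placeholders, not the figures from this analysis.

```python
def blocked_share(attacks_by_country, blocked_countries):
    """Percentage of attack requests that a country block list would stop."""
    total = sum(attacks_by_country.values())
    if total == 0:
        return 0.0
    blocked = sum(
        count
        for country, count in attacks_by_country.items()
        if country in blocked_countries
    )
    return 100.0 * blocked / total


# Placeholder counts for illustration only; substitute your own log totals.
example_counts = {"US": 500, "ID": 200, "CN": 200, "CA": 100}
share = blocked_share(example_counts, {"ID", "CN"})
print(f"A block list of ID and CN would stop {share:.0f}% of these requests")
```

Running it against real numbers makes the trade-off concrete: whatever share the list does not cover still reaches the site.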
What Does the Data Teach Us?
First, this data shows us that attacks are very diverse. You can't just block attackers using one specific bit of data without looking at the complete picture.
Second, user agents cannot be trusted and can be deceiving. Do not base decisions solely on that data.
Lastly, sometimes what we think we know is further from reality than we realize.