Web Crawler & User Agent Blocking Techniques


The following simple script lets attackers block specific crawlers based on the user-agent sent with each request. Attackers use it to keep certain traffic from loading certain content, usually a phishing page or a malicious download.

if(preg_match('/bot|crawler|spider|facebook|alexa|twitter|curl/i', $_SERVER['HTTP_USER_AGENT'])) {
    logger("[BOT] {$_SERVER['REQUEST_URI']} - 500");

    header('HTTP/1.1 500 Internal Server Error');
    exit();
}

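Using preg_match with the case-insensitive i modifier, the script looks for known crawler strings anywhere in the user-agent. If it finds a match, it records the request through logger (not a PHP built-in, so presumably a helper defined elsewhere in the attacker's code) and returns a 500 Internal Server Error to the detected crawler instead of serving the page. It does this through the header function, which sets the status line and headers of the outgoing HTTP response, and then calls exit() so that no page content is sent.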
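To see which clients this check would turn away, the same pattern can be reproduced outside PHP. The following is a minimal sketch in Python, not part of the original sample; the function name would_be_blocked is hypothetical, and re.search is used because preg_match also matches anywhere in the string.

```python
import re

# Same alternation as the PHP sample; re.IGNORECASE mirrors the /i modifier.
CRAWLER_PATTERN = re.compile(
    r"bot|crawler|spider|facebook|alexa|twitter|curl", re.IGNORECASE
)


def would_be_blocked(user_agent: str) -> bool:
    """Return True if the sample's check would answer this client with a 500.

    Hypothetical helper for illustration; it only reproduces the regex test.
    """
    return CRAWLER_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like Mac OS X) AppleWebKit/601.1 "
        "(KHTML, like Gecko) CriOS/47.0.2526.70 Mobile/13C71 Safari/601.1.46",
        "curl/7.68.0",
    ]
    for user_agent in samples:
        print(would_be_blocked(user_agent), user_agent)
```

The Googlebot and curl strings match, while the iPhone browser string does not. Because the pattern is a plain substring test, any user-agent that happens to contain one of these words is treated as a crawler.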

You can verify this behavior in the website's HTTP access logs. Here is a request from Googlebot, which receives a 500 Internal Server Error, followed by a request from an iPhone that succeeds (200 response code instead of 500):

127.0.0.1 - - [09/Jul/2020:11:36:52 -0500] "GET /test.php HTTP/1.1" 500 185 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

127.0.0.1 - - [09/Jul/2020:11:37:10 -0500] "GET /test.php HTTP/1.1" 200 147 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like Mac OS X) AppleWebKit/601.1 (KHTML, like Gecko) CriOS/47.0.2526.70 Mobile/13C71 Safari/601.1.46"
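Log entries like the two above can also be checked in bulk. The sketch below assumes the access log uses the common "combined" layout that these two lines appear to follow, and flags any path that answered 200 to some clients and a 5xx code to others. The function names statuses_by_path and mixed_status_paths are hypothetical, and a flagged path is only a lead to investigate, not proof of cloaking.

```python
import re
from collections import defaultdict

# Assumed layout: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def statuses_by_path(lines):
    """Map each request path to the set of (status, user-agent) pairs seen."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            seen[match["path"]].add((int(match["status"]), match["agent"]))
    return seen


def mixed_status_paths(lines):
    """Return paths that served 200 to some clients and 5xx to others."""
    flagged = {}
    for path, pairs in statuses_by_path(lines).items():
        codes = {status for status, _ in pairs}
        if 200 in codes and any(500 <= code <= 599 for code in codes):
            flagged[path] = sorted(pairs)
    return flagged
```

Passing an open log file (or any iterable of lines) to mixed_status_paths returns a dictionary keyed by path, with the status and user-agent pairs that disagreed.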

Scripts like this can also keep bots or users from determining whether a phishing page or malicious download still exists.
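One way to check a suspicious URL on a site you are responsible for is to request it with more than one user-agent and compare the responses. The sketch below is an illustration under stated assumptions: the two user-agent strings are examples, the function names fetch_status and compare_user_agents are hypothetical, and the fetch parameter exists only so the comparison logic can be exercised without network access.

```python
import urllib.error
import urllib.request

# Example user-agents: one crawler-like, one browser-like.
USER_AGENTS = {
    "crawler": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "browser": (
        "Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like Mac OS X) AppleWebKit/601.1 "
        "(KHTML, like Gecko) CriOS/47.0.2526.70 Mobile/13C71 Safari/601.1.46"
    ),
}


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this user-agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful.
        return error.code


def compare_user_agents(url, fetch=fetch_status):
    """Request url once per user-agent; report the statuses and whether they differ."""
    statuses = {label: fetch(url, agent) for label, agent in USER_AGENTS.items()}
    return statuses, len(set(statuses.values())) > 1
```

Differing status codes for the same URL suggest user-agent filtering of the kind shown above. Matching codes do not rule out cloaking, since a script could filter on other request properties as well.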

To detect and prevent these issues, we highly recommend file integrity monitoring and clean backups of your files and database. If your website is compromised, these let you identify indicators of compromise and malicious behavior within your environment.
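The idea behind file integrity monitoring can be sketched in a few lines: record a hash of every file while the site is known to be clean, then compare later snapshots against that baseline. This is a minimal illustration, not a replacement for a monitoring product; the function names snapshot and diff_snapshots are hypothetical, and it reads each file fully into memory, which is only reasonable for small sites.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(baseline, current):
    """Return the files added, removed, or modified since the baseline."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(name for name in shared if baseline[name] != current[name]),
    }
```

A baseline taken from a clean backup and stored away from the web server is harder for an attacker to tamper with. An unexpected entry under "added" or "modified", such as a new PHP file, is a starting point for investigation.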

Luke Leal

Luke Leal is a member of the Malware Research team and joined the company in 2015. Luke's main responsibilities include threat research and malware analysis, which is used to improve our tools. His professional experience covers over eight years of deobfuscating malware code and using unique data from it to help in correlating patterns. When he’s not researching infosec issues or working on websites, you might find Luke traveling and learning about new things. Connect with him on Twitter.
