Nobody likes spam. It’s never fun (unless you’re watching Monty Python). For us it comes with the territory; removing SEO spam has been at the core of what we deal with since our inception, giving us some pretty good insights into the various strategies black hats employ. From time to time, however, we find ourselves with a curious case, and in this instance it was a website built on the Drupal CMS that was suffering from conditional SEO spam.
When the site was accessed with Googlebot as the user agent, it would deliver a page filled with spammy content, specifically basketball shoes, erectile dysfunction pills, Viagra and other similar pharmaceutical placements. These kinds of spam injections are normally quick and dirty. In this particular case, however, the infection was very specific to Drupal and employed several methods to hide its tracks.
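If you want to reproduce this kind of check yourself, here is a minimal Python sketch of the idea: fetch the same URL once as a regular browser and once as Googlebot, then list the keywords that only show up for the crawler. The function names, the browser user agent string, and the keyword list are our own choices for illustration, not part of any tool mentioned in this post. Spoofing the user agent only exercises that one condition; it will not trigger checks based on Google's IP addresses.

```python
import urllib.request

# Googlebot's published desktop user agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Any ordinary browser string will do here; this one is an arbitrary example.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"


def fetch(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent header and return the body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def cloaked_keywords(browser_body, bot_body, keywords):
    """Return the keywords that appear only in the page served to the crawler."""
    browser_text = browser_body.lower()
    bot_text = bot_body.lower()
    return [
        keyword
        for keyword in keywords
        if keyword.lower() in bot_text and keyword.lower() not in browser_text
    ]
```

A typical call would be `cloaked_keywords(fetch(url, BROWSER_UA), fetch(url, GOOGLEBOT_UA), ["viagra", "jordans"])`. A non-empty result is a hint that the site serves different content to crawlers, not proof on its own.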
Spam, Spam, Jordans, Spam, Viagra, Spam
Here is a snapshot of the type of content the website owner was dealing with:
What a pain, right? This was impacting the website’s rankings on Google search, including changing some of the search result listings. A quick check showed us how bad it was:
The infection was on a Drupal site. Compared to the way WordPress is structured, Drupal is a monster! There are lots of included files, modules, nested sites… a lot of places for malware to hide.
The first step we take is running our tools to assess what we’re working with and remove the obvious infections. We then analyze any possible anomalies in the code. The automated tools remove a lot, including backdoors, mailers, and spam files. They also get rid of some malicious code in the database. Unfortunately, this was not enough: the spam was still there. My initial thought was that the results might be cached.
Cache is disabled. Spam is still there.
I checked the code anomalies and possible malware warnings identified during our initial review. I found a few hidden backdoors, but no spam. At this point, the investigation gets more interesting. Time to check if we missed anything in the code. Performing a quick diff between a fresh copy of the same version of Drupal and the infected site should point out the bad guy.
All clear. No changes. Nada.
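For readers who want to run the same kind of comparison, here is a rough Python sketch of a hash-based diff between two directory trees. It is not the tooling we used; `hash_tree` and `compare_trees` are names made up for this example, and it assumes you have a fresh copy of the exact same Drupal version unpacked somewhere.

```python
import hashlib
import os


def hash_tree(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root)
            with open(full_path, "rb") as handle:
                digests[rel_path] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def compare_trees(clean_root, suspect_root):
    """Report files that were modified, added, or are missing in suspect_root."""
    clean = hash_tree(clean_root)
    suspect = hash_tree(suspect_root)
    return {
        "modified": sorted(p for p in clean if p in suspect and clean[p] != suspect[p]),
        "added": sorted(p for p in suspect if p not in clean),
        "missing": sorted(p for p in clean if p not in suspect),
    }
```

Keep in mind that a clean result only covers the files under the two roots you compared. It says nothing about the database or about files living elsewhere on the server.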
Debugging Drupal Core Files
Time to roll up our sleeves and debug Drupal! For you, dear reader, who may not be familiar with Drupal, the index.php looks like this:
All the rendering starts with the drupal_bootstrap function, so this will be the target of our analysis. This function lives inside /includes/bootstrap.inc and handles the different bootstrap phases, such as DRUPAL_BOOTSTRAP_FULL.
Let’s take a quick look at that function:
The drupal_bootstrap function uses a switch statement to decide what to run for each phase. Most of the cases call other functions inside /includes/bootstrap.inc, except for DRUPAL_BOOTSTRAP_SESSION and DRUPAL_BOOTSTRAP_FULL, which require external files. While the second instance of require_once is static (it includes code from /includes/common.inc), the first instance is variable.
That first instance is using the variable_get function to fetch the path to the included file. This function will check if the variable is set (session_inc in this case). If the variable isn’t set, it will return a default value provided as a second argument (includes/session.inc in this case).
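To make the risk concrete, here is a tiny Python model of that lookup. It is not Drupal code. It only mimics the "stored value or default" behavior described above, with a plain dictionary standing in for Drupal's variable storage and a hypothetical helper name, `session_include_path`.

```python
def variable_get(variables, name, default):
    """Return the stored value for name, or default when nothing is set."""
    value = variables.get(name)
    return default if value is None else value


def session_include_path(variables):
    """Path that the session bootstrap phase would end up including."""
    return variable_get(variables, "session_inc", "includes/session.inc")
```

With an empty dictionary this returns includes/session.inc. As soon as someone stores a session_inc value, that value wins, and whatever file it points to gets loaded on every request that reaches the session phase.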
Since we know that all core files are good, includes/session.inc and /includes/common.inc are off the hook in this case, but where is session_inc being stored?
Checking Drupal Database Tables
The next step is to check what’s stored in the variable table under the session_inc name:
We found our first suspect: DRUPAL_BOOTSTRAP_SESSION is including /tmp/.ICE-unix (the path is relative, but it translates to this). Seems legit, right? Looking a little closer I found another issue with this specific site – it was running with root privileges, giving the attackers full access to the server files.
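A check like this is easy to script. The sketch below, in Python, takes (name, value) rows exported from the variable table and flags include-path overrides. It rests on a few assumptions of ours: values are PHP-serialized strings already decoded to text, any variable whose name ends in `_inc` is treated as an include path, and only the session_inc default mentioned in this post is known. How you export the rows from your database is up to you.

```python
import re

# PHP serializes a string as s:<byte length>:"<value>";
PHP_STRING = re.compile(r'^s:\d+:"(.*)";$', re.DOTALL)

# Only the default named in this post; extend it for your own Drupal version.
KNOWN_DEFAULTS = {"session_inc": "includes/session.inc"}


def unserialize_string(raw):
    """Extract the value of a PHP-serialized string, or None if it is not one."""
    match = PHP_STRING.match(raw)
    return match.group(1) if match else None


def suspicious_includes(rows):
    """Flag *_inc variables whose stored value is not the known default.

    Variables without a known default are flagged too, since any override
    of an include path deserves a manual look.
    """
    findings = []
    for name, raw_value in rows:
        if not name.endswith("_inc"):
            continue
        path = unserialize_string(raw_value)
        if path is None or path != KNOWN_DEFAULTS.get(name):
            findings.append((name, raw_value if path is None else path))
    return findings
```

Anything this returns is a lead to investigate rather than a verdict. Some sites legitimately override include files, for example to plug in an alternative session backend.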
Checking the Suspicious File
In the first block of code we can see that the malware is specific to Drupal. The malicious code is designed to avoid breaking the site by including the original session.inc file. Next, it defines a function that follows the Drupal naming convention, which the attacker used to sneak a backdoor into the site:
As we continue our investigation, we come to another interesting part of the code, showing that the attacker had to hard-code the site’s name in the spam section. All the spam pages are dynamic and built from the response of 95cdn.com/n/natural.php. A spam page is only generated if Googlebot is the user agent, or if the site was reached from a Google IP address. If the referrer was Google, AOL, or Ask, the visitor would be redirected to another malicious page:
If no conditions match, it throws a 404 Not Found message and redirects visitors back to the home page.
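The decision logic described above can be summarized in a few lines. The Python sketch below is our own model of the behavior, not the attacker's code. The function name, the return labels, and the way the referrer host is matched are assumptions, and the Google IP check is reduced to a boolean because we are not reproducing the ranges the malware used.

```python
from urllib.parse import urlparse

# Search engines whose referrers triggered the redirect, per the analysis.
SEARCH_ENGINES = ("google", "aol", "ask")


def classify_request(user_agent, referrer, from_google_ip):
    """Return which branch of the malicious logic a request would hit."""
    # Branch 1: crawler user agent or a Google IP gets the spam page.
    if "googlebot" in user_agent.lower() or from_google_ip:
        return "spam_page"
    # Branch 2: visitors arriving from a search engine get redirected.
    host = urlparse(referrer).hostname or ""
    if any(engine in host.lower().split(".") for engine in SEARCH_ENGINES):
        return "redirect"
    # Branch 3: everyone else sees a 404 and is sent back to the home page.
    return "not_found"
```

Seen this way, it is clear why the site owner and regular visitors never noticed anything. Only the first two branches expose the payload, and both depend on Google or another search engine being involved.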
Now that we are 1000% sure this file is malicious, we proceed with the cleanup, which consists of deleting session_inc from the variable table, as well as deleting the content of the cache_bootstrap table (which will be repopulated automatically). After that, it’s safe to delete the malicious file.
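As a rough illustration of the database part of that cleanup, here is a Python sketch using the standard sqlite3 module as a stand-in for whatever database your site runs on. It assumes unprefixed table names and that the bootstrap cache table is called `cache_bootstrap`, as in a stock Drupal 7 schema. The function name is ours, and a MySQL driver would use a different placeholder style than `?`. Take a backup before running anything like this.

```python
import sqlite3


def remove_session_inc_override(connection, cache_table="cache_bootstrap"):
    """Delete the session_inc variable and empty the bootstrap cache table.

    Returns the number of rows removed from the variable table.
    """
    # The table name is interpolated into SQL, so only accept a plain identifier.
    if not cache_table.isidentifier():
        raise ValueError("unexpected table name: %r" % cache_table)
    cursor = connection.cursor()
    cursor.execute("DELETE FROM variable WHERE name = ?", ("session_inc",))
    removed = cursor.rowcount
    # Cached copies of the variables live here; the table gets refilled later.
    cursor.execute(f"DELETE FROM {cache_table}")
    connection.commit()
    return removed
```

Clearing the cache matters because the variables are cached. Removing the row alone could leave the poisoned value in use until the cache expires.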
Attackers are always finding new ways to inject their code into vulnerable sites so that their SEO spam campaigns remain active as long as possible, evading detection by minimizing the effect on actual visitors while still targeting the Search Engine Result Pages (SERP). As we can see here, the attackers targeted Google very specifically: if the visitor was using the Googlebot user agent, or coming from Google’s IPs, the site would present its payload. For this type of spam campaign, an infected site that is offline is worth no more to the attackers than a clean one.
When looking for malware/spam on your site, don’t focus solely on encoding and obfuscation methods like base64 and gzinflate. Look for anomalies, misplaced files, and anything that just doesn’t seem right. Yeah, it’s a little paranoid… but that’s how we roll.