Finding Conditional Drupal Database Spam

Nobody likes spam. It’s never fun (unless you’re watching Monty Python). For us it comes with the territory: removing SEO spam has been at the core of what we do since our inception, which has given us good insight into the strategies black hats employ. From time to time, however, we run into a curious case. This one involved a website built on the Drupal CMS that was suffering from conditional SEO spam.

When the site was accessed with Googlebot as the user agent, it delivered a page filled with spammy content: basketball shoes, erectile dysfunction pills, Viagra, and similar pharmaceutical placements. These kinds of spam injections are normally quick and dirty. In this case, however, the infection was very specific to Drupal and employed several methods to hide its tracks.
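A quick way to confirm this kind of conditional behavior is to request the same URL twice, once with a crawler user agent and once with a browser one, and compare what comes back. The following is a minimal sketch, not the tooling used in this investigation: the keyword list, the user-agent strings, and the function names are all assumptions made for illustration.

```python
import urllib.request

# Illustrative user-agent strings (assumptions, not taken from the infected site).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"

# Hypothetical keyword list based on the spam themes described above.
SPAM_KEYWORDS = ("jordans", "viagra", "erectile dysfunction")


def fetch(url, user_agent, timeout=10):
    """Download a page while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def spam_hits(html, keywords=SPAM_KEYWORDS):
    """Return the set of keywords that appear in the page (case-insensitive)."""
    lowered = html.lower()
    return {keyword for keyword in keywords if keyword in lowered}


def cloaked_keywords(bot_html, browser_html, keywords=SPAM_KEYWORDS):
    """Keywords that show up only in the response served to the crawler."""
    return spam_hits(bot_html, keywords) - spam_hits(browser_html, keywords)


# Example use against a site you are authorized to test:
#   url = "https://example.com/"
#   print(cloaked_keywords(fetch(url, GOOGLEBOT_UA), fetch(url, BROWSER_UA)))
```

A non-empty result suggests the server is treating crawlers differently. An empty one proves little, since the malware may also key on the visitor's IP address or referrer.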

Spam, Spam, Jordans, Spam, Viagra, Spam

Here is a snapshot of the type of content the website owner was dealing with:

Screenshot of one of the 91,300 spam results indexed by Google

What a pain, right? This was impacting the website’s rankings on Google search, including changing some of the search result listings. A quick check showed us how bad it was:

Google search results page with some of the spammy links

The infection was on a Drupal site. Compared to the way WordPress is structured, Drupal is a monster! There are lots of included files, modules, nested sites… a lot of places for malware to hide.

The first step we take is running our tools to assess what we’re working with and remove the obvious infections. We then analyze any possible anomalies in the code. The automated tools remove a lot, including backdoors, mailers, and spam files. They also get rid of some malicious code in the database. Unfortunately, this was not enough: the spam was still there. My initial thought was that the results might be cached.

Cache is disabled. Spam is still there.

I checked the code anomalies and possible malware warnings identified during our initial review. I found a few hidden backdoors, but no spam. At this point, the investigation gets more interesting. Time to check if we missed anything in the code. Performing a quick diff between a fresh copy of the same version of Drupal and the infected site should point out the bad guy.
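The comparison against a fresh copy can be sketched as a hash-based tree diff. This is only an illustration of the idea, assuming you have unpacked a clean copy of the same Drupal version next to the site; the function names are made up for this example.

```python
import hashlib
import os


def file_hashes(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            with open(full_path, "rb") as handle:
                hashes[relative] = hashlib.sha256(handle.read()).hexdigest()
    return hashes


def compare_trees(clean_root, site_root):
    """Report files that differ from, are missing from, or are extra to the clean copy."""
    clean = file_hashes(clean_root)
    site = file_hashes(site_root)
    return {
        "modified": sorted(p for p in clean if p in site and clean[p] != site[p]),
        "missing": sorted(p for p in clean if p not in site),
        "extra": sorted(p for p in site if p not in clean),
    }
```

Expect the "extra" list to contain legitimate contributed modules, themes, and uploads, so it needs a manual review. A diff like this only covers files under the site root, which is why a clean result does not rule out an infection that lives in the database or elsewhere on the server.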

All clear. No changes. Nada.

Debugging Drupal Core Files

Time to roll up our sleeves and debug Drupal! For you, dear reader, who may not be familiar with Drupal, the index.php looks like this:

Drupal’s index.php file

All the rendering starts with the drupal_bootstrap function, so this will be the target of our analysis. This function lives inside /includes/bootstrap.inc and handles the different bootstrap phases, such as DRUPAL_BOOTSTRAP_FULL.

Let’s take a quick look at that function:

Bootstrap phases on Drupal’s bootstrap.inc

The drupal_bootstrap function uses a switch statement to decide where to go next. Most of the phases call other functions inside /includes/bootstrap.inc. The exceptions are DRUPAL_BOOTSTRAP_SESSION and DRUPAL_BOOTSTRAP_FULL, which require external files. While the second instance of require_once is static (it includes code from /includes/common.inc), the first instance is variable.

That first instance is using the variable_get function to fetch the path to the included file. This function will check if the variable is set (session_inc in this case). If the variable isn’t set, it will return a default value provided as a second argument (includes/session.inc in this case).
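To make the lookup concrete, here is a small Python model of the behavior just described. It is a sketch of the logic only, not Drupal's PHP code: the dictionary stands in for the stored variables, and the attacker path in the usage line is a made-up placeholder.

```python
def variable_get(conf, name, default=None):
    """Return the stored value for `name`, or `default` when it is not set."""
    return conf[name] if name in conf else default


def session_include_path(conf):
    """Path the session phase would include, per the description above."""
    return variable_get(conf, "session_inc", "includes/session.inc")


# With no override stored, the core file is used:
#   session_include_path({})
# With an override stored, whatever path it holds is included instead:
#   session_include_path({"session_inc": "some/attacker/controlled/path"})
```

The point of the model is that a single stored value is enough to change which file gets included on every request, without touching any core file.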

Since we know that all core files are good, includes/session.inc and /includes/common.inc are off the hook in this case, but where is session_inc being stored?

Checking Drupal Database Tables

The next step is to check what’s stored in the variable table under the session_inc name:

Database row storing the path to be included by bootstrap.inc

BINGO!

We found our first suspect: DRUPAL_BOOTSTRAP_SESSION is including /tmp/.ICE-unix (the stored path is relative, but it resolves to this). Seems legit, right? Looking a little closer, I found another issue with this specific site: it was running with root privileges, giving the attackers full access to the server files.
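A check like this can be automated by scanning the variable table for include-path settings that resolve outside the site root. The sketch below makes several assumptions: it uses sqlite3 as a runnable stand-in for the site's real database, it assumes the table has name and value columns with PHP-serialized values, it treats any variable whose name ends in _inc as an include path, and the helper names are invented for this example.

```python
import posixpath
import re
import sqlite3

# Matches a PHP-serialized string such as s:3:"abc";
_PHP_STRING = re.compile(rb'^s:(\d+):"(.*)";$', re.DOTALL)


def php_unserialize_string(raw):
    """Decode a PHP-serialized string; return None for any other value type."""
    if isinstance(raw, str):
        raw = raw.encode("utf-8")
    match = _PHP_STRING.match(raw)
    if not match or int(match.group(1)) != len(match.group(2)):
        return None
    return match.group(2).decode("utf-8", errors="replace")


def escapes_root(path):
    """True when a path is absolute or climbs above the directory it is relative to."""
    normalized = posixpath.normpath(path)
    return posixpath.isabs(normalized) or normalized == ".." or normalized.startswith("../")


def suspicious_includes(conn):
    """List (name, path) pairs for *_inc variables that point outside the site root."""
    findings = []
    for name, value in conn.execute("SELECT name, value FROM variable"):
        if not name.endswith("_inc"):
            continue
        path = php_unserialize_string(value)
        if path is not None and escapes_root(path):
            findings.append((name, path))
    return sorted(findings)
```

Against a real site you would swap the sqlite3 connection for a client that talks to the site's database. The SELECT statement is the part that carries over.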

Checking the Suspicious File

In the first block of code we can see that the malware is specific to Drupal. The malicious code is designed to avoid breaking the site by including the original session.inc file. Next, it creates a function that matches the Drupal naming convention. It used this to sneak a backdoor into the site:

Malware also plays it safe.

As we continue our investigation, we come to another interesting part of the code, which shows that the attacker had to hard-code the site’s name in the spam section. All the spam pages are dynamic and built from the response of 95cdn.com/n/natural.php. A spam page is only generated if Googlebot is the user agent or if the site was reached from a Google IP address. If the referrer was Google, AOL, or Ask, the visitor was redirected to another malicious page:

Both malicious addresses were registered using GoDaddy on the same date and both are using CloudFlare.

If none of these conditions match, it returns a 404 Not Found message and sends visitors back to the home page.
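The branching described above can be summarized as a small decision function, which is handy when writing tests for a detection rule. This is a sketch of the described behavior, not the malware's code: the order of the checks, the function names, and the crawler network list passed in by the caller are all assumptions.

```python
import ipaddress
from urllib.parse import urlparse

# Search engines named in the description above.
REFERRER_ENGINES = ("google", "aol", "ask")


def referrer_engine(referrer):
    """Return the matching engine name if the referrer host contains it as a label."""
    host = (urlparse(referrer).hostname or "").lower()
    labels = host.split(".")
    return next((engine for engine in REFERRER_ENGINES if engine in labels), None)


def classify_request(user_agent, remote_ip, referrer, crawler_networks):
    """Model which branch a request would hit: spam page, redirect, or 404."""
    address = ipaddress.ip_address(remote_ip)
    from_crawler_ip = any(address in network for network in crawler_networks)
    if "googlebot" in user_agent.lower() or from_crawler_ip:
        return "spam_page"
    if referrer_engine(referrer) is not None:
        return "redirect"
    return "not_found"
```

For example, with a placeholder network such as ipaddress.ip_network("192.0.2.0/24") standing in for the crawler ranges, a request carrying a Googlebot user agent is classified as "spam_page" regardless of its source address.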

Now that we are 1000% sure this file is malicious, we proceed with the cleanup. This consists of deleting session_inc from the variable table and emptying the cache_bootstrap table (its contents will be recreated automatically). After that, it’s safe to delete the malicious file.
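The database side of the cleanup comes down to two DELETE statements. The sketch below runs them in one transaction, using sqlite3 purely as a runnable stand-in for the site's real database. The bootstrap cache table is assumed to be named cache_bootstrap, as in a stock Drupal 7 schema, and the function name is made up for this example. Take a database backup before running anything like this on a live site.

```python
import sqlite3


def clean_session_override(conn):
    """Remove the session_inc override and empty the bootstrap cache in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        removed = conn.execute(
            "DELETE FROM variable WHERE name = ?", ("session_inc",)
        ).rowcount
        conn.execute("DELETE FROM cache_bootstrap")
    return removed
```

Clearing the cache matters because the variables are cached: deleting the row alone could leave the malicious path in effect until the cache is rebuilt.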

Conclusion

Attackers are always finding new ways to inject their code into vulnerable sites so that their SEO spam campaigns stay active for as long as possible. They evade detection by minimizing the effect on actual visitors while still targeting the Search Engine Result Pages (SERP). As we can see here, the attackers were very specific to Google: if the visitor was using the Googlebot user agent, or coming from one of Google’s IPs, the malware would present its payload. For this type of spam campaign, an infected site that is offline is worth no more to the attacker than a clean one.

When looking for malware/spam on your site, don’t focus solely on encoding and obfuscation methods like base64 and gzinflate. Look for anomalies, misplaced files, and anything that just doesn’t seem right. Yeah, it’s a little paranoid… but that’s how we roll.
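One simple anomaly check in that spirit is to look for PHP code hiding in files that do not look like PHP files, which is what /tmp/.ICE-unix turned out to be. The sketch below is a heuristic only: the list of expected extensions is an assumption and the function name is invented for this example.

```python
import os

# Extensions where PHP code is expected on a Drupal site (an assumed, non-exhaustive list).
PHP_EXTENSIONS = {".php", ".inc", ".module", ".install", ".engine", ".theme", ".profile", ".test"}


def misplaced_php(root):
    """List files under root that contain a PHP open tag but lack a PHP-style extension."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            extension = os.path.splitext(name)[1].lower()
            if extension in PHP_EXTENSIONS:
                continue
            try:
                with open(full_path, "rb") as handle:
                    head = handle.read(4096)  # only inspect the start of the file
            except OSError:
                continue  # unreadable files, sockets, etc. are skipped
            if b"<?php" in head:
                hits.append(os.path.relpath(full_path, root).replace(os.sep, "/"))
    return sorted(hits)
```

Pointing it at directories outside the document root, such as /tmp, covers the case seen here, where the malicious file did not live inside the site at all.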

As always, if you need help dealing with website infections, we are here to help. We also have a DIY guide you can follow to help you fix Drupal database spam.

3 comments
  1. This isn’t an instance of site spam, someone hacked the site to place that custom file there in the first place and to update the {variable} table. At this point you need to start looking at identifying & fixing the security hole in your site that allowed it to happen in the first place.

    1. Hey Damien, you are completely right! The site was attacked and a lot of files were uploaded to maintain access and to inject SEO spam.
      However, we don’t always have access to logs or to the whole server for a forensic analysis. In this particular case, the site was on a VPS and we had root-level access, but Apache logging was disabled. That’s the reason I didn’t cover how it was infected. I could speculate on that, but I’d rather focus on the malware itself.

  2. Quite a splendid summary. Though just like to point out that /tmp is writeable on nearly all shared hosting servers. Root access is not required to write to /tmp.

    This just goes to show that sometimes we need to check outside the account for situations like this one, where there’s an almost 100% certainty the malware does not reside ‘in the document root’.
