Whitespace Steganography Conceals Web Shell in PHP Malware

Last November, we wrote about how attackers are using JavaScript injections to load malicious code from legitimate CSS files.

At first glance, these injections didn’t appear to contain anything except for some benign CSS rules. A more thorough analysis of the .CSS file revealed 56,964 seemingly empty lines containing combinations of invisible tab (0x09), space (0x20), and line feed (0x0A) characters, which were converted to binary representation of characters and then to the text of an executable JavaScript code.

It didn’t take long before we found the same approach used in PHP malware. Here’s what our malware analyst Liam Smith discovered while recently working on a site containing multiple backdoors and webshells uploaded by hackers.

Suspicious license.php file

One of the files Liam found looked a bit strange: system/license.php.

As the filename implies, it contains text for a license agreement — more specifically, text for GNU General Public License version 3.

The license text is placed inside a multi-line PHP comment. However, on line 134 we see a gap between two comments that contains executable PHP code.

PHP Code in fake license file — PHP code inside license.php

Hiding malicious code between comment blocks is a common obfuscation technique used by hackers.

The code is obviously malicious but, after first glance, it was not clear whether the malware was complete or had any other parts in that file. It’s easy to notice that it tries to read itself and do something with its contents using file_get_contents(basename($_SERVER[‘PHP_SELF’]))). A quick visual inspection of the file, however, didn’t reveal any other sections that can be converted into working PHP code.

Analysis of the Visible Malicious Code

To understand what exactly the malicious code does, we analyzed each statement piece by piece.

The first clause splits the file contents into sections divided by semicolon characters and assigns the last section to the $cache variable. In plain words, this code works with the part of the file found after the last “;”.

It turns out that the final semicolon in the file is also the last character of the license agreement: “But first, please read <http://www.gnu.org/philosophy/why-not-lgpl.html>;”. In the original license agreement the last character is a period, indicating that the file was intentionally modified by the attacker for this clause.

There is nothing clearly visible after this last semicolon. To understand what’s being done with the trailing part of the file, we need to analyze the next section of malware clauses.

Whitespace Decoder

for($i=0;$i<strlen($cache);$i++){

  $out.=chr(bindec(str_replace(array(chr(9),chr(32)),array('1','0'),substr($cache,$i,8))));

  $i+= 7;

}

Here we can see that the code reads the rest of the file in chunks of eight characters at a time (substr($cache,$i,8)) and converts tabs (9) and spaces (32) into ones and zeroes. The resulting binary string is then converted to a decimal number (bindec) and then to a character using the chr() function. This way, octet by octet, the rest of the file’s whitespaces are converted into a visible string.

At this point, the string is neither meaningful nor executable. To completely decode and execute the payload, the following combination of functions are used : ‘base64_decode(str_rot13(gzdecode(…‘.

As a backup way to execute the payload, the malware also tries to save the decoded contents to a file named “ “ (just a space — to make it less visible in file listings) and includes it on the fly using include $cachepart;. Afterwards, this file is deleted in an attempt to evade detection.

That being said, we have reports that for some reason some of these blank-named files can still be found on compromised sites.

Revealing the Hidden Payload

Now that we know that this malware is looking for any tabs and spaces after the last semicolon, lets find and decode that hidden payload.

As it turns out, there are almost 300 Kilobytes of invisible tabs and spaces at the end of the last line in license.php file — compared to only 30 Kb in the visible license text.

These invisible characters can be revealed if you check the hex code of the last line or select content after the last “;” in a text editor (with word wrapping on).

Hex view of code contents found in fake license file — Hex view of the last line of license.php

Invisible content highlighted — Beginning of the invisible content selected in a text editor

While this picture resembles Morse code just like the similar obfuscation found in JavaScript malware that described last November, this sample doesn’t use line feed characters (0x0A). This means that the invisible content doesn’t create a ridiculous amount of suspicious empty lines at the bottom of the files.

When we use the malware’s algorithm to decode the whitespace, we get a 74Kb file of a web shell that provides hackers with tools to work with files and databases on the server, collect sensitive information, infect files, and conduct brute force attacks. It can also work as a server console or anonymizer to hide the attackers real IP address.

Decoded web shell found hidden in fake file — Web shell decoded from the whitespace

Origin of the Algorithm

As frequently revealed in our investigations, many of the tricks and algorithms found in malware are not created by hackers. Much of the code is pre-existing and simply copied from sites like StackOverflow.

A quick search for the code snippet of this particular whitespace decoder revealed a 2019 article on the popular Russian language IT community site Habr.ru. The article’s author shared their proof of concept for PHP whitespace obfuscation, which had been inspired by a previous article on obfuscation published back in 2011 that discussed the concept of encoding using only tabs and spaces.

The malware author simply took the whitespace decoder part from that article without any modifications or changes, then added some code to work with additional layers of obfuscation and to execute the decoded payload.

Malicious Uploader

It’s quite rare to find only one type of a backdoor on a compromised server. There are usually several varieties responsible for specific tasks.

For example, the inconspicuously named license.php file is intended to stay under the radar for a long time, providing access to the compromised site even when other malware is found and removed.

The files or code that attackers initially plant on a server in order to infect the site are another type of backdoor. Usually found as a small file in the compromised environment, these backdoors either allow attackers to execute arbitrary code or create specific files. They don’t even need to be very stealthy or heavily obfuscated — hackers often delete them after use to cover up their tracks.

In this particular case, we found malware uploaders which create fake license.php files and inject malware into .htaccess and index.php files.

Conclusion

While making malicious content invisible to a naked eye seems like a good idea, the whitespace obfuscation used in this malware is far from ideal. It contains an easily detectable section of PHP — and removing it renders the invisible payload unusable. Another downside of this approach is the bloated size of the file. The malware increased the size of the file 10 times — making it significantly more suspicious.

Obfuscation techniques are commonly used by hackers to hide code and conceal malicious behavior. There are hundreds of known types of obfuscations, and attackers are always looking for new ways to avoid detection.

The good news for webmasters is you don’t have to be able to decode or understand exactly how they work to find and remove malware. A simple integrity control solution is enough to detect unwanted modifications in your files.

Whenever you review changes and are not sure whether it’s malicious or not, the safest approach is to revert the file to the known clean version — you have a backup, right?