Google Bots Doing SQL Injection Attacks

One of the things we have to be very careful about when writing rules for our CloudProxy Website Firewall is to never block any major search engine bot (e.g., Google, Bing, Yahoo).

To date, we’ve been pretty good about this, but every now and then you come across a unique scenario like the one in this post that makes you scratch your head and think: what if a legitimate search engine bot were being used to attack the site? Should we still allow the attack to go through?

This is exactly what happened a few days ago on a client site; we began blocking requests from certain Google IP addresses because they were, in fact, SQL injection attacks. Yes, Google bots were actually attacking a website.

The Requests

It all started when we saw a real Google IP address being blocked for SQL injection. This is what the audit logs showed (slightly modified to protect the innocent):

66.249.66.138 - - [05/Nov/2013:00:28:40 -0500] "GET /url.php?variable=")%20declare%20@q%20varchar(8000)%20select%20@q%20=%200x527%20exec(@q)%20-- HTTP/1.1" 403 4439 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
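URL-decoded, the query string is a classic Microsoft SQL Server (T-SQL) injection attempt: declare a varchar variable, assign it a hex-encoded string (the 0x527 value appears truncated in the log), and run it with exec():

") declare @q varchar(8000) select @q = 0x527 exec(@q) --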

Our first thought was that it was a fake Google bot, but when we inspected the IP we found that it wasn’t; it was a real Google bot:

$ host 66.249.66.138
138.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-138.googlebot.com.

NetRange:       66.249.64.0 - 66.249.95.255
CIDR:           66.249.64.0/19
OriginAS:       
NetName:        GOOGLE
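Reverse DNS alone can be spoofed by whoever controls the PTR records for an IP range, so the standard verification is reverse-then-forward: resolve the IP to a hostname, check that it sits under googlebot.com or google.com, then resolve that hostname back and confirm it round-trips to the same IP. A minimal Python sketch of that check (the function name is ours):

import socket

def is_real_googlebot(ip):
    # Reverse lookup: IP -> hostname (PTR record)
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False
    # The hostname must sit under Google's crawler domains
    if not hostname.endswith(('.googlebot.com', '.google.com')):
        return False
    # Forward lookup: the hostname must round-trip to the original IP
    try:
        return socket.gethostbyname(hostname) == ip
    except socket.gaierror:
        return False

print(is_real_googlebot('66.249.66.138'))  # True for the IP in the log above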

Further investigation showed other similar request signatures all coming from Google IP addresses.

What is going on?

It seems that while Google has no real interest in hacking you (far from it), their automated bots can be used to do the heavy lifting for an attacker.

In this scenario, the bot was crawling Site A. Site A had a number of embedded links carrying SQLi requests aimed at the target site, Site B. Googlebot then went about its business crawling pages and following links like a good boy, and in the process followed the links on Site A to Site B, inadvertently attacking Site B. It’s hard to say if it’s something we hadn’t seen before, or maybe just something we hadn’t really paid much attention to, but it makes you think…
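For illustration, the embedded links on Site A might have looked something like this (the URLs, parameter names, and anchor text here are hypothetical, and the payloads are URL-encoded so they survive crawling intact):

<a href="http://site-b.example/url.php?variable=%22)%20declare%20@q%20varchar(8000)...">more kitten pictures</a>
<a href="http://site-b.example/page.php?id=1%20union%20select%20...">cupcake recipes</a>

To a human reader these are ordinary links; to a crawler they are GET requests waiting to be made.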

Can someone create fake links with malicious keywords and have bots follow them and run malicious strings on another website?

Stealth Attacks Using Bots

Let’s assume we have an attacker; his name is John. John is your everyday hacker: he spends his days crawling the web looking for new vulnerabilities, and in the process he finds a number of vulnerable sites and web servers, ripe for the picking. John, though, is not your average hacker. He is very aware of the forensics process and knows that to be a successful hacker, you must cover your tracks.

As forensic analysts, one of the first things we look at is the logs. John knows this. What if John does just enough to find the vulnerability passively, allowing him to go unnoticed? John now has a list of possible weaknesses, one being an SQLi or RFI vulnerability on Site B.

John goes to his site, Site A, and adds all this awesome content about kittens and cupcakes, but in the process he also adds a number of links that look benign to the user reading the page yet are very effective on a bot crawling the site. Those links are riddled with RFI and SQLi attacks that allow John to plead ignorance, and also let him stay two arm’s lengths away from Site B. This doesn’t mean he can’t verify success; it just means he doesn’t expose himself to early detection through more active scanning and attacks.

Maybe this is just conjecture, but then again, maybe it’s not. Thoughts?

Another possibility: suppose a site is running a WAF (or IDS) that does protect against SQL injection attacks, so the attacker can’t get through directly. But what if Google’s IP addresses are whitelisted? That gives the bad guy an easy bypass.

We are contacting Google about it, but it is something important to keep in the back of your mind: you can’t just whitelist their IPs and let requests through without any type of inspection.
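In other words, the safe ordering is to inspect first and consider identity second. A minimal sketch of that idea in Python, reusing the is_real_googlebot() check from above (the signature pattern is a crude illustration, not our actual rule set):

import re

# Crude illustrative signature; real WAF rules are far more involved.
SQLI_PATTERN = re.compile(r'\b(declare|exec|union\s+select|drop\s+table)\b', re.I)

def handle_request(ip, decoded_query):
    if SQLI_PATTERN.search(decoded_query):
        return 403  # block: payload inspection runs even for verified bots
    if is_real_googlebot(ip):  # the DNS round-trip check sketched earlier
        return 200  # verified crawler: serve normally, skip bot traps
    return 200      # everyone else falls through to the normal rule set

Being a verified Googlebot should earn a request exemption from rate limits and bot traps, never exemption from payload inspection.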

Comments
  1. But if you blacklist Google, you are effectively blacklisting yourself *from* Google, no?

    This is all very troubling. I hope they respond to you quickly. I’ve noticed some weirdness in the past as well. Since it was Google I wrote it off and basically didn’t dig any further. Obviously that was flawed logic on my part, but I’m not sure that blocking search engines is the answer either.

    This seems like a lose-lose unless Google can clarify the matter. 🙁

  2. Very interesting. The reflected attack vector could be very tempting. To your last thought, though: I think we’re beyond the point of simply whitelisting/blacklisting IPs. This is a good reminder of why a defense-in-depth approach is simply better. Sure, whitelist until the cows come home. But don’t forget to inspect the whitelisted traffic as well.

  3. I have blocked Google before and it is not pretty. I lost all my Google standing; in fact, my domain was removed completely from Google search results. So it would seem that we are left with two levels of defense: one, scan the database for after-the-fact injections; and two, add rel=nofollow to each link we put into our own posts (see the example after this thread). If Google follows the rules, the nofollow would prevent the Site B scenario where the bot attacks the recipient’s database… but as I understand it, this would also damage any page ranking you might have. I’ll admit, my SEO knowledge is mostly learned by experience. I think it’s important that we have good SEO practices… but this is the only way I can think of to remedy this problem until Google works on their bots.

    1. Google should work on the bots to prevent this, but they can’t predict every vulnerability. Someone will always find a new link-based exploit, and using Google or other bots to exploit it will always be a potential vector.
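    For reference, a nofollow hint looks like the line below; rule-following crawlers are expected not to credit (and generally not to follow) such links:

      <a href="http://example.com/some-page" rel="nofollow">link text</a>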

  4. First, you are protected if you have applied SQL injection preventive measures in the first place.
    Second, one cannot (easily) detect the injection point if the firewall has strict rules. And if one can, then SQL injection is possible even without a Google bot.

      1. Then you let them get pwned. If their site is vulnerable to SQLi, there’s really nothing you can do to help them.

        1. On a shared hosting site that’s an incredibly dangerous idea. At any one time, one, many, or all of the common hosting OSes will have any number of escalation exploits, and this can lead to the rest of your customers getting pwned. The host could end up used as a bandwidth-destroying DDoS box, or even worse, it could be used as a springboard into your own organization, potentially killing your business. The smarter hackers use this for mass exploits: they find the weakest link (like an old blog running an ancient copy of WordPress or whatever) and use it as a beachhead to take down all the other hosted sites, stealing credit cards, passwords, and so on.

          So no you don’t let them get owned. You fix it yourself, or if your policy doesn’t allow that, you remove them from your service. Anything else is suicide.

          1. Disqus swallowed my post while logging me in, so here it is in condensed, key-point fashion. Sorry, shayneo… that was an elaborate reply and I should have copied the text before logging in…
            Put your business things on a separate server from your clients’.
            Sandbox your clients. Proper monitoring and quotas, along with notification of your customers, go a long way as well…
            Few SLAs and TOSes put the burden of security on the client. While you can deactivate an account on “suspicious activity”, this probably also means that you lose the client. (Yeah, you can say that such clients are unwelcome, but that helps little if you cater to such users as your target audience.)
            Web hosts have had these issues since the rise of scripting languages (and before), so a good place to find answers is somewhere where they meet (e.g., webhostingtalk & co.). Hint: there’s no single answer; a whole industry works to secure servers against threats…
            So choose your poison: sandbox, disable dynamic features, or get pwned at some point (“it’s a matter of ‘when’ this will happen, not ‘if’ it will”).

        2. Well, it sounds like the author of this article is not in a position to ignore things like that. He referred to the target site as being a “client site”, which suggests that he may not have access to the site’s code. Also, even if you think you’ve got all of your bases covered, it’s generally not a good idea to permit malicious activity from IPs that can be reliably blocked. That’s the reason he even wrote this article in the first place: to call attention to the unusual dilemma. If it were as simple as “just let them get pwned” or “you should have guards against SQL injection in the first place”, he wouldn’t have bothered blogging about it.

      1. Hey Guesty, so nice of you to drop by and help little me understand the world a bit better.
        Thanks to your insightful and extensive explanation of TCP, I now understand that IP spoofing wouldn’t work in this case.
        Though I am still a bit puzzled about your signature: why are you signing as “Silly idiot” if your real name is “Guesty”?

        Yours Truly,
        Frank

  5. What you’re suggesting is that crawlers should contain heuristics to detect whether a particular link is an SQL injection attack attempt, and ignore or penalise those links.

    Firstly, that’s a pretty difficult thing to do (which is at least partly why your product exists, yes?), especially since you don’t want false positives. Secondly, maybe they are doing this already but you’re seeing inbound links that are not meeting that threshold.

    1. It’s not just difficult, it’s impossible.

      Every SQL injection attack could be a legitimate request. To prove the point, imagine you are running a pastebin, where people can submit:

      * chunks of SQL code
      * something that turned up in their logs

      …or search for snippets of text that contain these things (which will be GET requests, even if the former are POST requests).

      OK, that’s probably an extreme example, but a simpler one is any forum where you might want to discuss SQL injection or other programming related stuff. It gets incredibly frustrating or confusing if your attempts to contribute to the conversation get rejected or ‘sanitised’ in some way.

      These examples prove that you cannot detect “malicious” input without getting false positives. There is no such thing as “malicious input” – malice is an attribute of the person sending the data, not the data itself. The data could be perfectly innocent and useful in another context. This problem is entirely the responsibility of the code that is vulnerable to SQL injection.

      1. Maybe impossible, but not for the reasons you provided. A crawling bot can already distinguish a link in example code from an actual link, the same way your web browser can. IMHO penalization should never be automated for a company with thousands and thousands of employees, but there is no legitimate reason for a Googlebot to follow a link with SQL code in the argument.

    1. I’ve seen the cute little page-inversion pages of years ago used like that. Google Translate now uses some very complex JavaScript, so it may not work as a proxy anymore, instead running in the browser.

  6. Let’s wait for the response from Google, but I think it’s more likely that someone using the Google Cloud (App Engine) is conducting a “security test”. For example, the Getaround site makes use of Google Cloud, so if my site received a request like the one you showed, I would think Google was attacking my site.

  7. Someone failed at the most basic level here and it wasn’t Google. From RFC 2616 (HTTP) Section 9.1 Safe and Idempotent Methods – “In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered “safe”.”

    1. I think you misunderstand how SQL injection works. An SQLi attack may turn a “safe” database SELECT into something destructive, or bypass user access restrictions.

      1. SQL Injection works because the programmer didn’t follow decent security practice. There is NO WAY a database select statement should be created by concatenating strings supplied by a user. That’s what stored procedures are for.

        1. Yes, that’s how it SHOULD be done, but there are still a lot of developers who have no clue about it. YOU didn’t know it before someone told you not to do it that way… though I must admit that SQL injection is something one should know by now. And let’s not forget, there are a lot of older sites which have never been updated (because it’s too expensive, or whatever reason)…
          What is “decent security practice”? What today is a decent security practice can be a stupid practice tomorrow… As I said, SQL injection is something you can prevent very easily these days, but there are so many other problems a beginning developer has never heard of (most are known only to people who work on security every day; most web developers won’t know about them).

          1. Andrew Jakobs seems to be suggesting that web developers should not be expected to know how to defend against SQL injection. That’s bullshit. SQL injection has been at the top of the CWE/SANS common vulnerability lists for many years, and any webdev who is unaware of it has clearly made no effort to achieve competence.

            It is one of the bigger problems in software development that it is very easy to become a poor developer and much harder to become truly competent.

            Hire an amateur to develop your web site and you will be pwned.

      2. Well, for starters, the article above doesn’t say that Site A’s links are intentionally malicious; it instead seems to blame Google for following them unexpectedly. Any web crawler should be able to follow any link with a GET request at any time. The only web crawler behaviour you could reasonably characterize as “bad” would be repeated and too-frequent requests. Second, there is no such thing as a “safe” user-supplied SQL statement. If any part of a GET request can be modified to perform an action other than retrieval, then the developer failed big time at following RFC 2616. Where query parameters and the like are used as part of a SQL query, some type of technology (PreparedStatements for Java, etc.) should be used to protect against names like “Robert’); DROP TABLE Students;”. The idea that Google needs to be contacted about this, or their IPs blocked, is silly; instead, all traffic to Site B should be blocked until they learn how to write code to handle a GET request.
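    A minimal Python sketch of the parameterized-statement idea this thread describes (sqlite3 and the table name are stand-ins for whatever stack a site actually runs):

      import sqlite3

      conn = sqlite3.connect(':memory:')
      conn.execute('CREATE TABLE students (name TEXT)')

      user_input = "Robert'); DROP TABLE students;--"

      # The ? placeholder binds user_input strictly as data, never as SQL,
      # so the classic "little Bobby Tables" payload is stored, not executed.
      conn.execute('INSERT INTO students (name) VALUES (?)', (user_input,))

      print(conn.execute('SELECT name FROM students').fetchall())
      # -> [("Robert'); DROP TABLE students;--",)]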

  8. Any idea what vulnerable components the bots were targeting? The only real defense for this, if you want your site to still show up in Google searches, is to remove or mitigate the vulnerable software. Could you please share with us what the bots were targeting?

  9. Hi Daniel, the robots, including Google’s, will crawl links and request them without filtering. You can check for yourself by posting “an attack link” (or a unique query string) pointing to your site on Twitter, in a Google Group, in a comment here, or on a new page on a different site. Within 24 hours, you’ll see it showing up in your web server logs. I see this frequently on my https://libinjection.client9.com/diagnostics page, but there it’s intentional. Also, you can use a Google link:your-url search and/or their webmaster tools to see what page might be hosting such a link.
    @ngalbreath

  10. This is a great attack vector but really inconsequential. If you are unprotected from this most basic attack, then whether or not G-bots are carrying it is not going to matter. You will be someone’s pony shortly.

  11. Although the “attack” may have come from the G-bots, it’s not Google’s fault. The way the article is written, you give Google the blame.
    Every standard web-crawling bot would behave the same. The bad thing here is the hacker putting those links on frequented websites.

  12. “What is going on?”, paragraph 1: it’s could *not* care less. You are saying the opposite of what you mean (if they could care less, it means they care).

  13. What I need is a security system that is safe and doesn’t have holes for hackers to penetrate easily. Our company’s website has been knocked on several times. This really makes us worry.

  14. Alternatively, embed the SQLi links in an email, add the words ‘explosive’ and ‘countersurveillance’, and send the email to whomever you like. The US NSA will then execute the attack for you.

  15. Hi,

    I have already seen this happening in other situations.
    In my opinion, malicious users are using Google open-redirect URLs to launch web attacks. There are a couple in the wild, and most of them are ignored by the Google Security Team.

  16. It is an interesting attack vector but if you are worried about what an attacker can do to you through Google then you should also be worried about what all the attackers who don’t care about hiding their identities can do when attacking directly.

    If you are vulnerable to Google then you are vulnerable to everyone else.

  17. In several cases, the SQLi target was posted on a hacking forum, blog, or exploit site; Google’s bots then request the link and index it (title, content).

  18. Limit what clients are able to use with regard to SQL, and resort to mandatory, even simple, captchas.

  19. In real life, every bot author has to face this problem at some point. If you’re conservative you might just drop any parameters from the request, or at least sanitize them: limit length (DoS), remove “suspicious” characters, etc. But this way you won’t be able to scan some websites whose navigation is based on complex URL queries. So it’s a compromise between being nice and having broader coverage, and I guess this is why Google just follows any links it finds.
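    A sketch of the conservative option described above: drop the query string and cap URL length before the bot follows a link (the limit and behaviour are illustrative):

      from urllib.parse import urlsplit, urlunsplit

      def conservative_crawl_url(url, max_len=2048):
          # Skip overlong URLs entirely (a crude DoS guard).
          if len(url) > max_len:
              return None
          # Keep scheme, host, and path; drop query string and fragment.
          parts = urlsplit(url)
          return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))

      print(conservative_crawl_url('http://site-b.example/url.php?variable=%22)%20declare%20...'))
      # -> http://site-b.example/url.php

    The cost, as the commenter notes, is losing coverage of any site whose navigation depends on those query parameters.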

  20. In 2001, Michal Zalewski published an article in Phrack #57 entitled “Against the System: Rise of the Robots” describing this very thing. In it, Zalewski described a series of experiments he performed showing how the search engine indexing bots of the day could be misused to launch a variety of web attacks.

    I independently rediscovered these issues in 2009 and reported them to Microsoft Bing and Google. Microsoft eventually made some changes to their search bot and credited me on their website, but Google basically responded that it “works as designed”.

  21. In the case of tricking a user into clicking a malicious link, you should be seeing Referrer headers from the attacking site, which would still leave a fingerprint pointing to John’s Site A. He’d still need to anonymize the links somehow (maybe by bouncing them off of a URL that blindly redirects users to arbitrary URLs?) to avoid ending up in the logs.

  22. Has anyone tried this method? Is it really working? How can the bot be used to execute the SQL injection?

  23. Not sure how the Sucuri WAF is coded and implemented, but it’s probably doable to match specific combinations with regexes and block/redirect/suspend queries containing select, insert, update, declare, cast, union, create, drop, delete… To reduce false positives, rules for blocking can be set to fire only if two or more specific SQLi commands are in a query, for example, or even to compare against parts of specific attack vectors; the attack databases found in Vega, Acunetix, and similar scanners are pretty extensive and could be implemented in a short time.
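    A toy version of that “two or more keywords” rule (the keyword list and threshold are illustrative, not Sucuri’s actual rules):

      import re

      KEYWORDS = ('select', 'insert', 'update', 'declare', 'cast',
                  'union', 'create', 'drop', 'delete', 'exec')

      def suspicious(decoded_query, threshold=2):
          # Count distinct SQL keywords; flag only when several co-occur.
          hits = {kw for kw in KEYWORDS
                  if re.search(r'\b' + kw + r'\b', decoded_query, re.I)}
          return len(hits) >= threshold

      print(suspicious('") declare @q varchar(8000) select @q = 0x527 exec(@q) --'))  # True
      print(suspicious('select your favorite color'))  # False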

  24. Good work, and thanks for sharing such an informative piece on better database management. Security plays a very crucial role, and every business wants to keep its database secure from external attack.
