Since it’s the beginning of October, I decided to celebrate by spending a little time rewriting some of FileFuzz’s functionality.
As I mentioned in a previous post on this topic, fuzzing typically falls into two categories: brute force (mutation-based) fuzzing and intelligent brute force (generation-based) fuzzing. Just to recap, mutation-based fuzzing is where we take some sample files of the target file type and the fuzzer creates mutations of them. With intelligent brute force fuzzing we actually have to research the file specification. An intelligent fuzzing engine is still a brute force attack, but it relies on configuration files from the user, which makes the process more intelligent. Think of templates that give the fuzzing engine a list of data structures, their positions relative to each other, and their possible values.
In the last little while, in my free time, I’ve been looking into different fuzzing generators for producing malformed files in different file formats. I decided to go with Autodafe for the generation-based fuzzing. I have a newer release of Autodafe than the public version (just because I emailed Martin Vuagnoux and he was cool enough to send me the newest he had at the time), and I modified how it generates files for hex fuzzing. The stock version of Autodafe is missing the hex generator functionality. You can pick up the hex fuzzing mod that I did for it here.
I also decided to spend a little time and modify FileFuzz for the mutation-based fuzzing.
One thing I tried to keep in mind while figuring out how to generate the data was exactly what data to generate. I figured I’d have a better chance of finding bugs if I could generate smart data sets.
So let’s talk about “smart data sets” for a minute pertaining to mutation-based fuzzing (I also like to think of mutation-based fuzzing as file bit-flipping).
Bit flipping files can be really handy not only for finding integer overflows/underflows but also for tripping up other length calculations on data.
Let’s look at an example of a length calculation bug (this is pretty much taken from Pedram’s fuzzing book, “Fuzzing: Brute Force Vulnerability Discovery”). Take MS04-028, “Buffer Overrun in JPEG Processing (GDI+) Could Allow Code Execution.” The JPEG format allows comments to be embedded within the image itself. Comments are preceded by the 0xFFFE byte sequence, followed by a 16-bit word giving the total size of the comment. That size includes the two bytes used for the size field itself, and the comment text follows.
We would see something like this in the file:
FF FE 00 06 66 75 7A 7A
Breakdown:
FF FE Comment Preface
00 06 Length of comments in bytes
66 75 7A 7A ASCII value of ‘fuzz’
Now if the Length of comments in bytes was changed (flipped) from 0x0006 to 0x0000 we have an overflow in the vulnerable version of Windows Picture and Fax Viewer (shimgvw.dll).
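To make the length math concrete, here’s a minimal Python sketch (mine, not code from GDI+ or from the book) of what a naive parser might do with that size field. It assumes the parser subtracts the two size bytes from the stored length and treats the result as an unsigned 32-bit number; the function name comment_payload_length is made up for illustration.

```python
import struct

def comment_payload_length(segment: bytes) -> int:
    """Payload length a naive parser might compute for a JPEG comment segment."""
    marker, size = struct.unpack(">HH", segment[:4])
    if marker != 0xFFFE:
        raise ValueError("not a comment segment")
    # The size field counts its own two bytes, so subtract them (assumed behaviour).
    payload = size - 2
    # Emulate unsigned 32-bit arithmetic, as C code typically would.
    return payload & 0xFFFFFFFF

good = bytes.fromhex("FFFE000666757A7A")  # length 0x0006, comment "fuzz"
bad = bytes.fromhex("FFFE000066757A7A")   # length flipped to 0x0000

print(hex(comment_payload_length(good)))  # 0x4
print(hex(comment_payload_length(bad)))   # 0xfffffffe
```

With the original 0x0006 the payload works out to 4 bytes (“fuzz”), but with 0x0000 the subtraction wraps around to a huge value, which is exactly the kind of length a copy routine should never be handed.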
The other big class of vulns that is typically found with bit flipping is the integer-related overflow/underflow guys.
Continuing with “smart data sets” we can guess that the two extreme border cases (0 and 0xFFFFFFFF) are obvious choices to flip bits to – but let’s think of other choices as well.
For example (again from Pedram’s book) it’s not uncommon for additional space to be included with the specified size to accommodate a header, footer, or terminating NULL byte. The following code is an example:
int size = read_ccr_size(packet);
// save space for NULL termination.
buffer = (char *) malloc(size + 1);
Therefore, it might be a good idea to include near-border test cases such as 0xFFFFFFFF-1, 0xFFFFFFFF-2, 0xFFFFFFFF-3 (or even + sometimes), etc.
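Here’s a quick Python sketch of why those near-border values matter for code like the malloc(size + 1) above. It emulates 32-bit unsigned arithmetic with a mask, which is an assumption about the target, and the helper name alloc_size_32 is hypothetical.

```python
def alloc_size_32(size: int, extra: int = 1) -> int:
    """Emulate `size + extra` as a 32-bit unsigned value (assumed target width)."""
    return (size + extra) & 0xFFFFFFFF

# One extra byte for a NULL terminator, as in the snippet above.
for size in (0xFFFFFFFF, 0xFFFFFFFE, 0xFFFFFFFD):
    print(f"size=0x{size:08X} -> malloc(0x{alloc_size_32(size):08X})")

# A 4-byte header instead: 0xFFFFFFFD wraps to a tiny allocation.
print(f"size=0xFFFFFFFD + 4 -> malloc(0x{alloc_size_32(0xFFFFFFFD, 4):08X})")
```

The point is that which value wraps depends on how much the code adds, so covering a few values on either side of the border costs little and catches more cases.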
Then there’s also 16-bit integers (0xFFFF) and 8-bit integers to think about (0xFF).
Also there’s a host of other meaningful (“smart”) values we might want to try that could yield results:
0x100
0x1000
0x3fffffff
0x7ffffffe
0x7fffffff
0x80000000
0xfffffffe
0x10000
0x100000
0x2000
0x8000
etc…
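As a rough sketch of how such a data set could be built programmatically rather than by hand, the Python below collects values around 0, the sign-bit boundary, and the maximum for a given integer width, then packs them into bytes ready to be written into a file. The function names and the spread parameter are my own assumptions, not anything FileFuzz provides.

```python
import struct

def boundary_values(bits, spread=3):
    """Values within `spread` of 0, the sign bit, and the max for this width."""
    top = (1 << bits) - 1
    half = 1 << (bits - 1)
    values = set()
    for base in (0, half, top):
        for delta in range(-spread, spread + 1):
            values.add((base + delta) & top)  # wrap inside the width
    return sorted(values)

def pack(value, bits, big_endian=True):
    """Turn an integer into the raw bytes that would overwrite the file."""
    fmt = {8: "B", 16: "H", 32: "I"}[bits]
    return struct.pack((">" if big_endian else "<") + fmt, value)

for value in boundary_values(16, spread=1):
    print(f"0x{value:04X}", pack(value, 16).hex().upper())
```

Byte order matters here: the same value has to be written little-endian for formats that store integers that way, which is why pack takes a big_endian flag.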
So, as I mentioned previously, I decided to work with FileFuzz to generate malformed (bit-flipped) files. The way Sutton designed FileFuzz, there are essentially two different ways to bit flip files. One is the “All Bytes” option, where you specify the Byte(s) to Overwrite (flip to) and the number of bytes to overwrite. It might be clearer with a small example:
Let’s say I have a file with the current values:
FF FE 00 06 66 75 7A 7A
If I specified the “All Bytes” option with the Byte(s) to Overwrite to be 0xBB and the number of bytes to be overwritten to be 2 the following would be the first file generated:
BB BB 00 06 66 75 7A 7A
The second file would be:
FF BB BB 06 66 75 7A 7A
The third file:
FF FE BB BB 66 75 7A 7A
and so on…
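The sliding overwrite above can be sketched in a few lines of Python. This is only my approximation of what “All Bytes” does, not FileFuzz’s actual code; in particular, I’m assuming it stops once the pattern no longer fits inside the file.

```python
def all_bytes_mutations(data: bytes, pattern: bytes):
    """Yield one mutated copy of `data` per offset, overwritten with `pattern`."""
    for offset in range(len(data) - len(pattern) + 1):
        yield data[:offset] + pattern + data[offset + len(pattern):]

sample = bytes.fromhex("FFFE000666757A7A")

# Byte to overwrite = 0xBB, number of bytes = 2
for mutated in all_bytes_mutations(sample, b"\xBB" * 2):
    print(mutated.hex(" ").upper())
```

Each mutated copy would be written out as its own test file.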
Using this technique one could generate a lot of files with combinations like 0x00 x 4 (overwriting 4 bytes at a time with 00 00 00 00, which puts 0x00000000 where our BB BB was above), etc.
But we couldn’t generate patterns made of different byte values, like 0xFFFE or 0x00000001, or even 0x7FFE….
There’s also the “Depth” option in FileFuzz, where you specify a location (byte number) to fuzz in the file and the range of byte values to overwrite that location with.
Using our example hex values again:
FF FE 00 06 66 75 7A 7A
If I were to specify location 2 with the bytes to overwrite being 0x01 to 0x03, it would generate the following data in the files:
First file:
FF FE 01 06 66 75 7A 7A
Second file:
FF FE 02 06 66 75 7A 7A
Third file:
FF FE 03 06 66 75 7A 7A
So the “Depth” option just concentrates on one byte location at a time.
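A minimal Python sketch of the same idea follows, assuming the location is a zero-based byte offset (which is what the example above implies, since location 2 hits the third byte). The function name is made up; this is not FileFuzz’s code.

```python
def depth_mutations(data: bytes, location: int, first: int, last: int):
    """Yield copies of `data` with the byte at `location` set to first..last."""
    for value in range(first, last + 1):
        mutated = bytearray(data)
        mutated[location] = value
        yield bytes(mutated)

sample = bytes.fromhex("FFFE000666757A7A")

for mutated in depth_mutations(sample, 2, 0x01, 0x03):
    print(mutated.hex(" ").upper())
```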
These options given to us by Sutton are pretty limiting. When I first used this and generated my first set of malformed zip files for fuzzing, I used the “All Bytes” function to create files for 0x00 (x1), which gave me files for 0x00. Then I generated files with 0x00 (x2), which gave me 0x0000, then 0x00 (x3), which is 0x000000, and finally (x4), 0x00000000. I did the same for 0xFF, 0x01, 0x80, 0x10, 0x3F, 0x7F and a few others.
I generated a bunch, but for each instance I had to manually enter the Byte(s) to Overwrite / flip to (0x00, 0xFF, 0x01, 0x80, 0x10, 0x3F, 0x7F, etc.) and the number of bytes to overwrite (1-4) in FileFuzz.
Well, luckily it came with the source code. What I did was I modified the “All Bytes” function to go ahead and produce a lot of malformed files for different cases without having to enter in the data in FileFuzz each time.
The different cases are the following at the moment:
It will generate files using “Bytes to Overwrite” with values of 0x00 and 0xFF and the number of bytes to overwrite from 1-4.
I also prepend and append the following values if the number of bytes is >= 2:
“00”, “FF”, “3F”, “7F”, “01”, “02”, “80”, “FE”, “10”, “20”, “40”, “60”
So, if the “Bytes to Overwrite” is 0x00 and the number of bytes is 1 then it will simply overwrite one byte at a time with 0x00.
If the “Bytes to Overwrite” is 0x00 and the number of bytes is 2 then it will create the following values to overwrite each byte location with:
0x0000
0xFF00
0x3F00
0x7F00
0x0100
0x0200
0x8000
0xFE00
0x1000
0x2000
0x4000
0x6000
0x00FF
0x003F
0x007F
0x0001
0x0002
0x0080
0x00FE
0x0010
0x0020
0x0040
0x0060
Then for good measure, if the number of bytes to overwrite is >= 2, I also prepend the value with 7F and append FE. In which case I’d also generate files with bits that flip to
0x7FFE
0x7FFFFE
0x7FFFFFFE
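The pattern generation described above can be sketched in Python like this. It is a reconstruction from the lists in this post, not the modified FileFuzz source, and it makes two assumptions: for 3 and 4 bytes the extra value goes on one end with the fill byte everywhere else, and the 7F…FE pattern is padded with FF in the middle (which matches the three values listed above).

```python
EDGE_BYTES = ["00", "FF", "3F", "7F", "01", "02", "80", "FE", "10", "20", "40", "60"]

def overwrite_patterns(fill: str, count: int):
    """Hex patterns to overwrite with, for one fill byte and one width."""
    if count == 1:
        return [fill]
    body = fill * (count - 1)
    candidates = [edge + body for edge in EDGE_BYTES]   # prepended values
    candidates += [body + edge for edge in EDGE_BYTES]  # appended values
    candidates.append("7F" + "FF" * (count - 2) + "FE")
    patterns = []
    for candidate in candidates:
        if candidate not in patterns:  # drop duplicates such as 0000
            patterns.append(candidate)
    return patterns

for pattern in overwrite_patterns("00", 2):
    print("0x" + pattern)
```

For fill 00 and width 2 this prints the 23 values listed earlier, followed by 0x7FFE. Each pattern would then be slid across the file one byte position at a time.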
In the end, it’s a lot of test cases – which brings me to my next thought: creating the base test file which FileFuzz will use to do the mutations against.
So, at first I thought it was important to use a very simple test file (for example in my zip test base file I zipped up one text file in a directory that had like 4 bytes written in it). The goal was to keep the file simple and small – because brute force fuzzing (bit flipping) is inefficient. We want to focus on the file headers not the data itself. For example, if I was fuzzing a JPEG I’d want to create a JPEG image with like a 1 x 1 white pixel.
My base test zip file is 248 bytes, and from it my modified FileFuzz gave me 36207 different files to fuzz.
If we used a medium or large file (of any format), the number of files we’d generate would be way too much to realistically try and fuzz. It would produce way too many test cases, and most of them would be useless because we’d mostly be fuzzing the data in the file, not the headers (the stuff we generally really care about).
Which makes total sense. However, there are some files (like, say, ones randomly downloaded from the Internet) that have more and/or wackier extraneous stuff in the headers, and they seem to trip up programs much better than some simple vanilla test file that you create yourself.
For instance, I’ve fuzzed programs with pdf files using a base test case with the same data content (the word “test” in the pdf file, highlighted, bolded, red in color, etc.) from a pdf generated from a Microsoft Word plugin and a pdf generated from a different program. One made Adobe reader crash and the other one didn’t.
So I modified another version of FileFuzz: one that has a modified Range of bytes functionality. The modifications I did to the previous version only pertained to the All bytes functionality. Now with this newer version we can modify a range of bytes in the file. This way we can concentrate on just fuzzing interesting ranges of a larger file (like the file headers).
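Restricting the mutations to a range might look roughly like the Python sketch below. The function and its start/end parameters are my own illustration of the idea (a half-open range of zero-based offsets), not the actual FileFuzz change.

```python
def range_mutations(data: bytes, pattern: bytes, start: int, end: int):
    """Sliding overwrite limited to offsets in [start, end)."""
    last = min(end, len(data) - len(pattern) + 1)  # pattern must still fit
    for offset in range(max(start, 0), last):
        yield data[:offset] + pattern + data[offset + len(pattern):]

sample = bytes.fromhex("FFFE000666757A7A")

# Only fuzz the first four byte positions (the "header" of this tiny sample).
for mutated in range_mutations(sample, b"\x00\x00", 0, 4):
    print(mutated.hex(" ").upper())
```

On a large file this keeps the number of generated test cases proportional to the size of the range instead of the size of the whole file.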
Also – there was a bug in FileFuzz that I had to fix. It kept crapping out on me when generating certain files (generally larger ones over a certain size) and I tracked the problem down to line 78 in Read.cs:
while (brSourceFile.PeekChar() != -1)
has to be changed to this:
while (brSourceFile.BaseStream.Position != brSourceFile.BaseStream.Length)
So this thing is semi-interesting too.
So, just for the background info here’s brSourceFile being declared:
private BinaryReader brSourceFile;
instantiated like this:
brSourceFile = new BinaryReader(File.Open(sourceFile, FileMode.Open));
Then later in the code we get the while loop to read the data from our test file:
while (brSourceFile.PeekChar() != -1){
//readfile…
}
Turns out the problem is in PeekChar(). This method tries to peek at a char (decoded with UTF-8 by default) and can throw an exception if it hits bytes that it can’t decode as a character.
I guess Microsoft is “actively looking to obsolete BinaryReader.PeekChar in a future release since it has some design issues.”
Heh – I guess PeekChar() is a method in the BinaryReader class that can only handle chars (not binary as the class name suggests)
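The same pitfall is easy to show outside .NET. The Python below is only an analogy, not the FileFuzz code: it shows that arbitrary binary data often isn’t valid UTF-8, and that a read loop driven by stream position versus length (the idea behind the fix above) doesn’t care what the bytes are. The sample bytes and the read_all helper are made up for illustration.

```python
import io

# A zip-style signature followed by bytes that are not valid UTF-8.
raw = bytes([0x50, 0x4B, 0x03, 0x04, 0xFF, 0xFE, 0x80])

try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False  # treating binary data as text fails

def read_all(stream: io.BytesIO, chunk: int = 2) -> bytes:
    """Read until the position reaches the length, never decoding anything."""
    length = stream.getbuffer().nbytes
    out = bytearray()
    while stream.tell() != length:
        out += stream.read(chunk)
    return bytes(out)

print(decoded_ok)
print(read_all(io.BytesIO(raw)) == raw)
```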
Anyways, if anyone comes up with more interesting datasets to use for brute-force fuzzing or any other thoughts on how I might improve my modified version of FileFuzz let me know…
Cheers,
Chuck B.
2 responses so far ↓
1 Schlafwandler // Aug 5, 2008 at 9:04 pm
Very interesting text & thx for the autodafe extension. One question about this: The 0.1 version not only misses the hex generation feature, also hex values seem to be completely ignored in the debugging/weighting process. Have you also looked into this or are you using autodafe just for the generation of testfiles?
2 Chuck B. // Aug 6, 2008 at 12:57 pm
Yeah – at the moment I’m just using Autodafe to generate hex files for file fuzzing, and I haven’t looked into the hex functionality for the debugging/weighting process…
However, if I were you I’d hit up Martin (the author of Autodafe) via email (autodafe@vuagnoux.com) and ask him. He seemed pretty cool when I reached out to him. It did take a few days for him to respond though – I don’t think he checks that email address every day.
Let me know the outcome, hopefully he’ll release a newer version of Autodafe with some bug fixes – or even a windows version for the debugging /weighting process.
Cheers,
Chuck B.