[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 11/14] fuzz/x86_emulate: Make input more compact
On 08/25/2017 06:59 PM, Andrew Cooper wrote: > On 25/08/17 17:43, George Dunlap wrote: >> At the moment, AFL reckons that for any given input, 87% of it is >> completely irrelevant: that is, it can change it as much as it wants >> but have no impact on the result of the test; and yet it can't remove >> it. >> >> This is largely because we interpret the blob handed to us as a large >> struct, including CR values, MSR values, segment registers, and a full >> cpu_user_regs. >> >> Instead, modify our interpretation to have a "set state" stanza at the >> front. Begin by reading a byte; if it is lower than a certain >> threshold, set some state according to what byte it is, and repeat. >> Continue until the byte is above a certain threshold. >> >> This allows AFL to compact any given test case much smaller; to the >> point where now it reckons there is not a single byte of the test file >> which becomes irrelevant. Testing have shown that this option both >> allows AFL to reach coverage much faster, and to have a total coverage >> higher than with the old format. >> >> Make this an option (rather than a unilateral change) to enable >> side-by-side performance comparison of the old and new formats. >> >> Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx> > > I continue to think this is a bad idea. You are taking a genuine > problem and adding a complicated algorithm to try and fool alf, rather > than fixing the problem. > > The reason 87% of input is irrelevant is because it really is. The > input state is full of 64bit values being used for a one or two bits > which we ever look at. My take on it is this. The state struct is very large -- I think it's around 500 bytes without the FPU state, and over 1k with it. Nearly every bit of the state has *some* instruction that interacts with it. But in order for AFL to notice that, it has to find a combination <state, instruction> such that the instruction and the state actually interact. For any given <state, instruction> tuple, changing the vast majority of the state will have absolutely no effect -- meaning that AFL's fuzzing is almost entirely wasted. AFL will spend a huge amount of time fuzzing bits of the state struct that cannot possibly have an effect on the outcome, since the instuctions in the test case don't interact with those bits at all. With the "compact" input interpretation, once AFL finds a <offset, content, instruction> tuple that correspond, changing any of the three is pretty much guaranteed to have some effect; finding a set that correspond should be much easier, and in any case fuzzing it should be a lot faster. As such, I don't think I'm trying to "fool" AFL: I'm just giving it tools to search more effectively. > The solution to this problem is remove the irrelevant information from > fuzz_corpus. I already started doing this with the alf-fast work for > the Xen 4.9 release, but I've basically been doing security work ever > since and haven't had time to continue it. > > For the record, this hunk is how I intended to continue the work: > > diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c > b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c > index 74e8c85..dafe435 100644 > --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c > +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c > @@ -24,7 +24,27 @@ > /* Layout of data expected as fuzzing input. */ > struct fuzz_corpus > { > - unsigned long cr[5]; > + /* %cr0 */ > + bool pe:1; > + bool mp:1; > + bool em:1; > + bool ts:1; > + bool pg:1; > + > + /* %cr4 */ > + bool vme:1; > + bool pvi:1; > + bool tsd:1; > + bool osfxsr:1; > + bool osxmmexcpt:1; > + bool umip:1; > + bool fsgsbase:1; > + bool osxsave:1; > + > + /* EFER */ > + bool sce:1; > + bool lme:1; I'm 98% sure that the handful of bits in cr0 and cr4 aren't the problem. AFL isnt' looking at *bits* in the cr0 register, it's looking at that as an interger, and I'm sure that it notices, "I add X to this and it changes stuff, so it must be used." The problem is these: > uint64_t msr[MSR_INDEX_MAX]; > struct cpu_user_regs regs; > struct segment_register segments[SEG_NUM]; And in particular this one: char fxsave[512] __attribute__((aligned(16))); Your proposal doesn't change the fact that for any given test case, the vast majority of state won't interact with the instructions it contains at all. I've still got the scripts and analysis stuff, so if you send me a patch you think is better, I'm happy to run it through and add it to the comparison. In any case, prediction based on expert intuition is good, but empirical evidence is better*. The graph I sent clearly shows that the 'compact' format, with unlimited number of instructions, generates far more map coverage in far less time than any other combination. Unless you can find some flaw in my methodology, or an alternate interpretation of the data, I think we should go with the more 'compact' format. The old format is still available, and it would be easy to switch the default back (or entirely remove the 'compact' format) anytime it is shown empirically to be more effective. -George * For instance, if I hadn't done these graphs, I would have suggested an instruction limit of 2 or 3, because my intiution told me that unlimited instructions wasted AFL's time. Which it does in the old format, but apparently does not in the new. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |