[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 11/13] fuzz/x86_emulate: Add --rerun option to try to track down instability



On 10/04/2017 09:27 AM, Jan Beulich wrote:
>>>> On 25.09.17 at 16:26, <george.dunlap@xxxxxxxxxx> wrote:
>> --- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
>> +++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
>> @@ -14,6 +14,7 @@ extern unsigned int fuzz_minimal_input_size(void);
>>  static uint8_t input[INPUT_SIZE];
>>  
>>  extern bool opt_compact;
>> +extern bool opt_rerun;
> 
> Seeing a second such variable appear, I think it would really be better
> to introduce a local header, included by both producer and consumer.

Yes, sounds good.

> 
>> @@ -886,21 +896,138 @@ int LLVMFuzzerInitialize(int *argc, char ***argv)
>>      return 0;
>>  }
>>  
>> -int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
>> +void setup_fuzz_state(struct fuzz_state *state, const uint8_t *data_p, 
>> size_t size)
> 
> static (also for other new helper functions)?

Ack

> 
>>  {
>> -    struct fuzz_state state = {
>> -        .ops = all_fuzzer_ops,
>> -    };
>> -    struct x86_emulate_ctxt ctxt = {
>> -        .data = &state,
>> -        .regs = &state.regs,
>> -        .addr_size = 8 * sizeof(void *),
>> -        .sp_size = 8 * sizeof(void *),
>> -    };
>> +    memset(state, 0, sizeof(*state));
>> +    state->corpus = (struct fuzz_corpus *)data_p;
> 
> Please don't cast away constness here. Perhaps best to make the parameter
> const void *, allowing for the cast to be dropped altogether.

Didn't notice that I was casting away const-ness, because state->corpus
is const. :-)  But I like `const void *`.

> 
>> +int runtest(struct fuzz_state *state) {
>>      int rc;
>>  
>> -    /* Reset all global state variables */
>> -    memset(&input, 0, sizeof(input));
>> +    struct x86_emulate_ctxt *ctxt = &state->ctxt;
>> +    
>> +    state->ops = all_fuzzer_ops;
>> +
>> +    ctxt->data = state;
>> +    ctxt->regs = &state->regs;
>> +    ctxt->addr_size = ctxt->sp_size = 8 * sizeof(void *);
> 
> Is this actually necessary? I don't see a way for set_sizes() to be
> bypassed.

I was just duplicating the functionality that was already there (these
were initialized at declaration).  My instinct is to want to leave these
initialize these for safety, but the code definitely should call
set_sizes() first, so I'll take this out.

> 
>> +void compare_states(struct fuzz_state state[2])
>> +{
>> +    // First zero any "internal" pointers
>> +    state[0].corpus = state[1].corpus = NULL;
>> +    state[0].ctxt.data = state[1].ctxt.data = NULL;
>> +    state[0].ctxt.regs = state[1].ctxt.regs = NULL;
>> +
>> +    
> 
> No double blank lines please.
> 
>> +    if ( memcmp(&state[0], &state[1], sizeof(struct fuzz_state)) )
>> +    {
>> +        int i;
> 
> unsigned int (and then %u in the format strings below)

Is there really an advantage to specifying 'unsigned int' for something
like a loop variable?  It hardly seems worth the effort to consider
signed / unsigned in such a case.

>> +        printf("State mismatch\n");
>> +
>> +        for ( i=0; i<5; i++)
> 
> Blanks missing and please use ARRAY_SIZE() again (also further down).

Ack.

> 
>> +            if ( state[0].cr[i] != state[1].cr[i] )
>> +                printf("cr[%d]: %lx != %lx\n",
>> +                       i, state[0].cr[i], state[1].cr[i]);
>> +        
>> +        for ( i=0; i<MSR_INDEX_MAX; i++)
>> +            if ( state[0].msr[i] != state[1].msr[i] )
>> +                printf("msr[%d]: %lx != %lx\n",
>> +                       i, state[0].msr[i], state[1].msr[i]);
>> +        
>> +        for ( i=0; i<SEG_NUM; i++)
>> +            if ( memcmp(&state[0].segments[i], &state[1].segments[i],
>> +                        sizeof(state[0].segments[0])) )
>> +                printf("segments[%d] differ!\n", i);
> 
> The actual values would likely be helpful to be printed here, just like
> you do for all other state elements.

Sure.

> 
>> +        if ( state[0].data_num != state[1].data_num )
>> +            printf("data_num: %lx !=  %lx\n", state[0].data_num,
>> +                   state[1].data_num);
>> +        if ( state[0].data_index != state[1].data_index )
>> +            printf("data_index: %lx !=  %lx\n", state[0].data_index,
>> +                   state[1].data_index);
>> +
>> +        if ( memcmp(&state[0].regs, &state[1].regs, sizeof(state[0].regs)) )
>> +        {
>> +            printf("registers differ!\n");
>> +            /* Print If Not Equal */
>> +#define PINE(elem)\
>> +            if ( state[0].elem != state[1].elem ) \
>> +                printf(#elem " differ: %lx != %lx\n", \
>> +                       (unsigned long)state[0].elem, \
>> +                       (unsigned long)state[1].elem)
>> +            PINE(regs.r15);
>> +            PINE(regs.r14);
>> +            PINE(regs.r13);
>> +            PINE(regs.r12);
>> +            PINE(regs.rbp);
>> +            PINE(regs.rbx);
>> +            PINE(regs.r10);
>> +            PINE(regs.r11);
>> +            PINE(regs.r9);
>> +            PINE(regs.r8);
>> +            PINE(regs.rax);
>> +            PINE(regs.rcx);
>> +            PINE(regs.rdx);
>> +            PINE(regs.rsi);
>> +            PINE(regs.rdi);
>> +
>> +            for ( i = offsetof(struct cpu_user_regs, error_code) / 
>> sizeof(unsigned);
>> +                  i < sizeof(state[1].regs)/sizeof(unsigned); i++ )
>> +            {
>> +                printf("[%04lu] %08x %08x\n",
> 
> I think this wants to be %04zu (or perhaps better %4zu or %04zx). Same
> for ctxt printing further down.
> 
>> @@ -908,7 +1035,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data_p, 
>> size_t size)
>>          return 1;
>>      }
>>  
>> -    if ( size > sizeof(input) )
>> +    if ( size > sizeof(struct fuzz_corpus) )
> 
> What's the difference between the two variants?

One fewer 'dereferences'.  Rather than input -> struct fuzz_corpus ->
[structure definition], you can just do struct fuzz_corpus -> [structure
definition].

>> @@ -916,25 +1043,24 @@ int LLVMFuzzerTestOneInput(const uint8_t *data_p, 
>> size_t size)
>>  
>>      memcpy(&input, data_p, size);
>>  
>> -    state.corpus = &input;
>> -    state.data_num = size;
>> -
>> -    setup_state(&ctxt);
>> +    setup_fuzz_state(&state[0], data_p, size);
>> +    
>> +    if ( opt_rerun )
>> +        printf("||| INITIAL RUN |||\n");
>> +    
>> +    runtest(&state[0]);
>>  
>> -    sanitize_input(&ctxt);
>> +    if ( !opt_rerun )
>> +        return 0;
> 
> Could I talk you into inverting the condition such that there'll be
> only a single "return 0" at the end of the function?

Why is that valuable?

If I don't return here, then the rerun code has to be indented, which to
me makes it slightly more difficult to see that it's identical to the
state setup & running code above.

> And then - has this patch actually helped pinpoint any problems? The
> ones deaslt with by the next patch perhaps?

Yes, it helped find the one dealt with in the subsequent patch.
Surprisingly, it didn't find anything else.

Since the patch represented a non-trivial amount of work, I thought it
might be useful to include so nobody would have to re-implement it again
in the future.  But I'd also be happy discarding this patch, as it's
fairly invasive and I don't expect it to be used very often.

Let me know what you think so I know whether to try to print something
sensible for the segment differences or not bother. :-)

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.