
RE: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common



> -----Original Message-----
> From: Julien Grall <julien@xxxxxxx>
> Sent: 30 September 2020 18:48
> To: Oleksandr <olekstysh@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: paul@xxxxxxx; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@xxxxxxxx>;
> 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>; 'George Dunlap' <george.dunlap@xxxxxxxxxx>;
> 'Ian Jackson' <ian.jackson@xxxxxxxxxxxxx>; 'Jan Beulich' <jbeulich@xxxxxxxx>;
> 'Stefano Stabellini' <sstabellini@xxxxxxxxxx>; 'Wei Liu' <wl@xxxxxxx>;
> 'Roger Pau Monné' <roger.pau@xxxxxxxxxx>; 'Jun Nakajima' <jun.nakajima@xxxxxxxxx>;
> 'Kevin Tian' <kevin.tian@xxxxxxxxx>; 'Tim Deegan' <tim@xxxxxxx>;
> 'Julien Grall' <julien.grall@xxxxxxx>
> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> 
> Hi,
> 
> On 30/09/2020 14:39, Oleksandr wrote:
> >
> > Hi Julien
> >
> > On 25.09.20 11:19, Paul Durrant wrote:
> >>> -----Original Message-----
> >>> From: Julien Grall <julien@xxxxxxx>
> >>> Sent: 24 September 2020 19:01
> >>> To: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> >>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>;
> >>> Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; George Dunlap <george.dunlap@xxxxxxxxxx>;
> >>> Ian Jackson <ian.jackson@xxxxxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>;
> >>> Stefano Stabellini <sstabellini@xxxxxxxxxx>; Wei Liu <wl@xxxxxxx>;
> >>> Roger Pau Monné <roger.pau@xxxxxxxxxx>; Paul Durrant <paul@xxxxxxx>;
> >>> Jun Nakajima <jun.nakajima@xxxxxxxxx>; Kevin Tian <kevin.tian@xxxxxxxxx>;
> >>> Tim Deegan <tim@xxxxxxx>; Julien Grall <julien.grall@xxxxxxx>
> >>> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> >>>
> >>>
> >>>
> >>> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
> >>>> +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
> >>>> +{
> >>>> +    unsigned int prev_state = STATE_IOREQ_NONE;
> >>>> +    unsigned int state = p->state;
> >>>> +    uint64_t data = ~0;
> >>>> +
> >>>> +    smp_rmb();
> >>>> +
> >>>> +    /*
> >>>> +     * The only reason we should see this condition be false is when an
> >>>> +     * emulator dying races with I/O being requested.
> >>>> +     */
> >>>> +    while ( likely(state != STATE_IOREQ_NONE) )
> >>>> +    {
> >>>> +        if ( unlikely(state < prev_state) )
> >>>> +        {
> >>>> +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
> >>>> +                     prev_state, state);
> >>>> +            sv->pending = false;
> >>>> +            domain_crash(sv->vcpu->domain);
> >>>> +            return false; /* bail */
> >>>> +        }
> >>>> +
> >>>> +        switch ( prev_state = state )
> >>>> +        {
> >>>> +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> >>>> +            p->state = STATE_IOREQ_NONE;
> >>>> +            data = p->data;
> >>>> +            break;
> >>>> +
> >>>> +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
> >>>> +        case STATE_IOREQ_INPROCESS:
> >>>> +            wait_on_xen_event_channel(sv->ioreq_evtchn,
> >>>> +                                      ({ state = p->state;
> >>>> +                                         smp_rmb();
> >>>> +                                         state != prev_state; }));
> >>>> +            continue;
> >>> As I pointed out previously [1], this helper was implemented with the
> >>> expectation that wait_on_xen_event_channel() will not return if the vCPU
> >>> got rescheduled.
> >>>
> >>> However, this assumption doesn't hold on Arm.
> >>>
> >>> I can see two solutions:
> >>>      1) Re-execute the caller
> >>>      2) Prevent an IOREQ from disappearing until the loop finishes.
> >>>
> >>> @Paul any opinions?
> >> The ioreq control plane is largely predicated on there being no
> >> pending I/O when the state of a server is modified, and it is assumed
> >> that domain_pause() is sufficient to achieve this. If that assumption
> >> doesn't hold then we need additional synchronization.
> 
> I don't think this assumption even holds on x86, because domain_pause()
> will not wait for I/O to finish.
> 
> On x86, the context switch will reset the stack and therefore
> wait_on_xen_event_channel() is not going to return. Instead,
> handle_hvm_io_completion() will be called from the tail callback in
> context_switch(). get_pending_vcpu() would return NULL because the IOREQ
> server has disappeared, although it is not clear whether the vCPU would
> then continue to run (or not).
> 
> Did I miss anything?
> 
> Regarding the fix itself, I am not sure what sort of synchronization we
> can do. Are you suggesting we wait for the I/O to complete? If so, how
> do we handle the case where the IOREQ server has died?
> 

s/IOREQ server/emulator, but that is a good point. If domain_pause() did wait 
for I/O to complete then this would always have been a problem so, with 
hindsight, it should have been obvious that this was not the case.

Digging back, it looks like things would probably have been OK before 
125833f5f1f0 "x86: fix ioreq-server event channel vulnerability" because 
wait_on_xen_event_channel() and the loop condition above it did not dereference 
anything that would disappear with IOREQ server destruction (they used the 
shared page, which at that point was always a magic page and hence part of the 
target domain's memory). So things have probably been broken since 2014.

To fix the problem I think it is sufficient to go back to a wait loop that can 
tolerate the IOREQ server disappearing between iterations and deal with that 
as a completed emulation (albeit returning all f's for reads and sinking 
writes).
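
Roughly, I am thinking of something along these lines (a sketch of the shape 
only, not a tested patch; ioreq_server_alive() is a placeholder for whatever 
liveness check we end up using, e.g. repeating the get_pending_vcpu() lookup 
each time round the loop):

/*
 * Sketch only, not a tested patch: re-check on every iteration that the
 * emulator is still there and treat its disappearance as a completed
 * emulation. ioreq_server_alive() is a placeholder for the real check.
 */
static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
{
    unsigned int prev_state = STATE_IOREQ_NONE;
    unsigned int state;
    uint64_t data = ~0; /* reads complete with all f's by default */

    for ( ;; )
    {
        if ( !ioreq_server_alive(sv) ) /* placeholder liveness check */
            break;                     /* emulator died: treat I/O as done */

        state = p->state;
        smp_rmb();

        if ( state == STATE_IOREQ_NONE )
            break;

        if ( unlikely(state < prev_state) )
        {
            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
                     prev_state, state);
            sv->pending = false;
            domain_crash(sv->vcpu->domain);
            return false; /* bail */
        }

        switch ( prev_state = state )
        {
        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
            p->state = STATE_IOREQ_NONE;
            data = p->data;
            break;

        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
        case STATE_IOREQ_INPROCESS:
            wait_on_xen_event_channel(sv->ioreq_evtchn,
                                      ({ state = p->state;
                                         smp_rmb();
                                         state != prev_state; }));
            continue;

        default: /* unexpected state */
            sv->pending = false;
            domain_crash(sv->vcpu->domain);
            return false; /* bail */
        }

        break;
    }

    /* ... complete the access using 'data', as the current code does ... */
    sv->pending = false;

    return true;
}

The property we want is that nothing in the loop touches state owned by the 
IOREQ server without first confirming the server still exists, so its 
destruction between iterations simply completes the emulation with the 
default data.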

  Paul

> > May I please clarify whether the concern still stands (with what was said
> > above) and we need additional synchronization on Arm?
> 
> Yes, the concern is still there (see above).
> 
> Cheers,
> 
> --
> Julien Grall
