
[Xen-devel] event delay issue on SMP machine when xen0 is SMP enabled



Hi Ian/Keir,
We found an event delivery delay issue on SMP machines when xen0 is SMP
enabled; in the worst case the event even appears to be lost.  The
symptom is: when we start a VMX domain on a 64-bit SMP xen0 on an SMP
system with 16 processors, in most cases the QEMU device model window
pops up with a black screen and stops there, and "xm list" shows the VMX
domain in the blocked state.  "info vmxiopage" on the QEMU command line
reports that the IO request state is 1, i.e. STATE_IOREQ_READY.  I added
a printf just after select() in the QEMU DM main loop and found that the
evtchn fd never becomes readable, that is to say, QEMU DM never gets
notified of the IO request from the VMX domain.
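
(For reference, the check added around select() boils down to something
like the following.  This is a minimal standalone sketch rather than the
real QEMU DM main loop, and evtchn_fd stands for whatever descriptor the
device model polls for event-channel notifications.)

        /* Minimal sketch: watch the evtchn fd with select() and report when
         * it becomes readable.  In the failing case the printf never fires. */
        #include <stdio.h>
        #include <sys/select.h>

        void wait_for_evtchn(int evtchn_fd)
        {
                fd_set rfds;

                for (;;) {
                        FD_ZERO(&rfds);
                        FD_SET(evtchn_fd, &rfds);

                        if (select(evtchn_fd + 1, &rfds, NULL, NULL, NULL) > 0 &&
                            FD_ISSET(evtchn_fd, &rfds))
                                printf("evtchn fd %d readable\n", evtchn_fd);
                }
        }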
The root cause is:
1) QEMU DM does an evtchn interdomain bind, and on a big SMP machine
with an SMP xen0 a port number greater than 63 is allocated (here it is
65).  In the Xen HV this notifies vcpu0 of xen0:

            /*
             * We may have lost notifications on the remote unbound port. Fix that up
             * here by conservatively always setting a notification on the local port.
             */
            evtchn_set_pending(ld->vcpu[lchn->notify_vcpu_id], lport);

Then in evtchn_set_pending:
            /* These four operations must happen in strict order. */
            if ( !test_and_set_bit(port, &s->evtchn_pending[0]) &&
                 !test_bit        (port, &s->evtchn_mask[0])    &&
                 !test_and_set_bit(port / BITS_PER_LONG,
                                   &v->vcpu_info->evtchn_pending_sel) &&
                 !test_and_set_bit(0, &v->vcpu_info->evtchn_upcall_pending) )
            {
                evtchn_notify(v);
            }

If the port were not masked, bit 1 in evtchn_pending_sel of vcpu0 would
be set (for port 65, port / BITS_PER_LONG == 1, hence bit 1); this is
the typical case.  But while the interdomain evtchn binding is being set
up this port is masked, so bit 1 in evtchn_pending_sel of vcpu0 is not
set.
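
The chain of tests above can be modelled with a tiny standalone program
(plain bit operations stand in for the atomic test_and_set_bit() /
test_bit() helpers, and the bitmaps are simplified, not the real
shared_info layout):

        /* Model of the test chain in evtchn_set_pending() for port 65 while
         * the port is still masked: the mask test fails, so the selector bit
         * (port / BITS_PER_LONG == 1) is never set for vcpu0. */
        #include <stdio.h>

        #define BITS_PER_LONG 64

        static unsigned long evtchn_pending[2], evtchn_mask[2];
        static unsigned long vcpu0_pending_sel, vcpu0_upcall_pending;

        static int test_and_set(unsigned long *word, int bit)
        {
                int old = (*word >> bit) & 1;
                *word |= 1UL << bit;
                return old;
        }

        int main(void)
        {
                int port = 65;

                /* The port is masked while the interdomain bind completes. */
                evtchn_mask[port / BITS_PER_LONG] |= 1UL << (port % BITS_PER_LONG);

                if (!test_and_set(&evtchn_pending[port / BITS_PER_LONG],
                                  port % BITS_PER_LONG) &&
                    !((evtchn_mask[port / BITS_PER_LONG] >>
                       (port % BITS_PER_LONG)) & 1) &&
                    !test_and_set(&vcpu0_pending_sel, port / BITS_PER_LONG) &&
                    !test_and_set(&vcpu0_upcall_pending, 0))
                        printf("vcpu0 notified\n");
                else
                        printf("stopped at the mask test, pending_sel of vcpu0 = %#lx\n",
                               vcpu0_pending_sel);      /* prints 0 */
                return 0;
        }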

2) Just after returning from the Xen HV, this port is unmasked in the
xen0 kernel:

        static inline void unmask_evtchn(int port)
        {
                shared_info_t *s = HYPERVISOR_shared_info;
                vcpu_info_t *vcpu_info = &s->vcpu_info[smp_processor_id()];

                synch_clear_bit(port, &s->evtchn_mask[0]);

                /*
                 * The following is basically the equivalent of 'hw_resend_irq'. Just
                 * like a real IO-APIC we 'lose the interrupt edge' if the channel is
                 * masked.
                 */
                if (synch_test_bit(port, &s->evtchn_pending[0]) &&
                    !synch_test_and_set_bit(port / BITS_PER_LONG,
                                            &vcpu_info->evtchn_pending_sel)) {
                        vcpu_info->evtchn_upcall_pending = 1;
                        if (!vcpu_info->evtchn_upcall_mask)
                                force_evtchn_callback();
                }
        }

But this is done on the current vcpu, which in most cases is not vcpu0,
so the event is signalled on the current vcpu rather than vcpu0, and bit
1 in evtchn_pending_sel of the current vcpu is set.

3) However, this event won't be handled on the current vcpu:
        asmlinkage void evtchn_do_upcall(struct pt_regs *regs)
        {
                unsigned long  l1, l2;
                unsigned int   l1i, l2i, port;
                int            irq, cpu = smp_processor_id();
                shared_info_t *s = HYPERVISOR_shared_info;
                vcpu_info_t   *vcpu_info = &s->vcpu_info[cpu];

                vcpu_info->evtchn_upcall_pending = 0;

                /* NB. No need for a barrier here -- XCHG is a barrier on x86. */
                l1 = xchg(&vcpu_info->evtchn_pending_sel, 0);
                while (l1 != 0) {
                        l1i = __ffs(l1);
                        l1 &= ~(1UL << l1i);

                        while ((l2 = active_evtchns(cpu, s, l1i)) != 0) {
                                l2i = __ffs(l2);
                                l2 &= ~(1UL << l2i);

                                port = (l1i * BITS_PER_LONG) + l2i;
                                if ((irq = evtchn_to_irq[port]) != -1)
                                        do_IRQ(irq, regs);
                                else
                                        evtchn_device_upcall(port);
                        }
                }
        }

This is because on an SMP kernel active_evtchns is defined as:
#define active_evtchns(cpu,sh,idx)              \
        ((sh)->evtchn_pending[idx] &            \
         cpu_evtchn_mask[cpu][idx] &            \
         ~(sh)->evtchn_mask[idx])

While cpu_evtchn_mask is initialized as:
        static void init_evtchn_cpu_bindings(void)
        {
                /* By default all event channels notify CPU#0. */
                memset(cpu_evtchn, 0, sizeof(cpu_evtchn));
                memset(cpu_evtchn_mask[0], ~0, sizeof(cpu_evtchn_mask[0]));
        }

So vcpus other than vcpu0 won't handle it, even though they see the
event pending there, and it won't be delivered to the evtchn device.
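
This is easy to see by evaluating the active_evtchns() expression by
hand for selector word 1 on a cpu other than 0.  A standalone
illustration (the arrays are trimmed and the shared_info indirection is
dropped for brevity):

        /* After init_evtchn_cpu_bindings(), cpu_evtchn_mask[cpu] is all zeros
         * for any cpu > 0, so active_evtchns() is 0 for that cpu no matter
         * what is pending, and evtchn_do_upcall() never sees port 65 there. */
        #include <stdio.h>
        #include <string.h>

        #define NR_CPUS  16
        #define NR_WORDS 16

        static unsigned long cpu_evtchn_mask[NR_CPUS][NR_WORDS];
        static unsigned long evtchn_pending[NR_WORDS];
        static unsigned long evtchn_mask[NR_WORDS];

        #define active_evtchns(cpu, idx)                \
                (evtchn_pending[idx] &                  \
                 cpu_evtchn_mask[cpu][idx] &            \
                 ~evtchn_mask[idx])

        int main(void)
        {
                int port = 65;

                /* By default all event channels notify CPU#0. */
                memset(cpu_evtchn_mask[0], ~0, sizeof(cpu_evtchn_mask[0]));

                /* Port 65 is pending and (after step 2) unmasked. */
                evtchn_pending[port / 64] |= 1UL << (port % 64);

                printf("active on cpu0: %#lx\n", active_evtchns(0, 1)); /* 0x2 */
                printf("active on cpu2: %#lx\n", active_evtchns(2, 1)); /* 0   */
                return 0;
        }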
Only when bit 1 in evtchn_pending_sel of vcpu0 happens to be set because
some other port in that selector word is notified will this event be
delivered.  If we are unlucky and no other port notification sets bit 1
in evtchn_pending_sel of vcpu0, we get stuck there and the event seems
lost, though it may still be delivered to the evtchn device at some
unknown time :-(

The reason we didn't hit this problem before is that on most machines we
got an event port smaller than 64, and bit 0 in evtchn_pending_sel of
vcpu0 is very likely to be set anyway, since that is a hot event port
area.

We do need to fix this issue, because it is also a common issue in
complex environments.  It is a performance issue as well, even when the
event channel port is < 64, since delivery then depends on other events
to get the notification through.
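
One possible direction, sketched below just to illustrate the idea (this
is untested, and the remote kick in the else branch is an assumption,
e.g. an IPI or a hypercall asking Xen to re-deliver on that vcpu): have
unmask_evtchn() set the selector bit on the vcpu the port is actually
routed to (cpu_evtchn[port], which defaults to 0) instead of on the
unmasking vcpu:

        /* Sketch only, not a tested patch: resend the pending event towards
         * the cpu that will actually handle the port, rather than towards
         * whichever cpu happens to run unmask_evtchn(). */
        static inline void unmask_evtchn(int port)
        {
                shared_info_t *s = HYPERVISOR_shared_info;
                int cpu = cpu_evtchn[port];     /* cpu this port notifies */
                vcpu_info_t *vcpu_info = &s->vcpu_info[cpu];

                synch_clear_bit(port, &s->evtchn_mask[0]);

                if (synch_test_bit(port, &s->evtchn_pending[0]) &&
                    !synch_test_and_set_bit(port / BITS_PER_LONG,
                                            &vcpu_info->evtchn_pending_sel)) {
                        vcpu_info->evtchn_upcall_pending = 1;
                        if (cpu == smp_processor_id()) {
                                if (!vcpu_info->evtchn_upcall_mask)
                                        force_evtchn_callback();
                        } else {
                                /* Assumption: kick the remote vcpu here, e.g.
                                 * via an IPI such as
                                 * smp_send_event_check_cpu(cpu), or a
                                 * hypercall that re-delivers the event. */
                        }
                }
        }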

BTW, why don't vcpus other than vcpu0 handle events by default?

thanks
-Xin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

