[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v8 for-xen-4.5 2/2] dpci: Replace tasklet with an softirq (v8)



>>> On 21.10.14 at 19:19, <konrad.wilk@xxxxxxxxxx> wrote:
> +/*
> + * These two bit states help to safely schedule, deschedule, and wait until
> + * the softirq has finished.
> + *
> + * The semantics behind these two bits is as follow:
> + *  - STATE_SCHED - whoever clears it has to ref-count the domain (->dom).

s/clears/modifies/

> + *  - STATE_RUN - only the softirq is allowed to set and clear it. If it has
> + *      been set the hvm_dirq_assist will RUN with an saved value of the

s/ an / a / (and perhaps also ditch the first "the" on that line,
and the similarly further down)

> +    STATE_SCHED, /* Bit 0 */

Bogus comment (effectively re-stating what the language
specification says)?

> +/*
> + * Should only be called from hvm_do_IRQ_dpci. We use the
> + * 'state' as an gate to thwart multiple interrupts being scheduled.

s/ an / a /

> + *
> + * The 'state' is cleared by 'softirq_dpci' when it has
> + * completed executing 'hvm_dirq_assist' or by 'pt_pirq_softirq_reset'
> + * if we want to try to unschedule the softirq before it runs.
> + *

Stray blank comment line.

> +/*
> + * If we are racing with softirq_dpci (state is still set) we return
> + * -ERESTART. Otherwise we return 0.
> + *
> + *  If it is -ERESTART, it is the callers responsibility to make sure
> + *  that the softirq (with the event_lock dropped) has ran. We need
> + *  to flush out the outstanding 'dpci_softirq' (no more of them
> + *  will be added for this pirq as the IRQ action handler has been
> + *  reset in pt_irq_destroy_bind).
> + */
> +int pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
> +{
> +    if ( pirq_dpci->state & (STATE_RUN | STATE_SCHED) )
> +        return -ERESTART;
> +
> +    /*
> +     * If in the future we would call 'raise_softirq_for' right away
> +     * after 'pt_pirq_softirq_active' we MUST reset the list (otherwise it
> +     * might have stale data).
> +     */
> +    return 0;
> +}

Having this return -ERESTART and 0 rather than a simple boolean
is kind of odd as long as there are no other errors possible here.

> +static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
> +{
> +    struct domain *d = pirq_dpci->dom;
> +
> +    ASSERT(spin_is_locked(&d->event_lock));
> +    /*
> +     * The reason it is OK to reset 'dom' when STATE_RUN bit is set is due
> +     * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom' in
> +     * a local variable before it sets STATE_RUN - and therefore will not
> +     * dereference '->dom' which would result in a crash.
> +     */
> +    if ( test_bit(STATE_RUN, &pirq_dpci->state) )
> +    {
> +        pirq_dpci->dom = NULL;
> +        return;
> +    }
> +    /*
> +     * We are going to try to de-schedule the softirq before it goes in
> +     * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'.
> +     */
> +    if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->state) )
> +    {
> +        put_domain(d);
> +        pirq_dpci->dom = NULL;
> +    }

Would it not be easier to follow if instead of the two if()-s you
used switch(cmpxchg(..., STATE_SCHED, 0)) here?

> @@ -128,6 +234,21 @@ int pt_irq_create_bind(
>      }
>      pirq_dpci = pirq_dpci(info);
>      /*
> +     * A crude 'while' loop with us dropping the spinlock and giving
> +     * the softirq_dpci a chance to run.
> +     * We MUST check for this condition as the softirq could be scheduled
> +     * and hasn't run yet. We do this up to one second at which point we
> +     * give up. Note that this code replaced tasklet_kill which would have
> +     * spun forever and would do the same thing (wait to flush out
> +     * outstanding hvm_dirq_assist calls.
> +     */

Stale comment - there's no 1s timeout here anymore.

> +    if ( pt_pirq_softirq_active(pirq_dpci) )
> +    {
> +        spin_unlock(&d->event_lock);
> +        process_pending_softirqs();

ASSERT_NOT_IN_ATOMIC() between these two (the assertion
process_pending_softirqs() does seems too weak for the purposes
here, as we really shouldn't be holding any other spin locks; otoh
it's not really clear to me why that aspect is different from
do_softirq() - just fired off a separate mail)?

> @@ -400,8 +535,13 @@ int pt_irq_destroy_bind(
>          msixtbl_pt_unregister(d, pirq);
>          if ( pt_irq_need_timer(pirq_dpci->flags) )
>              kill_timer(&pirq_dpci->timer);
> -        pirq_dpci->dom   = NULL;
>          pirq_dpci->flags = 0;
> +        /*
> +         * Before the 'pirq_guest_unbind' had been called an interrupt could
> +         * have been scheduled. No more of them are going to be scheduled 
> after
> +         * that but we must deal with the one that were put in the queue.
> +         */

This is the second of third instance of this or a similar comment.
Please have it just once in full, and have the other(s) simply refer
to the full one.

> @@ -652,3 +773,81 @@ void hvm_dpci_eoi(struct domain *d, unsigned int 
> guest_gsi,
>  unlock:
>      spin_unlock(&d->event_lock);
>  }
> +
> +static void dpci_softirq(void)
> +{
> +    struct hvm_pirq_dpci *pirq_dpci;
> +    unsigned int cpu = smp_processor_id();
> +    LIST_HEAD(our_list);
> +
> +    local_irq_disable();
> +    list_splice_init(&per_cpu(dpci_list, cpu), &our_list);
> +    local_irq_enable();
> +
> +    while ( !list_empty(&our_list) )
> +    {
> +        struct domain *d;

Considering that you already have one non-function scope variable
here, please use the other one only needed inside this loop
(pirq_dpci) here too.

> +
> +        pirq_dpci = list_entry(our_list.next, struct hvm_pirq_dpci, 
> softirq_list);
> +        list_del(&pirq_dpci->softirq_list);
> +
> +        d = pirq_dpci->dom;
> +        smp_wmb(); /* 'd' MUST be saved before we set/clear the bits. */

So you state the right thing, but use the wrong barrier: To order a
read and a write, smp_mb() is your only choice.

> -void pci_release_devices(struct domain *d)
> +int pci_release_devices(struct domain *d)
>  {
>      struct pci_dev *pdev;
>      u8 bus, devfn;
> +    int ret;
>  
>      spin_lock(&pcidevs_lock);
> -    pci_clean_dpci_irqs(d);
> +    ret = pci_clean_dpci_irqs(d);
> +    if ( ret == -EAGAIN )

-ERESTART?

> --- a/xen/include/xen/hvm/irq.h
> +++ b/xen/include/xen/hvm/irq.h
> @@ -93,13 +93,13 @@ struct hvm_irq_dpci {
>  /* Machine IRQ to guest device/intx mapping. */
>  struct hvm_pirq_dpci {
>      uint32_t flags;
> -    bool_t masked;
> +    unsigned long state;

I think "unsigned int" would be sufficient here?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.