
Re: [Xen-devel] [Help] Trigger Watchdog when adding an IPI in vcpu_wake

On Sun, 2016-09-18 at 12:03 +0800, Wei Yang wrote:
> Dario,
> Took a look into your code, some questions as below:
Haha, here you are! :-D

> 1. vcpu_wake is split into two cases, the first case is "not in irq"
> and "irq
> enabled". Some reason for this classification? Maybe some background
> knowledge
> I need to have.
Mmm... yes. Well, probably that distinction should not be there, as it
is a (potential, benchmark will tell if that's the case) optimization.
And optimizations should be added later, with their own patch and patch
description, etc... But as I said in the other email, the code is not
even RFC. :-/

So, the whole point of the patch series is to leave interrupts
enabled inside scheduling functions, including vcpu_wake()
(which is becoming _vcpu_wake()). In order to do that, when taking the
scheduler lock, we'll start using spin_lock() and vcpu_spin_lock(),
instead of vcpu_spinlock_irq[save](). But this also means that IRQs
can't be disabled when we do that, or:
 - we violate Xen's lock consistency rule (there's ASSERT()s checking
   that in debug builds)
 - we risk re-enabling interrupts at the wrong time.

So, if IRQs are disabled when vcpu_wake() is called, or we are in IRQ
context, not disabling them would expose us to races, so we want to go
the deferred way the patch series is implementing.

However, if vcpu_wake() is called with IRQ enabled and not from IRQ
context, none of the above risks stand, and we can avoid doing all the
dance required for deferred wakeup, and just call _vcpu_wake().

I'm saying this is a (potential) optimization because deferring the
wakeup has a cost, and this 'if' is trying to avoid paying that cost
when not strictly necessary.

But really, in a proper version of this series I should just always
defer the wakeup in this patch, and have another patch, at the end of
the series, where this attempt to avoid deferral is implemented, and
run benchmarks with and without said patch applied.

> 2. in vcpu_wake(), I see you raise def_vcpu_wake softirq when the
> list is
> empty. and then add the vcpu to the list. any reason for this
> sequence? BTW,
> at this moment vcpu_wake() holds the spin lock and
> vcpu_wake_deferred() should
> take the spin lock before continuing.
If the list is empty it means there is no one going through it and
waking up vcpus from there. This means I need to raise the softirq that
will cause someone to go check the list, so it will find the vcpu that
I'm inserting there right now.

OTOH, if the list is not empty, it means there is already someone going
through the list itself. This means I don't need to raise anything, as
the vcpu that I'm inserting in the list right now will be found by such
waker, and woken up.

This is to avoid raising too many softirqs and poking too many cpus
to run their deferred wakers, which would introduce contention issues
on the deferred list lock.

The reason why the check is before the insertion is pure cosmetics. I
can well insert first and check for list_is_singleton(). In the code
you see there, that does not matter, because the deferred list has its
own spinlock. But this will probably change if I get rid of that,
making the list lockless.

> In the last patch, I saw you introduce a flag wakeup_triggered, and
> then
> change the sequence a little. Why at this place, you want to change
> the
> sequence here?
It was another attempt of optimization. But again, this is all changing
with the list becoming lockless.

> 3. The main purpose of this series is to enable irq for "scheduler
> handlers".
> I am not sure the original initiative for this, while I guess this
> could
> increase the parallelism, right? And I think we have to make sure
> those
> functions are not re-entrant, otherwise they will face dead lock when
> locking
> on the same spin-lock.
Hopefully, I explained the purpose in my previous email.

I'm not sure what you mean here with 're-entrant'. The only reason why
we want IRQs to be disabled when taking a lock, is because we take that
lock from both inside and outside of IRQ context.

So, if we're sure that a lock is never taken from IRQ context, we can
avoid disabling IRQs. And this is what this is all about: the scheduler
lock is taken once from IRQ context, and this series is trying to get
rid of that one case.

> 4. This looks similar with what we try to achieve. While we try to
> give the
> vcpu to someone else, who we think may pick it, to wake it up, and in
> your
> implementation, you add this vcpu to its own list and wake it up some
> time
> later. 
It does look similar (this is the reason why I mentioned it to you :-
)), although:
 - both the goal and the expectations are different,
 - I'm not really 'giving the vcpu to someone else', I'm _deferring_ 
   it, i.e., it is still going to happen on the same cpu where the 
   event happens, only a little later in time (when no longer in IRQ
   context).
> I guess next step you want to do is to make the wakeup_defer.list a
> lock-less list.
Yes, that would be the plan (and refreshing the series, of course).

> 5. Well, this may not relevant to your code directly. I found some
> difference
> between raise_softirq() and cpumask_raise_softirq(). The first one
> just set
> pending bit, while the second one will send IPI.

>  __runq_tickle() use the
> second one to send SCHEDULE softirq, this means other pcpu would
> react
> immediately?
I don't think there's much difference in observable behavior. Perhaps,
in cpumask_raise_softirq(), if the current cpu is in the mask, it would
be enough to set a pending bit for it.

However, this is something very different again, between 4.1 and
upstream. In fact, check commits (not present in Xen 4.1):

 c47c316d99b3b570d0b "x86/HVM: batch vCPU wakeups"
 9a727a813e9b25003e4 "x86: suppress event check IPI to MWAITing CPUs"

So I can't really be sure or comment much on this.

<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
