Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
On 08.02.21 15:20, Julien Grall wrote:
> Hi Juergen,
>
> On 08/02/2021 13:58, Jürgen Groß wrote:
>> On 08.02.21 14:09, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 08/02/2021 12:31, Jürgen Groß wrote:
>>>> On 08.02.21 13:16, Julien Grall wrote:
>>>>> On 08/02/2021 12:14, Jürgen Groß wrote:
>>>>>> On 08.02.21 11:40, Julien Grall wrote:
>>>>>>> Hi Juergen,
>>>>>>>
>>>>>>> On 08/02/2021 10:22, Jürgen Groß wrote:
>>>>>>>> On 08.02.21 10:54, Julien Grall wrote:
>>>>>>>>> ... I don't really see how the difference matters here. The
>>>>>>>>> idea is to re-use what's already existing rather than trying
>>>>>>>>> to re-invent the wheel with an extra lock (or whatever we can
>>>>>>>>> come up with).
>>>>>>>>
>>>>>>>> The difference is that the race is occurring _before_ any IRQ
>>>>>>>> is involved. So I don't see how modification of IRQ handling
>>>>>>>> would help.
>>>>>>>
>>>>>>> vCPU0                 | vCPU1
>>>>>>>                       |
>>>>>>>                       | Call xen_rebind_evtchn_to_cpu()
>>>>>>> receive event X       |
>>>>>>>                       | mask event X
>>>>>>>                       | bind to vCPU1
>>>>>>> <vCPU descheduled>    | unmask event X
>>>>>>>                       |
>>>>>>>                       | receive event X
>>>>>>>                       |
>>>>>>>                       | handle_fasteoi_irq(X)
>>>>>>>                       |  -> handle_irq_event()
>>>>>>>                       |   -> set IRQD_IN_PROGRESS
>>>>>>>                       |    -> evtchn_interrupt()
>>>>>>>                       |       -> evtchn->enabled = false
>>>>>>>                       |   -> clear IRQD_IN_PROGRESS
>>>>>>> handle_fasteoi_irq(X) |
>>>>>>>  -> evtchn_interrupt()|
>>>>>>>     -> WARN()         |
>>>>>>>                       | xen_irq_lateeoi(X)
>>>>>>>
>>>>>>> Note that there are other IRQ flows existing. We should have a
>>>>>>> look at them before trying to fix things ourselves.
>>>>>>
>>>>>> Fine with me, but it either needs to fit all use cases
>>>>>> (interdomain, IPI, real interrupts) or we need to have a
>>>>>> per-type IRQ flow.
>>>>>
>>>>> AFAICT, we already use different flows based on the use cases.
>>>>> Before 2011, we used to use the fasteoi one, but this was changed
>>>>> by the following commit:
>>>>
>>>> Yes, I know that. I think we should fix the issue locally first,
>>>> then we can start to plan a thorough rework. It's not as if the
>>>> needed changes with the current flow would be so huge, and I'd
>>>> really like to have a solution rather sooner than later. Changing
>>>> the IRQ flow might have other side effects which need to be
>>>> excluded by thorough testing.
>>>
>>> I agree that we need a solution ASAP. But I am a bit worried to:
>>>
>>>  1) Add another lock in that event handling path.
>>
>> Regarding complexity: it is very simple (just around
>> masking/unmasking of the event channel). Contention is very
>> unlikely.
>>
>>>  2) Add more complexity in the event handling (it is already fairly
>>>     difficult to reason about the locking/race)
>>>
>>> Let's see what the local fix looks like.
>>
>> Yes.
>>
>>> Although, the other issue I can see so far is that
>>> handle_irq_for_port() will update info->{eoi_cpu, irq_epoch,
>>> eoi_time} without any locking. But it is not clear this is what you
>>> mean by "becoming active".
>>
>> As long as a single event can't be handled on multiple cpus at the
>> same time, there is no locking needed.
>
> Well, it can happen in the current code (see my original scenario).
> If your idea fixes it then fine.

I hope so.


Juergen
Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Attachment: OpenPGP_signature