
Re: [Xen-devel] ITS emulation race conditions





On 11/04/17 10:24, Julien Grall wrote:
Hi Stefano,

On 04/11/2017 12:01 AM, Stefano Stabellini wrote:
On Mon, 10 Apr 2017, Andre Przywara wrote:
Hi,

On 10/04/17 00:12, André Przywara wrote:
Hi,

I wanted to run some ideas on how to prevent the race conditions we are
facing with the ITS emulation and removing devices and/or LPIs.
I think Stefano's idea of tagging a discarded LPI is the key, but still
some details are left to be solved.
I think we are dealing with two issues:
1) A guest's DISCARD can race with an incoming LPI.
Ideally DISCARD would remove the host LPI -> vLPI connection (in the
host_lpis table), so any new incoming (host) LPI would simply be
discarded very early in gicv3_do_LPI() without ever resolving into a
virtual LPI. Now while this removal is atomic, we could have just
missed
an incoming LPI, so the old virtual LPI would still traverse down the
VGIC with a "doomed" virtual LPI ID.
I wonder if that could be solved by a "crosswise" check:
- The DISCARD handler *first* tags the pending_irq as "UNMAPPED",
*then*
removes the host_lpis connection.
- do_LPI() reads the host_lpis array, *then* checks the UNMAPPED tag
in the corresponding pending_irq (or lets vgic_vcpu_inject_irq() do
that).
With this setup the DISCARD handler can assume that no new virtual LPI
instances enter the VGIC afterwards.

2) An unmapped LPI might still be in use by the VGIC, foremost might
still be in an LR.
Tagging the pending_irq should solve this in general. Whenever a VGIC
function finds the UNMAPPED tag, it does not process the vIRQ, but
retires it. For simplicity we might limit this to handling when a VCPU
exits and an LR gets cleaned up: if we hit a tagged pending_irq here,
we clean the LR, remove the pending_irq from all lists and signal the
ITS emulation that this pending_irq is now ready to be removed (by
calling some kind of cleanup_lpi() function, for instance).
The ITS code can then remove the struct pending_irq from the radix
tree.
MAPD(V=0) is now using this to tag all still mapped events as
"UNMAPPED", the counting down the "still alive" pending_irqs in
cleanup_lpi() until they reach zero. At this point it should be safe to
free the pend_irq array.
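
As a rough sketch of that MAPD(V=0) side (the flag bit, the per-device
"live" counter and its placement are made up for illustration, and it
assumes cleanup_lpi() eventually gets called for every tagged event):

/* Illustrative sketch only: simplified device/event bookkeeping. */
#include <stdatomic.h>
#include <stdlib.h>

#define GIC_IRQ_GUEST_UNMAPPED  (1u << 0)        /* hypothetical flag bit */

struct pending_irq {
    _Atomic unsigned long status;
};

struct its_device {
    unsigned int nr_events;
    struct pending_irq *events;      /* the per-device pend_irq array */
    _Atomic unsigned int live;       /* tagged events not yet retired */
};

/* MAPD(V=0): tag every event; the array is only freed once all are retired. */
static void unmap_device(struct its_device *dev)
{
    atomic_store(&dev->live, dev->nr_events);
    for ( unsigned int i = 0; i < dev->nr_events; i++ )
        atomic_fetch_or(&dev->events[i].status, GIC_IRQ_GUEST_UNMAPPED);
}

/* Called by the VGIC (e.g. from LR cleanup) when a tagged vLPI is retired. */
static void cleanup_lpi(struct its_device *dev)
{
    if ( atomic_fetch_sub(&dev->live, 1) == 1 )
        free(dev->events);           /* last reference gone: safe to free */
}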

Does that sound like a plan?
Do I miss something here? Probably yes, hence I am asking ;-)

So there are two issues with this:
- For doing the LPI cleanup, we would need to call a virtual ITS or
virtual LPI specific function directly from gic.c. This looks like bad
coding style, as it breaks the abstraction (calling a virtual LPI/ITS
specific function from within gic.c)

This is just code organization, I am not worried about it. We might have
to register cleanup functions. The real problem to solve is below.


- If we have a "DISCARD; MAPTI" sequence, targeting the same vLPI, while
this vLPI is still in an LR (but not cleaned up until both commands have
been handled), we end up using the wrong pending_irq struct (the
new one). (I assume the UNMAPPED tag would be cleared upon the new
MAPTI).

It looks like "DISCARD; MAPTI" would be a problem even if we go with
"GIC: Add checks for NULL pointer pending_irq's", because instead of a
NULL pending_irq, we could get the new pending_irq, the one backing the
same vlpi but a different eventid.


Can this create problems?
I see two possibilities:
a) the old vLPI is still pending in the LR: as the new pending_irq would be
pristine, the code in update_one_lr() would just clear the QUEUED bit,
but not do anything further. The LR would be kept as used by this vLPI,
with the state still set to GICH_LR_PENDING. Upon re-entering the VCPU
this would make the new vLPI pending.
b) the old vLPI has been EOIed: the new pending_irq would be pristine, the
code in update_one_lr() would try to clean it up anyway, which should not
hurt, AFAICT.

This cannot be allowed to happen. We have to keep a consistent internal
state at all times: we cannot change the vlpi mapping in pend_lpi_tree
before the old vlpi is completely discarded. We cannot have a vlpi marked
as "UNMAPPED", but still alive in an LR, while vlpi_to_pending already
returns the new pending_irq.

Well, in hardware LPIs are handled by the re-distributor and the ITS only acts as a walker to find the mapping (DeviceID, EventID) -> LPI.

In real hardware, an LPI may be received afterwards if the re-distributor has already injected it. The OS will receive the LPI and have to deal with it, potentially re-directing it to the new interrupt handler and not the old one.

The problem is the same here: an LPI may be in an LR whilst the DISCARD happens. I think this is too late and the guest should receive it.


We have a number of possible approaches, I'll mark them with [letter].


[a] On DISCARD we do everything we can to remove the vlpi from
everywhere:

- if pending_irq is in lr_queue, remove it from the list
- if the vlpi is in an LR register as pending (not active, see below),
  remove it from the LR (see the code in gic_restore_pending_irqs, under
  "No more free LRs: find a lower priority irq to evict")
- remove pending_irq from inflight
- remove pending_irq from pend_lpi_tree

At this stage there should be no more traces of the vlpi/pending_irq
anywhere.
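
The tricky step in that list is evicting a still-pending vLPI from an LR.
Here is a toy model of what "evict" would mean, with an invented LR layout
(the real LR handling in gic.c is register-based and per-pcpu):

/* Toy LR model, only to illustrate the pending-only eviction in [a]. */
#include <stdbool.h>
#include <stdint.h>

#define LR_STATE_PENDING  (1u << 0)
#define LR_STATE_ACTIVE   (1u << 1)

struct lr_entry {
    uint32_t virq;
    uint8_t  state;                  /* PENDING and/or ACTIVE bits */
    bool     in_use;
};

/* Evict a vLPI from this vcpu's LRs, but only while it is purely pending;
 * an ACTIVE entry means the guest is in the middle of handling it. */
static bool evict_pending_vlpi(struct lr_entry *lrs, unsigned int nr_lrs,
                               uint32_t vlpi)
{
    for ( unsigned int i = 0; i < nr_lrs; i++ )
    {
        if ( !lrs[i].in_use || lrs[i].virq != vlpi )
            continue;
        if ( lrs[i].state & LR_STATE_ACTIVE )
            return false;            /* see the ACTIVE discussion below */
        lrs[i].in_use = false;       /* pending only: safe to clear the LR */
        lrs[i].state = 0;
        return true;
    }
    return false;                    /* not resident in any LR */
}

Note that something like this can only run on the pcpu currently holding the
LRs, which is exactly the cross-vcpu problem discussed below.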

What do we do if a "DISCARD; MAPTI" pair of commands is issued while the
old vlpi is still ACTIVE in an LR?  ACTIVE means that the guest is still
handling the irq and hasn't even EOIed it yet. I know that physical LPIs
have no active state, but what about virtual LPIs? I am tempted to say
that in any case for simplicity we could just remove the vlpi from the
LR ("evict") anyway. I don't think this case should happen with a well
behaved guest anyhow.

But the other issue is that "DISCARD, MAPTI" could be done on a
different vcpu, compared to the one having the old vlpi in an LR. In
this case, we would have to send an SGI to the right pcpu and clear the
LR, which is a royal pain in the neck because the other vcpu might not
even be running. Also another MAPTI could come in on a different pcpu
while we haven't completed the first LR removal. There might be races.
Let's explore other options.


[b] On DISCARD we would have to:
- if pending_irq is in lr_queue, remove it from the list
- if the vlpi is in an LR register:
  - mark as UNMAPPED (this check can be done from another vcpu)
  - remove from inflight (it could also be done in gic_update_one_lr)

Then when the guest EOIs the interrupt, in gic_update_one_lr:
- clear LR
- if UNMAPPED
  - clear UNMAPPED and return
  - remove old pending_irq from pend_lpi_tree (may not be necessary?)
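
Putting [b] into a small sketch (toy booleans instead of the real status
flags, lists and radix tree; locking omitted):

/* Sketch of the [b] flow; every field here is a simplified stand-in. */
#include <stdbool.h>

struct pending_irq {
    bool unmapped;        /* stands in for an UNMAPPED status bit */
    bool in_lr_queue;
    bool in_inflight;
    bool resident_in_lr;  /* currently loaded in a list register */
};

/* DISCARD side */
static void discard(struct pending_irq *p)
{
    p->in_lr_queue = false;          /* drop from lr_queue if it was there */
    if ( p->resident_in_lr )
    {
        p->unmapped = true;          /* defer the rest to LR maintenance */
        p->in_inflight = false;      /* or do this later in gic_update_one_lr */
    }
}

/* EOI side, i.e. what the gic_update_one_lr change would look like */
static void update_one_lr(struct pending_irq *p)
{
    p->resident_in_lr = false;       /* clear the LR */
    if ( p->unmapped )
    {
        p->unmapped = false;         /* retire the doomed vLPI ... */
        return;                      /* ... and skip all further processing */
    }
    /* ... normal maintenance handling continues here ... */
}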


Now the problem with this approach, like you pointed out, is: what do we
do if the guest issues a MAPTI before EOIing the interrupt?


[c] The simplest thing to do is to ignore it (or report an error back to the
guest) and printk a warning if the old vlpi is still UNMAPPED. It could
be a decent solution for Xen 4.9.

I don't think this is a decent solution for Xen 4.9. It could easily be
triggered, and it is possible to have "DISCARD; MAPTI" in the command queue.



[d] Otherwise, we can try to handle it properly. On DISCARD:
- if pending_irq is in lr_queue, remove it from the list
- if vlpi is in LR:
  - mark as UNMAPPED
  - remove from inflight

On MAPTI:
if old pending_irq is UNMAPPED:
  - clear UNMAPPED in old pending_irq
  - add UNMAPPED to new pending_irq
- remove old pending_irq from pend_lpi_tree
- add new pending_irq to pend_lpi_tree

Then when the guest EOIs the interrupt, in gic_update_one_lr:
- clear LR
- if UNMAPPED
  - clear UNMAPPED and return
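
A sketch of the [d] MAPTI step, with a plain array standing in for
pend_lpi_tree and all locking omitted (which is exactly the part
questioned below):

/* Sketch of the [d] MAPTI handling; the table and flag are stand-ins. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VLPI 64

struct pending_irq {
    bool unmapped;
};

static struct pending_irq *pend_lpi_tree[MAX_VLPI];   /* vlpi -> pending_irq */

static void handle_mapti(uint32_t vlpi, struct pending_irq *new_p)
{
    struct pending_irq *old_p = pend_lpi_tree[vlpi % MAX_VLPI];

    if ( old_p && old_p->unmapped )
    {
        old_p->unmapped = false;     /* the old struct can now be released */
        new_p->unmapped = true;      /* keep the new one inert until the stale
                                        LR for this vlpi has been cleaned up */
    }

    /* Replace the old mapping with the new one; the real code would have to
     * do this under a lock to avoid racing with gic_update_one_lr. */
    pend_lpi_tree[vlpi % MAX_VLPI] = new_p;
}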

So who is removing the pending_irq from the radix_tree? If you do it in
both MAPTI and gic_update_one_lr, you will end up with a potential race
condition where MAPTI adds the new one to the radix_tree and
gic_update_one_lr then removes it from the radix_tree. So the mapping will
disappear.

It was not easily solvable in the migration case, and I don't think this
is different here. Also, the new MAPTI may use a different collection
(vCPU ID) so how do you protect that correctly?

Thinking a bit more, I think there is another race with your suggestion between gic_update_one_lr and vgic_vcpu_inject_irq.

If I understand your suggestion correctly, UNMAPPED will clear the LRs and return. However, you may have received a new interrupt just before clearing the LRs.

Do you expect vgic_vcpu_inject_irq to clear UNMAPPED? If so, how will the function know whether this is the new pending_irq or the previous one that should be ignored?



Among these approaches, [b]+[c] would be OK for 4.9. Or [d]. [a] looks
simple but it is actually difficult to do correctly.


So I would like to know how to proceed here:
I) stick to the tag, and fix that particular case above by checking for
the "inflight && ENABLED" condition and clearing the LR if the
pending_irq does not state it's pending anymore
II) introduce a NO_LONGER_PENDING tag in addition to the UNMAPPED tag,
where a new MAPTI just clears UNMAPPED_LPI, but keeps NO_LONGER_PENDING.
NO_LONGER_PENDING would be checked and cleared by update_one_lr(),
UNMAPPED by vgic_vcpu_inject_irq().
That would leave the issue of how to call the pend_irq_cleanup() function
from update_one_lr() without breaking the abstraction.
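
A sketch of the flag split in II), with invented names for the two bits and
the status word; only the division of labour between the two callers matters
here:

/* Sketch of option II); flag names and the status word are illustrative. */
#include <stdatomic.h>
#include <stdbool.h>

#define GIC_IRQ_GUEST_UNMAPPED          (1ul << 0)   /* hypothetical */
#define GIC_IRQ_GUEST_NO_LONGER_PENDING (1ul << 1)   /* hypothetical */

struct pending_irq {
    _Atomic unsigned long status;
};

/* DISCARD sets both tags. */
static void on_discard(struct pending_irq *p)
{
    atomic_fetch_or(&p->status,
                    GIC_IRQ_GUEST_UNMAPPED | GIC_IRQ_GUEST_NO_LONGER_PENDING);
}

/* A new MAPTI re-enables injection but keeps the stale LR marked. */
static void on_mapti(struct pending_irq *p)
{
    atomic_fetch_and(&p->status, ~GIC_IRQ_GUEST_UNMAPPED);
}

/* vgic_vcpu_inject_irq() side: drop injections for a discarded vLPI. */
static bool may_inject(struct pending_irq *p)
{
    return !(atomic_load(&p->status) & GIC_IRQ_GUEST_UNMAPPED);
}

/* update_one_lr() side: retire the stale LR exactly once. */
static void lr_cleanup(struct pending_irq *p)
{
    unsigned long old = atomic_fetch_and(&p->status,
                                         ~GIC_IRQ_GUEST_NO_LONGER_PENDING);
    if ( old & GIC_IRQ_GUEST_NO_LONGER_PENDING )
    {
        /* ... signal the ITS emulation (pend_irq_cleanup()) here ... */
    }
}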

If MAPTI is called with a different eventid but the same vlpi, wouldn't
vlpi_to_pending return the wrong pending_irq struct in
gic_update_one_lr, causing problems for both I) and II)? The flags would
not be set.


III) Revert to the "check for NULL" solution.

Wouldn't irq_to_pending potentially return the new pending_irq instead
of NULL in gic_update_one_lr? This is a problem. For example, the new
pending_irq is not in the inflight list, but gic_update_one_lr will try
to remove it from that list anyway.

I think this approach is error prone: we should always get the right
pending_irq struct, or one that is appropriately marked with a flag, or
NULL. In this case, we would still have to mark the new pending_irq as
"UNMAPPED" to tell gic_update_one_lr to do nothing and return. It would
end up very similar to [d]. We might as well do [d].

I think you still have to handle the NULL case because you may receive a
spurious interrupt from the host whilst cleaning up the mapping.

The pLPI <-> vLPI mapping may still exist, but when you get into
vgic_vcpu_inject_irq, irq_to_pending will return NULL as the pending_irq
does not exist anymore in the radix tree.

Cheers,


--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

