[Xen-devel] Revisit VT-d asynchronous flush issue
Let's start a new thread with a summary of the previous discussion,
followed by our latest experiment data and an updated proposal.
The previous discussion suggested that a spin model is acceptable only
when the spin timeout does not exceed the order of a scheduling time
slice, or the time other blocking operations such as WBINVD might take.
Otherwise an async-flush model is preferred, so that a misbehaving guest
cannot cause long spins that impact the whole system.
Below are some thresholds to be considered:
1) scheduling time slice in Credit is 1ms.
2) WBINVD cost is 4.6ms in the worst case on an IVT platform (32 cores,
10GB NIC assigned to the VM, running iperf). Detailed data is appended
at the bottom. Actual cost varies across platforms, due to different
cache sizes/layouts. For example, we also heard from other colleagues
about costs at the 10ms level on another platform.
3) PCI SIG strongly recommends that the Completion Timeout mechanism
not expire in less than 10ms (PCIe 3.0 spec, 7.8.15, Device Capabilities
2 Register). This means a CPU MMIO read might already take >10ms, which
we simply have not noticed so far.
Based on the above information, we think a timeout in the range
[1ms, 10ms] is unlikely to introduce bad system behavior.
To be conservative, we can define the default spin timeout as 1ms,
while allowing a boot-time override up to 10ms for more flexibility.
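
As a minimal sketch of the default-plus-override rule above: all names
here are hypothetical, and clamping a too-small override back up to 1ms
is our assumption, not something settled in this thread.

```c
#include <stdint.h>

/* Hypothetical names; the values come from the proposal above. */
#define FLUSH_TIMEOUT_DEFAULT_US  1000u   /* 1ms default spin timeout */
#define FLUSH_TIMEOUT_MAX_US     10000u   /* 10ms upper bound for override */

/*
 * Return the effective spin timeout in microseconds.
 * requested_us == 0 means "no boot-time override was given".
 * Assumption: overrides outside [1ms, 10ms] are clamped into that range.
 */
static uint32_t flush_timeout_us(uint32_t requested_us)
{
    if (requested_us == 0)
        return FLUSH_TIMEOUT_DEFAULT_US;
    if (requested_us < FLUSH_TIMEOUT_DEFAULT_US)
        return FLUSH_TIMEOUT_DEFAULT_US;
    if (requested_us > FLUSH_TIMEOUT_MAX_US)
        return FLUSH_TIMEOUT_MAX_US;
    return requested_us;
}
```

With this, an override of 5000 is honored as 5ms, while 20000 is capped
at 10ms.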
Then, regarding VT-d flush:
- For context/iotlb/iec flush, our measurements show worst cases
<10us. We also confirmed with the hardware team that 1ms is large
enough for an IOMMU internal flush.
- For ATS device-TLB flush, PCI spec defines up to 60s, but:
* Our hardware team confirms that 1ms should be enough for
integrated PCI devices w/ ATS.
* For discrete PCI devices w/ ATS, it's uncertain whether 1ms
or 10ms is too restrictive for them, but only a few such devices
are on the market now.
Based on the above information, we propose to keep the spin-timeout
model w/ some adjustments, which addresses the current timeout concern
and also allows limited ATS support in a lightweight way:
1) reduce the spin timeout to 1ms, which can be changed at boot time
up to 10ms.
2) if the timeout expires, kill the VM to which the target device is
assigned. Optionally the hypervisor may mark the device non-assignable.
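
The two steps above can be sketched roughly as follows. This is only an
illustration against a simulated clock and simulated flush completion;
none of these names (sim_hw, assigned_dev, flush_or_kill, ...) are
existing Xen interfaces, and "kill the VM" is reduced to setting a flag.

```c
#include <stdbool.h>
#include <stdint.h>

enum flush_result { FLUSH_OK, FLUSH_TIMED_OUT };

/*
 * Simulated hardware (hypothetical): every clock read advances time
 * by 1us, and the flush completes once completes_at_us is reached.
 */
struct sim_hw {
    uint64_t now_us;
    uint64_t completes_at_us;
};

static uint64_t sim_now_us(struct sim_hw *hw)
{
    return hw->now_us++;
}

static bool sim_flush_done(const struct sim_hw *hw)
{
    return hw->now_us >= hw->completes_at_us;
}

/* Hypothetical per-device state affected by a flush timeout. */
struct assigned_dev {
    bool domain_killed;  /* owning VM has been killed */
    bool assignable;     /* device may be assigned to a VM again */
};

/* Step 1: spin until the flush completes or timeout_us elapses. */
static enum flush_result spin_wait_flush(struct sim_hw *hw,
                                         uint64_t timeout_us)
{
    uint64_t start = sim_now_us(hw);

    while (!sim_flush_done(hw)) {
        if (sim_now_us(hw) - start >= timeout_us)
            return FLUSH_TIMED_OUT;
    }
    return FLUSH_OK;
}

/* Step 2: on timeout, kill the owning VM and (optionally) quarantine. */
static enum flush_result flush_or_kill(struct assigned_dev *dev,
                                       struct sim_hw *hw,
                                       uint64_t timeout_us)
{
    enum flush_result r = spin_wait_flush(hw, timeout_us);

    if (r == FLUSH_TIMED_OUT) {
        dev->domain_killed = true;
        dev->assignable = false;  /* the optional part of step 2 */
    }
    return r;
}
```

The point of this shape is that the spin is strictly bounded by the
timeout, and the cost of a device that cannot meet it falls on the VM
owning that device rather than on the rest of the system.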
This works for devices w/o ATS. It works for integrated devices w/ ATS.
It might or might not work for discrete devices w/ ATS, but we can
re-evaluate the gain vs. software complexity of async flush once we
see many discrete devices breaking the timeout assumptions in the
future.
Thoughts?
----
<detail data>
           Min(us)    Max(us)    Average(us)
context       5.24       5.49       5.36
iotlb         1.90       2.07       2.03
iec           5.54       7.86       6.58
wbinvd     2721.42    4655.71    3571.43
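
For anyone wanting to collect comparable numbers, min/max/average
figures like the above could be accumulated along these lines. This is a
generic sketch with hypothetical names, not the instrumentation we
actually used; how each sample is timed is left out entirely.

```c
#include <stddef.h>

/* Running latency statistics, all values in microseconds. */
struct lat_stats {
    double min_us;
    double max_us;
    double sum_us;
    size_t count;
};

/* Fold one measured latency sample into the running statistics. */
static void lat_stats_add(struct lat_stats *s, double sample_us)
{
    if (s->count == 0 || sample_us < s->min_us)
        s->min_us = sample_us;
    if (s->count == 0 || sample_us > s->max_us)
        s->max_us = sample_us;
    s->sum_us += sample_us;
    s->count++;
}

/* Average over all samples so far; 0.0 if no sample was recorded. */
static double lat_stats_avg(const struct lat_stats *s)
{
    return s->count ? s->sum_us / (double)s->count : 0.0;
}
```

A zero-initialized struct lat_stats is a valid starting state.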
Platform info:
1. Base Board Information
Manufacturer: Intel Corporation
Product Name: S2600CP
Version: E99552-561
2. CPU:
cpu family : 6
model : 62
model name : Genuine Intel(R) CPU @ 2.80GHz
Thanks
Kevin