
Re: [Xen-devel] [PATCH v14 00/20] Introduce PVH domU support

On 04/11/13 16:59, Konrad Rzeszutek Wilk wrote:
On Mon, Nov 04, 2013 at 12:14:49PM +0000, George Dunlap wrote:
  - Fixed bugs in v14:
    Zombie domains, FreeBSD crash, Crash at 4GiB, HVM crash
    (Thank you to Roger Pau Mone for fixes to the last 3)
  - Completely eliminated PV emulation codepath

Odd, you dropped Mukesh email from the patch series - so he can't
jump on answering questions right away.

The mail I received has Mukesh cc'd in all the patches...

== RFC ==

We had talked about accepting the patch series as-is once I had the
known bugs fixed; but I couldn't help making an attempt at using the
HVM IO emulation codepaths so that we could completely eliminate
having to use the PV emulation code, in turn eliminating some of the
uglier "support" patches required to make the PV emulation code
capable of running on a PVH guest.  The idea for "admin" pio ranges
would be that we would use the vmx hardware to allow the guest direct
access, rather than the "re-execute with guest GPRs" trick that PV
uses.  (This functionality is not implemented by this patch series, so
we would need to make sure it was sorted for the dom0 series.)

The result looks somewhat cleaner to me.  On the other hand, because
string in & out instructions use the full emulation code, it means
opening up an extra 6k lines of code to PVH guests, including all the
complexity of the ioreq path.  (It doesn't actually send ioreqs, but
since it shares much of the path, it shares much of the complexity.)
Additionally, I'm not sure I've done it entirely correctly: the guest
boots and the io instructions it executes seem to be handled
correctly, but it may not be exercising the corner cases.

The case I think Mukesh was hitting was the 'speaker_io' path. But
perhaps I am misremembering it?

Well, looking at the trace, it looks like the PVH kernel he gave me is actually attempting to enumerate the PCI space (writing a large range of values to cf8, then reading cfc). A full set of accesses is below:

vcpu 0
IO address summary:
      21:[w]        1  0.00s  0.00%  5387 cyc { 5387| 5387| 5387}
      70:[w]        8  0.00s  0.00%  1434 cyc {  916| 1005| 3651}
      71:[r]        8  0.00s  0.00%  1803 cyc { 1017| 1496| 5100}
      a1:[w]        1  0.00s  0.00%  1357 cyc { 1357| 1357| 1357}
     cf8:[r]        3  0.00s  0.00%  1202 cyc { 1088| 1150| 1369}
     cf8:[w]    16850  0.01s  0.00%   966 cyc {  896|  937| 1073}
     cfa:[w]        1  0.00s  0.00%   932 cyc {  932|  932|  932}
     cfb:[w]        2  0.00s  0.00%  2517 cyc { 2001| 3033| 3033}
     cfc:[r]    16560  0.01s  0.00%  1174 cyc { 1118| 1150| 1227}
     cfe:[r]      288  0.00s  0.00%  1380 cyc { 1032| 1431| 1499}

vcpu 1
IO address summary:
      60:[r]       16  0.00s  0.00%  1141 cyc { 1011| 1014| 2093}
      64:[r]    18276  0.01s  0.01%  1579 cyc { 1408| 1443| 2629}

vcpu 2
IO address summary:
      70:[w]       33  0.00s  0.00%  1192 cyc {  855|  920| 2306}
      71:[r]       31  0.00s  0.00%  1177 cyc {  988| 1032| 1567}
      71:[w]        2  0.00s  0.00%  1079 cyc { 1014| 1144| 1144}
     2e9:[r]        3  0.00s  0.00%  1697 cyc { 1002| 1011| 3080}
     2e9:[w]        3  0.00s  0.00%   998 cyc {  902|  952| 1141}
     2f9:[r]        3  0.00s  0.00%  1725 cyc {  996| 1020| 3160}
     2f9:[w]        3  0.00s  0.00%   990 cyc {  905|  935| 1130}
     3e9:[r]        3  0.00s  0.00%  1595 cyc { 1011| 1026| 2749}
     3e9:[w]        3  0.00s  0.00%  1012 cyc {  920|  976| 1142}
     3f9:[r]        3  0.00s  0.00%  2480 cyc {  988| 1079| 5375}
     3f9:[w]        3  0.00s  0.00%  1064 cyc {  913| 1035| 1245}

(No i/o from vcpu 3.)
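
To put a number on how much of the vcpu 0 traffic is PCI config access, here is a rough sketch that tallies the summary lines above. The regular expression and the names (SUMMARY_LINE, tally) are my own assumptions for illustration, not part of any Xen tool; the input is just the vcpu 0 block pasted verbatim.

```python
import re
from collections import Counter

# Hypothetical pattern for one summary line: "<hex port>:[r|w] <count> ..."
SUMMARY_LINE = re.compile(r"^\s*([0-9a-f]+):\[([rw])\]\s+(\d+)\s")

# The vcpu 0 block from the trace above, copied as-is.
VCPU0 = """\
      21:[w]        1  0.00s  0.00%  5387 cyc { 5387| 5387| 5387}
      70:[w]        8  0.00s  0.00%  1434 cyc {  916| 1005| 3651}
      71:[r]        8  0.00s  0.00%  1803 cyc { 1017| 1496| 5100}
      a1:[w]        1  0.00s  0.00%  1357 cyc { 1357| 1357| 1357}
     cf8:[r]        3  0.00s  0.00%  1202 cyc { 1088| 1150| 1369}
     cf8:[w]    16850  0.01s  0.00%   966 cyc {  896|  937| 1073}
     cfa:[w]        1  0.00s  0.00%   932 cyc {  932|  932|  932}
     cfb:[w]        2  0.00s  0.00%  2517 cyc { 2001| 3033| 3033}
     cfc:[r]    16560  0.01s  0.00%  1174 cyc { 1118| 1150| 1227}
     cfe:[r]      288  0.00s  0.00%  1380 cyc { 1032| 1431| 1499}
"""


def tally(text):
    """Return a Counter mapping (port, 'r' or 'w') to the access count."""
    counts = Counter()
    for line in text.splitlines():
        m = SUMMARY_LINE.match(line)
        if m:
            counts[(int(m.group(1), 16), m.group(2))] += int(m.group(3))
    return counts


counts = tally(VCPU0)
total = sum(counts.values())
# Ports 0xcf8-0xcff are the PCI config address and data ports.
pci = sum(n for (port, _), n in counts.items() if 0xCF8 <= port <= 0xCFF)
print(f"{pci} of {total} accesses hit ports 0xcf8-0xcff")
```

On the numbers above this reports 33704 of 33722 accesses in the cf8-cff range.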

Presumably some of these are just "the BIOS may be lying, check anyway" probes, which should be harmless for domUs.
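
For reference, the cf8/cfc pattern is what you would expect from the conventional PCI configuration mechanism #1: write a 32-bit address to port 0xcf8, then read the data from 0xcfc plus the byte offset within the dword. The trace does not record the values written, so the sketch below only assumes the kernel uses that standard encoding; the function names are made up for illustration.

```python
def pci_config_address(bus, device, function, register):
    """Build the 32-bit value written to port 0xcf8 (CONFIG_ADDRESS).

    Layout (mechanism #1): bit 31 enable, bits 23-16 bus, bits 15-11
    device, bits 10-8 function, bits 7-2 dword-aligned register.
    """
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 256):
        raise ValueError("bus/device/function/register out of range")
    return (0x80000000 | (bus << 16) | (device << 11)
            | (function << 8) | (register & 0xFC))


def pci_data_port(register):
    """Port used for the data access: 0xcfc plus the offset in the dword."""
    return 0xCFC + (register & 3)


# Vendor ID of bus 0, device 0, function 0 lives at register 0.
print(hex(pci_config_address(0, 0, 0, 0)))
# A byte read at register 0x0e (header type) would go through port 0xcfe.
print(hex(pci_data_port(0x0E)))
```

A scan over many bus/device/function combinations would produce a large number of cf8 writes each followed by a cfc read, and sub-dword reads would show up on cfd-cff, which is at least consistent with the cfe reads in the vcpu 0 summary.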

This also means no support for "legacy" forced invalid ops -- only native
cpuid is supported in this series.

(FWIW, support for legacy forced invalid ops was requested by Tim.)

I have the fixes in another series, if people think it would be better
to check in exactly what we had with bug fixes ASAP.

Other "open issues" on the design (which need not stop the series
going in) include:

  - Whether a completely separate mode is necessary, or whether just
having HVM mode with some flags to disable / change certain
functionality would be better

  - Interface-wise: Right now PVH is special-cased for bringing up
CPUs.  Is this what we want to do going forward, or would it be better
to try to make it more like PV (which was tried before and is hard), or more
like HVM (which would involve having emulated APICs, &c &c)?

How is it hard? From the Linux standpoint it is just a hypercall?

This is my understanding of a discussion that happened between Tim and Mukesh just as I was joining the conversation. My understanding was that the issue had to do with pre-loading segments and DTs, which for PV guests is easy because Xen controls the tables themselves, but is harder to do in a reasonable way for HVM guests because the guest controls the tables. Mukesh had initially implemented it the full PV way (or mostly PV), but Tim was concerned about some kind of potential consistency issue. But I didn't read the discussion very carefully, as I was just trying to get my head around the series as a whole at that time.

The suggestion to just use an HVM-style method was made at the XenSummit by Glauber Costa. Glauber is a bit more of a KVM guy, so tends to lean towards "just behave like the real hardware". Nonetheless, I think his concern about adding an extra interface is a valid one, and worth keeping in mind.

