[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Is: paravirt docs and diet. Was:Re: [Lguest] lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY



On Wed, May 8, 2013 at 8:47 PM, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> 
wrote:
> On Wed, May 08, 2013 at 11:07:33AM -0700, H. Peter Anvin wrote:
>> On 05/08/2013 10:20 AM, Konrad Rzeszutek Wilk wrote:
>> >
>> > If I am reading you right the #1 issue is that you don't know whether
>> > a certain paravirt instruction has any side-effects and as such you don't
>> > feel that you can treat it like an equivalent instruction that is defined
>> > in the Intel SDM?
>> >
>> > And that means that any development work you have in the pipeline is
>> > affected because you don't have the documentation on hand and are unsure
>> > whether you will break something?
>> >
>>
>> That is, indeed, the #1 issue (and you and I have discussed it at
>> length, obviously.)
>

First of thank you for enumerating the issues and clarifying them.

My understanding was (until this email conversation) was that the
lack of the hypercall docs was hindering you. However you are not
interested in that - you are after the documentation for the paravirt
interface. Specifically how it differs from what baremetal would do.
That changes my priorities!

In that case lguest provides an excellent source of documentation on
how the paravirt interfaces should work. It is almost like reading a book.

Perhaps moving them to paravirt.h will help. There are of course
some differences between the platforms that use it - but that would help
identify them. 

>> There are a few other issues:
>>
>> 2. some of the paravirt_ops are plain wrong.  Most of the really big
>> problems are in the MMU-related ones, but as an easily-explained example
>> that I ran into the other day:
>>
>> read_cr4_safe() assumes that there is no useful distinction between "cr4
>> is zero" and "cr4 doesn't exist".  Unfortunately, this is an invalid
>> assumption.  It would be a five-minute fix in the normal case, but since
>> it is paravirtualized, fixing it involves grokking the semantics of each
>> PV layer, including any of the hypercalls that may be involved.
>>
>> In this particular example I think the answer is actually reasonable
>> simple, because I don't think any of the hypervisors support running on
>> pre-cr4 hardware (basically 486 at this point.)
>

>From my understanding the paravirt interface was a shim for three platforms
(VMware, Xen and lguest). And the API was choosen to be broad to encompass
everything.  That was then and nowadays Xen and lguest are the top users.
And there are things that don't make sense anymore (like the example above).

Your specific example is interesting as a quick 10 second lookup confirmed
that the implementation for different paravirt platforms all point to
native_read_cr4().  There are no PV hypercalls involved. Axe!

We had a public discussion on this and I think determined that the paravit_ops
can benefit from a diet. Certainly the [read|write]_msr_safe that was
introduced by Borislav Petkov per your suggestion was a good candidate.

So was the store_gdt. I think the read_tsc and this one (read_cr4) can be
slashed too.  Not sure about the idt ones but I hadn't done a careful
examination of that.

I hope it is clear that there is no resistance from me in this diet-program.
The issue I face to help you are the same you do: priority - those merge 
windows,
darn bugs, regressions, etc.

And until this conversation I was under the impression the documentation
for hypercalls was #1 - but that is more of a #3 (#2 being the
diet program, #1 being the paravirt documentation).

>>
>> 3. "Let's add another hook" becomes a far too easy solution to new problems.
>

I would hope that with your help in brainstorming these problems one can figure
out that right way of doing it. Perhaps instead of posting the patches
first, one can solicit you first for help to design it properly from the start?

>
>>
>> 4. Performance and maintainability impact of having to support multiple
>> code flows with different semantics.  The semantics of the Xen MMU, in
>> particular, is actually quite different from the x86 MMU.
>

You along with Thomas had designed a nice way of not only making this code
nicely but also cleaning up the code. And I hope we all learned more
about the Xen RO/pinning/pagetables protection mechanism. The semantics
are understood now.

In terms of development however, I recall that you as a excellent
maintainer was able to delegate the work to an Oracle employee and
everybody pitched in to help review the code and test. We did not test
a forced branch which had some extra patches and that bit us all
in the merge window.

The after effect is quite nice code and design.
>
>
>>
>> 5. Performance and maintainability impact of a maze of twisty little
>> functions, all different.  For example, in the case of some of the MSR
>> functions, we actually end up telling the compiler to combine and break
>> up the two 32-bit halves into a 64-bit register multiple times, because
>> the functions don't actually match up.  I still don't understand why we
>> don't just patch out the rdmsr/wrmsr instructions, using those registers.

My understanding is that there are two variants of paravirt patching:
inline patching (for a selective bunch of them), and patch in a call.

The inline patching is the more efficient one and the functions that
were choosen for this were the performance sensitive. I think at the
time this was designed the rdmsr/wrmsr instructions were not terribly
fast.

I think that some of these paravirt interfaces could be optimized now.

One idea that I had been toying around and asked the ksplice team to
prototype was the usage of the asm goto mechanism. It only did it for
the pte_modify call but I never got to run the performance numbers
on all the different architectures.

In summary I think (and please correct me if I mis-understood you):
 - The paravirt interfaces documentation should be #1 priority,
 - Fixes, diet and optimization for paravirt infrastructure is needed.


And since we don't want to hinder your development and I would need
to reschedule work-items I have to ask - what are the development work
that is hindering this? So I have an idea of the timelines (if this
needs to be an NDA discussion, pls email me privately your manager's
email so I can start the discussion on that).


Lastly, a bit seperate topic, but close, I thought that the paravirt
interface was under x86 maintainers ownership, but the MAINTAINERS 
file says otherwise!

Jeremy, Rusty, are you OK with this. Rusty, would you have the time to
review the patches or would it be easier if I was the sole person ?

>From f622669a3b6e05d2962e3886ef8b46add18b1e6c Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Fri, 10 May 2013 10:41:22 -0400
Subject: [PATCH] MAINTAINERS: Become the maintainer of the paravirt interface.

Somehow I thought the x86 team owned it. Time to step up
and maintain this and put the paravirt interface on a diet!

CC: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
CC: Chris Wright <chrisw@xxxxxxxxxxxx>
CC: Alok Kataria <akataria@xxxxxxxxxx>
CC: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
CC: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
---
 MAINTAINERS | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3d7782b..7e24105 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6044,10 +6044,8 @@ F:       drivers/char/ppdev.c
 F:     include/uapi/linux/ppdev.h
 
 PARAVIRT_OPS INTERFACE
-M:     Jeremy Fitzhardinge <jeremy@xxxxxxxx>
-M:     Chris Wright <chrisw@xxxxxxxxxxxx>
-M:     Alok Kataria <akataria@xxxxxxxxxx>
 M:     Rusty Russell <rusty@xxxxxxxxxxxxxxx>
+M:     Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
 L:     virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
 S:     Supported
 F:     Documentation/ia64/paravirt_ops.txt
-- 
1.8.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.