RE: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: 11 January 2021 09:00
> To: paul@xxxxxxx
> Cc: wl@xxxxxxx; iwj@xxxxxxxxxxxxxx; anthony.perard@xxxxxxxxxx;
> andrew.cooper3@xxxxxxxxxx; george.dunlap@xxxxxxxxxx; julien@xxxxxxx;
> sstabellini@xxxxxxxxxx; roger.pau@xxxxxxxxxx;
> xen-devel@xxxxxxxxxxxxxxxxxxxx; 'Igor Druzhinin' <igor.druzhinin@xxxxxxxxxx>
> Subject: Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per
> partition
>
> On 11.01.2021 09:45, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
> >> Sent: 09 January 2021 00:56
> >> To: paul@xxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx
> >> Cc: wl@xxxxxxx; iwj@xxxxxxxxxxxxxx; anthony.perard@xxxxxxxxxx;
> >> andrew.cooper3@xxxxxxxxxx; george.dunlap@xxxxxxxxxx; jbeulich@xxxxxxxx;
> >> julien@xxxxxxx; sstabellini@xxxxxxxxxx; roger.pau@xxxxxxxxxx
> >> Subject: Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per
> >> partition
> >>
> >> On 08/01/2021 08:32, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
> >>>> Sent: 08 January 2021 00:47
> >>>> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> >>>> Cc: paul@xxxxxxx; wl@xxxxxxx; iwj@xxxxxxxxxxxxxx;
> >>>> anthony.perard@xxxxxxxxxx; andrew.cooper3@xxxxxxxxxx;
> >>>> george.dunlap@xxxxxxxxxx; jbeulich@xxxxxxxx; julien@xxxxxxx;
> >>>> sstabellini@xxxxxxxxxx; roger.pau@xxxxxxxxxx;
> >>>> Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
> >>>> Subject: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per
> >>>> partition
> >>>>
> >>>> TLFS 7.8.1 stipulates that "a virtual processor index must be less than
> >>>> the maximum number of virtual processors per partition" that "can be
> >>>> obtained through CPUID leaf 0x40000005". Furthermore, "Requirements for
> >>>> Implementing the Microsoft Hypervisor Interface" defines that starting
> >>>> from Windows Server 2012, which allowed more than 64 CPUs to be brought
> >>>> up, this leaf can now contain the value -1, basically meaning the
> >>>> hypervisor has no restriction, while 0 (that we currently expose) means
> >>>> the default restriction is still present.
> >>>>
> >>>> Along with previous changes exposing ExProcessorMasks this allows a
> >>>> recent Windows VM with the Viridian extension enabled to have more than
> >>>> 64 vCPUs without going into an immediate BSOD.
> >>>>
> >>>
> >>> This is very odd as I was happily testing with a 128 vCPU VM once
> >>> ExProcessorMasks was implemented... no need for the extra leaf.
> >>> The documentation for 0x40000005 states "Describes the scale limits
> >>> supported in the current hypervisor implementation. If any value is
> >>> zero, the hypervisor does not expose the corresponding information".
> >>> It does not say (in section 7.8.1 or elsewhere AFAICT) what implications
> >>> that has for VP_INDEX.
> >>>
> >>> In "Requirements for Implementing the Microsoft Hypervisor Interface"
> >>> I don't see anything saying what the semantics of not implementing
> >>> leaf 0x40000005 are, only that if implementing it then -1 must be used
> >>> to break the 64 VP limit. It also says that reporting -1 in 0x40000005
> >>> means that 40000004.EAX bits 1 and 2 *must* be clear, which is clearly
> >>> not true if ExProcessorMasks is enabled. That document is dated June
> >>> 13th 2012 so I think it should be taken with a pinch of salt.
> >>>
> >>> Have you actually observed a BSOD with >64 vCPUs when ExProcessorMasks
> >>> is enabled? If so, which version of Windows? I'd like to get a repro
> >>> myself.
> >>
> >> I return with more testing that confirms both my and your results.
> >>
> >> 1) with ExProcessorMasks and 66 vCPUs assigned, both the current WS19
> >> build and pre-release 20270 of vNext boot correctly with no changes
> >>
> >> and that would be fine, but the existing documentation for
> >> ex_processor_masks states that:
> >> "Hence this enlightenment must be specified for guests with more
> >> than 64 vCPUs if B<hcall_remote_tlb_flush> and/or B<hcall_ipi> are also
> >> specified."
> >>
> >> You would then expect a 64+ vCPU VM to boot without ex_processor_masks,
> >> hcall_remote_tlb_flush and hcall_ipi.
> >
> > Indeed.
> >
> >> So,
> >> 2) without ExProcessorMasks and 66 vCPUs assigned, and with
> >> hcall_remote_tlb_flush and hcall_ipi disabled: WS19 BSODs and vNext
> >> fails to initialize secondary CPUs
> >>
> >
> > This is not what I recall from testing but I can confirm I see the same
> > now.
> >
> >> After applying my change,
> >> 3) without ExProcessorMasks and 66 vCPUs assigned, and with
> >> hcall_remote_tlb_flush and hcall_ipi disabled: WS19 and vNext boot
> >> correctly
> >>
> >> So we either need to change the documentation and require
> >> ExProcessorMasks for all VMs with 64+ vCPUs, or go with my change.
> >>
> >
> > I think we go with your change. I guess we can conclude then that the
> > separate viridian flag as part of the base set is the right way to go
> > (to stop the leaf magically appearing on migrate of existing guests)
> > and leave ex_processor_masks (and the documentation) as-is.
> >
> > You can add my R-b to the patch.
>
> That's the unchanged patch then, including the libxl change that I had
> asked about and where, I have to admit, I don't fully follow Igor's
> responses? I'm hesitant to give an ack for that aspect of the change,
> yet I suppose the libxl maintainers will defer to x86 ones there.
> Alternatively Andrew or Roger could of course ack this ...

I don't think we really need specific control in xl.cfg as this is a fix
for some poorly documented semantics in the spec. The flag simply prevents
the leaf magically appearing on migrate and I think that's enough.

  Paul

> Jan
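[Editor's note] For readers following the leaf semantics rather than the thread itself, below is a minimal C sketch of the behaviour under discussion: CPUID leaf 0x40000005 EAX reporting 0 leaves the guest assuming the historical 64-VP limit, while -1 tells Windows Server 2012 and later that the hypervisor imposes no limit. This is not the actual Xen patch; the structure, the no_vp_limit flag and the function name are hypothetical.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define HV_CPUID_IMPLEMENTATION_LIMITS 0x40000005u

/* Hypothetical per-domain setting standing in for the new viridian flag
 * discussed in the thread; the real option name in the patch may differ. */
struct viridian_config {
    bool no_vp_limit;
};

/*
 * Illustrative fill of the "implementation limits" leaf (0x40000005).
 * Per the thread:
 *   EAX == 0  -> the guest assumes the default limit of 64 virtual
 *                processors per partition;
 *   EAX == -1 -> the guest (Windows Server 2012 and later) assumes the
 *                hypervisor imposes no VP limit, which is what lets a
 *                >64 vCPU VM boot without ExProcessorMasks.
 */
static void fill_leaf_40000005(const struct viridian_config *cfg,
                               uint32_t *eax, uint32_t *ebx,
                               uint32_t *ecx, uint32_t *edx)
{
    *eax = cfg->no_vp_limit ? (uint32_t)-1 : 0;  /* max virtual processors */
    *ebx = 0;  /* max logical processors: not exposed */
    *ecx = 0;
    *edx = 0;
}

int main(void)
{
    uint32_t eax, ebx, ecx, edx;
    struct viridian_config with_limit = { .no_vp_limit = false };
    struct viridian_config without_limit = { .no_vp_limit = true };

    fill_leaf_40000005(&with_limit, &eax, &ebx, &ecx, &edx);
    printf("leaf 0x%x, default limit: EAX=0x%08x\n",
           HV_CPUID_IMPLEMENTATION_LIMITS, eax);

    fill_leaf_40000005(&without_limit, &eax, &ebx, &ecx, &edx);
    printf("leaf 0x%x, no VP limit:   EAX=0x%08x\n",
           HV_CPUID_IMPLEMENTATION_LIMITS, eax);
    return 0;
}

As Paul notes above, gating this behind a flag in the base viridian set, rather than adding a new xl.cfg option, is simply intended to stop the leaf value changing under an existing guest on migrate.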