Re: [Xen-devel] [PATCH v3 7/7] xen/x86: use PCID feature
On 23/03/18 14:46, Jan Beulich wrote:
>>>> On 23.03.18 at 12:29, <jgross@xxxxxxxx> wrote:
>> On 23/03/18 11:51, Jan Beulich wrote:
>>>>>> On 21.03.18 at 13:51, <jgross@xxxxxxxx> wrote:
>>>> Avoid flushing the complete TLB when switching %cr3 for mitigation of
>>>> Meltdown by using the PCID feature if available.
>>>>
>>>> We are using 4 PCID values for a 64 bit pv domain subject to XPTI and
>>>> 2 values for the non-XPTI case:
>>>>
>>>> - guest active and in kernel mode
>>>> - guest active and in user mode
>>>> - hypervisor active and guest in user mode (XPTI only)
>>>> - hypervisor active and guest in kernel mode (XPTI only)
>>>
>>> Before committing to this route, Jun, Kevin, can we please get
>>> confirmation that PCID isn't (and isn't going to be) subject to the
>>> same speculation issues in the pipeline that the U/S bit is (having
>>> caused Meltdown in the first place)? To me it seems a pretty
>>> likely thing to play all the same games.
>>
>> Really? This would assume either the processor is capable of dealing
>> with multiple matching TLB entries when speculating, or that there can
>> be only one entry per virtual address present in the TLB at the same
>> time in spite of different PCIDs.
>
> Hmm, yes, good point.
>
>> And why aren't you asking about the same problem with VPIDs? This
>> should be comparable to the PCID problem you are suspecting.
>
> Since Linux is using the approach, I'm not really suspecting a
> problem. I'd just like to be sure there is none / not going to be
> one. None of this is spelled out in the doc after all.
>
>>>> ---
>>>>  docs/misc/xen-command-line.markdown | 12 +++++++++
>>>>  xen/arch/x86/debug.c                |  3 ++-
>>>>  xen/arch/x86/domain_page.c          |  2 +-
>>>>  xen/arch/x86/domctl.c               |  4 +++
>>>>  xen/arch/x86/flushtlb.c             | 49 ++++++++++++++++++++++++++++------
>>>>  xen/arch/x86/mm.c                   | 34 +++++++++++++++++++++---
>>>>  xen/arch/x86/pv/dom0_build.c        |  1 +
>>>>  xen/arch/x86/pv/domain.c            | 52 +++++++++++++++++++++++++++++++++++++
>>>>  xen/include/asm-x86/domain.h        | 14 +++++++---
>>>>  xen/include/asm-x86/pv/domain.h     |  2 ++
>>>>  xen/include/asm-x86/x86-defns.h     |  1 +
>>>>  11 files changed, 158 insertions(+), 16 deletions(-)
>>>
>>> Having had the discussion previously, I'm missing a change to
>>> smp.c:new_tlbflush_clock_period() here.
>>
>> I can add that. I did not do this as I haven't treated FLUSH_TLB
>> differently from FLUSH_TLB_GLOBAL (trying this even without any other
>> change to staging led to faults in dom0). So such a change to
>> new_tlbflush_clock_period() should be a separate patch, I believe.
>
> Ah yes, you don't have the "flushing too much" issue anymore in
> this version, or at least not in as obvious a way. Or wait - it's in
> patch 4. You still do an including-global flush there regardless of
> whether FLUSH_TLB_GLOBAL was actually requested. Anyway,
> we'll save this aspect for later.
>
>>>> --- a/xen/arch/x86/debug.c
>>>> +++ b/xen/arch/x86/debug.c
>>>> @@ -97,7 +97,8 @@ dbg_pv_va2mfn(dbgva_t vaddr, struct domain *dp, uint64_t pgd3val)
>>>>      l3_pgentry_t l3e, *l3t;
>>>>      l2_pgentry_t l2e, *l2t;
>>>>      l1_pgentry_t l1e, *l1t;
>>>> -    unsigned long cr3 = (pgd3val ? pgd3val : dp->vcpu[0]->arch.cr3);
>>>> +    unsigned long cr3 = (pgd3val ? pgd3val
>>>> +                                 : (dp->vcpu[0]->arch.cr3 & ~X86_CR3_NOFLUSH));
>>>
>>> What about the PCID portion? You want the address of the page
>>> here, so I think you should use a "white-listing" masking operation
>>> instead of a blacklisting one.
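[For illustration, the two masking styles being contrasted here might look as
follows. This is only a sketch: the whitelist constant X86_CR3_ADDR_MASK and
the helper names are hypothetical, not necessarily what Xen defines.]

```c
#include <stdint.h>

#define X86_CR3_NOFLUSH    (1ULL << 63)
#define X86_CR3_PCID_MASK  0xfffULL
/* Hypothetical whitelist constant: bits 51:12 hold the page table base. */
#define X86_CR3_ADDR_MASK  0x000ffffffffff000ULL

/* Blacklisting: strip only the flags known today; the PCID bits remain. */
static inline uint64_t cr3_pa_blacklist(uint64_t cr3)
{
    return cr3 & ~X86_CR3_NOFLUSH;
}

/* Whitelisting: keep only the address; immune to future use of spare bits. */
static inline uint64_t cr3_pa_whitelist(uint64_t cr3)
{
    return cr3 & X86_CR3_ADDR_MASK;
}
```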
>>
>> The PCID portion is no problem here as the value is converted into a
>> mfn.
>>
>> I can do the modification you are asking for, of course.
>
> Well, even if the bits are shifted out in the end, the code could
> look more correct. Plus, by masking to just the address, future
> new meaning assigned to currently unused bits would not be
> a problem for this piece of code anymore.
>
>>>> +/*
>>>> + * Return additional PCID specific cr3 bits.
>>>> + *
>>>> + * Note that X86_CR3_NOFLUSH will not be readable in cr3. Anyone consuming
>>>> + * v->arch.cr3 should mask away X86_CR3_NOFLUSH and X86_CR3_PCIDMASK in case
>>>
>>> I stand by my previous comment, which was left unanswered afaics:
>>
>> Uuh, sorry for that.
>
>>> "Is it a good idea to suppress the flush this way for every reader
>>> of v->arch.cr3? For example, what about the use in
>>> dbg_pv_va2mfn()? I think it should be the consumers of the field
>>> to decide whether to OR in that flag (which isn't part of the
>>> register value anyway)."
>>> To be more precise, I can see the pcid being put here (which will
>>> require consumers to be aware anyway), but I don't think the non-
>>> register-value no-flush indicator belongs here. IOW I think after
>>> writing the value into %cr3, the value read back should match the
>>> stored value.
>>
>> This will make restore_all_guest more complicated. v->arch.cr3 is copied
>> to cpu_info->xen_cr3 there and this value is then used for %cr3. I
>> really don't want to add complex logic there to add the no-flush
>> indicator in case PCIDs are active.
>
> Valid point. Looking at all present uses of ->arch.cr3, it's probably
> indeed better the way you have it. However, I'm now wondering
> about something else: make_cr3() leaves PCID as zero for HVM
> and idle domains, but runs Xen with PCIDs 2 and 3 for (some) PV
> domains. That looks like an undesirable setup though - it would
> seem better to run Xen (with full page tables) with PCID 0 at all
> times.
>
> Then we'd have e.g.
> PCID 0   Xen (full page tables)
> PCID x   PV guest priv
> PCID y   PV guest user

So this would need another way to switch between guest and xen %cr3.
Or would you want to use different %cr3 values with the same PCID
without flushing the TLB in between? This seems to be a way to ask for
problems...

In case you'd use the same %cr3 (the guest kernel one, I guess) for
both cases: are you really sure there is no problem in any hypervisor
path accessing guest data which would result in using guest kernel
access rights when coming from user mode? (BTW: that was the security
note I had in v2 of my series.)

> Global pages in PCID 0 could then still be permitted, and wouldn't
> ever need flushing except when FLUSH_TLB_GLOBAL is requested.
>
> As to the use of two separate PCIDs for PV kernel and user modes
> - while this helps isolation, it prevents recovering the non-XPTI
> property of user mode TLB entries surviving in-guest mode switches.

I don't get that. With PCID the guest's kernel _and_ user entries will
survive in-guest mode switches, as there is no TLB flushing involved
(the no-flush bit is set in v->arch.cr3 for both modes). The only
downside is guest kernel accesses to user pages: they will need
additional TLB entries, as the PCID is different.

> I wonder whether this is part of the reason you see PCID have a
> negative effect in the non-XPTI case.
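[To make the layout under discussion concrete: a sketch of how the four PCID
values from the patch description could be assigned and how v->arch.cr3 could
be composed. The constant and helper names are illustrative, not necessarily
the patch's code; the "Xen active" values 2/3 match Jan's reading of
make_cr3() above.]

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR3_NOFLUSH  (1ULL << 63)

#define PCID_PV_PRIV  0x000ULL   /* guest active, kernel mode */
#define PCID_PV_USER  0x001ULL   /* guest active, user mode */
#define PCID_PV_XPTI  0x002ULL   /* ORed in while Xen is active (XPTI only) */

/* Sketch of composing v->arch.cr3: page table base | PCID | no-flush bit. */
static inline uint64_t make_pcid_cr3(uint64_t base, bool user, bool is_xen)
{
    uint64_t pcid = user ? PCID_PV_USER : PCID_PV_PRIV;

    if ( is_xen )             /* "hypervisor active" variants: PCIDs 2 and 3 */
        pcid |= PCID_PV_XPTI;

    return base | pcid | X86_CR3_NOFLUSH;
}
```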
>
> So in the end the question is: Why not use just two PCIDs, and
> allow global pages just like we do now, with the added benefit
> that we no longer need to flush Xen's global TLB entries just
> because we want to get rid of PV guest user ones.

I can't see how that would work without either needing some more TLB
flushes in order to prevent stale TLB entries, or losing the Meltdown
mitigation. Which %cr3/PCID combination should be used in hypervisor,
guest kernel and guest user mode? Which pages would be global?

>
>>>> +    }
>>>> +    if ( d->arch.pv_domain.pcid )
>>>> +        v->arch.cr3 |= get_pcid_bits(v, d->arch.pv_domain.xpti);
>>>
>>> It is certainly at least confusing that you pass "xpti" as argument
>>> for a parameter named "is_xen". The question is what mode you
>>> want us to be in when running with PCID but no XPTI: Should Xen
>>> use its own PCID then? That would seem reasonable to me, but
>>> would seem to require passing true here. Yet then this would
>>> require switching CR3 on the way out to guests and back in from
>>> guests even in that case (just without copying the root page table).
>>>
>>> If in that mode Xen isn't meant to use its own PCID, the command
>>> line option "pcid=noxpti" would seem at least misleading to me then,
>>> as you wouldn't really use different PCIDs in that mode, but only
>>> disable global pages (which probably hurts performance) and use
>>> INVPCID for flushing (which probably helps performance).
>>
>> The idea is to use different PCID values for guest kernel and user
>> mode. This removes the need for global guest user pages.
>
> Partly only. Global guest user pages serve two purposes: They
> survive a round trip user -> kernel -> user (as long as no Xen
> context switch occurs in the middle), and they also allow the
> kernel to utilize TLB entries user mode has just created, e.g. for
> system call input. Your current use of PCIDs retains only the
> first property.

Right.

>
>> I don't want to use global guest user pages together with PCID, as
>> flushing global pages from the TLB with PCID enabled requires flushing
>> either the complete TLB, or you'd have to use INVLPG in all possible
>> address spaces (so you'd need to have multiple %cr3 switches).
>
> Well, yes, flushing _individual_ pages is a problem in that case.
> As to multiple CR3 switches - are these all that bad really with
> the no-flush bit set? With the reduced number of PCIDs in actual
> use (as discussed above) "all possible address spaces" would
> mean just two. And I could imagine that in a number of cases
> just one INVLPG (with the right PCID active) might suffice.
>
> One complicating factor is that we don't want to introduce
> Xen TLB entries for other than what we map in the minimal page
> tables into PV guest PCID space, which would happen if we
> simply switched PCID around an INVLPG.
>
> What I don't understand in any event is why you need separate
> PCIDs for Xen depending on whether the active PV guest is in
> kernel or user mode.

The main reason is the different page tables anchored in %cr3.

> As a side note, I've noticed only now that get_pcid_bits()'s
> first parameter wants to be constified.

Will change that.


Juergen
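[As a footnote to the flushing discussion above: with CR4.PCIDE set, a %cr3
write never flushes global entries, so they can only be removed via a
CR4.PGE toggle or an INVPCID of type 2 - which is the cost being weighed
against per-PCID INVLPG here. A sketch of an INVPCID wrapper, modelled on
the commonly used inline-assembly form; Xen's actual helper may differ.]

```c
#include <stdint.h>

/* INVPCID invalidation types as defined in the SDM. */
#define INVPCID_ADDR         0  /* one linear address in one PCID */
#define INVPCID_SINGLE_CTXT  1  /* all non-global entries of one PCID */
#define INVPCID_ALL_GLOBAL   2  /* all entries, including global ones */
#define INVPCID_ALL          3  /* all entries except global ones */

static inline void invpcid(unsigned long type, uint64_t pcid, uint64_t addr)
{
    /* 128-bit descriptor: PCID in bits 0-11, linear address in bits 64-127. */
    struct {
        uint64_t pcid;
        uint64_t addr;
    } desc = { pcid, addr };

    /* AT&T operand order: memory descriptor first, type register second. */
    asm volatile ( "invpcid %0, %1"
                   :: "m" (desc), "r" (type)
                   : "memory" );
}
```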