
Re: [Xen-devel] [PATCH for-4.12] x86/altp2m: fix HVMOP_altp2m_set_domain_state race



>>> On 08.02.19 at 12:58, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
> On 2/8/19 1:13 PM, Razvan Cojocaru wrote:
>> On 2/8/19 12:51 PM, Jan Beulich wrote:
>>>>>> On 08.02.19 at 10:56, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>>>> on purpose (as it was originally supposed to cater to an in-guest
>>>> agent, and a domain pausing itself is not a good idea).
>>>>
>>>> This can lead to domain crashes in the vmx_vmexit_handler() code
>>>> that checks whether the guest can switch EPTP without an exit.
>>>> That code can __vmread() the host p2m's EPT_POINTER if it runs
>>>> after d->arch.altp2m_active is set but before the "for_each_vcpu()"
>>>> loop in HVMOP_altp2m_set_domain_state has had a chance to run
>>>> altp2m_vcpu_initialise().
>>>>
>>>> While the in-guest scenario continues to pose problems, this
>>>> patch fixes the "external" case.
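A minimal single-threaded sketch of the window described above, assuming a toy model rather than Xen's real structures: toy_domain, toy_vcpu, the two-vCPU setup and the EPTP constants are all hypothetical. The hook stands in for another, unpaused vCPU taking a vmexit while the for_each_vcpu() loop is still running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VCPUS     2
#define HOST_EPTP    0x1000UL /* hypothetical EPTP of the host p2m */
#define ALTP2M0_EPTP 0x2000UL /* hypothetical EPTP of altp2m view 0 */

struct toy_vcpu {
    unsigned long eptp;       /* stands in for the VMCS EPT_POINTER */
};

struct toy_domain {
    bool altp2m_active;       /* stands in for d->arch.altp2m_active */
    bool crashed;             /* set where domain_crash() would be called */
    struct toy_vcpu vcpu[NR_VCPUS];
};

/* Toy vmexit check: once altp2m is marked active, an EPTP that is not
 * a known altp2m view makes the domain crash. */
static void toy_vmexit(struct toy_domain *d, const struct toy_vcpu *v)
{
    if (d->altp2m_active && v->eptp != ALTP2M0_EPTP)
        d->crashed = true;
}

/* Current ordering: publish the flag first, then set up each vCPU.
 * 'hook' models another, unpaused vCPU taking a vmexit in between. */
static void toy_set_domain_state_racy(struct toy_domain *d,
                                      void (*hook)(struct toy_domain *))
{
    assert(d != NULL);
    d->altp2m_active = true;
    for (size_t i = 0; i < NR_VCPUS; i++) {
        d->vcpu[i].eptp = ALTP2M0_EPTP; /* stands in for altp2m_vcpu_initialise() */
        if (hook)
            hook(d);
    }
}

/* Test hook: vCPU 1 takes a vmexit. Its first call happens while
 * vCPU 1 is still running on the host p2m. */
static void vmexit_on_vcpu1(struct toy_domain *d)
{
    toy_vmexit(d, &d->vcpu[1]);
}
```

Pausing the domain for the whole update, as the patch does for the external case, corresponds here to passing a NULL hook: no vmexit can land in the window.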
>>>
>>> IOW you're papering over the problem rather than fixing it. Why
>>> does altp2m_active get set to true before actually having set up
>>> everything? Shouldn't it get cleared early, but set late?
>> Well, yes, that would have been my second attempt: set the "altp2m 
>> enabled" bool after the init, clear it before the uninit, and no 
>> longer call domain_pause() explicitly; however, I thought that was a 
>> brittle solution, relying on comments / programmer attention to the 
>> code sequence rather than on taking a proper lock.
>> 
>> I'll test that scenario then and return with the results / possibly 
>> another patch.
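A minimal sketch of the "set late, clear early" ordering, in a hypothetical toy model (toy_domain, toy_vcpu, the EPTP constants and the hook are illustrative only, not Xen code). It assumes that the per-vCPU step installs the altp2m view without consulting the domain-wide flag; under that assumption a vmexit injected after every step never sees the flag set while some vCPU is still on the host p2m.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VCPUS     2
#define HOST_EPTP    0x1000UL /* hypothetical EPTP of the host p2m */
#define ALTP2M0_EPTP 0x2000UL /* hypothetical EPTP of altp2m view 0 */

struct toy_vcpu {
    unsigned long eptp;       /* stands in for the VMCS EPT_POINTER */
};

struct toy_domain {
    bool altp2m_active;       /* stands in for d->arch.altp2m_active */
    bool crashed;             /* set where domain_crash() would be called */
    struct toy_vcpu vcpu[NR_VCPUS];
};

/* Toy vmexit check: with altp2m marked active, an unknown EPTP crashes. */
static void toy_vmexit(struct toy_domain *d, const struct toy_vcpu *v)
{
    if (d->altp2m_active && v->eptp != ALTP2M0_EPTP)
        d->crashed = true;
}

/* Set late: switch every vCPU first, publish the flag last. */
static void toy_enable(struct toy_domain *d, void (*hook)(struct toy_domain *))
{
    assert(d != NULL);
    for (size_t i = 0; i < NR_VCPUS; i++) {
        d->vcpu[i].eptp = ALTP2M0_EPTP; /* assumed not to read the flag */
        if (hook)
            hook(d);                    /* concurrent vmexit(s) */
    }
    d->altp2m_active = true;
}

/* Clear early: retract the flag first, then restore the host p2m. */
static void toy_disable(struct toy_domain *d, void (*hook)(struct toy_domain *))
{
    assert(d != NULL);
    d->altp2m_active = false;
    for (size_t i = 0; i < NR_VCPUS; i++) {
        d->vcpu[i].eptp = HOST_EPTP;
        if (hook)
            hook(d);
    }
}

/* Test hook: every vCPU takes a vmexit at this point. */
static void vmexit_all(struct toy_domain *d)
{
    for (size_t i = 0; i < NR_VCPUS; i++)
        toy_vmexit(d, &d->vcpu[i]);
}
```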
> 
> Actually, your suggestion does not work, because by design 
> altp2m_vcpu_initialise() calls altp2m_vcpu_update_p2m(), which (on VMX, 
> via vmx_vcpu_update_eptp()) does the work that matters to us here:
> 
>    2153 static void vmx_vcpu_update_eptp(struct vcpu *v)
>    2154 {
>    2155     struct domain *d = v->domain;
>    2156     struct p2m_domain *p2m = NULL;
>    2157     struct ept_data *ept;
>    2158
>    2159     if ( altp2m_active(d) )
>    2160         p2m = p2m_get_altp2m(v);
>    2161     if ( !p2m )
>    2162         p2m = p2m_get_hostp2m(d);
>    2163
>    2164     ept = &p2m->ept;
>    2165     ept->mfn = pagetable_get_pfn(p2m_get_pagetable(p2m));
>    2166
>    2167     vmx_vmcs_enter(v);
>    2168
>    2169     __vmwrite(EPT_POINTER, ept->eptp);
>    2170
>    2171     if ( v->arch.hvm.vmx.secondary_exec_control &
>    2172          SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
>    2173         __vmwrite(EPTP_INDEX, vcpu_altp2m(v).p2midx);
>    2174
>    2175     vmx_vmcs_exit(v);
>    2176 }
> 
> So please note that on line 2159 it checks whether altp2m is active, and 
> only then does it do the right thing. Setting the d->arch.altp2m_active 
> bool _after_ calling altp2m_vcpu_initialise() will therefore not work 
> correctly, turning this into a chicken-and-egg problem or, perhaps more 
> interestingly, into another discussion about whether in-guest-only 
> altp2m agents make any sense fundamentally.

Well, to be honest, I expected dependencies like this to be there,
and hence I didn't expect this to be a straightforward change.
Just like we do for e.g. IOMMU enabling, I guess the boolean then
wants to become a tristate (off -> enabling -> enabled), which
interested sites can then use to distinguish what they want/need
to do.
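One way the tristate could look, as a sketch in a hypothetical toy model (the enum, its values and the toy_* helpers are assumptions, not Xen's IOMMU or altp2m code): EPTP selection treats "enabling" as on, so the per-vCPU init installs the altp2m view, while the vmexit check only trusts the VMCS once the state reaches "enabled".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VCPUS     2
#define HOST_EPTP    0x1000UL /* hypothetical EPTP of the host p2m */
#define ALTP2M0_EPTP 0x2000UL /* hypothetical EPTP of altp2m view 0 */

/* Hypothetical tristate replacing the altp2m_active boolean. */
enum toy_altp2m_state { ALTP2M_OFF, ALTP2M_ENABLING, ALTP2M_ENABLED };

struct toy_vcpu {
    unsigned long eptp;       /* stands in for the VMCS EPT_POINTER */
};

struct toy_domain {
    enum toy_altp2m_state state;
    bool crashed;             /* set where domain_crash() would be called */
    struct toy_vcpu vcpu[NR_VCPUS];
};

/* EPTP selection: "enabling" already counts as on. */
static void toy_update_eptp(const struct toy_domain *d, struct toy_vcpu *v)
{
    v->eptp = (d->state != ALTP2M_OFF) ? ALTP2M0_EPTP : HOST_EPTP;
}

/* vmexit check: only enforced once every vCPU has been set up. */
static void toy_vmexit(struct toy_domain *d, const struct toy_vcpu *v)
{
    if (d->state == ALTP2M_ENABLED && v->eptp != ALTP2M0_EPTP)
        d->crashed = true;
}

static void toy_enable(struct toy_domain *d, void (*hook)(struct toy_domain *))
{
    assert(d->state == ALTP2M_OFF);
    d->state = ALTP2M_ENABLING;
    for (size_t i = 0; i < NR_VCPUS; i++) {
        toy_update_eptp(d, &d->vcpu[i]);
        if (hook)
            hook(d);          /* concurrent vmexit(s) */
    }
    d->state = ALTP2M_ENABLED;
}

/* Test hook: every vCPU takes a vmexit at this point. */
static void vmexit_all(struct toy_domain *d)
{
    for (size_t i = 0; i < NR_VCPUS; i++)
        toy_vmexit(d, &d->vcpu[i]);
}
```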

Another relatively obvious solution would be to add a boolean
parameter to altp2m_vcpu_update_p2m() such that
altp2m_vcpu_initialise() can guide it properly. But this of course
depends to a certain degree on how widespread the problem is.
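A sketch of the extra-parameter variant, again in a hypothetical toy model: toy_update_p2m() stands in for altp2m_vcpu_update_p2m() with an added use_altp2m argument (the argument name is an assumption). The init path forces the altp2m view, so the flag can be set late; other callers keep passing the domain-wide flag, as the disable path does here after clearing it early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VCPUS     2
#define HOST_EPTP    0x1000UL /* hypothetical EPTP of the host p2m */
#define ALTP2M0_EPTP 0x2000UL /* hypothetical EPTP of altp2m view 0 */

struct toy_vcpu {
    unsigned long eptp;       /* stands in for the VMCS EPT_POINTER */
};

struct toy_domain {
    bool altp2m_active;       /* stands in for d->arch.altp2m_active */
    bool crashed;             /* set where domain_crash() would be called */
    struct toy_vcpu vcpu[NR_VCPUS];
};

/* Hypothetical update routine: the caller chooses the view explicitly
 * instead of the routine reading the domain-wide flag itself. */
static void toy_update_p2m(struct toy_vcpu *v, bool use_altp2m)
{
    v->eptp = use_altp2m ? ALTP2M0_EPTP : HOST_EPTP;
}

static void toy_vmexit(struct toy_domain *d, const struct toy_vcpu *v)
{
    if (d->altp2m_active && v->eptp != ALTP2M0_EPTP)
        d->crashed = true;
}

/* Init path: force the altp2m view, then set the flag late. */
static void toy_enable(struct toy_domain *d, void (*hook)(struct toy_domain *))
{
    assert(d != NULL);
    for (size_t i = 0; i < NR_VCPUS; i++) {
        toy_update_p2m(&d->vcpu[i], true);
        if (hook)
            hook(d);
    }
    d->altp2m_active = true;
}

/* Other callers pass the flag; it is cleared early here, so they fall
 * back to the host p2m. */
static void toy_disable(struct toy_domain *d, void (*hook)(struct toy_domain *))
{
    assert(d != NULL);
    d->altp2m_active = false;
    for (size_t i = 0; i < NR_VCPUS; i++) {
        toy_update_p2m(&d->vcpu[i], d->altp2m_active);
        if (hook)
            hook(d);
    }
}

/* Test hook: every vCPU takes a vmexit at this point. */
static void vmexit_all(struct toy_domain *d)
{
    for (size_t i = 0; i < NR_VCPUS; i++)
        toy_vmexit(d, &d->vcpu[i]);
}
```

Compared with a tristate, this keeps the flag a plain boolean but pushes the decision onto every caller of the update routine, which is why its appeal depends on how many call sites are affected.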

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

