
Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option



On Thu, Jul 30, 2015 at 1:01 PM, Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx> wrote:
> On 07/30/2015 02:54 PM, Andrew Cooper wrote:
>>
>> On 30/07/15 19:30, Andy Lutomirski wrote:
>>>
>>> On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper
>>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>
>>>> On 30/07/2015 00:13, Andy Lutomirski wrote:
>>>>>
>>>>> On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
>>>>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 29/07/2015 23:49, Boris Ostrovsky wrote:
>>>>>>>
>>>>>>> On 07/29/2015 06:46 PM, David Vrabel wrote:
>>>>>>>>
>>>>>>>> On 29/07/2015 23:11, Andrew Cooper wrote:
>>>>>>>>>
>>>>>>>>> On 29/07/2015 23:05, Andy Lutomirski wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
>>>>>>>>>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 29/07/2015 22:26, Andy Lutomirski wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>>>>>>>>>>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 29/07/15 15:43, Boris Ostrovsky wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> FYI, I have got a repro now and am investigating.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Good and bad news.  This bug has nothing to do with LDTs
>>>>>>>>>>>>>> themselves.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have worked out what is going on, but this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>>>>>>>>>>>>> index 5abeaac..7e1a82e 100644
>>>>>>>>>>>>>> --- a/arch/x86/xen/enlighten.c
>>>>>>>>>>>>>> +++ b/arch/x86/xen/enlighten.c
>>>>>>>>>>>>>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
>>>>>>>>>>>>>>          pte = pfn_pte(pfn, prot);
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> +        (void)*(volatile int*)v;
>>>>>>>>>>>>>>          if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
>>>>>>>>>>>>>>                  pr_err("set_aliased_prot va update failed w/ lazy mode %u\n",
>>>>>>>>>>>>>>                         paravirt_get_lazy_mode());
>>>>>>>>>>>>>>                  BUG();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is perhaps not the fix we are looking for, and every use of
>>>>>>>>>>>>>> HYPERVISOR_update_va_mapping() is susceptible to the same
>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think in most cases we know that the page is mapped, so
>>>>>>>>>>>>> hopefully this is the only site where we need to be careful.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any chance we can get some kind of quick-and-dirty fix
>>>>>>>>>>>> that
>>>>>>>>>>>> can go to x86/urgent in the next few days even if a clean fix
>>>>>>>>>>>> isn't
>>>>>>>>>>>> available yet?
>>>>>>>>>>>
>>>>>>>>>>> Quick and dirty?
>>>>>>>>>>>
>>>>>>>>>>> Reading from v is the most obvious and quick way, for areas
>>>>>>>>>>> where we are certain v exists, is kernel memory and is expected
>>>>>>>>>>> to have a backing page.  I don't know offhand how many of the
>>>>>>>>>>> current HYPERVISOR_update_va_mapping() callsites this applies to.
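>>>>>>>>>>>
>>>>>>>>>>> i.e. the same trick as the hunk above (sketch; any real read
>>>>>>>>>>> will do - the volatile just stops the compiler discarding it):
>>>>>>>>>>>
>>>>>>>>>>>     /* Force the lazy #PF to be taken here in the guest rather
>>>>>>>>>>>      * than in the middle of the hypercall. */
>>>>>>>>>>>     (void)*(volatile int *)v;
>>>>>>>>>>>
>>>>>>>>>>>     if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
>>>>>>>>>>>             BUG();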
>>>>>>>>>>
>>>>>>>>>> __get_user(tmp, (char *)v), perhaps, unless there's something
>>>>>>>>>> better in the wings.  Keep in mind that we need this for -stable,
>>>>>>>>>> and it's likely to get backported quite quickly due to
>>>>>>>>>> CVE-2015-5157.
>>>>>>>>>
>>>>>>>>> Hmm - something like that tucked inside
>>>>>>>>> HYPERVISOR_update_va_mapping() would probably work, and certainly
>>>>>>>>> be minimal hassle for -stable.
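>>>>>>>>>
>>>>>>>>> e.g. a helper along these lines (sketch, untested; the helper
>>>>>>>>> name is made up, and the probe could just as well live inside the
>>>>>>>>> hypercall wrapper itself):
>>>>>>>>>
>>>>>>>>>     static inline int
>>>>>>>>>     probe_and_update_va_mapping(unsigned long va, pte_t new_val,
>>>>>>>>>                                 unsigned long flags)
>>>>>>>>>     {
>>>>>>>>>             char tmp;
>>>>>>>>>
>>>>>>>>>             /* Fault va in now, so the lazy #PF cannot be taken
>>>>>>>>>              * while Xen is walking our page tables. */
>>>>>>>>>             if (__get_user(tmp, (char __user *)va))
>>>>>>>>>                     return -EFAULT;
>>>>>>>>>
>>>>>>>>>             return HYPERVISOR_update_va_mapping(va, new_val, flags);
>>>>>>>>>     }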
>>>>>>>>>
>>>>>>>>> Altering the hypercall used is certainly not something to
>>>>>>>>> backport, nor are we sure it is a viable fix at this time.
>>>>>>>>
>>>>>>>> Changing this one use of update_va_mapping to use
>>>>>>>> mmu_update_normal_pt is the correct fix to unblock this LDT
>>>>>>>> series.  I see no reason why this cannot be backported.
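>>>>>>>>
>>>>>>>> Roughly this, for the set_aliased_prot() case (sketch, untested):
>>>>>>>>
>>>>>>>>     unsigned int level;
>>>>>>>>     pte_t *ptep = lookup_address((unsigned long)v, &level);
>>>>>>>>     struct mmu_update u;
>>>>>>>>
>>>>>>>>     /* MMU_NORMAL_PT_UPDATE names the pte by machine address, so
>>>>>>>>      * it does not matter whether v is faulted into the current
>>>>>>>>      * mm's page tables. */
>>>>>>>>     u.ptr = arbitrary_virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
>>>>>>>>     u.val = pte_val_ma(pte);
>>>>>>>>     if (HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF) < 0)
>>>>>>>>             BUG();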
>>>>>>>
>>>>>>> A proper fix would include batching, and that is not something I
>>>>>>> think we should target for stable.
>>>>>>
>>>>>> Batching is absolutely not necessary to alter update_va_mapping to
>>>>>> mmu_update_normal_pt.  After all, update_va_mapping isn't batched.
>>>>>>
>>>>>> However, this isn't the first issue we have had with lazy mmu
>>>>>> faulting, and I doubt it will be the last.  There are not many
>>>>>> callsites of update_va_mapping - I will audit them tomorrow and see
>>>>>> if any similar issues are lurking elsewhere.
>>>>>
>>>>> One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
>>>>> yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
>>>>> access to fault.  Is this something we should be worried about?
>>>>
>>>> Yes.  update_va_mapping() will function perfectly well taking one RW
>>>> mapping to RO even if there is a second RW mapping.  In such a case, the
>>>> next LDT access will fault.
>>>
>>> Which is a problem because that alias might still exist, and also
>>> because Linux really doesn't expect that fault.
>>>
>>>> On closer inspection, Xen is rather unhelpful with the fault.  Xen's
>>>> lazy #PF will be bounced back to the guest with cr2 adjusted to appear
>>>> in the range passed to set_ldt().  The error code however will be
>>>> unmodified (and limited only by not-user and not-reserved), so will
>>>> appear as a non-present read or write supervisor access to an address
>>>> which the kernel has a valid read mapping of.
>>>
>>> More yuck.
>>>
>>> I think I'm just going to stick an unconditional vm_unmap_aliases()
>>> call in alloc_ldt.
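>>>
>>> Something like this in the Xen hook (sketch; only the
>>> vm_unmap_aliases() call would be new):
>>>
>>>     static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
>>>     {
>>>             const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
>>>             int i;
>>>
>>>             /* Flush any lazily-unmapped vmalloc aliases before asking
>>>              * Xen to retype the LDT pages read-only. */
>>>             vm_unmap_aliases();
>>>
>>>             for (i = 0; i < entries; i += entries_per_page)
>>>                     set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
>>>     }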
>>>
>>>> Therefore, set_ldt() needs to be confident that there are no writeable
>>>> mappings to the frames used to make up the LDT.  It could proactively
>>>> fault them in by accessing one descriptor in each page inside the limit,
>>>> but by the time a fault is received it is probably too late to work out
>>>> where the other mapping is which prevented the typechange (or indeed,
>>>> whether Xen objected to one of the descriptors instead).
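>>>>
>>>> i.e. (sketch, with ldt/entries as in xen_alloc_ldt()):
>>>>
>>>>     /* Touch one descriptor in each page inside the limit so any
>>>>      * lazy faults are taken now rather than on first LDT access. */
>>>>     for (i = 0; i < entries; i += PAGE_SIZE / LDT_ENTRY_SIZE)
>>>>             (void)*(volatile char *)&ldt[i];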
>>>
>>> This seems like overkill.
>>>
>>> I'm still a bit confused, though: the failure is in xen_free_ldt.  How
>>> do we make it all the way to xen_free_ldt without the vmapped page
>>> existing in the guest's page tables?  After all, we had to survive
>>> xen_alloc_ldt first, and ISTM that should fail in exactly the same
>>> way.
>>
>> (Summarising part of a discussion which has just occurred on IRC)
>>
>> I presume that xen_free_ldt() is called while in the context of an mm
>> which doesn't have the particular area of the vmalloc() space faulted in.
>
>
> This is exactly what's happening --- the bug is only triggered during exit
> and xen_free_ldt() is called from someone else's context, e.g.:
>
> [   53.986677] Call Trace:
> [   53.986677]  [<c105312d>] xen_free_ldt+0x2d/0x40
> [   53.986677]  [<c1062310>] free_ldt_struct.part.1+0x10/0x40
> [   53.986677]  [<c1062735>] destroy_context+0x25/0x40
> [   53.986677]  [<c10a764e>] __mmdrop+0x1e/0xc0
> [   53.986677]  [<c10c9858>] finish_task_switch+0xd8/0x1a0
> [   53.986677]  [<c1863736>] __schedule+0x316/0x950
> [   53.986677]  [<c1863d96>] schedule+0x26/0x70
> [   53.986677]  [<c10ac613>] do_wait+0x1b3/0x200
> [   53.986677]  [<c10ac9d7>] SyS_waitpid+0x67/0xd0
> [   53.986677]  [<c10aa820>] ? task_stopped_code+0x50/0x50
> [   53.986677]  [<c186717a>] syscall_call+0x7/0x7
>
> But that would imply that this other context is freeing ldt_gdt_32's
> mm->context.ldt. How is that possible?
>

It's freed via destroy_context(): the dead task's mm is only dropped once
the next task is already running, so destroy_context() ends up destroying
someone else's LDT, right?

--Andy
