[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option



On 07/29/2015 10:21 AM, Andrew Cooper wrote:
On 29/07/15 06:28, Andy Lutomirski wrote:
On Tue, Jul 28, 2015 at 8:01 PM, Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx> wrote:
On 07/28/2015 08:47 PM, Andrew Cooper wrote:
On 29/07/2015 01:21, Andy Lutomirski wrote:
On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx> wrote:
On 07/28/2015 01:07 PM, Andy Lutomirski wrote:
On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
<andrew.cooper3@xxxxxxxxxx> wrote:
I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
xen_free_ldt() is attempting to nab back the pages which Xen still has
mapped as an LDT.

I just instrumented it with yet more LSL instructions.  I'm pretty
sure that set_ldt really is clearing at least LDT entry zero.
Nonetheless the free_ldt call still oopses.

Yes, I added some instrumentation to the hypervisor and we definitely
set
LDT to NULL before failing.

-boris
Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from
getting incremented once on each CPU at the same time if both CPUs
fault in the same shadow LDT page at the same time?
Nothing, but that is fine.  If a page is in use in two vcpus LDTs, it is
expected to have a type refcount of 2.

Similarly, what
keeps both CPUs from calling get_page_type at the same time and
therefore losing track of the page type reference count?
a cmpxchg() loop in the depths of __get_page_type().

I don't see why vmalloc or vm_unmap_aliases would have anything to do
with this, though.
So just for kicks I made lazy_max_pages() return 0 to free vmaps immediately
and the problem went away.
As far as I can tell, this affects TLB flushes but not unmaps.  That
means that my patch is totally bogus -- vm_unmap_aliases() *flushed*
aliases but isn't involved in removing them from the page tables.
That must be why xen_alloc_ldt and xen_set_ldt work today.

So what does flushing the TLB have to do with anything?  The only
thing I can think of is that it might force some deferred hypercalls
out.  I can reproduce this easily on UP, so IPIs aren't involved.

The other odd thing is that it seems like this happens when clearing
the LDT and freeing the old one but not when setting the LDT and
freeing the old one.  This is plausibly related to the lazy mode in
effect at the time, but I have no evidence for that.

Two more data points:  Putting xen_flush_mc before and after the
SET_LDT multicall has no effect.  Putting flush_tlb_all() in
xen_free_ldt doesn't help either, while vm_unmap_aliases() in the
exact same place does help.
FYI, I have got a repro now and am investigating.


To simplify your test case, this is sufficient for me to trigger this:


#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/ldt.h>


int main()
{
        int i;

        struct user_desc desc = {
                .entry_number    = 0,
                .base_addr       = 0,
                .limit           = 10,
                .seg_32bit       = 1,
                .contents        = 2, /* Code, not conforming */
                .read_exec_only  = 0,
                .limit_in_pages  = 0,
                .seg_not_present = 0,
                .useable         = 0
};

        for (i = 0; i < 500; i++)
                syscall(SYS_modify_ldt, 0x11, &desc, sizeof(desc));
}


Run this program in a loop --- the error is triggered (again, for me), when it exits.

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.