Re: [Xen-devel] [PATCH v2] x86/HVM: correct hvmemul_map_linear_addr() for multi-page case
>>> On 25.09.18 at 17:30, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 25/09/18 13:41, Jan Beulich wrote:
>>>>> On 20.09.18 at 14:41, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 13/09/18 11:12, Jan Beulich wrote:
>>>> The function does two translations in one go for a single guest access.
>>>> Any failure of the first translation step (guest linear -> guest
>>>> physical), resulting in #PF, ought to take precedence over any failure
>>>> of the second step (guest physical -> host physical).
>>> Why? What is the basis of this presumption?
>>>
>>> As far as what real hardware does...
>>>
>>> This test sets up a ballooned page and a read-only page, i.e. a second
>>> stage fault on the first part of a misaligned access, and a first stage
>>> fault on the second part of the access.
>>>
>>> (d1) --- Xen Test Framework ---
>>> (d1) Environment: HVM 64bit (Long mode 4 levels)
>>> (d1) Test splitfault
>>> (d1) About to read
>>> (XEN) *** EPT qual 0000000000000181, gpa 000000000011cffc
>>> (d1) Reading PTR: got 00000000ffffffff
>>> (d1) About to write
>>> (XEN) *** EPT qual 0000000000000182, gpa 000000000011cffc
>>> (d1) ******************************
>>> (d1) PANIC: Unhandled exception at 0008:00000000001047e0
>>> (d1) Vec 14 #PF[-d-sWP] %cr2 000000000011d000
>>> (d1) ******************************
>>>
>>> The second stage fault is recognised first, which is contrary to your
>>> presumption, i.e. the code in its current form appears to be correct.
>> Coming back to this example of yours: As a first step, are we in
>> agreement that, with the exception of very complex instructions
>> (FSAVE, FXSAVE, XSAVE, etc.), instructions are supposed to work in an
>> all-or-nothing manner when it comes to updating architectural
>> state (be it registers or memory)?
>
> No. Read Intel SDM Vol. 3, sections 8.1 and 8.2, which make it quite clear
> that misaligned accesses may be split, and observably so to other
> processors in the system.
> I've even found a new bit in it which says that quadword SSE accesses
> may even result in a partial write being completed before #PF is raised.

As said before, I'm with you on the "may" part for FPU, SSE, and
similarly complex accesses. Short of being able to think of a way to
compare native and emulated behavior more directly than via EPT behavior
(as your test did; I had coded something up until I realized that it
doesn't test what I want to test), I've written a small test for real
hardware. Would this debugging patch

--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -87,7 +87,7 @@ GLOBAL(boot_cpu_compat_gdt_table)
  * of physical memory. In any case the VGA hole should be mapped with type UC.
  * Uses 1x 4k page.
  */
-l1_identmap:
+GLOBAL(l1_identmap)
 pfn = 0
         .rept L1_PAGETABLE_ENTRIES
         /* VGA hole (0xa0000-0xc0000) should be mapped UC-. */
--- a/xen/arch/x86/ioport_emulate.c
+++ b/xen/arch/x86/ioport_emulate.c
@@ -112,8 +112,48 @@ static struct dmi_system_id __initdata i
     { }
 };
 
+extern unsigned long l1_identmap[L1_PAGETABLE_ENTRIES];//temp
 static int __init ioport_quirks_init(void)
 {
+    {//temp
+        unsigned char *hole = __va(0xb0000);
+        unsigned long l1e = l1_identmap[0xb0];
+        int i;
+
+        show_page_walk((long)hole);
+
+        l1_identmap[0xb0] &= ~_PAGE_RW;
+        l1_identmap[0xb0] |= _PAGE_PWT;
+        for(;;) {
+            flush_area_all(hole, FLUSH_TLB|FLUSH_TLB_GLOBAL);
+            show_page_walk((long)hole);
+
+            for(i = 3; i > 0; --i) {
+                unsigned long cr2 = 0;
+
+                memset(hole - 4, 0, 4);
+                asm("1: addl $-1, %0; 2:\n\t"
+                    ".pushsection .fixup,\"ax\"\n"
+                    "3: mov %%cr2, %1; jmp 2b\n\t"
+                    ".popsection\n\t"
+                    _ASM_EXTABLE(1b, 3b)
+                    : "+m" (hole[-i]), "+r" (cr2) :: "memory");
+
+                printk("CR2=%lx data: %02x %02x %02x %02x\n", cr2,
+                       hole[-4], hole[-3], hole[-2], hole[-1]);
+            }
+
+            if(l1_identmap[0xb0] & (1UL << (paddr_bits - 1)))
+                break;
+
+            l1_identmap[0xb0] |= ((1UL << paddr_bits) - 1) & PAGE_MASK;
+        }
+
+        l1_identmap[0xb0] = l1e;
+        flush_area_all(hole, FLUSH_TLB|FLUSH_TLB_GLOBAL);
+        show_page_walk((long)hole);
+    }
+
     dmi_check_system(ioport_quirks_tbl);
 
     return 0;
 }

producing this output

(XEN) Pagetable walk from ffff8300000b0000:
(XEN)  L4[0x106] = 80000000cf0ba063 ffffffffffffffff
(XEN)  L3[0x000] = 00000000cf0b4063 ffffffffffffffff
(XEN)  L2[0x000] = 00000000cf0b3063 ffffffffffffffff
(XEN)  L1[0x0b0] = 00000000000b0373 ffffffffffffffff
(XEN) Pagetable walk from ffff8300000b0000:
(XEN)  L4[0x106] = 80000000cf0ba063 ffffffffffffffff
(XEN)  L3[0x000] = 00000000cf0b4063 ffffffffffffffff
(XEN)  L2[0x000] = 00000000cf0b3063 ffffffffffffffff
(XEN)  L1[0x0b0] = 00000000000b0379 ffffffffffffffff
(XEN) CR2=ffff8300000b0000 data: 00 00 00 00
(XEN) CR2=ffff8300000b0000 data: 00 00 00 00
(XEN) CR2=ffff8300000b0000 data: 00 00 00 00
(XEN) Pagetable walk from ffff8300000b0000:
(XEN)  L4[0x106] = 80000000cf0ba063 ffffffffffffffff
(XEN)  L3[0x000] = 00000000cf0b4063 ffffffffffffffff
(XEN)  L2[0x000] = 00000000cf0b3063 ffffffffffffffff
(XEN)  L1[0x0b0] = 0000000ffffff379 ffffffffffffffff
(XEN) CR2=ffff8300000b0000 data: 00 00 00 00
(XEN) CR2=ffff8300000b0000 data: 00 00 00 00
(XEN) CR2=ffff8300000b0000 data: 00 00 00 00
(XEN) Pagetable walk from ffff8300000b0000:
(XEN)  L4[0x106] = 80000000cf0ba063 ffffffffffffffff
(XEN)  L3[0x000] = 00000000cf0b4063 ffffffffffffffff
(XEN)  L2[0x000] = 00000000cf0b3063 ffffffffffffffff
(XEN)  L1[0x0b0] = 00000000000b0373 ffffffffffffffff

convince you that simple (ALU) accesses at the very least do not get
split, and hence our emulation shouldn't split them either?

(I admit that I think I see another change, beyond the patch here, to be
necessary for this to work consistently in all cases: In the "known_gla()
returns true" case we'd also have to perform the second page walk
_before_ trying to actually write anything.
But I had questioned that conditional anyway, first in
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg00155.html
and then with a shortened version left in the description of commit
3bdec530a5 ["x86/HVM: split page straddling emulated accesses in more
cases"]. Paul wanted that behavior to be mirrored to hvmemul_rmw()
instead.)

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel