Re: Linux Xen PV CPA W^X violation false-positives
On 24.01.24 17:54, Jason Andryuk wrote:
> Xen PV domains show CPA W^X violations like:
>
> CPA detected W^X violation: 0000000000000064 -> 0000000000000067 range: 0xffff888000010000 - 0xffff888000010fff PFN 10
> WARNING: CPU: 0 PID: 30 at arch/x86/mm/pat/set_memory.c:613 __change_page_attr_set_clr+0x113a/0x11c0
> Modules linked in: xt_physdev xt_MASQUERADE iptable_nat nf_nat nf_conntrack libcrc32c nf_defrag_ipv4 ip_tables x_tables xen_argo(O)
> CPU: 0 PID: 30 Comm: kworker/0:2 Tainted: G O 6.1.38 #1
> Workqueue: events bpf_prog_free_deferred
> RIP: e030:__change_page_attr_set_clr+0x113a/0x11c0
> Code: 4c 89 f1 4c 89 e2 4c 89 d6 4c 89 8d 70 ff ff ff 4d 8d 86 ff 0f 00 00 48 c7 c7 f0 3c da 81 c6 05 d0 0e 0e 01 01 e8 f6 71 00 00 <0f> 0b 4c 8b 8d 70 ff ff ff e9 2a fd ff ff 48 8b 85 60 ff ff ff 48
> RSP: e02b:ffffc90000367c48 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: 000ffffffffef064 RCX: 0000000000000000
> RDX: 0000000000000003 RSI: 00000000fffff7ff RDI: 00000000ffffffff
> RBP: ffffc90000367d48 R08: 0000000000000000 R09: ffffc90000367aa0
> R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000067
> R13: 0000000000000001 R14: ffff888000010000 R15: ffffc90000367d60
> FS: 0000000000000000(0000) GS:ffff88800b800000(0000) knlGS:0000000000000000
> CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fdbaeda01c0 CR3: 0000000004312000 CR4: 0000000000050660
> Call Trace:
>  <TASK>
>  ? show_regs.cold+0x1a/0x1f
>  ? __change_page_attr_set_clr+0x113a/0x11c0
>  ? __warn+0x7b/0xc0
>  ? __change_page_attr_set_clr+0x113a/0x11c0
>  ? report_bug+0x111/0x1a0
>  ? handle_bug+0x4d/0xa0
>  ? exc_invalid_op+0x19/0x70
>  ? asm_exc_invalid_op+0x1b/0x20
>  ? __change_page_attr_set_clr+0x113a/0x11c0
>  ? __change_page_attr_set_clr+0x113a/0x11c0
>  ? debug_smp_processor_id+0x17/0x20
>  ? ___cache_free+0x2e/0x1e0
>  ? _raw_spin_unlock+0x1e/0x40
>  ? __purge_vmap_area_lazy+0x2ea/0x6b0
>  set_direct_map_default_noflush+0x7c/0xa0
>  __vunmap+0x1ac/0x280
>  __vfree+0x1d/0x60
>  vfree+0x27/0x40
>  __bpf_prog_free+0x44/0x50
>  bpf_prog_free_deferred+0x104/0x120
>  process_one_work+0x1ca/0x3d0
>  ? process_one_work+0x3d0/0x3d0
>  worker_thread+0x45/0x3c0
>  ? process_one_work+0x3d0/0x3d0
>  kthread+0xe2/0x110
>  ? kthread_complete_and_exit+0x20/0x20
>  ret_from_fork+0x1f/0x30
>  </TASK>
> ---[ end trace 0000000000000000 ]---
>
> Xen provides a set of page tables that the guest executes out of when it
> starts. The L1 entries are shared between level2_ident_pgt and
> level2_kernel_pgt, and xen_setup_kernel_pagetable() sets the NX bit in the
> level2_ident_pgt entries. verify_rwx() only checks the L1 entry and
> reports a false-positive violation.
>
> Here is a dump of some kernel virtual addresses and the corresponding L1
> and L2 entries:
>
> This is the start of the directmap (ident) and they have NX (bit 63) set
> in the PMD.
> ndvm-pv (1): [ 0.466778] va=ffff888000000000 pte=0010000000000027 level: 1
> ndvm-pv (1): [ 0.466788] va=ffff888000000000 pmd=800000000242c067 level: 2
>
> Directmap for kernel text:
> ndvm-pv (1): [ 0.466795] va=ffff888001000000 pte=0010000001000065 level: 1
> ndvm-pv (1): [ 0.466801] va=ffff888001000000 pmd=8000000002434067 level: 2
> ndvm-pv (1): [ 0.466807] va=ffff888001010000 pte=0010000001010065 level: 1
> ndvm-pv (1): [ 0.466814] va=ffff888001010000 pmd=8000000002434067 level: 2
>
> The start of the kernel text highmap is unmapped:
> ndvm-pv (1): [ 0.466820] va=ffffffff80000000 pte=0000000000000000 level: 3
> ndvm-pv (1): [ 0.466826] va=ffffffff80000000 pmd=0000000000000000 level: 3
>
> Kernel PMD for .text has NX bit clear
> ndvm-pv (1): [ 0.466832] va=ffffffff81000000 pte=0010000001000065 level: 1
> ndvm-pv (1): [ 0.466838] va=ffffffff81000000 pmd=0000000002434067 level: 2
>
> Kernel PTE for rodata_end has NX bit set
> ndvm-pv (1): [ 0.466846] va=ffffffff81e62000 pte=8010000001e62025 level: 1
> ndvm-pv (1): [ 0.466874] va=ffffffff81e62000 pmd=000000000243b067 level: 2
>
> Directmap of rodata_end
> ndvm-pv (1): [ 0.466907] va=ffff888001e62000 pte=8010000001e62025 level: 1
> ndvm-pv (1): [ 0.466913] va=ffff888001e62000 pmd=800000000243b067 level: 2
>
> Directmap of a low RAM address
> ndvm-pv (1): [ 0.466920] va=ffff888000010000 pte=0010000000010027 level: 1
> ndvm-pv (1): [ 0.466926] va=ffff888000010000 pmd=800000000242c067 level: 2
>
> Directmap of another RAM address close to but below kernel text
> ndvm-pv (1): [ 0.466932] va=ffff88800096c000 pte=001000000096c027 level: 1
> ndvm-pv (1): [ 0.466938] va=ffff88800096c000 pmd=8000000002430067 level: 2
>
> Here are some L2 entries showing the differing NX bits for l2_ident vs.
> l2_kernel while they point at the same L1 addresses
> ndvm-pv (1): [ 0.466944] l2_ident[ 0] pmd=800000000242c067
> ndvm-pv (1): [ 0.466949] l2_ident[ 1] pmd=800000000242d067
> ndvm-pv (1): [ 0.466955] l2_ident[ 8] pmd=8000000002434067
> ndvm-pv (1): [ 0.466959] l2_ident[ 9] pmd=8000000002435067
> ndvm-pv (1): [ 0.466964] l2_ident[ 14] pmd=800000000243a067
> ndvm-pv (1): [ 0.466969] l2_ident[ 15] pmd=800000000243b067
> ndvm-pv (1): [ 0.466974] l2_kernel[ 8] pmd=0000000002434067
> ndvm-pv (1): [ 0.466979] l2_kernel[ 9] pmd=0000000002435067
> ndvm-pv (1): [ 0.466984] l2_kernel[ 14] pmd=000000000243a067
> ndvm-pv (1): [ 0.466989] l2_kernel[ 15] pmd=000000000243b067
>
> One option is to add a fallback check for verify_rwx() to check the PMD
> permissions to silence the warning. Something like below. I think it's
> not readily generalizable as it hardcodes checking the PMD. That works
> for Xen where L1 PTEs are always used, but wouldn't work for non-Xen.

I don't think this would be a real issue, as it is only Xen PV code setting
the NX bit in PMD entries. And in case it really becomes an issue, the higher
level page tables could be checked, too.

> The other option would be to duplicate L1 page tables. Xen PV doesn't
> support large pages, so the kernel highmap can't use large pages. The
> increased memory would add up though.

Indeed, so I don't think this would be a good idea.

> Regards,
> Jason
>
> ---
>  arch/x86/mm/pat/set_memory.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index e9b448d1b1b7..904129b411ee 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -641,6 +641,20 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
>  	if ((pgprot_val(new) & (_PAGE_RW | _PAGE_NX)) != _PAGE_RW)
>  		return new;
>
> +	if ((pgprot_val(new) & (_PAGE_RW | _PAGE_NX)) == _PAGE_RW) {

This if is a little bit strange, as the condition can't ever be false.
I'd rather test "if (npg == 1)" as this is the case where a PMD entry
is really existing, ...

> +		pmd_t *pmd = lookup_pmd_address(start);
> +
> +		if (pmd && pmd_val(*pmd) & _PAGE_NX) {

... removing the need to test for pmd to be not NULL.

> +			pr_debug_once("CPA PMD 0x%016lx NX prevents PTE W^X violation: %016llx -> %016llx range: 0x%016lx - 0x%016lx PFN %lx\n",
> +				      pmd_flags(*pmd),
> +				      (unsigned long long)pgprot_val(old),
> +				      (unsigned long long)pgprot_val(new),
> +				      start, end, pfn);

I'd scratch that pr_debug(), as it doesn't really help.

> +
> +			return new;
> +		}
> +	}
> +
>  	end = start + npg * PAGE_SIZE - 1;
>  	WARN_ONCE(1, "CPA detected W^X violation: %016llx -> %016llx range: 0x%016lx - 0x%016lx PFN %lx\n",
>  		  (unsigned long long)pgprot_val(old),

Jason, do you want to send a V2 with your Signed-off, or would you like me to
try upstreaming the patch?

Juergen
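
For readers following the thread, here is a minimal userspace sketch of why the PTE-only check misfires, assuming only the architectural x86-64 rule that a mapping is writable when RW is set at every level and executable when NX is clear at every level. It models just the PMD and PTE levels (the upper levels are assumed permissive), and all helper names are hypothetical; it is not the kernel's verify_rwx() nor the patch above. The values come from the dump quoted in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86-64 page table flag bits. */
#define SKETCH_PAGE_RW (UINT64_C(1) << 1)
#define SKETCH_PAGE_NX (UINT64_C(1) << 63)

/* What a PTE-only check sees: RW set and NX clear in the L1 entry alone. */
static bool pte_looks_wx(uint64_t pte)
{
	return (pte & (SKETCH_PAGE_RW | SKETCH_PAGE_NX)) == SKETCH_PAGE_RW;
}

/* Writable only if RW is set in both the PMD and the PTE. */
static bool effective_writable(uint64_t pmd, uint64_t pte)
{
	return (pmd & SKETCH_PAGE_RW) && (pte & SKETCH_PAGE_RW);
}

/* Executable only if NX is clear in both the PMD and the PTE. */
static bool effective_executable(uint64_t pmd, uint64_t pte)
{
	return !((pmd | pte) & SKETCH_PAGE_NX);
}

/* A real W^X violation needs the combined mapping to be W and X. */
static bool mapping_is_wx(uint64_t pmd, uint64_t pte)
{
	return effective_writable(pmd, pte) && effective_executable(pmd, pte);
}

/* Values taken from the dump in this thread. */
static void check_dump_values(void)
{
	uint64_t new_prot  = UINT64_C(0x0000000000000067); /* "new" in the warning */
	uint64_t l2_ident  = UINT64_C(0x800000000242c067); /* l2_ident[0], NX set */
	uint64_t l2_kernel = UINT64_C(0x0000000002434067); /* l2_kernel[8], NX clear */

	assert(pte_looks_wx(new_prot));              /* PTE alone triggers the check */
	assert(!mapping_is_wx(l2_ident, new_prot));  /* PMD NX makes it non-executable */
	assert(mapping_is_wx(l2_kernel, new_prot));  /* same flags under l2_kernel would be W+X */
}
```

Calling check_dump_values() from a test program exercises the three cases: the directmap range from the warning is not effectively W+X, while the same PTE flags reached through an l2_kernel entry would be.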