
Re: Linux Xen PV CPA W^X violation false-positives



On 24.01.24 17:54, Jason Andryuk wrote:
Xen PV domains show CPA W^X violations like:

CPA detected W^X violation: 0000000000000064 -> 0000000000000067 range: 0xffff888000010000 - 0xffff888000010fff PFN 10
WARNING: CPU: 0 PID: 30 at arch/x86/mm/pat/set_memory.c:613 __change_page_attr_set_clr+0x113a/0x11c0
Modules linked in: xt_physdev xt_MASQUERADE iptable_nat nf_nat nf_conntrack libcrc32c nf_defrag_ipv4 ip_tables x_tables xen_argo(O)
CPU: 0 PID: 30 Comm: kworker/0:2 Tainted: G           O       6.1.38 #1
Workqueue: events bpf_prog_free_deferred
RIP: e030:__change_page_attr_set_clr+0x113a/0x11c0
Code: 4c 89 f1 4c 89 e2 4c 89 d6 4c 89 8d 70 ff ff ff 4d 8d 86 ff 0f 00 00 48 c7 c7 f0 3c da 81 c6 05 d0 0e 0e 01 01 e8 f6 71 00 00 <0f> 0b 4c 8b 8d 70 ff ff ff e9 2a fd ff ff 48 8b 85 60 ff ff ff 48
RSP: e02b:ffffc90000367c48 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 000ffffffffef064 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 00000000fffff7ff RDI: 00000000ffffffff
RBP: ffffc90000367d48 R08: 0000000000000000 R09: ffffc90000367aa0
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000067
R13: 0000000000000001 R14: ffff888000010000 R15: ffffc90000367d60
FS:  0000000000000000(0000) GS:ffff88800b800000(0000) knlGS:0000000000000000
CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdbaeda01c0 CR3: 0000000004312000 CR4: 0000000000050660
Call Trace:
  <TASK>
  ? show_regs.cold+0x1a/0x1f
  ? __change_page_attr_set_clr+0x113a/0x11c0
  ? __warn+0x7b/0xc0
  ? __change_page_attr_set_clr+0x113a/0x11c0
  ? report_bug+0x111/0x1a0
  ? handle_bug+0x4d/0xa0
  ? exc_invalid_op+0x19/0x70
  ? asm_exc_invalid_op+0x1b/0x20
  ? __change_page_attr_set_clr+0x113a/0x11c0
  ? __change_page_attr_set_clr+0x113a/0x11c0
  ? debug_smp_processor_id+0x17/0x20
  ? ___cache_free+0x2e/0x1e0
  ? _raw_spin_unlock+0x1e/0x40
  ? __purge_vmap_area_lazy+0x2ea/0x6b0
  set_direct_map_default_noflush+0x7c/0xa0
  __vunmap+0x1ac/0x280
  __vfree+0x1d/0x60
  vfree+0x27/0x40
  __bpf_prog_free+0x44/0x50
  bpf_prog_free_deferred+0x104/0x120
  process_one_work+0x1ca/0x3d0
  ? process_one_work+0x3d0/0x3d0
  worker_thread+0x45/0x3c0
  ? process_one_work+0x3d0/0x3d0
  kthread+0xe2/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>
---[ end trace 0000000000000000 ]---
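
For reference, the two hex values in the warning are the old and new protection bits of the PTE. A minimal sketch of decoding them, assuming the architectural x86-64 PTE bit layout; the macro and function names are illustrative and are not the kernel's own:

```c
#include <stdbool.h>
#include <stdint.h>

/* x86-64 page table entry flag bits (architectural layout). */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
#define PTE_NX       (1ULL << 63)

/* Flags that are set in new_prot but were not set in old_prot. */
static uint64_t flags_added(uint64_t old_prot, uint64_t new_prot)
{
	return new_prot & ~old_prot;
}

/* Writable and executable when judged by this entry alone: RW set, NX clear. */
static bool entry_alone_is_wx(uint64_t prot)
{
	return (prot & (PTE_RW | PTE_NX)) == PTE_RW;
}
```

With the values from the warning, flags_added(0x64, 0x67) yields PTE_PRESENT | PTE_RW, and entry_alone_is_wx(0x67) is true, which is the condition the warning reports.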

Xen provides a set of page tables that the guest executes out of when it
starts.  The L1 entries are shared between level2_ident_pgt and
level2_kernel_pgt, and xen_setup_kernel_pagetable() sets the NX bit in
the level2_ident_pgt entries.  verify_rwx() only checks the L1 entry, so it
misses the NX bit in the L2 entry and reports a false-positive violation.
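
The effect can be modelled in a few lines. On x86-64, a mapping is executable only if NX is clear at every level of the page table walk, and writable only if RW is set at every level. The sketch below looks at just the PMD and the PTE (upper levels are assumed to be permissive) and uses illustrative names rather than kernel APIs:

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_RW (1ULL << 1)   /* writable */
#define ENTRY_NX (1ULL << 63)  /* no-execute */

/* What a check that only looks at the leaf PTE concludes. */
static bool pte_only_wx(uint64_t pte)
{
	return (pte & ENTRY_RW) && !(pte & ENTRY_NX);
}

/* Effective permissions: RW is ANDed and NX is ORed across levels. */
static bool effective_wx(uint64_t pmd, uint64_t pte)
{
	bool writable = (pmd & ENTRY_RW) && (pte & ENTRY_RW);
	bool executable = !(pmd & ENTRY_NX) && !(pte & ENTRY_NX);

	return writable && executable;
}
```

For the low RAM directmap address in the dump in this mail (pmd=800000000242c067, pte=0010000000010027), pte_only_wx() is true while effective_wx() is false.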

Here is a dump of some kernel virtual addresses and the corresponding
L1 and L2 entries.

This is the start of the directmap (ident); its entries have NX (bit 63)
set in the PMD:
ndvm-pv (1): [    0.466778] va=ffff888000000000 pte=0010000000000027 level: 1
ndvm-pv (1): [    0.466788] va=ffff888000000000 pmd=800000000242c067 level: 2
Directmap for kernel text:
ndvm-pv (1): [    0.466795] va=ffff888001000000 pte=0010000001000065 level: 1
ndvm-pv (1): [    0.466801] va=ffff888001000000 pmd=8000000002434067 level: 2
ndvm-pv (1): [    0.466807] va=ffff888001010000 pte=0010000001010065 level: 1
ndvm-pv (1): [    0.466814] va=ffff888001010000 pmd=8000000002434067 level: 2
The start of the kernel text highmap is unmapped:
ndvm-pv (1): [    0.466820] va=ffffffff80000000 pte=0000000000000000 level: 3
ndvm-pv (1): [    0.466826] va=ffffffff80000000 pmd=0000000000000000 level: 3
Kernel PMD for .text has NX bit clear:
ndvm-pv (1): [    0.466832] va=ffffffff81000000 pte=0010000001000065 level: 1
ndvm-pv (1): [    0.466838] va=ffffffff81000000 pmd=0000000002434067 level: 2
Kernel PTE for rodata_end has NX bit set:
ndvm-pv (1): [    0.466846] va=ffffffff81e62000 pte=8010000001e62025 level: 1
ndvm-pv (1): [    0.466874] va=ffffffff81e62000 pmd=000000000243b067 level: 2
Directmap of rodata_end:
ndvm-pv (1): [    0.466907] va=ffff888001e62000 pte=8010000001e62025 level: 1
ndvm-pv (1): [    0.466913] va=ffff888001e62000 pmd=800000000243b067 level: 2
Directmap of a low RAM address:
ndvm-pv (1): [    0.466920] va=ffff888000010000 pte=0010000000010027 level: 1
ndvm-pv (1): [    0.466926] va=ffff888000010000 pmd=800000000242c067 level: 2
Directmap of another RAM address close to but below kernel text:
ndvm-pv (1): [    0.466932] va=ffff88800096c000 pte=001000000096c027 level: 1
ndvm-pv (1): [    0.466938] va=ffff88800096c000 pmd=8000000002430067 level: 2

Here are some L2 entries showing the differing NX bits for l2_ident vs.
l2_kernel, even though they point at the same L1 addresses:
ndvm-pv (1): [    0.466944]  l2_ident[  0] pmd=800000000242c067
ndvm-pv (1): [    0.466949]  l2_ident[  1] pmd=800000000242d067
ndvm-pv (1): [    0.466955]  l2_ident[  8] pmd=8000000002434067
ndvm-pv (1): [    0.466959]  l2_ident[  9] pmd=8000000002435067
ndvm-pv (1): [    0.466964]  l2_ident[ 14] pmd=800000000243a067
ndvm-pv (1): [    0.466969]  l2_ident[ 15] pmd=800000000243b067
ndvm-pv (1): [    0.466974] l2_kernel[  8] pmd=0000000002434067
ndvm-pv (1): [    0.466979] l2_kernel[  9] pmd=0000000002435067
ndvm-pv (1): [    0.466984] l2_kernel[ 14] pmd=000000000243a067
ndvm-pv (1): [    0.466989] l2_kernel[ 15] pmd=000000000243b067
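
The sharing can be checked directly from the values above. A small sketch, assuming 4-level paging with 4 KiB pages (9 index bits per level, address field in bits 12 to 51 of an entry); the helper names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_NX        (1ULL << 63)
#define ENTRY_ADDR_MASK 0x000ffffffffff000ULL  /* bits 12..51 */

/* Index into the L2 (PMD) table for a virtual address. */
static unsigned int l2_index(uint64_t va)
{
	return (unsigned int)((va >> 21) & 0x1ff);
}

/* Address field of an L2 entry, i.e. the L1 table it points at. */
static uint64_t l1_table_addr(uint64_t pmd)
{
	return pmd & ENTRY_ADDR_MASK;
}

/* True when two L2 entries share an L1 table but disagree on NX. */
static bool shared_l1_differing_nx(uint64_t a, uint64_t b)
{
	return l1_table_addr(a) == l1_table_addr(b) &&
	       ((a ^ b) & ENTRY_NX) != 0;
}
```

Both ffff888001000000 (directmap of kernel text) and ffffffff81000000 (highmap) give l2_index() == 8, and the l2_ident[8] and l2_kernel[8] values satisfy shared_l1_differing_nx().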

One option is to add a fallback to verify_rwx() that checks the PMD
permissions to silence the warning.  Something like below.  I think it's
not readily generalizable as it hardcodes checking the PMD.  That works
for Xen, where L1 PTEs are always used, but wouldn't work for non-Xen.

I don't think this would be a real issue, as it is only Xen PV code setting
the NX bit in PMD entries.

And in case it really becomes an issue, the higher-level page tables could be
checked, too.

The other option would be to duplicate L1 page tables.  Xen PV doesn't
support large pages, so the kernel highmap can't use large pages.  The
extra memory use would add up, though.

Indeed, so I don't think this would be a good idea.


Regards,
Jason
---
  arch/x86/mm/pat/set_memory.c | 14 ++++++++++++++
  1 file changed, 14 insertions(+)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index e9b448d1b1b7..904129b411ee 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -641,6 +641,20 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
        if ((pgprot_val(new) & (_PAGE_RW | _PAGE_NX)) != _PAGE_RW)
                return new;
+       if ((pgprot_val(new) & (_PAGE_RW | _PAGE_NX)) == _PAGE_RW) {

This if is a little bit strange, as the condition can't ever be false: the
check right above it already returned in that case.

I'd rather test "if (npg == 1)", as this is the case where a PMD entry really
exists, ...

+               pmd_t *pmd = lookup_pmd_address(start);
+
+               if (pmd && pmd_val(*pmd) & _PAGE_NX) {

... removing the need to test for pmd to be not NULL.

+                       pr_debug_once("CPA PMD 0x%016lx NX prevents PTE W^X violation: %016llx -> %016llx range: 0x%016lx - 0x%016lx PFN %lx\n",
+                                     pmd_flags(*pmd),
+                                     (unsigned long long)pgprot_val(old),
+                                     (unsigned long long)pgprot_val(new),
+                                     start, end, pfn);

I'd scratch that pr_debug(), as it doesn't really help.

+
+                       return new;
+               }
+       }
+
        end = start + npg * PAGE_SIZE - 1;
        WARN_ONCE(1, "CPA detected W^X violation: %016llx -> %016llx range: 0x%016lx - 0x%016lx PFN %lx\n",
                  (unsigned long long)pgprot_val(old),
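
Taken together, the review comments suggest a simpler shape for the fallback. The following is a userspace model of that control flow only, not the actual V2 patch: it takes the PMD value as a parameter instead of calling lookup_pmd_address(), and it omits the other early returns in verify_rwx(). The function name and macros are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_RW (1ULL << 1)
#define ENTRY_NX (1ULL << 63)

/*
 * Returns true when the W^X warning would still fire in this model.
 * new_prot: requested PTE protection bits
 * pmd:      value of the PMD entry covering the range (used only if npg == 1)
 * npg:      number of pages being changed
 */
static bool wx_warning_fires(uint64_t new_prot, uint64_t pmd, unsigned long npg)
{
	/* Not writable and executable at the PTE level: nothing to report. */
	if ((new_prot & (ENTRY_RW | ENTRY_NX)) != ENTRY_RW)
		return false;

	/* Per the review, npg == 1 is the case where a PMD entry exists;
	 * an NX bit there already prevents execution. */
	if (npg == 1 && (pmd & ENTRY_NX))
		return false;

	return true;
}
```

In this model, the request from the warning (new protection 0x67, one page, PMD 800000000242c067) no longer fires, while the same request under a PMD without NX still does.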

Jason, do you want to send a V2 with your Signed-off-by, or would you like me to
try upstreaming the patch?


Juergen



 

