[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Linux Xen PV CPA W^X violation false-positives



On Thu, Mar 28, 2024 at 02:00:14PM +0100, Jürgen Groß wrote:
> Hi Jason,
> 
> On 28.03.24 02:24, Jason Andryuk wrote:
> > On Wed, Mar 27, 2024 at 7:46 AM Jürgen Groß <jgross@xxxxxxxx> wrote:
> > > 
> > > On 24.01.24 17:54, Jason Andryuk wrote:
> > > > +
> > > > +                     return new;
> > > > +             }
> > > > +     }
> > > > +
> > > >        end = start + npg * PAGE_SIZE - 1;
> > > >        WARN_ONCE(1, "CPA detected W^X violation: %016llx -> %016llx 
> > > > range: 0x%016lx - 0x%016lx PFN %lx\n",
> > > >                  (unsigned long long)pgprot_val(old),
> > > 
> > > Jason, do you want to send a V2 with your Signed-off, or would you like 
> > > me to
> > > try upstreaming the patch?
> > 
> > Hi Jürgen,
> > 
> > Yes, please upstream your approach.  I wasn't sure how to deal with
> > it, so it was more of a bug report.
> 
> The final solution was a bit more complicated, as there are some
> corner cases to be considered. OTOH it is now complete by looking
> at all used translation entries.
> 
> Are you able to test the attached patch? I don't see the original
> issue and can only verify the patch doesn't cause any regression.
> 
> 
> Juergen

Hi Jürgen,

I gave a try to the patch in this email with osstest, and I can't find a
single "CPA detected W^X violation" log entry, when there's seems to be
many in osstest in general, from dom0 it seems as it's on the host
serial console usually.

http://logs.test-lab.xenproject.org/osstest/logs/185252/

If you look in several "serial-$host.log*" files, there will be the
"CPA detected" message, but they happen on previous test run.

I did an other smaller run before this one, and same thing:
http://logs.test-lab.xenproject.org/osstest/logs/185186/

And this other run as well, which I failed to setup properly with lots
of broken, but no failure due to the patch and I can't find any "CPA
detected" messages.
http://logs.test-lab.xenproject.org/osstest/logs/185248/

I hope that helps?

Cheers,


> From fd25a67d92e44b61d05d92658b23d026202a1656 Mon Sep 17 00:00:00 2001
> From: Juergen Gross <jgross@xxxxxxxx>
> To: x86@xxxxxxxxxx
> To: linux-kernel@xxxxxxxxxxxxxxx
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Date: Thu, 28 Mar 2024 12:24:48 +0100
> Subject: [PATCH] x86/pat: fix W^X violation false-positives when running as
>  Xen PV guest
> 
> When running as Xen PV guest in some cases W^X violation WARN()s have
> been observed. Those WARN()s are produced by verify_rwx(), which looks
> into the PTE to verify that writable kernel pages have the NX bit set
> in order to avoid code modifications of the kernel by rogue code.
> 
> As the NX bits of all levels of translation entries are or-ed and the
> RW bits of all levels are and-ed, looking just into the PTE isn't enough
> for the decision that a writable page is executable, too. When running
> as a Xen PV guest, kernel initialization will set the NX bit in PMD
> entries of the initial page tables covering the .data segment.
> 
> When finding the PTE to have set the RW bit but no NX bit, higher level
> entries must be looked at. Only when all levels have the RW bit set and
> no NX bit set, the W^X violation should be flagged.
> 
> Additionally show_fault_oops() has a similar problem: it will issue the
> "kernel tried to execute NX-protected page" message only if it finds
> the NX bit set in the leaf translation entry, while any NX bit in
> non-leaf entries are being ignored for issuing the message.
> 
> Modify lookup_address_in_pgd() to return the effective NX and RW bit
> values of the non-leaf translation entries and evaluate those as well
> in verify_rwx() and show_fault_oops().
> 
> Fixes: 652c5bf380ad ("x86/mm: Refuse W^X violations")
> Reported-by: Jason Andryuk <jandryuk@xxxxxxxxx>
> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
> ---
>  arch/x86/include/asm/pgtable_types.h |  2 +-
>  arch/x86/kernel/sev.c                |  3 +-
>  arch/x86/mm/fault.c                  |  7 ++--
>  arch/x86/mm/pat/set_memory.c         | 56 +++++++++++++++++++++-------
>  arch/x86/virt/svm/sev.c              |  3 +-
>  5 files changed, 52 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable_types.h 
> b/arch/x86/include/asm/pgtable_types.h
> index 0b748ee16b3d..91ab538d3872 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -565,7 +565,7 @@ static inline void update_page_count(int level, unsigned 
> long pages) { }
>   */
>  extern pte_t *lookup_address(unsigned long address, unsigned int *level);
>  extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
> -                                 unsigned int *level);
> +                                 unsigned int *level, bool *nx, bool *rw);
>  extern pmd_t *lookup_pmd_address(unsigned long address);
>  extern phys_addr_t slow_virt_to_phys(void *__address);
>  extern int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn,
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index b59b09c2f284..e833183d1adb 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -515,12 +515,13 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb 
> *ghcb, struct es_em_ctxt
>       unsigned long va = (unsigned long)vaddr;
>       unsigned int level;
>       phys_addr_t pa;
> +     bool nx, rw;
>       pgd_t *pgd;
>       pte_t *pte;
>  
>       pgd = __va(read_cr3_pa());
>       pgd = &pgd[pgd_index(va)];
> -     pte = lookup_address_in_pgd(pgd, va, &level);
> +     pte = lookup_address_in_pgd(pgd, va, &level, &nx, &rw);
>       if (!pte) {
>               ctxt->fi.vector     = X86_TRAP_PF;
>               ctxt->fi.cr2        = vaddr;
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 622d12ec7f08..eb8e897a5653 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -514,18 +514,19 @@ show_fault_oops(struct pt_regs *regs, unsigned long 
> error_code, unsigned long ad
>  
>       if (error_code & X86_PF_INSTR) {
>               unsigned int level;
> +             bool nx, rw;
>               pgd_t *pgd;
>               pte_t *pte;
>  
>               pgd = __va(read_cr3_pa());
>               pgd += pgd_index(address);
>  
> -             pte = lookup_address_in_pgd(pgd, address, &level);
> +             pte = lookup_address_in_pgd(pgd, address, &level, &nx, &rw);
>  
> -             if (pte && pte_present(*pte) && !pte_exec(*pte))
> +             if (pte && pte_present(*pte) && (!pte_exec(*pte) || nx))
>                       pr_crit("kernel tried to execute NX-protected page - 
> exploit attempt? (uid: %d)\n",
>                               from_kuid(&init_user_ns, current_uid()));
> -             if (pte && pte_present(*pte) && pte_exec(*pte) &&
> +             if (pte && pte_present(*pte) && pte_exec(*pte) && !nx &&
>                               (pgd_flags(*pgd) & _PAGE_USER) &&
>                               (__read_cr4() & X86_CR4_SMEP))
>                       pr_crit("unable to execute userspace code (SMEP?) (uid: 
> %d)\n",
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 80c9037ffadf..baa4dc4748e9 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -619,7 +619,8 @@ static inline pgprot_t static_protections(pgprot_t prot, 
> unsigned long start,
>   * Validate strict W^X semantics.
>   */
>  static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long 
> start,
> -                               unsigned long pfn, unsigned long npg)
> +                               unsigned long pfn, unsigned long npg,
> +                               bool nx, bool rw)
>  {
>       unsigned long end;
>  
> @@ -641,6 +642,10 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t 
> new, unsigned long star
>       if ((pgprot_val(new) & (_PAGE_RW | _PAGE_NX)) != _PAGE_RW)
>               return new;
>  
> +     /* Non-leaf translation entries can disable writing or execution. */
> +     if (!rw || nx)
> +             return new;
> +
>       end = start + npg * PAGE_SIZE - 1;
>       WARN_ONCE(1, "CPA detected W^X violation: %016llx -> %016llx range: 
> 0x%016lx - 0x%016lx PFN %lx\n",
>                 (unsigned long long)pgprot_val(old),
> @@ -660,17 +665,22 @@ static inline pgprot_t verify_rwx(pgprot_t old, 
> pgprot_t new, unsigned long star
>   * Return a pointer to the entry and the level of the mapping.
>   */
>  pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
> -                          unsigned int *level)
> +                          unsigned int *level, bool *nx, bool *rw)
>  {
>       p4d_t *p4d;
>       pud_t *pud;
>       pmd_t *pmd;
>  
>       *level = PG_LEVEL_NONE;
> +     *nx = false;
> +     *rw = true;
>  
>       if (pgd_none(*pgd))
>               return NULL;
>  
> +     *nx |= pgd_flags(*pgd) & _PAGE_NX;
> +     *rw &= pgd_flags(*pgd) & _PAGE_RW;
> +
>       p4d = p4d_offset(pgd, address);
>       if (p4d_none(*p4d))
>               return NULL;
> @@ -679,6 +689,9 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long 
> address,
>       if (p4d_leaf(*p4d) || !p4d_present(*p4d))
>               return (pte_t *)p4d;
>  
> +     *nx |= p4d_flags(*p4d) & _PAGE_NX;
> +     *rw &= p4d_flags(*p4d) & _PAGE_RW;
> +
>       pud = pud_offset(p4d, address);
>       if (pud_none(*pud))
>               return NULL;
> @@ -687,6 +700,9 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long 
> address,
>       if (pud_leaf(*pud) || !pud_present(*pud))
>               return (pte_t *)pud;
>  
> +     *nx |= pud_flags(*pud) & _PAGE_NX;
> +     *rw &= pud_flags(*pud) & _PAGE_RW;
> +
>       pmd = pmd_offset(pud, address);
>       if (pmd_none(*pmd))
>               return NULL;
> @@ -695,6 +711,9 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long 
> address,
>       if (pmd_leaf(*pmd) || !pmd_present(*pmd))
>               return (pte_t *)pmd;
>  
> +     *nx |= pmd_flags(*pmd) & _PAGE_NX;
> +     *rw &= pmd_flags(*pmd) & _PAGE_RW;
> +
>       *level = PG_LEVEL_4K;
>  
>       return pte_offset_kernel(pmd, address);
> @@ -710,18 +729,24 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long 
> address,
>   */
>  pte_t *lookup_address(unsigned long address, unsigned int *level)
>  {
> -     return lookup_address_in_pgd(pgd_offset_k(address), address, level);
> +     bool nx, rw;
> +
> +     return lookup_address_in_pgd(pgd_offset_k(address), address, level,
> +                                  &nx, &rw);
>  }
>  EXPORT_SYMBOL_GPL(lookup_address);
>  
>  static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long 
> address,
> -                               unsigned int *level)
> +                               unsigned int *level, bool *nx, bool *rw)
>  {
> -     if (cpa->pgd)
> -             return lookup_address_in_pgd(cpa->pgd + pgd_index(address),
> -                                            address, level);
> +     pgd_t *pgd;
> +
> +     if (!cpa->pgd)
> +             pgd = pgd_offset_k(address);
> +     else
> +             pgd = cpa->pgd + pgd_index(address);
>  
> -     return lookup_address(address, level);
> +     return lookup_address_in_pgd(pgd, address, level, nx, rw);
>  }
>  
>  /*
> @@ -849,12 +874,13 @@ static int __should_split_large_page(pte_t *kpte, 
> unsigned long address,
>       pgprot_t old_prot, new_prot, req_prot, chk_prot;
>       pte_t new_pte, *tmp;
>       enum pg_level level;
> +     bool nx, rw;
>  
>       /*
>        * Check for races, another CPU might have split this page
>        * up already:
>        */
> -     tmp = _lookup_address_cpa(cpa, address, &level);
> +     tmp = _lookup_address_cpa(cpa, address, &level, &nx, &rw);
>       if (tmp != kpte)
>               return 1;
>  
> @@ -965,7 +991,8 @@ static int __should_split_large_page(pte_t *kpte, 
> unsigned long address,
>       new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages,
>                                     psize, CPA_DETECT);
>  
> -     new_prot = verify_rwx(old_prot, new_prot, lpaddr, old_pfn, numpages);
> +     new_prot = verify_rwx(old_prot, new_prot, lpaddr, old_pfn, numpages,
> +                           nx, rw);
>  
>       /*
>        * If there is a conflict, split the large page.
> @@ -1046,6 +1073,7 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, 
> unsigned long address,
>       pte_t *pbase = (pte_t *)page_address(base);
>       unsigned int i, level;
>       pgprot_t ref_prot;
> +     bool nx, rw;
>       pte_t *tmp;
>  
>       spin_lock(&pgd_lock);
> @@ -1053,7 +1081,7 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, 
> unsigned long address,
>        * Check for races, another CPU might have split this page
>        * up for us already:
>        */
> -     tmp = _lookup_address_cpa(cpa, address, &level);
> +     tmp = _lookup_address_cpa(cpa, address, &level, &nx, &rw);
>       if (tmp != kpte) {
>               spin_unlock(&pgd_lock);
>               return 1;
> @@ -1594,10 +1622,11 @@ static int __change_page_attr(struct cpa_data *cpa, 
> int primary)
>       int do_split, err;
>       unsigned int level;
>       pte_t *kpte, old_pte;
> +     bool nx, rw;
>  
>       address = __cpa_addr(cpa, cpa->curpage);
>  repeat:
> -     kpte = _lookup_address_cpa(cpa, address, &level);
> +     kpte = _lookup_address_cpa(cpa, address, &level, &nx, &rw);
>       if (!kpte)
>               return __cpa_process_fault(cpa, address, primary);
>  
> @@ -1619,7 +1648,8 @@ static int __change_page_attr(struct cpa_data *cpa, int 
> primary)
>               new_prot = static_protections(new_prot, address, pfn, 1, 0,
>                                             CPA_PROTECT);
>  
> -             new_prot = verify_rwx(old_prot, new_prot, address, pfn, 1);
> +             new_prot = verify_rwx(old_prot, new_prot, address, pfn, 1,
> +                                   nx, rw);
>  
>               new_prot = pgprot_clear_protnone_bits(new_prot);
>  
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index cffe1157a90a..9131d80cff36 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -338,12 +338,13 @@ void snp_dump_hva_rmpentry(unsigned long hva)
>  {
>       unsigned long paddr;
>       unsigned int level;
> +     bool nx, rw;
>       pgd_t *pgd;
>       pte_t *pte;
>  
>       pgd = __va(read_cr3_pa());
>       pgd += pgd_index(hva);
> -     pte = lookup_address_in_pgd(pgd, hva, &level);
> +     pte = lookup_address_in_pgd(pgd, hva, &level, &nx, &rw);
>  
>       if (!pte) {
>               pr_err("Can't dump RMP entry for HVA %lx: no PTE/PFN found\n", 
> hva);
> -- 
> 2.35.3
> 


-- 
Anthony PERARD



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.