[Xen-devel] [PATCH] xen: arm: force guest memory accesses to cacheable when MMU is disabled
On ARM, guest OSes are started with MMU and caches disabled (as they are
on native); however caching is enabled in the domain running the builder
and therefore we must ensure cache consistency.

The existing solution to this problem (a0035ecc0d82 "tools: libxc: flush
data cache after loading images into guest memory") is to flush the
caches after loading the various blobs into guest RAM. However this
approach has two shortcomings:

 - The cache flush primitives available to userspace on arm32 are not
   sufficient for our needs.
 - There is a race between the cache flush and the unmap of the guest
   page, where the processor might speculatively dirty the cache line
   again.

(Of these, the second is the more fundamental.)

This patch makes use of the hardware functionality to force all accesses
made from guest mode to be cached (the HCR.DC, "default cacheable", bit).
This means that we don't need to worry about the domain builder's writes
being cached, because the guest's "uncached" accesses will actually be
cached.

Unfortunately the use of HCR.DC is incompatible with the guest enabling
its MMU (the SCTLR.M bit). Therefore we must trap accesses to the SCTLR
so that we can detect when this happens and disable HCR.DC. This is done
with the HCR.TVM (trap virtual memory controls) bit, which also causes
various other registers to be trapped; all of those can be passed
straight through to the underlying register. Once the guest has enabled
its MMU we no longer need to trap, so there is no ongoing overhead.

In my tests Linux makes about half a dozen accesses to these registers
before the MMU is enabled; I would expect other OSes to behave similarly
(the sequence of writes needed to set up the MMU is pretty obvious).

Apart from this unfortunate need to trap these accesses, the approach is
incompatible with guests which attempt to do DMA operations while their
MMU is disabled. In practice this means guests with passthrough, which
we do not yet support. Since a typical guest (including dom0) does not
access devices which require DMA until it is fully up and running with
paging enabled, the main risk is to in-guest firmware which does DMA,
i.e. running EFI in a guest with a disk passed through and booting from
that disk. Since we know that dom0 is not using any such firmware, and
we do not support device passthrough to guests yet, we can live with
this restriction. Once passthrough is implemented this will need to be
revisited.

The patch includes a couple of seemingly unrelated but necessary changes:

 - HSR_SYSREG_CRN_MASK was incorrectly defined. This happened to be
   benign with the existing set of system registers we handled, but
   broke with the new ones introduced here.
 - The defines used to decode the HSR system register fields were named
   the same as the registers themselves, which breaks the accessor
   macros. This had gone unnoticed because the handling of the existing
   trapped registers did not require accessing the underlying hardware
   register. Rename those constants with an HSR_SYSREG prefix (in line
   with HSR_CP32/64 for 32-bit registers).

This patch has survived thousands of boot loops on a Midway system.

Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
---
My preferred solution here would be for the tools to use an uncached
mapping of guest memory when building the guest, which requires adding a
new privcmd ioctl (a relatively straightforward patch) and plumbing a
"cached" flag through the libxc foreign mapping interfaces (a twisty
maze of passages, all alike).
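To make the trap-and-toggle mechanism concrete before the risk
discussion below: the core of the patch is just flipping two HCR bits
keyed off the guest's SCTLR.M. The following is a condensed paraphrase
of the ctxt_switch_to() and update_sctlr() hunks in the patch, not
additional logic; here `sctlr` stands for the guest's current (virtual)
SCTLR value and `hcr` for the value about to be written to HCR_EL2:

    /* Guest MMU off: force guest accesses cacheable (DC) and trap the
     * virtual memory control registers (TVM) so that we notice SCTLR.M
     * being set. Guest MMU on: incompatible with HCR.DC, so clear both
     * bits and stop trapping. */
    if ( sctlr & SCTLR_M )
        hcr &= ~(HCR_TVM|HCR_DC);
    else
        hcr |= (HCR_TVM|HCR_DC);
    WRITE_SYSREG(hcr, HCR_EL2);
    isb();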
IMO the libxc side of this patch was not looking suitable for a 4.4
freeze exception, since it was quite large (because we have 4 or more
mapping interfaces in libxc, some of which call back into others). So I
propose this version for 4.4. The uncached mapping solution should be
revisited for a future release.

At the risk of appearing to be going mad:

<speaking hat="submitter">

This bug results in memory corruption in the guest, which mostly
manifests as a failure to boot the guest (subsequent bad behaviour is
possible but, I think, unlikely). The frequency of failure is perhaps 1
boot in 10. This would not constitute an awesome release.

Although the patch is large, most of it is repetitive and mechanical
(made explicit through the use of macros in many cases). The biggest
risk is that one of the registers is not passed through correctly (i.e.
with the wrong size or target register). The ones which Linux uses have
been tested and appear to function OK. The others might be buggy, but
this is mitigated by the use of the same set of macros throughout.

I think the chance of the patch having a bug wrt my understanding of the
hardware behaviour is pretty low. WRT there being bugs in my
understanding of the hardware documentation, I would say middle to low,
but I have discussed it with some folks at ARM and they didn't call me
an idiot (in fact pretty much the same thing has been proposed for KVM).

Overall I think the benefits outweigh the risks.

One thing I'm not sure about is reverting the previous fix in
a0035ecc0d82. It's reasonably recent, so reverting it takes us back to a
pretty well understood state in the libraries. On the other hand the
functionality is harmless, if incomplete. I think, given the first
argument, I would lean towards reverting.

</speaking>

<speaking hat="stand-in release manager">

OK.

</speaking>

Obviously if you think I'm being too easy on myself please say so.
Actually, if you think my judgement is right I'd appreciate being told
so too.
---
 xen/arch/arm/domain.c           |    5 ++
 xen/arch/arm/domain_build.c     |    1 +
 xen/arch/arm/traps.c            |  153 ++++++++++++++++++++++++++++++++++++++-
 xen/arch/arm/vtimer.c           |    6 +-
 xen/include/asm-arm/cpregs.h    |    4 +
 xen/include/asm-arm/processor.h |    2 +-
 xen/include/asm-arm/sysregs.h   |   19 ++++-
 7 files changed, 182 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 124cccf..104d228 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -219,6 +219,11 @@ static void ctxt_switch_to(struct vcpu *n)
     else
         hcr |= HCR_RW;
 
+    if ( n->arch.sctlr & SCTLR_M )
+        hcr &= ~(HCR_TVM|HCR_DC);
+    else
+        hcr |= (HCR_TVM|HCR_DC);
+
     WRITE_SYSREG(hcr, HCR_EL2);
     isb();
 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 47b781b..bb31db8 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1026,6 +1026,7 @@ int construct_dom0(struct domain *d)
     else
         WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_RW, HCR_EL2);
 #endif
+    WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_DC | HCR_TVM, HCR_EL2);
 
     /*
      * kernel_load will determine the placement of the initrd & fdt in
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 7c5ab19..d00bba3 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1279,6 +1279,23 @@ static void advance_pc(struct cpu_user_regs *regs, union hsr hsr)
     regs->pc += hsr.len ? 4 : 2;
 }
 
+static void update_sctlr(uint32_t v)
+{
+    /*
+     * If MMU (SCTLR_M) is now enabled then we must disable HCR.DC
+     * because they are incompatible.
+     *
+     * Once HCR.DC is disabled then we do not need HCR_TVM either,
+     * since its only purpose was to catch the MMU being enabled.
+     *
+     * Both are set appropriately on context switch but we need to
+     * clear them now since we may not context switch on return to
+     * guest.
+     */
+    if ( v & SCTLR_M )
+        WRITE_SYSREG(READ_SYSREG(HCR_EL2) & ~(HCR_DC|HCR_TVM), HCR_EL2);
+}
+
 static void do_cp15_32(struct cpu_user_regs *regs,
                        union hsr hsr)
 {
@@ -1338,6 +1355,89 @@ static void do_cp15_32(struct cpu_user_regs *regs,
         if ( cp32.read )
             *r = v->arch.actlr;
         break;
+
+/* Passthru a 32-bit AArch32 register which is also 32-bit under AArch64 */
+#define CP32_PASSTHRU32(R...) do {              \
+    if ( cp32.read )                            \
+        *r = READ_SYSREG32(R);                  \
+    else                                        \
+        WRITE_SYSREG32(*r, R);                  \
+} while(0)
+
+/*
+ * Passthru a 32-bit AArch32 register which is 64-bit under AArch64.
+ * Updates the lower 32-bits and clears the upper bits.
+ */
+#define CP32_PASSTHRU64(R...) do {              \
+    if ( cp32.read )                            \
+        *r = (uint32_t)READ_SYSREG64(R);        \
+    else                                        \
+        WRITE_SYSREG64((uint64_t)*r, R);        \
+} while(0)
+
+/*
+ * Passthru a 32-bit AArch32 register which is 64-bit under AArch64.
+ * Updates either the HI ([63:32]) or LO ([31:0]) 32-bits preserving
+ * the other half.
+ */
+#ifdef CONFIG_ARM_64
+#define CP32_PASSTHRU64_HI(R...) do {                   \
+    if ( cp32.read )                                    \
+        *r = (uint32_t)(READ_SYSREG64(R) >> 32);        \
+    else                                                \
+    {                                                   \
+        uint64_t t = READ_SYSREG64(R) & 0xffffffffUL;   \
+        t |= ((uint64_t)(*r)) << 32;                    \
+        WRITE_SYSREG64(t, R);                           \
+    }                                                   \
+} while(0)
+#define CP32_PASSTHRU64_LO(R...) do {                           \
+    if ( cp32.read )                                            \
+        *r = (uint32_t)(READ_SYSREG64(R) & 0xffffffff);         \
+    else                                                        \
+    {                                                           \
+        uint64_t t = READ_SYSREG64(R) & 0xffffffff00000000UL;   \
+        t |= *r;                                                \
+        WRITE_SYSREG64(t, R);                                   \
+    }                                                           \
+} while(0)
+#endif
+
+    /* HCR.TVM */
+    case HSR_CPREG32(SCTLR):
+        CP32_PASSTHRU32(SCTLR_EL1);
+        update_sctlr(*r);
+        break;
+    case HSR_CPREG32(TTBR0_32):   CP32_PASSTHRU64(TTBR0_EL1); break;
+    case HSR_CPREG32(TTBR1_32):   CP32_PASSTHRU64(TTBR1_EL1); break;
+    case HSR_CPREG32(TTBCR):      CP32_PASSTHRU32(TCR_EL1); break;
+    case HSR_CPREG32(DACR):       CP32_PASSTHRU32(DACR32_EL2); break;
+    case HSR_CPREG32(DFSR):       CP32_PASSTHRU32(ESR_EL1); break;
+    case HSR_CPREG32(IFSR):       CP32_PASSTHRU32(IFSR32_EL2); break;
+    case HSR_CPREG32(ADFSR):      CP32_PASSTHRU32(AFSR0_EL1); break;
+    case HSR_CPREG32(AIFSR):      CP32_PASSTHRU32(AFSR1_EL1); break;
+    case HSR_CPREG32(CONTEXTIDR): CP32_PASSTHRU32(CONTEXTIDR_EL1); break;
+
+#ifdef CONFIG_ARM_64
+    case HSR_CPREG32(DFAR):       CP32_PASSTHRU64_LO(FAR_EL1); break;
+    case HSR_CPREG32(IFAR):       CP32_PASSTHRU64_HI(FAR_EL1); break;
+    case HSR_CPREG32(MAIR0):      CP32_PASSTHRU64_LO(MAIR_EL1); break;
+    case HSR_CPREG32(MAIR1):      CP32_PASSTHRU64_HI(MAIR_EL1); break;
+    case HSR_CPREG32(AMAIR0):     CP32_PASSTHRU64_LO(AMAIR_EL1); break;
+    case HSR_CPREG32(AMAIR1):     CP32_PASSTHRU64_HI(AMAIR_EL1); break;
+#else
+    case HSR_CPREG32(DFAR):       CP32_PASSTHRU32(DFAR); break;
+    case HSR_CPREG32(IFAR):       CP32_PASSTHRU32(IFAR); break;
+    case HSR_CPREG32(MAIR0):      CP32_PASSTHRU32(MAIR0); break;
+    case HSR_CPREG32(MAIR1):      CP32_PASSTHRU32(MAIR1); break;
+    case HSR_CPREG32(AMAIR0):     CP32_PASSTHRU32(AMAIR0); break;
+    case HSR_CPREG32(AMAIR1):     CP32_PASSTHRU32(AMAIR1); break;
+#endif
+
+#undef CP32_PASSTHRU32
+#undef CP32_PASSTHRU64
+#undef CP32_PASSTHRU64_LO
+#undef CP32_PASSTHRU64_HI
+
     default:
         printk("%s p15, %d, r%d, cr%d, cr%d, %d @ 0x%"PRIregister"\n",
                cp32.read ? "mrc" : "mcr",
"mrc" : "mcr", @@ -1351,6 +1451,9 @@ static void do_cp15_64(struct cpu_user_regs *regs, union hsr hsr) { struct hsr_cp64 cp64 = hsr.cp64; + uint32_t *r1 = (uint32_t *)select_user_reg(regs, cp64.reg1); + uint32_t *r2 = (uint32_t *)select_user_reg(regs, cp64.reg2); + uint64_t r; if ( !check_conditional_instr(regs, hsr) ) { @@ -1368,6 +1471,26 @@ static void do_cp15_64(struct cpu_user_regs *regs, domain_crash_synchronous(); } break; + +#define CP64_PASSTHRU(R...) do { \ + if ( cp64.read ) \ + { \ + r = READ_SYSREG64(R); \ + *r1 = r & 0xffffffffUL; \ + *r2 = r >> 32; \ + } \ + else \ + { \ + r = (*r1) | (((uint64_t)(*r2))<<32); \ + WRITE_SYSREG64(r, R); \ + } \ +} while(0) + + case HSR_CPREG64(TTBR0): CP64_PASSTHRU(TTBR0_EL1); break; + case HSR_CPREG64(TTBR1): CP64_PASSTHRU(TTBR1_EL1); break; + +#undef CP64_PASSTHRU + default: printk("%s p15, %d, r%d, r%d, cr%d @ 0x%"PRIregister"\n", cp64.read ? "mrrc" : "mcrr", @@ -1382,11 +1505,12 @@ static void do_sysreg(struct cpu_user_regs *regs, union hsr hsr) { struct hsr_sysreg sysreg = hsr.sysreg; + register_t *x = select_user_reg(regs, sysreg.reg); switch ( hsr.bits & HSR_SYSREG_REGS_MASK ) { - case CNTP_CTL_EL0: - case CNTP_TVAL_EL0: + case HSR_SYSREG_CNTP_CTL_EL0: + case HSR_SYSREG_CNTP_TVAL_EL0: if ( !vtimer_emulate(regs, hsr) ) { dprintk(XENLOG_ERR, @@ -1394,6 +1518,31 @@ static void do_sysreg(struct cpu_user_regs *regs, domain_crash_synchronous(); } break; + +#define SYSREG_PASSTHRU(R...) do { \ + if ( sysreg.read ) \ + *x = READ_SYSREG(R); \ + else \ + WRITE_SYSREG(*x, R); \ +} while(0) + + case HSR_SYSREG_SCTLR_EL1: + SYSREG_PASSTHRU(SCTLR_EL1); + update_sctlr(*x); + break; + case HSR_SYSREG_TTBR0_EL1: SYSREG_PASSTHRU(TTBR0_EL1); break; + case HSR_SYSREG_TTBR1_EL1: SYSREG_PASSTHRU(TTBR1_EL1); break; + case HSR_SYSREG_TCR_EL1: SYSREG_PASSTHRU(TCR_EL1); break; + case HSR_SYSREG_ESR_EL1: SYSREG_PASSTHRU(ESR_EL1); break; + case HSR_SYSREG_FAR_EL1: SYSREG_PASSTHRU(FAR_EL1); break; + case HSR_SYSREG_AFSR0_EL1: SYSREG_PASSTHRU(AFSR0_EL1); break; + case HSR_SYSREG_AFSR1_EL1: SYSREG_PASSTHRU(AFSR1_EL1); break; + case HSR_SYSREG_MAIR_EL1: SYSREG_PASSTHRU(MAIR_EL1); break; + case HSR_SYSREG_AMAIR_EL1: SYSREG_PASSTHRU(AMAIR_EL1); break; + case HSR_SYSREG_CONTEXTIDR_EL1: SYSREG_PASSTHRU(CONTEXTIDR_EL1); break; + +#undef SYSREG_PASSTHRU + default: printk("%s %d, %d, c%d, c%d, %d %s x%d @ 0x%"PRIregister"\n", sysreg.read ? "mrs" : "msr", diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c index 433ad55..e325f78 100644 --- a/xen/arch/arm/vtimer.c +++ b/xen/arch/arm/vtimer.c @@ -240,18 +240,18 @@ static int vtimer_emulate_sysreg(struct cpu_user_regs *regs, union hsr hsr) switch ( hsr.bits & HSR_SYSREG_REGS_MASK ) { - case CNTP_CTL_EL0: + case HSR_SYSREG_CNTP_CTL_EL0: vtimer_cntp_ctl(regs, &r, sysreg.read); if ( sysreg.read ) *x = r; return 1; - case CNTP_TVAL_EL0: + case HSR_SYSREG_CNTP_TVAL_EL0: vtimer_cntp_tval(regs, &r, sysreg.read); if ( sysreg.read ) *x = r; return 1; - case HSR_CPREG64(CNTPCT): + case HSR_SYSREG_CNTPCT_EL0: return vtimer_cntpct(regs, x, sysreg.read); default: diff --git a/xen/include/asm-arm/cpregs.h b/xen/include/asm-arm/cpregs.h index f0f1d53..508467a 100644 --- a/xen/include/asm-arm/cpregs.h +++ b/xen/include/asm-arm/cpregs.h @@ -121,6 +121,8 @@ #define TTBR0 p15,0,c2 /* Translation Table Base Reg. 0 */ #define TTBR1 p15,1,c2 /* Translation Table Base Reg. 1 */ #define HTTBR p15,4,c2 /* Hyp. 
+#define TTBR0_32        p15,0,c2,c0,0   /* 32-bit access to TTBR0 */
+#define TTBR1_32        p15,0,c2,c0,1   /* 32-bit access to TTBR1 */
 #define HTCR            p15,4,c2,c0,2   /* Hyp. Translation Control Register */
 #define VTCR            p15,4,c2,c1,2   /* Virtualization Translation Control Register */
 #define VTTBR           p15,6,c2        /* Virtualization Translation Table Base Register */
@@ -260,7 +262,9 @@
 #define CPACR_EL1               CPACR
 #define CSSELR_EL1              CSSELR
 #define DACR32_EL2              DACR
+#define ESR_EL1                 DFSR
 #define ESR_EL2                 HSR
+#define FAR_EL1                 HIFAR
 #define FAR_EL2                 HIFAR
 #define HCR_EL2                 HCR
 #define HPFAR_EL2               HPFAR
diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
index dfe807d..06e638f 100644
--- a/xen/include/asm-arm/processor.h
+++ b/xen/include/asm-arm/processor.h
@@ -342,7 +342,7 @@ union hsr {
 #define HSR_SYSREG_OP0_SHIFT (20)
 #define HSR_SYSREG_OP1_MASK (0x0001c000)
 #define HSR_SYSREG_OP1_SHIFT (14)
-#define HSR_SYSREG_CRN_MASK (0x00003800)
+#define HSR_SYSREG_CRN_MASK (0x00003c00)
 #define HSR_SYSREG_CRN_SHIFT (10)
 #define HSR_SYSREG_CRM_MASK (0x0000001e)
 #define HSR_SYSREG_CRM_SHIFT (1)
diff --git a/xen/include/asm-arm/sysregs.h b/xen/include/asm-arm/sysregs.h
index 48ad07e..0cee0e9 100644
--- a/xen/include/asm-arm/sysregs.h
+++ b/xen/include/asm-arm/sysregs.h
@@ -40,8 +40,23 @@
      ((__HSR_SYSREG_##crm) << HSR_SYSREG_CRM_SHIFT) |   \
      ((__HSR_SYSREG_##op2) << HSR_SYSREG_OP2_SHIFT)
 
-#define CNTP_CTL_EL0  HSR_SYSREG(3,3,c14,c2,1)
-#define CNTP_TVAL_EL0 HSR_SYSREG(3,3,c14,c2,0)
+#define HSR_SYSREG_SCTLR_EL1      HSR_SYSREG(3,0,c1, c0,0)
+#define HSR_SYSREG_TTBR0_EL1      HSR_SYSREG(3,0,c2, c0,0)
+#define HSR_SYSREG_TTBR1_EL1      HSR_SYSREG(3,0,c2, c0,1)
+#define HSR_SYSREG_TCR_EL1        HSR_SYSREG(3,0,c2, c0,2)
+#define HSR_SYSREG_AFSR0_EL1      HSR_SYSREG(3,0,c5, c1,0)
+#define HSR_SYSREG_AFSR1_EL1      HSR_SYSREG(3,0,c5, c1,1)
+#define HSR_SYSREG_ESR_EL1        HSR_SYSREG(3,0,c5, c2,0)
+#define HSR_SYSREG_FAR_EL1        HSR_SYSREG(3,0,c6, c0,0)
+#define HSR_SYSREG_MAIR_EL1       HSR_SYSREG(3,0,c10,c2,0)
+#define HSR_SYSREG_AMAIR_EL1      HSR_SYSREG(3,0,c10,c3,0)
+#define HSR_SYSREG_CONTEXTIDR_EL1 HSR_SYSREG(3,0,c13,c0,1)
+
+#define HSR_SYSREG_CNTPCT_EL0     HSR_SYSREG(3,3,c14,c0,0)
+#define HSR_SYSREG_CNTP_CTL_EL0   HSR_SYSREG(3,3,c14,c2,1)
+#define HSR_SYSREG_CNTP_TVAL_EL0  HSR_SYSREG(3,3,c14,c2,0)
+
+
 #endif
 
 #endif
-- 
1.7.10.4
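For reference, the kind of guest-side boot sequence that exercises these
traps looks roughly like the sketch below. This is a hedged illustration
rather than code from the patch: the register names are architectural,
but a real guest issues the equivalent MSR/MCR instructions directly;
the Xen-style WRITE_SYSREG()/READ_SYSREG() accessors and the mair, tcr
and ttbr0 variables are used purely for readability:

    /* While HCR.TVM is set, each of these writes traps to do_sysreg()
     * (or do_cp15_32/do_cp15_64 on arm32) and is passed through: */
    WRITE_SYSREG(mair,  MAIR_EL1);   /* memory attribute encodings */
    WRITE_SYSREG(tcr,   TCR_EL1);    /* translation control */
    WRITE_SYSREG(ttbr0, TTBR0_EL1);  /* page table root */
    isb();

    /* This final write sets SCTLR.M; update_sctlr() notices and clears
     * HCR.DC and HCR.TVM, after which nothing further is trapped: */
    WRITE_SYSREG(READ_SYSREG(SCTLR_EL1) | SCTLR_M, SCTLR_EL1);
    isb();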