Re: [Xen-devel] [PATCH] xen: arm: force guest memory accesses to cacheable when MMU is disabled
On Tue, 7 Jan 2014, Ian Campbell wrote:
> On ARM guest OSes are started with MMU and caches disabled (as they are on
> native); however caching is enabled in the domain running the builder and
> therefore we must ensure cache consistency.
>
> The existing solution to this problem (a0035ecc0d82 "tools: libxc: flush data
> cache after loading images into guest memory") is to flush the caches after
> loading the various blobs into guest RAM. However this approach has two
> shortcomings:
>
>  - The cache flush primitives available to userspace on arm32 are not
>    sufficient for our needs.
>  - There is a race between the cache flush and the unmap of the guest page,
>    where the processor might speculatively dirty the cache line again.
>
> (Of these the second is the more fundamental.)
>
> This patch makes use of the hardware functionality to force all accesses
> made from guest mode to be cached (the HCR.DC, "default cacheable", bit).
> This means that we don't need to worry about the domain builder's writes
> being cached, because the guest's "uncached" accesses will actually be
> cached.
>
> Unfortunately the use of HCR.DC is incompatible with the guest enabling its
> MMU (SCTLR.M bit). Therefore we must trap accesses to the SCTLR so that we
> can detect when this happens and disable HCR.DC. This is done with the
> HCR.TVM (trap virtual memory controls) bit, which also causes various other
> registers to be trapped, all of which can be passed straight through to the
> underlying register. Once the guest has enabled its MMU we no longer need
> to trap, so there is no ongoing overhead. In my tests Linux makes about half
> a dozen accesses to these registers before the MMU is enabled; I would
> expect other OSes to behave similarly (the sequence of writes needed to set
> up the MMU is pretty obvious).
>
> Apart from this unfortunate need to trap these accesses, this approach is
> incompatible with guests which attempt to do DMA operations with their MMU
> disabled. In practice this means guests with passthrough, which we do not
> yet support. Since a typical guest (including dom0) does not access devices
> which require DMA until after it is fully up and running with paging
> enabled, the main risk is to in-guest firmware which does DMA, i.e. running
> EFI in a guest with a disk passed through and booting from that disk. Since
> we know that dom0 is not using any such firmware and we do not support
> device passthrough to guests yet, we can live with this restriction. Once
> passthrough is implemented this will need to be revisited.
>
> The patch includes a couple of seemingly unrelated but necessary changes:
>
>  - HSR_SYSREG_CRN_MASK was incorrectly defined, which happened to be benign
>    with the existing set of system registers we handled, but broke with the
>    new ones introduced here.
>  - The defines used to decode the HSR system register fields were named the
>    same as the register. This breaks the accessor macros. This had gone
>    unnoticed because the handling of the existing trapped registers did not
>    require accessing the underlying hardware register. Rename those
>    constants with an HSR_SYSREG prefix (in line with HSR_CP32/64 for 32-bit
>    registers).
>
> This patch has survived thousands of boot loops on a Midway system.
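For context, the "half a dozen accesses" mentioned above are the guest's usual
MMU bring-up. A minimal illustrative sketch (AArch64-style, not taken from
Linux or from this patch; register values are placeholders) of the sequence
that HCR.TVM ends up intercepting:

    /* Illustrative guest-side MMU enable: each msr below traps to Xen while
     * HCR.TVM is set; the final SCTLR_EL1 write is the one that makes Xen
     * drop HCR.DC/HCR.TVM again (see update_sctlr() in the patch). */
    static void guest_enable_mmu(unsigned long ttbr0, unsigned long tcr,
                                 unsigned long mair)
    {
        unsigned long sctlr;

        asm volatile ("msr mair_el1, %0"  :: "r" (mair));
        asm volatile ("msr tcr_el1, %0"   :: "r" (tcr));
        asm volatile ("msr ttbr0_el1, %0" :: "r" (ttbr0));
        asm volatile ("isb");

        asm volatile ("mrs %0, sctlr_el1" : "=r" (sctlr));
        sctlr |= 1;                    /* SCTLR_EL1.M: enable the stage-1 MMU */
        asm volatile ("msr sctlr_el1, %0; isb" :: "r" (sctlr) : "memory");
    }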
>
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> ---
> My preferred solution here would be for the tools to use an uncached mapping
> of guest memory when building the guest, which requires adding a new privcmd
> ioctl (a relatively straightforward patch) and plumbing a "cached" flag
> through the libxc foreign mapping interfaces (a twisty maze of passages, all
> alike). IMO the libxc side of this patch was not looking suitable for a 4.4
> freeze exception, since it was quite large (because we have 4 or more
> mapping interfaces in libxc, some of which call back into others).
>
> So I propose this version for 4.4. The uncached mapping solution should be
> revisited for a future release.
>
> At the risk of appearing to be going mad:
>
> <speaking hat="submitter">
>
> This bug results in memory corruption bugs in the guest, which mostly
> manifest as a failure to boot the guest (subsequent bad behaviour is
> possible but, I think, unlikely); the frequency of failures is perhaps
> 1 in 10. This would not constitute an awesome release.
>
> Although the patch is large, most of it is repetitive and mechanical (made
> explicit through the use of macros in many cases). The biggest risk is that
> one of the registers is not passed through correctly (i.e. the wrong size
> or target registers). The ones which Linux uses have been tested and appear
> to function OK. The others might be buggy, but this is mitigated through
> the use of the same set of macros.
>
> I think the chance of the patch having a bug wrt my understanding of the
> hardware behaviour is pretty low. WRT there being bugs in my understanding
> of the hardware documentation, I would say middle to low, but I have
> discussed it with some folks at ARM and they didn't call me an idiot (in
> fact pretty much the same thing has been proposed for KVM).
>
> Overall I think the benefits outweigh the risks.
>
> One thing I'm not sure about is reverting the previous fix in a0035ecc0d82.
> It's reasonably recent, so reverting it takes us back to a pretty well
> understood state in the libraries. The functionality is harmless if
> incomplete. I think given the first argument I would lean towards reverting.
>
> </speaking>
>
> <speaking hat="stand in release manager">
>
> OK.
>
> </speaking>
>
> Obviously if you think I'm being too easy on myself please say so.
>
> Actually, if you think my judgement is right I'd appreciate being told so
> too.
> ---
>  xen/arch/arm/domain.c           |    5 ++
>  xen/arch/arm/domain_build.c     |    1 +
>  xen/arch/arm/traps.c            |  153 ++++++++++++++++++++++++++++++++++++++-
>  xen/arch/arm/vtimer.c           |    6 +-
>  xen/include/asm-arm/cpregs.h    |    4 +
>  xen/include/asm-arm/processor.h |    2 +-
>  xen/include/asm-arm/sysregs.h   |   19 ++++-
>  7 files changed, 182 insertions(+), 8 deletions(-)
>
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 124cccf..104d228 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -219,6 +219,11 @@ static void ctxt_switch_to(struct vcpu *n)
>      else
>          hcr |= HCR_RW;
>
> +    if ( n->arch.sctlr & SCTLR_M )
> +        hcr &= ~(HCR_TVM|HCR_DC);
> +    else
> +        hcr |= (HCR_TVM|HCR_DC);
> +
>      WRITE_SYSREG(hcr, HCR_EL2);
>      isb();

Is this actually needed? Shouldn't HCR be already correctly updated by
update_sctlr?
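For reference while considering the question above: this hunk and
update_sctlr() (further down) maintain the same invariant, namely that HCR.DC
and HCR.TVM are wanted exactly while the guest's stage-1 MMU is still off. A
hypothetical helper (not part of the patch) expressing that invariant:

    /* Hypothetical sketch, not in the patch: derive HCR.DC/HCR.TVM from the
     * guest's SCTLR.  The context-switch path and the SCTLR trap handler
     * both effectively compute this. */
    static void sync_hcr_with_guest_mmu(register_t sctlr)
    {
        register_t hcr = READ_SYSREG(HCR_EL2);

        if ( sctlr & SCTLR_M )
            hcr &= ~(HCR_DC|HCR_TVM);  /* MMU on: stop forcing cacheable */
        else
            hcr |= (HCR_DC|HCR_TVM);   /* MMU off: force cacheable, trap VM regs */

        WRITE_SYSREG(hcr, HCR_EL2);
        isb();
    }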
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 47b781b..bb31db8 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1026,6 +1026,7 @@ int construct_dom0(struct domain *d)
>      else
>          WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_RW, HCR_EL2);
>  #endif
> +    WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_DC | HCR_TVM, HCR_EL2);
>
>      /*
>       * kernel_load will determine the placement of the initrd & fdt in
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 7c5ab19..d00bba3 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1279,6 +1279,23 @@ static void advance_pc(struct cpu_user_regs *regs, union hsr hsr)
>      regs->pc += hsr.len ? 4 : 2;
>  }
>
> +static void update_sctlr(uint32_t v)
> +{
> +    /*
> +     * If MMU (SCTLR_M) is now enabled then we must disable HCR.DC
> +     * because they are incompatible.
> +     *
> +     * Once HCR.DC is disabled then we do not need HCR_TVM either,
> +     * since it's only purpose was to catch the MMU being enabled.
> +     *
> +     * Both are set appropriately on context switch but we need to
> +     * clear them now since we may not context switch on return to
> +     * guest.
> +     */
> +    if ( v & SCTLR_M )
> +        WRITE_SYSREG(READ_SYSREG(HCR_EL2) & ~(HCR_DC|HCR_TVM), HCR_EL2);
> +}
> +
>  static void do_cp15_32(struct cpu_user_regs *regs,
>                         union hsr hsr)
>  {
> @@ -1338,6 +1355,89 @@ static void do_cp15_32(struct cpu_user_regs *regs,
>          if ( cp32.read )
>              *r = v->arch.actlr;
>          break;
> +
> +/* Passthru a 32-bit AArch32 register which is also 32-bit under AArch64 */
> +#define CP32_PASSTHRU32(R...) do {              \
> +    if ( cp32.read )                            \
> +        *r = READ_SYSREG32(R);                  \
> +    else                                        \
> +        WRITE_SYSREG32(*r, R);                  \
> +} while(0)
> +
> +/*
> + * Passthru a 32-bit AArch32 register which is 64-bit under AArch64.
> + * Updates the lower 32-bits and clears the upper bits.
> + */
> +#define CP32_PASSTHRU64(R...) do {              \
> +    if ( cp32.read )                            \
> +        *r = (uint32_t)READ_SYSREG64(R);        \
> +    else                                        \
> +        WRITE_SYSREG64((uint64_t)*r, R);        \
> +} while(0)

Can/Should CP32_PASSTHRU64_LO be used instead of this?
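To make the distinction concrete, a standalone illustration (plain host C, not
Xen code; the values are arbitrary) of the two write behaviours for a 32-bit
access to a register that is 64-bit under AArch64: CP32_PASSTHRU64 clears the
upper half, while CP32_PASSTHRU64_LO (quoted below) preserves it.

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t reg = 0xaabbccdd11223344ULL;  /* current 64-bit register value */
        uint32_t w   = 0x55667788;             /* 32-bit guest write */

        uint64_t passthru64    = (uint64_t)w;                        /* upper half cleared */
        uint64_t passthru64_lo = (reg & 0xffffffff00000000ULL) | w;  /* upper half kept */

        printf("CP32_PASSTHRU64:    0x%016" PRIx64 "\n", passthru64);
        printf("CP32_PASSTHRU64_LO: 0x%016" PRIx64 "\n", passthru64_lo);
        return 0;
    }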
> +/*
> + * Passthru a 32-bit AArch32 register which is 64-bit under AArch64.
> + * Updates either the HI ([63:32]) or LO ([31:0]) 32-bits preserving
> + * the other half.
> + */
> +#ifdef CONFIG_ARM_64
> +#define CP32_PASSTHRU64_HI(R...) do {                         \
> +    if ( cp32.read )                                          \
> +        *r = (uint32_t)(READ_SYSREG64(R) >> 32);              \
> +    else                                                      \
> +    {                                                         \
> +        uint64_t t = READ_SYSREG64(R) & 0xffffffffUL;         \
> +        t |= ((uint64_t)(*r)) << 32;                          \
> +        WRITE_SYSREG64(t, R);                                 \
> +    }                                                         \
> +} while(0)
> +#define CP32_PASSTHRU64_LO(R...) do {                         \
> +    if ( cp32.read )                                          \
> +        *r = (uint32_t)(READ_SYSREG64(R) & 0xffffffff);       \
> +    else                                                      \
> +    {                                                         \
> +        uint64_t t = READ_SYSREG64(R) & 0xffffffff00000000UL; \
> +        t |= *r;                                              \
> +        WRITE_SYSREG64(t, R);                                 \
> +    }                                                         \
> +} while(0)
> +#endif
> +    /* HCR.TVM */
> +    case HSR_CPREG32(SCTLR):
> +        CP32_PASSTHRU32(SCTLR_EL1);
> +        update_sctlr(*r);
> +        break;
> +    case HSR_CPREG32(TTBR0_32):   CP32_PASSTHRU64(TTBR0_EL1); break;
> +    case HSR_CPREG32(TTBR1_32):   CP32_PASSTHRU64(TTBR1_EL1); break;
> +    case HSR_CPREG32(TTBCR):      CP32_PASSTHRU32(TCR_EL1); break;
> +    case HSR_CPREG32(DACR):       CP32_PASSTHRU32(DACR32_EL2); break;
> +    case HSR_CPREG32(DFSR):       CP32_PASSTHRU32(ESR_EL1); break;
> +    case HSR_CPREG32(IFSR):       CP32_PASSTHRU32(IFSR32_EL2); break;
> +    case HSR_CPREG32(ADFSR):      CP32_PASSTHRU32(AFSR0_EL1); break;
> +    case HSR_CPREG32(AIFSR):      CP32_PASSTHRU32(AFSR1_EL1); break;
> +    case HSR_CPREG32(CONTEXTIDR): CP32_PASSTHRU32(CONTEXTIDR_EL1); break;
> +
> +#ifdef CONFIG_ARM_64
> +    case HSR_CPREG32(DFAR):       CP32_PASSTHRU64_LO(FAR_EL1); break;
> +    case HSR_CPREG32(IFAR):       CP32_PASSTHRU64_HI(FAR_EL1); break;
> +    case HSR_CPREG32(MAIR0):      CP32_PASSTHRU64_LO(MAIR_EL1); break;
> +    case HSR_CPREG32(MAIR1):      CP32_PASSTHRU64_HI(MAIR_EL1); break;
> +    case HSR_CPREG32(AMAIR0):     CP32_PASSTHRU64_LO(AMAIR_EL1); break;
> +    case HSR_CPREG32(AMAIR1):     CP32_PASSTHRU64_HI(AMAIR_EL1); break;
> +#else
> +    case HSR_CPREG32(DFAR):       CP32_PASSTHRU32(DFAR); break;
> +    case HSR_CPREG32(IFAR):       CP32_PASSTHRU32(IFAR); break;
> +    case HSR_CPREG32(MAIR0):      CP32_PASSTHRU32(MAIR0); break;
> +    case HSR_CPREG32(MAIR1):      CP32_PASSTHRU32(MAIR1); break;
> +    case HSR_CPREG32(AMAIR0):     CP32_PASSTHRU32(AMAIR0); break;
> +    case HSR_CPREG32(AMAIR1):     CP32_PASSTHRU32(AMAIR1); break;
> +#endif
> +
> +#undef CP32_PASSTHRU32
> +#undef CP32_PASSTHRU64
> +#undef CP32_PASSTHRU64_LO
> +#undef CP32_PASSTHRU64_HI
>      default:
>          printk("%s p15, %d, r%d, cr%d, cr%d, %d @ 0x%"PRIregister"\n",
>                 cp32.read ? "mrc" : "mcr",
> @@ -1351,6 +1451,9 @@ static void do_cp15_64(struct cpu_user_regs *regs,
>                         union hsr hsr)
>  {
>      struct hsr_cp64 cp64 = hsr.cp64;
> +    uint32_t *r1 = (uint32_t *)select_user_reg(regs, cp64.reg1);
> +    uint32_t *r2 = (uint32_t *)select_user_reg(regs, cp64.reg2);
> +    uint64_t r;
>
>      if ( !check_conditional_instr(regs, hsr) )
>      {
> @@ -1368,6 +1471,26 @@ static void do_cp15_64(struct cpu_user_regs *regs,
>              domain_crash_synchronous();
>          }
>          break;
> +
> +#define CP64_PASSTHRU(R...) do {                \
> +    if ( cp64.read )                            \
> +    {                                           \
> +        r = READ_SYSREG64(R);                   \
> +        *r1 = r & 0xffffffffUL;                 \
> +        *r2 = r >> 32;                          \

it doesn't look like r, r1 and r2 are used anywhere

> +    }                                           \
> +    else                                        \
> +    {                                           \
> +        r = (*r1) | (((uint64_t)(*r2))<<32);    \
> +        WRITE_SYSREG64(r, R);                   \
> +    }                                           \
> +} while(0)
> +
> +    case HSR_CPREG64(TTBR0): CP64_PASSTHRU(TTBR0_EL1); break;
> +    case HSR_CPREG64(TTBR1): CP64_PASSTHRU(TTBR1_EL1); break;
> +
> +#undef CP64_PASSTHRU
> +
>      default:
>          printk("%s p15, %d, r%d, r%d, cr%d @ 0x%"PRIregister"\n",
>                 cp64.read ? "mrrc" : "mcrr",
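For the 64-bit accesses just above, a small sketch (host C, not patch code) of
the data flow CP64_PASSTHRU implements for an mcrr/mrrc access such as
"mcrr p15, 0, r0, r1, c2": the two 32-bit guest registers named in the trap
syndrome are combined into, or split out of, the 64-bit system register value.

    #include <stdint.h>

    /* Sketch only: reassemble/split the two 32-bit guest registers of an
     * MCRR/MRRC access, as CP64_PASSTHRU does on write/read respectively. */
    static uint64_t mcrr_operand(uint32_t reg1 /* low word */,
                                 uint32_t reg2 /* high word */)
    {
        return (uint64_t)reg1 | ((uint64_t)reg2 << 32);
    }

    static void mrrc_result(uint64_t v, uint32_t *reg1, uint32_t *reg2)
    {
        *reg1 = (uint32_t)(v & 0xffffffffUL);
        *reg2 = (uint32_t)(v >> 32);
    }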
"mrrc" : "mcrr", > @@ -1382,11 +1505,12 @@ static void do_sysreg(struct cpu_user_regs *regs, > union hsr hsr) > { > struct hsr_sysreg sysreg = hsr.sysreg; > + register_t *x = select_user_reg(regs, sysreg.reg); > > switch ( hsr.bits & HSR_SYSREG_REGS_MASK ) > { > - case CNTP_CTL_EL0: > - case CNTP_TVAL_EL0: > + case HSR_SYSREG_CNTP_CTL_EL0: > + case HSR_SYSREG_CNTP_TVAL_EL0: > if ( !vtimer_emulate(regs, hsr) ) > { > dprintk(XENLOG_ERR, > @@ -1394,6 +1518,31 @@ static void do_sysreg(struct cpu_user_regs *regs, > domain_crash_synchronous(); > } > break; > + > +#define SYSREG_PASSTHRU(R...) do { \ > + if ( sysreg.read ) \ > + *x = READ_SYSREG(R); \ > + else \ > + WRITE_SYSREG(*x, R); \ > +} while(0) > + > + case HSR_SYSREG_SCTLR_EL1: > + SYSREG_PASSTHRU(SCTLR_EL1); > + update_sctlr(*x); > + break; > + case HSR_SYSREG_TTBR0_EL1: SYSREG_PASSTHRU(TTBR0_EL1); break; > + case HSR_SYSREG_TTBR1_EL1: SYSREG_PASSTHRU(TTBR1_EL1); break; > + case HSR_SYSREG_TCR_EL1: SYSREG_PASSTHRU(TCR_EL1); break; > + case HSR_SYSREG_ESR_EL1: SYSREG_PASSTHRU(ESR_EL1); break; > + case HSR_SYSREG_FAR_EL1: SYSREG_PASSTHRU(FAR_EL1); break; > + case HSR_SYSREG_AFSR0_EL1: SYSREG_PASSTHRU(AFSR0_EL1); break; > + case HSR_SYSREG_AFSR1_EL1: SYSREG_PASSTHRU(AFSR1_EL1); break; > + case HSR_SYSREG_MAIR_EL1: SYSREG_PASSTHRU(MAIR_EL1); break; > + case HSR_SYSREG_AMAIR_EL1: SYSREG_PASSTHRU(AMAIR_EL1); break; > + case HSR_SYSREG_CONTEXTIDR_EL1: SYSREG_PASSTHRU(CONTEXTIDR_EL1); break; > + > +#undef SYSREG_PASSTHRU > + > default: > printk("%s %d, %d, c%d, c%d, %d %s x%d @ 0x%"PRIregister"\n", > sysreg.read ? "mrs" : "msr", > diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c > index 433ad55..e325f78 100644 > --- a/xen/arch/arm/vtimer.c > +++ b/xen/arch/arm/vtimer.c > @@ -240,18 +240,18 @@ static int vtimer_emulate_sysreg(struct cpu_user_regs > *regs, union hsr hsr) > > switch ( hsr.bits & HSR_SYSREG_REGS_MASK ) > { > - case CNTP_CTL_EL0: > + case HSR_SYSREG_CNTP_CTL_EL0: > vtimer_cntp_ctl(regs, &r, sysreg.read); > if ( sysreg.read ) > *x = r; > return 1; > - case CNTP_TVAL_EL0: > + case HSR_SYSREG_CNTP_TVAL_EL0: > vtimer_cntp_tval(regs, &r, sysreg.read); > if ( sysreg.read ) > *x = r; > return 1; > > - case HSR_CPREG64(CNTPCT): > + case HSR_SYSREG_CNTPCT_EL0: > return vtimer_cntpct(regs, x, sysreg.read); > > default: > diff --git a/xen/include/asm-arm/cpregs.h b/xen/include/asm-arm/cpregs.h > index f0f1d53..508467a 100644 > --- a/xen/include/asm-arm/cpregs.h > +++ b/xen/include/asm-arm/cpregs.h > @@ -121,6 +121,8 @@ > #define TTBR0 p15,0,c2 /* Translation Table Base Reg. 0 */ > #define TTBR1 p15,1,c2 /* Translation Table Base Reg. 1 */ > #define HTTBR p15,4,c2 /* Hyp. Translation Table Base > Register */ > +#define TTBR0_32 p15,0,c2,c0,0 /* 32-bit access to TTBR0 */ > +#define TTBR1_32 p15,0,c2,c0,1 /* 32-bit access to TTBR1 */ > #define HTCR p15,4,c2,c0,2 /* Hyp. 
> diff --git a/xen/include/asm-arm/cpregs.h b/xen/include/asm-arm/cpregs.h
> index f0f1d53..508467a 100644
> --- a/xen/include/asm-arm/cpregs.h
> +++ b/xen/include/asm-arm/cpregs.h
> @@ -121,6 +121,8 @@
>  #define TTBR0           p15,0,c2        /* Translation Table Base Reg. 0 */
>  #define TTBR1           p15,1,c2        /* Translation Table Base Reg. 1 */
>  #define HTTBR           p15,4,c2        /* Hyp. Translation Table Base Register */
> +#define TTBR0_32        p15,0,c2,c0,0   /* 32-bit access to TTBR0 */
> +#define TTBR1_32        p15,0,c2,c0,1   /* 32-bit access to TTBR1 */
>  #define HTCR            p15,4,c2,c0,2   /* Hyp. Translation Control Register */
>  #define VTCR            p15,4,c2,c1,2   /* Virtualization Translation Control Register */
>  #define VTTBR           p15,6,c2        /* Virtualization Translation Table Base Register */
> @@ -260,7 +262,9 @@
>  #define CPACR_EL1               CPACR
>  #define CSSELR_EL1              CSSELR
>  #define DACR32_EL2              DACR
> +#define ESR_EL1                 DFSR
>  #define ESR_EL2                 HSR
> +#define FAR_EL1                 HIFAR
>  #define FAR_EL2                 HIFAR
>  #define HCR_EL2                 HCR
>  #define HPFAR_EL2               HPFAR
> diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
> index dfe807d..06e638f 100644
> --- a/xen/include/asm-arm/processor.h
> +++ b/xen/include/asm-arm/processor.h
> @@ -342,7 +342,7 @@ union hsr {
>  #define HSR_SYSREG_OP0_SHIFT (20)
>  #define HSR_SYSREG_OP1_MASK (0x0001c000)
>  #define HSR_SYSREG_OP1_SHIFT (14)
> -#define HSR_SYSREG_CRN_MASK (0x00003800)
> +#define HSR_SYSREG_CRN_MASK (0x00003c00)
>  #define HSR_SYSREG_CRN_SHIFT (10)
>  #define HSR_SYSREG_CRM_MASK (0x0000001e)
>  #define HSR_SYSREG_CRM_SHIFT (1)
> diff --git a/xen/include/asm-arm/sysregs.h b/xen/include/asm-arm/sysregs.h
> index 48ad07e..0cee0e9 100644
> --- a/xen/include/asm-arm/sysregs.h
> +++ b/xen/include/asm-arm/sysregs.h
> @@ -40,8 +40,23 @@
>       ((__HSR_SYSREG_##crm) << HSR_SYSREG_CRM_SHIFT) | \
>       ((__HSR_SYSREG_##op2) << HSR_SYSREG_OP2_SHIFT)
>
> -#define CNTP_CTL_EL0  HSR_SYSREG(3,3,c14,c2,1)
> -#define CNTP_TVAL_EL0 HSR_SYSREG(3,3,c14,c2,0)
> +#define HSR_SYSREG_SCTLR_EL1      HSR_SYSREG(3,0,c1, c0,0)
> +#define HSR_SYSREG_TTBR0_EL1      HSR_SYSREG(3,0,c2, c0,0)
> +#define HSR_SYSREG_TTBR1_EL1      HSR_SYSREG(3,0,c2, c0,1)
> +#define HSR_SYSREG_TCR_EL1        HSR_SYSREG(3,0,c2, c0,2)
> +#define HSR_SYSREG_AFSR0_EL1      HSR_SYSREG(3,0,c5, c1,0)
> +#define HSR_SYSREG_AFSR1_EL1      HSR_SYSREG(3,0,c5, c1,1)
> +#define HSR_SYSREG_ESR_EL1        HSR_SYSREG(3,0,c5, c2,0)
> +#define HSR_SYSREG_FAR_EL1        HSR_SYSREG(3,0,c6, c0,0)
> +#define HSR_SYSREG_MAIR_EL1       HSR_SYSREG(3,0,c10,c2,0)
> +#define HSR_SYSREG_AMAIR_EL1      HSR_SYSREG(3,0,c10,c3,0)
> +#define HSR_SYSREG_CONTEXTIDR_EL1 HSR_SYSREG(3,0,c13,c0,1)
> +
> +#define HSR_SYSREG_CNTPCT_EL0     HSR_SYSREG(3,3,c14,c0,0)
> +#define HSR_SYSREG_CNTP_CTL_EL0   HSR_SYSREG(3,3,c14,c2,1)
> +#define HSR_SYSREG_CNTP_TVAL_EL0  HSR_SYSREG(3,3,c14,c2,0)
> +
> +
>  #endif
>
>  #endif
> --
> 1.7.10.4
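Finally, on the HSR_SYSREG_CRN_MASK change above: CRn occupies four syndrome
bits starting at bit 10, so the mask needs to be 0xf << 10 = 0x3c00. The old
0x3800 dropped CRn's low bit, so registers whose CRn differs only in that bit
(e.g. c4 vs c5) could no longer be told apart once masked. A small
self-contained check, values illustrative only:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const unsigned crn_shift = 10;
        const uint32_t old_mask  = 0x00003800;  /* 3 bits: loses CRn bit 0 */
        const uint32_t new_mask  = 0x00003c00;  /* 4 bits: full CRn field  */

        uint32_t c4 = 4u << crn_shift, c5 = 5u << crn_shift;

        /* Prints 1 (aliased) for the old mask, 0 (distinct) for the new one. */
        printf("old mask: c4 aliases c5? %d\n", (c4 & old_mask) == (c5 & old_mask));
        printf("new mask: c4 aliases c5? %d\n", (c4 & new_mask) == (c5 & new_mask));
        return 0;
    }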