Re: [Xen-devel] [PATCH v2 2/2] x86/xpti: Don't leak TSS-adjacent percpu data via Meltdown
On 29/07/2019 14:51, Jan Beulich wrote:
> On 26.07.2019 22:32, Andrew Cooper wrote:
>> The XPTI work restricted the visibility of most of memory, but missed a few
>> aspects when it came to the TSS.
> None of these were "missed" afair - we'd been aware, and accepted things
> to be the way they are now for the first step. Remember that at the time
> XPTI was called "XPTI light", in anticipation of this being just a
> temporary solution.

Did the term "XPTI light" survive past the first RFC posting?

Sure - we did things in an incremental way because it was a technically
complex change and Meltdown was out in the wild at the time. However, I
would have fixed this at the same time as .entry.text if I had noticed,
because the purpose of that series was identical to this one's: avoid
leaking anything we don't absolutely need to leak.

>> Given that the TSS is just an object in percpu data, the 4k mapping for it
>> created in setup_cpu_root_pgt() maps adjacent percpu data, making it all
>> leakable via Meltdown, even when XPTI is in use.
>>
>> Furthermore, no care is taken to check that the TSS doesn't cross a page
>> boundary. As it turns out, struct tss_struct is aligned on its size, which
>> does prevent it straddling a page boundary, but this will cease to be true
>> once CET and Shadow Stack support is added to Xen.
> Please can you point me at the CET aspect in the documentation here? Aiui
> it's only task switches which are affected, and hence only 32-bit TSSes
> which would grow (and even then not enough to exceed 128 bytes). For the
> purposes 64-bit mode has, there are MSRs to load SSP from.

Ah - it was v1 of the CET spec. I see v3 no longer has the shadow stack
pointer in the TSS. I'll drop this part of the message.

>> --- a/xen/include/asm-x86/processor.h
>> +++ b/xen/include/asm-x86/processor.h
>> @@ -411,7 +411,7 @@ static always_inline void __mwait(unsigned long eax, unsigned long ecx)
>>  #define IOBMP_BYTES             8192
>>  #define IOBMP_INVALID_OFFSET    0x8000
>>  
>> -struct __packed __cacheline_aligned tss_struct {
>> +struct __packed tss_struct {
>>      uint32_t :32;
>>      uint64_t rsp0, rsp1, rsp2;
>>      uint64_t :64;
>> @@ -425,6 +425,7 @@ struct __packed __cacheline_aligned tss_struct {
>>      /* Pads the TSS to be cacheline-aligned (total size is 0x80). */
>>      uint8_t __cacheline_filler[24];
>>  };
>> +DECLARE_PER_CPU(struct tss_struct, init_tss);
> Taking patch 1, this expands to
>
>     __DEFINE_PER_CPU(__section(".bss.percpu.page_aligned") \
>                      __aligned(PAGE_SIZE), struct tss_struct, _init_tss);
>
> and then
>
>     __section(".bss.percpu.page_aligned") __aligned(PAGE_SIZE)
>     __typeof__(struct tss_struct) per_cpu__init_tss;
>
> which is not what you want: you have an object of size
> sizeof(struct tss_struct) which is PAGE_SIZE aligned. Afaict you
> therefore still leak everything that follows in the same page.

What data might this be? Every object put into this section is suitably
aligned, so nothing will sit in the slack between the TSS and the end of
the page.
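[For illustration, a minimal compilable sketch of per-cpu machinery shaped
like the expansion Jan quotes above. The names follow the quote; the
attribute spellings and PAGE_SIZE value are assumptions, not the exact Xen
macros.]

    /* Sketch only: per-cpu helpers modelled on the quoted expansion. */
    #define PAGE_SIZE 4096

    #define __DEFINE_PER_CPU(attr, type, name) \
        attr __typeof__(type) per_cpu_##name

    #define DEFINE_PER_CPU_PAGE_ALIGNED(type, name)                    \
        __DEFINE_PER_CPU(                                               \
            __attribute__((__section__(".bss.percpu.page_aligned"),    \
                           __aligned__(PAGE_SIZE))),                    \
            type, _##name)

    struct tss_struct { char data[128]; }; /* stand-in for the real layout */

    /*
     * Becomes a PAGE_SIZE-aligned per_cpu__init_tss in the dedicated
     * section.  Every other object placed in this section carries the
     * same PAGE_SIZE alignment, so nothing can start in the slack
     * between the 128-byte TSS and the end of its page.
     */
    DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, init_tss);

[This is the property the "nothing will sit in the slack" answer relies on:
the section's contents are page-aligned by construction, so the tail of the
TSS page holds no other percpu data.]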
> There was a reason for __cacheline_aligned's original placement, albeit
> I agree that it was/is against the intention of having the struct define
> an interface to the hardware (which doesn't have such an alignment
> requirement).

There is a hard requirement to have the first 104 bytes be physically
contiguous, because on a task switch, some CPUs translate the TSS base
and offset directly from there. I expect that is where the
__cacheline_aligned(), being 128, comes in.

However, the manual also makes it clear that this is only on a task
switch, which is inapplicable for us. Finally, were we to put a structure
like this on the stack (e.g. like hvm_task_switch() does with tss32), we
specifically wouldn't want any unnecessary alignment.

> Perhaps the solution is a two-layer approach:
>
>     struct __aligned(PAGE_SIZE) xen_tss {
>         struct __packed tss_struct {
>             ...
>         };
>     };
>
> where the inner structure describes the hardware interface and the
> containing one our own requirement(s). But personally I also wouldn't
> mind putting the __aligned(PAGE_SIZE) right on struct tss_struct, where
> __cacheline_aligned has been sitting.

The only way that would make things more robust is if xen_tss were a
union with char[4096] to extend its size. However, I think this is
overkill, given the internals of DEFINE_PER_CPU_PAGE_ALIGNED().

> Of course either approach goes against the idea of avoiding usage
> mistakes (as pointed out by others in the v1 discussion, iirc): there
> had better not be a need to keep the two "page aligned" attributes in
> sync, i.e. the instantiation of the structure would ideally enforce the
> requested alignment itself. I've not thought through whether there's
> trickery to actually make this work, but I'd hope we could at the very
> least detect things not being in sync at compile time.

There is a reason why I put in a linker assertion which fails if the TSS
is not suitably aligned.

~Andrew
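[To make the shapes under discussion concrete, here is a standalone sketch
of the union-with-char[4096] idea, with compile-time checks standing in for
the linker assertion Andrew mentions. The field layout follows the quoted
patch; the union and the _Static_asserts are illustrative assumptions, not
the code that was eventually committed.]

    /* Sketch only: a TSS forced to fill exactly one page. */
    #include <stdint.h>

    #define PAGE_SIZE 4096

    struct __attribute__((__packed__)) tss_struct {
        uint32_t :32;
        uint64_t rsp0, rsp1, rsp2;
        uint64_t :64;
        uint64_t ist[7];
        uint64_t :64;
        uint16_t :16, bitmap;
        /* Pads the TSS to be cacheline-aligned (total size is 0x80). */
        uint8_t __cacheline_filler[24];
    };

    union __attribute__((__aligned__(PAGE_SIZE))) xen_tss {
        struct tss_struct tss; /* 104-byte hardware interface, padded */
        char pad[PAGE_SIZE];   /* extend the object to a full page    */
    };

    /* Compile-time stand-ins for the linker assertion. */
    _Static_assert(sizeof(struct tss_struct) == 0x80,
                   "TSS layout changed unexpectedly");
    _Static_assert(sizeof(union xen_tss) == PAGE_SIZE &&
                   _Alignof(union xen_tss) == PAGE_SIZE,
                   "TSS must occupy exactly one page-aligned page");

[The union makes the "fills a whole page" property part of the type itself,
so size and alignment cannot drift out of sync - the usage-mistake concern
Jan raises about keeping two separate "page aligned" attributes
coordinated.]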