Re: XSA-446 relevance on Intel



On Tue, Dec 12, 2023 at 10:56:48AM +0000, Andrew Cooper wrote:
> On 12/12/2023 9:43 am, James Dingwall wrote:
> > Hi,
> >
> > We were experiencing a crash during PV domU boot on several different models
> > of hardware but all with Intel CPUs.  The Xen version was based on stable-4.15
> > at 4a4daf6bddbe8a741329df5cc8768f7dec664aed (XSA-444) with some local
> > patches.  Since updating the branch to b918c4cdc7ab2c1c9e9a9b54fa9d9c595913e028
> > (XSA-446) we have not observed the same crash.
> 
> That range covers:
> 
> 1f5f515da0f6 - iommu/amd-vi: use correct level for quarantine domain page tables
> b918c4cdc7ab - x86/spec-ctrl: Remove conditional IRQs-on-ness for INT $0x80/0x82 paths
> 
> so yeah - not much in the way of change.
> 
> > The occurrence was on 1-2% of boots and we couldn't determine a particular
> > sequence of events that would trigger it.  The kernel is based on Ubuntu's
> > 5.15.0-91 tag but we also observed the same with -85.  Due to the low
> > frequency it is possible that we simply haven't observed it again since
> > updating our Xen build.
> >
> > If I have followed the early startup this is happening shortly after detection
> > of possible CPU vulnerabilities and patching in alternative instructions.  As
> > the RIP was native_irq_return_iret and XSA-446 related to interrupt management
> > I wondered if it was possible that despite "Xen is not believed to be vulnerable
> > in default configurations on CPUs from other hardware vendors." there could
> > be some conditions in which an Intel CPU is affected?
> 
> In short, XSA-446 isn't plausibly related.  It's completely internal to
> Xen, with no alteration on guest state.
> 
> It is an error that Linux has ended up in native_irq_return_iret.  Linux
> cannot return to itself with an IRET instruction, and must use
> HYPERCALL_iret instead.
> 
> In recent versions of Linux, this is fixed up as about the earliest
> action a PV kernel takes, but on older versions of Linux, any
> interrupt/exception early enough on boot was fatal in this way.
> 
> 
> This part of the backtrace is odd:
> 
> [    0.398962]  ? native_iret+0x7/0x7
> [    0.398967]  ? insn_decode+0x79/0x100
> [    0.398975]  ? insn_decode+0xcf/0x100
> [    0.398980]  optimize_nops+0x68/0x150
> 
> as it's not clear how we've ended up in a case wanting to return back to
> the kernel to begin with.  However, it's most likely a pagefault, as
> optimize_nops() is making changes in arbitrary locations.
> 
> It is possible that a change in visible features has altered the
> behaviour enough not to crash, but if everything is still the same as
> far as you can tell, then it's likely just chance that you haven't seen
> it again.
> 
> This is definitely a Linux bug, so I suspect something bad has been
> backported into Ubuntu.
> 
> ~Andrew

Thanks for the response.  I had a look at the more recent kernels and managed
to backport "x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel()"
without too much trouble.  It may still be a coincidence that we haven't
encountered the problem but it seems to have gone away for now.
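
For a rough idea of how many clean boots it takes before "gone away" is more
than chance, here is a minimal sketch.  It assumes each boot crashes
independently at a fixed rate taken from the 1-2% figure above; the function
names and the 95% threshold are illustrative choices, not something measured
on our systems.

```python
import math


def chance_of_clean_run(crash_rate: float, boots: int) -> float:
    """Probability of seeing zero crashes in `boots` boots.

    Assumption: boots are independent and each crashes with the same
    probability `crash_rate` (e.g. 0.01 for 1%).
    """
    return (1.0 - crash_rate) ** boots


def boots_needed(crash_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of consecutive clean boots for which an unfixed bug
    would have shown up with probability >= `confidence`."""
    if not 0.0 < crash_rate < 1.0:
        raise ValueError("crash_rate must be between 0 and 1 (exclusive)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    # Solve (1 - crash_rate)**n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - crash_rate))


if __name__ == "__main__":
    for rate in (0.01, 0.02):
        n = boots_needed(rate)
        print(f"crash rate {rate:.0%}: {n} clean boots for 95% confidence")
```

Under those assumptions a 1% crash rate needs roughly 300 clean boots, and a 2%
rate roughly 150, before the absence of crashes is reasonably convincing.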

Regards,
James



 

