
LBR and Sapphire Rapids


  • To: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Fri, 6 Jan 2023 18:21:45 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hello,

Testing has identified that VMs on SPR still crash when trying to turn
on LBR, and this is imminently going to cease being a "future" problem.

There is a series out about this, but there is some general confusion
creating mistakes, so I want to try and lay things out coherently here
in one go.

Right now (for Intel), the PDCM CPUID bit is hidden.  Reads of the
PERF_CAPABILITIES MSR that PDCM enumerates will #GP for HVM, while PV
blindly returns 0.

The first time a vCPU tries to enable MSR_DBG_CTRL.LBR, we either set up
the MSR load/save lists for the LBR MSRs, and de-intercept them, or we
crash the VM if we can't figure out what to do.
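
To make the current behaviour concrete, here is a minimal sketch of it as
a toy Python model.  This is not Xen code: the model numbers, the MSR
indices in the table, and the names VCpu, DomainCrashed and
LBR_MSRS_BY_MODEL are all made up for illustration.  The only real detail
is that the LBR enable is bit 0 of the debug control MSR.

```python
# Toy model of "first LBR enable" handling.  Not Xen code.
DEBUGCTL_LBR = 1 << 0  # LBR enable is bit 0 of the debug control MSR

# Hypothetical table: CPU model -> model-specific LBR MSR indices.
# Both the model number and the indices are invented for this sketch.
LBR_MSRS_BY_MODEL = {
    0x01: [0x100, 0x101, 0x102],
}


class DomainCrashed(Exception):
    """Stands in for the hypervisor crashing the VM."""


class VCpu:
    def __init__(self, model):
        self.model = model
        self.debugctl = 0
        # MSRs placed on the load/save lists and de-intercepted.
        self.lbr_passthrough = set()

    def write_debugctl(self, val):
        enabling = (val & DEBUGCTL_LBR) and not (self.debugctl & DEBUGCTL_LBR)
        if enabling and not self.lbr_passthrough:
            msrs = LBR_MSRS_BY_MODEL.get(self.model)
            if msrs is None:
                # No idea which MSRs to context switch: today this is fatal.
                raise DomainCrashed("unknown LBR MSRs for model %#x" % self.model)
            self.lbr_passthrough.update(msrs)
        self.debugctl = val
```

A vCPU on a model in the table gets its LBR MSRs passed through on the
first enable; a vCPU on any other model raises DomainCrashed, which is
the failure seen on SPR.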

LBR MSRs are never preserved on migrate.  A VM that is migrated will (at
best) only see corruption of its data.  If it migrates between otherwise
identical systems that have differing hyperthread settings, it may find
that the LBR stack is a different size.  If it migrates to a system with
a different LBR format, then pretty much everything will explode.


Long term, we want to support Arch LBR, but we're a long way off being
able to do that.  In the meantime, we need to make VMs not crash.
IceLake (server at least, not sure about client) has both Arch LBR and
model specific LBR.  Sapphire Rapids does not have model specific LBR.

Also, we cannot advertise the PDCM bit until we've got MSRs properly
accounted for in the migration safety checks (which is still a work in
progress).

From a "not crashing on migrate" point of view, migration needs to be
blocked in any case where the LBR format changes (and other cases too).
This also means that by default, VMs want to be told "no model-specific
LBR".  But for backwards compatibility we also need a way for the user
to say "please let it still use model specific LBR", and this can't be
an architectural CPUID bit.  (But I think it can be expressed as a
combination of PDCM=1, format!=0x3f, ARCH_LBR=0.)
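
As a rough illustration of that combination, here is a sketch in Python
(a toy model, not Xen code).  The LbrPolicy type and both function names
are hypothetical, and the migration check only covers the "format
changes" case; the other cases (such as a differing LBR stack size) are
deliberately not modelled.

```python
from dataclasses import dataclass

ARCH_LBR_FORMAT = 0x3f  # format value meaning "no model-specific format"


@dataclass(frozen=True)
class LbrPolicy:
    """Hypothetical per-guest view of the three inputs named above."""
    pdcm: bool        # PDCM CPUID bit visible to the guest
    lbr_format: int   # LBR format field the guest would read
    arch_lbr: bool    # ARCH_LBR CPUID bit visible to the guest


def has_model_specific_lbr(p):
    # The proposed encoding: PDCM=1, format!=0x3f, ARCH_LBR=0.
    return p.pdcm and p.lbr_format != ARCH_LBR_FORMAT and not p.arch_lbr


def migration_allowed(guest, dest_lbr_format):
    # Guests told "no model-specific LBR" have nothing to corrupt here.
    if not has_model_specific_lbr(guest):
        return True
    # Otherwise the destination must use the same LBR format.
    return dest_lbr_format == guest.lbr_format


# Default: hide model-specific LBR.  Opt-in: expose it with some format.
default_policy = LbrPolicy(pdcm=False, lbr_format=0, arch_lbr=False)
legacy_policy = LbrPolicy(pdcm=True, lbr_format=0x5, arch_lbr=False)
```

With this shape, default_policy can migrate anywhere as far as LBR is
concerned, while legacy_policy is pinned to hosts reporting the same
format (0x5 here is just an example value).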


But it still doesn't help with SPR today.

On SPR, MSR_DBG_CTRL.LBR is a write-discard bit.  There really are no
model specific LBRs, so we should emulate it as write discard too.  More
generally, I think we should apply that to any system where we don't know
the model-specific indices.
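
A minimal sketch of the proposed write-discard behaviour, again as a toy
Python model rather than Xen code.  The table contents and the function
name emulate_debugctl_write are invented for illustration; the point is
only that an unknown model silently drops the LBR bit instead of crashing
the VM.

```python
# Toy model of write-discard emulation for the LBR enable bit.  Not Xen code.
DEBUGCTL_LBR = 1 << 0  # LBR enable is bit 0 of the debug control MSR

# Hypothetical table of models whose model-specific LBR indices are known.
# The model number and MSR indices are made up.
LBR_MSRS_BY_MODEL = {
    0x01: [0x100, 0x101, 0x102],
}


def emulate_debugctl_write(model, val):
    """Return the value the guest will subsequently read back."""
    if (val & DEBUGCTL_LBR) and model not in LBR_MSRS_BY_MODEL:
        # Unknown indices (or no model-specific LBR at all, as on SPR):
        # discard the LBR bit, keep everything else the guest wrote.
        val &= ~DEBUGCTL_LBR
    return val
```

A guest that sets the bit on an unknown model and reads the MSR back sees
LBR still clear, matching what real SPR hardware does, and the other bits
it wrote are left alone.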

I think this will be sufficient to avoid crashing guests on SPR.  Any
software actually expecting to use model specific LBR would need a model
table anyway, just like Xen has, and that table will not be updated with
SPR's model number.  So for the (more) common case of not having
migrated, things should turn off cleanly.

~Andrew
