
Re: [Xen-devel] x86/AMD: Nested VM failed to boot L2 guest due to setting/clearing CR0.CD bit



On 8/6/2013 2:12 AM, Jan Beulich wrote:
On 06.08.13 at 04:27, Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx> wrote:
Hi All,

While testing nested virtualization with the latest Xen on an AMD system, I am
running into an issue where the L2 guest (Linux) seems to get stuck right after
loading the kernel. When using "xl debug-keys d" to dump registers, the L2
guest RIP is always at the instruction that tries to write the CR0.CD bit.
Moreover, once the L2 guest starts and gets stuck, Dom0 in L0 becomes very
slow until I kill the L2 guest.

After looking into the HVM code for handling CR0 (i.e.
xen/arch/x86/hvm/hvm.c: hvm_set_cr0()), I see that the code issues a cache
flush on all cores when the L2 guest sets the CR0.CD bit. (Please see the
code snippet below.)

         if ( (value & X86_CR0_CD) && !(value & X86_CR0_NW) )
         {
             /* Entering no fill cache mode. */
             spin_lock(&v->domain->arch.hvm_domain.uc_lock);
             v->arch.hvm_vcpu.cache_mode = NO_FILL_CACHE_MODE;

             if ( !v->domain->arch.hvm_domain.is_in_uc_mode )
             {
                 /* Flush physical caches. */
---> HERE       on_each_cpu(local_flush_cache, NULL, 1);
                 hvm_set_uc_mode(v, 1);
             }
             spin_unlock(&v->domain->arch.hvm_domain.uc_lock);
         }
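
If I'm reading the code right, local_flush_cache() is just a thin wrapper
around wbinvd(), so the marked line broadcasts a full cache write-back and
invalidate to every online pCPU via IPI, roughly:

    /* Rough illustration, based on my reading of hvm.c: the callback that
     * gets broadcast to every online pCPU. */
    static void local_flush_cache(void *info)
    {
        wbinvd();  /* write back and invalidate the entire cache hierarchy */
    }

    /* Every online pCPU takes an IPI and stalls until its flush completes. */
    on_each_cpu(local_flush_cache, NULL, 1);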

When I comment out that line, the issue goes away.  Is this line necessary?
Why do we need to flush the caches of all CPU cores when the CR0.CD bit only
applies to a particular core?
Doing the flush only on the local CPU would imply that once the
affected vCPU migrates to another pCPU, flushing would _then_
need to be done there too. Tracking this would clearly add
complexity here.

Furthermore, the "UC mode" is being entered on the domain as a
whole, i.e. all the pCPU-s that the domain is actively running on
would need immediate flushing, and all pCPU-s any of the vCPU-s
would migrate to subsequently would need deferred
flushing.
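
Roughly, the deferred part would amount to a check on the context switch
path along the lines of the sketch below (untested, and the "uc_flushed"
cpumask is made up purely for illustration):

    /* Untested sketch: hypothetical per-domain "uc_flushed" cpumask tracking
     * which pCPU-s have already been flushed since the domain entered UC
     * mode.  This would have to run whenever one of the domain's vCPU-s gets
     * scheduled onto a pCPU. */
    struct domain *d = v->domain;

    if ( d->arch.hvm_domain.is_in_uc_mode &&
         !cpumask_test_cpu(smp_processor_id(),
                           &d->arch.hvm_domain.uc_flushed) )
    {
        wbinvd();
        cpumask_set_cpu(smp_processor_id(), &d->arch.hvm_domain.uc_flushed);
    }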

That said, I still can't see how the flushing here would have this
dramatic an effect: It's a one-time thing, when UC mode first gets
entered by a domain. So unless CR0.CD gets flipped back and
forth by a guest, there shouldn't be more than one flush (or there's
a logic error somewhere else).

Finally, the need for that code as a whole is under question in the
context of XSA-60. I would certainly favor (at least on the SVM
side) handling CR0.CD per vCPU instead of per domain, as long
as there are no requirements that CR0.CD be set consistently
across multiple CPUs (e.g. within a package; on Intel CPUs I'm
being told it's a hard requirement to be consistent at least
between sibling hyperthreads, meaning that we can't rip out the
current domain-wide logic altogether in favor of a purely
per-vCPU CR0.CD solution).
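
Just to illustrate what I mean by per-vCPU handling (a sketch only, not a
proposal): hvm_set_cr0() would then get away without the domain-wide lock
and state, flushing just the local pCPU and leaving pCPU-s the vCPU later
migrates to to a deferred check like the one sketched above:

    /* Hypothetical per-vCPU variant of the CR0.CD path - sketch only. */
    if ( (value & X86_CR0_CD) && !(value & X86_CR0_NW) )
    {
        /* Entering no fill cache mode on this vCPU only. */
        v->arch.hvm_vcpu.cache_mode = NO_FILL_CACHE_MODE;
        wbinvd();  /* flush only the pCPU this vCPU is currently running on */
    }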

Jan


Somehow the problem went away when I updated the hypervisor in both L0 and
L1, and I can no longer reproduce the issue. At one point, when I was trying
to debug the issue using "hvm_debug", I was seeing messages where the CD bit
was flipped back and forth:

(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = 8005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = c005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = 8005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = c005003b
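
The two values differ only in bit 30 (0xc005003b ^ 0x8005003b = 0x40000000),
which is CR0.CD, while NW (bit 29) stays clear, so the guest really was
toggling just the CD bit, which would explain repeated trips through the
flush path you mentioned.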

Thanks for the details. I'll keep monitoring this in the future.

Suravee

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

