
Re: [Xen-devel] Need help in debugging partially blocked hypervisor


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
  • Date: Tue, 3 Nov 2009 07:53:25 +0100
  • Cc: "Shan, Haitao" <haitao.shan@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
  • Delivery-date: Mon, 02 Nov 2009 22:53:57 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

> 
> > Very detailed explanation indeed. What you described is the same as I saw 
> > months ago.
> > But unluckily, I do not know the root cause yet. It seems to me that 
> > unmasking of PMI in local APIC will immediately generate a new NMI in the 
> > system if one of the enabled counters is zero at that time.
> > That is why I was asking whether you could try to set that counter (in your
> > case, Fixed Counter 1, 0x30a) to some value other than zero (for example,
> > 0x1) before unmasking PMI in vpmu_do_interrupt and see whether it helps.
> 
> OK, I will try setting the counter to 1 after reading a 0 value.
> But some things remain fully unclear ...

Hi Haitao,

> 1> Setting the counter to non-zero before unmasking PMI in vpmu_do_interrupt;

I tried your first approach.

1. I added

   rdmsrl(CounterX, msr_content);      // CounterX: MSR of the counter in use
   if ( msr_content == 0 )
   {
       HVMTRACE_3D(HAHN_TR2, ...);     // A tracer to see this.
       wrmsrl(CounterX, 0x1);          // Never leave the counter at zero.
   }

   directly after the line that reads MSR_CORE_PERF_GLOBAL_STATUS.
   In the xentrace output I found some tracers where counters were zero,
   but I couldn't reproduce the hang!

   The interesting thing here was that MSR_CORE_PERF_GLOBAL_STATUS
   always contained zero (4. Step) after resetting it by writing
   MSR_CORE_PERF_GLOBAL_OVF_CTRL (3. Step).
   This differs from what I described in my first mail!

2. I added the code above after the second (test) read of
   MSR_CORE_PERF_GLOBAL_STATUS (around 6. and 7. Step).
   Now I could see some of these tracers, but again no hang!
   In this case MSR_CORE_PERF_GLOBAL_STATUS behaved the same way
   as in my first mail.
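
For readers following the tracer output, here is a minimal sketch of how the
high/low halves of MSR_CORE_PERF_GLOBAL_STATUS and the value written to
MSR_CORE_PERF_GLOBAL_OVF_CTRL can be interpreted. It assumes the architectural
layout (bits 0..n-1 for the general counters, bits 32..34 for the fixed
counters). The helper names are made up for illustration and are not Xen
functions, and the counter count is passed in as a parameter instead of
calling core2_get_pmc_count().

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the 64-bit MSR value from the two 32-bit tracer parameters. */
static uint64_t join_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Assumption: overflow bit of general counter idx is bit idx. */
static int gp_overflowed(uint64_t status, unsigned int idx)
{
    assert(idx < 32);
    return (int)((status >> idx) & 1);
}

/* Assumption: overflow bit of fixed counter idx is bit 32 + idx. */
static int fixed_overflowed(uint64_t status, unsigned int idx)
{
    assert(idx < 3);
    return (int)((status >> (32 + idx)) & 1);
}

/*
 * Value written in "3. Step": the constant from the quoted code (the two
 * top bits plus the three fixed-counter bits) ORed with one bit per
 * general counter. 1ULL keeps the shift in 64-bit arithmetic.
 */
static uint64_t ovf_ctrl_clear_mask(unsigned int gp_count)
{
    assert(gp_count < 32);
    return 0xC000000700000000ULL | ((1ULL << gp_count) - 1);
}
```

With these helpers, high = 0x0002 / low = 0x0004 decodes to "fixed counter #1
and general counter #2 overflowed", and high = 0x0002 / low = 0x0000 to "only
fixed counter #1 is still flagged".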

The conclusion is that this seems to work around the endless NMI loop.
PMIs are rare events, so the extra MSR read should not cause a performance
problem.
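
The workaround itself can be sketched in isolation as follows. This is only an
illustration under stated assumptions: the MSRs are simulated with a plain
array so that the logic runs in user space, and sim_rdmsr(), sim_wrmsr() and
bump_counter_if_zero() are hypothetical names, not the hypervisor's rdmsrl()
and wrmsrl() accessors or any existing Xen function.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_MSR_COUNT 0x400u    /* large enough for 0xc3 and 0x30a */

/* Hypothetical stand-in for the real MSRs, indexed by MSR address. */
static uint64_t fake_msr[SIM_MSR_COUNT];

static uint64_t sim_rdmsr(uint32_t msr)
{
    assert(msr < SIM_MSR_COUNT);
    return fake_msr[msr];
}

static void sim_wrmsr(uint32_t msr, uint64_t val)
{
    assert(msr < SIM_MSR_COUNT);
    fake_msr[msr] = val;
}

/*
 * If the counter reads zero, set it to 1 so that it is non-zero when the
 * PMI is unmasked afterwards. Returns 1 if the counter was changed,
 * 0 if it was left alone.
 */
static int bump_counter_if_zero(uint32_t msr)
{
    if (sim_rdmsr(msr) != 0)
        return 0;
    sim_wrmsr(msr, 0x1);
    return 1;
}
```

In the handler this check would run once per counter in use (0xc3 and 0x30a
in my tests) before the PMI is unmasked in the local APIC.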

I didn't try your second approach
> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical PMI* when 
> guest vcpu unmasks virtual PMI.
but I have some questions:

- What if the 'physical PMI' is not unmasked in vpmu_do_interrupt and a
  watchdog NMI occurs before the domU unmasks it?
- Is it possible that after handling the NMI (and not unmasking) another
  domU gets to run on this CPU and PMIs are therefore lost?

But the real cause of the problem is still unknown. As I said, I have seen
this only on Nehalem. Maybe there is a problem in the hardware? Perhaps your
hardware colleagues know something more ;-)

Thanks
Dietmar

> 
> > 
> > When I met this problem, I remember that I tried two approaches:
> > 1> Setting the counter to non-zero before unmasking PMI in 
> > vpmu_do_interrupt;
> > 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical PMI* 
> > when guest vcpu unmasks virtual PMI.
> > I remember that approach 2 can fix this issue. But I do not remember the 
> > result of approach 1, since I met this about one year ago.
> > It is my understanding that approach 2 is quite same as approach 1, since 
> > normally guest will set the counter to some negative value (for example, 
> > -100000) before unmasking virtual PMI.
> > However, approach 2 looks cleaner and more reasonable.
> > 
> > Can you have a try and let me know the result? If both can not work, there 
> > might be some problems that I have not met before.
> > 
> > BTW: Sorry, I did not see your patch to enable NHM vpmu before. So, there 
> > is no need for me to work on that now. :)
> > 
> > Haitao
> > 
> > 
> > Dietmar Hahn wrote:
> > > Hi Haitao,
> > > 
> > >> Can I know how you enabled vPMU on Nehalem? This is not supported in
> > >> current Xen.
> > > 
> > > http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
> > > 
> > >> 
> > >> Concerning vpmu support, I totally agree that we can disable this
> > >> feature by default. If anyone really wants to use it, he can use boot
> > >> options to turn it on.
> > > 
> > > Yes, that's OK for me.
> > > 
> > >> I am preparing a patch for that. And I will
> > >> send a patch to enable NHM vpmu together.
> > >> 
> > >> For the problem that Dietmar met, I think I once met this before. Can
> > >> you add some code in vpmu_do_interrupt that sets the counter you are
> > >> using to a value other than zero? Please let me know if that can
> > >> help. 
> > > 
> > > > I don't set the counter to zero. I use 0-val to set the counter.
> > > > Actually I tested on Nehalem with
> > > > - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and val=1100000
> > > > - Fixed counter #1 (0x30a) and val=1100000
> > > > The thing is that in the normal case the overflows of both counters
> > > > appear at nearly the same time.
> > > As described I added some extra tracer for xentrace in
> > > core2_vpmu_do_interrupt() so the code looks like:
> > > 
> > >     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     -> 1. Step
> > >   {
> > >           uint32_t HAHN_l, HAHN_h;
> > >           HAHN_l = (uint32_t) msr_content;
> > >           HAHN_h = (uint32_t) (msr_content >> 32);
> > >           HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);      -> 2. Step
> > >   }
> > >     if ( !msr_content )
> > >         return 0;
> > >     core2_vpmu_cxt->global_ovf_status |= msr_content;
> > > >     msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
> > > >     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);   -> 3. Step
> > > 
> > >     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     -> 4. Step
> > >   {
> > >         uint32_t HAHN_l, HAHN_h;
> > >         HAHN_l = (uint32_t) msr_content;
> > >         HAHN_h = (uint32_t) (msr_content >> 32);
> > >         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);    -> 5. Step
> > > 
> > > >         rdmsrl(0xc3, msr_content);                        -> 6. Step General counter #2
> > > >         HAHN_l = (uint32_t) msr_content;
> > > >         HAHN_h = (uint32_t) (msr_content >> 32);
> > > >         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
> > > >         rdmsrl(0x30a, msr_content);                       -> 7. Step Fixed counter #1
> > > >         HAHN_l = (uint32_t) msr_content;
> > > >         HAHN_h = (uint32_t) (msr_content >> 32);
> > > >         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
> > >   }
> > > 
> > > With these tracers I got the following output:
> > > 
> > > > Last good NMI:
> > > > Both counters caused the NMI. Resetting works OK.
> > > > The counters themselves kept running.
> > > > 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0004 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> > > > 5. Step: par1 = 0x0a,  high = 0x0000, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> > > > 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x03c4 ]  rdmsrl(0xc3)  -> #2 general counter
> > > > 7. Step: par1 = 0x30a, high = 0x0000, low =  0x02da ]  rdmsrl(0x30a) -> #1 fixed counter
> > > 
> > > > NMI from which things go wrong:
> > > > Both counters caused the NMI. Resetting did NOT work correctly, only
> > > > for the general counter!
> > > > The general counter (which caused the NMI) seems to be stopped!
> > > > 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0004 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> > > > 5. Step: par1 = 0x0a,  high = 0x0002, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> > > > 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x00ec ]  rdmsrl(0xc3)  -> #2 general counter
> > > > 7. Step: par1 = 0x30a, high = 0x0000, low =  0x0000 ]  rdmsrl(0x30a) -> #1 fixed counter
> > > 
> > > > Wrong NMI:
> > > > Only the fixed counter caused the NMI (it was not reset during the
> > > > NMI handling above!). Both counters seem to be stopped!
> > > > 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> > > > 5. Step: par1 = 0x0a,  high = 0x0002, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> > > > 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x00ec ]  rdmsrl(0xc3)  -> #2 general counter
> > > > 7. Step: par1 = 0x30a, high = 0x0000, low =  0x0000 ]  rdmsrl(0x30a) -> #1 fixed counter
> > > 
> > > And this state remains forever!
> > > I hope my explanations are understandable ;-)
> > > 
> > > Until now I can see this behavior only on a Nehalem processor.
> > > 
> > > Thanks.
> > > Dietmar
> > > 
> > >> 
> > >> Best Regards
> > >> Shan Haitao
> > >> 
> > >> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
> > >>> On 30/10/2009 12:20, "Dietmar Hahn" <dietmar.hahn@xxxxxxxxxxxxxx>
> > >>> wrote: 
> > >>> 
> > >>>> I searched the intel processor spec but couldn't find any help.
> > > >>>> So my question is, what is wrong here?
> > >>>> Can anybody with more knowledge point me in the right direction,
> > >>>> what can I still do to find the real cause of this?
> > >>> 
> > >>> You should probably Cc one of the Intel guys who implemented this
> > >>> stuff -- I've added Haitao Shan. 
> > >>> 
> > >>> Meanwhile I'd be interested to know whether things work okay for
> > >>> you, minus performance counters and the hypervisor hang, if you
> > >>> return immediately from vpmu_initialise(). Really at minimum we
> > > >>> need such a fix, perhaps with a boot parameter to re-enable the
> > >>> feature, for 3.4.2 release; allowing guests to hose the hypervisor
> > >>> like this is of course not on. 
> > >>> 
> > >>>  -- Keir
> > 
> 
-- 
Dietmar Hahn
TSP ES&S SWE OS                                Telephone: +49 (0) 89 636 40274
Fujitsu Technology Solutions                Email: dietmar.hahn@xxxxxxxxxxxxxx
Otto-Hahn-Ring 6                              Internet:  http://ts.fujitsu.com
D-81739 München                    Company details:ts.fujitsu.com/imprint.html
