
Re: [Xen-devel] Need help in debugging partially blocked hypervisor


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
  • Date: Tue, 3 Nov 2009 09:24:16 +0100
  • Cc: "Shan, Haitao" <haitao.shan@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
  • Delivery-date: Tue, 03 Nov 2009 00:24:53 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

> I suspect the guest would reproduce this PMI loop if it behaved as you
> describe in this email. But as far as I know, VTune and OProfile do not
> behave like that.
> Of course, this approach is still a workaround (unless I get confirmation
> that the HW requires this). This approach is preferable because it does not
> change the contents of MSRs. Thus, we have no impact on guest software that
> does rely on reading the correct value from HW. Approach 1 existed only
> because we knew that in event-based sampling, the counter value on receiving
> a PMI was not used by OProfile/VTune at all, so it was safe to set the
> counter to some non-zero value.
> 
> Haitao
>

OK, then will you send a patch? 
Dietmar.
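A minimal model of approach 2 as discussed in this thread, assuming a simplified picture in which the PMI handler leaves the physical LVTPC masked and a guest write to its virtual LVTPC propagates only the mask bit to the physical entry. The names `struct lvtpc_model`, `pmi_handler_model()`, `guest_write_lvtpc()` and `phys_pmi_enabled()` are hypothetical and are not Xen's vPMU/vLAPIC code; bit 16 is the mask bit of a local APIC LVT entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 16 of every local APIC LVT entry is the mask bit. */
#define LVT_MASKED (1u << 16)

/* Hypothetical model: the guest-visible and the physical LVTPC entry. */
struct lvtpc_model {
    uint32_t virt_lvtpc;   /* what the guest believes its LVTPC contains */
    uint32_t phys_lvtpc;   /* what is programmed in the physical local APIC */
};

/* Approach 2, PMI side: after forwarding the PMI to the guest, leave the
 * physical LVTPC masked instead of unmasking it in the handler. */
static void pmi_handler_model(struct lvtpc_model *m)
{
    m->phys_lvtpc |= LVT_MASKED;
}

/* Approach 2, guest side: copy only the mask bit of the guest's write to
 * the physical entry, so the physical PMI is re-enabled exactly when the
 * guest unmasks its virtual PMI (normally after reloading its counters). */
static void guest_write_lvtpc(struct lvtpc_model *m, uint32_t val)
{
    m->virt_lvtpc = val;
    m->phys_lvtpc = (m->phys_lvtpc & ~LVT_MASKED) | (val & LVT_MASKED);
}

static bool phys_pmi_enabled(const struct lvtpc_model *m)
{
    return (m->phys_lvtpc & LVT_MASKED) == 0;
}
```

The intent of this design is that the physical PMI source stays quiet until the guest has had a chance to reprogram its counters, which is why approach 2 behaves much like approach 1 for a guest that reloads a negative counter value before unmasking.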
 
> 
> Dietmar Hahn wrote:
> > Please see below.
> > 
> >> See my comments embedded. :)
> >> 
> >> Haitao
> >> 
> >> 
> >> Dietmar Hahn wrote:
> >>> The conclusion is that this seems to be a workaround for the
> >>> endless NMI loop. PMIs are very rare events, so this should not
> >>> cause a performance problem.
> >> I totally agree that this is only a workaround for approach 1.
> >> 
> >>> 
> >>> I didn't try your second approach
> >>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical
> >>>> PMI* when guest vcpu unmasks virtual PMI.
> >>> but I have some questions.
> >>> 
> >>> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt
> >>>   and a watchdog NMI occurs before the domU unmasks it?
> >> I think the second NMI will be lost.
> >> 
> >>> - Is it possible that after handling the NMI (and not unmasking)
> >>>   another domU starts running on this CPU and therefore PMIs get lost?
> >> The LVTPC entry in the physical local APIC is saved/restored by Xen on
> >> VCPU switches, so unmasking (or not) the PMI of one vcpu should have no
> >> impact on another vcpu. When developing vPMU, I treated both the PMU
> >> MSRs and the LVTPC entry in the local APIC as vPMU context. The vPMU
> >> context is saved/restored on physical HW when vcpus are scheduled,
> >> either actively or lazily (depending on the PMU usage at the time of
> >> the switch).
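Treating the PMU MSRs and the LVTPC entry as one per-vCPU context can be illustrated with a small self-contained model. The structure layout, `NR_PMU_MSRS` and the `vpmu_save_model()`/`vpmu_load_model()` functions are hypothetical; the real Xen code distinguishes active and lazy save/restore, which this sketch reduces to a single `in_use` flag.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_PMU_MSRS 8   /* hypothetical number of PMU MSRs tracked per vCPU */

/* Per-vCPU vPMU context: the PMU MSRs and the LVTPC entry travel together. */
struct vpmu_ctx {
    uint64_t msrs[NR_PMU_MSRS];
    uint32_t lvtpc;
    bool in_use;            /* the guest has programmed the PMU */
};

/* Hypothetical model of the physical CPU's PMU state. */
struct phys_pmu {
    uint64_t msrs[NR_PMU_MSRS];
    uint32_t lvtpc;
};

/* On deschedule: copy the hardware state into the outgoing vCPU's context.
 * A vCPU that never used the PMU has nothing worth saving. */
static void vpmu_save_model(struct vpmu_ctx *ctx, const struct phys_pmu *hw)
{
    if (!ctx->in_use)
        return;
    memcpy(ctx->msrs, hw->msrs, sizeof(ctx->msrs));
    ctx->lvtpc = hw->lvtpc;
}

/* On schedule: load the incoming vCPU's context, including its LVTPC mask
 * state, so one vCPU's masked or unmasked PMI never leaks to another. */
static void vpmu_load_model(const struct vpmu_ctx *ctx, struct phys_pmu *hw,
                            uint32_t default_lvtpc)
{
    if (!ctx->in_use) {
        memset(hw->msrs, 0, sizeof(hw->msrs));
        hw->lvtpc = default_lvtpc;
        return;
    }
    memcpy(hw->msrs, ctx->msrs, sizeof(hw->msrs));
    hw->lvtpc = ctx->lvtpc;
}
```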
> >> 
> >>> 
> >>> But the real cause of the problem is still unknown. As I said, I saw
> >>> this only on Nehalem. Maybe it is a problem in combination with the
> >>> hardware? Perhaps your hardware colleagues know something more ;-)
> >> When I found this problem, I just thought it might be a corner case
> >> that only happens on my box (of course, I only see this on NHM,
> >> too).
> >> I will try to ping the HW guys for an explanation, since it has
> >> proven to be a general problem on NHM.
> >> 
> >> But before everything is clear, I think approach 2 is a better
> >> solution now. 
> > 
> > What would be the effect if the guest unmasks the PMI (which leads to
> > unmasking the 'physical PMI') but doesn't reset the counter to a
> > value != 0? Would the guest be able to trigger the endless NMI loop?
> > 
> > Dietmar.
> > 
> >> 
> >>> 
> >>> Thanks
> >>> Dietmar
> >>> 
> >>>> 
> >>>>> 
> >>>>> When I met this problem, I tried two approaches:
> >>>>> 1> Setting the counter to non-zero before unmasking PMI in
> >>>>>    vpmu_do_interrupt;
> >>>>> 2> Removing the PMI unmasking from vpmu_do_interrupt and unmasking
> >>>>>    the *physical PMI* when the guest vcpu unmasks its virtual PMI.
> >>>>> I remember that approach 2 fixed this issue, but I do not
> >>>>> remember the result of approach 1, since I met this about one
> >>>>> year ago. My understanding is that approach 2 is much the same as
> >>>>> approach 1, since a guest will normally set the counter to some
> >>>>> negative value (for example, -100000) before unmasking the virtual
> >>>>> PMI.
> >>>>> However, approach 2 looks cleaner and more reasonable.
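The "negative value" convention works because a counter counts upward and raises an overflow (and a PMI) when it wraps past its top bit. A small sketch of the arithmetic, assuming a 48-bit counter width (on real hardware the general-purpose counter width is reported by CPUID leaf 0xA); `counter_mask()`, `counter_preload()` and `events_until_overflow()` are illustrative helpers, not Xen functions.

```c
#include <stdint.h>

/* Mask covering the implemented counter bits; 'width' must be 1..63. */
static uint64_t counter_mask(unsigned width)
{
    return (1ULL << width) - 1;
}

/* Value to program so the counter overflows after 'period' events:
 * -period truncated to the counter width (the "0-val" trick). */
static uint64_t counter_preload(uint64_t period, unsigned width)
{
    return (0 - period) & counter_mask(width);
}

/* Events that must occur before a counter holding 'value' wraps to zero.
 * A counter left at 0 needs the full 2^width events. */
static uint64_t events_until_overflow(uint64_t value, unsigned width)
{
    return counter_mask(width) - (value & counter_mask(width)) + 1;
}
```

With a 48-bit width, a period of 100000 gives the preload value 0xFFFFFFFE7960, and a counter that a guest leaves at 0 will not overflow again for 2^48 events.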
> >>>>> 
> >>>>> Can you give it a try and let me know the result? If neither
> >>>>> works, there might be some problem that I have not met before.
> >>>>> 
> >>>>> BTW: Sorry, I did not see your patch to enable NHM vpmu before.
> >>>>> So, there is no need for me to work on that now. :)
> >>>>> 
> >>>>> Haitao
> >>>>> 
> >>>>> 
> >>>>> Dietmar Hahn wrote:
> >>>>>> Hi Haitao,
> >>>>>> 
> >>>>>>> Can I know how you enabled vPMU on Nehalem? This is not
> >>>>>>> supported in current Xen.
> >>>>>> 
> >>>>>> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
> >>>>>> 
> >>>>>>> 
> >>>>>>> Concerning vpmu support, I totally agree that we can disable
> >>>>>>> this feature by default. If anyone really wants to use it, he
> >>>>>>> can use boot options to turn it on.
> >>>>>> 
> >>>>>> Yes, that's OK for me.
> >>>>>> 
> >>>>>>> I am preparing a patch for that, and I will send a patch to
> >>>>>>> enable NHM vpmu along with it.
> >>>>>>> 
> >>>>>>> For the problem that Dietmar met, I think I have met this
> >>>>>>> before. Can you add some code in vpmu_do_interrupt that sets
> >>>>>>> the counter you are using to a value other than zero? Please
> >>>>>>> let me know if that helps.
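The suggestion above (approach 1: force an overflowed counter to a non-zero value before the PMI is unmasked again) can be sketched against a mock counter array. The `struct mock_pmu` type, `approach1_fixup()`, the counter counts and the choice of 1 as the non-zero value are assumptions for illustration; as Haitao explains elsewhere in the thread, this is only safe because event-based samplers do not use the counter value they find at PMI time.

```c
#include <stdint.h>

#define NR_GP    4   /* hypothetical: general-purpose counters */
#define NR_FIXED 3   /* fixed counters 0..2 */

/* Mock counter state standing in for the real PMC and fixed-counter MSRs. */
struct mock_pmu {
    uint64_t gp[NR_GP];
    uint64_t fixed[NR_FIXED];
};

/* For every counter whose overflow bit is set in 'status' (layout of
 * IA32_PERF_GLOBAL_STATUS: bit i = PMCi, bit 32 + i = fixed counter i),
 * replace a zero count with a non-zero value before unmasking the PMI. */
static void approach1_fixup(struct mock_pmu *pmu, uint64_t status)
{
    for (unsigned i = 0; i < NR_GP; i++)
        if (((status >> i) & 1) && pmu->gp[i] == 0)
            pmu->gp[i] = 1;
    for (unsigned i = 0; i < NR_FIXED; i++)
        if (((status >> (32 + i)) & 1) && pmu->fixed[i] == 0)
            pmu->fixed[i] = 1;
}
```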
> >>>>>> 
> >>>>>> I don't set the counter to zero. I use 0-val to set the counter.
> >>>>>> Actually, I tested on Nehalem with
> >>>>>> - General perf counter #2 (0xc3) with CPU_CLK_UNHALTED and
> >>>>>>   val=1100000
> >>>>>> - Fixed counter #1 (0x30a) and val=1100000
> >>>>>> The thing is that in the normal case the overflows of both counters
> >>>>>> occur at nearly the same time. As described, I added some extra
> >>>>>> tracers for xentrace in core2_vpmu_do_interrupt(), so the code
> >>>>>> looks like this:
> >>>>>> 
> >>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);      /* 1. Step */
> >>>>>>     {
> >>>>>>         uint32_t HAHN_l, HAHN_h;
> >>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);       /* 2. Step */
> >>>>>>     }
> >>>>>>     if ( !msr_content )
> >>>>>>         return 0;
> >>>>>>     core2_vpmu_cxt->global_ovf_status |= msr_content;
> >>>>>>     msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
> >>>>>>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);    /* 3. Step */
> >>>>>>
> >>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);      /* 4. Step */
> >>>>>>     {
> >>>>>>         uint32_t HAHN_l, HAHN_h;
> >>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);     /* 5. Step */
> >>>>>>
> >>>>>>         rdmsrl(0xc3, msr_content);       /* 6. Step: general counter #2 */
> >>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
> >>>>>>         rdmsrl(0x30a, msr_content);      /* 7. Step: fixed counter #1 */
> >>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
> >>>>>>     }
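The value written in step 3 is meant to clear every overflow indicator at once. Assuming the IA32_PERF_GLOBAL_OVF_CTRL layout from the Intel SDM (bit i for general counter i, bits 32 to 34 for the three fixed counters, bit 62 OvfBuffer, bit 63 CondChgd), the constant decomposes as below. `ovf_ctrl_clear_all()` is a hypothetical helper that rebuilds the expression from the trace code, using `1ULL` so the shift is done in 64 bits.

```c
#include <stdint.h>

#define OVF_FIXED(i)   (1ULL << (32 + (i)))  /* fixed counter i overflow */
#define OVF_BUFFER     (1ULL << 62)          /* DS/PEBS buffer overflow */
#define OVF_COND_CHGD  (1ULL << 63)          /* condition changed */

/* Same value as 0xC000000700000000 | ((1 << pmc_count) - 1). */
static uint64_t ovf_ctrl_clear_all(unsigned pmc_count)
{
    uint64_t v = OVF_COND_CHGD | OVF_BUFFER;
    for (unsigned i = 0; i < 3; i++)
        v |= OVF_FIXED(i);
    return v | ((1ULL << pmc_count) - 1);
}
```

With four general-purpose counters (assumed here for Nehalem) the result is 0xC00000070000000F, so the step 3 write does include bit 33, the fixed counter #1 overflow bit.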
> >>>>>> 
> >>>>>> With these tracers I got the following output:
> >>>>>>
> >>>>>> Last good NMI:
> >>>>>> Both counters caused the NMI. Resetting works OK.
> >>>>>> The counters themselves kept running.
> >>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004
> >>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 5. Step: par1 = 0x0a,  high = 0x0000, low = 0x0000
> >>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x03c4
> >>>>>>          rdmsrl(0xc3) -> #2 general counter
> >>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da
> >>>>>>          rdmsrl(0x30a) -> #1 fixed counter
> >>>>>>
> >>>>>> NMI where things go wrong:
> >>>>>> Both counters caused the NMI. Resetting does NOT work correctly;
> >>>>>> it only works for the general counter! The general counter (which
> >>>>>> caused the NMI) seems to be stopped!
> >>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004
> >>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000
> >>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec
> >>>>>>          rdmsrl(0xc3) -> #2 general counter
> >>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000
> >>>>>>          rdmsrl(0x30a) -> #1 fixed counter
> >>>>>>
> >>>>>> Wrong NMI:
> >>>>>> Only the fixed counter causes the NMI (its overflow was not reset
> >>>>>> during the NMI handling above!). Both counters seem to be stopped!
> >>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0000
> >>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000
> >>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec
> >>>>>>          rdmsrl(0xc3) -> #2 general counter
> >>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000
> >>>>>>          rdmsrl(0x30a) -> #1 fixed counter
> >>>>>> 
> >>>>>> And this state remains forever!
> >>>>>> I hope my explanations are understandable ;-)
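To read the traces, the `high`/`low` halves logged by HVMTRACE_3D can be recombined and decoded with the IA32_PERF_GLOBAL_STATUS bit layout (bit i = general counter i, bit 32 + i = fixed counter i). In this sketch, whose `join_halves()` and `decode_status()` helpers are hypothetical, high = 0x0002 / low = 0x0004 decodes to general counter #2 plus fixed counter #1, while high = 0x0002 / low = 0x0000, the value left after step 3 in the failing case, is fixed counter #1 alone.

```c
#include <stdint.h>
#include <stdio.h>

/* Rebuild the 64-bit MSR value from the two logged 32-bit halves. */
static uint64_t join_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Print which counters have their overflow bit set; returns how many. */
static int decode_status(uint64_t status, unsigned pmc_count)
{
    int n = 0;
    for (unsigned i = 0; i < pmc_count; i++)
        if ((status >> i) & 1) {
            printf("general counter #%u overflowed\n", i);
            n++;
        }
    for (unsigned i = 0; i < 3; i++)
        if ((status >> (32 + i)) & 1) {
            printf("fixed counter #%u overflowed\n", i);
            n++;
        }
    return n;
}
```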
> >>>>>> 
> >>>>>> So far I have seen this behavior only on a Nehalem processor.
> >>>>>> 
> >>>>>> Thanks.
> >>>>>> Dietmar
> >>>>>> 
> >>>>>>> 
> >>>>>>> Best Regards
> >>>>>>> Shan Haitao
> >>>>>>> 
> >>>>>>> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
> >>>>>>>> On 30/10/2009 12:20, "Dietmar Hahn"
> >>>>>>>> <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> >>>>>>>> 
> >>>>>>>>> I searched the Intel processor spec but couldn't find any
> >>>>>>>>> help. So my question is: what is wrong here?
> >>>>>>>>> Can anybody with more knowledge point me in the right
> >>>>>>>>> direction? What else can I do to find the real cause of this?
> >>>>>>>> 
> >>>>>>>> You should probably Cc one of the Intel guys who implemented
> >>>>>>>> this stuff -- I've added Haitao Shan.
> >>>>>>>> 
> >>>>>>>> Meanwhile I'd be interested to know whether things work okay
> >>>>>>>> for you, minus performance counters and the hypervisor hang,
> >>>>>>>> if you return immediately from vpmu_initialise(). Really at
> >>>>>>>> minimum we need such a fix, perhaps with a boot parameter to
> >>>>>>>> re-enable the feature, for 3.4.2 release; allowing guests to
> >>>>>>>> hose the hypervisor like this is of course not on.
> >>>>>>>> 
> >>>>>>>>  -- Keir
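Keir's proposed stop-gap (return early from vpmu_initialise() unless the feature is explicitly enabled at boot) amounts to a guard like the one below. This is a standalone sketch: `opt_vpmu_enabled`, `parse_vpmu_option()`, the `vpmu` option strings and `struct vcpu_model` are assumptions for illustration, not the code that was eventually merged into Xen.

```c
#include <stdbool.h>
#include <string.h>

/* Off by default; only an explicit boot option turns vPMU on. */
static bool opt_vpmu_enabled = false;

/* Hypothetical parser for a "vpmu" / "vpmu=1" / "vpmu=0" boot option. */
static void parse_vpmu_option(const char *opt)
{
    if (strcmp(opt, "vpmu") == 0 || strcmp(opt, "vpmu=1") == 0)
        opt_vpmu_enabled = true;
    else if (strcmp(opt, "vpmu=0") == 0)
        opt_vpmu_enabled = false;
}

struct vcpu_model {
    bool vpmu_initialised;
};

static void vpmu_initialise_model(struct vcpu_model *v)
{
    if (!opt_vpmu_enabled)
        return;          /* feature disabled: the guest gets no vPMU at all */
    v->vpmu_initialised = true;
}
```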
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> http://lists.xensource.com/xen-devel
> 
> 
-- 
Dietmar Hahn
TSP ES&S SWE OS                                Telephone: +49 (0) 89 636 40274
Fujitsu Technology Solutions                Email: dietmar.hahn@xxxxxxxxxxxxxx
Otto-Hahn-Ring 6                              Internet:  http://ts.fujitsu.com
D-81739 München                    Company details:ts.fujitsu.com/imprint.html
