Xen project Mailing List

RE: [Xen-devel] Need help in debugging partially blocked hypervisor

To: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

From: "Shan, Haitao" <haitao.shan@xxxxxxxxx>

Date: Tue, 3 Nov 2009 17:00:02 +0800

Accept-language: en-US


Cc: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>

Delivery-date: Tue, 03 Nov 2009 01:00:54 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AcpcXw03lOQ4epb6TvaH/whKz2gW/QABHQdg

Thread-topic: [Xen-devel] Need help in debugging partially blocked hypervisor

Hi, Dietmar,

Please review the attached patch. Any comments?

Haitao

Dietmar Hahn wrote:
>> I suspect the guest will reproduce this PMI loop if guest behaves as
>> you said in this email. But as far as I know, VTune and oprofile do
>> not behave like that.
>> Of course, this approach is still like a workaround (unless I get
>> confirmation that HW requires to do so). This approach is preferable
>> because it does not change the contents of MSRs. Thus, we have no
>> impact on guest software that does rely on reading the correct value
>> from HW. Approach 1 existed just because we knew that in event-based
>> sampling, counter value on receiving PMI was not used by
>> OProfile/VTune at all and it was safe to set the counter to some
>> non-zero value.
>>
>> Haitao
>>
>
> OK, then will you send a patch?
> Dietmar.
>
>>
>> Dietmar Hahn wrote:
>>> Please see below.
>>>
>>>> See my comments embedded. :)
>>>>
>>>> Haitao
>>>>
>>>>
>>>> Dietmar Hahn wrote:
>>>>> The conclusion is that this seems to be a workaround for the
>>>>> endless NMI loop. PMIs are a very rare event and this should
>>>>> not raise a performance problem.
>>>> I totally agree that this is only a workaround for approach 1.
>>>>
>>>>>
>>>>> I didn't try your second approach
>>>>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask
>>>>>> *physical PMI* when guest vcpu unmasks virtual PMI.
>>>>> but I have some questions.
>>>>>
>>>>> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt
>>>>>   and a watchdog NMI would occur before the domU unmasks it?
>>>> I think the second NMI will be lost.
>>>>
>>>>> - Is it possible that after handling the NMI (and not unmasking)
>>>>>   another domU got running on this CPU and therefore PMIs got
>>>>>   lost?
>>>> LVTPC entry in physical local APIC is saved/restored by Xen on VCPU
>>>> switches. So unmasking (or not) of PMI of one vcpu should have no
>>>> impact on another vcpu. When developing vPMU, I treated as vPMU
>>>> context both PMU MSRs and LVTPC entry in local APIC. vPMU context
>>>> is saved/restored on physical HW when a vcpu is scheduled, either in
>>>> an active save/restore manner or a lazy one (depending on the PMU
>>>> usage at the time of switch).
>>>>
>>>>>
>>>>> But the real cause of the problem is unknown. As said I saw this
>>>>> only on Nehalem. Maybe there is a problem together with the
>>>>> hardware? Perhaps your hardware colleagues know something more ;-)
>>>> When I found this problem, I just thought it might be a corner case
>>>> that only happens on my box (of course, I only see this on NHM,
>>>> too). I will try to ping the HW guys to see if there is any
>>>> explanation, since it is proven to be a general problem on NHM.
>>>>
>>>> But before everything is clear, I think approach 2 is a better
>>>> solution now.
>>>
>>> What would be the effect if the guest unmasks the PMI (which leads
>>> to unmasking the 'physical PMI') but doesn't reset the counter to a
>>> value != 0? Is the guest able to produce the NMI endless loop?
>>>
>>> Dietmar.
>>>
>>>>
>>>>>
>>>>> Thanks
>>>>> Dietmar
>>>>>
>>>>>>
>>>>>>>
>>>>>>> When I met this problem, I remember that I tried two approaches:
>>>>>>> 1> Setting the counter to non-zero before unmasking PMI in
>>>>>>>    vpmu_do_interrupt;
>>>>>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask
>>>>>>>    *physical PMI* when guest vcpu unmasks virtual PMI.
>>>>>>> I remember that approach 2 can fix this issue. But I do not
>>>>>>> remember the result of approach 1, since I met this about one
>>>>>>> year ago. It is my understanding that approach 2 is quite the
>>>>>>> same as approach 1, since normally guest will set the counter to
>>>>>>> some negative value (for example, -100000) before unmasking
>>>>>>> virtual PMI. However, approach 2 looks cleaner and more
>>>>>>> reasonable.
>>>>>>>
>>>>>>> Can you have a try and let me know the result?
>>>>>>> If both cannot work, there might be some problems that I have not
>>>>>>> met before.
>>>>>>>
>>>>>>> BTW: Sorry, I did not see your patch to enable NHM vpmu before.
>>>>>>> So, there is no need for me to work on that now. :)
>>>>>>>
>>>>>>> Haitao
>>>>>>>
>>>>>>>
>>>>>>> Dietmar Hahn wrote:
>>>>>>>> Hi Haitao,
>>>>>>>>
>>>>>>>>> Can I know how you enabled vPMU on Nehalem? This is not
>>>>>>>>> supported in current Xen.
>>>>>>>>
>>>>>>>> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Concerning vpmu support, I totally agree that we can disable
>>>>>>>>> this feature by default. If anyone really wants to use it, he
>>>>>>>>> can use boot options to turn it on.
>>>>>>>>
>>>>>>>> Yes, that's OK for me.
>>>>>>>>
>>>>>>>>> I am preparing a patch for that. And I will
>>>>>>>>> send a patch to enable NHM vpmu together.
>>>>>>>>>
>>>>>>>>> For the problem that Dietmar met, I think I once met this
>>>>>>>>> before. Can you add some code in vpmu_do_interrupt that sets
>>>>>>>>> the counter you are using to a value other than zero? Please
>>>>>>>>> let me know if that can help.
>>>>>>>>
>>>>>>>> I don't set the counter to zero. I use 0-val to set the
>>>>>>>> counter. Actually I tested on Nehalem with
>>>>>>>> - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and
>>>>>>>>   val=1100000
>>>>>>>> - Fixed counter #1 (0x30a) and val=1100000
>>>>>>>> The thing is that in the normal case the overflows of both
>>>>>>>> counters appear nearly at the same time. As described I added
>>>>>>>> some extra tracers for xentrace in core2_vpmu_do_interrupt() so
>>>>>>>> the code looks like:
>>>>>>>>
>>>>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);    /* -> 1. Step */
>>>>>>>>     {
>>>>>>>>         uint32_t HAHN_l, HAHN_h;
>>>>>>>>         HAHN_l = (uint32_t) msr_content;
>>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);     /* -> 2. Step */
>>>>>>>>     }
>>>>>>>>     if ( !msr_content )
>>>>>>>>         return 0;
>>>>>>>>     core2_vpmu_cxt->global_ovf_status |= msr_content;
>>>>>>>>     msr_content = 0xC000000700000000 |
>>>>>>>>                   ((1 << core2_get_pmc_count()) - 1);
>>>>>>>>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);  /* -> 3. Step */
>>>>>>>>
>>>>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);    /* -> 4. Step */
>>>>>>>>     {
>>>>>>>>         uint32_t HAHN_l, HAHN_h;
>>>>>>>>         HAHN_l = (uint32_t) msr_content;
>>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);   /* -> 5. Step */
>>>>>>>>
>>>>>>>>         rdmsrl(0xc3, msr_content);   /* -> 6. Step, general counter #2 */
>>>>>>>>         HAHN_l = (uint32_t) msr_content;
>>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
>>>>>>>>         rdmsrl(0x30a, msr_content);  /* -> 7. Step, fixed counter #1 */
>>>>>>>>         HAHN_l = (uint32_t) msr_content;
>>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
>>>>>>>>     }
>>>>>>>>
>>>>>>>> With these tracers I got the following output:
>>>>>>>>
>>>>>>>> Last good NMI:
>>>>>>>> Both counters cause the NMI. Resetting works OK.
>>>>>>>> The counters themselves kept running.
>>>>>>>> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0004 ]
>>>>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 5. Step: par1 = 0x0a, high = 0x0000, low = 0x0000 ]
>>>>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x03c4 ]
>>>>>>>>          rdmsrl(0xc3) -> #2 general counter
>>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da ]
>>>>>>>>          rdmsrl(0x30a) -> #1 fixed counter
>>>>>>>>
>>>>>>>> NMI from where things go wrong:
>>>>>>>> Both counters cause the NMI. Resetting does NOT work correctly,
>>>>>>>> only for the general counter! The general counter (which caused
>>>>>>>> the NMI) seems to be stopped!
>>>>>>>> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0004 ]
>>>>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 5. Step: par1 = 0x0a, high = 0x0002, low = 0x0000 ]
>>>>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x00ec ]
>>>>>>>>          rdmsrl(0xc3) -> #2 general counter
>>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ]
>>>>>>>>          rdmsrl(0x30a) -> #1 fixed counter
>>>>>>>>
>>>>>>>> Wrong NMI:
>>>>>>>> Only the fixed counter causes the NMI (which was not reset
>>>>>>>> during NMI handling above!) Both counters seem to be stopped!
>>>>>>>> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0000 ]
>>>>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 5. Step: par1 = 0x0a, high = 0x0002, low = 0x0000 ]
>>>>>>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x00ec ]
>>>>>>>>          rdmsrl(0xc3) -> #2 general counter
>>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ]
>>>>>>>>          rdmsrl(0x30a) -> #1 fixed counter
>>>>>>>>
>>>>>>>> And this state remains forever!
>>>>>>>> I hope my explanations are understandable ;-)
>>>>>>>>
>>>>>>>> Until now I can see this behavior only on a Nehalem processor.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Dietmar
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> Shan Haitao
>>>>>>>>>
>>>>>>>>> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
>>>>>>>>>> On 30/10/2009 12:20, "Dietmar Hahn"
>>>>>>>>>> <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>>> I searched the Intel processor spec but couldn't find any
>>>>>>>>>>> help. So my question is, what is wrong here?
>>>>>>>>>>> Can anybody with more knowledge point me in the right
>>>>>>>>>>> direction, what can I still do to find the real cause of
>>>>>>>>>>> this?
>>>>>>>>>>
>>>>>>>>>> You should probably Cc one of the Intel guys who implemented
>>>>>>>>>> this stuff -- I've added Haitao Shan.
>>>>>>>>>>
>>>>>>>>>> Meanwhile I'd be interested to know whether things work okay
>>>>>>>>>> for you, minus performance counters and the hypervisor hang,
>>>>>>>>>> if you return immediately from vpmu_initialise(). Really at
>>>>>>>>>> minimum we need such a fix, perhaps with a boot parameter to
>>>>>>>>>> re-enable the feature, for 3.4.2 release; allowing guests to
>>>>>>>>>> hose the hypervisor like this is of course not on.
>>>>>>>>>>
>>>>>>>>>> -- Keir
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel

Attachment: unmask_vPMI.patch
Description: unmask_vPMI.patch
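
The patch body is not shown here, so the following is only a toy model of approach 2 as it is described in the thread, not the patch itself. It assumes that delivering a PMI leaves the physical LVTPC entry masked, that the handler injects a virtual PMI without unmasking, and that a guest write which clears the virtual mask bit is what clears the physical one. The struct and function names are hypothetical; the mask bit position (bit 16) is the usual local APIC LVT mask bit.

```c
#include <stdint.h>

#define LVT_MASKED (1u << 16)   /* mask bit of a local APIC LVT entry */

/* Hypothetical per-vcpu state for this model; not a Xen structure. */
struct toy_vpmu {
    uint32_t phys_lvtpc;    /* physical LVTPC entry */
    uint32_t virt_lvtpc;    /* guest-visible LVTPC entry */
    unsigned int injected;  /* virtual PMIs handed to the guest */
};

/* Hardware side: a counter overflow raises a PMI only while the physical
 * entry is unmasked, and delivery leaves the entry masked.
 * Returns 1 if the PMI was delivered, 0 if it was held off. */
static int toy_pmi_fires(struct toy_vpmu *v)
{
    if (v->phys_lvtpc & LVT_MASKED)
        return 0;
    v->phys_lvtpc |= LVT_MASKED;
    return 1;
}

/* Hypervisor side under approach 2: forward the PMI to the guest and
 * deliberately leave the physical entry masked. */
static void toy_handle_pmi(struct toy_vpmu *v)
{
    v->virt_lvtpc |= LVT_MASKED;
    v->injected++;
}

/* Guest writes its virtual LVTPC; only an unmasking write propagates
 * to the physical entry. */
static void toy_guest_write_lvtpc(struct toy_vpmu *v, uint32_t val)
{
    v->virt_lvtpc = val;
    if (!(val & LVT_MASKED))
        v->phys_lvtpc &= ~LVT_MASKED;
}
```

In this model a second overflow that arrives before the guest has re-armed its counter and unmasked its virtual PMI is simply held off, so the handler cannot re-enter in a loop on its own. Dietmar's question in the thread, what happens if the guest unmasks without resetting the counter, is not answered by the model.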

