Xen Project Mailing List

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

To: xen-devel@xxxxxxxxxxxxxxxxxxx, haitao.shan@xxxxxxxxx

From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>

Date: Mon, 2 Nov 2009 10:11:25 +0100

Cc: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>

Delivery-date: Mon, 02 Nov 2009 01:11:54 -0800


List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi Haitao,

> Can I know how you enabled vPMU on Nehalem? This is not supported in
> current Xen.

http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html

>
> Concerning vpmu support, I totally agree that we can disable this
> feature by default. If anyone really wants to use it, he can use boot
> options to turn it on.

Yes, that's OK for me.

> I am preparing a patch for that. And I will
> send a patch to enable NHM vpmu together.
>
> For the problem that Dietmar met, I think I once met this before. Can
> you add some code in vpmu_do_interrupt that sets the counter you are
> using to a value other than zero? Please let me know if that can help.

I don't set the counter to zero; I set it to 0-val.
I tested on Nehalem with
- General perf counter #2 (0xc3) counting CPU_CLK_UNHALTED, val=1100000
- Fixed counter #1 (0x30a), val=1100000
Normally the overflows of both counters occur at nearly the same time.

As described, I added some extra xentrace tracers to core2_vpmu_do_interrupt(),
so the code looks like this:

    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     /* 1. Step */
    {
        uint32_t HAHN_l, HAHN_h;
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);      /* 2. Step */
    }
    if ( !msr_content )
        return 0;
    core2_vpmu_cxt->global_ovf_status |= msr_content;
    msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
    wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);   /* 3. Step */

    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     /* 4. Step */
    {
        uint32_t HAHN_l, HAHN_h;
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);    /* 5. Step */

        rdmsrl(0xc3, msr_content);    /* 6. Step: general counter #2 */
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);

        rdmsrl(0x30a, msr_content);   /* 7. Step: fixed counter #1 */
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
    }

With these tracers I got the following output:

Last good NMI:
Both counters cause the NMI. Resetting works OK. The counters themselves keep running.
2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0000, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low = 0x03c4 ] rdmsrl(0xc3)  -> #2 general counter
7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da ] rdmsrl(0x30a) -> #1 fixed counter

NMI from which things go wrong:
Both counters cause the NMI. Resetting does NOT work correctly; it works only for the general counter!
The general counter (which caused the NMI) seems to be stopped!
2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ] rdmsrl(0xc3)  -> #2 general counter
7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter

Wrong NMI:
Only the fixed counter causes the NMI (its overflow bit was not reset during the NMI handling above!)
Both counters seem to be stopped!
2. Step: par1 = 0x01,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ] rdmsrl(0xc3)  -> #2 general counter
7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter

And this state remains forever!

I hope my explanations are understandable ;-)
So far I have seen this behavior only on a Nehalem processor.

Thanks.
Dietmar

>
> Best Regards
> Shan Haitao
>
> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
> > On 30/10/2009 12:20, "Dietmar Hahn" <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> >
> >> I searched the Intel processor spec but couldn't find any help.
> >> So my question is, what is wrong here?
> >> Can anybody with more knowledge point me in the right direction, what can I
> >> still do to find the real cause of this?
> >
> > You should probably Cc one of the Intel guys who implemented this stuff --
> > I've added Haitao Shan.
> >
> > Meanwhile I'd be interested to know whether things work okay for you, minus
> > performance counters and the hypervisor hang, if you return immediately from
> > vpmu_initialise(). Really at minimum we need such a fix, perhaps with a boot
> > parameter to re-enable the feature, for 3.4.2 release; allowing guests to
> > hose the hypervisor like this is of course not on.
> >
> > -- Keir
> >

--
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.