
Re: [Xen-devel] dom0 / hypervisor hang on dom0 boot


  • To: xen-devel@xxxxxxxxxxxxx
  • From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
  • Date: Tue, 21 May 2013 09:39:14 +0200
  • Cc: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Tue, 21 May 2013 07:40:07 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Friday, 17 May 2013, 18:28:16, Konrad Rzeszutek Wilk wrote:

> On Thu, May 16, 2013 at 01:07:05PM +0200, Dietmar Hahn wrote:

> > On Wednesday, 15 May 2013, 10:42:17, Jan Beulich wrote:

> > > >>> On 15.05.13 at 11:12, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:

> > > > On Wednesday, 15 May 2013, 09:35:46, Jan Beulich wrote:

> > > >> >>> On 15.05.13 at 08:53, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:

> > > >> > I tried iommu=debug and I can't see any fault messages, but I am not

> > > >> > familiar with this code.

> > > >> > I attached the log; maybe someone can have a look at it.

> > >

> > > Perhaps only (if at all) by instrumenting the hypervisor. The

> > > question of course is how easily/quickly you can narrow down the

> > > code region that it might be dying in. And whether it's a hypervisor

> > > action at all that causes the hang (as opposed to something the

> > > DRM code in Dom0 does).

> >

> > I added some debug code to the linux kernel and could track down the

> > point of the hang. I used openSUSE kernel 3.7.10-1.4, but I looked at newer

> > kernels and found that the code is similar.

> >

> > i915_gem_init_global_gtt(...)

> > ...

> > intel_gtt_clear_range(start / PAGE_SIZE, (end-start) / PAGE_SIZE);

> > ...

> >

> > void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
> > {
> >         unsigned int i;
> >
> > ---> A printk(...) here is seen on serial line!
> >
> >         for (i = first_entry; i < (first_entry + num_entries); i++) {
> >                 intel_private.driver->write_entry(intel_private.base.scratch_page_dma,
> >                                                   i, 0);
> >         }
> >
> > ---> A printk(...) here is never seen!
> >
> >         readl(intel_private.gtt+i-1);
> > }

> >

> > The function behind the pointer intel_private.driver->write_entry is

> > i965_write_entry(). And the interesting instruction seems to be:

> > writel(addr | pte_flags, intel_private.gtt + entry);

> >

> > I added another printk() at the start of the function i965_write_entry().

> > And surprisingly after printing a lot of messages the kernel came up!!!

> > But now I had other problems like losing the audio device (maybe timeouts).

> > So maybe the hang is a timing problem?

> >

> > What I want to check is what the hypervisor is doing while the system hangs.

> > Does anybody have an idea how, maybe a timer that prints a stack dump of

> > all CPUs after 30 s?

>

> Yes. Can you try the two attached patches please.

 

I tried both, but neither helped. I think that was to be expected, as the first

patch handles an error case and the line changed by the second patch,

the call to pci_dma_sync_single_for_device(), is never reached.
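
For anyone who wants to reason about the loop without booting the affected
machine, here is a minimal user-space sketch of it in C. It is not the kernel
code: model_gtt, model_write_entry(), model_clear_range(), MODEL_GTT_ENTRIES
and MODEL_LOG_INTERVAL are made-up names, a plain array stands in for the GTT
mapping, and an array store stands in for writel(). The idea is to emit
progress output only once every MODEL_LOG_INTERVAL entries instead of once per
write, which would presumably disturb the timing much less than a printk() in
every i965_write_entry() call, and to make explicit which entry the final
readl() touches.

```c
/* User-space model of the GTT clear loop. NOT the kernel code. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_GTT_ENTRIES  4096u  /* hypothetical table size */
#define MODEL_LOG_INTERVAL 1024u  /* hypothetical: one log line per 1024 writes */

uint32_t model_gtt[MODEL_GTT_ENTRIES];  /* stands in for intel_private.gtt */
unsigned int model_log_lines;           /* number of progress lines emitted */

/* Stand-in for writel(addr | pte_flags, intel_private.gtt + entry). */
void model_write_entry(uint32_t addr, unsigned int entry, uint32_t pte_flags)
{
    assert(entry < MODEL_GTT_ENTRIES);
    model_gtt[entry] = addr | pte_flags;
}

/*
 * Same loop shape as intel_gtt_clear_range(), plus sparse progress output.
 * Returns the index that the final readback in the original would touch.
 * Assumes num_entries > 0, otherwise "i - 1" would not be a written entry.
 */
unsigned int model_clear_range(unsigned int first_entry,
                               unsigned int num_entries,
                               uint32_t scratch_addr)
{
    unsigned int i;

    assert(num_entries > 0);

    for (i = first_entry; i < first_entry + num_entries; i++) {
        /* Log only every MODEL_LOG_INTERVAL-th entry, not every write. */
        if ((i - first_entry) % MODEL_LOG_INTERVAL == 0) {
            fprintf(stderr, "clear_range: at entry %u\n", i);
            model_log_lines++;
        }
        model_write_entry(scratch_addr, i, 0);
    }

    /*
     * After the loop i == first_entry + num_entries, so i - 1 is the last
     * entry written; the original reads it back with readl().
     */
    return i - 1;
}
```

Calling model_clear_range(16, 2048, 0x1000) fills entries 16 to 2063, prints
two progress lines and returns 2063. In the kernel the equivalent would be a
printk() guarded by the same modulo test; whether that is sparse enough to
keep the hang reproducible can only be shown by a test on the real hardware.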

 

Dietmar.

 

--

Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

