[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] piix: fix regression during unplug in Xen HVM domUs


  • To: Paolo Bonzini <pbonzini@xxxxxxxxxx>
  • From: Olaf Hering <olaf@xxxxxxxxx>
  • Date: Wed, 10 May 2023 00:58:27 +0200
  • Arc-authentication-results: i=1; strato.com; arc=none; dkim=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; t=1683673110; s=strato-dkim-0002; d=strato.com; h=In-Reply-To:References:Message-ID:Subject:Cc:To:From:Date:Cc:Date: From:Subject:Sender; bh=GCP5/CljJD+CfipxX1GkhTK2QgMNJf/4Nny5lGr0f04=; b=cRUuD9cGfKlEtCzb7BlTSVPbFMUF6G907RL/iPoIp69Vw+i0bHc3oRsTKY7Ko/I8JK V9tJ4LT5VJcWimcyKS99DKQt6pc5kitto/mKV/Yoir34iWXaEPXRQiqBGQUgoooIBVYn 8s2w+ULF6ZPquxMRsa+NmJwJJtVX4+/WMmhKekyvDjQoBlEq4tcUxDklqpB159Q1Nsgp KkHLz1a2nUxlushsIreIUEpTvF7vh9okq9o6TvZSUfWymTsTQRErIqkPMip27sJmXSw9 r+FpPX4wSwfzzoNJ0pqH9LMFMDokbYBRKAflW6qorO8B3Y7DCRjcKWU4DRXQpHz2NXYr fOBQ==
  • Arc-seal: i=1; a=rsa-sha256; t=1683673110; cv=none; d=strato.com; s=strato-dkim-0002; b=eZ1Xom1Goq0t0rDLQFjHxTIvR8GF9UFHiztm7ZLPmDIP0fFpWB3eD2NmIGqoeIwoFl UIvVVvwMZm+zPF2ghgAfQa6p89Yr741UO+3SCTQHVgF0va0TR4wCIxmUxgPQERrnKJam RfVyy5p89WDEWSh6bSZIpSrQ5VL8E6N8eIUJeP0EZ23CRJbmmfDPgVzXv7yDDtb+upXv zuJfmNGVD5SyN/6Xg4cHokERLz+EWafb9iM7/J051UboDX/LqCtP0ZEIoSL01qkViGlJ EmA/DVDBV1QlL8xJit6aaZgt4ADEvjNJDaTnUTPaj9u3pfbVHL0fVdgNPMylUWHGDGEr oxUA==
  • Cc: John Snow <jsnow@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Stefano Stabellini <sstabellini@xxxxxxxxxx>, qemu-devel@xxxxxxxxxx, qemu-block@xxxxxxxxxx, Philippe Mathieu-Daudé <f4bug@xxxxxxxxx>
  • Delivery-date: Tue, 09 May 2023 22:59:06 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Resuming this old thread about an unfixed bug, which was introduced in qemu-4.2:

qemu ends up in piix_ide_reset from pci_unplug_disks.
This was not the case prior 4.2, the removed call to
qemu_register_reset(piix3_reset, d) in
ee358e919e385fdc79d59d0d47b4a81e349cd5c9 did apparently nothing.

In my debugging (with v8.0.0) it turned out the three pci_set_word
causes the domU to hang. In fact, it is just the last one:

   pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */

It changes the value from 0xc121 to 0x1.

The question is: what does this do in practice?

Starting with recent qemu (like 7.2), the domU sometimes proceeds with
these messages:

    [    1.631161] uhci_hcd 0000:00:01.2: host system error, PCI problems?
    [    1.634965] uhci_hcd 0000:00:01.2: host controller process error, 
something bad happened!
    [    1.634965] uhci_hcd 0000:00:01.2: host controller halted, very bad!
    [    1.634965] uhci_hcd 0000:00:01.2: HC died; cleaning up
    Loading basic drivers...[    2.398048] Disabling IRQ #23

Is anyone familiar enough with PIIX3 and knows how these devices are
interacting?

00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] 
(rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
00:03.0 VGA compatible controller: Cirrus Logic GD 5446


Thanks,
Olaf

On Thu, Mar 25, Paolo Bonzini wrote:

> On 25/03/21 12:12, Olaf Hering wrote:
> > Am Mon, 22 Mar 2021 18:09:17 -0400
> > schrieb John Snow <jsnow@xxxxxxxxxx>:
> > 
> > > My understanding is that XEN has some extra disks that it unplugs when
> > > it later figures out it doesn't need them. How exactly this works is
> > > something I've not looked into too closely.
> > 
> > It has no extra disks, why would it?
> > 
> > I assume each virtualization variant has some sort of unplug if it has to 
> > support guests that lack PV/virtio/enlightened/whatever drivers.
> 
> No, it's Xen only and really should be legacy.  Ideally one would just have
> devices supported at all levels from firmware to kernel.
> 
> > > So if these IDE devices have been "unplugged" already, we avoid
> > > resetting them here. What about this reset causes the bug you describe
> > > in the commit message?
> > > 
> > > Does this reset now happen earlier/later as compared to what it did
> > > prior to ee358e91 ?
> > 
> > Prior this commit, piix_ide_reset was only called when the entire
> > emulated machine was reset. Like: never. With this commit,
> > piix_ide_reset will be called from pci_piix3_xen_ide_unplug. For some
> > reason it confuses the emulated USB hardware. Why it does confused
> > it, no idea.
> 
> > I wonder what the purpose of the qdev_reset_all() call really is. It
> > is 10 years old. It might be stale.
> 
> piix_ide_reset is only calling ide_bus_reset, and from there ide_reset and
> bmdma_reset.  All of these functions do just two things: reset internal
> registers and ensure pending I/O is completed or canceled.  The latter is
> indeed unnecessary; drain/flush/detach is already done before the call to
> qdev_reset_all.
> 
> But the fact that it breaks USB is weird.  That's the part that needs to be
> debugged, because changing IDE to unbreak USB needs an explanation even if
> it's the right thing to do.
> 
> If you don't want to debug it, removing the qdev_reset_all call might do the
> job; you'll have to see what the Xen maintainers think of it.  But if you
> don't debug the USB issue now, it will come back later almost surely.
> 
> Paolo

Attachment: signature.asc
Description: Digital signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.