[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu
CC'ing Kevin O'Connor On 10/14/15 13:27, Ian Campbell wrote: > On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote: >>> Can't you just teach SeaBIOS how to deal with your PV disks and then >>> only add that to your VM and forget about IDE/AHCI? I mean, that's how >>> it's done for virtio-blk, and it doesn't involve any insanities like >>> ripping out non-hotpluggable devices. >> >> Teaching SeaBIOS to deal with PV disks can be done, in fact we already >> support PV disks in OVMF. It is possible to boot Windows with OVMF >> without any IDE disks (patch pending for libxl to create a VM without >> emulated IDE disks). > > One stumbling block in the past has been how to know when the PV drivers in > the BIOS are no longer required, such that the ring can be torn down and/or > the connection etc handed over to the OS driver. > > I think we deal with this in OVMF using ExitBootServices? (TBH I'm not sure > how). Search "XenBusDxe/XenBusDxe.c" in edk2 for "EVT_SIGNAL_EXIT_BOOT_SERVICES". TBH, the code in NotifyExitBoot() doesn't seem valid. If you check the UEFI spec (for example, v2.5, but the requirement I'm about to quote is very old), in the specification of EFI_BOOT_SERVICES.CreateEvent() you find: EVT_SIGNAL_EXIT_BOOT_SERVICES This event is to be notified by the system when ExitBootServices() is invoked. This event is of type EVT_NOTIFY_SIGNAL and should not be combined with any other event types. The notification function for this event is not allowed to use the Memory Allocation Services, or call any functions that use the Memory Allocation Services and must only call functions that are known not to use Memory Allocation Services, because these services modify the current memory map.The notification function must not depend on timer events since timer services will be deactivated before any notification functions are called. NotifyExitBoot() in "XenBusDxe/XenBusDxe.c" calls the DisconnectController() boot service. That in turn leads to calls to EFI_DRIVER_BINDING_PROTOCOL.Stop() functions (speaking generally), which inevitably free memory as part of unbinding the device, thereby breaking the above requirement. The right solution is the following: - when a driver binds a device (a "handle"), a piece of the resources allocated for that binding should be a new event, to be signaled at ExitBootServices() time. The handler function can be shared by all such devices. The context passed to the handler should be the (driver- specific) structure that represents the binding and the state of the device in general. - When the driver unbinds the device, the event should be closed. This will automatically unregister the callback as well. - Now, when the callback is entered at all, you can be sure that the binding still exists. In this case, you should probe into the various fields of the context (the device state, practically), to figure out if this device "lives" or is dormant. For simpler devices, the answer is always "alive", but some devices could have configuration states where they are bound yet not configured (using no hw resources etc). - In case the device is alive, the action to take is to make it abort any in-flight transfers or other operations, and re-set / deconfigure it *without* touching any memory allocations. You can see this in the virtio-net driver in OVMF. In the "OvmfPkg/VirtioNetDxe" directory, see the "TechNotes.txt", "DriverBinding.c" and "Events.c" files. The callback function is VirtioNetExitBoot(), and the event registration / deregistration happens in: VirtioNetDriverBindingStart() VirtioNetSnpPopulate() gBS->CreateEvent(EVT_SIGNAL_EXIT_BOOT_SERVICES) vs. VirtioNetDriverBindingStop() VirtioNetSnpEvacuate() gBS->CloseEvent() What VirtioNetExitBoot() does is very simple: resetting the virtio device is a small action, and it covers the responsibilities. I'll admit that the virtio-scsi and virtio-block drivers play a bit dirty here. (I've known this for a long time, but been silent about it.) They should have similar callbacks, but don't (In theory, all devices that are bound & alive at ExitBootServices() should be re-set, without touching the memory services.) There are two mitigating factors here: - unlike with virtio-net, the scsi and block drivers in OVMF support only synchronous operations. When you are not calling their functions, there are no transfers in flight. And when something calls ExitBootServices(), that thing is not calling virtio-block / virtio-scsi functions. - The first thing the OS-level virtio drivers do (certainly in Linux, hopefully in Windows) is a virtio-reset on each virtio device found. (This is actually required by the virtio specification, both old and new.) Now, there's one small window for issues here. If something in the guest scribbled over the memory that, according to QEMU, still hosts a live virtio ring, between ExitBootServices() and said OS-level virtio-reset, QEMU *could* interpret that -- even without an explicit virtio kick -- as garbage virtio operations, and abort the guest (or the device would enter a failed state, not sure what happens in today's QEMU). This window is very very small in practice however. OVMF allocates the virtio rings in EfiBootServicesData type memory. The OS is allowed to reuse that kind of memory right after ExitBootServices(). However, Linux developers have found that many buggy UEFI drivers :) allow those areas to be used by hardware even after ExitBootServices(), at least for a while. So they have some reservation code in place to deal with that. (I'm fuzzy on the details, sorry. CC'ing Ard and Matt.) In any case, this decreases the chance that virtio ring corruption could occur between ExitBootServices() and the OS-level virtio-reset, for OVMF's virtio-block and virtio-scsi drivers. For purity, I should really write those few tens of lines of ExitBootServices() handler code for both drivers... but you know how it works; you make time for what really matters. I have never seen (occur, or be reported) such a problem with OVMF. However, what I *have* seen is a similar ring corruption in SeaBIOS. There was a quirky error path in the code that enabled SeaBIOS to internally release (and later, reuse) the memory that from QEMU's POV still hosted a live virtio ring. As you can imagine, the release and the "reuse" (-> abort by QEMU) were quite distanced from each other. Fixed it in SeaBIOS commit 5f2d17d35b23. See also later commit 5e03881f98b3. > AFAIK the BIOS interfaces do not have anything as reliable as that. > > How does virtio deal with this in the BIOS case? It doesn't, as far as I can tell. I don't think it has to, though! On a BIOS box, you can always boot DOS, or another operating system that continues to use the BIOS interfaces forever. (Same as if you never call ExitBootServices() in UEFI.) Given that no starter pistol gets fired between the firmware and the OS on such a platform, they must always respect each other. I guess this could occur through the E820 map, or some such. No clue in what kind of E820 memory SeaBIOS allocates the virtio rings, but I guess the Linux kernel stays away from those areas until it's past device probing and binding. In other words, I guess it is exactly the existence of ExitBootServices() that *gives rise* to this issue. Without ExitBootServices(), the BIOS can never know when its services are no longer needed, so it keeps them alive. For which reason, the OS -- without means to tell the BIOS to go away *now* -- must stay away for a reasonably long time too. (I'm speculating of course.) With ExitBootServices(), there's a clear separation, but when only one side adheres to it, things can break badly. Long story short, here's my unwashed recommendation for Xen PV: - In OVMF, fix the current ExitBootServices() handler. Only de-configure the device using the XenBus protocol, without allocating or freeing any dynamic memory, directly or indirectly. I'll also add writing ExitBootServices() handlers for virtio-block and virtio-scsi to my TODO list. (Now that you've forced me to own up to their lack.) Sigh. :) - In SeaBIOS, allocate the ring in normal memory, and trust that the kernel stays away until the Xen guest driver binds the device, and resets it (as the very first step) If you're worried (or else, if I'm simply wrong!), allocate the ring in reserved memory. A few lost pages are a small price to pay for XenPV in SeaBIOS! ;) ... It's 1AM again. Sigh. Someone steal my keyboard, please. (And hit me with it too if I wrote many incorrect statements above...) Cheers Laszlo _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |