[Xen-ia64-devel] Important Xen/ia64 domU/vbd fix committed
I've just committed a bug fix to xen-ia64-unstable.hg that seems to make domU much, much more stable. With it, I have been able to untar and build Linux on domU for the first time, and also fsck domU's "disk" (rhel.img file).

Before I explain, let me first apologize for falling behind on other patches. I have been very focused on understanding and exploring various solutions to this bug since last Friday. I'll try to catch up after I get some sleep and recovery time. (Note also that HP is closed for the holidays from Dec 23 PM until Jan 3. I think I will have some access to email and test machines during that time, but will also take some vacation days.)

The problem: Although domU has been booting successfully for many Xen/ia64 users, everyone has experienced some instability. While many commands work fine, others fail and some have caused the system to crash. In particular, some file-intensive operations such as fsck and untar'ing Linux consistently fail, and in some cases dom0's disk has been trashed, requiring a full RHEL reinstallation.

Last week, Matt Chapman isolated a serious problem: When domU shares pages with dom0 (e.g. for virtual I/O rings), dom0 accesses them by "direct mapping" a domU machine address. While this works fine, some drivers layered under the dom0 virtual I/O backend (including the loopback driver) sometimes use virt_to_page() on the dom0 virtual address. Since the virtual address represents a physical address that was not in dom0's EFI memory map, dom0's memmap may not have allocated a "struct page" for this address, so virt_to_page() gives the address of a non-existent "struct page" (e.g. off the end of the memmap array). Accessing this non-existent struct page may read/write "random" memory in dom0, domU, or even in Xen itself. Boom!

The obvious answer is to ensure that when dom0 boots, a memmap is built that is sufficiently large to cover accesses to domU shared pages. This is easier said than done.
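
To make the failure mode concrete, here is a minimal sketch in C of a flat memmap lookup. It is a toy model, not the Linux or Xen code: every `toy_` name, the 16KB page size, and the direct-map base address are assumptions chosen only for illustration. The point is that the lookup is pure arithmetic on the address, so nothing stops it from producing an index past the end of the array unless the caller checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy constants (assumed for illustration; real kernels configure these). */
#define TOY_PAGE_SHIFT  14u                    /* 16KB pages */
#define TOY_PAGE_OFFSET 0xe000000000000000ULL  /* base of the direct mapping */

/* Stand-in for "struct page": one descriptor per physical page frame. */
struct toy_page {
    uint32_t flags;
    uint32_t count;
};

/* A flat memmap describing page frames [0, npages). */
struct toy_memmap {
    struct toy_page *base;
    size_t npages;
};

/* Page frame number behind a direct-mapped virtual address. */
static uint64_t toy_virt_to_pfn(uint64_t vaddr)
{
    assert(vaddr >= TOY_PAGE_OFFSET);
    return (vaddr - TOY_PAGE_OFFSET) >> TOY_PAGE_SHIFT;
}

/* True if the memmap actually holds a descriptor for this address. */
static bool toy_vaddr_has_page(const struct toy_memmap *m, uint64_t vaddr)
{
    return toy_virt_to_pfn(vaddr) < m->npages;
}

/*
 * Checked lookup: returns NULL when the frame lies outside the memmap.
 * An unchecked version would simply return base + pfn, which for an
 * uncovered frame points at whatever memory follows the array.
 */
static struct toy_page *toy_virt_to_page_checked(const struct toy_memmap *m,
                                                 uint64_t vaddr)
{
    if (!toy_vaddr_has_page(m, vaddr))
        return NULL;
    return m->base + toy_virt_to_pfn(vaddr);
}
```

With a four-entry memmap, the address for frame 3 resolves to the last valid descriptor, while the address for frame 4 has no descriptor at all. That second case is the analogue of a domU shared page that was never in dom0's EFI memory map.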
After several days of poring over Linux code, consulting with HP Linux experts, and trying out various solutions, I gave up; without significant changes to Linux (including common code), I don't think it is possible to coax Linux to both create a memmap to cover all of physical memory AND ensure that it doesn't use those pages itself.

I also considered giving all memory (except the Xen heap) to domain0 and "ballooning" it back for domUs. The core Xen team wasn't too keen on that approach, the balloon driver isn't yet implemented on Xen/ia64, and I think there will be some challenging security questions (e.g. what if dom0 swaps out domU's pages to the dom0 disk?).

Finally, I settled on "reserving a chunk" at the end of physical memory for domain0's exclusive use. To make this visible via the EFI mem_map, the chunk has to be granule-sized and granule-aligned. This granule gets reserved early in Xen's boot and gets passed to dom0 at dom0 launch. This is an ugly hack, but it is simple, requires no changes to Linux and, most importantly, it works.

The patch is checked into xen-ia64-unstable.hg as cset 8374. I would appreciate it if others would give it a try. And if someone can implement a better solution, please let me know. It won't work for NUMA machines, but we can worry about that later. In the meantime, domU is much, much more stable.

I doubt that this is the "last bug" we will find affecting domU stability, but it was a tough one. Thanks very much to Matt for isolating the problem!

Dan

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel