
[Xen-devel] Invalid types between save and restore, Xen 3.1.4

Hi list,

I am currently in charge of implementing save/restore/migrate support in NetBSD.

So far, my work can save and restore a NetBSD domU, but when I cycle saves and restores, roughly one attempt in ten fails with page type validation and pinning errors.

The save operation works, but the restore sometimes fails for reasons I have not identified, with xend reporting:

[2008-12-04 17:24:40 219] INFO (XendCheckpoint:370) Received all pages (0 races)
[2008-12-04 17:24:40 219] INFO (XendCheckpoint:370) ERROR Internal error: Failed to pin batch of 21 page tables
[2008-12-04 17:24:40 219] INFO (XendCheckpoint:370) Restore exit with rc=1

This is due to the hypervisor refusing a type validation when xc_restore issues its xc_mmuext_op():

(XEN) mm.c:1842:d0 Bad type (saw 28000008 != exp e0000000) for mfn 1f16f (pfn 43e)
(XEN) mm.c:649:d0 Error getting mfn 1f16f (pfn 43e) from L1 entry 1f16f023 for dom13
(XEN) mm.c:916:d0 Failure in alloc_l1_table: entry 768
(XEN) mm.c:1863:d0 Error while validating mfn 1ee38 (pfn 775) for type 20000000: caf=80000003 taf=20000001
(XEN) mm.c:683: get_l2_linear_pagetable() ret: 0 (exp 1)
(XEN) mm.c:1091:d0 Failure in alloc_l2_table: entry 1007
(XEN) mm.c:1863:d0 Error while validating mfn 1efb4 (pfn 5f9) for type 40000000: caf=80000003 taf=40000001
(XEN) mm.c:2132:d0 Error while pinning mfn 1efb4
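
For readers decoding the hexadecimal values above, here is a minimal sketch of how a type_info word can be split into fields. The bit layout (type in bits 29-31, pinned flag in bit 28, validated flag in bit 27) is an assumption about the x86_32 hypervisor of that era and is not quoted from this thread, although it agrees with the "type 20000000" and "type 40000000" values printed for the L1 and L2 validations. The 16-bit count mask and all helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed x86_32 layout of the page type_info word; not taken from this thread. */
#define PGT_TYPE_SHIFT 29
#define PGT_TYPE_MASK  (7U << PGT_TYPE_SHIFT)
#define PGT_PINNED     (1U << 28)
#define PGT_VALIDATED  (1U << 27)
#define PGT_COUNT_MASK 0xffffU /* hypothetical: low bits hold the type refcount */

/* Numeric page type, 0 (none) through 7 (writable page). */
static unsigned pgt_type(uint32_t type_info)
{
    return (type_info & PGT_TYPE_MASK) >> PGT_TYPE_SHIFT;
}

/* Human-readable name for the page type field. */
static const char *pgt_type_name(uint32_t type_info)
{
    static const char *const names[8] = {
        "none", "l1_page_table", "l2_page_table", "l3_page_table",
        "l4_page_table", "gdt_page", "ldt_page", "writable_page"
    };
    return names[pgt_type(type_info)];
}

static int pgt_is_pinned(uint32_t type_info)
{
    return (type_info & PGT_PINNED) != 0;
}

static int pgt_is_validated(uint32_t type_info)
{
    return (type_info & PGT_VALIDATED) != 0;
}

static unsigned pgt_count(uint32_t type_info)
{
    return type_info & PGT_COUNT_MASK;
}
```

Under these assumptions, "saw 28000008" reads as an L1 page table that is validated, not pinned, and holds 8 type references, while "exp e0000000" reads as a writable page.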

The failure is intermittent and hard to reproduce. I suspect a race inside our VM code, but as I am not familiar with how Xen handles the MMU internally, I am having a hard time tracking it down.

The L1 and L2 entries at fault are always the same. L2 entry 1007 corresponds to an "alternative" recursive PD in our VM subsystem, and L1 entry 768 is the start of our kernel's virtual memory.
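
As a rough illustration of how such slot numbers relate to addresses, the sketch below converts a page-directory index into the base of the virtual range it covers. It assumes non-PAE i386 paging (1024 slots of 4 MiB each), which the post does not state, and the function names are hypothetical. Read as a directory slot, index 768 gives 0xC0000000 and index 1007 gives 0xFBC00000.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: non-PAE i386, two-level paging, 1024 entries per table. */
#define L2_SHIFT       22    /* each directory slot maps 4 MiB */
#define L2_ENTRIES     1024U
#define L2_SLOT_SIZE   (1U << L2_SHIFT)

/* First virtual address covered by the given directory slot. */
static uint32_t l2_slot_base(unsigned l2_index)
{
    assert(l2_index < L2_ENTRIES);
    return (uint32_t)l2_index << L2_SHIFT;
}

/* Last virtual address covered by the given directory slot. */
static uint32_t l2_slot_last(unsigned l2_index)
{
    return l2_slot_base(l2_index) + (L2_SLOT_SIZE - 1U);
}
```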

This is with Xen 3.1.4. NetBSD does not use writable mappings, and manipulates the MMU only through the hypercall API. MFN manipulations are suspended during a save, to avoid performing incorrect ones after a restore.

What I would like to know is what kind of operations could result in such a situation. Given that the Xen tools should have an accurate view of the pfn_types through the p2m table, how can the hypervisor refuse to validate pages at restore time, when mappings should not change after the call to HYPERVISOR_suspend()?

For example, why is Xen expecting a writable mapping while the page is validated as L1?
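
One way to look at this is to decode the L1 entry quoted in the log (1f16f023). The sketch below uses the standard x86 page-table flag bits and assumes a 32-bit non-PAE entry; the helper names are hypothetical. The entry decodes to mfn 1f16f with the present, read/write and accessed bits set. If the hypervisor requires the target of any entry with the read/write bit set to be typed as a writable page, that would account for the "exp e0000000" part of the message, while the frame itself is currently typed as an L1 table.

```c
#include <stdint.h>

/* Standard x86 page-table entry flag bits (32-bit, non-PAE entry assumed). */
#define PTE_PRESENT     0x001U
#define PTE_RW          0x002U
#define PTE_USER        0x004U
#define PTE_ACCESSED    0x020U
#define PTE_DIRTY       0x040U
#define PTE_FLAGS_MASK  0xfffU
#define PTE_FRAME_SHIFT 12

/* Machine frame number referenced by the entry. */
static uint32_t pte_mfn(uint32_t pte)
{
    return pte >> PTE_FRAME_SHIFT;
}

/* Low 12 bits of the entry, holding the flags. */
static uint32_t pte_flags(uint32_t pte)
{
    return pte & PTE_FLAGS_MASK;
}

/* True when the entry is present and grants write access. */
static int pte_is_writable(uint32_t pte)
{
    return (pte & (PTE_PRESENT | PTE_RW)) == (PTE_PRESENT | PTE_RW);
}
```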

I was wondering if anyone could shed some light on this. Please correct me if I am wrong.

Thanking you in advance for your help,

Jean-Yves Migeon
