Re: [Xen-devel] [PATCH v5][XSA-97] x86/paging: make log-dirty operations preemptible
On 05/09/14 11:47, Jan Beulich wrote:
> Both the freeing and the inspection of the bitmap get done in (nested)
> loops which - besides having a rather high iteration count in general,
> albeit that would be covered by XSA-77 - have the number of non-trivial
> iterations they need to perform (indirectly) controllable by both the
> guest they are for and any domain controlling the guest (including the
> one running qemu for it).
>
> Note that the tying of the continuations to the invoking domain (which
> previously [wrongly] used the invoking vCPU instead) implies that the
> tools requesting such operations have to make sure they don't issue
> multiple similar operations in parallel.
>
> Note further that this breaks supervisor-mode kernel assumptions in
> hypercall_create_continuation() (where regs->eip gets rewound to the
> current hypercall stub beginning), but otoh
> hypercall_cancel_continuation() doesn't work in that mode either.
> Perhaps time to rip out all the remains of that feature?
>
> This is part of CVE-2014-5146 / XSA-97.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> Reviewed-by: Tim Deegan <tim@xxxxxxx>

Unfortunately, XenRT is finding reliable issues with this version of the
patch.  Taking two builds of XenServer, identical other than this patch
(Xen-4.4.1 based, adjusting for -EAGAIN/-ERESTART), the build without it
is fine, but the build with it appears to show page accounting issues.

The logs below are from a standard VM lifecycle-ops test of RHEL 6.2,
with a 32bit and a 64bit PV guest undergoing tests in tandem.  E.g.:

(XEN) [ 4141.838508] mm.c:2352:d0v1 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 2317f0 (pfn 14436)
(XEN) [ 4141.838512] mm.c:2995:d0v1 Error while pinning mfn 2317f0

Failure to pin a batch of domain 78's pagetables on restore.

(XEN) [ 7832.953068] mm.c:827:d0v0 pg_owner 100 l1e_owner 100, but real_pg_owner 99
(XEN) [ 7832.953072] mm.c:898:d0v0 Error getting mfn 854c3 (pfn 2c820) from L1 entry 00000000854c3025 for l1e_owner=100, pg_owner=100
(XEN) [ 7832.953076] mm.c:1221:d0v0 Failure in alloc_l1_table: entry 488
(XEN) [ 7832.953083] mm.c:2099:d0v0 Error while validating mfn 12406d (pfn 18fbe) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) [ 7832.953086] mm.c:906:d0v0 Attempt to create linear p.t. with write perms
(XEN) [ 7832.953089] mm.c:1297:d0v0 Failure in alloc_l2_table: entry 4
(XEN) [ 7832.953100] mm.c:2099:d0v0 Error while validating mfn 23ebe4 (pfn 1db65) for type 2000000000000000: caf=8000000000000003 taf=2000000000000001
(XEN) [ 7832.953104] mm.c:948:d0v0 Attempt to create linear p.t. with write perms
(XEN) [ 7832.953106] mm.c:1379:d0v0 Failure in alloc_l3_table: entry 0
(XEN) [ 7832.953110] mm.c:2099:d0v0 Error while validating mfn 2019db (pfn 18eaf) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) [ 7832.953113] mm.c:2995:d0v0 Error while pinning mfn 2019db

Failure to pin a batch of domain 100's pagetables on restore.

In both of these cases, the save side succeeds, which means the
pagetable normalisation found fully complete and correct pagetables
(i.e. the p2m and m2p agreed), and
xc_get_pfn_type_batch()/xc_map_foreign_bulk() didn't fail any domain
ownership tests.
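(For reference, the "p2m and m2p agreed" condition amounts to roughly the
per-pfn check below.  This is a simplified sketch with illustrative types
and names, not the actual libxc normalisation code.)

/*
 * Sketch only: 'p2m' is assumed to be the guest's pfn->mfn table and
 * 'm2p' the host's mfn->pfn table, both as already mapped by the saver;
 * 'max_mfn' is an assumed bound on valid host frame numbers.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t xen_pfn_t;

#define INVALID_MFN (~(xen_pfn_t)0)

bool pfn_is_consistent(const xen_pfn_t *p2m, const xen_pfn_t *m2p,
                       xen_pfn_t pfn, xen_pfn_t max_mfn)
{
    xen_pfn_t mfn = p2m[pfn];

    /* An unpopulated pfn is skipped by the saver, not treated as an error. */
    if ( mfn == INVALID_MFN )
        return true;

    /*
     * The mfn must be a real host frame, and the m2p must point back at
     * this pfn; a disagreement means the frame has changed owner or use.
     */
    return mfn < max_mfn && m2p[mfn] == pfn;
}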
On inspection of the libxc logs, I am feeling quite glad I left this
debugging message in:

xenguest-75-save[11876]: xc: detail: Bitmap contained more entries than expected...
xenguest-83-save[32123]: xc: detail: Bitmap contained more entries than expected...
xenguest-84-save[471]: xc: detail: Bitmap contained more entries than expected...
xenguest-88-save[3823]: xc: detail: Bitmap contained more entries than expected...
xenguest-89-save[4656]: xc: detail: Bitmap contained more entries than expected...
xenguest-95-save[9379]: xc: detail: Bitmap contained more entries than expected...
xenguest-98-save[11784]: xc: detail: Bitmap contained more entries than expected...

This means that, periodically, a XEN_DOMCTL_SHADOW_OP_{CLEAN,PEEK}
hypercall gives us back a bitmap with more set bits than the
stats.dirty_count which it hands back at the same time.

Domain 75 (the 64bit guest) was the first to hit the bitmap error; it
migrated to domain 76, then to domain 78, which suffered the pinning
failure.  Beyond this point, only the 32bit domain continues testing,
and it suffers a similar problem later.

I have found a bug in my accounting code (I need to change two
set_bit()s to test_and_set_bit()s, rather than blindly incrementing the
stat), but the precondition which tickles this bug indicates that
something is going awry with the final logdirty bitmap as used by the
migration code.
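(For illustration, the accounting in question is shaped roughly like the
sketch below - heavily simplified, with made-up helper and field names
rather than the real xenguest/libxc code.  The point of the fix is that
test_and_set_bit() only counts a pfn the first time it appears in the
merged bitmap, whereas set_bit() plus an unconditional increment can
double-count.)

/*
 * 'chunk' stands in for the bitmap returned by XEN_DOMCTL_SHADOW_OP_CLEAN
 * (or _PEEK), and 'stats_dirty_count' for the stats.dirty_count handed
 * back by the same hypercall.  Everything else is invented for the sketch.
 */
#include <stdio.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-in for the real test_and_set_bit() primitive. */
static int test_and_set_bit(unsigned long nr, unsigned long *bm)
{
    unsigned long mask = 1UL << (nr % BITS_PER_LONG);
    int old = !!(bm[nr / BITS_PER_LONG] & mask);

    bm[nr / BITS_PER_LONG] |= mask;
    return old;
}

static unsigned long merge_dirty_chunk(unsigned long *merged,
                                       unsigned long *accumulated_dirty,
                                       const unsigned long *chunk,
                                       unsigned long nr_pfns,
                                       unsigned long stats_dirty_count)
{
    unsigned long pfn, bits_in_chunk = 0;

    for ( pfn = 0; pfn < nr_pfns; ++pfn )
    {
        if ( !(chunk[pfn / BITS_PER_LONG] & (1UL << (pfn % BITS_PER_LONG))) )
            continue;

        bits_in_chunk++;

        /*
         * The fix: only bump the running stat for bits newly set in the
         * merged bitmap.  A plain set_bit() followed by an unconditional
         * increment counts a pfn again every time it reappears.
         */
        if ( !test_and_set_bit(pfn, merged) )
            (*accumulated_dirty)++;
    }

    /* The debugging check which produced the "xc: detail" lines above. */
    if ( bits_in_chunk > stats_dirty_count )
        fprintf(stderr, "Bitmap contained more entries than expected...\n");

    return bits_in_chunk;
}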
Unfortunately, I am now out of the office for 6 working days (back on
Monday 22nd), but will be sporadically on email during that time.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel