XenVif div by zero on Tx path after resume.
Hi Paul (and others!)

I have done a bit of digging on this, and it looks like it's due to changes made in the suspend/resume path. See the log at the bottom for the failure case.

In summary:

- Originally, I suspect, there were early suspend callbacks (which actually lowered the frontend state)...
- ... and late ones, which run after the resume, when devices and the system are powering back up.
- ... and there was some (more) synchronization between the suspend callbacks and PDO power state changes.

For reasons which are not obvious, the frontend and PDO power states are left as up and running, and the Late callback cycles the frontend state down and back up again, leaving VifSuspendCallbackLate to actually take the frontend back to the final CONNECTED state.

This raises a whole bunch of questions, not least:

- Does the initial suspend not lower the PDO power state because it's on the boot path, or for some other reason?
- Why does the frontend's early suspend callback just set "online" to false, instead of actually lowering the state properly?
- Do we actually use some suspend callbacks to request a change in system power, or is suspend / resume / migrate supposed to be totally transparent?
- How are we supposed to synchronise the Tx path with suspend / resume if the latter does not command some system or power state change visible to the OS when we request that the guest suspends?

As it is, the late suspend callbacks happen in a deferred manner, and there's nothing to stop the Tx path from making a request to send a packet if the OS cannot see (or has not seen) a PDO power state change for the PV network device. As such, the current div-by-zero fix of dropping the packet seems to me to be an acceptable workaround.

The alternative would perhaps be to explicitly synchronize the VIF suspend callbacks with PDO power state changes for the PV network device. How? Thoughts?
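To make the two options concrete, here is a minimal user-space sketch. It assumes (the post does not say this) that the divisor on the Tx path is a queue count that can read as zero while the frontend is being cycled. Every name below (`transmitter`, `transmit`, `quiesce`, `resume`, `TX_DROPPED`) is hypothetical and not taken from xenvif, and C11 atomics stand in for whatever kernel primitives the driver would really use. The zero check is the "drop the packet" workaround; the `suspended`/`in_flight` gate is one possible way to synchronize the Tx path with the suspend callbacks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical transmit results; not xenvif's real status codes. */
typedef enum { TX_QUEUED, TX_DROPPED } tx_result;

/* Hypothetical per-VIF transmitter state (assumption, for illustration). */
typedef struct {
    atomic_int  suspended;   /* 1 while suspend/resume is cycling the frontend */
    atomic_int  in_flight;   /* Tx calls currently past the gate */
    atomic_uint num_queues;  /* assumed divisor used to pick a Tx queue */
} transmitter;

static void transmitter_init(transmitter *t)
{
    atomic_init(&t->suspended, 0);
    atomic_init(&t->in_flight, 0);
    atomic_init(&t->num_queues, 0);   /* not connected yet */
}

/* Tx path: register as in flight, then check the gate (seq_cst ordering). */
static tx_result transmit(transmitter *t, uint32_t hash, unsigned *queue_out)
{
    atomic_fetch_add(&t->in_flight, 1);
    if (atomic_load(&t->suspended)) {
        atomic_fetch_sub(&t->in_flight, 1);
        return TX_DROPPED;            /* suspend/resume in progress */
    }
    unsigned n = atomic_load(&t->num_queues);
    if (n == 0) {                     /* the workaround: never divide by zero */
        atomic_fetch_sub(&t->in_flight, 1);
        return TX_DROPPED;
    }
    *queue_out = hash % n;
    /* ... a real driver would hand the packet to queue *queue_out here ... */
    atomic_fetch_sub(&t->in_flight, 1);
    return TX_QUEUED;
}

/* Suspend side: close the gate, wait for Tx callers to drain, tear down. */
static void quiesce(transmitter *t)
{
    atomic_store(&t->suspended, 1);
    while (atomic_load(&t->in_flight) != 0)
        ;                             /* a real driver would block, not spin */
    atomic_store(&t->num_queues, 0);
}

/* Resume side: publish the new queue count first, then reopen the gate. */
static void resume(transmitter *t, unsigned queues)
{
    assert(queues > 0);
    atomic_store(&t->num_queues, queues);
    atomic_store(&t->suspended, 0);
}
```

Because the Tx path increments `in_flight` before reading `suspended`, and `quiesce` sets `suspended` before reading `in_flight`, sequentially consistent atomics guarantee that the queue count is never torn down underneath a sender that got past the gate. The zero check still catches the window before the first `resume`.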
XEN|DEBUG: ====> (xenvif.sys + 0000000000008A40)
xenvif|FRONTEND: PATH: device/vif/0
xenvif|FRONTEND: DEBUG CALLERS NEXT PUT PTR: 15
xenvif|FRONTEND: CALLER (0): __FrontendResume to state (PdoResume, FdoAddPhysicalDeviceObject)
xenvif|FRONTEND: CALLER (1): __PdoD3ToD0 to state 3 (PdoStartDevice)
xenvif|FRONTEND: CALLER (2): VifEnable to state 4
xenvif|FRONTEND: CALLER (3): __FrontendSuspend to state 0 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (4): __FrontendResume to state 1 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (5): __PdoD0ToD3 to state 1 (PdoSuspendCallbackLate)
xenvif|FRONTEND: CALLER (6): __PdoD3ToD0 to state 3 (PdoSuspendCallbackLate)
xenvif|FRONTEND: CALLER (7): VifSuspendCallbackLate to state 4
xenvif|FRONTEND: CALLER (8): __FrontendSuspend to state 0 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (9): __FrontendResume to state 1 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (10): __PdoD0ToD3 to state 1 (PdoSuspendCallbackLate)
xenvif|FRONTEND: CALLER (11): __PdoD3ToD0 to state 3 (PdoSuspendCallbackLate)
xenvif|FRONTEND: CALLER (12): VifSuspendCallbackLate to state 4
xenvif|FRONTEND: CALLER (13): __FrontendSuspend to state 0 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (14): __FrontendResume to state 1 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (15): (none) to state 0
xen|BUGCHECK: ====>
xen|BUGCHECK: ASSERTION_FAILURE: FFFFF80113373A40 FFFFF80113373A60 000000000000144E 0000000000000000
xen|BUGCHECK: FILE: E:\jenkins\workspace\nvif_private_martinhar_CA-355670\local\src\xenvif\transmitter.c LINE: 5198
xen|BUGCHECK: TEXT: !NT_SUCCESS(status)