
XenVif div by zero on Tx path after resume.


  • To: Martin Harvey <martin.harvey@xxxxxxxxxx>, "win-pv-devel@xxxxxxxxxxxxxxxxxxxx" <win-pv-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Martin Harvey <martin.harvey@xxxxxxxxxx>
  • Date: Thu, 14 Apr 2022 10:27:35 +0000
  • Delivery-date: Thu, 14 Apr 2022 10:27:47 +0000
  • List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>
  • Thread-index: AdhP6kAbDA19BbYtRECv6F4UUU/GxA==
  • Thread-topic: XenVif div by zero on Tx path after resume.

Hi Paul (and others!)

I have done a bit of digging on this, and it looks like it's due to changes 
made in the suspend/resume path. See log at the bottom for the failure case.

In summary:

- Originally, I suspect, there were two suspend callbacks: early (which
actually lowered the frontend state)...
- ... and late, which runs after the resume, when devices and the system are
powering back up.
- ... and there was some (more) synchronization between the suspend callbacks
and PDO power state changes.

For reasons which are not obvious, the frontend and PDO power states are left
up and running, and the late callback cycles the frontend state down and back
up again, leaving VifSuspendCallbackLate to actually take the frontend back to
the final CONNECTED state.

This raises a whole bunch of questions, not least:

- Does the initial suspend not lower the PDO power state because it's on the
boot path, or for some other reason?
- Why does the early frontend suspend callback just set "online" to false,
instead of actually lowering the state properly?
- Do we actually use any suspend callbacks to request a change in system
power, or is suspend / resume / migrate supposed to be totally transparent?
- How are we supposed to synchronise the Tx path with suspend / resume if the
latter does not command some system or power state change visible to the OS
when we request that the guest suspends?

As it is, the suspend late callbacks happen in a deferred manner, and there's 
nothing to stop the Tx path from making a request to send a packet if the OS 
cannot / has not seen a PDO power state change for the PV network device.
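
To make that window concrete, here is a minimal sketch (not xenvif code) that
replays the state values logged for CALLER (2) through CALLER (7) in the log
at the bottom and counts the steps at which a Tx request would find the
frontend somewhere other than state 4. Treating state 4 as the only Tx-safe
state is an assumption, and count_unsafe_steps and resume_trace are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of xenvif: counts the entries in a state
 * trace that differ from the state assumed to be safe for Tx. */
static size_t count_unsafe_steps(const int *states, size_t count, int safe_state)
{
    size_t unsafe = 0;

    assert(states != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (states[i] != safe_state)
            unsafe++;
    }
    return unsafe;
}

/* State values copied from CALLER (2)..(7) in the log:
 * VifEnable, __FrontendSuspend, __FrontendResume, __PdoD0ToD3,
 * __PdoD3ToD0, VifSuspendCallbackLate. */
static const int resume_trace[] = { 4, 0, 1, 1, 3, 4 };
```

Every step between the two state-4 entries is a point where an unsynchronised
Tx request could arrive.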

As such, the current DIV by zero fix of dropping the packet seems to me to be 
an acceptable workaround. The alternative would be perhaps to explicitly 
synchronize the VIF suspend callbacks with PDO power state changes for the PV 
network device. How?
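
For discussion, the two options can be sketched side by side in portable C11.
This is a model under stated assumptions, not the xenvif implementation:
tx_model_t and every function name here are hypothetical, and the idea that
the divisor is a queue count that reads zero while the frontend is cycled
down is an assumption made for illustration. The zero check is the
"drop the packet" workaround; the open/in_flight pair is one possible shape
for an explicit gate that the suspend callbacks could close before touching
frontend state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the transmitter; none of these names are xenvif's. */
typedef struct {
    atomic_bool open;       /* closed by the suspend side, reopened after resume */
    atomic_uint in_flight;  /* Tx calls currently inside the gate */
    atomic_uint dropped;    /* packets dropped instead of queued */
    uint32_t num_queues;    /* ASSUMPTION: the divisor that can read 0 mid-resume */
} tx_model_t;

static void tx_init(tx_model_t *tx, uint32_t num_queues)
{
    atomic_init(&tx->open, true);
    atomic_init(&tx->in_flight, 0);
    atomic_init(&tx->dropped, 0);
    tx->num_queues = num_queues;
}

/* Suspend side: close the gate, then wait for Tx calls already inside. */
static void tx_gate_close(tx_model_t *tx)
{
    atomic_store(&tx->open, false);
    while (atomic_load(&tx->in_flight) != 0)
        ; /* a real driver would wait on an event rather than spin */
}

/* Resume side: reopen the gate once the frontend is fully back up. */
static void tx_gate_open(tx_model_t *tx)
{
    atomic_store(&tx->open, true);
}

/* Tx side. Returns true and sets *queue_index if the packet was queued,
 * false if it was dropped. */
static bool tx_queue_packet(tx_model_t *tx, uint32_t hash, uint32_t *queue_index)
{
    bool queued = false;

    assert(tx != NULL && queue_index != NULL);

    /* Announce ourselves before checking the gate, so that tx_gate_close()
     * either sees us in flight or we see the gate already closed. */
    atomic_fetch_add(&tx->in_flight, 1);

    if (atomic_load(&tx->open) && tx->num_queues != 0) {
        *queue_index = hash % tx->num_queues;   /* divisor known non-zero */
        queued = true;
    } else {
        atomic_fetch_add(&tx->dropped, 1);      /* workaround: drop */
    }

    atomic_fetch_sub(&tx->in_flight, 1);
    return queued;
}
```

The gate only helps if something on the suspend / resume path actually calls
the close and open functions around the frontend state cycling, which is
exactly the open question above. The zero check is cheap enough to keep
either way.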

Thoughts?

XEN|DEBUG: ====> (xenvif.sys + 0000000000008A40)
xenvif|FRONTEND: PATH: device/vif/0
xenvif|FRONTEND: DEBUG CALLERS NEXT PUT PTR: 15
xenvif|FRONTEND: CALLER (0): __FrontendResume to state  (PdoResume, FdoAddPhysicalDeviceObject)
xenvif|FRONTEND: CALLER (1): __PdoD3ToD0 to state 3 (PdoStartDevice)
xenvif|FRONTEND: CALLER (2): VifEnable to state 4
xenvif|FRONTEND: CALLER (3): __FrontendSuspend to state 0 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (4): __FrontendResume to state 1 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (5): __PdoD0ToD3 to state 1 (PdoSuspendCallbackLate)
xenvif|FRONTEND: CALLER (6): __PdoD3ToD0 to state 3 (PdoSuspendCallbackLate)
xenvif|FRONTEND: CALLER (7): VifSuspendCallbackLate to state 4
xenvif|FRONTEND: CALLER (8): __FrontendSuspend to state 0 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (9): __FrontendResume to state 1 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (10): __PdoD0ToD3 to state 1 (PdoSuspendCallbackLate)
xenvif|FRONTEND: CALLER (11): __PdoD3ToD0 to state 3 (PdoSuspendCallbackLate)
xenvif|FRONTEND: CALLER (12): VifSuspendCallbackLate to state 4
xenvif|FRONTEND: CALLER (13): __FrontendSuspend to state 0 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (14): __FrontendResume to state 1 (FrontendSuspendCallbackLate)
xenvif|FRONTEND: CALLER (15): (none) to state 0

xen|BUGCHECK: ====>
xen|BUGCHECK: ASSERTION_FAILURE: FFFFF80113373A40 FFFFF80113373A60 000000000000144E 0000000000000000
xen|BUGCHECK: FILE: E:\jenkins\workspace\nvif_private_martinhar_CA-355670\local\src\xenvif\transmitter.c LINE: 5198
xen|BUGCHECK: TEXT: !NT_SUCCESS(status)







 

