Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs
Hi Andrew,

> > The bug is still here, so we can exclude a microcode issue.
>
> Good - that is one further angle excluded.  Always make sure you are
> running with up-to-date microcode, but it looks like we are back to
> investigating a logical bug in libvmi or Xen.

I reimplemented a small test, without the Drakvuf/Libvmi layers, that injects traps on one API in Windows (NtCreateUserProcess) in the same way that Drakvuf does.

I did some quick testing yesterday with a Python script that repeatedly started the binary to monitor the API while, at the same time, starting Ansible to run "c:\Windows\system32\reg.exe /?" via WinRM to trigger some process creation.

The traps are working: I see the software breakpoint being hit, the switch to the default view for singlestepping, and the switch back to the execution view, so that's already good.

After a series of tests on 1 or 4 vCPUs, my domain ends up in one of 2 possible states:
- frozen: the mouse doesn't move, so I would guess the vCPUs are blocked. I'm calling the xc_domain_(un)pause APIs multiple times when I write to the shadow copies, but it's always synchronous, so I doubt that they interfered and left the domain "paused". Also, the last log output I get before detecting that Ansible failed to execute says that the resume succeeded and Xen is ready to process VMI events.
- BSOD: apparently I'm corrupting a critical data structure in the operating system, and the WinDbg analysis is inconclusive, so I can't tell much.

Either way, I can't run this test sequentially 10 000 times without a crash.

-> Could you look at the implementation and tell me if I misused the APIs somewhere?
https://gist.github.com/mtarral/d99ce5524cfcfb5290eaa05702c3e8e7

I used the compat APIs, like Drakvuf does.

@Tamas, could you check the traps implementation?
The gist also contains stress-test.py, the small test suite I used, and a screenshot showing the stdout preceding a test failure, when Ansible couldn't contact the WinRM service because the domain was frozen.

Note: I stole some code from libvmi to handle page reads/writes in Xen.

PS: when the domain is frozen and I destroy it, a (null) entry remains in xl list, even though my stress-test.py process is already dead. I have 4 of these entries in my xl list right now. It might be worth looking into as well.

Best regards,
Mathieu

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel