Assuming your source is still mostly based on an earlier version of
mine, have a look at the current hg tree - save/restore is all working
there again as of a week or so ago.
I've looked at what you have done for save/restore in the current hg
tree. My design has some differences.
How are you communicating with the backend driver during restore? In my
first version of xenvbd I relied on the fact that some of the init calls
to the scsiport driver were called at PASSIVE_LEVEL and so it was safe
to make calls to the xenbus routine. Once things are up and running
though, all scsiport code runs at DIRQL (hardware IRQ level) and you
can't call any xenbus code from there as it involves
KeWaitForSingleObject etc.
To work around that I make the pci driver put xenvbd into a mode where
it doesn't process anything (set a flag and fire an irq to xenvbd then
wait for an acknowledgement), and the pci driver itself does all the
xenbus setup for xenvbd, and then enables xenvbd again via the same
mechanism. It works well and the scsiport driver can act more like a
physical hardware device driver - it doesn't need to know anything about
xenbus etc.
Windows is a bit of a pain to work with sometimes - it has better
documentation but its limits are absolutely set in stone!
In my implementation, I share two kernel events between the pci and vbd
drivers: a suspend event and a resume event.
To suspend:
The pci driver and the vbd driver both register a watch on
'control/shutdown'. When the pci driver sees 'suspend', it initializes
the suspend event and waits on it for all child devices to change
state to suspended. Meanwhile, when the vbd/vif driver sees 'suspend',
it sets a flag to fail all SCSI commands and interrupts. (I know this
approach needs improvement, since the StartIo/Interrupt routines run
at a higher IRQL than the watch routine.) After that, the vbd driver
moves the frontend state through closing -> closed -> initialising,
then sets the suspend event to notify the pci driver that this child
device is prepared to suspend. The vbd 'control/shutdown' watcher then
waits for the resume event before clearing the flag, so the driver can
process interrupts and SCSI commands again. <---- this area works
fine; the blk backend state changes properly.
When the pci driver sees that all child devices are suspended, it
terminates some threads, shuts down xenbus, disables/cleans up the
event channels, then issues the hypercall to suspend the system.
<--- this area also works fine; the system suspends successfully.
To resume:
When the system resumes, the pci driver reinitializes xenbus, enables
the event channels, etc., then sets the resume event to notify all
child devices to resume. When the vbd driver gets the resume event, it
reinitializes the vbd device: allocates memory, gets an event channel,
gets grant entries, and initializes the shared ring. Then it tries to
set the frontend state to Connected, to notify the backend that the
frontend is prepared to work. If that succeeds, it clears the flag so
the vbd driver can handle interrupts and SCSI commands again. Now I
hit an issue when setting the frontend state to Connected: the blk
backend changes its state to Closing and then Closed instead of
Connected.
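For reference, my mental model of how blkback reacts to frontend state
changes is roughly the following (a simplified sketch from my reading
of the Linux blkback source, not authoritative):

```c
/* Simplified model of how I understand blkback reacts to frontend
 * state changes (a hypothetical sketch, not the real blkback code). */
#include <stdbool.h>

typedef enum {            /* XenbusState* values, abbreviated */
    ST_INITIALISING = 1,
    ST_INIT_WAIT    = 2,
    ST_INITIALISED  = 3,
    ST_CONNECTED    = 4,
    ST_CLOSING      = 5,
    ST_CLOSED       = 6,
} xb_state;

/* ring_ok models connect_ring() succeeding: the ring-ref and
 * event-channel read from xenstore are valid and mappable. */
static xb_state backend_react(xb_state frontend, bool ring_ok)
{
    switch (frontend) {
    case ST_INITIALISED:
    case ST_CONNECTED:
        /* backend tries to map the ring; failure is fatal */
        return ring_ok ? ST_CONNECTED : ST_CLOSING;
    case ST_CLOSING:
        return ST_CLOSING;
    case ST_CLOSED:
        return ST_CLOSED;
    default:
        return ST_INIT_WAIT;   /* keep waiting for the frontend */
    }
}
```

If this model is right, the backend jumping to Closing right after the
frontend writes Connected would point at the ring-mapping step
failing, e.g. blkback reading ring-ref/event-channel values that were
written before it re-entered its waiting state. But that is only my
guess.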
One more piece of info: after resume, the vbd driver can
read/write/watch xenstore keys properly. It can also get an event
channel, get grant entries, initialize the shared ring, and write the
proper info to xenstore. The key problem is that I don't know why the
blk backend changes state to Closing/Closed. I checked xenstore, and
the ring-ref and event channel in DomU look OK. Where can I get some
logs/info to figure out why?
Also, to anyone who is familiar with the blk backend driver: please
give me some guidance on which cases cause the blk backend to change
state from InitWait to Closing/Closed. Is there any info I write to
xenstore that I need to check?
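In case it helps, the sanity check I do on the frontend directory
before switching to Connected is roughly this (the key names are my
assumption of what blkback reads when it maps the ring; treat the list
as a guess, not a spec):

```c
/* Sketch of a sanity check on what the vbd frontend has written to
 * xenstore before it sets state to Connected. The key names
 * (ring-ref, event-channel) are my assumption of what blkback
 * reads; this is illustrative, not driver code. */
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

struct xs_pair { const char *key; const char *val; };

static bool is_number(const char *s)
{
    if (!s || !*s)
        return false;
    for (; *s; s++)
        if (!isdigit((unsigned char)*s))
            return false;
    return true;
}

/* Returns true if the frontend directory holds numeric ring-ref and
 * event-channel entries (the minimum blkback needs to map the ring). */
static bool frontend_ready(const struct xs_pair *p, int n)
{
    bool ring = false, evtchn = false;
    for (int i = 0; i < n; i++) {
        if (!strcmp(p[i].key, "ring-ref"))
            ring = is_number(p[i].val);
        if (!strcmp(p[i].key, "event-channel"))
            evtchn = is_number(p[i].val);
    }
    return ring && evtchn;
}
```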
Thanks,
Wayne