[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [win-pv-devel] xenvbd (8.x) - blkback/tapdisk3 problems



Hello.

On Mon, 31 Oct 2016, Martin Cerveny wrote:

Some update.

On Mon, 31 Oct 2016, Paul Durrant wrote:
Sent: 30 October 2016 12:18
On Fri, 28 Oct 2016, Paul Durrant wrote:
-----Original Message-----
I have problems with xenvbd (8.x). There was NOT problem with older pv-
drivers xenvbd (7.2x).
Questions @ bottom.

I use remote raw disk as source (multipath+iscsi+iser+ib).
Two configs:

--------------------------

1) use direct blkback (format=raw, vdev=hda, access=rw,
target=/dev/mapper/3600144f07a0542580000568ba94a0001)

Performance is good, but __unusable__ for working.

Every few seconds/minutes (randomly, depends on disk load) the
windows
hung on io-operations. I usually saw this more often during write
operations.

Sometimes (1:10) I saw "PdoReset" in "DebugView" (DomU):

00003034        10:12:32        XENVBD|__PdoReset:Target[0] ====>
00003054        10:12:53        XENVBD|__PdoReset:Target[0] <====

There is also restart log in Dom0, but no errors on disks/iscsi:

[ 3919.034421] xen-blkback:backend/vbd/3/768: prepare for reconnect
[ 3919.039869] xen-blkback:ring-ref 32, event-channel 40, protocol 1
(x86_64-
abi)


Yes, XENVBD is being asked to reset because Windows thinks the storage
is stalled and it looks like it was probably right. Suggests a loss of
event notification somewhere.

Ok, I suppose Dom0 logs are result of DomU reset, no problem.

Enviroments:
- Windows7 x64
- tested signed winpv drivers 8.1 and primary on development drivers 8.2
- xen 4.5.3, 4.6 and primary 4.7.0
- kernels "XenServer" - kernel-3.10.41-353.380450 (and others from XS6.5)
and kernel-3.10.96-495.383045.x86_64 (and others from XS7)
- blktap3 - blktap-3.0.0.xs1001-xs6.5.0 and blktap-3.2.0.xs1087-
xs7.0.0.x86_64

---------------------

Questions:

What is buggy in "direct blkback" chain ?
No idea. Possibly blkback, possibly the underlying storage. Your kernels are
old and blkback has undergone many changes in more recent kernels.

(I suppose that XS-kernel is super-tuned for Dom0 and there won't be such
problem).

Now updated Dom0:
- fedora 24 + update
- kernel - 4.7.9-200.fc24.x86_64
- xen 4.7.0 + some backports from XS

The problem is the same.
When were "changes" to blkback applied ?


Clearly a 4.7 kernel is about as up-to-date as you want and I am not aware of any changes in blkback since then so it does tend to suggest the problem is elsewhere. It could well be that XENVBD has some bad interaction with blkback... In my test environments I normally use QEMU qdisk as a backend. Do you see the same issues if you, say, point blkback at a loopback file mount or even an nbd device?

Ok. I will prepare and test and debug more in next weeks.

I add some Trace() and compile with defined "DBG".
I found problematic section and do some workaround.

The restart is done during "trim" handling "timeout" that cause PdoReset().
PdoDispatch() -> PdoDispatchControl() -> PdoManageDataSetAttributes() -> PdoSendTrimSynchronous() -> PdoSendAwaitSrb() == TIMEOUT(>9s)

===================

Example of ok trim log (without timeout between 00000010-00000011):
00000007        12:03:16        XENDISK|PdoDispatchControl:XXX 2987012
00000008        12:03:16        XENDISK|PdoManageDataSetAttributes:XXX 1
00000009        12:03:16        XENDISK|PdoSendTrimSynchronous:TRIM[0] 400 @ 
182c080
00000010        12:03:16        XENDISK|PdoSendAwaitSrb:XXX
00000011        12:03:16        XENDISK|__PdoSendAwaitSrb:XXX
00000012        12:03:16        XENDISK|PdoCompleteIrp:XXX

Example of timeout of trim (with timeout 00000021-00000027 (~37sec)
(timeot reset at 00000022 after +-9sec)):
00000017        12:03:32        XENDISK|PdoDispatchControl:XXX 2987012
00000018        12:03:32        XENDISK|PdoManageDataSetAttributes:XXX 1
00000019        12:03:32        XENDISK|PdoSendTrimSynchronous:TRIM[0] 4db8 @ 
6f1a168
00000020        12:03:32        XENDISK|PdoSendTrimSynchronous:TRIM[1] 2a20 @ 
124c38c0
00000021        12:03:32        XENDISK|PdoSendAwaitSrb:XXX
00000022        12:03:41        XENVBD|__PdoReset:Target[0] ====>
00000023        12:03:41        XENVBD|PdoReset:Target[0] ====> (Irql=2)
00000024        12:03:41        XENVBD|__PdoPauseDataPath:Target[0] : Waiting 
for 2 Submitted requests
00000025        12:04:09        XENVBD|NotifierDpc:Target[0] : Paused, 2 
outstanding
00000026        12:04:09        XENVBD|NotifierDpc:Target[0] : Paused, 1 
outstanding
00000027        12:04:09        XENDISK|__PdoSendAwaitSrb:XXX
00000028        12:04:09        XENDISK|PdoCompleteIrp:XXX
00000029        12:04:09        XENVBD|__PdoPauseDataPath:Target[0] : 0/2 
Submitted requests left (27838 iterrations)
00000030        12:04:09        XENVBD|__FrontendSetState:Target[0] @ (2) =====>
00000031        12:04:09        XENVBD|__FrontendSetState:Target[0] : ENABLED 
----> CLOSING
00000032        12:04:09        XENVBD|__FrontendSetState:Target[0] : in state 
CONNECTED
00000033        12:04:09        XENVBD|__UpdateBackendPath:<< 
/local/domain/0/backend/vbd/2/768
00000034        12:04:09        XENVBD|__UpdateBackendPath:>> 
/local/domain/0/backend/vbd/2/768
00000035        12:04:09        XENVBD|__WaitState:Target[0] : BACKEND_STATE  
-> Connected
00000036        12:04:09        XENVBD|___SetState:Target[0] : FRONTEND_STATE 
-> Closing
00000037        12:04:09        XENVBD|FdoScan:waiting...
00000038        12:04:09        XENVBD|__WaitState:Target[0] : BACKEND_STATE  
-> Closing
00000039        12:04:09        XENVBD|___SetState:Target[0] : FRONTEND_STATE 
-> Closed
00000040        12:04:09        XENVBD|FdoScan:waiting...
00000041        12:04:09        XENVBD|__WaitState:Target[0] : BACKEND_STATE  
-> Closed
00000042        12:04:09        XENVBD|__FrontendSetState:Target[0] : in state 
CLOSING
00000043        12:04:09        XENVBD|__FrontendSetState:Target[0] @ (2) 
<===== (SUCCEEDED)
00000044        12:04:09        XENVBD|__FrontendSetState:Target[0] @ (2) =====>
00000045        12:04:09        XENVBD|__FrontendSetState:Target[0] : CLOSING 
----> CLOSED
00000046        12:04:09        XENVBD|__FrontendSetState:Target[0] : in state 
CLOSED
00000047        12:04:09        XENVBD|__FrontendSetState:Target[0] @ (2) 
<===== (SUCCEEDED)
00000048        12:04:09        XENVBD|__FrontendSetState:Target[0] @ (2) =====>
00000049        12:04:09        XENVBD|__FrontendSetState:Target[0] : CLOSED 
----> ENABLED
00000050        12:04:09        XENVBD|__UpdateBackendPath:<< 
/local/domain/0/backend/vbd/2/768
00000051        12:04:09        XENVBD|__UpdateBackendPath:>> 
/local/domain/0/backend/vbd/2/768
00000052        12:04:09        XENVBD|FrontendWriteUsage:Target[0] : DUMP 
NOT_HIBER PAGE
00000053        12:04:09        XENVBD|___SetState:Target[0] : FRONTEND_STATE 
-> Initialising
00000054        12:04:09        XENVBD|FdoScan:waiting...
00000055        12:04:09        XENVBD|__WaitState:Target[0] : BACKEND_STATE  
-> Closed
00000056        12:04:09        XENVBD|__WaitState:Target[0] : BACKEND_STATE  
-> InitWait
00000057        12:04:09        XENVBD|PdoUpdateInquiryData:Target[0] : 
VDI-UUID = {00000000-0000-0000-0000-000000000000}
00000058        12:04:09        XENVBD|FrontendPrepare:Target[0] : BackendId 0 
(/local/domain/0/backend/vbd/2/768)
00000059        12:04:09        XENVBD|__FrontendSetState:Target[0] : in state 
PREPARED
00000060        12:04:09        XENVBD|___SetState:Target[0] : FRONTEND_STATE 
-> Initialised
00000061        12:04:09        XENVBD|FdoScan:waiting...
00000062        12:04:09        XENVBD|__WaitState:Target[0] : BACKEND_STATE  
-> InitWait
00000063        12:04:09        XENVBD|FdoScan:waiting...
00000064        12:04:09        XENVBD|__WaitState:Target[0] : BACKEND_STATE  
-> Connected
00000065        12:04:09        XENVBD|___SetState:Target[0] : FRONTEND_STATE 
-> Connected
00000066        12:04:09        XENVBD|FdoScan:waiting...
00000067        12:04:09        XENVBD|__FrontendSetState:Target[0] : in state 
CONNECTED
00000068        12:04:09        XENVBD|__FrontendSetState:Target[0] : in state 
ENABLED
00000069        12:04:09        XENVBD|__FrontendSetState:Target[0] @ (2) 
<===== (SUCCEEDED)
00000070        12:04:09        XENVBD|PdoReset:Target[0] <==== (Irql=2)
00000071        12:04:09        XENVBD|__PdoReset:Target[0] <====

=============================

The "disk-tap" does not forward/support "Trim" (with log "DISCARD not supported, 
suppressing") and works ok.
I found workaround - do not forward trim reqests.
I commented out case "DeviceDsmAction_Trim" in "PdoManageDataSetAttributes()" to fall to 
"PdoForwardIrpAndForget()".
(Windows 7 do not ask for "DeviceTrimProperty" to disable it in properties.)

Martin

Thanks, Martin
 Paul
Thanks for answers or hints, Martin Cerveny
 Paul
Thanks for answers, Martin Cerveny

_______________________________________________
win-pv-devel mailing list
win-pv-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/cgi-bin/mailman/listinfo/win-pv-devel



_______________________________________________
win-pv-devel mailing list
win-pv-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/cgi-bin/mailman/listinfo/win-pv-devel


_______________________________________________
win-pv-devel mailing list
win-pv-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/cgi-bin/mailman/listinfo/win-pv-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.