
Re: Use of newer locking primitives...



On 30/08/2022 13:40, Martin Harvey wrote:


-----Original Message-----
From: Paul Durrant <xadimgnik@xxxxxxxxx>

I wouldn't necessarily be convinced that moving away from the atomic lock/queue 
would be any faster.

Hmm. Maybe it depends on the H/W cache implementation. If you're doing 
interlocked ops on the lock, and non-interlocked ops on a queue, and they are 
on separate cache lines... maybe.
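
To make that concrete, a minimal layout sketch (hypothetical names, not the
actual XenVif structures): keep the contended lock word and the deferred-packet
queue on separate cache lines, so interlocked traffic on one doesn't keep
bouncing the line that holds the other.

    #include <ntddk.h>

    #define CACHE_LINE 64

    typedef struct _TX_SYNC {
        /* Contended word: every CPU hits this with interlocked ops. */
        __declspec(align(CACHE_LINE)) volatile LONG  Lock;   /* 0 = free, 1 = held */

        /* Deferred packets: pushed by senders that find Lock held, drained
         * by whoever owns Lock. Kept off the Lock cache line. */
        __declspec(align(CACHE_LINE)) SLIST_HEADER   Queue;
    } TX_SYNC, *PTX_SYNC;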

Limiting the queue drain may help from a fairness PoV, which I guess could be 
achieved by actually blocking on the ring lock if the queue gets to a certain 
length.
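
One way to get that, sketched below against the TX_SYNC layout above
(QUEUE_LIMIT and TxTransmitAndDrainLocked() are made up for illustration): a
sender only defers its packet while the backlog is short; past the threshold it
spins for the ring lock itself, which throttles producers and bounds what the
eventual lock holder has to drain.

    #define QUEUE_LIMIT 64

    extern VOID TxTransmitAndDrainLocked(PTX_SYNC Sync, PSLIST_ENTRY Packet);
                                /* owner path: put Packet on the ring and drain
                                 * the deferred queue (not shown here)          */

    VOID
    TxSendOrDefer(
        IN  PTX_SYNC        Sync,
        IN  PSLIST_ENTRY    Packet
        )
    {
        for (;;) {
            if (InterlockedCompareExchange(&Sync->Lock, 1, 0) == 0) {
                TxTransmitAndDrainLocked(Sync, Packet);
                InterlockedExchange(&Sync->Lock, 0);
                return;
            }

            if (ExQueryDepthSList(&Sync->Queue) < QUEUE_LIMIT) {
                /* Backlog is short: leave the packet for the lock holder. */
                InterlockedPushEntrySList(&Sync->Queue, Packet);
                return;
            }

            /* Backlog is long: wait for the lock rather than growing it. */
            KeStallExecutionProcessor(1);
        }
    }

A real version would also have to re-check the queue after dropping Lock,
otherwise a packet pushed in the window between the holder's last drain and the
release sits there until the next sender shows up.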

- We do a XenBus GntTabRevoke for every single damn fragment individually on 
the Tx fast path. There are probably some ways of batching / deferring these so 
as to make everything run a bit more smoothly.
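
Purely as a sketch of the batching/deferring idea (GntTabRevoke() below stands
in for the real XENBUS_GNTTAB revoke call, and the batch size is arbitrary):
collect the fragments' grant entries while walking the Tx responses and revoke
them in one pass afterwards, instead of interleaving a barriered update per
fragment with the ring processing.

    #include <ntddk.h>

    #define REVOKE_BATCH 64

    extern VOID GntTabRevoke(PVOID Entry);  /* stand-in for the XenBus revoke */

    typedef struct _REVOKE_BATCH_STATE {
        PVOID   Entry[REVOKE_BATCH];
        ULONG   Count;
    } REVOKE_BATCH_STATE, *PREVOKE_BATCH_STATE;

    static VOID
    RevokeBatchFlush(
        IN  PREVOKE_BATCH_STATE Batch
        )
    {
        ULONG Index;

        for (Index = 0; Index < Batch->Count; Index++)
            GntTabRevoke(Batch->Entry[Index]);

        Batch->Count = 0;
    }

    static VOID
    RevokeBatchAdd(
        IN  PREVOKE_BATCH_STATE Batch,
        IN  PVOID               Entry
        )
    {
        if (Batch->Count == REVOKE_BATCH)
            RevokeBatchFlush(Batch);

        Batch->Entry[Batch->Count++] = Entry;
    }

The response loop would call RevokeBatchAdd() per fragment and
RevokeBatchFlush() once at the end; the total number of revokes is the same,
they just stop punctuating the hot path.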

A revoke ought to be fairly cheap; it's essentially some bit flips. It does, of 
course, involve some memory barriering... but is that really slowing things 
down?
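
Roughly what those bit flips look like for a v1 grant entry (flag values as in
Xen's public grant_table.h; the structure and retry policy here are just a
sketch, not the XenBus code):

    #include <ntddk.h>

    #define GTF_permit_access   0x0001
    #define GTF_reading         0x0008
    #define GTF_writing         0x0010

    typedef struct _GRANT_ENTRY_V1 {
        USHORT  Flags;
        USHORT  DomId;
        ULONG   Frame;
    } GRANT_ENTRY_V1, *PGRANT_ENTRY_V1;

    static BOOLEAN
    GrantEntryRevoke(
        IN  volatile GRANT_ENTRY_V1 *Entry
        )
    {
        USHORT  Old;
        USHORT  New;

        do {
            Old = Entry->Flags;

            if (Old & (GTF_reading | GTF_writing))
                return FALSE;               /* backend still using the page */

            New = (USHORT)(Old & ~GTF_permit_access);

            /* The interlocked op is also the barrier; there isn't anything
             * cheaper than a LOCK CMPXCHG available for this. */
        } while (InterlockedCompareExchange16((volatile SHORT *)&Entry->Flags,
                                              (SHORT)New,
                                              (SHORT)Old) != (SHORT)Old);

        return TRUE;
    }

So the unavoidable per-fragment cost is one locked cmpxchg, plus whatever
happens on the failure path discussed below.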

Well, it does do a SCHEDOP_yield, where I think perhaps 
KeStallExecutionProcessor might be better for a shorter (microsecond) wait. If 
the yield is a hypercall, then I think the assorted context switching etc. is 
wasted work, and possibly gives longer wait granularity.

Surely KeStallExecutionProcessor() is likely to HLT while waiting, so why is 
that going to be better than a yield? Also the yield is only done if the 
interlocked op fails, which it shouldn't do in the normal case.
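
For reference, the alternative being floated would look something like this
(TryRevokeOnce() and HypercallSchedYield() are placeholders for the real
interlocked attempt and the SCHEDOP_yield wrapper; the spin counts are
arbitrary): retry briefly with KeStallExecutionProcessor() before giving the
CPU back via the hypercall.

    #include <ntddk.h>

    #define REVOKE_SPIN_ATTEMPTS    10
    #define REVOKE_SPIN_US          1

    extern BOOLEAN TryRevokeOnce(PVOID Entry);  /* one interlocked attempt */
    extern VOID    HypercallSchedYield(VOID);   /* SCHEDOP_yield wrapper   */

    static VOID
    RevokeWithBoundedSpin(
        IN  PVOID   Entry
        )
    {
        ULONG Attempt;

        for (;;) {
            for (Attempt = 0; Attempt < REVOKE_SPIN_ATTEMPTS; Attempt++) {
                if (TryRevokeOnce(Entry))
                    return;

                KeStallExecutionProcessor(REVOKE_SPIN_US);
            }

            /* Still busy after a few microseconds of spinning: give the vCPU
             * back to the hypervisor rather than burning it. */
            HypercallSchedYield();
        }
    }

Whether this beats the plain yield is exactly the open question above, and as
noted it is only reached when the interlocked op fails, which should be rare.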


It would be *really nice* to get vTune / uProf working on guests with all the 
uArchitecture counters enabled, so we could actually find out exactly how many 
cycles get spent where.

We see 99% of TxPollDpcs completing rapidly (microseconds), and then very 
occasionally we get a 4ms PollDpc. This timing is via ETW. If I get time to 
repro it (in a couple of weeks), then I might try changing the stall 
mechanisms, and possibly the locking, separately.


Right, that could be because something has just dumped (or is still dumping) a 
load of packets on the queue. Whoever has the lock has to drain the entire 
queue... that's the lack of fairness I was referring to. If there's a need to 
let other DPCs run then perhaps that requirement could be relaxed... but we'd 
have to be careful that packets don't get stuck in the queue.
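
If the drain-everything rule were relaxed, it could look something like this
(quota and names are hypothetical, continuing the TX_SYNC sketch from earlier):
the DPC processes at most a fixed number of deferred packets per pass and then
re-queues itself, so other DPCs on the CPU get a turn while anything still
queued is guaranteed another pass rather than getting stuck.

    #define TX_DPC_QUOTA 256

    extern VOID TxTransmitPacket(PTX_SYNC Sync, PSLIST_ENTRY Packet);
                            /* put one packet on the shared ring (not shown) */

    VOID
    TxPollDpc(
        IN  PKDPC   Dpc,
        IN  PVOID   Context,        /* PTX_SYNC */
        IN  PVOID   Argument1,
        IN  PVOID   Argument2
        )
    {
        PTX_SYNC    Sync = Context;
        ULONG       Done = 0;

        UNREFERENCED_PARAMETER(Argument1);
        UNREFERENCED_PARAMETER(Argument2);

        while (Done < TX_DPC_QUOTA) {
            PSLIST_ENTRY Packet = InterlockedPopEntrySList(&Sync->Queue);

            if (Packet == NULL)
                return;                     /* queue drained: nothing left */

            TxTransmitPacket(Sync, Packet);
            Done++;
        }

        /* Quota used up but the queue may not be empty: let other DPCs on
         * this CPU run and come back for the rest. */
        KeInsertQueueDpc(Dpc, NULL, NULL);
    }

(The SLIST pop gives LIFO order; a real drain would flush and reverse the list
to keep packets in order.)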

  Paul

MH.




 

