Re: Use of newer locking primitives...
On 30/08/2022 13:40, Martin Harvey wrote:
> -----Original Message-----
> From: Paul Durrant <xadimgnik@xxxxxxxxx>
>
>> I wouldn't necessarily be convinced that moving away from the atomic
>> lock/queue would be any faster.
>
> Humm. Maybe depends on H/W cache implementation. If you're doing
> interlocked ops on the lock, and non-interlocked ops on a queue, and
> they are on separate cache lines... maybe.
>
>> Limiting the queue drain may help from a fairness PoV, which I guess
>> could be achieved by actually blocking on the ring lock if the queue
>> gets to a certain length.
>
>>> - We do a XenBus GntTabRevoke for every single damn fragment
>>>   individually on the Tx fast path. There are probably some ways of
>>>   batching / deferring these so as to make everything run a bit more
>>>   smoothly.
>
>> A revoke ought to be fairly cheap; it's essentially some bit flips. It
>> does, of course, involve some memory barriering... but is that really
>> slowing things down?
>
> Well, it does do a SCHEDOP_yield, where I think perhaps
> KeStallExecutionProcessor might be better for a shorter (microsecond)
> wait. If the yield is a hypercall, then I think assorted context
> switching etc. is wasted work, and possibly longer wait granularity.

Surely KeStallExecutionProcessor() is likely to HLT while waiting, so why
is that going to be better than a yield? Also, the yield is only done if
the interlocked op fails, which it shouldn't do in the normal case.

> It would be *really nice* to get vTune / uProf working on guests with
> all the uArchitecture counters working, so we could actually find out
> exactly how many cycles get spent where. We see 99% of TxPollDpc's
> completing rapidly (microseconds), and then very occasionally we get a
> 4ms PollDpc. This is timing via ETW. If I get time to repro it (couple
> of weeks), then I might try changing the stall mechanisms, and possibly
> the locking, separately.

Right, that could be because something has just dumped (or is still
dumping) a load of packets on the queue. Whoever has the lock has to
drain the entire queue... that's the lack of fairness I was referring to.
If there's a need to let other DPCs run then perhaps that requirement
could be relaxed... but we'd have to be careful that packets don't get
stuck in the queue.

  Paul

> MH.
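To make the cache-line point above concrete, here is a minimal C sketch of
the kind of layout Martin describes: the word hit by interlocked ops lives
on its own cache line, separate from the queue state that only the lock
holder touches. The structure and field names are purely illustrative and
are not taken from the actual XENVIF code.

#include <ntddk.h>

/*
 * Illustrative only: keep the lock word on its own cache line so CPUs
 * spinning on it with interlocked ops do not bounce the line that the
 * current owner is updating with plain (non-interlocked) stores.
 */
typedef struct _TX_SHARED {
    DECLSPEC_CACHEALIGN volatile LONG   Lock;   /* interlocked ops only      */
    DECLSPEC_CACHEALIGN LIST_ENTRY      Queue;  /* touched by the lock owner */
} TX_SHARED;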
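On the per-fragment GntTabRevoke point, one way to take the revoke off the
Tx fast path would be to collect grant references and revoke them in a
single later pass. The sketch below is only an outline of that idea:
GnttabRevokeForeignAccess() is a stand-in for whatever the driver's
XENBUS_GNTTAB interface actually provides, and the batch size is an
arbitrary guess that would need real measurement.

#include <ntddk.h>

#define REVOKE_BATCH_SIZE   256

typedef ULONG GRANT_REF;

/* Placeholder for the real revoke call exposed by the XENBUS_GNTTAB
 * interface; not an actual function name in the drivers. */
VOID GnttabRevokeForeignAccess(GRANT_REF Ref);

typedef struct _REVOKE_BATCH {
    GRANT_REF   Refs[REVOKE_BATCH_SIZE];
    ULONG       Count;
} REVOKE_BATCH;

static VOID
RevokeBatchFlush(REVOKE_BATCH *Batch)
{
    ULONG Index;

    /* Revoke everything collected since the last flush, off the fast path. */
    for (Index = 0; Index < Batch->Count; Index++)
        GnttabRevokeForeignAccess(Batch->Refs[Index]);

    Batch->Count = 0;
}

static VOID
RevokeBatchAdd(REVOKE_BATCH *Batch, GRANT_REF Ref)
{
    /* Defer the revoke; flush only when the batch fills up. */
    Batch->Refs[Batch->Count++] = Ref;

    if (Batch->Count == REVOKE_BATCH_SIZE)
        RevokeBatchFlush(Batch);
}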
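The SCHEDOP_yield vs. KeStallExecutionProcessor discussion comes down to
what the contended path of the lock does. A rough sketch of the two
variants follows, assuming a simple interlocked lock word; this is not the
actual XENVIF locking code, and HypercallSchedOpYield() is a hypothetical
wrapper for the yield hypercall, not a real function in the drivers.

#include <ntddk.h>

typedef struct _TX_LOCK {
    volatile LONG   Owned;      /* 0 = free, 1 = held */
} TX_LOCK;

/* Hypothetical wrapper around the SCHEDOP_yield hypercall. */
VOID HypercallSchedOpYield(VOID);

static VOID
TxLockAcquireYield(TX_LOCK *Lock)
{
    /* Behaviour under discussion: yield the vCPU whenever the
     * interlocked acquire fails. */
    while (InterlockedCompareExchange(&Lock->Owned, 1, 0) != 0)
        HypercallSchedOpYield();
}

static VOID
TxLockAcquireStall(TX_LOCK *Lock)
{
    /* Alternative Martin suggests: a short busy-wait instead of a
     * hypercall on the contended path. */
    while (InterlockedCompareExchange(&Lock->Owned, 1, 0) != 0)
        KeStallExecutionProcessor(1);   /* spin for ~1us */
}

static VOID
TxLockRelease(TX_LOCK *Lock)
{
    InterlockedExchange(&Lock->Owned, 0);
}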
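The fairness point (and the occasional 4ms TxPollDpc) could be addressed
along the lines Paul mentions: cap how much the lock holder drains in one
DPC pass and re-queue the DPC when the cap is hit, so other DPCs get to
run but the remaining packets are not stranded. Again only a sketch: the
queue type, TxQueuePop() and TxSendPacket() are placeholders rather than
the real driver API, and the batch limit is arbitrary.

#include <ntddk.h>

#define TX_DPC_BATCH_LIMIT  64      /* packets drained per DPC pass */

typedef struct _TX_PACKET TX_PACKET;

typedef struct _TX_QUEUE {
    LIST_ENTRY  List;
    KDPC        Dpc;
} TX_QUEUE;

/* Placeholders for the real queue/transmit operations. */
TX_PACKET *TxQueuePop(TX_QUEUE *Queue);     /* returns NULL when empty */
VOID       TxSendPacket(TX_PACKET *Packet);

_Function_class_(KDEFERRED_ROUTINE)
VOID
TxPollDpc(KDPC *Dpc, PVOID Context, PVOID Arg1, PVOID Arg2)
{
    TX_QUEUE    *Queue = Context;
    ULONG       Count;

    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(Arg1);
    UNREFERENCED_PARAMETER(Arg2);

    for (Count = 0; Count < TX_DPC_BATCH_LIMIT; Count++) {
        TX_PACKET *Packet = TxQueuePop(Queue);

        if (Packet == NULL)
            return;             /* queue fully drained */

        TxSendPacket(Packet);
    }

    /*
     * Batch limit hit: let other DPCs run, but re-queue ourselves so the
     * packets still on the queue are not left stuck there.
     */
    (VOID) KeInsertQueueDpc(&Queue->Dpc, NULL, NULL);
}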