[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 13/16] xen/arm: Add support for GIC v3
On Tue, 3 Jun 2014, Ian Campbell wrote: > On Tue, 2014-06-03 at 10:05 +0100, Julien Grall wrote: > > > > On 03/06/14 09:54, Ian Campbell wrote: > > > On Mon, 2014-06-02 at 18:33 +0100, Stefano Stabellini wrote: > > >>> +static inline void gicv3_restore_lr(int nr_lrs, const struct vcpu *v) > > >>> +{ > > >>> + /* Fall through for all the cases */ > > >>> + switch ( nr_lrs ) > > >>> + { > > >>> + case 16: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[15], ICH_LR15_EL2); > > >>> + case 15: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[14], ICH_LR14_EL2); > > >>> + case 14: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[13], ICH_LR13_EL2); > > >>> + case 13: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[12], ICH_LR12_EL2); > > >>> + case 12: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[11], ICH_LR11_EL2); > > >>> + case 11: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[10], ICH_LR10_EL2); > > >>> + case 10: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[9], ICH_LR9_EL2); > > >>> + case 9: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[8], ICH_LR8_EL2); > > >>> + case 8: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[7], ICH_LR7_EL2); > > >>> + case 7: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[6], ICH_LR6_EL2); > > >>> + case 6: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[5], ICH_LR5_EL2); > > >>> + case 5: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[4], ICH_LR4_EL2); > > >>> + case 4: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[3], ICH_LR3_EL2); > > >>> + case 3: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[2], ICH_LR2_EL2); > > >>> + case 2: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[1], ICH_LR1_EL2); > > >>> + case 1: > > >>> + WRITE_SYSREG(v->arch.gic.v3.lr[0], ICH_LR0_EL2); > > >>> + break; > > >>> + default: > > >>> + BUG(); > > >>> + } > > >>> +} > > >> > > >> Given that the number of LR registers has grown up quite a bit from > > >> GICv2, we should optimize this code and only save and restore the > > >> registers that are actually in used by checking on lr_mask. > > > > > > I'm not convinced that will be faster, the code above involves a > > > straight line with no branches, but potentially some unnecessary stores. > > > Looping or lr_mask involves a loop, a test_bit and most critically a > > > switch over the lr number (to select the correct sysreg, which is a > > > static register, not an opcode argument), but has no redundant stores. I > > > suspect the straight line version will actually be quicker. > > > > The solution suggested by Stefano doesn't work as it is. We only > > save/restore the LR register marked as used in the lr_mask (which is > > per-VCPU). We may end up with LR from the previous running VCPU because > > Xen will restore only the necessary LRs. > > > > To fix the solution would be to clear the LRs when we save all LRs. But > > I'm not convince it will be faster than the current solution. > > yes, that's a second nail in that idea's coffin I think. I agree that we would need to run a few benchmarks to be sure and I agree that this is OK in first instance. However keep in mind that last time I checked on GICv2 reading or writing to one LR costs about 120-140 nanoseconds. I dropped the idea of using lr_mask on GICv2 because there are usually only very few LRs (4), so it wouldn't make sense there but I bet it makes sense here. 16*130 ns = 2080 ns That would be about 4000 cycles on a 2Ghz processor. We can certainly loop through a bitmask in far less than that. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |