Re: [Xen-devel] [PATCH 28/45] xen: arm: arm64 trap handling.
On Thu, 2013-02-14 at 14:39 +0000, Ian Campbell wrote:
> > > +hyp_sync:
> > > + entry hyp=1
> > > + msr daifclr, #2
> > > + adr lr, return_to_hypervisor
> > > + mov x0, sp
> > > + b do_trap_hypervisor
> >
> > This pattern (call fnX with arg0 == sp and lr == fnY) is repeated
> > quite a few times. Could we have another tidying macro for it?
>
> I'm half considering doing away with the preload lr+b and just using bl
> instead and putting the tail stuff in a macro like the entry stuff.
>
> But if we do stick with this way then sure.
I didn't go the exit macro route (still undecided about that) but I did
decide to drop the lr+b stuff. I have a feeling that the branch
predictor will do better with the simple bl version, which ISTR reading
incorporates a hint to the predictor that this is a subroutine call (but
I can't find that reference right now). TBH even that smells a bit too
much of premature optimisation given there's no actual hardware yet, but
the bl version is the straightforward/obvious thing to use so let's go
with that.
I did experiment with a macro:
@@ -96,6 +96,14 @@ lr .req x30 // link register
.endm
/*
+ * Call a function, passing sp as the first and only argument
+ */
+ .macro call_with_sp, fn
+ mov x0, sp
+ bl \fn
+ .endm
+
+/*
* Bad Abort numbers
*-----------------
*/
@@ -130,15 +138,14 @@ hyp_error_invalid:
hyp_sync:
entry hyp=1
msr daifclr, #2
- adr lr, return_to_hypervisor
- mov x0, sp
- b do_trap_hypervisor
+ call_with_sp do_trap_hypervisor
+ b return_to_hypervisor
hyp_irq:
But TBH I think that looks worse than the open coded:
hyp_irq:
entry hyp=1
- adr lr, return_to_hypervisor
mov x0, sp
- b do_trap_irq
+ bl do_trap_irq
+ b return_to_hypervisor
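For comparison, a macro that also folded in the tail branch might look
something like the sketch below. This is purely illustrative and not part
of the patch: the name call_with_sp_then and its two-argument form are
made up here, and it assumes the same entry macro and handler labels as
the hunks above.

```asm
/*
 * Hypothetical variant (not in the patch): call \fn with sp as the
 * first and only argument, then branch to \ret once it returns.
 */
        .macro  call_with_sp_then, fn, ret
        mov     x0, sp          // arg0 = pointer to the saved register frame
        bl      \fn             // plain subroutine call, lr set by bl
        b       \ret            // common exit path
        .endm

hyp_irq:
        entry   hyp=1
        call_with_sp_then do_trap_irq, return_to_hypervisor
```

That hides both the argument setup and the exit path behind a single
line, which is arguably even less obvious to a reader than the three
open-coded instructions.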
>
> >
> > > +ENTRY(return_to_new_vcpu)
> > > +ENTRY(return_to_guest)
> > > +ENTRY(return_to_hypervisor)
> > > + ldp x21, x22, [sp, #UREGS_PC] // load ELR, SPSR
> > > +
> > > + pop x0, x1
> > > + pop x2, x3
> > > + pop x4, x5
> > > + pop x6, x7
> > > + pop x8, x9
> > > +
> > > + /* XXX handle return to guest tasks, soft irqs etc */
> > > +
> > > + msr elr_el2, x21 // set up the return data
> > > + msr spsr_el2, x22
> >
> > Done here because it's roughly half-way between the load and the
> > overwrite below? Should we be using x28/x29 for this to give ourselves
> > even more pipeline space?
x21/22 are used as scratch registers in this way in the rest of the file
too and I'd rather be consistent about that. As it stands there are 5
instructions either side of this usage, I think that will do, at least
in the absence of any useful ability to measure things...
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel