
Re: [Xen-devel] [ARM] Native application design and discussion (I hope)



On Wed, 10 May 2017, Volodymyr Babchuk wrote:
> Hi Julien,
> 
> Returning to native apps, I think we can make the context switch even
> faster by dropping the p2m code. Imagine that we have already created the
> stage 1 MMU tables for the native application. Then, to switch to the app,
> we only need to:
> 
> 1. Set the TGE bit in HCR
> 2. Clear the VM bit in HCR
> 3. Save/program EL1_TTBR and friends
> 3.5 (optionally) save/restore the FPU state
> 4. Save/restore the general purpose registers + SP + CSR + PC to jump to
> the app in EL0 state.
> 
> This can be done in a "real" vCPU context or in the idle vCPU context. It
> makes no difference.
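
Something along these lines, I suppose (a rough, untested sketch of the
sequence above; the helper and macro names are made up, not actual Xen code):

#include <stdint.h>

#define HCR_VM   (1UL << 0)   /* stage 2 translation enable          */
#define HCR_TGE  (1UL << 27)  /* route EL0 exceptions to EL2         */

/* Hypothetical sketch of steps 1-3 above, to run at EL2. */
static void switch_to_el0_app_sketch(uint64_t app_ttbr)
{
    uint64_t hcr;

    /* 1-2. Set TGE, clear VM in HCR_EL2. */
    asm volatile ( "mrs %0, hcr_el2" : "=r" (hcr) );
    hcr |= HCR_TGE;
    hcr &= ~HCR_VM;
    asm volatile ( "msr hcr_el2, %0" :: "r" (hcr) );

    /* 3. Program the app's stage 1 tables ("EL1_TTBR and friends"). */
    asm volatile ( "msr ttbr0_el1, %0" :: "r" (app_ttbr) );
    asm volatile ( "isb" ::: "memory" );

    /* 4. The caller would then load the GPRs/SP/SPSR/ELR and eret into
     *    the app at EL0 (and 3.5: optionally save/restore the FPU). */
}
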
> 
> Exception handling in the hypervisor would become tricky because there is
> no vCPU for a native app. The current implementation of entry.S always
> saves the general purpose registers into a vCPU structure. Basically, we
> would have to teach entry.S and traps.c about native apps.
> Am I missing something?

The nicest way to do this is probably to create another saved_context
in arch_vcpu for EL0 apps. That way, the required changes to traps.c and
entry.S will be minimal.
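
Roughly like this (just an illustration; the struct and field names are
made up):

#include <stdint.h>
#include <stdbool.h>

/* Illustration only: a second saved context hanging off arch_vcpu, so
 * that entry.S/traps.c can save into either the guest context or the
 * EL0 app context depending on what was running when we trapped. */
struct saved_context_sketch {
    uint64_t x[31];          /* general purpose registers x0-x30      */
    uint64_t sp, pc, cpsr;   /* stack pointer, return address, pstate */
};

struct arch_vcpu_sketch {
    struct saved_context_sketch guest_ctx;  /* what is saved today        */
    struct saved_context_sketch app_ctx;    /* new: the EL0 app's context */
    bool in_app;             /* tells entry.S which context to use        */
};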


> 
> On 10 May 2017 at 13:48, Julien Grall <julien.grall@xxxxxxx> wrote:
> > Hi George,
> >
> >
> > On 05/10/2017 11:03 AM, George Dunlap wrote:
> >>
> >> On 10/05/17 11:00, Julien Grall wrote:
> >>>
> >>>
> >>>
> >>> On 05/10/2017 10:56 AM, George Dunlap wrote:
> >>>>
> >>>> On 09/05/17 19:29, Stefano Stabellini wrote:
> >>>>>
> >>>>> On Tue, 9 May 2017, Dario Faggioli wrote:
> >>>>>>>>
> >>>>>>>> And it should not be hard to give such code access to the context
> >>>>>>>> of
> >>>>>>>> the vCPU that was previously running (in x86, given we implement
> >>>>>>>> what
> >>>>>>>> we call lazy context switch, it's most likely still loaded in the
> >>>>>>>> pCPU!).
> >>>>>>>
> >>>>>>>
> >>>>>>> I agree with Stefano, switching to the idle vCPU is a pretty bad
> >>>>>>> idea.
> >>>>>>>
> >>>>>>> The idle vCPU is a fake vCPU on ARM to stick with the common code (we
> >>>>>>> never leave the hypervisor). In the case of the EL0 app, we want to
> >>>>>>> change the exception level to run the code with lower privilege.
> >>>>>>>
> >>>>>>> Also IMHO, it should only be used when there is nothing to run, and
> >>>>>>> not re-purposed for running EL0 apps.
> >>>>>>>
> >>>>>> It's already purposed for running when there is nothing to do _or_
> >>>>>> when
> >>>>>> there are tasklets.
> >>>>>>
> >>>>>> I do see your point about privilege level, though. And I agree with
> >>>>>> George that it looks very similar to when, in the x86 world, we tried
> >>>>>> to put the infra together for switching to Ring3 to run some pieces of
> >>>>>> Xen code.
> >>>>>
> >>>>>
> >>>>> Right, and just to add to it, context switching to the idle vcpu has a
> >>>>> cost, but it doesn't give us any security benefits whatsoever. If Xen is
> >>>>> going to spend time on context switching, it is better to do it in a
> >>>>> way that introduces a security boundary.
> >>>>
> >>>>
> >>>> "Context switching" to the idle vcpu doesn't actually save or change any
> >>>> registers, nor does it flush the TLB.  It's more or less just accounting
> >>>> for the scheduler.  So it has a cost (going through the scheduler) but
> >>>> not a very large one.
> >>>
> >>>
> >>> It depends on the architecture. For ARM we don't yet support lazy
> >>> context switch. So effectively, the cost to "context switch" to the idle
> >>> vCPU will be quite high.
> >>
> >>
> >> Oh, right.  Sorry, I thought I had seen code implementing lazy context
> >> switch in ARM, but I must have imagined it.  That is indeed a material
> >> consideration.
> >>
> >> Is there a particular reason that lazy context switch is difficult on
> >> ARM?  If not, it should be a fairly important bit of low-hanging fruit
> >> from a performance perspective.
> >
> >
> > I am not entirely sure what you are doing on x86. Let me explain what we do
> > and why the context switch is heavy on ARM.
> >
> > In the case of ARM, when entering the hypervisor, we only save the bare
> > minimum (all the non-banked registers + the registers useful for handling
> > the guest request), and leave the rest untouched.
> >
> > Our save/restore functions are quite big because they involve saving and
> > restoring the state of the interrupt controller, the FPU... So we have a
> > fast exit/entry but a slow context switch.
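
To put it another way, the split looks roughly like this (very simplified,
made-up names, not the real structures):

#include <stdint.h>

/* Saved/restored on every trap into the hypervisor (entry.S): cheap. */
struct trap_frame_sketch {
    uint64_t x[31];              /* general purpose registers x0-x30 */
    uint64_t sp, pc, cpsr;       /* SP, ELR_EL2, SPSR_EL2            */
};

/* Saved/restored only on a full vCPU context switch: expensive. */
struct vcpu_state_sketch {
    uint64_t sctlr, tcr, ttbr0, ttbr1, mair, vbar;  /* EL1 sysregs   */
    uint64_t fp_regs[64];        /* 32 x 128-bit FPU/SIMD registers  */
    /* ... plus interrupt controller (GIC) state, timer state, ...   */
};
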
> >
> > What we currently do is avoid saving/restoring the idle vCPU because we
> > always stay in the hypervisor exception level. However, we still save all
> > the registers of the previously running vCPU and restore those of the next
> > running vCPU.
> >
> > This has a big impact on workloads where the vCPU is waiting for
> > interrupts (hence the patch from Stefano to limit entering the hypervisor,
> > though it is not enabled by default).
> >
> > I made the assumption that the idle vCPU only runs when nothing has to be
> > done. But as you mentioned, tasklets can run there too. So running tasklets
> > on Xen ARM will have a high cost.
> >
> > A list of optimizations we could do on ARM:
> >         - Avoid the restore if the vCPU stays the same before and after the
> > idle vCPU
> >         - Avoid the save/restore if the vCPU is dedicated to a pCPU
> >
> > Do you have any other optimizations on x86?
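
The first one could be as simple as remembering whose state is still loaded
on the pCPU (a sketch with made-up names, not the actual code):

struct vcpu_sketch;                       /* stand-in for struct vcpu   */
static struct vcpu_sketch *still_loaded;  /* per-pCPU in a real version */

/* Sketch: skip the restore when we come back to the same vCPU. */
void ctxt_switch_to_sketch(struct vcpu_sketch *next)
{
    if ( next == still_loaded )
        return;        /* its state never left the registers */

    /* ... full restore of EL1 sysregs, GIC, FPU, timer ... */
    still_loaded = next;
}
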
> >
> > Cheers,
> >
> > --
> > Julien Grall
> 
> 
> 
> -- 
> WBR Volodymyr Babchuk aka lorc [+380976646013]
> mailto: vlad.babchuk@xxxxxxxxx
> 


 

