
Re: [Xen-devel] Notes on stubdoms and latency on ARM



On Mon, 19 Jun 2017, George Dunlap wrote:
> On 17/06/17 01:14, Volodymyr Babchuk wrote:
> > Hello George,
> > 
> > On 31 May 2017 at 20:02, George Dunlap <george.dunlap@xxxxxxxxxx> wrote:
> >>>> There is no way out: if the stubdom needs events, then we'll have to
> >>>> expose and context switch the vGIC. If it doesn't, then we can skip the
> >>>> vGIC. However, we would have a similar problem with EL0 apps: I am
> >>>> assuming that EL0 apps don't need to handle interrupts, but if they do,
> >>>> then they might need something like a vGIC.
> >>> Hm. Correct me if I'm wrong, but if we want a stubdom to handle some
> >>> requests (e.g. emulate MMIO access), then it needs events, and thus
> >>> it needs interrupts. At least, I'm not aware of any other mechanism
> >>> that allows the hypervisor to signal a domain.
> >>> On the other hand, an EL0 app (as I see them) does not need such
> >>> events. Basically, you just call the function `handle_mmio()` right
> >>> in the app. So, apps can live without interrupts and will still be
> >>> able to handle requests.
> >>
> >> So remember that "interrupt" and "event" are basically the same as
> >> "structured callback".  When anything happens that Xen wants to tell the
> >> EL0 app about, it has to have a way of telling it.  If the EL0 app is
> >> handling a device, it has to have some way of getting interrupts from
> >> that device; if it needs to emulate a device presented to the guest,
> >> it needs some way to tell Xen to deliver an interrupt to the guest.
> > Basically yes. There should be a mechanism to request something from
> > a native application. The question is how this mechanism can be
> > implemented. The classical approach is an event-driven loop:
> > 
> > while(1) {
> >     wait_for_event();
> >     handle_event();
> >     return_back_results();
> > }
> > 
> > wait_for_event() can be anything from a WFI instruction to a read() on
> > a socket. This is how stubdoms work. I agree with you: there is no
> > sense in repeating this in native apps.
> > 
> >> Now, we could make the EL0 app interface "interruptless".  Xen could
> >> write information about pending events in a shared memory region, and
> >> the EL0 app could check that before calling some sort of block()
> >> hypercall, and check it again when it returns from the block() call.
> > 
> >> But the shared event information starts to look an awful lot like events
> >> and/or pending bits on an interrupt controller -- the only difference
> >> being that you aren't interrupted if you're already running.
> > 
> > Actually there is a third way, which I have used. I described it in
> > the original email (check out [1]).
> > Basically, the native application is dormant until it is needed by the
> > hypervisor. When the hypervisor wants some service from the app, it
> > sets up the parameters, switches to EL0 mode and jumps to the app
> > entry point.
> 
> What's the difference between "jumps to an app entry point" and "jumps
> to an interrupt handling routine"?  And what's the difference between
> "Tells Xen about the location of the app entry point" and "tells Xen
> about the location of the interrupt handling routine"?
> 
> If you want this "EL0 app" thing to be able to provide extra security
> over just running the code inside of Xen, then the code must not be able
> to DoS the host by spinning forever instead of returning.

I think that the "extra security" was mostly Julien's and my goal.
Volodymyr would be OK with having the code in Xen, if I recall correctly
from past conversations.

In any case, wouldn't the usual Xen timer interrupt prevent this scenario
from happening?
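
Something along these lines is what I have in mind (a purely
illustrative sketch, not existing Xen code, all names are made up): the
periodic timer interrupt still traps to the hypervisor while the app
runs in EL0, so Xen gets a chance to enforce a time budget on it.

struct el0_app {
    unsigned int budget_ticks;   /* remaining timer ticks for this run */
    /* ... */
};

void timer_tick(void)
{
    /* hypothetical per-pCPU pointer to the app currently running in EL0 */
    struct el0_app *app = this_cpu_running_app();

    if ( app && --app->budget_ticks == 0 )
    {
        /* The app overran its budget: tear it down (or restart it)
         * instead of letting it spin forever. */
        el0_app_abort(app);
    }
}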


> What happens if two different pcpus in Xen decide they want to activate
> some "app" functionality?

It should work fine as long as the app code is written to be able to
cope with concurrent invocations (spinlocks, etc.).
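
To make the idea concrete, here is a rough sketch of what I mean, with
made-up names: per-pCPU data (stack, scratch buffers) needs no locking,
while any state shared between invocations is protected by a lock
inside the app itself.

/* hypothetical app-side lock protecting shared app state */
static DEFINE_SPINLOCK(app_state_lock);

long app_entry(struct app_args *args)    /* hypothetical entry point */
{
    long ret;

    spin_lock(&app_state_lock);
    ret = handle_request(args);          /* hypothetical handler */
    spin_unlock(&app_state_lock);

    return ret;
}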


> >> I'm pretty sure you could run in this mode using the existing interfaces
> >> if you didn't want the hassle of dealing with asynchrony.  If that's the
> >> case, then why bother inventing an entirely new interface, with its own
> >> bugs and duplication of functionality?  Why not just use what we already
> >> have?
> > Because we are concerned about latency. In my benchmark, my native app
> > PoC is 1.6 times faster than a stubdom.
> 
> But given the conversation so far, it seems likely that that is mainly
> due to the fact that context switching on ARM has not been optimized.

True. However, Volodymyr took the time to demonstrate the performance of
EL0 apps vs. stubdoms with a PoC, which is much more than most Xen
contributors do. Nobody has provided numbers for a faster ARM context
switch yet. I don't know on whom the burden of proving that a lighter
context switch can match the EL0 app numbers should fall. I am not sure
it would be fair to ask Volodymyr to do it.


> Just to be clear -- I'm not adamantly opposed to a new interface similar
> to what you're describing above.  But I would be opposed to introducing
> a new interface that doesn't achieve the stated goals (more secure, &c),
> or a new interface that is the same as the old one but rewritten a bit.
> 
> The point of having this design discussion up front is to prevent a
> situation where you spend months coding up something which is ultimately
> rejected.  There are a lot of things that are hard to predict until
> there's actually code to review, but at the moment the "jumps to an
> interrupt handling routine" approach looks unpromising.

Did you mean "jumps to an app entry point" or "jumps to an interrupt
handling routine"?
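
For reference, this is how I understand the calling convention
Volodymyr described above (a rough sketch; every name here is
hypothetical): Xen fills in the request, switches to EL0 and jumps to
app_entry(); there is no event loop and no vGIC involved.

struct app_request {
    uint64_t addr;      /* e.g. the faulting MMIO address */
    uint64_t data;
    bool     is_write;
};

long app_entry(struct app_request *req)
{
    long ret = handle_mmio(req->addr, req->data, req->is_write);

    /* Hypothetical "return to Xen" call: it restores the EL2 context
     * and hands back the result, leaving the app dormant again. */
    return_to_xen(ret);

    return ret; /* not reached */
}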

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

