Re: Improving the network-stack performance over Xen
Hi Dimos,

Thanks for looking into this! Thinking about it, I think we have several problems.

1. I think the Activations.wait API is difficult to use / unsafe:
(* Block waiting for an event to occur on a particular port *)
let wait evtchn =
  if Eventchn.is_valid evtchn then begin
    let port = Eventchn.to_int evtchn in
    let th, u = Lwt.task () in
    let node = Lwt_sequence.add_l u event_cb.(port) in
    Lwt.on_cancel th (fun _ -> Lwt_sequence.remove node);
    th
  end else Lwt.fail Generation.Invalid

When you call Activations.wait you are added to a 'sequence' (like a list) of threads to wake up when the next event occurs. A typical driver would call Activations.wait in a loop, block for an event, wake up, signal some other thread to do work and then block again. However, if the thread running the loop blocks anywhere else, it will not be re-added to the sequence straight away, and any notifications that arrive during the gap will be dropped. I noticed this when debugging my block backend implementation. I think netif has this problem:
let listen nf fn =
  (* Listen for the activation to poll the interface *)
  let rec poll_t t =
    lwt () = refill_requests t in   (* <-- blocks here, can miss events *)
    rx_poll t fn;
    tx_poll t;
    (* Evtchn.notify nf.t.evtchn; *)
    lwt new_t =
      try_lwt
        Activations.wait t.evtchn >> return t
      with
      | Generation.Invalid ->
        Console.log_s "Waiting for plug in listen" >>
        wait_for_plug nf >>
        Console.log_s "Done..." >>
        return nf.t in
    poll_t new_t in
  poll_t nf.t

I think we should change the semantics of Activations.wait to be level-triggered rather than edge-triggered (i.e. closer to the underlying behaviour of Xen), like this:
type event
(** a particular event *)

val wait: Evtchn.t -> event option -> event Lwt.t
(** [wait evtchn None] returns [e] where [e] is the latest event.
    [wait evtchn (Some e)] returns [e'] where [e'] is a later event than [e] *)

In the implementation we could have "type event = int" and maintain a counter of the number of times each event channel has been signalled. When you call Activations.wait, you would pass in the number of the last event you saw, and the thread would block until a newer event is available. This way you wouldn't have to be registered in the table at the moment the event arrives.
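To make the idea concrete, here is a rough sketch of what a counter-based implementation could look like. This is only an illustration of the level-triggered semantics, not the real Activations module: the table size, the generation/unplug handling and error cases are all elided, and the names (nr_events, counters, waiters, signal) are made up for the example.

  type event = int
  (* An [event] is the number of times the port has been signalled so far. *)

  let nr_events = 4096
  let counters = Array.make nr_events 0            (* signals seen per port *)
  let waiters : event Lwt.u list array = Array.make nr_events []
                                                   (* wakeners blocked per port *)

  (* Called from the event-channel upcall when [port] fires. *)
  let signal port =
    counters.(port) <- counters.(port) + 1;
    let to_wake = waiters.(port) in
    waiters.(port) <- [];
    List.iter (fun u -> Lwt.wakeup_later u counters.(port)) to_wake

  (* Level-triggered wait: return immediately if the counter has already
     moved past [last], otherwise block until the next signal. *)
  let wait evtchn last =
    let port = Eventchn.to_int evtchn in
    let seen = match last with None -> 0 | Some e -> e in
    if counters.(port) > seen then Lwt.return counters.(port)
    else begin
      let th, u = Lwt.task () in
      waiters.(port) <- u :: waiters.(port);
      th
    end

A driver loop would then remember the last event it saw, do all of its work (refill_requests, rx_poll, tx_poll), and only afterwards call wait with that value, so an event signalled while it was busy is picked up on the next iteration rather than lost.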
2. SCHEDOP_poll has a low (arbitrary) nr_ports limit:

static long do_poll(struct sched_poll *sched_poll)
{
    struct vcpu *v = current;
    struct domain *d = v->domain;
    evtchn_port_t port;
    long rc;
    unsigned int i;

    /* Fairly arbitrary limit. */
    if ( sched_poll->nr_ports > 128 )
        return -EINVAL;
The total number of available event channels for a 64-bit guest is 4096 with the current ABI (a new interface is under development which allows even more). The limit of 128 is probably there to bound the time the hypercall takes, to avoid the kind of scalability problems you hit in userspace with select().
One of the use-cases I'd like to use Mirage for is running backend services (like xenstore or blkback) for all the domains on a host. This requires at least one event channel per client domain, and we routinely run ~300 VMs/host, so the 128-port limit is too small. Also, a quick grep around Linux shows that it barely uses SCHEDOP_poll; I think we should focus on using the hypercalls that other OSes are using, for the best chance of success.
So I think we should switch from select()-like behaviour using SCHEDOP_poll to interrupt-based delivery using SCHEDOP_block. I note that upstream mini-os does this by default too. I'll take a look at this.
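To sketch where that would take the main loop (illustrative only, not the actual mirage-platform code): instead of gathering a list of ports and issuing SCHEDOP_poll, the scheduler would run all runnable Lwt threads and, when nothing is runnable and nothing is pending, block the whole VCPU with SCHEDOP_block and let the event upcall wake it. The block_domain and events_pending externals below stand for C stubs that would issue the hypercall and test the shared-info pending bits; both names are invented for this example.

  (* Hypothetical C stubs (names invented for illustration):
     - stub_sched_block issues SCHEDOP_block, which re-enables event
       delivery and de-schedules the VCPU until an event arrives;
     - stub_evtchn_pending tests the shared-info pending bitmap. *)
  external block_domain : unit -> unit = "stub_sched_block"
  external events_pending : unit -> bool = "stub_evtchn_pending"

  (* Run the main Lwt thread to completion, blocking the domain whenever
     there is nothing to do instead of SCHEDOP_poll-ing a limited list of
     ports. *)
  let rec run t =
    Lwt.wakeup_paused ();
    match Lwt.poll t with
    | Some x -> x                       (* main thread finished *)
    | None ->
      if not (events_pending ()) then block_domain ();
      (* A full implementation would now walk the pending bits and wake
         the corresponding Activations waiters before looping. *)
      run t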
Cheers,
Dave

On Fri, Sep 13, 2013 at 11:50 PM, Dimosthenis Pediaditakis <dimosthenis.pediaditakis@xxxxxxxxxxxx> wrote: