
Re: [Xen-devel] [RFC v2] xSplice design



On 10/30/2015 10:39 AM, Martin Pohlack wrote:
On 29.10.2015 17:55, Ross Lagerwall wrote:
On 10/27/2015 12:05 PM, Ross Lagerwall wrote:
On 06/12/2015 12:39 PM, Martin Pohlack wrote:
On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote:
[...]
## Hypercalls

We will employ the sub operations of the system management hypercall
(sysctl).
There are to be four sub-operations:

   * upload the payloads.
   * list summaries of the uploaded payloads and their state.
   * get a particular payload's summary and its state.
   * command to apply, delete, or revert the payload.
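
As an illustration of the four sub-operations listed above, here is a
minimal C sketch of how a request might be tagged and sanity-checked.
All identifiers (`xsplice_subop`, `xsplice_action`, `xsplice_request`,
`xsplice_request_valid`) are hypothetical and not taken from the RFC,
and the rule that every sub-operation except listing needs a payload
name is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers: the RFC text quoted here fixes no names. */
enum xsplice_subop {
    XSPLICE_SUBOP_UPLOAD,  /* upload a payload */
    XSPLICE_SUBOP_LIST,    /* list summaries of uploaded payloads */
    XSPLICE_SUBOP_GET,     /* get one payload's summary and state */
    XSPLICE_SUBOP_ACTION,  /* apply, delete, or revert a payload */
    XSPLICE_SUBOP_COUNT
};

enum xsplice_action {
    XSPLICE_ACTION_APPLY,
    XSPLICE_ACTION_DELETE,
    XSPLICE_ACTION_REVERT
};

struct xsplice_request {
    enum xsplice_subop subop;
    enum xsplice_action action;  /* only read for XSPLICE_SUBOP_ACTION */
    const char *payload_name;    /* assumed: unused for LIST */
};

/* Returns 0 if the request is well-formed, -1 otherwise. */
int xsplice_request_valid(const struct xsplice_request *req)
{
    assert(req != NULL);

    if (req->subop < 0 || req->subop >= XSPLICE_SUBOP_COUNT)
        return -1;

    /* Assumption: only listing works without naming a payload. */
    if (req->subop != XSPLICE_SUBOP_LIST && req->payload_name == NULL)
        return -1;

    if (req->subop == XSPLICE_SUBOP_ACTION &&
        (req->action < XSPLICE_ACTION_APPLY ||
         req->action > XSPLICE_ACTION_REVERT))
        return -1;

    return 0;
}
```

Folding apply, delete, and revert into one sub-operation with an action
field mirrors the fourth bullet; a real interface may split them
differently.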

The patching is asynchronous; therefore the caller is responsible for
verifying that it has been applied properly by retrieving the payload's
summary and checking that there are no error codes associated with it.
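
The caller-side check described above can be sketched as a polling loop.
This is only an illustration: the `payload_summary` layout, the state
names, and the `get_summary_fn` callback standing in for the "get
summary" sub-operation are all assumptions, and `fake_get_summary` is a
stand-in backend so the sketch can run outside Xen.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload summary; not the real hypercall ABI. */
enum payload_state { PAYLOAD_LOADED, PAYLOAD_IN_PROGRESS, PAYLOAD_APPLIED };

struct payload_summary {
    enum payload_state state;
    int32_t rc;  /* 0 on success, negative errno-style code on failure */
};

/* Stand-in for the "get summary" sub-operation. */
typedef int (*get_summary_fn)(void *ctx, struct payload_summary *out);

/*
 * Poll the summary until the payload is applied, reports an error,
 * or max_polls attempts have been used up.
 */
int wait_for_apply(get_summary_fn get, void *ctx, unsigned max_polls)
{
    struct payload_summary s;

    assert(get != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        int err = get(ctx, &s);
        if (err)
            return err;          /* the query itself failed */
        if (s.rc != 0)
            return s.rc;         /* error code attached to the payload */
        if (s.state == PAYLOAD_APPLIED)
            return 0;
        /* Still in progress: a real caller would sleep here. */
    }
    return -ETIMEDOUT;
}

/* Fake backend: reports "in progress" for a fixed number of polls. */
struct fake_backend { unsigned polls_left; int32_t final_rc; };

int fake_get_summary(void *ctx, struct payload_summary *out)
{
    struct fake_backend *b = ctx;

    if (b->polls_left > 0) {
        b->polls_left--;
        out->state = PAYLOAD_IN_PROGRESS;
        out->rc = 0;
    } else {
        out->state = b->final_rc ? PAYLOAD_LOADED : PAYLOAD_APPLIED;
        out->rc = b->final_rc;
    }
    return 0;
}
```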

We **MUST** make it asynchronous due to the nature of patching: it
requires every physical CPU to be in lock-step with each other. The
patching mechanism, while an implementation detail, is not a short
operation, and as such the design **MUST** assume it will be a
long-running operation.

I am not yet convinced that you need an asynchronous approach here.

The experience from our prototype suggests that hotpatching itself is
not an expensive operation.  It can usually be completed well below 1ms
with the most expensive part being getting the hypervisor to a quiet
state.


FWIW, my current implementation (which is almost certainly not optimal)
tested on a 72 CPU machine takes about 3ms, whether idle or fully loaded.


Let me correct that: it takes 60 µs to 100 µs to synchronize and apply
the patch (on the same hardware) when synchronous console logging is
turned off.

The interesting (and very rare) case is if other CPUs are busy in Xen
already, for example, with memory scrubbing or other long-running
activities.  Those are hard to interrupt and delay patching activity.

Having multiple guests in a reboot loop (being restarted all the time)
might help trigger this case.


I have been able to trigger this, which is why I put a (currently
hard-coded) 10ms timeout in the synchronization code. If the CPUs have
not synchronized by then, it gives up and returns an error. The user
can then optionally retry at a later point.
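
A minimal sketch of such a bounded wait, assuming a monotonic clock and
a counter of CPUs that have reached the quiet state. This is not the
code from the implementation discussed here: `sync_ops`,
`wait_for_quiesce`, and the `sim_*` helpers are hypothetical names, and
the clock is simulated so the example is deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SYNC_TIMEOUT_NS 10000000ULL  /* 10ms, the hard-coded value above */

/* Hypothetical hooks so the sketch can run outside a hypervisor. */
struct sync_ops {
    uint64_t (*now_ns)(void *ctx);         /* monotonic time */
    unsigned (*cpus_quiesced)(void *ctx);  /* CPUs that have checked in */
    void *ctx;
};

/* Wait for every CPU to check in; give up once the timeout expires. */
int wait_for_quiesce(const struct sync_ops *ops, unsigned total_cpus)
{
    assert(ops != NULL);
    uint64_t deadline = ops->now_ns(ops->ctx) + SYNC_TIMEOUT_NS;

    while (ops->cpus_quiesced(ops->ctx) < total_cpus) {
        if (ops->now_ns(ops->ctx) >= deadline)
            return -ETIMEDOUT;  /* caller may retry at a later point */
    }
    return 0;
}

/* Simulated machine: one CPU stays busy until time all_ready_at. */
struct sim {
    uint64_t t;             /* current simulated time in ns */
    uint64_t step;          /* time advanced by each clock read */
    uint64_t all_ready_at;  /* when the last CPU becomes quiet */
    unsigned cpus;
};

uint64_t sim_now(void *ctx)
{
    struct sim *s = ctx;
    s->t += s->step;
    return s->t;
}

unsigned sim_quiesced(void *ctx)
{
    struct sim *s = ctx;
    return s->t >= s->all_ready_at ? s->cpus : s->cpus - 1;
}
```

With a simulated CPU that stays busy for 50ms (for example, scrubbing
memory), `wait_for_quiesce` returns `-ETIMEDOUT` instead of stalling
the other CPUs indefinitely; a later retry can succeed once that CPU is
idle again.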

--
Ross Lagerwall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
