
Re: [Xen-devel] Livepatching and Xen Security



> > 1. Having tested live-patching thoroughly for at least some version of
> > the codebase
> >
> > 2. Having tested live-patching for one of the Xen 4.9 RCs.
> >
> > Thoughts?
> 
> As a statement of what XenServer is doing:

As a statement of what Oracle is doing.

We have been using livepatching for a year or so.

Our Xen is a bit older (thanks to the Xend dependency, <grumble>), so not
as fresh as Xen 4.9.

We have applied quite a few livepatches, including some that are not
XSAs, with success. We did run into some issues:

 - We compiled Xen with '--maxcpus=384' and the livepatch tools didn't include
   that, which made all the GCC compiled local variable names different. Once we
   got the --maxcpus=XYZ to match it all was good. But that took a while
   to figure out.

 - We had an interesting issue where the .fixup and .ex_table sections were
   not properly updated (Ross fixed that in the tool upstream and it was
   backported in livepatch-tools). That was in the Xen 4.8 timeframe.
   https://github.com/rosslagerwall/livepatch-build-tools/commit/ae7ae0c31866f6ee2715a601fd5067d700d6084a

 - Replacing the livepatch with another hit a snag if both livepatches
   had the same symbol name (x86_emulate.c#_get_fpu). We came up with
   a skanky tool (symbol_rename) that just renames symbols, which we
   now always run, e.g.:
   symbol_rename 83e0707.livepatch "x86_emulate.c#_get_fpu" get_fpu_83e0707

Those are livepatch-tools issues, not hypervisor issues (albeit
the last one could be fixed in the hypervisor by having code that
distinguishes global and local symbols and ignores collisions between
local symbols).
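As a rough illustration of the naming convention used in the symbol_rename
example above (strip the "file.c#" prefix from a local symbol and append the
livepatch ID), here is a minimal Python sketch. The function names are
hypothetical and this is not the actual symbol_rename tool, which has to
rewrite the symbol inside the livepatch payload itself.

```python
def unique_symbol_name(symbol, patch_id):
    """Derive a per-livepatch name for a symbol.

    Hypothetical helper: mirrors the convention in the example above,
    where "x86_emulate.c#_get_fpu" in 83e0707.livepatch becomes
    "get_fpu_83e0707".
    """
    # Local symbols are written as "<file>#<name>"; keep only the name part.
    name = symbol.split("#", 1)[-1]
    # Drop leading underscores so "_get_fpu" becomes "get_fpu".
    name = name.lstrip("_")
    return f"{name}_{patch_id}"


def colliding_symbols(first, second):
    """Return the symbol names that appear in both livepatches."""
    return set(first) & set(second)
```

Running colliding_symbols over the symbol lists of the old and new livepatch
would show which names need renaming before a replace is attempted.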

With the hypervisor code, we have not had any issues - it has been
running smoothly with various types of guests.

Also, I've been running (almost every night) a test of the livepatches
that are part of the Xen source.

> 
> Independent of this, the nature of what qualifies as "a correct patch"
> is subjective and very context dependent.  Consider a scenario with two
> users, the same version of the livepatch tools, an identical source
> patch, and an identical source version of Xen.  There is a very real
> possibility that these two users could get one valid and one invalid
> patch based solely on something like the compiler settings used to build
> the hypervisor they are patching.

Yes. It is imperative that the livepatch be built with exactly the same
compiler as the hypervisor was built with. Fortunately the config.h
file exposes all of that, so it is easy enough to verify.
..snip..
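
A check along those lines could look like the sketch below. It assumes the
generated header records the compiler as a quoted string in a #define; the
macro name XEN_COMPILER and the helper names are assumptions made for
illustration, so adjust them to whatever your tree actually generates.

```python
import re


def compiler_from_header(header_text, macro="XEN_COMPILER"):
    """Return the quoted value of '#define <macro> "..."', or None if absent.

    The default macro name is an assumption made for this sketch.
    """
    pattern = r'^\s*#\s*define\s+' + re.escape(macro) + r'\s+"([^"]*)"'
    match = re.search(pattern, header_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def same_compiler(hypervisor_header, build_compiler):
    """True only if the header records exactly the compiler string given."""
    recorded = compiler_from_header(hypervisor_header)
    return recorded is not None and recorded.strip() == build_compiler.strip()
```

Here build_compiler would be the identification string of the compiler about
to be used for the livepatch build; any mismatch is a reason to stop.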

> Therefore, I think it would be a mistake for us to include anything
> pertaining to "creating a livepatch, correct or otherwise" within a
> support statement.  There are many variables which we as upstream can't
> control.

It may be good to include a FAQ or similar (i.e. a knowledge base)
describing some of these issues, detailing our findings and how
we worked around them.

I will update the Wiki with it regardless of this discussion.
> 
> As for the 4th point, about what a guest can do to prevent application
> of a livepatch.
> 
> The default timeout is insufficient to quiesce Xen if a VM with a few
> VCPUs is migrating.  In this scenario, I believe p2m_lock contention is
> the underlying reason, but the point stands that there are plenty of
> things a guest can do to prevent Xen being able to suitably quiesce.
> 
> As a host administrator attempting to apply the livepatch, you get
> informed that Xen failed to quiesce and the livepatch application
> failed.  Options range from upping the timeout on the next patching
> attempt, to possibly even manually pausing the troublesome VM for a second.

Pausing a VM is acceptable, I would think.
> 
> I also think it unwise to consider any scenarios like this within the
> security statement, otherwise we will have to issue an XSA stating
> "Guests doing normal unprivileged things can cause Xen to be
> insufficiently quiescent to apply livepatches with the deliberately
> conservative defaults".  What remediation would we suggest for this?

xl pause :-)
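
A minimal sketch of that remediation, assuming the livepatch has already been
uploaded and that "xen-livepatch apply <name>" and "xl pause"/"xl unpause
<domain>" are the commands in use. The function name, the single-retry policy
and the injectable run parameter (there so the logic can be exercised without
a Xen host) are hypothetical choices for this example.

```python
import subprocess


def apply_with_pause(patch_name, domain, run=subprocess.run):
    """Try to apply a livepatch; on failure, pause one domain and retry once.

    Returns True if either attempt succeeded. The domain is unpaused again
    whether or not the second attempt works.
    """
    apply_cmd = ["xen-livepatch", "apply", patch_name]
    if run(apply_cmd).returncode == 0:
        return True
    # First attempt failed (e.g. Xen could not quiesce): pause the
    # troublesome guest briefly and try again.
    run(["xl", "pause", domain])
    try:
        return run(apply_cmd).returncode == 0
    finally:
        run(["xl", "unpause", domain])
```

Which domain to pause is left to the administrator; this sketch does not try
to work out which guest is preventing Xen from quiescing.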

> 
> 
> On the points of unexpected access to the hypercalls, and Xen doing the
> wrong thing when presented with a legitimate correct livepatch, I think
> these are in principle fine for inclusion within a support statement.

Yes.
> 
> I would ask however how confident we are that there are no ELF parsing
> bugs in the code?  I think it might be very prudent to try and build a
> userspace harness for it and let AFL have a go.

Jan did an excellent job when reviewing the code. But there is of course
the possibility that something slipped past us.

I would (in my opinion as livepatch maintainer) prefer not to gate the
support part on this as well. I have no clue how to set up AFL, and this
may take quite a while to get done.
