
Re: [Xen-devel] xenconsoled CPU denial of service problem



On Wed, Aug 30, 2006 at 07:13:30PM +0100, Keir Fraser wrote:
> On 29/8/06 4:59 pm, "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote:
> 
> > The patch sets
> >   - data size 5 kb
> >   - period 200 ms
> >   - delay 200 ms
> 
> A few comments:
>  * I think the 'delay' parameter is not really useful. Think of this simply
> as a simple credit-based scheduler that replenishes credit every 'period'.
> So in this case you get 5kB every 200ms. If the domU offers more data in a
> period, it must wait until its credit is replenished at the end of the
> current 200ms period.
>  * I'm not sure bytes of data is the right thing to limit here. The main
> thing that hoses domain0 is the repeated rescheduling of the console daemon,
> and that is fundamentally caused by event-channel notifications. So it might
> make sense to rate limit the number of times we read the event-channel port
> from xc_evtchn_pending -- e.g., no more than 10 times a second (should be
> plenty). This has a few advantages: 1. Looking just at data transferred
> doesn't stop a malicious domain from hosing you with no-op event-channel
> notifications; 2. This is a fundamental metric that can be measured and
> rate-limited on all backend interfaces, so perhaps we can come up with some
> common library of code that we apply to all backends/daemons.

I've re-worked the patch based on this principle of "n events allowed
in each time-slice", setting n=30 and the time-slice to 200ms. The code
was actually much simpler than my previous patch, so it's definitely a
winning strategy. Testing by running

 'while /bin/true ; do echo t > /proc/sysrq-trigger; done'

in one of the guest VMs on a 2.2 GHz Opteron shows no significant CPU
utilization attributed to xenconsoled. I've not examined whether this code
can be put into a common library - it was simple enough to integrate
directly into the xenconsoled event loop.
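
The "n events per time-slice" idea can be sketched roughly as below. This
is only an illustration of the scheme, not the actual patch; the struct,
function names, and the caller-supplied millisecond clock are all invented
for the example:

```c
/* Hypothetical sketch of an "n events per time-slice" limiter.
 * Not the real xenconsoled code -- names and layout are illustrative. */
#include <stdbool.h>

#define RATE_LIMIT_ALLOWANCE 30   /* events permitted per time-slice */
#define RATE_LIMIT_PERIOD_MS 200  /* length of one time-slice */

struct rate_limit {
    long long slice_start_ms; /* when the current time-slice began */
    int       events;         /* events consumed in this slice */
};

/* Returns true if an event-channel notification may be processed now.
 * Returns false once the allowance for the current slice is exhausted;
 * the caller then skips the read until the next slice replenishes it. */
static bool rate_limit_allow(struct rate_limit *rl, long long now_ms)
{
    if (now_ms - rl->slice_start_ms >= RATE_LIMIT_PERIOD_MS) {
        rl->slice_start_ms = now_ms;  /* new slice: credit replenished */
        rl->events = 0;
    }
    if (rl->events >= RATE_LIMIT_ALLOWANCE)
        return false;                 /* credit exhausted for this slice */
    rl->events++;
    return true;
}
```

In the event loop this would gate the call that reads the pending port
(xc_evtchn_pending), so a guest flooding no-op notifications costs at most
30 wakeups' worth of work per 200ms.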

> It may turn out we need to rate limit on data *as well*, if it turns out
> that sinking many kilobytes of data a second is prohibitively expensive, but
> I doubt this will happen. For a start, the strict limiting of notifications
> will encourage data to get queued up and improve batching of console data,
> and batches of data should be processed quite efficiently. This same
> argument extends to other backends (e.g., batching of requests in xenstored,
> netback, blkback, etc).

Based on initial testing, it doesn't look like the data rate itself was
causing any significant overhead once the event-channel port reads were
limited.


 Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx>

Regards,
Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 

Attachment: xen-console-ratelimit-4.patch
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
