
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?



Addendum:

        The Dells are actually R715.
        The dom0 kernel is actually vmlinuz-2.6.18-194.32.1.el5xen

Cheers,
Pim

On Feb 8, 2011, at 13:22 , Pim van Riezen wrote:

> Good day,
> 
> In a scenario where we saw several dom0 nodes fall down due to a sustained 
> SYN flood to a network range, we have been investigating issues with Xen 
> under high network load. The results so far seem to be not so pretty. We 
> recreated a lab setup that can reproduce the scenario with some reliability, 
> although it takes a bit of trial-and-error to get crashes out of it.
> 
> SETUP:
> 2x Dell R710
>       - 4x 6core AMD Opteron 6174
>       - 128GB memory
>       - Broadcom BCM5709
>       - LSI SAS2008 rev.02
>       - Emulex Saturn-X FC adapter
>       - CentOS 5.5 w/ gitco Xen 4.0.1
> 
> 1x NexSan SATABeast FC raid
> 1x Brocade FC switch
> 5x Flood sources (Dell R210)
> 
> The dom0 machines are loaded with 50 PV images, coupled to a LVM partition on 
> FC, half of which are set to start compiling a kernel in rc.local. There are 
> also 2 HVM images on both machines doing the same.
> 
> Networking for all guests is configured in the bridging setup, attached to a 
> specific vlan that arrives tagged at the Dom0. So vifs end up in xenbr86, née 
> xenbr0.86.
> 
> Grub conf for the dom0s:
> 
>       kernel /xen.gz-4.0.1 dom0_mem=2048M max_cstate=0 cpuidle=off
>       module /vmlinuz-2.6.18-194.11.4.el5xen ro root=LABEL=/ elevator=deadline xencons=tty
> 
> The flooding is always done to either the entire IP range the guests live in 
> (in case of SYN floods) or a sub-range of about 50 IPs (in case of UDP 
> floods), with random source addresses.
> 
> ISSUE:
> When the pps rate gets into the insane territory (gigabit link saturated or 
> near-saturated), the machine seems to start losing track of interrupts. 
> Depending on the severity, this leads to CPU soft lockups on random cores. 
> Under more dire circumstances, other hardware attached to the PCI bus starts 
> timing out, making the kernel lose track of storage. Usually the 
> SAS-controller is the first to go, but I've also seen timeouts on the FC 
> controller.
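For scale: a 28-byte UDP datagram (20-byte IPv4 header plus 8-byte UDP header, no payload) and a bare 40-byte TCP SYN are both padded up to the 64-byte Ethernet minimum, so a saturated gigabit link carries roughly 1.49 million of them per second. The Python sketch below works out that arithmetic. It assumes untagged frames and a SYN without TCP options, and it is an illustration of the theoretical ceiling, not a measurement from this lab setup.

```python
# Back-of-the-envelope packet rate for a saturated Ethernet link.
# Assumes untagged frames and minimum-size padding rules.
ETH_HEADER = 14             # destination MAC, source MAC, EtherType
ETH_FCS = 4                 # frame check sequence
ETH_MIN_FRAME = 64          # minimum frame size, header and FCS included
WIRE_OVERHEAD = 7 + 1 + 12  # preamble, start-of-frame delimiter, inter-frame gap


def frame_size(ip_packet_bytes: int) -> int:
    """Ethernet frame size for an IP packet, padded up to the 64-byte minimum."""
    return max(ETH_MIN_FRAME, ETH_HEADER + ip_packet_bytes + ETH_FCS)


def max_pps(link_bps: float, ip_packet_bytes: int) -> float:
    """Packets per second that fill `link_bps` with packets of this IP size."""
    bits_on_wire = (frame_size(ip_packet_bytes) + WIRE_OVERHEAD) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    # 28-byte UDP: 20-byte IPv4 header + 8-byte UDP header, empty payload.
    print(f"28-byte UDP at 1 Gbit/s: {max_pps(1e9, 28):,.0f} pps")
    # Bare SYN: 20-byte IPv4 header + 20-byte TCP header (no options assumed).
    print(f"40-byte SYN at 1 Gbit/s: {max_pps(1e9, 40):,.0f} pps")
```

With full-size 1500-byte packets the same link carries only about 81,000 pps, which is why small-packet floods stress interrupt handling far more than bulk traffic at the same bandwidth.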
> 
> THINGS TRIED:
> 1. Raising the Broadcom RX ring from 255 to 3000. No noticeable effects.
> 2. Downgrading to Xen 3.4.3. No effect.
> 3. Different Dell BIOS versions. No effect.
> 4. Lowering number of guests -> effects get less serious. Not a serious 
> option.
> 5. Using the iptables limit match in the FORWARD chain, set to 10000/s -> 
> effects get less serious when dealing with TCP SYN attacks. No effect when 
> dealing with 28-byte UDP attacks.
> 6. Disabling HPET as per 
> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html with 
> cpuidle=0 and disabling irqbalance -> effects get less serious.
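Item 5 relies on the iptables `limit` match, which is a token-bucket filter: tokens refill at the configured rate up to `--limit-burst` (5 by default), each matching packet spends one, and packets that find the bucket empty simply do not match, falling through to the next rule or the chain policy. The Python sketch below is a simplified model of that behaviour, not the kernel's implementation; `TokenBucket` and `admitted` are hypothetical names. It also hints at why the rule only softens the problem: every packet, admitted or not, has already been received and processed by dom0 before netfilter gets to decide.

```python
class TokenBucket:
    """Simplified token bucket in the spirit of the iptables `limit` match
    (hypothetical model, not the netfilter code)."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate            # tokens added per second
        self.burst = burst          # bucket capacity
        self.tokens = float(burst)  # start full, like a fresh rule
        self.last = 0.0             # time of the previous packet

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0      # this packet matches the rule
            return True
        return False                # no token: packet does not match


def admitted(offered_pps: int, rate: float, burst: int, seconds: float = 1.0) -> int:
    """Count packets that match when `offered_pps` arrive evenly spaced."""
    bucket = TokenBucket(rate, burst)
    total = int(offered_pps * seconds)
    return sum(bucket.allow(i / offered_pps) for i in range(total))
```

For example, `admitted(100_000, rate=10_000, burst=5)` lets roughly 10,000 of 100,000 evenly spaced packets match in one second; the other 90,000 still cost dom0 the work of receiving them.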
> 
> The changes in 6 stop the machine from completely crapping itself, but I 
> still get soft lockups, although they seem to be limited to one of these two 
> paths:
> 
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8027458e>] smp_call_function_many+0x38/0x4c
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff80274688>] smp_call_function+0x4e/0x5e
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8028fdd7>] on_each_cpu+0x10/0x2a
> [<ffffffff802d7428>] kill_bdev+0x1b/0x30
> [<ffffffff802d7a47>] __blkdev_put+0x4f/0x169
> [<ffffffff80213492>] __fput+0xd3/0x1bd
> [<ffffffff802243cb>] filp_close+0x5c/0x64
> [<ffffffff8021e5d0>] sys_close+0x88/0xbd
> [<ffffffff802602f9>] tracesys+0xab/0xb6
> 
> and
> 
> [<ffffffff8026f4f3>] raw_safe_halt+0x84/0xa8
> [<ffffffff8026ca88>] xen_idle+0x38/0x4a
> [<ffffffff8024af6c>] cpu_idle+0x97/0xba
> [<ffffffff8064eb0f>] start_kernel+0x21f/0x224
> [<ffffffff8064e1e5>] _sinittext+0x1e5/0x1eb
> 
> In some scenarios, an application running on the dom0 that relies on 
> pthread_cond_timedwait seems to hang in all its threads on that specific 
> call. This may be related to some timing going wonky during the attack, not 
> sure.
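One mechanism that would fit the pthread_cond_timedwait symptom, offered only as a hypothesis since the report does not show that the clock actually jumped: the function takes an absolute deadline, measured by default against CLOCK_REALTIME. If the dom0 wall clock is stepped backwards after a thread computes its deadline, that thread sleeps for the requested timeout plus the size of the step. The Python sketch below simulates this with two hypothetical clocks and does not call pthreads. In C, a condition variable initialised with an attribute configured via `pthread_condattr_setclock(&attr, CLOCK_MONOTONIC)` measures its deadline against the monotonic clock instead, which is not affected by steps of the wall clock.

```python
from dataclasses import dataclass


@dataclass
class Clocks:
    """Two simulated clocks (hypothetical model, not a real API)."""
    realtime: float   # wall clock: can be stepped, e.g. by a time resync
    monotonic: float  # only advances with elapsed time

    def elapse(self, seconds: float) -> None:
        # Real time passing moves both clocks forward equally.
        self.realtime += seconds
        self.monotonic += seconds

    def step_realtime(self, delta: float) -> None:
        # A wall-clock step changes only the realtime clock.
        self.realtime += delta


def actual_wait(timeout: float, backward_step: float, use_monotonic: bool) -> float:
    """Seconds a timed wait really lasts when the wall clock is stepped back
    by `backward_step` right after the absolute deadline was computed."""
    clocks = Clocks(realtime=1_000_000.0, monotonic=5_000.0)  # arbitrary start values
    started = clocks.monotonic

    def now() -> float:
        # The clock the wait was configured to measure its deadline against.
        return clocks.monotonic if use_monotonic else clocks.realtime

    deadline = now() + timeout                 # absolute deadline, as pthread_cond_timedwait expects
    clocks.step_realtime(-backward_step)       # the wall clock jumps back
    clocks.elapse(max(0.0, deadline - now()))  # sleep until the chosen clock reaches the deadline
    return clocks.monotonic - started


if __name__ == "__main__":
    print(actual_wait(1.0, 300.0, use_monotonic=False))  # 301.0
    print(actual_wait(1.0, 300.0, use_monotonic=True))   # 1.0
```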
> 
> Is there anything more we can try?
> 
> Cheers,
> Pim van Riezen
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel




 

