Xen project Mailing List

Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?

From: Pim van Riezen <pi+lists@xxxxxxxxxxxx>

Date: Tue, 8 Feb 2011 13:39:06 +0100

Delivery-date: Tue, 08 Feb 2011 04:40:01 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Addendum: The Dells are actually R715. The dom0 kernel is actually
vmlinuz-2.6.18-194.32.1.el5xen

Cheers,
Pim

On Feb 8, 2011, at 13:22 , Pim van Riezen wrote:

> Good day,
>
> In a scenario where we saw several dom0 nodes fall down due to a sustained
> SYN flood to a network range, we have been investigating issues with Xen
> under high network load. The results so far seem to be not so pretty. We
> recreated a lab setup that can reproduce the scenario with some reliability,
> although it takes a bit of trial-and-error to get crashes out of it.
>
> SETUP:
> 2x Dell R710
> - 4x 6core AMD Opteron 6174
> - 128GB memory
> - Broadcom BCM5709
> - LSI SAS2008 rev.02
> - Emulex Saturn-X FC adapter
> - CentOS 5.5 w/ gitco Xen 4.0.1
>
> 1x NexSan SATABeast FC raid
> 1x Brocade FC switch
> 5x Flood sources (Dell R210)
>
> The dom0 machines are loaded with 50 PV images, coupled to a LVM partition on
> FC, half of which are set to start compiling a kernel in rc.local. There are
> also 2 HVM images on both machines doing the same.
>
> Networking for all guests is configured in the bridging setup, attached to a
> specific vlan that arrives tagged at the Dom0. So vifs end up in xenbr86, née
> xenbr0.86.
>
> Grub conf for the dom0s:
>
> kernel /xen.gz-4.0.1 dom0_mem=2048M max_cstate=0 cpuidle=off
> module /vmlinuz-2.6.18-194.11.4.el5xen ro root=LABEL=/ elevator=deadline
> xencons=tty
>
> The flooding is always done to either the entire IP range the guests live in
> (in case of SYN floods) or a sub-range of about 50 IPs (in case of UDP
> floods), with random source addresses.
>
> ISSUE:
> When the pps rate gets into the insane territory (gigabit link saturated or
> near-saturated), the machine seems to start losing track of interrupts.
> Depending on the severity, this leads to CPU soft lockups on random cores.
> Under more dire circumstances, other hardware attached to the PCI bus starts
> timing out, making the kernel lose track of storage. Usually the
> SAS-controller is the first to go, but I've also seen timeouts on the FC
> controller.
>
> THINGS TRIED:
> 1. Raising the Broadcom RX ring from 255 to 3000. No noticeable effects.
> 2. Downgrading to Xen 3.4.3. No effect.
> 3. Different Dell BIOS versions. No effect.
> 4. Lowering number of guests -> effects get less serious. Not a serious
> option.
> 5. Using ipt_LIMIT in the FORWARD chain set to 10000/s -> effects get less
> serious when dealing with TCP SYN attacks. No effect when dealing with 28byte
> UDP attacks.
> 6. Disabling HPET as per
> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html with
> cpuidle=0 and disabling irqbalance -> effects get less serious.
>
> The changes in 6 stop the machine from completely crapping itself, but I
> still get soft lockups, although they seem to be limited to one of these two
> paths:
>
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8027458e>] smp_call_function_many+0x38/0x4c
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff80274688>] smp_call_function+0x4e/0x5e
> [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> [<ffffffff8028fdd7>] on_each_cpu+0x10/0x2a
> [<ffffffff802d7428>] kill_bdev+0x1b/0x30
> [<ffffffff802d7a47>] __blkdev_put+0x4f/0x169
> [<ffffffff80213492>] __fput+0xd3/0x1bd
> [<ffffffff802243cb>] filp_close+0x5c/0x64
> [<ffffffff8021e5d0>] sys_close+0x88/0xbd
> [<ffffffff802602f9>] tracesys+0xab/0xb6
>
> and
>
> [<ffffffff8026f4f3>] raw_safe_halt+0x84/0xa8
> [<ffffffff8026ca88>] xen_idle+0x38/0x4a
> [<ffffffff8024af6c>] cpu_idle+0x97/0xba
> [<ffffffff8064eb0f>] start_kernel+0x21f/0x224
> [<ffffffff8064e1e5>] _sinittext+0x1e5/0x1eb
>
> In some scenarios, an application running on the dom0 that relies on
> pthread_cond_timedwait seems to be hanging in all its threads on that specific
> call. This may be related to some timing going wonky during the attack, not
> sure.
>
> Is there anything more we can try?
>
> Cheers,
> Pim van Riezen
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
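
For a sense of scale, here is a rough back-of-the-envelope sketch of what "gigabit link saturated" means in packets per second. It is not from the original report. It assumes that the "28byte UDP" packets are a 20-byte IPv4 header plus an 8-byte UDP header with no payload, that frames carry an 802.1Q VLAN tag (the post says traffic arrives tagged), and standard Ethernet framing overhead. The function names are made up for illustration.

```python
# Ethernet framing constants, in bytes.
ETH_HEADER = 14       # destination MAC + source MAC + EtherType
VLAN_TAG = 4          # 802.1Q tag
FCS = 4               # frame check sequence
MIN_FRAME = 64        # minimum frame size; shorter frames are padded
PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter
INTERFRAME_GAP = 12   # mandatory idle time between frames


def wire_bytes(ip_packet_len, vlan_tagged=True):
    """Bytes one IP packet occupies on the wire, including padding,
    preamble and inter-frame gap."""
    frame = ETH_HEADER + (VLAN_TAG if vlan_tagged else 0) + ip_packet_len + FCS
    frame = max(frame, MIN_FRAME)
    return frame + PREAMBLE_SFD + INTERFRAME_GAP


def max_pps(link_bps, ip_packet_len, vlan_tagged=True):
    """Upper bound on packets per second for a given link speed."""
    return link_bps / (wire_bytes(ip_packet_len, vlan_tagged) * 8)


if __name__ == "__main__":
    gigabit = 1_000_000_000
    # Assumed 28-byte IP packet: 20-byte IPv4 header + 8-byte UDP header.
    print(f"wire bytes per packet: {wire_bytes(28)}")
    print(f"max packets/s on 1 Gbit/s: {max_pps(gigabit, 28):,.0f}")
```

Under these assumptions a minimum-size flood tops out at roughly 1.49 million packets per second on a single gigabit link. That is an upper bound for the link, not a measurement of the rates seen in the lab setup.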
