
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?



On Tue, Feb 08, 2011 at 01:39:06PM +0100, Pim van Riezen wrote:
> Addendum:
> 
>       The Dells are actually R715.
>       The dom0 kernel is actually vmlinuz-2.6.18-194.32.1.el5xen
> 

Have you given dom0 a fixed amount of memory, and also increased the dom0 vcpu
weight so that dom0 always gets enough CPU time to take care of things?

http://wiki.xensource.com/xenwiki/XenBestPractices
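
For example, something along these lines (illustrative only, adjust the values
and the domain name for your setup):

    # grub: give dom0 a fixed memory size and dedicated, pinned vcpus
    kernel /xen.gz-4.0.1 dom0_mem=2048M dom0_max_vcpus=4 dom0_vcpus_pin

    # runtime: raise dom0's credit-scheduler weight above the guest default (256)
    xm sched-credit -d Domain-0 -w 512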

-- Pasi


> Cheers,
> Pim
> 
> On Feb 8, 2011, at 13:22 , Pim van Riezen wrote:
> 
> > Good day,
> > 
> > In a scenario where we saw several dom0 nodes fall down due to a sustained 
> > SYN flood to a network range, we have been investigating issues with Xen 
> > under high network load. The results so far seem to be not so pretty. We 
> > recreated a lab setup that can reproduce the scenario with some 
> > reliability, although it takes a bit of trial-and-error to get crashes out 
> > of it.
> > 
> > SETUP:
> > 2x Dell R710
> >     - 4x 6core AMD Opteron 6174
> >     - 128GB memory
> >     - Broadcom BCM5709
> >     - LSI SAS2008 rev.02
> >     - Emulex Saturn-X FC adapter
> >     - CentOS 5.5 w/ gitco Xen 4.0.1
> > 
> > 1x NexSan SATABeast FC raid
> > 1x Brocade FC switch
> > 5x Flood sources (Dell R210)
> > 
> > The dom0 machines are loaded with 50 PV images, coupled to an LVM partition 
> > on FC, half of which are set to start compiling a kernel in rc.local. There 
> > are also 2 HVM images on both machines doing the same.
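> > 
> > For reference, each PV guest config is roughly of this shape (names, sizes 
> > and paths below are illustrative, not the real ones):
> > 
> >     # /etc/xen/guestNN -- PV guest, disk on an LVM volume on the FC storage
> >     name       = "guestNN"
> >     memory     = 1024
> >     vcpus      = 1
> >     bootloader = "/usr/bin/pygrub"
> >     disk       = [ "phy:/dev/vg_fc/guestNN,xvda,w" ]
> >     vif        = [ "bridge=xenbr86" ]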
> > 
> > Networking for all guests is configured in the bridging setup, attached to 
> > a specific VLAN that arrives tagged at the dom0. So vifs end up in xenbr86, 
> > née xenbr0.86.
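> > 
> > The VLAN/bridge plumbing is roughly the standard vconfig/brctl setup, shown 
> > here with example interface names:
> > 
> >     # tagged VLAN 86 on the physical NIC, bridged for the guest vifs
> >     vconfig add eth0 86
> >     brctl addbr xenbr86
> >     brctl addif xenbr86 eth0.86
> >     ip link set eth0.86 up
> >     ip link set xenbr86 up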
> > 
> > Grub conf for the dom0s:
> > 
> >     kernel /xen.gz-4.0.1 dom0_mem=2048M max_cstate=0 cpuidle=off
> >     module /vmlinuz-2.6.18-194.11.4.el5xen ro root=LABEL=/ elevator=deadline xencons=tty
> > 
> > The flooding is always done to either the entire IP range the guests live 
> > in (in case of SYN floods) or a sub-range of about 50 IPs (in case of UDP 
> > floods), with random source addresses.
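> > 
> > For anyone trying to reproduce this: traffic of this kind is easy to generate 
> > with e.g. hping3; the commands below are only illustrative (example target 
> > address, not our real range):
> > 
> >     # SYN flood with random spoofed source addresses
> >     hping3 --flood --rand-source -S -p 80 10.0.86.100
> > 
> >     # minimal 28-byte UDP flood (20-byte IP header + 8-byte UDP header, no payload)
> >     hping3 --flood --rand-source --udp -d 0 -p 53 10.0.86.100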
> > 
> > ISSUE:
> > When the pps rate gets into insane territory (gigabit link saturated or 
> > near-saturated), the machine seems to start losing track of interrupts. 
> > Depending on the severity, this leads to CPU soft lockups on random cores. 
> > Under more dire circumstances, other hardware attached to the PCI bus 
> > starts timing out, making the kernel lose track of storage. Usually the 
> > SAS controller is the first to go, but I've also seen timeouts on the FC 
> > controller.
> > 
> > THINGS TRIED:
> > 1. Raising the Broadcom RX ring from 255 to 3000 (command sketched after 
> > this list). No noticeable effect.
> > 2. Downgrading to Xen 3.4.3. No effect.
> > 3. Different Dell BIOS versions. No effect.
> > 4. Lowering the number of guests -> effects get less serious. Not a serious 
> > option.
> > 5. Using ipt_LIMIT in the FORWARD chain, set to 10000/s (rule sketched after 
> > this list) -> effects get less serious when dealing with TCP SYN attacks. No 
> > effect when dealing with 28-byte UDP attacks.
> > 6. Disabling HPET as per 
> > http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html 
> > with cpuidle=0 and disabling irqbalance -> effects get less serious.
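> > 
> > Roughly what 1 and 5 look like in practice (illustrative; interface name and 
> > exact match options may differ from what we actually ran):
> > 
> >     # 1. bump the bnx2 RX ring size
> >     ethtool -G eth0 rx 3000
> > 
> >     # 5. rate-limit forwarded SYNs to 10000/s, drop the excess
> >     iptables -A FORWARD -p tcp --syn -m limit --limit 10000/second -j ACCEPT
> >     iptables -A FORWARD -p tcp --syn -j DROP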
> > 
> > The changes in 6 stop the machine from completely crapping itself, but I 
> > still get soft lockups, although they seem to be limited to one of these 
> > two paths:
> > 
> > [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> > [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> > [<ffffffff8027458e>] smp_call_function_many+0x38/0x4c
> > [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> > [<ffffffff80274688>] smp_call_function+0x4e/0x5e
> > [<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
> > [<ffffffff8028fdd7>] on_each_cpu+0x10/0x2a
> > [<ffffffff802d7428>] kill_bdev+0x1b/0x30
> > [<ffffffff802d7a47>] __blkdev_put+0x4f/0x169
> > [<ffffffff80213492>] __fput+0xd3/0x1bd
> > [<ffffffff802243cb>] filp_close+0x5c/0x64
> > [<ffffffff8021e5d0>] sys_close+0x88/0xbd
> > [<ffffffff802602f9>] tracesys+0xab/0xb6
> > 
> > and
> > 
> > [<ffffffff8026f4f3>] raw_safe_halt+0x84/0xa8
> > [<ffffffff8026ca88>] xen_idle+0x38/0x4a
> > [<ffffffff8024af6c>] cpu_idle+0x97/0xba
> > [<ffffffff8064eb0f>] start_kernel+0x21f/0x224
> > [<ffffffff8064e1e5>] _sinittext+0x1e5/0x1eb
> > 
> > In some scenarios, an application running on the dom0 that relies on 
> > pthread_cond_timedwait seems to hang in all its threads on that 
> > specific call. This may be related to the timekeeping going wonky during the 
> > attack, not sure.
> > 
> > Is there anything more we can try?
> > 
> > Cheers,
> > Pim van Riezen
> > 
> > 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

