Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough
On Thu, Nov 18, 2010 at 4:20 PM, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> wrote:
>> We did suspect it, since our old setting was HZ=1000 and we assigned
>> more than 10 VCPUs to domU. But we don't see the performance difference
>> with HZ=100.
>
> FWIW, it didn't appear that the problems were proportional to HZ.
> Seemed more that somehow the pvclock became incorrect and spent
> a lot of time rereading the pvclock value.

We decided to enable lock stat in the kernel to track down all the lock
activity in the profile report. The first thing I noticed was that kmemleak
was at the top of the list in /proc/lock_stat, so we disabled kmemleak.
That boosted our I/O performance from 31k to 119k IOPS.

One of our developers (Bruce Edge) suggested killing ntpd, so I did. That
gave another significant bump in I/O performance, to 209k IOPS.

The question now is: why ntpd? Is it the source of all or most of those
pvclock_clocksource_read calls in the profile report? (Rough sketches of
the commands we ran, and of the pvclock read loop itself, are at the
bottom of this mail, below the quoted thread.)

>> -----Original Message-----
>> From: Lin, Ray [mailto:Ray.Lin@xxxxxxx]
>> Sent: Thursday, November 18, 2010 2:40 PM
>> To: Dan Magenheimer; Dante Cinco; Konrad Wilk
>> Cc: Jeremy Fitzhardinge; Xen-devel; mathieu.desnoyers@xxxxxxxxxx;
>> Andrew Thomas; keir.fraser@xxxxxxxxxxxxx; Chris Mason
>> Subject: RE: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2
>> pvops domU kernel with PCI passthrough
>>
>> -----Original Message-----
>> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Dan Magenheimer
>> Sent: Thursday, November 18, 2010 1:21 PM
>> To: Dante Cinco; Konrad Wilk
>> Cc: Jeremy Fitzhardinge; Xen-devel; mathieu.desnoyers@xxxxxxxxxx;
>> Andrew Thomas; keir.fraser@xxxxxxxxxxxxx; Chris Mason
>> Subject: RE: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2
>> pvops domU kernel with PCI passthrough
>>
>> In case it is related:
>> http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01247.html
>>
>> Although I never went further on this investigation, it appeared to me
>> that pvclock_clocksource_read was getting called at least an
>> order-of-magnitude more frequently than expected in some circumstances
>> for some kernels. And IIRC it was scaled by the number of vcpus.
>>
>> We did suspect it, since our old setting was HZ=1000 and we assigned
>> more than 10 VCPUs to domU. But we don't see the performance difference
>> with HZ=100.
>>
>> > -----Original Message-----
>> > From: Dante Cinco [mailto:dantecinco@xxxxxxxxx]
>> > Sent: Thursday, November 18, 2010 12:36 PM
>> > To: Konrad Rzeszutek Wilk
>> > Cc: Jeremy Fitzhardinge; Xen-devel; mathieu.desnoyers@xxxxxxxxxx;
>> > Andrew Thomas; keir.fraser@xxxxxxxxxxxxx; Chris Mason
>> > Subject: Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2
>> > pvops domU kernel with PCI passthrough
>> >
>> > I mentioned in an earlier post to this thread that I'm able to apply
>> > Dulloor's xenoprofile patch to the dom0 kernel but not the domU
>> > kernel, so I can't do active-domain profiling. I can do
>> > passive-domain profiling, but I don't know how reliable the results
>> > are, since it shows pvclock_clocksource_read as the top consumer of
>> > CPU cycles at 28%.
>> >
>> > CPU: Intel Architectural Perfmon, speed 2665.98 MHz (estimated)
>> > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
>> > unit mask of 0x00 (No unit mask) count 100000
>> >
>> > samples   %        image name                                                  app name         symbol name
>> > 918089    27.9310  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   pvclock_clocksource_read
>> > 217811     6.6265  domain1-modules                                             domain1-modules  /domain1-modules
>> > 188327     5.7295  vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug        vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug  mutex_spin_on_owner
>> > 186684     5.6795  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   __xen_spin_lock
>> > 149514     4.5487  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   __write_lock_failed
>> > 123278     3.7505  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   __kernel_text_address
>> > 122906     3.7392  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   xen_spin_unlock
>> > 90903      2.7655  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   __spin_time_accum
>> > 85880      2.6127  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   __module_address
>> > 75223      2.2885  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   print_context_stack
>> > 66778      2.0316  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   __module_text_address
>> > 57389      1.7459  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   is_module_text_address
>> > 47282      1.4385  xen-syms-4.1-unstable                                       domain1-xen      syscall_enter
>> > 47219      1.4365  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   prio_tree_insert
>> > 46495      1.4145  vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug        vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug  pvclock_clocksource_read
>> > 44501      1.3539  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   prio_tree_left
>> > 32482      0.9882  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel   native_read_tsc
>> >
>> > I ran oprofile (0.9.5 with the xenoprofile patch) for 20 seconds while
>> > the I/Os were running. Here's the command I used:
>> >
>> > opcontrol --start --xen=/boot/xen-syms-4.1-unstable \
>> >     --vmlinux=/boot/vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug \
>> >     --passive-domains=1 \
>> >     --passive-images=/boot/vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug
>> >
>> > I had to remove dom0_max_vcpus=1 (but kept dom0_vcpus_pin=true) from
>> > the Xen command line; otherwise, oprofile only gives the samples from
>> > CPU0.
>> >
>> > I'm going to try perf next.
>> >
>> > - Dante
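In case anyone wants to reproduce the lock_stat part, here is roughly the
sequence we used. This is a sketch, not a recipe: it assumes the domU kernel
was built with CONFIG_LOCK_STAT=y (and CONFIG_DEBUG_KMEMLEAK for the kmemleak
piece), that debugfs is mounted at /sys/kernel/debug, and that the ntpd init
script is /etc/init.d/ntp (the name varies by distro).

    # clear stale counters, then collect while the I/O workload runs
    echo 0 > /proc/lock_stat
    echo 1 > /proc/sys/kernel/lock_stat
    # ... run the I/O load for a while ...
    echo 0 > /proc/sys/kernel/lock_stat

    # the hottest lock classes show up at the top; kmemleak's lock
    # was the worst offender for us
    head -n 50 /proc/lock_stat

    # disable kmemleak at runtime (or boot with kmemleak=off)
    echo off > /sys/kernel/debug/kmemleak

    # stop ntpd (adjust the service name for your distro)
    /etc/init.d/ntp stop

lock_stat complements the oprofile data above: oprofile tells you where the
cycles go, /proc/lock_stat tells you which lock class they are queued on.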
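On Dan's point about "rereading the pvclock value": every
pvclock_clocksource_read spins on a version field until it sees a consistent
snapshot of the per-vcpu time record that Xen keeps updating, so every clock
consumer (ntpd's frequent time reads included) pays at least one pass through
that loop per read, and more whenever an update races with it. Below is a
condensed, user-space C sketch of that loop, not the actual kernel source:
the struct layout approximates struct vcpu_time_info from the Xen interface
headers, and read_tsc()/barrier()/pvclock_read() are simplified stand-ins
for the kernel's rdtsc/rmb()/pvclock_clocksource_read.

    #include <stdint.h>

    /* Per-vcpu time record shared with the hypervisor (approximate layout). */
    struct pvclock_vcpu_time_info {
        uint32_t version;            /* odd while Xen is mid-update */
        uint32_t pad0;
        uint64_t tsc_timestamp;      /* guest TSC at the last update */
        uint64_t system_time;        /* nanoseconds at the last update */
        uint32_t tsc_to_system_mul;  /* 32.32 fixed-point TSC->ns factor */
        int8_t   tsc_shift;
        int8_t   pad1[3];
    };

    static inline uint64_t read_tsc(void)   /* stand-in for rdtsc */
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    /* stand-in for the kernel's rmb() */
    #define barrier() __asm__ __volatile__("" ::: "memory")

    /* Scale a TSC delta to nanoseconds with the shift/multiply pvclock uses
     * (unsigned __int128 is a gcc extension on x86-64). */
    static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
    {
        if (shift < 0)
            delta >>= -shift;
        else
            delta <<= shift;
        return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
    }

    /* The shape of pvclock_clocksource_read: retry until the version is
     * even and unchanged, i.e. Xen did not update the record mid-read. */
    uint64_t pvclock_read(volatile struct pvclock_vcpu_time_info *src)
    {
        uint32_t version;
        uint64_t now;

        do {
            version = src->version;
            barrier();
            now = src->system_time +
                  scale_delta(read_tsc() - src->tsc_timestamp,
                              src->tsc_to_system_mul, src->tsc_shift);
            barrier();
        } while ((src->version & 1) || version != src->version);

        return now;
    }

If ntpd is hammering the clock from several vcpus while Xen refreshes each
vcpu's time record on every tick, the retries (and the TSC read each pass
implies) could well look like pvclock_clocksource_read dominating the
profile, which would fit both our numbers and Dan's earlier observation.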