
Re: [Xen-devel] test report for Xen 4.3 RC1



On Thu, Jun 20, 2013 at 02:53:06AM +0000, Ren, Yongjie wrote:
> > -----Original Message-----
> > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> > Sent: Monday, June 17, 2013 10:23 PM
> > To: Ren, Yongjie
> > Cc: george.dunlap@xxxxxxxxxxxxx; Xu, YongweiX; Liu, SongtaoX; Tian,
> > Yongxue; xen-devel@xxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] test report for Xen 4.3 RC1
> > 
> > On Sun, Jun 16, 2013 at 04:10:22AM +0000, Ren, Yongjie wrote:
> > > > -----Original Message-----
> > > > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> > > > Sent: Wednesday, June 05, 2013 10:50 PM
> > > > To: Ren, Yongjie
> > > > Cc: george.dunlap@xxxxxxxxxxxxx; Xu, YongweiX; Liu, SongtaoX; Tian,
> > > > Yongxue; xen-devel@xxxxxxxxxxxxx
> > > > Subject: Re: [Xen-devel] test report for Xen 4.3 RC1
> > > >
> > > > > >
> > > >
> > http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1851
> > > > > > > >
> > > > > > > > That looks like you are hitting the udev race.
> > > > > > > >
> > > > > > > > Could you verify that these patches:
> > > > > > > > https://lkml.org/lkml/2013/5/13/520
> > > > > > > >
> > > > > > > > fix the issue? (They are destined for v3.11.)
> > > > > > > >
> > > > > > > > Not tried yet. I'll update you later.
> > > > > >
> > > > > > Thanks!
> > > > > > >
> > > > > We tested kernel 3.9.3 with the 2 patches you mentioned, and found
> > > > > this bug still exists. For example, we did CPU online-offline for
> > > > > Dom0 100 times, and 2 of the 100 attempts failed.
> > > >
> > > > Hm, does it fail b/c udev can't online the sysfs entry?
> > > >
> > > I don't think so.
> > > When it fails to online CPU #3 (while trying to online #1~#3), the output
> > > of the "udevadm monitor --env" command doesn't show any info about
> > > CPU #3. It does show info about #1 and #2, which are onlined successfully.
> > 
> > And if you re-trigger the 'xl vcpu-set', it eventually comes back up,
> > right?
> > 
> We don't use the 'xl vcpu-set' command when doing the CPU hot-plug.
> We just call xc_cpu_online()/xc_cpu_offline() in tools/libxc/xc_cpu_hotplug.c
> to test.

Oh. That is very different from what I thought. You are not offlining/onlining
vCPUs; you are offlining/onlining pCPUs! So Xen has to cram the dom0 vCPUs into
the remaining pCPUs.

There should be no vCPU re-sizing, correct?

> (See the attachment with my test code in that bugzilla.)
> And, yes, if a CPU fails to online, it can be onlined successfully when we
> re-trigger the online function.
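The online/offline stress test described above (100 offline/online cycles, counting failures and re-triggering the online call when one fails) might be structured roughly as in the following C sketch. It is an illustration under stated assumptions, not the test code attached to the bugzilla entry: the harness, the hook type and the simulated hooks are all hypothetical. In a real test the hooks would wrap xc_cpu_offline()/xc_cpu_online() from tools/libxc/xc_cpu_hotplug.c; that wiring is deliberately not shown.

```c
#include <assert.h>

/* Hypothetical hook type: returns 0 on success, non-zero on failure.
 * A real harness would wrap libxc's xc_cpu_offline()/xc_cpu_online();
 * that wiring is an assumption and is not shown here. */
typedef int (*cpu_op_fn)(int cpu, void *ctx);

struct cycle_stats {
    int iterations;       /* offline/online cycles attempted */
    int offline_failures; /* cycles where offlining failed (online skipped) */
    int online_failures;  /* cycles where the first online attempt failed */
    int recovered;        /* failed onlines that succeeded on a re-trigger */
};

/* Offline and re-online one CPU `iterations` times, re-triggering the
 * online call once whenever the first attempt fails. */
static struct cycle_stats run_hotplug_cycles(int iterations, int cpu,
                                             cpu_op_fn offline,
                                             cpu_op_fn online, void *ctx)
{
    struct cycle_stats s = { 0, 0, 0, 0 };

    assert(iterations >= 0);
    for (int i = 0; i < iterations; i++) {
        s.iterations++;
        if (offline(cpu, ctx) != 0) {
            s.offline_failures++;
            continue;
        }
        if (online(cpu, ctx) != 0) {
            s.online_failures++;
            if (online(cpu, ctx) == 0)
                s.recovered++;
        }
    }
    return s;
}

/* Simulated hooks for a dry run: every `fail_every`-th online call fails. */
struct fake_ctx {
    int online_calls;
    int fail_every;
};

static int fake_offline(int cpu, void *ctx)
{
    (void)cpu;
    (void)ctx;
    return 0;
}

static int fake_online(int cpu, void *ctx)
{
    struct fake_ctx *f = ctx;

    (void)cpu;
    f->online_calls++;
    return (f->online_calls % f->fail_every == 0) ? -1 : 0;
}
```

As a dry run, fail_every = 50 over 100 cycles reports two first-attempt online failures, both recovered by the re-trigger; those numbers come purely from the simulated hook, not from real hardware.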
> 
> > >
> > > > .. snip..
> > > > > > >
> > > > > > > > >
> > > > > > > > > Old bugs: (11)
> > > > > > > > > 1. [ACPI] Dom0 can't resume from S3 sleep
> > > > > > > > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1707
> > > > > > > >
> > > > > > > > That should be fixed in v3.11 (as now we have the fixes).
> > > > > > > > Could you try v3.10 with Rafael's ACPI tree merged in?
> > > > > > > > (That is, the patches that he wants to submit for v3.11.)
> > > > > > > >
> > > > > > > I re-tested with Rafael's linux-pm.git tree (master and acpi-hotplug
> > > > > > > branches), and found Dom0 S3 sleep/resume doesn't work, either.
> > > > > >
> > > > > > The patches he has to submit for v3.11 are in the linux-next branch.
> > > > > > You need to use that branch.
> > > > > >
> > > > > Dom0 S3 sleep/resume doesn't work with the linux-next branch, either.
> > > > > Attached the log.
> > > >
> > > > It does work on my box. So I am not sure if this is related to the
> > > > IvyTown box you are using. Does it work on other machines?
> > > >
> > > No, it doesn't work on other machines, either. I also tried it on
> > > SandyBridge, IvyBridge desktop and Haswell mobile machines.
> > 
> > I just double checked on my AMD machines with v3.10-rc5 with
> > these extra patches:
> > 
> > ebe2886 x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.
> > 7c4ae96 Revert "xen/pat: Disable PAT support for now."
> > 729c6ec Revert "xen/pat: Disable PAT using pat_enabled value."
> > bd4fd16 microcode_xen: Add support for AMD family >= 15h
> > 6271c21 x86/microcode: check proper return code.
> > b9a48c8 xen: add CPU microcode update driver
> > c62566c cpu: make sure that cpu/online file created before KOBJ_ADD is emitted
> > 0790542 cpu: fix "crash_notes" and "crash_notes_size" leaks in register_cpu()
> > f90099b xen / ACPI / sleep: Register an acpi_suspend_lowlevel callback.
> > 29ca6e9 x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.
> > 
> > and it worked. Let me recompile a kernel without most of them to
> > double-check whether those patches are what makes ACPI S3 suspend/resume
> > work. This is with Xen 4.3 (82cb411). The machine is an M5A97, BIOS 1208
> > 04/18/2012, with "01:00.0 VGA compatible controller: NVIDIA Corporation
> > G84 [GeForce 8600 GT] (rev a1)" as its graphics card.
> > 
> After re-testing with the linux-pm.git tree (kernel: 3.10.rc6+, commit: a913b188df)
> on my IvyTown-EP and IvyBridge desktop systems, Dom0 S3 sleep/resume works!
> When this code is upstreamed to the linux.git tree, I can close this bug.

Yes! Though Ben found another issue with extended sleep, where it will
not use the hypercall. <sigh>
> 
> > >
> > > > >
> > > > > > >
> > > > > > > > > 2. [XL]"xl vcpu-set" causes dom0 crash or panic
> > > > > > > > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1730
> > > > > > > >
> > > > > > > > That, I think, is fixed in v3.10. Could you please check v3.10-rc3?
> > > > > > > >
> > > > > > > Still exists on v3.10-rc3.
> > > > > > > The following command lines can reproduce it:
> > > > > > > # xl vcpu-set 0 1
> > > > > > > # xl vcpu-set 0 20
> > > > > >
> > > > > > Ugh, same exact stack trace? And can you attach the full dmesg or
> > > > > > serial output (so that I can see what there is at bootup)?
> > > > > >
> > > > > Yes, the same. Also attached in this mail.
> > > >
> > > > One of the fixes is this one:
> > > > http://www.gossamer-threads.com/lists/xen/devel/284897
> > > >
> > > > but the other ones I had not seen. I am wondering if the
> > > > update_sd_lb_stats crash is b/c of the previous conditions (that is,
> > > > tick_nohz_idle_start hadn't been called).
> > > >
> > > > It is a shot in the dark, but if you use the above-mentioned patch,
> > > > do you still see the update_sd_lb_stats crash?
> > > >
> > > Yes, with the patch we still see the update_sd_lb_stats crash.
> > > It has almost the same trace log as before. Log file is attached.
> > 
> > Would it be possible to do a bit of 'git bisect' to figure out why
> > this started?
> >
> It's hard.
> This issue has existed for a long time. We don't even know which upstream
> Linux version, used as dom0, works without this bug.

Then a bit of digging will be needed. Sadly I am out of time to do this
ATM.

> 
> > > > > > > > > 4. 'xl vcpu-set' can't decrease the vCPU number of a HVM guest
> > > > > > > > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1822
> > > > > > > >
> > > > > > > > That, I believe, was a QEMU bug:
> > > > > > > >
> > > > > > > > http://lists.xen.org/archives/html/xen-devel/2013-05/msg01054.html
> > > > > > > >
> > > > > > > > which should be in QEMU traditional now (05-21 was when it went
> > > > > > > > into the tree)
> > > > > > > >
> > > > > > > This year and last year, this bug has always existed (at least in our
> > > > > > > testing).
> > > > > > > 'xl vcpu-set' can't decrease the vCPU number of a HVM guest
> > > > > >
> > > > > > Could you retry with Xen 4.3 please?
> > > > > >
> > > > > With Xen 4.3 & Linux:3.10.0-rc3, I can't decrease the vCPU number of a
> > > > > guest.
> > > >
> > > Sorry, when I sent that message, I was still using the rhel6.4 kernel in
> > > the guest. After upgrading the guest kernel to 3.10.0-rc3, the result
> > > became better. Basically, vCPU increment/decrement works fine. I'll close
> > > that bug.
> > 
> > Excellent!
> > > But there's still a minor issue, as follows.
> > > After booting the guest with 'vcpus=4' and 'maxvcpus=32', change its vCPU
> > > number:
> > > # xl vcpu-set $domID 32
> > > Then you only get fewer than 32 (e.g. 19) CPUs in the guest. If you set the
> > > vCPU number to 32 (from 19) again, the guest does get 32 vCPUs.
> > > But 'xl vcpu-set $domID 8' works fine, as we expected.
> > > vCPU decrement has the same result.
> > > Could you also try to reproduce my issue?
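One way to confirm how many vCPUs the guest actually brought online after 'xl vcpu-set $domID 32' (for example 19 instead of 32) is to read /sys/devices/system/cpu/online inside the guest, which lists the online CPUs in the Linux cpulist format (e.g. "0-18"), and compare the count with the requested number. The C sketch below only counts the CPUs in such a string; the function name count_cpu_list is hypothetical, and reading the sysfs file is left to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Count the CPUs in a Linux cpulist string such as "0-18" or "0-3,5,7-8\n"
 * (the format of /sys/devices/system/cpu/online).
 * Returns -1 on malformed input; an empty string yields 0. */
static int count_cpu_list(const char *s)
{
    int total = 0;
    char *end;

    while (*s && *s != '\n') {
        /* Each element must start with a digit (no sign, no spaces). */
        if (!isdigit((unsigned char)*s))
            return -1;
        long lo = strtol(s, &end, 10);
        long hi = lo;
        s = end;
        if (*s == '-') {
            s++;
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtol(s, &end, 10);
            if (hi < lo)
                return -1;
            s = end;
        }
        total += (int)(hi - lo + 1);
        if (*s == ',') {
            s++;
            /* Reject a trailing or doubled comma. */
            if (!isdigit((unsigned char)*s))
                return -1;
        } else if (*s && *s != '\n') {
            return -1;
        }
    }
    return total;
}
```

For the case reported here, a guest showing "0-18" in that file would count as 19 online vCPUs against the 32 requested.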
> > 
> This issue doesn't exist when using the latest QEMU traditional tree.
> My previous QEMU was old (March 2013), and I found that some of your patches
> were applied in May 2013. These fixes resolve the issue we reported.
> Closing this bug.

Yes!
> 
> However, it introduced another issue: when doing 'xl vcpu-set' for an HVM
> guest several times (e.g. 5 times), the guest panics. The log is attached.
> Before your patches went into the qemu traditional tree in May 2013, we never
> saw a guest kernel panic.
> dom0: 3.10.0-rc3
> Xen: 4.3.0-RCx
> QEMU: the latest traditional tree
> guest kernel: 3.10.0-RC3
> Should I file another bug to track this?

Please.
> Can you reproduce this?

Could you tell me how you are doing 'xl vcpu-set'? Is there a particular
test script you are using?

> 
> > Sure. Now how many pCPUs do you have? And what version of QEMU traditional
> > were you using?
> > 
> There are 32 pCPUs in the system we used.
> 
> Best Regards,
>      Yongjie (Jay)



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

