
Re: [Xen-devel] support for more than 32 VCPUs when migrating PVHVM guest



On Mon, Feb 02, 2015 at 12:03:28PM +0100, Vitaly Kuznetsov wrote:
> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> writes:
> 
> > On 02/02/15 10:47, Vitaly Kuznetsov wrote:
> >> Hi Konrad,
> >>
> >> I just hit an issue with PVHVM guests after save/restore (or migration),
> >> if a PVHVM guest has > 32 VCPUs it hangs. Turns out, you saw it almost a
> >> year ago and even wrote patches to call VCPUOP_register_vcpu_info after
> >> resume. Unfortunately these patches never made it to xen/kernel. Do you
> >> have a plan to pick this up? What were the arguments against your
> >> suggestion?
> >
> > 32 VCPUs is the legacy limit for HVM guests, but there should not be any
> > remaining artefacts of it these days.
> >
> > Do you know why the hang occurs?  I can't spot anything in the legacy
> > migration code which would enforce such a limit.
> >
> > What is the subject of the thread you reference so I can search for it?
> >
> 
> Sorry, I should have sent the link:
> 
> http://lists.xen.org/archives/html/xen-devel/2014-04/msg00794.html
> 
> Konrad's patches:
> 
> http://lists.xen.org/archives/html/xen-devel/2014-04/msg01199.html
> 
> The issue is that we don't call VCPUOP_register_vcpu_info after
> suspend/resume (or migration) and it is mandatory.
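
Background, in case it helps anyone joining the thread: the shared_info page
only carries vcpu_info slots for the first 32 VCPUs (XEN_LEGACY_MAX_VCPUS), so
every VCPU beyond that relies entirely on the per-VCPU registration, and that
registration has to be redone after restore on the new domain. Roughly, the
re-registration on the Linux side looks like the sketch below - it follows the
shape of the existing xen_vcpu_setup() code rather than the actual patch, with
the error handling simplified:

/*
 * Rough sketch only, not the patch itself: re-register this CPU's
 * vcpu_info page with the hypervisor, e.g. from the resume path.
 */
#include <xen/interface/vcpu.h>         /* VCPUOP_register_vcpu_info */
#include <asm/xen/hypercall.h>          /* HYPERVISOR_vcpu_op() */

static void xen_reregister_vcpu_info(int cpu)
{
        struct vcpu_register_vcpu_info info;
        struct vcpu_info *vcpup = &per_cpu(xen_vcpu_info, cpu);

        info.mfn = arbitrary_virt_to_mfn(vcpup);
        info.offset = offset_in_page(vcpup);

        /*
         * VCPUs >= 32 have no slot in shared_info, so if this call is
         * not redone after restore they never see their pending event
         * flags again and the guest appears to hang.
         */
        if (HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info))
                BUG();
}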

The issue I saw was that with that enabled (which is what Jan requested)
everything seems to work - except for one thing; ah, here it is:

http://lists.xen.org/archives/html/xen-devel/2014-04/msg02875.html

or rather:

http://lists.xen.org/archives/html/xen-devel/2014-04/msg02945.html

        > The VCPUOP_send_nmi did cause the HVM guest to get an NMI and it
        > spat out 'Dazed and confused'. It also reported corruption:
        > 
        > [    3.611742] Corrupted low memory at c000fffc (fffc phys) = 00029b00
        > [    2.386785] Corrupted low memory at ffff88000000fff8 (fff8 phys) = 2990000000000
        > 
        > Which is odd, because there does not seem to be anything in the
        > hypervisor path that would cause this.

        Indeed. This looks a little like a segment descriptor got modified here
        with a descriptor table base of zero and a selector of 0xfff8. That
        corruption needs to be hunted down in any case before enabling
        VCPUOP_send_nmi for HVM.
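
Jan's theory at least fits the addresses: a selector of 0xfff8 indexes the
descriptor at offset 0xfff8 in the table, and with a table base of zero a
write to that descriptor lands on physical 0xfff8-0xffff - exactly where the
two corrupted words above sit. For completeness, the guest-side call itself
is about as thin as it gets; a minimal sketch (not the patch from the thread):

/* Sketch only: ask Xen to inject an NMI into one of our own VCPUs. */
#include <xen/interface/vcpu.h>         /* VCPUOP_send_nmi */
#include <asm/xen/hypercall.h>          /* HYPERVISOR_vcpu_op() */

static int xen_send_nmi_to(unsigned int cpu)
{
        /*
         * VCPUOP_send_nmi takes no argument structure, just the target
         * VCPU id; on the Xen side it essentially flags an NMI as
         * pending for that VCPU and kicks it.  Returns 0 on success.
         */
        return HYPERVISOR_vcpu_op(VCPUOP_send_nmi, cpu, NULL);
}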


I did not get a chance to "hunt down" that pesky issue. That is the only
thing holding this patchset back.

Said patch is in my queue of patches to upstream (amongst 30 other ones), and
I am working through the reviews and issues - but it will take me quite some
time - so if you feel like taking a stab at this, please do!
