
[Xen-devel] A race condition introduced by changeset 15175: Re-init hypercall stubs page after HVM save/restore


  • To: 'Keir Fraser' <keir.fraser@xxxxxxxxxxxxx>, "'xen-devel@xxxxxxxxxxxxxxxxxxx'" <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Cui, Dexuan" <dexuan.cui@xxxxxxxxx>
  • Date: Tue, 7 Oct 2008 18:08:11 +0800
  • Accept-language: zh-CN, en-US
  • Cc:
  • Delivery-date: Tue, 07 Oct 2008 03:08:37 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AckoZJqd7Bw334uKTW2HgFcuLGFhCg==
  • Thread-topic: A race condition introduced by changeset 15175: Re-init hypercall stubs page after HVM save/restore

For an SMP Linux HVM guest with PV drivers loaded, when we do save/restore 
(or live migration) for the guest, it might panic after it is restored.
The panic point is inside ap_suspend():
 ....
    while (info->do_spin) {
        cpu_relax();
        read_lock(&suspend_lock);
        HYPERVISOR_yield();      /* ----> guest might panic on this invocation */
        read_unlock(&suspend_lock);
    }
...

The root cause is: an AP might be invoking a hypercall while the BSP is asking 
the hypervisor to re-initialize the hypercall page, just after the guest has 
been restored!
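
To illustrate, here is a minimal sketch (not the actual unmodified_drivers 
code; bsp_reinit_hypercall_page() and setup_hypercall_page() are made-up 
names) of how the BSP could serialize the re-init against the APs, which 
only issue HYPERVISOR_yield() under read_lock(&suspend_lock):

    #include <linux/spinlock.h>

    /* Assumed to be the rwlock referenced by ap_suspend() above. */
    extern rwlock_t suspend_lock;

    /* Hypothetical helper: asks Xen to (re)populate the hypercall stubs. */
    extern void setup_hypercall_page(void);

    static void bsp_reinit_hypercall_page(void)
    {
        write_lock(&suspend_lock);   /* excludes all readers in ap_suspend() */
        setup_hypercall_page();      /* the re-init added by c/s 15175 */
        write_unlock(&suspend_lock); /* APs may issue hypercalls again */
    }

Holding the write side across the re-init would guarantee no AP is in the 
middle of a hypercall while the stub page is being rewritten.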

What's the purpose of re-initializing the hypercall page here? To improve 
compatibility in case the source/target hosts have different hypercall stub 
code?
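
For reference, a PV-on-HVM guest installs its hypercall page through the 
standard Xen CPUID/MSR handshake, roughly as sketched below 
(install_hypercall_page() and gpa are illustrative names, not the driver's 
actual code). Since the stubs Xen writes into the page come from the 
hypervisor that generated them, re-writing the MSR after restore would 
refresh them for the target host -- presumably the motivation for c/s 15175:

    #include <linux/kernel.h>
    #include <asm/msr.h>
    #include <asm/processor.h>              /* cpuid() */

    /* Minimal sketch: find the Xen CPUID leaves, read the hypercall MSR
     * index from leaf base+2, and write the page's guest physical address
     * to that MSR so Xen fills the page with hypercall stubs. */
    static void install_hypercall_page(unsigned long gpa)
    {
        unsigned int base, eax, ebx, ecx, edx, msr;

        /* Scan for the "XenVMMXenVMM" signature leaf. */
        for (base = 0x40000000; base < 0x40010000; base += 0x100) {
            cpuid(base, &eax, &ebx, &ecx, &edx);
            if (ebx == 0x566e6558 && ecx == 0x65584d4d &&
                edx == 0x4d4d566e)
                break;
        }

        cpuid(base + 2, &eax, &msr, &ecx, &edx); /* EBX = hypercall MSR */
        wrmsrl(msr, gpa);                        /* Xen writes the stubs */
    }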

PS: I'm using c/s 18353 to debug the issue (the latest xen-unstable.hg's 
save/restore and live migration are broken by c/s 18383).

Thanks,
-- Dexuan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
