
Re: [Xen-devel] Re: DomU lockups after resume from S3 on Core i5 processors



On 07/06/10 00:07, Joanna Rutkowska wrote:
> On 07/05/10 23:28, Joanna Rutkowska wrote:
>> On 07/05/10 12:38, Joanna Rutkowska wrote:
>>> I'm experiencing very reproducible DomU lockups that occur after I
>>> resume the system from an S3 sleep. Strangely, this seems to happen only
>>> on my Core i5 systems (tested on two different machines), but not on
>>> older Core 2 Duo systems.
>>>
>>> Usually this causes the apps (e.g. Firefox) running in DomUs to become
>>> unresponsive, but sometimes I see that some very limited functionality
>>> of the app is still available (e.g. I can open/close tabs in Firefox,
>>> but cannot do much else). Also, when I log in to the DomU via
>>> xm console, I usually can see the login prompt, can enter the username,
>>> but then the console hangs.
>>>
>>> I tried to attach to such a hung DomU using gdbserver-xen, but when I
>>> subsequently try to attach to the server from gdb (via the target
>>> 127.0.0.1:9999 command), my gdb segfaults (how funny!).
>>>
>>> I'm running Xen 3.4.3, and fairly recent pvops0 kernel in DomU. In Dom0
>>> I run 2.6.34-xenlinux kernel (opensuse patches), but I doubt it is
>>> relevant in any way.
>>>
>>> This seems like a scheduling problem, and, because it seems to affect
>>> Core i5 processors, but not Core 2 Duos, it might have something to do
>>> with Hyperthreading perhaps?
>>>
>> Ok, finally got the gdbserver working. This is the backtrace I get when
>> attaching to a locked-up DomU after resume:
>>
>> #0  0xffffffff810093aa in ?? ()
>> #1  0xffffffff8168be18 in ?? ()
>> #2  0xffff880003a21600 in ?? ()
>> #3  0xffffffff8100ee63 in HYPERVISOR_sched_op ()
>>     at /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/arch/x86/include/asm/xen/hypercall.h:292
>> #4  xen_safe_halt () at arch/x86/xen/irq.c:104
>> #5  0xffffffff8100c33e in raw_safe_halt ()
>>     at /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/arch/x86/include/asm/paravirt.h:110
>> #6  xen_idle () at arch/x86/xen/setup.c:193
>> #7  0xffffffff81011cdd in cpu_idle () at arch/x86/kernel/process_64.c:143
>> #8  0xffffffff8144b997 in rest_init () at init/main.c:445
>> #9  0xffffffff81824ddc in start_kernel () at init/main.c:695
>> #10 0xffffffff818242c1 in x86_64_start_reservations (real_mode_data=<value optimized out>)
>>     at arch/x86/kernel/head64.c:123
>> #11 0xffffffff81828160 in xen_start_kernel () at arch/x86/xen/enlighten.c:1300
>> #12 0xffffffff838f3000 in ?? ()
>> #13 0xffffffff838f4000 in ?? ()
>> #14 0xffffffff838f5000 in ?? ()
>>
>> Any ideas?
>>
> ... and when I disabled Hyperthreading in the BIOS, the problem seems to
> be gone. Obviously this is not a desirable solution...
> 

I've added a simple hook to pm-utils, so that it does xm pause for all
the running DomUs just before suspend and then, just after resume,
does xm unpause for all paused DomUs. The problem seems to be gone now,
after a dozen or more suspend/resume cycles.
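
For anyone not running Qubes, the same idea can be sketched roughly as
below. This is not the script I actually use, only an illustration: it
assumes the usual "xm list" column layout (Name, ID, Mem, VCPUs, State,
Time(s)) with state flags such as "r", "b" and "p", and the function
names parse_xm_list, select_domus and hook_commands are made up for this
example.

```python
#!/usr/bin/env python3
"""Sketch of a pm-utils sleep hook that pauses DomUs around suspend."""
import subprocess
import sys


def parse_xm_list(output):
    """Return (name, state) pairs from the text printed by `xm list`."""
    domains = []
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Assumed columns: Name ID Mem VCPUs State Time(s).
        # Rows with fewer fields (e.g. inactive domains) are ignored.
        if len(fields) >= 6:
            domains.append((fields[0], fields[-2]))
    return domains


def select_domus(domains, paused):
    """Pick guest names: paused ones, or live ones that are not paused."""
    names = []
    for name, state in domains:
        if name == "Domain-0":
            continue  # never touch Dom0
        if paused and "p" in state:
            names.append(name)
        elif not paused and not any(flag in state for flag in "pscd"):
            # Not paused, shut down, crashed or dying.
            names.append(name)
    return names


def hook_commands(action, xm_list_output):
    """Map a pm-utils action to the list of xm commands to run."""
    domains = parse_xm_list(xm_list_output)
    if action in ("suspend", "hibernate"):
        return [["xm", "pause", n] for n in select_domus(domains, False)]
    if action in ("resume", "thaw"):
        return [["xm", "unpause", n] for n in select_domus(domains, True)]
    return []


if __name__ == "__main__" and len(sys.argv) > 1:
    # pm-utils passes the action (suspend, resume, ...) as the first argument.
    listing = subprocess.check_output(["xm", "list"], text=True)
    for cmd in hook_commands(sys.argv[1], listing):
        subprocess.check_call(cmd)
```

Dropped as an executable file into /etc/pm/sleep.d/ (the file name is up
to you), this should be called with "suspend" before sleep and "resume"
afterwards. Note that, like the behaviour described above, it unpauses
every paused DomU on resume, including any that were paused on purpose
before the suspend.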

The actual pm-utils script can be seen here:

https://qubes-os.org/gitweb/?p=joanna/core.git;a=blob;f=dom0/pm-utils/02qubes-pause-vms;h=5da1be84a86c2e3a95548e52e4672e988d6779a8;hb=c8ef500588452d39b4b41e9f38066c22c6b832ad

It uses the Qubes-specific qvm-run command, but I guess it would be easy
to implement the same functionality in the xm command, e.g.:

xm pause all_running

and

xm unpause all_paused

joanna.

