
Re: [Xen-devel] Regression with commit "x86/pv: Drop int80_bounce from struct pv_vcpu" f75b1a5247b3b311d3aa50de4c0e5f2d68085cb1



On 13/03/2018 23:28, Sander Eikelenboom wrote:
> On 13/03/18 23:01, Andrew Cooper wrote:
>> On 10/03/18 16:14, Sander Eikelenboom wrote:
>>> Hi Andrew,
>>>
>>> It seems commit "x86/pv: Drop int80_bounce from struct pv_vcpu"
>>> (f75b1a5247b3b311d3aa50de4c0e5f2d68085cb1) causes an issue on my
>>> machine, an AMD Phenom X6.
>>>
>>> When trying to install a new kernel package, which runs the Debian
>>> update-initramfs tool, with xen-unstable (which happened to be at
>>> commit c9bd8a73656d7435b1055ee8825823aee995993e as its last commit),
>>> the tool stalls and I get this kernel splat:
>>>
>>> [  284.910674] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>>> [  284.919696] IP:           (null)
>>> [  284.928315] PGD 0 P4D 0
>>> [  284.943343] Oops: 0010 [#1] SMP NOPTI
>>> [  284.957008] Modules linked in:
>>> [  284.965521] CPU: 5 PID: 24729 Comm: ld-linux.so.2 Not tainted 4.16.0-rc4-20180305-linus-pvhpatches-doflr+ #1
>>> [  284.974154] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>>> [  284.983198] RIP: e030:          (null)
>>> [  284.992006] RSP: e02b:ffffc90001497ed8 EFLAGS: 00010286
>>> [  285.000612] RAX: 0000000000000000 RBX: ffff880074c64500 RCX: ffffffff82f8d1c0
>>> [  285.009122] RDX: ffffffff82f8d1c0 RSI: 0000000020020002 RDI: ffffffff82f8d1c0
>>> [  285.017598] RBP: ffff880074c64b7c R08: 0000000000000000 R09: 0000000000000000
>>> [  285.025999] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82f8d1c0
>>> [  285.034400] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880074c64b50
>>> [  285.042718] FS:  00007f919fe2eb40(0000) GS:ffff88007d140000(0000) knlGS:0000000000000000
>>> [  285.051001] CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
>>> [  285.059458] CR2: 0000000000000000 CR3: 0000000002824000 CR4: 0000000000000660
>>> [  285.067813] Call Trace:
>>> [  285.075947]  ? task_work_run+0x85/0xa0
>>> [  285.084025]  ? exit_to_usermode_loop+0x72/0x80
>>> [  285.091980]  ? do_int80_syscall_32+0xfe/0x120
>>> [  285.099896]  ? entry_INT80_compat+0x7f/0x90
>>> [  285.107688]  ? fpu__drop+0x23/0x40
>>> [  285.115362] Code:  Bad RIP value.
>>> [  285.123072] RIP:           (null) RSP: ffffc90001497ed8
>>> [  285.130714] CR2: 0000000000000000
>>> [  285.138219] ---[ end trace 4d3317497f4ba022 ]---
>>> [  285.145671] Fixing recursive fault but reboot is needed!
>>>
>>> After updating xen-unstable to the latest available commit
>>> 185413355fe331cbc926d48568838227234c9a20,
>>> the tool doesn't stall anymore but I still get a kernel splat:
>>>
>>> [  198.594638] ------------[ cut here ]------------
>>> [  198.594641] Invalid address limit on user-mode return
>>> [  198.594651] WARNING: CPU: 1 PID: 75 at ./include/linux/syscalls.h:236 do_int80_syscall_32+0xe5/0x120
>>> [  198.594652] Modules linked in:
>>> [  198.594655] CPU: 1 PID: 75 Comm: kworker/1:1 Not tainted 4.16.0-rc4-20180305-linus-pvhpatches-doflr+ #1
>>> [  198.594656] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>>> [  198.594658] Workqueue: events free_work
>>> [  198.594660] RIP: e030:do_int80_syscall_32+0xe5/0x120
>>> [  198.594661] RSP: e02b:ffffc90000b8ff40 EFLAGS: 00010086
>>> [  198.594662] RAX: 0000000000000029 RBX: ffffc90000b8ff58 RCX: ffffffff82868e38
>>> [  198.594663] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001
>>> [  198.594664] RBP: ffff880078623980 R08: 0000000000000dfa R09: 000000000000063b
>>> [  198.594664] R10: 0000000000000000 R11: 000000000000063b R12: 0000000000000000
>>> [  198.594665] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>> [  198.594672] FS:  00007fa252372b40(0000) GS:ffff88007d040000(0000) knlGS:0000000000000000
>>> [  198.594673] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  198.594674] CR2: 00000000f7f303e4 CR3: 0000000002824000 CR4: 0000000000000660
>>> [  198.594676] Call Trace:
>>> [  198.594683]  entry_INT80_compat+0x7f/0x90
>>> [  198.594685]  ? vunmap_page_range+0x2a0/0x340
>>> [  198.594686] Code: 03 7f 48 8b 75 00 f7 c6 0e 38 00 00 75 2e 83 65 08 f9 5b 5d c3 e8 0c fb ff ff e9 53 ff ff ff 48 c7 c7 58 35 57 82 e8 ab 3e 0c 00 <0f> 0b bf 09 00 00 00 48 89 ee e8 8c 00 0d 00 eb b8 48 89 df e8
>>> [  198.594706] ---[ end trace 90bcd2147bc825ef ]---
>>>
>>> After reverting commit f75b1a5247b3b311d3aa50de4c0e5f2d68085cb1, the
>>> issue is gone.
>> Can you try this patch?
> Hi Andrew,
>
> Testing with: ldd -v /lib/x86_64-linux-gnu/libc.so.6
> seems to indicate the patch works!
> Hopefully it also does for all the others :)

I'll do a proper fix tomorrow.

This bug only manifests when we service an int80 and the very next
action the vcpu undergoes is an exception which gets fixed up via
emulation; any intervening event delivery has the side effect of
squashing the bug.  Under these circumstances, we deliver the int80 a
second time on the way out of Xen, instead of continuing normally after
the instruction which caused the exception.  Debugging this was
substantially confused by the fact that something in Linux (I haven't
worked out exactly what, but it is before userspace starts) really does
issue an int80 from kernel context.
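
To illustrate the failure mode, here is a minimal, self-contained C
sketch of the pattern.  The names (struct vcpu_demo, TBF_DEMO_INT80,
exit_to_guest, etc.) are purely illustrative and are not Xen's actual
internals; the point is a pending-bounce flag which is set when the
int80 is serviced, but never cleared when it is consumed:

/*
 * Illustrative sketch of the int80 re-delivery pattern.  Not Xen's
 * real data structures or functions.
 */
#include <stdio.h>

#define TBF_DEMO_INT80 0x1   /* "bounce an int80 frame to the guest" */

struct vcpu_demo {
    unsigned int bounce_flags;
};

static void service_int80(struct vcpu_demo *v)
{
    v->bounce_flags |= TBF_DEMO_INT80;   /* queue the bounce */
}

static void exit_to_guest(struct vcpu_demo *v)
{
    if ( v->bounce_flags & TBF_DEMO_INT80 )
    {
        printf("delivering int80 bounce frame\n");
        /*
         * BUG: the flag is never cleared after being consumed.  An
         * exception which gets fixed up via emulation exits through
         * this same path, sees the stale flag, and re-delivers the
         * int80 instead of resuming after the faulting instruction.
         */
    }
}

static void deliver_event(struct vcpu_demo *v)
{
    v->bounce_flags = 0;   /* event delivery overwrites the stale
                              state, which is why it hides the bug */
    printf("delivering event\n");
}

int main(void)
{
    struct vcpu_demo v = { 0 };

    /* Buggy sequence: int80, then an emulated exception fixup. */
    service_int80(&v);
    exit_to_guest(&v);   /* first delivery - correct */
    exit_to_guest(&v);   /* exit after the fixup - spurious re-delivery */

    /* Intervening event delivery squashes the stale flag. */
    service_int80(&v);
    exit_to_guest(&v);   /* first delivery - correct */
    deliver_event(&v);
    exit_to_guest(&v);   /* nothing pending - no re-delivery */
    return 0;
}

Run as written, the buggy sequence prints the bounce twice, while the
sequence with intervening event delivery prints it only once.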

Also, by sheer coincidence, this bug is resolved by a safety check I
decided to proactively add in patch 4 of the series (not yet
committed), which is why my end-result testing against all the PV
guests the XenServer test system knows about didn't encounter any
problems.  As for my unit tests, they never went on to have an emulated
pagetable write following the int80, which is why they never tickled
the bug.
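
In terms of the sketch above, a consume-once safety check of the kind
described would look something like this (again illustrative only, not
the actual patch):

static void exit_to_guest_fixed(struct vcpu_demo *v)
{
    if ( v->bounce_flags & TBF_DEMO_INT80 )
    {
        printf("delivering int80 bounce frame\n");
        v->bounce_flags = 0;   /* consume the bounce exactly once */
    }
}

With the state cleared at the point of consumption, a later exit after
an emulated fixup finds nothing pending and simply resumes the guest.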

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
