Re: [Xen-devel] [xen-unstable test] 145393: regressions - FAIL



On Sun, Jan 19, 2020 at 02:36:32AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > Sent: Tuesday, December 31, 2019 11:30 PM
> > 
> > On Mon, Dec 30, 2019 at 08:19:23PM +0000, osstest service owner wrote:
> > > flight 145393 xen-unstable real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/145393/
> > >
> > > Regressions :-(
> > >
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >  test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 145025
> > 
> > While da9290639eb5d6ac did fix the vmlaunch error, the L1 guest now
> > seems to lose interrupts:
> > 
> > [  412.127078] NETDEV WATCHDOG: eth0 (e1000): transmit queue 0 timed out
> > [  412.151837] ------------[ cut here ]------------
> > [  412.164281] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x252/0x260
> > [  412.185821] Modules linked in: xen_gntalloc ext4 mbcache jbd2 e1000 sym53c8xx
> > [  412.204399] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.150+ #1
> > [  412.223988] Hardware name: Xen HVM domU, BIOS 4.14-unstable 12/30/2019
> > [  412.241657] task: ffffffff82213480 task.stack: ffffffff82200000
> > [  412.256979] RIP: e030:dev_watchdog+0x252/0x260
> > [  412.268444] RSP: e02b:ffff88801fc03e90 EFLAGS: 00010286
> > [  412.281727] RAX: 0000000000000039 RBX: 0000000000000000 RCX: 0000000000000000
> > [  412.300097] RDX: ffff88801fc1de70 RSI: ffff88801fc16298 RDI: ffff88801fc16298
> > [  412.318283] RBP: ffff888006c6e41c R08: 000000000001f066 R09: 000000000000023b
> > [  412.336540] R10: ffff88801fc1a3f0 R11: ffffffff8287d96d R12: ffff888006c6e000
> > [  412.354643] R13: 0000000000000000 R14: ffff888006e3ac80 R15: 0000000000000001
> > [  412.373034] FS:  00007fa05293ecc0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> > [  412.393367] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  412.408112] CR2: 00007fd80ff16000 CR3: 000000000ce78000 CR4: 0000000000040660
> > [  412.426338] Call Trace:
> > [  412.432747]  <IRQ>
> > [  412.438102]  ? dev_deactivate_queue.constprop.33+0x50/0x50
> > [  412.451896]  call_timer_fn+0x2b/0x130
> > [  412.464208]  run_timer_softirq+0x3d8/0x4b0
> > [  412.474598]  ? handle_irq_event_percpu+0x3c/0x50
> > [  412.486426]  __do_softirq+0x116/0x2ce
> > [  412.495883]  irq_exit+0xcd/0xe0
> > [  412.503999]  xen_evtchn_do_upcall+0x27/0x40
> > [  412.514626]  xen_do_hypervisor_callback+0x29/0x40
> > [  412.526684]  </IRQ>
> > [  412.532252] RIP: e030:xen_hypercall_sched_op+0xa/0x20
> > [  412.545034] RSP: e02b:ffffffff82203ea0 EFLAGS: 00000246
> > [  412.558347] RAX: 0000000000000000 RBX: ffffffff82213480 RCX: ffffffff810013aa
> > [  412.576390] RDX: ffffffff822483e8 RSI: deadbeefdeadf00d RDI: deadbeefdeadf00d
> > [  412.594580] RBP: 0000000000000000 R08: ffffffffffffffff R09: 0000000000000000
> > [  412.612831] R10: ffffffff82203e30 R11: 0000000000000246 R12: ffffffff82213480
> > [  412.630980] R13: 0000000000000000 R14: ffffffff82213480 R15: ffffffff82238e80
> > [  412.649138]  ? xen_hypercall_sched_op+0xa/0x20
> > [  412.660671]  ? xen_safe_halt+0xc/0x20
> > [  412.670177]  ? default_idle+0x23/0x110
> > [  412.679862]  ? do_idle+0x168/0x1f0
> > [  412.688666]  ? cpu_startup_entry+0x14/0x20
> > [  412.699059]  ? start_kernel+0x4c3/0x4cb
> > [  412.708807]  ? xen_start_kernel+0x527/0x530
> > [  412.720776] Code: cb e9 a0 fe ff ff 0f 0b 4c 89 e7 c6 05 00 d6 c6 00 01 e8 82 89 fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 30 fb 01 82 e8 44 e9 a6 ff <0f> 0b e9 58 fe ff ff 0f 1f 80 00 00 00 00 41 57 41 56 41 55 41
> > [  412.767900] ---[ end trace d9e35c3f725f4b57 ]---
> > [  412.780193] e1000 0000:00:05.0 eth0: Reset adapter
> > 
> > This only happens when L1 is using x2APIC and a guest has been
> > launched (by L1). Prior to launching any guest, L1 seems to be fully
> > functional. I'm currently trying to figure out how and when that
> > interrupt is lost, and I bet it's related to the merging of the VMCS
> > between L1 and L2 done in L0.
> > 
> > As a workaround I could disable exposing x2APIC in CPUID when nested
> > virtualization is enabled on Intel.
> > 
> 
> Any progress on this problem? Please let me know if I overlooked a more
> recent mail. Possibly it's useful to fully compare the APICv-related settings
> in vmcs02 and vmcs12. Alternatively, you may disable all APICv features
> to see whether APICv is the main cause.
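
As a rough illustration of the comparison suggested above, the sketch below diffs the APICv-related VM-execution control bits between a vmcs12 and a vmcs02 snapshot. The struct, macro and helper names are hypothetical (this is not Xen's nested VMX code), the control values would have to be dumped from the real VMCSs by other means, and only control bits are compared, not fields such as the virtual-APIC page address or the posted-interrupt notification vector. Bit positions follow the Intel SDM.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the VM-execution controls relevant to APICv. */
struct apicv_ctls {
    uint32_t pin_based;   /* pin-based VM-execution controls */
    uint32_t proc_based;  /* primary processor-based controls */
    uint32_t secondary;   /* secondary processor-based controls */
};

/* Control bit positions as documented in the Intel SDM. */
#define PIN_POSTED_INTR          (1u << 7)   /* process posted interrupts */
#define CPU_USE_TPR_SHADOW       (1u << 21)  /* use TPR shadow */
#define CPU2_VIRT_APIC_ACCESSES  (1u << 0)   /* virtualize APIC accesses */
#define CPU2_VIRT_X2APIC_MODE    (1u << 4)   /* virtualize x2APIC mode */
#define CPU2_APIC_REG_VIRT       (1u << 8)   /* APIC-register virtualization */
#define CPU2_VIRT_INTR_DELIVERY  (1u << 9)   /* virtual-interrupt delivery */

enum ctl_word_id { CTL_PIN, CTL_PROC, CTL_SECONDARY };

struct ctl_bit {
    const char *name;
    enum ctl_word_id which;
    uint32_t mask;
};

static const struct ctl_bit apicv_bits[] = {
    { "process posted interrupts",    CTL_PIN,       PIN_POSTED_INTR },
    { "use TPR shadow",               CTL_PROC,      CPU_USE_TPR_SHADOW },
    { "virtualize APIC accesses",     CTL_SECONDARY, CPU2_VIRT_APIC_ACCESSES },
    { "virtualize x2APIC mode",       CTL_SECONDARY, CPU2_VIRT_X2APIC_MODE },
    { "APIC-register virtualization", CTL_SECONDARY, CPU2_APIC_REG_VIRT },
    { "virtual-interrupt delivery",   CTL_SECONDARY, CPU2_VIRT_INTR_DELIVERY },
};

/* Select the 32-bit control word a given bit lives in. */
static uint32_t ctl_word(const struct apicv_ctls *c, enum ctl_word_id which)
{
    switch (which) {
    case CTL_PIN:  return c->pin_based;
    case CTL_PROC: return c->proc_based;
    default:       return c->secondary;
    }
}

/* Print each APICv-related control whose setting differs between the two
 * snapshots and return the number of mismatches. */
static int diff_apicv(const struct apicv_ctls *vmcs12,
                      const struct apicv_ctls *vmcs02)
{
    int mismatches = 0;

    for (size_t i = 0; i < sizeof(apicv_bits) / sizeof(apicv_bits[0]); i++) {
        const struct ctl_bit *b = &apicv_bits[i];
        int in12 = (ctl_word(vmcs12, b->which) & b->mask) != 0;
        int in02 = (ctl_word(vmcs02, b->which) & b->mask) != 0;

        if (in12 != in02) {
            printf("%-30s vmcs12=%d vmcs02=%d\n", b->name, in12, in02);
            mismatches++;
        }
    }
    return mismatches;
}
```

A reported mismatch is not automatically a bug, since L0 may legitimately enable or drop controls when building vmcs02, but it narrows down which APICv features to look at first.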

Hello,

Yes, I found out what was causing the issue; the patches are at:

https://lists.xenproject.org/archives/html/xen-devel/2020-01/msg00437.html

Thanks, Roger.
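
For reference, the CPUID workaround floated in the quoted report (not exposing x2APIC to L1 when nested virtualization is enabled on Intel) amounts to clearing one feature bit, CPUID.01H:ECX[21], in the leaf-1 value handed to the guest. A minimal sketch under that assumption follows; guest_cpuid1_ecx() is a hypothetical helper, not Xen's CPUID policy code, and the patches linked above address the root cause rather than relying on this workaround.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 21 advertises x2APIC support (Intel SDM). */
#define CPUID1_ECX_X2APIC (1u << 21)

/* Hypothetical helper: filter the leaf-1 ECX value exposed to a guest,
 * hiding x2APIC only when nested virtualization is enabled on Intel. */
static uint32_t guest_cpuid1_ecx(uint32_t host_ecx, bool nested_virt,
                                 bool is_intel)
{
    uint32_t ecx = host_ecx;

    if (nested_virt && is_intel)
        ecx &= ~CPUID1_ECX_X2APIC;  /* L1 then stays in xAPIC mode */
    return ecx;
}
```

The obvious cost of this approach is that every nested-capable Intel guest loses x2APIC, which is why finding the actual interrupt loss was preferable.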

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
