
Re: [Xen-devel] Questions about GPLPV stability tests



Oops, I initially only replied to Konrad. This is a resend to xen-devel.

On Tue, Jan 3, 2012 at 4:55 PM, Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx> wrote:
> On Mon, Dec 26, 2011 at 06:45:32PM +0000, Roderick Colenbrander wrote:
>> I have finally got some results (the tests are still running). Let me
>> first summarize what tests were running.
>
> Thanks for being so diligent in providing full details. It helps when
> reading this after the holidays.
>>
>> In total I had 4 machines. All systems had at least Xen 4.1.2 + Linux
>> 2.6.32.48 (and one had a Xen 4.1.3 snapshot). I divided the systems
>> into 2 groups and each group ran a different test. There are 2
>> differences between the groups:
>> 1) one was the use of blktap vs not using blktap. The main reason for
>> having the blktap systems in, even though it adds noise, is that it
>> kept some of the tests close to what they previously were.
>> 2) the non-blktap systems used a different cpu pinning configuration.
>> Previously Dom0 was using 0-3 and DomU 4-7, but due to hyperthreading
>> both VMs used the same cores (0 and 4 are on the same physical core).
>>
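For concreteness, the second pinning setup corresponds to a domU config fragment along these lines (illustrative values only, assuming an 8-thread box where threads 0-3 and 4-7 pair up across four physical cores, with dom0 keeping 0-3):

```
# DomU config fragment (hypothetical values):
vcpus = 4
cpus = "4-7"    # keep the guest off the threads dom0 is pinned to
```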
>> Now to the initial results.
>> - so far each machine has been up for 10 days.
>> - one machine failed early on due to a PCI device assignment issue. I
>> suspect that at some point due to a race condition, ownership of the
>> PCI device wasn't transferred correctly back from DomU to Dom0 on VM
>> shutdown. Xm and Xl respectively fail with 'Error: failed to assign
>> device 03:00.0: maybe it has already been assigned to other domain, or
>> maybe it doesn't exist.' From a quick glance at the logs it looks like
>
> Let's ignore that one. It is harmless and I should probably remove it.

Okay, good to know that it should probably be removed.

>> on shutdown of a previous VM, the 'unmap bdf' never happened. Not sure
>> why, but I guess due to a potential toolchain bug.
>>
>> - One of the non-blktap machines had a kernel Oops. It happened on VM
>> shutdown, and I wouldn't be surprised if it is similar to the crashes
>> we wanted to debug. The system is still in a usable state, though that
>> may be due to the use of Linux 2.6.32.48 or the different CPU
>> pinning configuration. Some of the serial output:
>>
>> Thu Dec 22 23:17:38 2011 - 1232   0: 87    113   365   blkif-backend
>>      525325sec
>>
>> Thu Dec 22 23:30:07 2011 - 1259   0: 12    6     222   xenbus
>>      526065sec
>> Thu Dec 22 23:30:07 2011 - 1250   0: 62    42    1461  ahci
>>      526065sec
>> Thu Dec 22 23:30:07 2011 - 1249   0: 37    28    75    eth0-rx-0
>>      526065sec
>> Thu Dec 22 23:30:07 2011 - 1248   0: 71    305   933   eth0-tx-0
>>      526065sec
>> Thu Dec 22 23:30:07 2011 - 1243   0: 6     3     134
>> evtchn:xenstored     526065sec
>> Thu Dec 22 23:30:07 2011 - 1241   0: 6     3     87582
>> evtchn:xenstored     526065sec
>> Thu Dec 22 23:30:07 2011 - 1240   0: 256   261   54162 evtchn:qemu-dm
>>      526065sec
>> Thu Dec 22 23:30:07 2011 - 1239   0: 244   251   7219  evtchn:qemu-dm
>>      526065sec
>> Thu Dec 22 23:30:07 2011 - 1238   0: 289   273   5931  evtchn:qemu-dm
>>      526065sec
>> Thu Dec 22 23:30:07 2011 - 1237   0: 248   245   4279  evtchn:qemu-dm
>>      526065sec
>> Thu Dec 22 23:30:07 2011 - 1236   0: 12    7     315   blkif-backend
>>      526065sec
>> Thu Dec 22 23:30:07 2011 - 1234   0: 7     4     43    veth1
>>      526065sec
>> Thu Dec 22 23:30:07 2011 - 1232 CPU0 going backwards (-3304)!
>> Thu Dec 22 23:30:07 2011 -   19 CPU0 going backwards (-212)!
>> Thu Dec 22 23:30:07 2011 -   18   0: 35    17    35    ehci_hcd:usb1,
>> uhci_hcd:usb8 526065sec
>> Thu Dec 22 23:30:07 2011 -   16 CPU0 going backwards (-176)!
>> Thu Dec 22 23:30:07 2011 -   12 CPU0 going backwards (-4)!
>> Thu Dec 22 23:30:07 2011 -    4 CPU0 going backwards (-1)!
>> Thu Dec 22 23:30:12 2011 - 1250   0: 384   213   1461  ahci
>>      526070sec
>> Thu Dec 22 23:30:12 2011 - 1249   0: 14    21    75    eth0-rx-0
>>      526070sec
>> Thu Dec 22 23:30:12 2011 - 1240   0: 260   260   54162 evtchn:qemu-dm
>>      526070sec
>> Thu Dec 22 23:30:12 2011 - 1239   0: 279   265   7219  evtchn:qemu-dm
>>      526070sec
>> Thu Dec 22 23:30:13 2011 - 1238   0: 271   272   5931  evtchn:qemu-dm
>>      526070sec
>> Thu Dec 22 23:30:13 2011 - 1237   0: 261   253   4279  evtchn:qemu-dm
>>      526070sec
>> Thu Dec 22 23:30:13 2011 - 1236   0: 145   76    315   blkif-backend
>>      526070sec
>> Thu Dec 22 23:30:18 2011 - 1250   0: 372   293   1461  ahci
>>      526075sec
>> Thu Dec 22 23:30:18 2011 - 1249   0: 26    24    75    eth0-rx-0
>>      526075sec
>> Thu Dec 22 23:30:18 2011 - 1248   0: 16    86    933   eth0-tx-0
>>      526075sec
>> Thu Dec 22 23:30:18 2011 - 1240   0: 234   247   54162 evtchn:qemu-dm
>>      526075sec
>> Thu Dec 22 23:30:18 2011 - 1239   0: 234   249   7219  evtchn:qemu-dm
>>      526075sec
>> Thu Dec 22 23:30:18 2011 - 1238   0: 241   256   5931  evtchn:qemu-dm
>>      526075sec
>> Thu Dec 22 23:30:18 2011 - 1237   0: 224   239   4279  evtchn:qemu-dm
>>      526075sec
>> Thu Dec 22 23:30:18 2011 - 1236   0: 100   88    315   blkif-backend
>>      526075sec
>> Thu Dec 22 23:30:23 2011 - 1249   0: 16    20    75    eth0-rx-0
>>      526080sec
>> Thu Dec 22 23:30:23 2011 - 1248   0: 7     46    933   eth0-tx-0
>>      526080sec
>> Thu Dec 22 23:30:23 2011 - 1240   0: 8     128   54162 evtchn:qemu-dm
>>      526080sec
>> Thu Dec 22 23:30:23 2011 - 1239   0: 16    133   7219  evtchn:qemu-dm
>>      526080sec
>> Thu Dec 22 23:30:23 2011 - 1238   0: 28    142   5931  evtchn:qemu-dm
>>      526080sec
>> Thu Dec 22 23:30:23 2011 - 1233 CPU0 going backwards (-12648)!
>> Thu Dec 22 23:30:23 2011 -   19 CPU0 going backwards (-212)!
>> Thu Dec 22 23:30:23 2011 -   18   0: 35    17    42    ehci_hcd:usb1,
>> uhci_hcd:usb8 526080sec
>> Thu Dec 22 23:30:23 2011 -   16 CPU0 going backwards (-176)!
>> Thu Dec 22 23:30:23 2011 -   12 CPU0 going backwards (-4)!
>> Thu Dec 22 23:30:23 2011 -    4 CPU0 going backwards (-1)!
>>
>> [533804.785589] BUG: unable to handle kernel NULL pointer dereference
>> at 0000000000000008
>> [533804.793913] IP: [<ffffffff814bdba9>] _spin_lock+0x5/0x15
>> [533804.799556] PGD 0
>> [533804.801896] Oops: 0002 [#1] SMP
>> [533804.805448] last sysfs file:
>> /sys/devices/pci0000:00/0000:00:03.0/0000:03:00.0/class
>> [533804.813736] CPU 0
>> [533804.816066] Modules linked in: dm_snapshot dm_mod ipmi_si ipmi_devintf
>> [533804.822914] Pid: 21632, comm: qemu-dm Not tainted 2.6.32.48 #1 X8STi
>> [533804.829595] RIP: e030:[<ffffffff814bdba9>]  [<ffffffff814bdba9>]
>> _spin_lock+0x5/0x15
>> [533804.837873] RSP: e02b:ffff88005f187c50  EFLAGS: 00010202
>> [533804.843513] RAX: 0000000000000100 RBX: ffff88007d6923c0 RCX:
>> 0000000000000003
>> [533804.851192] RDX: ffff88007deb3180 RSI: ffff88007d6923c0 RDI:
>> 0000000000000008
>> [533804.858871] RBP: ffff88007e260128 R08: 0000000000000000 R09:
>> 0000000000000000
>> [533804.866552] R10: ffff88007121eb40 R11: ffffffff811b862b R12:
>> ffff88007fac1e40
>> [533804.874541] R13: ffff88007e3c3340 R14: ffff88007e3c3340 R15:
>> ffff88007fdfbc00
>> [533804.882243] FS:  00007f5cdcefe6f0(0000) GS:ffff880028038000(0000)
>> knlGS:0000000000000000
>> [533804.890874] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [533804.896948] CR2: 0000000000000008 CR3: 0000000001001000 CR4:
>> 0000000000002660
>> [533804.904627] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [533804.912306] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [533804.919985] Process qemu-dm (pid: 21632, threadinfo
>> ffff88005f186000, task ffff880074f4e270)
>> [533804.928971] Stack:
>> [533804.931312]  ffffffff810a9ad0 ffff88007deb3180 ffff88007e260100
>> ffff88007e260100
>> [533804.938762] <0> ffffffff8124222d 0000000000000008 0000000000000008
>> ffff88007deb3180
>> [533804.946900] <0> ffffffff810b245e ffff88007fac1e40 ffff88007deb3180
>> 0000000000000000
>> [533804.955465] Call Trace:
>> [533804.958244]  [<ffffffff810a9ad0>] ? mmu_notifier_unregister+0x27/0xa5
>> [533804.965019]  [<ffffffff8124222d>] ? gntdev_release+0xc3/0xd1
>> [533804.971007]  [<ffffffff810b245e>] ? __fput+0x100/0x1af
>> [533804.976477]  [<ffffffff810af998>] ? filp_close+0x5b/0x62
>> [533804.982119]  [<ffffffff81042989>] ? put_files_struct+0x64/0xc1
>> [533804.988280]  [<ffffffff810441f0>] ? do_exit+0x209/0x68d
>> [533804.993836]  [<ffffffff810441f8>] ? do_exit+0x211/0x68d
>> [533804.999390]  [<ffffffff810446e9>] ? do_group_exit+0x75/0x9c
>> [533805.005294]  [<ffffffff8104cf1d>] ? get_signal_to_deliver+0x30d/0x338
>> [533805.012063]  [<ffffffff81010f7a>] ? do_notify_resume+0x8a/0x6b1
>> [533805.018310]  [<ffffffff810bdb3a>] ? poll_select_copy_remaining+0xd0/0xf3
>> [533805.025339]  [<ffffffff81011c8e>] ? int_signal+0x12/0x17
>> [533805.030980] Code: 00 00 00 01 74 05 e8 67 1c d2 ff 48 89 d0 5e c3
>> ff 14 25 20 2d 6a 81 f0 81 2f 00 00 00 01 74 05 e8 4d 1c d2 ff c3 b8
>> 00 01 00 00 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 c3 f0 81 2f
>> 00 00
>> [533805.050454] RIP  [<ffffffff814bdba9>] _spin_lock+0x5/0x15
>> [533805.056182]  RSP <ffff88005f187c50>
>> [533805.059997] CR2: 0000000000000008
>> [533805.063638] ---[ end trace 85ee7cbec9ce72ac ]---
>> [533805.068584] Fixing recursive fault but reboot is needed!
>>
>> If I had to guess, some of the gnttab code in qemu triggered a bug in
>> the gntdev code. I have some experience with gnttab/gntdev, but don't
>> know its inner parts very well.
>
> It certainly looks that way. Or rather that we hit a race - what
> mmu_notifier_unregister tried to call was already de-allocated.
> [edit: Perhaps this is related to a TLB flush fix:
>  7899891c7d161752f29abcc9bc0a9c6c3a3af26c xen mmu: fix a race window
> causing leave_mm BUG()]
>
> That would explain the hang you got. The qemu-dm is stuck waiting for
> gntdev to respond (you should be able to verify this by attaching gdb to
> qemu-dm and seeing the backtrace - it should be stuck at some ioctl
> call). And the kernel sees that this particular process is not doing
> anything. Also we have some gntdev in a bad state (but that should be OK
> to the other processes) - so I am not sure how that "hangs" your box.
>
> The interrupts being disabled does not seem to occur with this crash?
> Is the read_interrupts code you are running still spewing data? Or
> does it stop as well?
>
> But let's fix one thing at a time.
>
> Looking at the code in 2.6.32, it is the version that went upstream
> as git commit ab31523c2fcac557226bac72cbdf5fafe01f9a26 and it has
> not had any of the features/bug-fixes that went into the upstream
> version:
>
> ab31523 xen/gntdev: allow usermode to map granted pages
> 8d3eaea xen/gntdev: add VM_PFNMAP to vma
> 9329e76 xen: gntdev: move use of GNTMAP_contains_pte next to the map_op
> ba5d101 xen/gntdev: stop using "token" argument
> f0a70c8 xen/gntdev: Fix circular locking dependency
> a12b4eb xen gntdev: use gnttab_map_refs and gnttab_unmap_refs
> ef91082 xen-gntdev: Change page limit to be global instead of per-open
> a879211 xen-gntdev: Use find_vma rather than iterating our vma list manually
> 68b025c xen-gntdev: Add reference counting to maps
> aab8f11 xen-gntdev: Support mapping in HVM domains
> bdc612d xen/gntalloc,gntdev: Add unmap notify ioctl
> 90b6f30 xen-gntdev: Fix memory leak when mmap fails
> 0ea22f0 xen-gntdev: Fix unmap notify on PV domains
> 84e4075 xen-gntdev: Use map->vma for checking map validity
> b57c186 xen-gntdev: Avoid unmapping ranges twice
> 12996fc xen-gntdev: Avoid double-mapping memory
> 9960be9 xen-gntdev: prevent using UNMAP_NOTIFY_CLEAR_BYTE on read-only
> mappings
> 77c35ac xen-gntdev: Fix incorrect use of zero handle
> f4ee4af xen-gntdev: Add cast to pointer
> 38eaeb0 xen: gntdev: fix build warning
> d79647a xen/gntdev,gntalloc: Remove unneeded VM flags
> ca47cea xen-gntdev: Use ballooned pages for grant mappings
> 12f0258 xen-gntdev: return -EFAULT on copy_to_user failure
> a93e20a xen-gntdev: unlock on error path in gntdev_mmap()
>
> The big question is whether the bug you are hitting is fixed by one of those
> git commits, or by an unrelated fix (like the vmalloc_sync_all or the
> kmap_atomic one), which are:
> 461ae488ecb125b140d7ea29ceeedbcce9327003 mm: sync vmalloc address space page tables in alloc_vm_area()
> 2cd1c8d4dc7ecca9e9431e2dabe41ae9c7d89e51 x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode
>
> But the errors that one gets without those two git commits are much
> different from what you get. So I doubt it is one of those.

I kept a close eye on most of the gntdev related patches since they
are important to us. As you said, most of the changes look related to
this bug.

>
> The CPU time going backwards is disturbing. It does imply that the Xen
> hypervisor has stopped updating the timer information. Or maybe it has
> not, but the gntdev has crashed and left all the interrupts disabled.
>


Just for the record, aside from the 'Oops' the machine is working
fine. It's responsive, networking works, you can login etcetera.

> So three questions:
>  1). Is the read_intr tool printing any data after the crash?
The tool was still running afterwards. Let me show a couple of lines
before (was in previous email) and after the Oops.

Thu Dec 22 23:30:18 2011 - 1250   0: 372   293   1461  ahci
     526075sec
Thu Dec 22 23:30:18 2011 - 1249   0: 26    24    75    eth0-rx-0
     526075sec
Thu Dec 22 23:30:18 2011 - 1248   0: 16    86    933   eth0-tx-0
     526075sec
Thu Dec 22 23:30:18 2011 - 1240   0: 234   247   54162 evtchn:qemu-dm
     526075sec
Thu Dec 22 23:30:18 2011 - 1239   0: 234   249   7219  evtchn:qemu-dm
     526075sec
Thu Dec 22 23:30:18 2011 - 1238   0: 241   256   5931  evtchn:qemu-dm
     526075sec
Thu Dec 22 23:30:18 2011 - 1237   0: 224   239   4279  evtchn:qemu-dm
     526075sec
Thu Dec 22 23:30:18 2011 - 1236   0: 100   88    315   blkif-backend
     526075sec
Thu Dec 22 23:30:23 2011 - 1249   0: 16    20    75    eth0-rx-0
     526080sec
Thu Dec 22 23:30:23 2011 - 1248   0: 7     46    933   eth0-tx-0
     526080sec
Thu Dec 22 23:30:23 2011 - 1240   0: 8     128   54162 evtchn:qemu-dm
     526080sec
Thu Dec 22 23:30:23 2011 - 1239   0: 16    133   7219  evtchn:qemu-dm
     526080sec
Thu Dec 22 23:30:23 2011 - 1238   0: 28    142   5931  evtchn:qemu-dm
     526080sec
Thu Dec 22 23:30:23 2011 - 1233 CPU0 going backwards (-12648)!
Thu Dec 22 23:30:23 2011 -   19 CPU0 going backwards (-212)!
Thu Dec 22 23:30:23 2011 -   18   0: 35    17    42    ehci_hcd:usb1,
uhci_hcd:usb8 526080sec
Thu Dec 22 23:30:23 2011 -   16 CPU0 going backwards (-176)!
Thu Dec 22 23:30:23 2011 -   12 CPU0 going backwards (-4)!
Thu Dec 22 23:30:23 2011 -    4 CPU0 going backwards (-1)!
[533804.785589] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000008
[533804.793913] IP: [<ffffffff814bdba9>] _spin_lock+0x5/0x15
[533804.799556] PGD 0
[533804.801896] Oops: 0002 [#1] SMP
...
...
[533805.068584] Fixing recursive fault but reboot is needed!
Thu Dec 22 23:30:28 2011 - 1249   0: 15    18    75    eth0-rx-0
     526085sec
Thu Dec 22 23:30:28 2011 - 1241   0: 1361  682   87582
evtchn:xenstored     526085sec
Thu Dec 22 23:30:28 2011 - 1240 CPU0 going backwards (-391001)!
Thu Dec 22 23:30:28 2011 - 1239 CPU0 going backwards (-45746)!
Thu Dec 22 23:30:28 2011 - 1238 CPU0 going backwards (-73084)!
Thu Dec 22 23:30:28 2011 - 1237 CPU0 going backwards (-82968)!
Thu Dec 22 23:30:28 2011 - 1236 CPU0 going backwards (-6887)!
Thu Dec 22 23:30:28 2011 - 1235 CPU0 going backwards (-440)!
Thu Dec 22 23:30:28 2011 - 1234 CPU0 going backwards (-243)!
Thu Dec 22 23:30:28 2011 -   19 CPU0 going backwards (-215)!
Thu Dec 22 23:30:28 2011 -   18 CPU0 going backwards (-2)!
Thu Dec 22 23:30:28 2011 -   16 CPU0 going backwards (-179)!
Thu Dec 22 23:30:28 2011 -   12 CPU0 going backwards (-2)!
Thu Dec 22 23:30:33 2011 - 1249   0: 32    25    75    eth0-rx-0
     526090sec
Thu Dec 22 23:30:33 2011 - 1248   0: 18    21    933   eth0-tx-0
     526090sec
Thu Dec 22 23:30:38 2011 - 1250   0: 6     23    1461  ahci
     526095sec
Thu Dec 22 23:30:38 2011 - 1249   0: 18    22    75    eth0-rx-0
     526095sec
Thu Dec 22 23:30:43 2011 - 1249   0: 10    16    75    eth0-rx-0
     526100sec
Thu Dec 22 23:30:48 2011 - 1249   0: 14    15    75    eth0-rx-0
     526105sec
Thu Dec 22 23:30:53 2011 - 1249   0: 10    12    75    eth0-rx-0
     526110sec
Thu Dec 22 23:30:58 2011 - 1249   0: 12    12    75    eth0-rx-0
     526115sec


>  2). If you attach gdb to qemu-dm, can you see where it is stuck?
I don't seem to be able to attach to the qemu-dm process. When I try
to attach, gdb never completes attaching to the process. Not sure why
it doesn't work; it works fine on other processes.

>  3). Is the console active at all? If not, can you SSH in (some
>      network cards switch to polling so you can still log in to a machine
>      even if the interrupts are turned off - the atl1c something can do
>      it and I think the r8169 as well, but not sure).
The system is still responsive. Both network and the VGA console work fine.

>  4). If you had configured your dom0 console to use the serial console
>      instead of going through the Xen hypervisor console (so
>      console=ttyS0 on Linux, and no com1=XXX and console=com1 on the Xen
>      hypervisor line), could you get a backtrace of where the Linux
>      kernel is at using Alt-SysRq?

I tried to get some Alt-sysRq output just now. After two tries, the
kernel wasn't too happy with me and died for good. I hope the limited
information is of some use.

[1654538.605575] SysRq : Show Blocked State
[1654538.609723]   task                        PC stack   pid father
[1654538.616093] qemu-dm       D 0000000000000000     0 21632  21633 0x00000004
[1654538.623354]  ffffffff8169e3b0 0000000000000246 0000000000000000
ffff88005f187a28
[1654538.631319]  ffff88005f187968 0000000000000000 000000000000f4f8
ffff88005f187fd8
[1654538.639284]  0000000000013c80 0000000000013c80 ffff880074f4e270
ffff880074f4e518
[1654538.647249] Call Trace:
[1654538.650114]  [<ffffffff811fd43c>] ? vgacon_cursor+0x0/0x17e
[1654538.656100]  [<ffffffff810440b8>] ? do_exit+0xd1/0x68d
[1654538.661655]  [<ffffffff8100eeef>] ? xen_restore_fl_direct_end+0x0/0x1
[1654538.668511]  [<ffffffff814bdbd4>] ? _spin_unlock_irqrestore+0xc/0xd
[1654538.675192]  [<ffffffff81041573>] ? release_console_sem+0x17d/0x1ae
[1654538.681873]  [<ffffffff814bea6c>] ? oops_end+0xae/0xb3
[1654538.687429]  [<ffffffff8102d636>] ? no_context+0x1ff/0x20e
[1654538.693332]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654538.700186]  [<ffffffff8102d7eb>] ? __bad_area_nosemaphore+0x1a6/0x1ca
[1654538.707129]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654538.713985]  [<ffffffff8100ef02>] ? check_events+0x12/0x20
[1654538.719887]  [<ffffffff81035e28>] ? check_preempt_wakeup+0x0/0x15f
[1654538.726482]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654538.733346]  [<ffffffff8100ef02>] ? check_events+0x12/0x20
[1654538.739247]  [<ffffffff81035e28>] ? check_preempt_wakeup+0x0/0x15f
[1654538.745842]  [<ffffffff814bfeba>] ? do_page_fault+0x116/0x2fc
[1654538.752004]  [<ffffffff814bdf45>] ? page_fault+0x25/0x30
[1654538.757738]  [<ffffffff811b862b>] ? cap_file_free_security+0x0/0x1
[1654538.764338]  [<ffffffff814bdba9>] ? _spin_lock+0x5/0x15
[1654538.769980]  [<ffffffff810a9ad0>] ? mmu_notifier_unregister+0x27/0xa5
[1654538.776836]  [<ffffffff8124222d>] ? gntdev_release+0xc3/0xd1
[1654538.782911]  [<ffffffff810b245e>] ? __fput+0x100/0x1af
[1654538.788467]  [<ffffffff810af998>] ? filp_close+0x5b/0x62
[1654538.794195]  [<ffffffff81042989>] ? put_files_struct+0x64/0xc1
[1654538.800444]  [<ffffffff810441f0>] ? do_exit+0x209/0x68d
[1654538.806087]  [<ffffffff810441f8>] ? do_exit+0x211/0x68d
[1654538.811727]  [<ffffffff810446e9>] ? do_group_exit+0x75/0x9c
[1654538.817717]  [<ffffffff8104cf1d>] ? get_signal_to_deliver+0x30d/0x338
[1654538.824572]  [<ffffffff81010f7a>] ? do_notify_resume+0x8a/0x6b1
[1654538.830907]  [<ffffffff810bdb3a>] ? poll_select_copy_remaining+0xd0/0xf3
[1654538.838022]  [<ffffffff81011c8e>] ? int_signal+0x12/0x17

It looks like the gntdev file got closed by qemu-dm, which triggered
the gntdev cleanup code.

What I was wondering about is 'gntdev_release'.
static int gntdev_release(struct inode *inode, struct file *flip)
{
     struct gntdev_priv *priv = flip->private_data;

     ..
     spin_lock(&priv->lock);
     <do cleanup>
     spin_unlock(&priv->lock);

     mmu_notifier_unregister(..);
     kfree(priv);
     return 0;
}

I don't know enough about the file descriptor code in Linux. What
happens if multiple applications (or threads) open gntdev, do they get
the same "flip"? If they do then it looks like there can be a race if
two threads call this code close to each other. I remember there being
some ref counting mechanism on file descriptors, so I guess this is
not the problem.

Wed Jan  4 22:48:45 2012 - 1250   0: 6     4     1461  ahci
     1644285sec
Wed Jan  4 22:48:45 2012 - 1249   0: 19    19    122   eth0-rx-0
     1644285sec
Wed Jan  4 22:48:50 2012 - 1249   0: 19    19    122   eth0-rx-0
     1644290sec
Wed Jan  4 22:48:55 2012 - 1249   0: 14    16    122   eth0-rx-0
     1644295sec
Wed Jan  4 22:49:00 2012 - 1249   0: 25    21    122   eth0-rx-0
     1644300sec
Wed Jan  4 22:49:00 2012 - 1248   0: 13    7     933   eth0-tx-0
     1644300sec
Wed Jan  4 22:49:05 2012 - 1249   0: 21    21    122   eth0-rx-0
     1644305sec
Wed Jan  4 22:49:05 2012 - 1248   0: 7     7     933   eth0-tx-0
     1644305sec
Wed Jan  4 22:49:10 2012 - 1249   0: 21    21    122   eth0-rx-0
     1644310sec
Wed Jan  4 22:49:10 2012 - 1248   0: 15    11    933   eth0-tx-0
     1644310sec
Wed Jan  4 22:49:15 2012 - 1249   0: 15    18    122   eth0-rx-0
     1644315sec
Wed Jan  4 22:49:21 2012 - 1250   0: 9     5     1461  ahci
     1644320sec
Wed Jan  4 22:49:21 2012 - 1249   0: 28    23    122   eth0-rx-0
     1644320sec
Wed Jan  4 22:49:21 2012 - 1248   0: 20    13    933   eth0-tx-0
     1644320sec
Wed Jan  4 22:49:26 2012 - 1249   0: 14    18    122   eth0-rx-0
     1644325sec
Wed Jan  4 22:49:31 2012 - 1249   0: 23    21    122   eth0-rx-0
     1644330sec
Wed Jan  4 22:49:31 2012 - 1248   0: 21    14    933   eth0-tx-0
     1644330sec
Wed Jan  4 22:49:36 2012 - 1249   0: 18    19    122   eth0-rx-0
     1644335sec
Wed Jan  4 22:49:41 2012 - 1249   0: 14    17    122   eth0-rx-0
     1644340sec
Wed Jan  4 22:49:46 2012 - 1249   0: 18    17    122   eth0-rx-0
     1644345sec
Wed Jan  4 22:49:51 2012 - 1249   0: 21    19    122   eth0-rx-0
     1644350sec
Wed Jan  4 22:49:56 2012 - 1250   0: 6     4     1461  ahci
     1644355sec
Wed Jan  4 22:49:56 2012 - 1249   0: 11    15    122   eth0-rx-0
     1644355sec
Wed Jan  4 22:50:01 2012 - 1249   0: 31    23    122   eth0-rx-0
     1644360sec
Wed Jan  4 22:50:01 2012 - 1248   0: 15    8     933   eth0-tx-0
     1644360sec
Wed Jan  4 22:50:06 2012 - 1249   0: 20    22    122   eth0-rx-0
     1644365sec
Wed Jan  4 22:50:06 2012 - 1248   0: 9     8     933   eth0-tx-0
     1644365sec
Wed Jan  4 22:50:11 2012 - 1249   0: 28    25    122   eth0-rx-0
     1644370sec
Wed Jan  4 22:50:11 2012 - 1248   0: 14    11    933   eth0-tx-0
     1644370sec

[1654627.877585] SysRq : Show backtrace of all active CPUs
[1654627.883041] sending NMI to all CPUs:
[1654627.887060] BUG: unable to handle kernel paging request at ffffffffff5fc310
[1654627.894437] IP: [<ffffffff81026743>] _flat_send_IPI_mask+0x4b/0x78
[1654627.901049] PGD 1003067 PUD 1004067 PMD 17a5067 PTE 0
[1654627.906500] Thread overran stack, or stack corrupted
[1654627.911883] Oops: 0002 [#2] SMP
[1654627.915515] last sysfs file:
/sys/devices/pci0000:00/0000:00:1a.0/usb3/3-1/3-1:1.0/uevent
[1654627.924320] CPU 0
[1654627.926746] Modules linked in: dm_snapshot dm_mod ipmi_si ipmi_devintf
[1654627.933681] Pid: 0, comm: swapper Tainted: G      D    2.6.32.48 #1 X8STi
[1654627.940883] RIP: e030:[<ffffffff81026743>]  [<ffffffff81026743>]
_flat_send_IPI_mask+0x4b/0x78
[1654627.950121] RSP: e02b:ffff88002803b988  EFLAGS: 00010006
[1654627.955872] RAX: 000000000f000000 RBX: 0000000000000800 RCX:
00000000ffff0095
[1654627.963644] RDX: ffff880028038000 RSI: 0000000000000002 RDI:
000000000000000f
[1654627.971410] RBP: 0000000000000002 R08: 0000000000153363 R09:
ffffffff812608df
[1654627.979176] R10: ffff8800000ba6c0 R11: ffffffff811fd43c R12:
ffffffff817000b0
[1654627.986949] R13: 000000000000000f R14: 000000000000000f R15:
0000000000000026
[1654627.994715] FS:  00007f4c15299910(0000) GS:ffff880028038000(0000)
knlGS:0000000000000000
[1654628.003434] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[1654628.009595] CR2: ffffffffff5fc310 CR3: 000000007b581000 CR4:
0000000000002660
[1654628.017362] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[1654628.025127] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[1654628.032891] Process swapper (pid: 0, threadinfo ffffffff8169a000,
task ffffffff8169e3b0)
[1654628.041609] Stack:
[1654628.044036]  ffffffff8100e871 0000000000000000 000000000000006c
0000000000000001
[1654628.051576] <0> ffff88007e613800 ffffffff810240ef
ffffffff816cce90 ffffffff81265fb5
[1654628.059807] <0> 00000000f28a6723 0000000000000000
0000000000000001 ffff88007e613800
[1654628.068469] Call Trace:
[1654628.071330]  <IRQ>
[1654628.073835]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654628.080691]  [<ffffffff810240ef>] ?
arch_trigger_all_cpu_backtrace+0x3d/0x5d
[1654628.088370]  [<ffffffff81265fb5>] ? __handle_sysrq+0xaf/0x14b
[1654628.094531]  [<ffffffff8125f057>] ? kbd_event+0x35e/0x616
[1654628.100347]  [<ffffffff814bdbd4>] ? _spin_unlock_irqrestore+0xc/0xd
[1654628.107029]  [<ffffffff8136ec5f>] ? input_pass_event+0x5a/0x7b
[1654628.113276]  [<ffffffff81370b8e>] ? input_event+0x5c/0x82
[1654628.119094]  [<ffffffff8139fb34>] ? hidinput_hid_event+0x291/0x2c1
[1654628.125853]  [<ffffffff8139c992>] ? hid_process_event+0xdd/0x121
[1654628.132274]  [<ffffffff8139cd83>] ? hid_report_raw_event+0x3ad/0x42f
[1654628.139042]  [<ffffffff8139cdf1>] ? hid_report_raw_event+0x41b/0x42f
[1654628.145812]  [<ffffffff8139d040>] ? hid_input_report+0x23b/0x260
[1654628.152235]  [<ffffffff813a5042>] ? hid_irq_in+0xad/0x212
[1654628.158050]  [<ffffffff81344b34>] ? usb_hcd_giveback_urb+0x76/0xa9
[1654628.164646]  [<ffffffff813632d6>] ? uhci_giveback_urb+0x10c/0x226
[1654628.171153]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654628.178010]  [<ffffffff8100ef02>] ? check_events+0x12/0x20
[1654628.183912]  [<ffffffff81363b71>] ? uhci_scan_schedule+0x677/0x937
[1654628.190505]  [<ffffffff81366284>] ? uhci_irq+0x8c0/0x8dd
[1654628.196236]  [<ffffffff810550bf>] ? hrtimer_interrupt+0xe3/0x18c
[1654628.202658]  [<ffffffff8100ee1e>] ? xen_vcpuop_set_next_event+0x0/0x60
[1654628.209598]  [<ffffffff8100ee1e>] ? xen_vcpuop_set_next_event+0x0/0x60
[1654628.216541]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654628.223397]  [<ffffffff8100ef02>] ? check_events+0x12/0x20
[1654628.229299]  [<ffffffff8100ee1e>] ? xen_vcpuop_set_next_event+0x0/0x60
[1654628.236240]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654628.243097]  [<ffffffff81344605>] ? usb_hcd_irq+0x39/0x7e
[1654628.248913]  [<ffffffff81077993>] ? handle_IRQ_event+0x2b/0xbd
[1654628.255160]  [<ffffffff810791cf>] ? handle_fasteoi_irq+0x78/0xaf
[1654628.261584]  [<ffffffff8123b382>] ? __xen_evtchn_do_upcall+0x1d0/0x28d
[1654628.268526]  [<ffffffff8123bc9c>] ? xen_evtchn_do_upcall+0x2e/0x42
[1654628.275120]  [<ffffffff81012b7e>] ? xen_do_hypervisor_callback+0x1e/0x30
[1654628.282235]  <EOI>
[1654628.284748]  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
[1654628.291084]  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
[1654628.297419]  [<ffffffff8100e8ef>] ? xen_safe_halt+0xc/0x15
[1654628.303322]  [<ffffffff81017ec7>] ? default_idle+0x21/0x3d
[1654628.309223]  [<ffffffff81010dd7>] ? cpu_idle+0x59/0x91
[1654628.314780]  [<ffffffff81728c8c>] ? start_kernel+0x381/0x38d
[1654628.320855]  [<ffffffff8172bde1>] ? xen_start_kernel+0x5aa/0x5b0
[1654628.327274] Code: 48 8b 05 f1 00 6e 00 83 fe 02 8b 58 34 75 0a ff
90 58 01 00 00 eb 0e f3 90 8b 04 25 00 c3 5f ff f6 c4 10 75 f2 44 89
e8 c1 e0 18 <89> 04 25 10 c3 5f ff 89 e8 09 d8 80 cf 04 83 fd 02 0f 44
c3 89
[1654628.346854] RIP  [<ffffffff81026743>] _flat_send_IPI_mask+0x4b/0x78
[1654628.353535]  RSP <ffff88002803b988>
[1654628.357436] CR2: ffffffffff5fc310
[1654628.361164] ---[ end trace 85ee7cbec9ce72ad ]---

Then it died:
[1654628.366199] Kernel panic - not syncing: Fatal exception in interrupt
[1654628.372968] Pid: 0, comm: swapper Tainted: G      D    2.6.32.48 #1
[1654628.379649] Call Trace:
[1654628.382509]  <IRQ>  [<ffffffff814bbd88>] ? panic+0x86/0x13e
[1654628.388490]  [<ffffffff812608df>] ? set_cursor+0x3d/0x63
[1654628.394219]  [<ffffffff8100eeef>] ? xen_restore_fl_direct_end+0x0/0x1
[1654628.401074]  [<ffffffff812608df>] ? set_cursor+0x3d/0x63
[1654628.406802]  [<ffffffff812608df>] ? set_cursor+0x3d/0x63
[1654628.412533]  [<ffffffff811fd43c>] ? vgacon_cursor+0x0/0x17e
[1654628.418520]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654628.425376]  [<ffffffff8100ef02>] ? check_events+0x12/0x20
[1654628.431277]  [<ffffffff811fd43c>] ? vgacon_cursor+0x0/0x17e
[1654628.437266]  [<ffffffff812608df>] ? set_cursor+0x3d/0x63
[1654628.442995]  [<ffffffff814bea64>] ? oops_end+0xa6/0xb3
[1654628.448553]  [<ffffffff8102d636>] ? no_context+0x1ff/0x20e
[1654628.454452]  [<ffffffff8100ef02>] ? check_events+0x12/0x20
[1654628.460356]  [<ffffffff811fd43c>] ? vgacon_cursor+0x0/0x17e
[1654628.466343]  [<ffffffff8102d7eb>] ? __bad_area_nosemaphore+0x1a6/0x1ca
[1654628.473286]  [<ffffffff8100eeef>] ? xen_restore_fl_direct_end+0x0/0x1
[1654628.480143]  [<ffffffff81041ad5>] ? vprintk+0x310/0x33c
[1654628.485783]  [<ffffffff8100eeef>] ? xen_restore_fl_direct_end+0x0/0x1
[1654628.492639]  [<ffffffff8100c269>] ? __raw_callee_save_xen_pte_val+0x11/0x1e
[1654628.500013]  [<ffffffff811fd43c>] ? vgacon_cursor+0x0/0x17e
[1654628.506003]  [<ffffffff8102cf52>] ? spurious_fault+0x147/0x1db
[1654628.512251]  [<ffffffff814bfe0d>] ? do_page_fault+0x69/0x2fc
[1654628.518327]  [<ffffffff814bdf45>] ? page_fault+0x25/0x30
[1654628.524056]  [<ffffffff811fd43c>] ? vgacon_cursor+0x0/0x17e
[1654628.535471]  [<ffffffff812608df>] ? set_cursor+0x3d/0x63
[1654628.541200]  [<ffffffff81026743>] ? _flat_send_IPI_mask+0x4b/0x78
[1654628.547715]  [<ffffffff8102672d>] ? _flat_send_IPI_mask+0x35/0x78
[1654628.554225]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654628.561080]  [<ffffffff810240ef>] ?
arch_trigger_all_cpu_backtrace+0x3d/0x5d
[1654628.568759]  [<ffffffff81265fb5>] ? __handle_sysrq+0xaf/0x14b
[1654628.574921]  [<ffffffff8125f057>] ? kbd_event+0x35e/0x616
[1654628.580736]  [<ffffffff814bdbd4>] ? _spin_unlock_irqrestore+0xc/0xd
[1654628.587418]  [<ffffffff8136ec5f>] ? input_pass_event+0x5a/0x7b
[1654628.593666]  [<ffffffff81370b8e>] ? input_event+0x5c/0x82
[1654628.599482]  [<ffffffff8139fb34>] ? hidinput_hid_event+0x291/0x2c1
[1654628.606077]  [<ffffffff8139c992>] ? hid_process_event+0xdd/0x121
[1654628.612500]  [<ffffffff8139cd83>] ? hid_report_raw_event+0x3ad/0x42f
[1654628.619268]  [<ffffffff8139cdf1>] ? hid_report_raw_event+0x41b/0x42f
[1654628.626045]  [<ffffffff8139d040>] ? hid_input_report+0x23b/0x260
[1654628.632469]  [<ffffffff813a5042>] ? hid_irq_in+0xad/0x212
[1654628.638284]  [<ffffffff81344b34>] ? usb_hcd_giveback_urb+0x76/0xa9
[1654628.644879]  [<ffffffff813632d6>] ? uhci_giveback_urb+0x10c/0x226
[1654628.651387]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654628.658243]  [<ffffffff8100ef02>] ? check_events+0x12/0x20
[1654628.664145]  [<ffffffff81363b71>] ? uhci_scan_schedule+0x677/0x937
[1654628.670740]  [<ffffffff81366284>] ? uhci_irq+0x8c0/0x8dd
[1654628.676468]  [<ffffffff810550bf>] ? hrtimer_interrupt+0xe3/0x18c
[1654628.682891]  [<ffffffff8100ee1e>] ? xen_vcpuop_set_next_event+0x0/0x60
[1654628.689834]  [<ffffffff8100ee1e>] ? xen_vcpuop_set_next_event+0x0/0x60
[1654628.696774]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654628.703629]  [<ffffffff8100ef02>] ? check_events+0x12/0x20
[1654628.709532]  [<ffffffff8100ee1e>] ? xen_vcpuop_set_next_event+0x0/0x60
[1654628.716487]  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0x9/0xa
[1654628.723348]  [<ffffffff81344605>] ? usb_hcd_irq+0x39/0x7e
[1654628.729163]  [<ffffffff81077993>] ? handle_IRQ_event+0x2b/0xbd
[1654628.735412]  [<ffffffff810791cf>] ? handle_fasteoi_irq+0x78/0xaf
[1654628.741834]  [<ffffffff8123b382>] ? __xen_evtchn_do_upcall+0x1d0/0x28d
[1654628.748776]  [<ffffffff8123bc9c>] ? xen_evtchn_do_upcall+0x2e/0x42
[1654628.755371]  [<ffffffff81012b7e>] ? xen_do_hypervisor_callback+0x1e/0x30
[1654628.762485]  <EOI>  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
[1654628.769428]  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
[1654628.775764]  [<ffffffff8100e8ef>] ? xen_safe_halt+0xc/0x15
[1654628.781667]  [<ffffffff81017ec7>] ? default_idle+0x21/0x3d
[1654628.787568]  [<ffffffff81010dd7>] ? cpu_idle+0x59/0x91
[1654628.793124]  [<ffffffff81728c8c>] ? start_kernel+0x381/0x38d
[1654628.799199]  [<ffffffff8172bde1>] ? xen_start_kernel+0x5aa/0x5b0
(XEN) [2012-01-04 22:50:14] Domain 0 crashed: rebooting machine in 5 seconds.

>
> Thanks and sorry for taking so long. Just coming back from holidays.
>

One of the 'blktap' machines is still running tests. It is running
Linux 2.6.32.48 and Xen 4.1.3-rc1. By now it has probably run 5000
VMs and it has been up for about 20 days. Before, it wouldn't survive
~1000 VMs and 5 days; it would then get into that very, very slow
state. My feeling is that Linux 2.6.32.48 had some fixes for the
problems I was having. Previously I ran Xen 4.1.3-rc1 in combination
with Linux 2.6.32.37 on it, and it had the slowness problem.

It would be nice if we could get that other bug fixed as well; it could
still be present in 3.x kernels. I would love to move to Linux 3.2,
but that requires a lot of testing, which will take 1-2 months. At least
2.6.32.48 feels more stable, and we may be able to fix a few more bugs
in it.

Thanks for your help!
Roderick

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

