
Re: [Xen-devel] [BUG] kernel BUG at drivers/block/xen-blkfront.c:1711



On 08/10/2016 08:33 PM, Evgenii Shatokhin wrote:
> On 14.07.2016 15:04, Bob Liu wrote:
>>
>> On 07/14/2016 07:49 PM, Evgenii Shatokhin wrote:
>>> On 11.07.2016 15:04, Bob Liu wrote:
>>>>
>>>>
>>>> On 07/11/2016 04:50 PM, Evgenii Shatokhin wrote:
>>>>> On 06.06.2016 11:42, Dario Faggioli wrote:
>>>>>> Just Cc-ing some Linux, block, and Xen on CentOS people...
>>>>>>
>>>>>
>>>>> Ping.
>>>>>
>>>>> Any suggestions how to debug this or what might cause the problem?
>>>>>
>>>>> Obviously, we cannot control Xen on Amazon's servers. But perhaps 
>>>>> there is something we can do on the kernel side?
>>>>>
>>>>>> On Mon, 2016-06-06 at 11:24 +0300, Evgenii Shatokhin wrote:
>>>>>>> (Resending this bug report because the message I sent last week did not
>>>>>>> make it to the mailing list somehow.)
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> One of our users gets kernel panics from time to time when he tries to
>>>>>>> use his Amazon EC2 instance with CentOS7 x64 in it [1]. The kernel panic
>>>>>>> happens within minutes of the instance starting. The problem
>>>>>>> does not show up every time, however.
>>>>>>>
>>>>>>> The user first observed the problem with a custom kernel, but it was
>>>>>>> found later that the stock kernel 3.10.0-327.18.2.el7.x86_64 from
>>>>>>> CentOS7 was affected as well.
>>>>
>>>> Please try this patch:
>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7b0767502b5db11cb1f0daef2d01f6d71b1192dc
>>>>
>>>> Regards,
>>>> Bob
>>>>
>>>
>>> Unfortunately, it did not help. The same BUG_ON() in 
>>> blkfront_setup_indirect() still triggers in our kernel based on RHEL's 
>>> 3.10.0-327.18.2, where I added the patch.
>>>
>>> As far as I can see, the patch makes sure the indirect pages are added to 
>>> the list only if (!info->feature_persistent) holds. I suppose it holds in 
>>> our case and the pages are added to the list because the triggered BUG_ON() 
>>> is here:
>>>
>>>      if (!info->feature_persistent && info->max_indirect_segments) {
>>>          <...>
>>>          BUG_ON(!list_empty(&info->indirect_pages));
>>>          <...>
>>>      }
>>>
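The precondition guarded by that BUG_ON() can be modelled in user space. The following is a minimal sketch, not the driver's code: every `fake_*` name and the small list helpers are hypothetical stand-ins, and the sketch says nothing about how the list became non-empty in the reported case. It only shows that a setup routine with this check fails when it is entered a second time without the pages from the first pass having been released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

/* Insert node n right after head h. */
static void list_push(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

/* Hypothetical stand-ins for struct page and struct blkfront_info. */
struct fake_page { struct list_head lru; };

struct fake_info {
    bool feature_persistent;
    unsigned int max_indirect_segments;
    struct list_head indirect_pages;
};

/*
 * Returns 0 on success, -1 where the real driver would hit BUG_ON(),
 * and -2 on allocation failure (pages already added stay on the list
 * and must be released by the caller).
 */
int fake_setup_indirect(struct fake_info *info, int num)
{
    if (!info->feature_persistent && info->max_indirect_segments) {
        if (!list_is_empty(&info->indirect_pages))
            return -1; /* precondition violated: leftover pages */
        for (int i = 0; i < num; i++) {
            struct fake_page *p = malloc(sizeof(*p));
            if (!p)
                return -2;
            list_push(&p->lru, &info->indirect_pages);
        }
    }
    return 0;
}

/* Release every page on the list so a later setup starts from empty. */
void fake_free_indirect(struct fake_info *info)
{
    while (!list_is_empty(&info->indirect_pages)) {
        struct list_head *n = info->indirect_pages.next;
        n->prev->next = n->next;
        n->next->prev = n->prev;
        free(n); /* lru is the first member, so n is also the page pointer */
    }
    assert(list_is_empty(&info->indirect_pages));
}
```

In this model, calling `fake_setup_indirect()` twice in a row returns -1 on the second call, while calling `fake_free_indirect()` in between lets the second call succeed.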
>>
>> That's odd.
>> Could you please try to reproduce this issue with a recent upstream kernel?
>>
>> Thanks,
>> Bob
> 
> No luck with the upstream kernel 4.7.0 so far due to unrelated issues (bad 
> initrd, I suppose, so the system does not even boot).
> 
> However, the problem reproduced with the stable upstream kernel 3.14.74. 
> After the system booted the second time with this kernel, that BUG_ON 
> triggered:
>      kernel BUG at drivers/block/xen-blkfront.c:1701
> 

Could you please provide more details on how to reproduce this bug? I'd like to 
test it myself.

Thanks!
Bob

>>
>>> So the problem is still out there somewhere, it seems.
>>>
>>> Regards,
>>> Evgenii
>>>
>>>>>>>
>>>>>>> The part of the system log he was able to retrieve is attached. Here is
>>>>>>> the bug info, for convenience:
>>>>>>>
>>>>>>> ------------------------------------
>>>>>>> [    2.246912] kernel BUG at drivers/block/xen-blkfront.c:1711!
>>>>>>> [    2.246912] invalid opcode: 0000 [#1] SMP
>>>>>>> [    2.246912] Modules linked in: ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront xen_blkfront(+) aesni_intel lrw ata_piix gf128mul glue_helper ablk_helper cryptd libata serio_raw floppy sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi
>>>>>>> [    2.246912] CPU: 1 PID: 50 Comm: xenwatch Not tainted 3.10.0-327.18.2.el7.x86_64 #1
>>>>>>> [    2.246912] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
>>>>>>> [    2.246912] task: ffff8800e9fcb980 ti: ffff8800e98bc000 task.ti: ffff8800e98bc000
>>>>>>> [    2.246912] RIP: 0010:[<ffffffffa015584f>]  [<ffffffffa015584f>] blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
>>>>>>> [    2.246912] RSP: 0018:ffff8800e98bfcd0  EFLAGS: 00010283
>>>>>>> [    2.246912] RAX: ffff8800353e15c0 RBX: ffff8800e98c52c8 RCX: 0000000000000020
>>>>>>> [    2.246912] RDX: ffff8800353e15b0 RSI: ffff8800e98c52b8 RDI: ffff8800353e15d0
>>>>>>> [    2.246912] RBP: ffff8800e98bfd20 R08: ffff8800353e15b0 R09: ffff8800eb403c00
>>>>>>> [    2.246912] R10: ffffffffa0155532 R11: ffffffffffffffe8 R12: ffff8800e98c4000
>>>>>>> [    2.246912] R13: ffff8800e98c52b8 R14: 0000000000000020 R15: ffff8800353e15c0
>>>>>>> [    2.246912] FS:  0000000000000000(0000) GS:ffff8800efc20000(0000) knlGS:0000000000000000
>>>>>>> [    2.246912] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [    2.246912] CR2: 00007f1b615ef000 CR3: 00000000e2b44000 CR4: 00000000001406e0
>>>>>>> [    2.246912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> [    2.246912] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>>>> [    2.246912] Stack:
>>>>>>> [    2.246912]  0000000000000020 0000000000000001 00000020a0157217 00000100e98bfdbc
>>>>>>> [    2.246912]  0000000027efa3ef ffff8800e98bfdbc ffff8800e98ce000 ffff8800e98c4000
>>>>>>> [    2.246912]  ffff8800e98ce040 0000000000000001 ffff8800e98bfe08 ffffffffa0155d4c
>>>>>>> [    2.246912] Call Trace:
>>>>>>> [    2.246912]  [<ffffffffa0155d4c>] blkback_changed+0x4ec/0xfc8 [xen_blkfront]
>>>>>>> [    2.246912]  [<ffffffff813a6fd0>] ? xenbus_gather+0x170/0x190
>>>>>>> [    2.246912]  [<ffffffff816322f5>] ? __slab_free+0x10e/0x277
>>>>>>> [    2.246912]  [<ffffffff813a805d>] xenbus_otherend_changed+0xad/0x110
>>>>>>> [    2.246912]  [<ffffffff813a7257>] ? xenwatch_thread+0x77/0x180
>>>>>>> [    2.246912]  [<ffffffff813a9ba3>] backend_changed+0x13/0x20
>>>>>>> [    2.246912]  [<ffffffff813a7246>] xenwatch_thread+0x66/0x180
>>>>>>> [    2.246912]  [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
>>>>>>> [    2.246912]  [<ffffffff813a71e0>] ? unregister_xenbus_watch+0x1f0/0x1f0
>>>>>>> [    2.246912]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
>>>>>>> [    2.246912]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
>>>>>>> [    2.246912]  [<ffffffff81646118>] ret_from_fork+0x58/0x90
>>>>>>> [    2.246912]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
>>>>>>> [    2.246912] Code: e1 48 85 c0 75 ce 49 8d 84 24 40 01 00 00 48 89 45 b8 e9 91 fd ff ff 4c 89 ff e8 8d ae 06 e1 e9 f2 fc ff ff 31 c0 e9 2e fe ff ff <0f> 0b e8 9a 57 f2 e0 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00
>>>>>>> [    2.246912] RIP  [<ffffffffa015584f>] blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
>>>>>>> [    2.246912]  RSP <ffff8800e98bfcd0>
>>>>>>> [    2.491574] ---[ end trace 8a9b992812627c71 ]---
>>>>>>> [    2.495618] Kernel panic - not syncing: Fatal exception
>>>>>>> ------------------------------------
>>>>>>>
>>>>>>> Xen version 4.2.
>>>>>>>
>>>>>>> EC2 instance type: c3.large with EBS magnetic storage, if that matters.
>>>>>>>
>>>>>>> Here is the code where the BUG_ON triggers
>>>>>>> (drivers/block/xen-blkfront.c):
>>>>>>> ------------------------------------
>>>>>>> if (!info->feature_persistent && info->max_indirect_segments) {
>>>>>>>         /*
>>>>>>>          * We are using indirect descriptors but not persistent
>>>>>>>          * grants, we need to allocate a set of pages that can be
>>>>>>>          * used for mapping indirect grefs
>>>>>>>          */
>>>>>>>         int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE;
>>>>>>>
>>>>>>>         BUG_ON(!list_empty(&info->indirect_pages)); // << This one hits.
>>>>>>>         for (i = 0; i < num; i++) {
>>>>>>>             struct page *indirect_page = alloc_page(GFP_NOIO);
>>>>>>>             if (!indirect_page)
>>>>>>>                 goto out_of_memory;
>>>>>>>             list_add(&indirect_page->lru, &info->indirect_pages);
>>>>>>>         }
>>>>>>> }
>>>>>>> ------------------------------------
>>>>>>>
>>>>>>> As we checked, the 'info->indirect_pages' list indeed contained around
>>>>>>> 30 elements at that point.
>>>>>>>
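For reference, the size of the list that one pass through that loop builds can be worked out from `num = INDIRECT_GREFS(segs) * BLK_RING_SIZE`. The sketch below is a rough illustration under loudly stated assumptions: a 4 KiB page, an 8-byte segment descriptor, a ring of 32 entries, and a rounding-up definition of `INDIRECT_GREFS`. None of these values are given in this thread, all `sketch_*` / `SKETCH_*` names are hypothetical, and the constants should be checked against the actual kernel tree in use.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED constants, not taken from this thread. */
#define SKETCH_PAGE_SIZE     4096u
#define SKETCH_BLK_RING_SIZE 32u

/* ASSUMED layout of one indirect segment descriptor (8 bytes). */
struct sketch_segment {
    uint32_t gref;
    uint8_t  first_sect;
    uint8_t  last_sect;
    uint16_t pad;
};

static_assert(sizeof(struct sketch_segment) == 8,
              "sketch assumes an 8-byte segment descriptor");

/* Segment descriptors that fit in one indirect page. */
#define SKETCH_SEGS_PER_INDIRECT_FRAME \
    ((unsigned int)(SKETCH_PAGE_SIZE / sizeof(struct sketch_segment)))

/* Indirect pages needed per request: segs rounded up to whole frames. */
unsigned int sketch_indirect_grefs(unsigned int segs)
{
    return (segs + SKETCH_SEGS_PER_INDIRECT_FRAME - 1)
           / SKETCH_SEGS_PER_INDIRECT_FRAME;
}

/* Pages allocated by one pass of the loop: one set per ring slot. */
unsigned int sketch_num_indirect_pages(unsigned int segs)
{
    return sketch_indirect_grefs(segs) * SKETCH_BLK_RING_SIZE;
}
```

Under these assumptions, and with an assumed `segs` of 32, `sketch_num_indirect_pages(32)` evaluates to 32, which is in the same range as the "around 30 elements" observed on the list.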
>>>>>>> Any ideas what may cause this and how to fix it?
>>>>>>>
>>>>>>> If any other data are needed, please let me know.
>>>>>>>
>>>>>>> References:
>>>>>>> [1] https://bugs.openvz.org/browse/OVZ-6718
