|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
Wednesday, September 11, 2013, 6:25:35 PM, you wrote:
> Wednesday, September 11, 2013, 2:39:01 PM, you wrote:
>> On 11/09/13 13:08, Stefano Stabellini wrote:
>>> On Wed, 11 Sep 2013, David Vrabel wrote:
>>>> On 10/09/13 19:13, Sander Eikelenboom wrote:
>>>>> Hi Wei,
>>>>>
>>>>> Just back from holiday i tried running a new xen-unstable and
>>>>> linux kernel (current Linus his tree + Konrad's last pull request
>>>>> merged on top). I saw a thread and patch about a bug_on in
>>>>> increase_reservation ... i'm seeing the splat below in dom0 when
>>>>> guests get started.
>>>>
>>>> Yes, the use of __per_cpu() in decrease_reservation is not safe.
>>>>
>>>> Stefano, what testing did you give "xen/balloon: set a mapping for
>>>> ballooned out pages" (cd9151e2). The number of critical problems
>>>> it's had suggests not a lot?
>>>>
>>>> I'm also becoming less happy with the inconsistency between the
>>>> p2m updates between the different (non-)auto_translated_physmap
>>>> guest types.
>>>>
>>>> I think it (and all the attempts to fix it) should be reverted at
>>>> this stage and we should try again for 3.13 which some more through
>>>> testing and a more careful look at what updates to the p2m are
>>>> needed.
>>>
>>> Issues like this one are due to different kernel configurations /
>>> usage patters. To reproduce this issue one needs a preemptible kernel
>>> and blkback. I use a non-preemptible kernel and QEMU as block
>>> backend.
>>>
>>> Granted, in this case I should have tested blkback and both
>>> preemptible and non-preemptible kernel configurations. But in
>>> general it is nearly impossible to test all the possible
>>> configurations beforehand, it is a classic case of combinatorial
>>> explosion.
>>>
>>> These are exactly the kind of things that an exposure to a wider
>>> range of users/developers help identify and fix.
>>>
>>> So I think that we did the right thing here, by sending a pull
>>> request early in the release cycle, so that now we have many other
>>> RCs to fix all the issues that come up.
>> That sounds fair.
>> Sander, does the follow patch fix this issue?
> Hi David,
> This patch indeed seems to fix it.
Spoke too soon, starting a guest is fixed now, shutting it down oopses:
[ 910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
[ 910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
[ 910.995427] device vif13.0 left promiscuous mode
[ 911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
[ 911.364617] ------------[ cut here ]------------
[ 911.371022] kernel BUG at drivers/xen/balloon.c:365!
[ 911.377395] invalid opcode: 0000 [#1] PREEMPT SMP
[ 911.383775] Modules linked in:
[ 911.390099] CPU: 4 PID: 16626 Comm: kworker/4:2 Not tainted
3.11.0-mw-20130911-notl-balloonfix #1
[ 911.396620] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1
09/13/2010
[ 911.403172] Workqueue: events balloon_process
[ 911.409724] task: ffff88004eef2120 ti: ffff88004d44a000 task.ti:
ffff88004d44a000
[ 911.416321] RIP: e030:[<ffffffff814a7264>] [<ffffffff814a7264>]
balloon_process+0x2f4/0x300
[ 911.423168] RSP: e02b:ffff88004d44bd08 EFLAGS: 00010207
[ 911.430016] RAX: 00000000005462f6 RBX: 0000000000000000 RCX: 0000000000000001
[ 911.436980] RDX: ffffffff82d67000 RSI: 00000000000001b3 RDI: 000000000004b5b3
[ 911.444011] RBP: ffff88004d44bd68 R08: 0000000000000000 R09: ffff88004eef2840
[ 911.450898] R10: 0000000000000000 R11: 0000000000000001 R12: 0000160000000000
[ 911.457625] R13: 0000000000000001 R14: ffffea00012d6cc0 R15: 000000000004b5b3
[ 911.464405] FS: 00007fdfbfd58700(0000) GS:ffff88005f700000(0000)
knlGS:0000000000000000
[ 911.471219] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 911.478082] CR2: 00007f2bc3fdaf30 CR3: 0000000050ac9000 CR4: 0000000000000660
[ 911.485030] Stack:
[ 911.491763] 0000000000000000 0000000000000001 ffffffff82d31900
0000000000000001
[ 911.498705] 0000000000000000 0000000000007ff0 ffff880000000000
ffffffff82266d00
[ 911.505751] ffff88005f712c80 ffff88005f717000 0000000000000000
ffff88004f693080
[ 911.512803] Call Trace:
[ 911.519802] [<ffffffff810b0e1d>] process_one_work+0x1bd/0x430
[ 911.526905] [<ffffffff810b0db7>] ? process_one_work+0x157/0x430
[ 911.533975] [<ffffffff810b2012>] worker_thread+0x112/0x360
[ 911.540988] [<ffffffff810b1f00>] ? manage_workers.isra.23+0x290/0x290
[ 911.547921] [<ffffffff810b8c86>] kthread+0xd6/0xe0
[ 911.554787] [<ffffffff81a4a23b>] ? _raw_spin_unlock_irq+0x2b/0x70
[ 911.561854] [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
[ 911.568932] [<ffffffff81a4b2bc>] ret_from_fork+0x7c/0xb0
[ 911.575955] [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
[ 911.582948] Code: 6d 26 82 bf 08 00 00 00 e8 aa 91 c0 ff eb 9b 31 c9 e9 61
fe ff ff 0f 0b 0f 0b 4c 89 ff e8 35 2e b6 ff 48 ff c0 0f 84 9c fe ff ff <0f> 0b
66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 89 d7
[ 911.597747] RIP [<ffffffff814a7264>] balloon_process+0x2f4/0x300
[ 911.605129] RSP <ffff88004d44bd08>
[ 911.612506] ---[ end trace 04f1bc1297ee6627 ]---
[ 911.619911] BUG: unable to handle kernel paging request at ffffffffffffffa8
[ 911.627457] IP: [<ffffffff810b8f4b>] kthread_data+0xb/0x20
[ 911.635063] PGD 220f067 PUD 2211067 PMD 0
[ 911.642664] Oops: 0000 [#2] PREEMPT SMP
[ 911.650087] Modules linked in:
[ 911.657394] CPU: 4 PID: 16626 Comm: kworker/4:2 Tainted: G D
3.11.0-mw-20130911-notl-balloonfix #1
[ 911.664959] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1
09/13/2010
[ 911.672722] task: ffff88004eef2120 ti: ffff88004d44a000 task.ti:
ffff88004d44a000
[ 911.680547] RIP: e030:[<ffffffff810b8f4b>] [<ffffffff810b8f4b>]
kthread_data+0xb/0x20
[ 911.688475] RSP: e02b:ffff88004d44b8e8 EFLAGS: 00010092
[ 911.696220] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 000000000000000f
[ 911.704218] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff88004eef2120
[ 911.712222] RBP: ffff88004d44b8e8 R08: ffff88004eef21b0 R09: 0000000000000165
[ 911.720089] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88004eef2510
[ 911.727639] R13: 0000000000000004 R14: ffff88004eef2110 R15: ffff88004eef2120
[ 911.735015] FS: 00007f68eb707700(0000) GS:ffff88005f700000(0000)
knlGS:0000000000000000
[ 911.742459] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 911.749743] CR2: 0000000000000028 CR3: 0000000047295000 CR4: 0000000000000660
[ 911.756818] Stack:
[ 911.763909] ffff88004d44b908 ffffffff810b25e0 ffff88004d44b908
ffff88005f713700
[ 911.771115] ffff88004d44ba38 ffffffff81a48237 ffff88004d44b938
ffffffff8111b1bc
[ 911.778316] ffff88004eef2120 ffff88004d44bfd8 ffff88004d44bfd8
ffff88004d44bfd8
[ 911.785588] Call Trace:
[ 911.792734] [<ffffffff810b25e0>] wq_worker_sleeping+0x10/0x90
[ 911.799936] [<ffffffff81a48237>] __schedule+0x647/0x8e0
[ 911.807193] [<ffffffff8111b1bc>] ? rcu_is_cpu_idle+0x3c/0x60
[ 911.814320] [<ffffffff814157f6>] ? debug_smp_processor_id+0x56/0x100
[ 911.821352] [<ffffffff81090203>] ? __serpent_dec_blk8_avx+0x91b/0x3cc8
[ 911.828385] [<ffffffff810e989c>] ? lock_acquire+0xdc/0x100
[ 911.835244] [<ffffffff8109ad03>] ? do_exit+0x5d3/0xa20
[ 911.842019] [<ffffffff8111d582>] ? call_rcu+0x12/0x20
[ 911.848682] [<ffffffff810e96a3>] ? lock_release+0x133/0x250
[ 911.855278] [<ffffffff81a485f4>] schedule+0x24/0x60
[ 911.861884] [<ffffffff8109add2>] do_exit+0x6a2/0xa20
[ 911.868489] [<ffffffff810e96a3>] ? lock_release+0x133/0x250
[ 911.875031] [<ffffffff810114f6>] oops_end+0xa6/0xf0
[ 911.881646] [<ffffffff81011693>] die+0x53/0x80
[ 911.888163] [<ffffffff8100df79>] do_trap+0x69/0x160
[ 911.894532] [<ffffffff8100e292>] ? do_invalid_op+0x72/0xc0
[ 911.900945] [<ffffffff8100e2c6>] do_invalid_op+0xa6/0xc0
[ 911.907225] [<ffffffff814a7264>] ? balloon_process+0x2f4/0x300
[ 911.913448] [<ffffffff810e391f>] ? trace_hardirqs_off_caller+0x8f/0x160
[ 911.919774] [<ffffffff8140debd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 911.926001] [<ffffffff81a4aa67>] ? restore_args+0x30/0x30
[ 911.932120] [<ffffffff81a4c5ae>] invalid_op+0x1e/0x30
[ 911.938231] [<ffffffff814a7264>] ? balloon_process+0x2f4/0x300
[ 911.944381] [<ffffffff814a725b>] ? balloon_process+0x2eb/0x300
[ 911.950474] [<ffffffff810b0e1d>] process_one_work+0x1bd/0x430
[ 911.956583] [<ffffffff810b0db7>] ? process_one_work+0x157/0x430
[ 911.962660] [<ffffffff810b2012>] worker_thread+0x112/0x360
[ 911.968630] [<ffffffff810b1f00>] ? manage_workers.isra.23+0x290/0x290
[ 911.974632] [<ffffffff810b8c86>] kthread+0xd6/0xe0
[ 911.980656] [<ffffffff81a4a23b>] ? _raw_spin_unlock_irq+0x2b/0x70
[ 911.986807] [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
[ 911.992676] [<ffffffff81a4b2bc>] ret_from_fork+0x7c/0xb0
[ 911.998346] [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
[ 912.004019] Code: 00 48 89 e5 5d 48 8b 40 98 48 c1 e8 02 83 e0 01 c3 66 66
66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 87 98 03 00 00 55 48 89 e5 <48> 8b
40 a8 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
[ 912.016141] RIP [<ffffffff810b8f4b>] kthread_data+0xb/0x20
[ 912.022115] RSP <ffff88004d44b8e8>
[ 912.027951] CR2: ffffffffffffffa8
[ 912.033744] ---[ end trace 04f1bc1297ee6628 ]---
[ 912.039588] Fixing recursive fault but reboot is needed!
> Thx,
> Sander
>> 8<---------------------------------------------------
>> xen/balloon: ensure preemption is disabled when using a scratch page
>> In decrease_reservation(), if the kernel is preempted between updating
>> the mapping and updating the p2m then they may end up using different
>> scratch pages.
>> Use get_balloon_scratch_page() and put_balloon_scratch_page() which use
>> get_cpu_var() and put_cpu_var() to correctly disable preemption.
>> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
>> ---
>> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
>> index 3101cf6..a74647b 100644
>> --- a/drivers/xen/balloon.c
>> +++ b/drivers/xen/balloon.c
>> @@ -305,6 +305,18 @@ static enum bp_state reserve_additional_memory(long
>> credit)
>> }
>> #endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */
>>
>> +struct page *get_balloon_scratch_page(void)
>> +{
>> + struct page *ret = get_cpu_var(balloon_scratch_page);
>> + BUG_ON(ret == NULL);
>> + return ret;
>> +}
>> +
>> +void put_balloon_scratch_page(void)
>> +{
>> + put_cpu_var(balloon_scratch_page);
>> +}
>> +
>> static enum bp_state increase_reservation(unsigned long nr_pages)
>> {
>> int rc;
>> @@ -380,6 +392,7 @@ static enum bp_state decrease_reservation(unsigned long
>> nr_pages, gfp_t gfp)
>> enum bp_state state = BP_DONE;
>> unsigned long pfn, i;
>> struct page *page;
>> + struct page *scratch_page;
>> int ret;
>> struct xen_memory_reservation reservation = {
>> .address_bits = 0,
>> @@ -399,6 +412,8 @@ static enum bp_state decrease_reservation(unsigned long
>> nr_pages, gfp_t gfp)
>> if (nr_pages > ARRAY_SIZE(frame_list))
>> nr_pages = ARRAY_SIZE(frame_list);
>>
>> + scratch_page = get_balloon_scratch_page();
>> +
>> for (i = 0; i < nr_pages; i++) {
>> page = alloc_page(gfp);
>> if (page == NULL) {
>> @@ -416,7 +431,7 @@ static enum bp_state decrease_reservation(unsigned long
>> nr_pages, gfp_t gfp)
>> if (xen_pv_domain() && !PageHighMem(page)) {
>> ret = HYPERVISOR_update_va_mapping(
>> (unsigned long)__va(pfn << PAGE_SHIFT),
>> -
>> pfn_pte(page_to_pfn(__get_cpu_var(balloon_scratch_page)),
>> + pfn_pte(page_to_pfn(scratch_page),
>> PAGE_KERNEL_RO), 0);
>> BUG_ON(ret);
>> }
>> @@ -432,14 +447,14 @@ static enum bp_state decrease_reservation(unsigned
>> long nr_pages, gfp_t gfp)
>> pfn = mfn_to_pfn(frame_list[i]);
>> if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>> unsigned long p;
>> - struct page *pg;
>> - pg = __get_cpu_var(balloon_scratch_page);
>> - p = page_to_pfn(pg);
>> + p = page_to_pfn(scratch_page);
>> __set_phys_to_machine(pfn, pfn_to_mfn(p));
>> }
>> balloon_append(pfn_to_page(pfn));
>> }
>>
>> + put_balloon_scratch_page();
>> +
>> set_xen_guest_handle(reservation.extent_start, frame_list);
>> reservation.nr_extents = nr_pages;
>> ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
>> &reservation);
>> @@ -491,18 +506,6 @@ static void balloon_process(struct work_struct *work)
>> mutex_unlock(&balloon_mutex);
>> }
>>
>> -struct page *get_balloon_scratch_page(void)
>> -{
>> - struct page *ret = get_cpu_var(balloon_scratch_page);
>> - BUG_ON(ret == NULL);
>> - return ret;
>> -}
>> -
>> -void put_balloon_scratch_page(void)
>> -{
>> - put_cpu_var(balloon_scratch_page);
>> -}
>> -
>> /* Resets the Xen limit, sets new target, and kicks off processing. */
>> void balloon_set_new_target(unsigned long target)
>> {
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |