[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V4 3/3] x86/altp2m: fix display frozen when switching to a new view early



On 11/8/18 8:14 PM, George Dunlap wrote:
> On 11/01/2018 02:45 PM, Razvan Cojocaru wrote:
> ...here and...
> 
>> +
>>  int p2m_set_ioreq_server(struct domain *d,
>>                           unsigned int flags,
>>                           struct hvm_ioreq_server *s)
>> @@ -994,12 +1033,12 @@ int p2m_change_type_one(struct domain *d, unsigned 
>> long gfn_l,
>>  }
>>  
>>  /* Modify the p2m type of a range of gfns from ot to nt. */
>> -void p2m_change_type_range(struct domain *d, 
>> -                           unsigned long start, unsigned long end,
>> -                           p2m_type_t ot, p2m_type_t nt)
>> +static void change_type_range(struct p2m_domain *p2m,
>> +                              unsigned long start, unsigned long end,
>> +                              p2m_type_t ot, p2m_type_t nt)
>>  {
>>      unsigned long gfn = start;
>> -    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    struct domain *d = p2m->domain;
>>      int rc = 0;
>>  
>>      ASSERT(ot != nt);
>> @@ -1052,6 +1091,24 @@ void p2m_change_type_range(struct domain *d,
>>      p2m_unlock(p2m);
>>  }
>>  
>> +void p2m_change_type_range(struct domain *d,
>> +                           unsigned long start, unsigned long end,
>> +                           p2m_type_t ot, p2m_type_t nt)
>> +{
>> +    change_type_range(p2m_get_hostp2m(d), start, end, ot, nt);
>> +
>> +#ifdef CONFIG_HVM
>> +    if ( unlikely(altp2m_active(d)) )
>> +    {
>> +        unsigned int i;
>> +
>> +        for ( i = 0; i < MAX_ALTP2M; i++ )
>> +            if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) )
>> +                change_type_range(d->arch.altp2m_p2m[i], start, end, ot, 
>> nt);
>> +    }
>> +#endif
>> +}
>> +
> 
> ...here you grab & release each lock separately, inside the update
> function.  memory_type_changed is probably more or less idempotent, so
> won't matter if two different calls race; but it seems likely that if
> two p2m_change_type_range() calls happen concurrently, the various
> altp2ms will get different results.  Is it worth refactoring both of
> these so that, like change_entry_type_global, you hold the host p2m lock
> while you change the individual altp2m locks?

I have changed the code to do all the modifications under hostp2m lock,
and on changing the resolution on a host with three altp2ms I get a
"Watchdog timer detects that CPU3 is stuck!" hypervisor crash:

(XEN) stdvga.c:178:d1v0 leaving stdvga mode
(XEN) 1 p2m_lock(hostp2m)
(XEN) 1 change_type_range(hostp2m)
(XEN) 1 p2m_lock(altp2m)
(XEN) 1 change_type_range(altp2m)
(XEN) 1 p2m_unlock(altp2m)
(XEN) 1 p2m_lock(altp2m)
(XEN) 1 change_type_range(altp2m)
(XEN) 1 p2m_unlock(altp2m)
(XEN) 1 p2m_lock(altp2m)
(XEN) 1 change_type_range(altp2m)
(XEN) 1 p2m_unlock(altp2m)
(XEN) 1 p2m_unlock(hostp2m)
(XEN) Watchdog timer detects that CPU3 is stuck!
(XEN) ----[ Xen-4.12-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82d080239107>] vcpu_sleep_sync+0x40/0x71
(XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d0v0)
(XEN) rax: 0000000000000001   rbx: ffff8308dc1c3000   rcx: ffff8308dc1c3130
(XEN) rdx: 0000000000000000   rsi: 0000000000000292   rdi: ffff830c529f1010
(XEN) rbp: ffff830c52987c88   rsp: ffff830c52987c78   r8:  ffff830ae7a1dfd0
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000001c80
(XEN) r12: ffff82d0802390c7   r13: ffff8308d5222000   r14: ffff82c000225000
(XEN) r15: 00000000000f0000   cr0: 0000000080050033   cr4: 0000000000372660
(XEN) cr3: 0000000856c56000   cr2: 00007f8d36195000
(XEN) fsb: 00007f8d3fffe8c0   gsb: ffff880276c00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d080239107> (vcpu_sleep_sync+0x40/0x71):
(XEN)  01 00 00 00 74 24 f3 90 <8b> 11 48 8b 43 10 8b 80 cc 01 00 00 09
d0 48 98
(XEN) Xen stack trace from rsp=ffff830c52987c78:
(XEN)    0000000000000240 ffff8308dc1c3000 ffff830c52987cb8 ffff82d0802070f1
(XEN)    ffff830c52987cc8 ffff8308d5222000 0000000000000240 0000000000000048
(XEN)    ffff830c52987cc8 ffff82d0802084bd ffff830c52987d38 ffff82d08036861e
(XEN)    8c00000000000001 ffff8308d5222648 ffff830c52987d28 00000000000f0000
(XEN)    00007f8d400a7010 0000000000000048 0000000000856c23 0000000000000000
(XEN)    ffff830c52987e00 ffffffffffffffea deadbeefdeadf00d ffff82d0802eebef
(XEN)    ffff830c52987de8 ffff82d0802ee217 01ff830c00000001 ffff82e011b4e6e0
(XEN)    ffff830c52987d98 0000000000000000 ffff830c52971000 ffff830c52959000
(XEN)    0000000000000001 ffff830c52971000 ffff830c52987d98 0000000000000007
(XEN)    0000000000000240 00000000000f0000 ffff830c52987fff ffff8308d5222000
(XEN)    ffff830c52987dc8 0000000000000002 0000000000000001 00007f8d400af010
(XEN)    deadbeefdeadf00d ffff82d0802eebef ffff830c52987e48 ffff82d0802eec73
(XEN)    ffff82d08037a3c4 0000000280370001 00007f8d400ae010 0000000000000020
(XEN)    00007f8d400a7010 0000000000000048 ffff82d08037a3c4 ffff830c52987ef8
(XEN)    ffff830c52959000 0000000000000029 ffff830c52987ee8 ffff82d080373722
(XEN)    03ff82d08037a3c4 0000000000000001 0000000000000002 00007f8d400af010
(XEN)    deadbeefdeadf00d deadbeefdeadf00d ffff82d08037a3c4 ffff82d08037a3b8
(XEN)    ffff82d08037a3c4 ffff82d08037a3b8 ffff82d08037a3c4 ffff82d08037a3b8
(XEN)    ffff82d08037a3c4 ffff830c52959000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 00007cf3ad6780e7 ffff82d08037a422
(XEN) Xen call trace:
(XEN)    [<ffff82d080239107>] vcpu_sleep_sync+0x40/0x71
(XEN)    [<ffff82d0802070f1>] domain.c#do_domain_pause+0x33/0x4f
(XEN)    [<ffff82d0802084bd>] domain_pause+0x25/0x27
(XEN)    [<ffff82d08036861e>] hap_track_dirty_vram+0x2b3/0x491
(XEN)    [<ffff82d0802ee217>] dm.c#dm_op+0x472/0xd46
(XEN)    [<ffff82d0802eec73>] do_dm_op+0x84/0xba
(XEN)    [<ffff82d080373722>] pv_hypercall+0x1af/0x4cd
(XEN)    [<ffff82d08037a422>] lstar_enter+0x112/0x120
(XEN)
(XEN) CPU1 @ e008:ffff82d0802a853f
(time.c#time_calibration_std_rendezvous+0x62/0x77)
(XEN) CPU2 @ e008:ffff82d0802a853f
(time.c#time_calibration_std_rendezvous+0x62/0x77)
(XEN) CPU0 @ e008:ffff82d0802a8514
(time.c#time_calibration_std_rendezvous+0x37/0x77)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL TRAP: vector = 2 (nmi)
(XEN) [error_code=0000]
(XEN) ****************************************

AFAICT, it just takes longer than the timeout to do the sync. The code
changes have been simple (patch attached), I don't think this is an
actual deadlock but of course I could be wrong (at least printing out
all of the lock / unlock calls in the new code looks like everything is
being properly unlocked at the end).

Any suggestion, as usual, appreciated. :)


Thanks,
Razvan

Attachment: change_altp2ms_under_hostp2m_lock.patch
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.