[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

To: George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>
From: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
Date: Fri, 16 Nov 2018 21:50:47 +0200
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 16 Nov 2018 19:51:22 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11/16/18 7:59 PM, George Dunlap wrote:
> On 11/16/18 2:10 PM, Razvan Cojocaru wrote:
>> On 11/16/18 2:03 PM, George Dunlap wrote:
>>> The code is definitely complicated enough, though, that I may have
>>> missed something, which is why I asked Razvan if there was a reason he
>>> changed it.
>>>
>>> For the purposes of this patch, I propose having p2m_altp2m_init_ept()
>>> set max_mapped_pfn to 0 (if that works), and leaving "get rid of
>>> max_remapped_pfn" for a future clean-up series.
>>
>> I've retraced my previous analysis and re-ran some tests, and I now
>> remember (sorry it took a while) why the p2m->max_mapped_pfn =
>> hostp2m->max_mapped_pfn was both necessary and not accidental.
>>
>> Let's say we set it to 0 in p2m_altp2m_init_ept(). Then,
>> hap_track_dirty_vram() calls p2m_change_type_range(), which calls the
>> newly added change_type_range().
>>
>> Change_type_range() looks like this:
>>
>> static void change_type_range(struct p2m_domain *p2m,
>>                               unsigned long start, unsigned long end,
>>                               p2m_type_t ot, p2m_type_t nt)
>> {
>>     unsigned long gfn = start;
>>     struct domain *d = p2m->domain;
>>     int rc = 0;
>>
>>     p2m->defer_nested_flush = 1;
>>
>>     if ( unlikely(end > p2m->max_mapped_pfn) )
>>     {
>>         if ( !gfn )
>>         {
>>             p2m->change_entry_type_global(p2m, ot, nt);
>>             gfn = end;
>>         }
>>         end = p2m->max_mapped_pfn + 1;
>>     }
>>     if ( gfn < end )
>>         rc = p2m->change_entry_type_range(p2m, ot, nt, gfn, end - 1);
>>     if ( rc )
>>     {
>>         printk(XENLOG_G_ERR "Error %d changing Dom%d GFNs [%lx,%lx] from
>> %d to %d\n",
>>                rc, d->domain_id, start, end - 1, ot, nt);
>>         domain_crash(d);
>>     }
>>
>>     switch ( nt )
>>     {
>>     case p2m_ram_rw:
>>         if ( ot == p2m_ram_logdirty )
>>             rc = rangeset_remove_range(p2m->logdirty_ranges, start, end
>> - 1);
>>         break;
>>     case p2m_ram_logdirty:
>>         if ( ot == p2m_ram_rw )
>>             rc = rangeset_add_range(p2m->logdirty_ranges, start, end - 1);
>>         break;
>>     default:
>>         break;
>>     }
>>     if ( rc )
>>     {
>>         printk(XENLOG_G_ERR "Error %d manipulating Dom%d's log-dirty
>> ranges\n",
>>                rc, d->domain_id);
>>         domain_crash(d);
>>     }
>>
>>     p2m->defer_nested_flush = 0;
>>     if ( nestedhvm_enabled(d) )
>>         p2m_flush_nestedp2m(d);
>> }
>>
>> If we set p2m->max_mapped_pfn to 0, we're guaranteed to run into the if
>> ( unlikely(end > p2m->max_mapped_pfn) ) body, where end =
>> p2m->max_mapped_pfn + 1; will make end 1.
>>
>> Then, we will crash the hypervisor in rangeset_add_range(), where
>> there's an ASSERT() stating that start <= end.
> 
> Ah, right, this was the original crash that you ran into several months
> ago, which flagged up the whole logdirty range synchronization issue.
> 
> But that's partly a logic hole in change_entry_type_range(), which
> assumes that start < p2m->max_mapped_pfn.  It would be better to fix
> that than to work around it by changing the meaning of max_mapped_pfn.
> 
> On the other hand, we want the logdirty rangesets to actually match the
> host's rangesets; using altp2m->max_mapped_pfn for this is clearly
> wrong. The easiest fix would be just to explicitly use the host's
> max_mapped_pfn when calculating the clipping.  A more complete fix would
> involve calculating two different ranges -- a "rangeset" range and a
> "invalidate" range, the second of which would be clipped on altp2ms by
> {min,max}_remapped_gfn.
> 
> Something like the attached (compile-tested only).  I'm partial to
> having both patches applied, but I'd be open to arguments that we should
> only use the first.

Thanks! I haven't yet been able to think in depth about the logic, but I
did manage to apply them. Just applying the first one allows me to set
p2m->max_mapped_pfn = 0; without the ASSERT() crashing the hypervisor,
and everything appears to work well.

With both patches applies, the display remains frozen (things appear to
behave - externally - in the same way as before the series).


Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

Follow-Ups:
- Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early
  - From: Razvan Cojocaru

References:
- [Xen-devel] [PATCH V6 0/4] Fix VGA logdirty related display freezes with altp2m
  - From: Razvan Cojocaru
- [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early
  - From: Razvan Cojocaru
- Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early
  - From: George Dunlap
- Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early
  - From: Razvan Cojocaru
- Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early
  - From: Jan Beulich
- Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early
  - From: George Dunlap
- Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early
  - From: Razvan Cojocaru
- Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early
  - From: George Dunlap

Prev by Date: Re: [Xen-devel] [PATCH 17/18] xen/arm: Resume Dom0 after Xen resumes
Next by Date: Re: [Xen-devel] [PATCH 0/4] x86: Drop PVRDTSCP and fix MSR_TSC_AUX handling for guests
Previous by thread: Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early
Next by thread: Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.