Xen project Mailing List

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

To: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

From: George Dunlap <george.dunlap@xxxxxxxxxx>

Date: Fri, 16 Nov 2018 17:59:15 +0000

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Fri, 16 Nov 2018 17:59:50 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11/16/18 2:10 PM, Razvan Cojocaru wrote:
> On 11/16/18 2:03 PM, George Dunlap wrote:
>> The code is definitely complicated enough, though, that I may have
>> missed something, which is why I asked Razvan if there was a reason he
>> changed it.
>>
>> For the purposes of this patch, I propose having p2m_altp2m_init_ept()
>> set max_mapped_pfn to 0 (if that works), and leaving "get rid of
>> max_remapped_pfn" for a future clean-up series.
>
> I've retraced my previous analysis and re-ran some tests, and I now
> remember (sorry it took a while) why the p2m->max_mapped_pfn =
> hostp2m->max_mapped_pfn was both necessary and not accidental.
>
> Let's say we set it to 0 in p2m_altp2m_init_ept(). Then,
> hap_track_dirty_vram() calls p2m_change_type_range(), which calls the
> newly added change_type_range().
>
> Change_type_range() looks like this:
>
> static void change_type_range(struct p2m_domain *p2m,
>                               unsigned long start, unsigned long end,
>                               p2m_type_t ot, p2m_type_t nt)
> {
>     unsigned long gfn = start;
>     struct domain *d = p2m->domain;
>     int rc = 0;
>
>     p2m->defer_nested_flush = 1;
>
>     if ( unlikely(end > p2m->max_mapped_pfn) )
>     {
>         if ( !gfn )
>         {
>             p2m->change_entry_type_global(p2m, ot, nt);
>             gfn = end;
>         }
>         end = p2m->max_mapped_pfn + 1;
>     }
>     if ( gfn < end )
>         rc = p2m->change_entry_type_range(p2m, ot, nt, gfn, end - 1);
>     if ( rc )
>     {
>         printk(XENLOG_G_ERR "Error %d changing Dom%d GFNs [%lx,%lx] from %d to %d\n",
>                rc, d->domain_id, start, end - 1, ot, nt);
>         domain_crash(d);
>     }
>
>     switch ( nt )
>     {
>     case p2m_ram_rw:
>         if ( ot == p2m_ram_logdirty )
>             rc = rangeset_remove_range(p2m->logdirty_ranges, start, end - 1);
>         break;
>     case p2m_ram_logdirty:
>         if ( ot == p2m_ram_rw )
>             rc = rangeset_add_range(p2m->logdirty_ranges, start, end - 1);
>         break;
>     default:
>         break;
>     }
>     if ( rc )
>     {
>         printk(XENLOG_G_ERR "Error %d manipulating Dom%d's log-dirty ranges\n",
>                rc, d->domain_id);
>         domain_crash(d);
>     }
>
>     p2m->defer_nested_flush = 0;
>     if ( nestedhvm_enabled(d) )
>         p2m_flush_nestedp2m(d);
> }
>
> If we set p2m->max_mapped_pfn to 0, we're guaranteed to run into the if
> ( unlikely(end > p2m->max_mapped_pfn) ) body, where end =
> p2m->max_mapped_pfn + 1; will make end 1.
>
> Then, we will crash the hypervisor in rangeset_add_range(), where
> there's an ASSERT() stating that start <= end.

Ah, right, this was the original crash that you ran into several months
ago, which flagged up the whole logdirty range synchronization issue.

But that's partly a logic hole in change_entry_type_range(), which
assumes that start < p2m->max_mapped_pfn. It would be better to fix
that than to work around it by changing the meaning of max_mapped_pfn.

On the other hand, we want the logdirty rangesets to actually match the
host's rangesets; using altp2m->max_mapped_pfn for this is clearly wrong.

The easiest fix would be just to explicitly use the host's
max_mapped_pfn when calculating the clipping.

A more complete fix would involve calculating two different ranges -- a
"rangeset" range and an "invalidate" range, the second of which would be
clipped on altp2ms by {min,max}_remapped_gfn.

Something like the attached (compile-tested only).

I'm partial to having both patches applied, but I'd be open to arguments
that we should only use the first.

 -George
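
For readers following the arithmetic, here is a minimal standalone sketch of the end-clipping step discussed above. It is not the attached patch and not Xen code: `clip_range()`, `range_is_valid()` and `struct clipped_range` are hypothetical names introduced only for illustration, and the GFN values in the usage note are made up. It models only how `end` is clipped against a `max_mapped_pfn` value and whether the resulting `[start, end - 1]` pair would satisfy the `start <= end` condition that `rangeset_add_range()` asserts.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the clipping in change_type_range().
 * No Xen types or calls are used; only the arithmetic is reproduced.
 */
struct clipped_range {
    unsigned long start; /* first GFN, passed through unchanged */
    unsigned long last;  /* last GFN, inclusive ("end - 1" in the quoted code) */
};

/*
 * Clip the exclusive bound 'end' against 'max_mapped_pfn', as the quoted
 * function does, and return the inclusive range that would be handed to
 * the rangeset helpers.
 */
static struct clipped_range clip_range(unsigned long start,
                                       unsigned long end,
                                       unsigned long max_mapped_pfn)
{
    struct clipped_range r;

    if ( end > max_mapped_pfn )
        end = max_mapped_pfn + 1;

    r.start = start;
    r.last = end - 1;

    return r;
}

/* Mirrors the condition the rangeset ASSERT() checks: start <= end. */
static bool range_is_valid(struct clipped_range r)
{
    return r.start <= r.last;
}
```

With an altp2m whose `max_mapped_pfn` is 0, `clip_range(0xf0000, 0xf0800, 0)` yields `last == 0`, so `range_is_valid()` returns false for any non-zero `start`; that is the assertion failure Razvan describes. Clipping against a host-sized value instead (for example `0xfffff`, an assumed figure) leaves the range as `[0xf0000, 0xf07ff]`, which is valid.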
