Re: Domain reference counting breakage
On 22.02.2021 18:11, Andrew Cooper wrote:
> On 22/02/2021 16:49, Jan Beulich wrote:
>> On 22.02.2021 16:26, Andrew Cooper wrote:
>>> At the moment, attempting to create an HVM guest with max_gnttab_frames of 0
>>> causes Xen to explode on the:
>>>
>>>     BUG_ON(atomic_read(&d->refcnt) != DOMAIN_DESTROYED);
>>>
>>> in _domain_destroy(). Instrumenting Xen a little more to highlight where the
>>> modifications to d->refcnt occur:
>>>
>>> (d6) --- Xen Test Framework ---
>>> (d6) Environment: PV 64bit (Long mode 4 levels)
>>> (d6) Testing domain create:
>>> (d6) Testing x86 PVH Shadow
>>> (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402046b5
>>> [domain_create+0x1c3/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
>>> (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040321b11
>>> [share_xen_page_with_guest+0x175/0x190], stk e010:ffff83003fea7ce8, dr6
>>> ffff0ff1
>>> (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d04022595b
>>> [assign_pages+0x223/0x2b7], stk e010:ffff83003fea7c68, dr6 ffff0ff1
>>> (d6) (XEN) grant_table.c:1934: Bad grant table sizes: grant 0, maptrack 0
>>> (d6) (XEN) *** d1 ref 3
>>> (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402048bc
>>> [domain_create+0x3ca/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
>>> (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040225e11
>>> [free_domheap_pages+0x422/0x44a], stk e010:ffff83003fea7c38, dr6 ffff0ff1
>>> (d6) (XEN) Xen BUG at domain.c:450
>>> (d6) (XEN) ----[ Xen-4.15-unstable x86_64 debug=y Not tainted ]----
>>> (d6) (XEN) CPU: 0
>>> (d6) (XEN) RIP: e008:[<ffff82d040204366>]
>>> common/domain.c#_domain_destroy+0x69/0x6b
>>>
>>> the problem becomes apparent.
>>>
>>> First of all, there is a reference count leak - share_xen_page_with_guest()'s
>>> reference isn't freed anywhere.
>>>
>>> However, the main problem is that the 4th #DB above is this atomic_set()
>>>
>>>     d->is_dying = DOMDYING_dead;
>>>     if ( hardware_domain == d )
>>>         hardware_domain = old_hwdom;
>>>     printk("*** %pd ref %d\n", d, atomic_read(&d->refcnt));
>>>     atomic_set(&d->refcnt, DOMAIN_DESTROYED);
>>>
>>> in the domain_create() error path, which happens before free_domheap_pages()
>>> drops the ref acquired by assign_pages(), and destroys still-relevant
>>> information pertaining to the guest.
>> I think the original idea was that by the time of the atomic_set()
>> all operations potentially altering the refcount are done. This
>> then allowed calling free_xenheap_pages() on e.g. the shared info
>> page without regard to whether the page's reference (installed by
>> share_xen_page_with_guest()) was actually dropped (i.e.
>> regardless of whether it's the domain create error path or proper
>> domain cleanup). With this assumption, no actual leak of anything
>> would occur, but of course this isn't the "natural" way of how
>> things should get cleaned up.
>>
>> According to this original model, free_domheap_pages() may not be
>> called anymore past that point (for domain-owned pages, which
>> really means put_page() then; anonymous pages are of course fine
>> to be freed late).
>
> And I presume this is written down in the usual place? (I.e. nowhere)

I'm afraid so, hence me starting the explanation with "I think ...".

>>> The best option is probably to use atomic_sub() to subtract
>>> (DOMAIN_DESTROYED + 1) from the current refcount, which preserves
>>> the extra refs taken by share_xen_page_with_guest() and assign_pages()
>>> until they can be freed appropriately.
>> First of all - why DOMAIN_DESTROYED+1? There's no extra reference
>> you ought to be dropping here. Or else what's the counterpart of
>> acquiring the respective reference?
>
> The original atomic_set(1) needs dropping (somewhere around) here.

Ah, right.
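(As an aside, the arithmetic behind the atomic_sub() proposal can be
checked in isolation. The toy program below is not Xen code: it assumes
DOMAIN_DESTROYED is the high-bit flag (1u << 31) from xen/sched.h and
replays the "ref 3" value from the instrumented log above.)

    /* toy_refcnt.c - user-space sketch; build with: gcc toy_refcnt.c */
    #include <stdint.h>
    #include <stdio.h>

    #define DOMAIN_DESTROYED (1u << 31) /* high-bit flag, as in xen/sched.h */

    int main(void)
    {
        /* "*** d1 ref 3": the initial ref plus the two taken by
         * share_xen_page_with_guest() and assign_pages(), per the
         * instrumentation above. */
        uint32_t refcnt = 3;

        /* atomic_set(&d->refcnt, DOMAIN_DESTROYED) would clobber the two
         * outstanding page refs.  Subtracting DOMAIN_DESTROYED + 1 drops
         * only the initial reference; the 32-bit wrap-around sets the
         * destroyed flag while keeping the remaining count intact. */
        refcnt -= DOMAIN_DESTROYED + 1;
        printf("flag %s, %u refs outstanding\n",
               (refcnt & DOMAIN_DESTROYED) ? "set" : "clear",
               refcnt & ~DOMAIN_DESTROYED); /* prints: flag set, 2 refs */

        /* The later free_domheap_pages()/put_page() calls drop the rest,
         * after which the BUG_ON() in _domain_destroy() is satisfied. */
        refcnt -= 2;
        printf("refcnt == DOMAIN_DESTROYED: %s\n",
               refcnt == DOMAIN_DESTROYED ? "yes" : "no"); /* prints: yes */
        return 0;
    }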
>> And then of course this means Xen heap pages cannot be cleaned up
>> anymore by merely calling free_xenheap_pages() - to get rid of
>> the associated reference, all of them would need to undergo
>> put_page_alloc_ref(), which in turn requires obtaining an extra
>> reference, which in turn introduces another of these ugly
>> theoretical error cases (because get_page() can in principle fail).
>>
>> Therefore I wouldn't outright discard the option of sticking to
>> the original model. It would then better be properly described
>> somewhere, and we would likely want to put some check in place to
>> make sure such a put_page() can't go unnoticed anymore when it
>> sits too late on the cleanup path (which may be difficult to
>> arrange for).
>
> I agree that some rules are in desperate need of writing down.
>
> However, given the catastrophic mess that is our reference counting and
> cleanup paths, and how easy it is to get things wrong, I'm very
> disinclined to permit a rule which forces divergent cleanup logic.
>
> Making the cleanup paths non-divergent is the key to removing swathes of
> dubious/buggy logic and a steady stream of memory leaks, etc.
>
> In particular, I don't think it's acceptable to special-case the cleanup
> rules in the domain_create() error path simply because we blindly reset
> the reference count when it still contains real information.

Of course I agree in principle. The question is whether this is a good
time for such a more extensive rework. IOW I wonder if there's an
immediate bug to be fixed now, and then some rework to be done for 4.16.

Jan
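(For completeness, here is a sketch - not actual Xen code - of the
put_page_alloc_ref() sequence Jan describes above for a xenheap page
that was handed out via share_xen_page_with_guest(). The helper name is
made up; get_page(), put_page_alloc_ref() and put_page() are the real
primitives, and put_page_alloc_ref() requires the caller to already
hold another reference, i.e. the allocation reference must never be the
last one held.)

    #include <xen/mm.h>    /* within the Xen tree */
    #include <xen/sched.h>

    /* Hypothetical helper sketching the cleanup dance described above. */
    static void drop_shared_xenheap_ref(struct domain *d,
                                        struct page_info *pg)
    {
        /*
         * A temporary reference must be taken before the allocation
         * reference can be dropped - and get_page() can in principle
         * fail, which is the "ugly theoretical error case" above.
         */
        if ( get_page(pg, d) )
        {
            put_page_alloc_ref(pg); /* drop the PGC_allocated reference */
            put_page(pg);           /* drop the temporary reference */
        }
        /* else: no sane recovery; the page reference would leak. */
    }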