
Re: [Xen-devel] [PATCH v2] x86/hvm: finish IOREQ correctly on completion path



On 13/03/2019 08:53, Jan Beulich wrote:
>>>> On 12.03.19 at 17:23, <igor.druzhinin@xxxxxxxxxx> wrote:
>> Since the introduction of linear_{read,write}() helpers in 3bdec530a5
>> (x86/HVM: split page straddling emulated accesses in more cases) the
>> completion path for IOREQs has been broken: if there is an IOREQ in
>> progress but hvm_copy_{to,from}_guest_linear() returns HVMTRANS_okay
>> (e.g. when P2M type of source/destination has been changed by IOREQ
>> handler) the execution will never re-enter hvmemul_do_io() where
>> IOREQs are completed. This usually results in a domain crash upon
>> the execution of the next IOREQ entering hvmemul_do_io() and finding
>> the remnants of the previous IOREQ in the state machine.
>>
>> This particular issue has been discovered in relation to p2m_ioreq_server
>> type where an emulator changed the memory type between p2m_ioreq_server
>> and p2m_ram_rw in process of responding to IOREQ which made hvm_copy_..()
>> to behave differently on the way back. But could be also applied
>> to a case where e.g. an emulator balloons memory to/from the guest in
>> response to MMIO read/write, etc.
> 
> An emulator ballooning memory? I think Andrew was hinting towards
> another vCPU of the guest doing some ballooning work in parallel to
> the insn emulation.

Yes, this is another example of what an emulator might be doing with the P2M
type while handling an IOREQ. It has nothing to do with what Andrew
mentioned.

>> @@ -1089,7 +1092,19 @@ static int linear_read(unsigned long addr, unsigned int bytes, void *p_data,
>>                         uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt)
>>  {
>>      pagefault_info_t pfinfo;
>> -    int rc = hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo);
>> +    struct hvm_vcpu_io *vio = &current->arch.hvm.hvm_io;
>> +    struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(vio, addr,
>> +                                                           IOREQ_READ, false);
> 
> const (and I wonder whether the local variable is worth it in the first
> place)
> 
>> +    int rc = HVMTRANS_bad_gfn_to_mfn;
>> +
>> +    /*
>> +     * If there is an MMIO cache entry for that access then we must be re-issuing
>> +     * an access that was previously handed as MMIO. Thus it is imperative that
> 
> ... handled ... ?
> 
>> +     * we handle this access in the same way to guarantee completion and hence
>> +     * clean up any interim state.
>> +     */
>> +    if ( !cache )
>> +        rc = hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo);
> 
> This is certainly better than the previous version, but imo still
> doesn't address the general case. But I'd be willing to accept
> it as a temporary workaround for the specific case of a write
> to an (aligned) page table entry leading to a page type change
> from p2m_ioreq_server to p2m_ram_rw, as it at least doesn't
> seem to break the case anymore that the introduction of
> linear_{read,write}() was meant to address.

I don't think it's feasible (or makes sense) to specify the P2M type at
that (linear) level. We only have access to the P2M type at the physical
layer, which is several calls down the stack. And I can't currently see
how we could query what the type actually *was* before without
introducing some additional stashing array of p2mt somewhere
(mmio_cache?).
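
Just to illustrate the kind of stashing I mean (a rough sketch only - the
new field and its placement are purely illustrative, and the existing
fields are abbreviated from memory):

    /*
     * Hypothetical: remember the P2M type observed when the access was
     * first issued, so the completion path could be forced down the same
     * route even if the type has changed in the meantime.
     */
    struct hvm_mmio_cache {
        unsigned long gla;
        unsigned int  size;
        uint8_t       dir;
        p2m_type_t    p2mt;      /* new: type seen on first issue */
        uint8_t       buffer[32];
    };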

I agree this is best treated as a temporary workaround - I'll say so in
the commit description of the next version.

> The more general case that still won't work (afaict) is an
> access crossing a page boundary, where the second page's
> type changes behind our backs. The first part of the access
> won't find a cache entry here, and hence would still go the
> hvm_copy_{from,to}_guest_linear() path above.

This could be solved by splitting *all* linear accesses (not only MMIO
ones, as is done now) at the page boundary and checking each page
separately. Would you consider this for v3? It shouldn't introduce any
change in behaviour afaics and is actually more correct imo.
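
Roughly what I have in mind (a sketch only, untested; linear_write()
would get the same treatment):

    static int linear_read(unsigned long addr, unsigned int bytes, void *p_data,
                           uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt)
    {
        unsigned int part1 = PAGE_SIZE - (addr & ~PAGE_MASK);

        /*
         * Split any access crossing a page boundary and handle each page
         * separately, so that a P2M type change on either page is observed
         * consistently when the access is re-issued on the completion path.
         */
        if ( bytes > part1 )
        {
            int rc = linear_read(addr, part1, p_data, pfec, hvmemul_ctxt);

            if ( rc != X86EMUL_OKAY )
                return rc;

            return linear_read(addr + part1, bytes - part1,
                               (uint8_t *)p_data + part1, pfec, hvmemul_ctxt);
        }

        /* ... existing single-page handling as before ... */
    }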

> And then there's the more general issue with this caching using
> (only) guest (linear) addresses as tags: While thinking through
> the implications here, it became pretty obvious that
> hvmemul_phys_mmio_access() doing
> 
>             if ( dir == IOREQ_READ )
>                 memcpy(&buffer[offset], &cache->buffer[offset], chunk);
>             else if ( memcmp(&buffer[offset], &cache->buffer[offset], chunk) != 0 )
>                 domain_crash(current->domain);
> 
> will wrongly crash the domain if a single insn writes the same
> linear address more than once with different data. Such a case
> is pretty simple to construct with AVX512's scatter insns, but
> even without I'm not sure bad scenarios couldn't be found.
> Obviously in such a case your new use of the cached info is
> liable to go wrong, too, albeit possibly in a benign way.
> 
> Hence this new use of caching that needs to go away (read:
> needs to be replaced) is at best sub-optimal, and may further
> complicate that future work.

As agreed above, I don't think this is the ultimate solution, but it
seems good enough for stable-4.12, as we need something to unblock the
XenGT platform update for Intel.
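
For completeness, the kind of insn Jan refers to that would currently
trip the memcmp() check - a single scatter storing different data twice
to the same linear address (purely illustrative, written with intrinsics;
whether it actually goes through emulation of course depends on what the
destination maps to):

    #include <immintrin.h>

    /*
     * Two active elements, both with index 0: the insn writes the same
     * linear address twice with different data, so the second emulated
     * write no longer matches the cached buffer for that address.
     */
    void scatter_same_address(int *dst)
    {
        __m512i   idx  = _mm512_setzero_si512();     /* all indices == 0 */
        __m512i   data = _mm512_set_epi32(15, 14, 13, 12, 11, 10, 9, 8,
                                          7, 6, 5, 4, 3, 2, 1, 0);
        __mmask16 mask = 0x0003;                     /* elements 0 and 1 */

        _mm512_mask_i32scatter_epi32(dst, mask, idx, data, 4);
    }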

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

