
Re: [Xen-devel] [PATCH v2] x86/hvm: finish IOREQ correctly on completion path



>>> On 12.03.19 at 17:23, <igor.druzhinin@xxxxxxxxxx> wrote:
> Since the introduction of linear_{read,write}() helpers in 3bdec530a5
> (x86/HVM: split page straddling emulated accesses in more cases) the
> completion path for IOREQs has been broken: if there is an IOREQ in
> progress but hvm_copy_{to,from}_guest_linear() returns HVMTRANS_okay
> (e.g. when the P2M type of the source/destination has been changed by the
> IOREQ handler) execution will never re-enter hvmemul_do_io() where
> IOREQs are completed. This usually results in a domain crash upon
> the execution of the next IOREQ entering hvmemul_do_io() and finding
> the remnants of the previous IOREQ in the state machine.
> 
> This particular issue has been discovered in relation to the p2m_ioreq_server
> type, where an emulator changed the memory type between p2m_ioreq_server
> and p2m_ram_rw in the process of responding to an IOREQ, which made
> hvm_copy_..() behave differently on the way back. But it could also apply
> to a case where e.g. an emulator balloons memory to/from the guest in
> response to an MMIO read/write, etc.

An emulator ballooning memory? I think Andrew was hinting at another
vCPU of the guest doing some ballooning work in parallel to the insn
emulation.
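
To recap for other readers, the flow that's broken here is roughly the
following (heavily abridged from the post-3bdec530a5
xen/arch/x86/hvm/emulate.c; the remaining HVMTRANS_* cases and the exact
arguments of the MMIO helper are omitted):

    static int linear_read(unsigned long addr, unsigned int bytes, void *p_data,
                           uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt)
    {
        pagefault_info_t pfinfo;
        int rc = hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo);

        switch ( rc )
        {
        case HVMTRANS_okay:
            /*
             * Problematic exit on the completion pass: if the P2M type
             * changed while the IOREQ was being handled, the copy now
             * succeeds and we return here without ever reaching
             * hvmemul_do_io(), leaving the pending IOREQ state behind.
             */
            return X86EMUL_OKAY;

        case HVMTRANS_bad_gfn_to_mfn:
            /*
             * Only this path leads back into hvmemul_do_io() and hence can
             * complete a pending IOREQ.
             */
            return hvmemul_linear_mmio_read(addr, bytes, p_data, pfec,
                                            hvmemul_ctxt, false);

        /* ... remaining HVMTRANS_* cases elided ... */
        }

        return X86EMUL_UNHANDLEABLE;
    }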

> @@ -1089,7 +1092,19 @@ static int linear_read(unsigned long addr, unsigned int bytes, void *p_data,
>                         uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt)
>  {
>      pagefault_info_t pfinfo;
> -    int rc = hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo);
> +    struct hvm_vcpu_io *vio = &current->arch.hvm.hvm_io;
> +    struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(vio, addr,
> +                                                           IOREQ_READ, false);

const (and I wonder whether the local variable is worth it in the first
place)
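
I.e. something along the lines of (just a sketch, assuming the result
isn't needed elsewhere in the function):

    const struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(vio, addr,
                                                                 IOREQ_READ,
                                                                 false);

or, dropping the local altogether:

    if ( !hvmemul_find_mmio_cache(vio, addr, IOREQ_READ, false) )
        rc = hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo);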

> +    int rc = HVMTRANS_bad_gfn_to_mfn;
> +
> +    /*
> +     * If there is an MMIO cache entry for that access then we must be re-issuing
> +     * an access that was previously handed as MMIO. Thus it is imperative that

... handled ... ?

> +     * we handle this access in the same way to guarantee completion and hence
> +     * clean up any interim state.
> +     */
> +    if ( !cache )
> +        rc = hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo);

This is certainly better than the previous version, but imo it still
doesn't address the general case. I'd be willing to accept it as a
temporary workaround for the specific case of a write to an (aligned)
page table entry leading to a page type change from p2m_ioreq_server to
p2m_ram_rw, as it at least no longer seems to break the case that the
introduction of linear_{read,write}() was meant to address.

The more general case that still won't work (afaict) is an access
crossing a page boundary, where the second page's type changes behind
our backs. The first part of the access won't find a cache entry here
(the lookup is keyed on the start of the access), and hence the whole
access would still take the hvm_copy_{from,to}_guest_linear() path
above.
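
For reference, the splitting 3bdec530a5 added on the
HVMTRANS_bad_gfn_to_mfn path looks roughly like this (abridged; the read
side shown, the write side is analogous):

    offset = addr & ~PAGE_MASK;
    if ( offset + bytes <= PAGE_SIZE )
        return hvmemul_linear_mmio_read(addr, bytes, p_data, pfec,
                                        hvmemul_ctxt, false);

    /* Split the access at the page boundary. */
    part1 = PAGE_SIZE - offset;
    rc = linear_read(addr, part1, p_data, pfec, hvmemul_ctxt);
    if ( rc == X86EMUL_OKAY )
        rc = linear_read(addr + part1, bytes - part1, p_data + part1,
                         pfec, hvmemul_ctxt);
    return rc;

On the retry the top-level hvm_copy_from_guest_linear() now succeeds for
the whole range, so this splitting (and with it the recursive call whose
address would match the cache entry recorded for the second half) isn't
reached at all.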

And then there's the more general issue with this caching using
(only) guest (linear) addresses as tags: While thinking through
the implications here, it became pretty obvious that
hvmemul_phys_mmio_access() doing

            if ( dir == IOREQ_READ )
                memcpy(&buffer[offset], &cache->buffer[offset], chunk);
            else if ( memcmp(&buffer[offset], &cache->buffer[offset], chunk) != 0 )
                domain_crash(current->domain);

will wrongly crash the domain if a single insn writes the same
linear address more than once with different data. Such a case
is pretty simple to construct with AVX-512's scatter insns, but even
without them I'm not sure bad scenarios couldn't be found.
Obviously in such a case your new use of the cached info is
liable to go wrong, too, albeit possibly in a benign way.
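
To spell the scatter case out (hypothetical element ordering, purely for
illustration):

    /*
     * Single scatter insn, two elements targeting the same linear
     * address G:
     *
     *   pass 1, element 0: write G with data A
     *       -> MMIO path, cache entry for G records A, IOREQ sent, RETRY
     *   pass 2, element 0: write G with data A again
     *       -> memcmp(A, A) == 0, replay is fine, IOREQ completed
     *   pass 2, element 1: write G with data B
     *       -> same cache entry (tagged by G), memcmp(B, A) != 0
     *       -> domain_crash()
     */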

Hence this new use of the caching, which needs to go away (read: needs
to be replaced) anyway, is at best sub-optimal, and may further
complicate that future work.

Jan


