
Re: [Xen-devel] [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: 19 March 2018 15:12
> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Cc: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>; Wei Liu
> <wei.liu2@xxxxxxxxxx>; George Dunlap <George.Dunlap@xxxxxxxxxx>; Ian
> Jackson <Ian.Jackson@xxxxxxxxxx>; Stefano Stabellini
> <sstabellini@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx; Konrad Rzeszutek
> Wilk <konrad.wilk@xxxxxxxxxx>; Tim (Xen.org) <tim@xxxxxxx>
> Subject: Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and
> also to flush the IOTLB
> 
> >>> On 12.02.18 at 11:47, <paul.durrant@xxxxxxxxxx> wrote:
> > This patch adds iommu_ops to allow a domain with control_iommu privilege
> > to map and unmap pages from any guest over which it has mapping privilege
> > in the IOMMU.
> > These operations implicitly disable IOTLB flushing so that the caller can
> > batch operations and then explicitly flush the IOTLB using the iommu_op
> > also added by this patch.
> 
> Can't this be abused for unmaps?

Hmm. I think we're ok. The calls just set and clear the CPU-local flush-disable 
flag, so they should only suppress flushes resulting from the current 
hypercall. Manipulation of other domains' IOMMU page tables should not 
be affected AFAICT. I'll double check though.
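
To illustrate: the dispatcher (see further down) brackets only the individual 
op, and the flag is per-CPU. Roughly (the consumer side here is my sketch of 
the VT-d map path, and the exact flush call is illustrative):

    /* In iommu_op(): the per-CPU flag is set only around this op */
    this_cpu(iommu_dont_flush_iotlb) = true;
    op->status = iommuop_map(&op->u.map, op->flags);
    this_cpu(iommu_dont_flush_iotlb) = false;

    /* In a map handler: skip only this CPU's implicit flush */
    if ( !this_cpu(iommu_dont_flush_iotlb) )
        rc = iommu_iotlb_flush(d, bfn_x(bfn), 1);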

> 
> > --- a/xen/arch/x86/iommu_op.c
> > +++ b/xen/arch/x86/iommu_op.c
> > @@ -24,6 +24,174 @@
> >  #include <xen/hypercall.h>
> >  #include <xen/iommu.h>
> >
> > +/* Override macros from asm/page.h to make them work with mfn_t */
> > +#undef mfn_to_page
> > +#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
> > +#undef page_to_mfn
> > +#define page_to_mfn(page) _mfn(__page_to_mfn(page))
> 
> I guess with Julien's this needs to go away, but it looks like his
> series hasn't landed yet.
> 

Yes, I'll remove this once that happens.

> > +struct check_rdm_ctxt {
> > +    bfn_t bfn;
> > +};
> > +
> > +static int check_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
> 
> uint32_t
> 

Yep.

> > +{
> > +    struct check_rdm_ctxt *ctxt = arg;
> > +
> > +    if ( bfn_x(ctxt->bfn) >= start &&
> > +         bfn_x(ctxt->bfn) < start + nr )
> > +        return -EINVAL;
> 
> Something more distinguishable than EINVAL would certainly be
> nice here. Also how come this check does not depend on the
> domain? Only RMRRs of devices owned by a domain are relevant
> in the BFN range (unless I still didn't fully understand how BFN is
> meant to be different from GFN and MFN).
> 

I thought that the reserved range check was only for the current domain's 
mappings (optionally limited to a single initiator), but I could be wrong. I'll 
check.
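
If it does need to be per-domain, I'd extend the context along these lines 
(sketch only; device_assigned_to() is a hypothetical helper, and I'm assuming 
the callback's 'id' identifies the device):

    struct check_rdm_ctxt {
        bfn_t bfn;
        const struct domain *d;    /* only this domain's RMRRs matter */
    };

    static int check_rdm(xen_pfn_t start, xen_ulong_t nr, uint32_t id,
                         void *arg)
    {
        struct check_rdm_ctxt *ctxt = arg;

        /* Skip regions of devices the domain does not own (hypothetical
         * helper) */
        if ( !device_assigned_to(ctxt->d, id) )
            return 0;

        if ( bfn_x(ctxt->bfn) >= start &&
             bfn_x(ctxt->bfn) < start + nr )
            return -EINVAL;

        return 0;
    }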

> > +static int iommuop_map(struct xen_iommu_op_map *op, unsigned int flags)
> > +{
> > +    struct domain *d, *od, *currd = current->domain;
> > +    struct domain_iommu *iommu = dom_iommu(currd);
> > +    const struct iommu_ops *ops = iommu->platform_ops;
> > +    domid_t domid = op->domid;
> > +    gfn_t gfn = _gfn(op->gfn);
> > +    bfn_t bfn = _bfn(op->bfn);
> > +    mfn_t mfn;
> > +    struct check_rdm_ctxt ctxt = {
> > +        .bfn = bfn,
> > +    };
> > +    p2m_type_t p2mt;
> > +    p2m_query_t p2mq;
> > +    struct page_info *page;
> > +    unsigned int prot;
> > +    int rc;
> > +
> > +    if (op->pad0 != 0 || op->pad1 != 0)
> 
> Missing blanks again (and please again consider dropping the " != 0").
> 
> > +        return -EINVAL;
> > +
> > +    /*
> > +     * Both map_page and lookup_page operations must be implemented.
> > +     * The lookup_page method is not used here but is relied upon by
> > +     * iommuop_unmap() to drop the page reference taken here.
> > +     */
> > +    if ( !ops->map_page || !ops->lookup_page )
> > +        return -ENOSYS;
> 
> EOPNOTSUPP (also further down)
> 

I wanted the 'not implemented' case to be distinct from the 'not supported 
because of some configuration detail' case, which is why I chose ENOSYS. I'll 
change it if you don't think that matters though.

> Also how about the unmap hook? If that's not implemented, how
> would the page ref obtained below ever be dropped again? Or
> you may need to re-order the unmap side code.

Ok. I'll just check for all of map, unmap and lookup in both cases.
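
I.e. both entry points will gain the same guard (sketch; errno value per the 
discussion above):

    if ( !ops->map_page || !ops->unmap_page || !ops->lookup_page )
        return -EOPNOTSUPP;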

> 
> > +    /* Check whether the specified BFN falls in a reserved region */
> > +    rc = iommu_get_reserved_device_memory(check_rdm, &ctxt);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    d = rcu_lock_domain_by_any_id(domid);
> > +    if ( !d )
> > +        return -ESRCH;
> > +
> > +    p2mq = (flags & XEN_IOMMUOP_map_readonly) ?
> > +        P2M_UNSHARE : P2M_ALLOC;
> 
> Isn't this the wrong way round?
> 

I don't think so. If we're doing a readonly mapping then the page should not be 
forcibly populated, right?

> > +    page = get_page_from_gfn(d, gfn_x(gfn), &p2mt, p2mq);
> > +
> > +    rc = -ENOENT;
> > +    if ( !page )
> > +        goto unlock;
> > +
> > +    if ( p2m_is_paged(p2mt) )
> > +    {
> > +        p2m_mem_paging_populate(d, gfn_x(gfn));
> > +        goto release;
> > +    }
> > +
> > +    if ( (p2mq & P2M_UNSHARE) && p2m_is_shared(p2mt) )
> > +        goto release;
> 
> Same for this check then?
> 

I'm confused.

> > +    /*
> > +     * Make sure the page is RAM and, if it is read-only, that the
> > +     * read-only flag is present.
> > +     */
> > +    rc = -EPERM;
> > +    if ( !p2m_is_any_ram(p2mt) ||
> > +         (p2m_is_readonly(p2mt) && !(flags & XEN_IOMMUOP_map_readonly)) )
> > +        goto release;
> 
> Don't you also need to obtain a PGT_writable reference in the
> "not r/o" case?
> 

I'll check the logic again.
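
If a writable type ref does turn out to be needed, I'd expect something like 
this (sketch; the release path would then need to drop the type ref too):

    /* For r/w mappings, take a writable type reference as well */
    if ( !(flags & XEN_IOMMUOP_map_readonly) &&
         !get_page_type(page, PGT_writable_page) )
    {
        rc = -EPERM;
        goto release;
    }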

> > +    /*
> > +     * If the calling domain does not own the page then make sure it
> > +     * has mapping privilege over the page owner.
> > +     */
> > +    od = page_get_owner(page);
> > +    if ( od != currd )
> > +    {
> > +        rc = xsm_domain_memory_map(XSM_TARGET, od);
> > +        if ( rc )
> > +            goto release;
> > +    }
> 
> With XSM_TARGET I don't see the point of the if() around here.
> Perhaps simply
> 
>         rc = xsm_domain_memory_map(XSM_TARGET, page_get_owner(page));
> 
> ?

I wasn't sure the XSM check was valid when the current domain is itself the 
page owner.

> 
> > +static int iommuop_unmap(struct xen_iommu_op_unmap *op)
> > +{
> > +    struct domain *currd = current->domain;
> > +    struct domain_iommu *iommu = dom_iommu(currd);
> > +    const struct iommu_ops *ops = iommu->platform_ops;
> > +    bfn_t bfn = _bfn(op->bfn);
> > +    mfn_t mfn;
> > +    struct check_rdm_ctxt ctxt = {
> > +        .bfn = bfn,
> > +    };
> > +    unsigned int flags;
> > +    struct page_info *page;
> > +    int rc;
> > +
> > +    /*
> > +     * Both unmap_page and lookup_page operations must be implemented.
> > +     */
> 
> Single line comment (there are more below).
> 

Ok.

> > +    if ( !ops->unmap_page || !ops->lookup_page )
> > +        return -ENOSYS;
> > +
> > +    /* Check whether the specified BFN falls in a reserved region */
> > +    rc = iommu_get_reserved_device_memory(check_rdm, &ctxt);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( ops->lookup_page(currd, bfn, &mfn, &flags) ||
> > +         !mfn_valid(mfn) )
> > +        return -ENOENT;
> > +
> > +    page = mfn_to_page(mfn);
> > +
> > +    if ( ops->unmap_page(currd, bfn) )
> > +        return -EIO;
> 
> How are you making sure this is a mapping that was established via
> the map op? Without that this can be (ab)used to ...
> 
> > +    put_page(page);
> 
> ... underflow the refcount of a page.
> 

Yes, I guess I need to ensure that only non-RAM areas (i.e. RMRR and E820 
reserved regions) get mapped in the IOMMU by default, or this could indeed be 
abused.
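
I.e. if the default (start-of-day) mappings never cover RAM, any BFN whose 
lookup yields a valid RAM MFN must have come from iommuop_map(), so the 
put_page() is balanced. A belt-and-braces guard might look like (sketch):

    /* Only mappings created by iommuop_map() reference valid, owned
     * RAM pages; refuse to unmap anything else */
    if ( !mfn_valid(mfn) || !page_get_owner(mfn_to_page(mfn)) )
        return -ENOENT;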

> > +    return 0;
> 
> Blank line above here please.
> 

Ok.

> > @@ -101,6 +269,22 @@ static void iommu_op(xen_iommu_op_t *op)
> >          op->status = iommuop_query_reserved(&op->u.query_reserved);
> >          break;
> >
> > +    case XEN_IOMMUOP_map:
> > +        this_cpu(iommu_dont_flush_iotlb) = 1;
> > +        op->status = iommuop_map(&op->u.map, op->flags);
> > +        this_cpu(iommu_dont_flush_iotlb) = 0;
> 
> true/false would be better in new code, even if the type of the
> variable is still bool_t.

Ok.

> 
> > --- a/xen/include/public/iommu_op.h
> > +++ b/xen/include/public/iommu_op.h
> > @@ -57,13 +57,50 @@ struct xen_iommu_op_query_reserved {
> >      XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
> >  };
> >
> > +/*
> > + * XEN_IOMMUOP_map: Map a page in the IOMMU.
> > + */
> > +#define XEN_IOMMUOP_map 2
> > +
> > +struct xen_iommu_op_map {
> > +    /* IN - The IOMMU frame number which will hold the new mapping */
> > +    xen_bfn_t bfn;
> > +    /* IN - The guest frame number of the page to be mapped */
> > +    xen_pfn_t gfn;
> > +    /* IN - The domid of the guest */
> 
> "... owning the page"
> 

Not necessarily. If the page has been grant-mapped or foreign-mapped by the 
domain then I need this to work.

> > +    domid_t domid;
> > +    unsigned short pad0;
> > +    unsigned int pad1;
> > +};
> 
> No built in batching here? Also fixed width types again please.
> 

It's a multi-op hypercall so the batching is done at that level.
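
E.g. a caller would batch like this (sketch, assuming the buffer-plus-count 
interface established earlier in the series; HYPERVISOR_iommu_op is the 
guest-side wrapper name I'm assuming):

    xen_iommu_op_t ops[3] = {
        { .op = XEN_IOMMUOP_map,
          .u.map = { .bfn = bfn0, .gfn = gfn0, .domid = domid } },
        { .op = XEN_IOMMUOP_map,
          .u.map = { .bfn = bfn1, .gfn = gfn1, .domid = domid } },
        { .op = XEN_IOMMUOP_flush }, /* one flush covers both maps */
    };

    rc = HYPERVISOR_iommu_op(ops, ARRAY_SIZE(ops));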

> > +/*
> > + * XEN_IOMMUOP_flush: Flush the IOMMU TLB.
> > + */
> > +#define XEN_IOMMUOP_flush 4
> 
> No inputs here at all makes this a rather simple interface, but makes
> single-page updates quite expensive.
> 

Ok. I guess I could make the flush more specific even if the underlying 
implementation doesn't support that granularity.
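
Something along these lines, perhaps (sketch; field names are provisional):

    struct xen_iommu_op_flush {
        /* IN - first BFN to flush and number of frames; a flag (or
         * nr == 0) could select a full flush */
        xen_bfn_t bfn;
        xen_ulong_t nr;
    };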

> >  struct xen_iommu_op {
> >      uint16_t op;
> >      uint16_t flags; /* op specific flags */
> > +
> > +#define _XEN_IOMMUOP_map_readonly 0
> > +#define XEN_IOMMUOP_map_readonly (1 << (_XEN_IOMMUOP_map_readonly))
> 
> Perhaps better have this next to the map op?
> 

Ok.

  Paul

> Jan
