
Re: [Xen-devel] [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.





On 3/10/2017 11:29 PM, Jan Beulich wrote:
On 08.03.17 at 16:33, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
changes in v7:
   - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server.
   - According to comments from George: removed domain_pause/unpause() in
     hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
     and we can avoid the:
     a> deadlock between p2m lock and ioreq server lock by using these locks
        in the same order - solved in patch 4;
That is, until patch 4 there is deadlock potential? I think you want to
re-order the patches if so. Or was it that the type can't really be used
until the last patch of the series? (I'm sorry, it's been quite a while
since the previous version.)

Oh, there's no deadlock potential in this version of the patch set. But in v6 there was, and I used domain_pause/unpause() to avoid it. Later on, I realized that if the different locks are always taken in the same order, the deadlock potential can be avoided and we do not need domain_pause/unpause() in this version.
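
Just to spell out the rule for the archives - the fragment below is not Xen code, only a generic C illustration of the lock ordering discipline I mean (the lock names and the use of pthread mutexes are purely illustrative):

    #include <pthread.h>

    /*
     * Illustrative only, not Xen code: lock_p2m and lock_ioreq stand in
     * for the real p2m and ioreq server locks.  As long as every path
     * that needs both locks takes them in this same order, no AB/BA
     * deadlock can occur.
     */
    static pthread_mutex_t lock_p2m   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_ioreq = PTHREAD_MUTEX_INITIALIZER;

    static void map_type_to_server(void)       /* hypothetical path #1 */
    {
        pthread_mutex_lock(&lock_p2m);         /* 1st: p2m lock   */
        pthread_mutex_lock(&lock_ioreq);       /* 2nd: ioreq lock */
        /* ... bind/unbind the server, update the flags ... */
        pthread_mutex_unlock(&lock_ioreq);
        pthread_mutex_unlock(&lock_p2m);
    }

    static void handle_guest_fault(void)       /* hypothetical path #2 */
    {
        pthread_mutex_lock(&lock_p2m);         /* same order as above */
        pthread_mutex_lock(&lock_ioreq);
        /* ... look up the server bound to p2m_ioreq_server ... */
        pthread_mutex_unlock(&lock_ioreq);
        pthread_mutex_unlock(&lock_p2m);
    }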

@@ -365,6 +383,24 @@ static int dm_op(domid_t domid,
          break;
      }
+ case XEN_DMOP_map_mem_type_to_ioreq_server:
+    {
+        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
+            &op.u.map_mem_type_to_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        /* Only support for HAP enabled hvm. */
+        if ( !hap_enabled(d) )
+            break;
Perhaps better to give an error other than -EINVAL in this case?
If so, then the same error should likely also be used in your
set_mem_type() addition.
How about -ENOTSUP?
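
I.e. something like this in dm_op() (just a sketch of the idea; -ENOTSUP is only my suggestion, any other agreed-on errno value would do equally well):

        rc = -EINVAL;
        if ( data->pad )
            break;

        /* Only support for HAP enabled hvm. */
        rc = -ENOTSUP; /* or -EOPNOTSUPP, whichever we settle on */
        if ( !hap_enabled(d) )
            break;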

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -99,6 +99,7 @@ static int hvmemul_do_io(
      uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
  {
      struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
      ioreq_t p = {
          .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
@@ -140,7 +141,7 @@ static int hvmemul_do_io(
               (p.dir != dir) ||
               (p.df != df) ||
               (p.data_is_ptr != data_is_addr) )
-            domain_crash(curr->domain);
+            domain_crash(currd);
If you mean to do this transformation here, then please do so
consistently for the entire function.

OK. Thanks.

@@ -177,8 +178,65 @@ static int hvmemul_do_io(
          break;
      case X86EMUL_UNHANDLEABLE:
      {
-        struct hvm_ioreq_server *s =
-            hvm_select_ioreq_server(curr->domain, &p);
+        /*
+         * Xen isn't emulating the instruction internally, so see if
+         * there's an ioreq server that can handle it. Rules:
+         *
+         * - PIO and "normal" mmio run through hvm_select_ioreq_server()
+         * to choose the ioreq server by range. If no server is found,
+         * the access is ignored.
+         *
+         * - p2m_ioreq_server accesses are handled by the current
+         * ioreq_server for the domain, but there are some corner
+         * cases:
Who or what is "the current ioreq_server for the domain"?
It means "the current ioreq_server which maps the p2m_ioreq_server type for this domain"... I'd like to use a succinct phrase, but now seems not accurate enough. Any preference?

+         *   - If the domain ioreq_server is NULL, assume this is a
+         *   race between the unbinding of ioreq server and guest fault
+         *   so re-try the instruction.
+         *
+         *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
+         *   like a normal PIO or MMIO that doesn't have an ioreq
+         *   server (i.e., by ignoring it).
+         */
+        struct hvm_ioreq_server *s = NULL;
+        p2m_type_t p2mt = p2m_invalid;
+
+        if ( is_mmio )
+        {
+            unsigned long gmfn = paddr_to_pfn(addr);
+
+            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
Stray cast.

OK. Will remove it.


+            if ( p2mt == p2m_ioreq_server )
+            {
+                unsigned int flags;
+
+                s = p2m_get_ioreq_server(currd, &flags);
+
+                /*
+                 * If p2mt is ioreq_server but ioreq_server is NULL,
+                 * we probably lost a race with unbinding of ioreq
+                 * server, just retry the access.
+                 */
+                if ( s == NULL )
+                {
+                    rc = X86EMUL_RETRY;
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
+                /*
+                 * If the IOREQ_MEM_ACCESS_WRITE flag is not set,
+                 * we should set s to NULL, and just ignore such
+                 * access.
+                 */
+                if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
+                    s = NULL;
What is this about? You only allow WRITE registrations, so this looks
to be dead code. Yet if it is meant to guard against future enabling
of READ, then this clearly should not be done for reads.

It's to guard against future emulation of READ. We can remove it for now.
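
The p2m_ioreq_server branch would then shrink to roughly the following (only a sketch of the intended change, not the final code):

            if ( p2mt == p2m_ioreq_server )
            {
                unsigned int flags;

                s = p2m_get_ioreq_server(currd, &flags);

                /*
                 * If the type is p2m_ioreq_server but no server is
                 * currently bound, we probably lost a race with the
                 * unbinding of the ioreq server; just retry the access.
                 */
                if ( s == NULL )
                {
                    rc = X86EMUL_RETRY;
                    vio->io_req.state = STATE_IOREQ_NONE;
                    break;
                }
            }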

--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
              entry->r = entry->w = entry->x = 1;
              entry->a = entry->d = !!cpu_has_vmx_ept_ad;
              break;
+        case p2m_ioreq_server:
+            entry->r = 1;
+            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
Along the lines of the previous comment - if you mean to have the
code cope with READ, please also set ->r accordingly, or add a
comment why this won't have the intended effect (yielding a not
present EPTE).

How about we keep this code and do not support READ? I'll remove the dead code above in hvmemul_do_io().

@@ -92,8 +94,13 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
      default:
          return flags | _PAGE_NX_BIT;
      case p2m_grant_map_ro:
-    case p2m_ioreq_server:
          return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
+    case p2m_ioreq_server:
+        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
+        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
+            return flags & ~_PAGE_RW;
+        else
+            return flags;
Stray else. But even better would imo be

         if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
             flags &= ~_PAGE_RW;
         return flags;
Oh. Thanks. :)

+struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                              unsigned int *flags)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&p2m->ioreq.lock);
+
+    s = p2m->ioreq.server;
+    *flags = p2m->ioreq.flags;
+
+    spin_unlock(&p2m->ioreq.lock);
+    return s;
+}
I'm afraid this question was asked before, but since there's no
comment here or anywhere, I can't recall if there was a reason why
s potentially being stale by the time the caller looks at it is not a
problem.

Well, it is possible that s is stale. I did not take that as a problem, because the device model will later discard such an io request. And I believe the current hvm_select_ioreq_server() has the same issue - the returned s should also be considered stale if the MMIO/PIO address is removed from the ioreq server's rangeset.

Another thought: if you think it is inappropriate for the device model to do the check, we can use spin_lock_recursive on ioreq_server.lock to protect the whole ioreq server selection, and release the lock only after the ioreq has been sent out.
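
Roughly like the sketch below (names as in this patch; here I show a plain spin_lock on p2m->ioreq.lock just to illustrate keeping the binding protected until the request has been sent - the real change would use spin_lock_recursive on the ioreq server lock as said, and the locking details would still need to be worked out):

    struct p2m_domain *p2m = p2m_get_hostp2m(currd);
    struct hvm_ioreq_server *s;

    spin_lock(&p2m->ioreq.lock);

    s = p2m->ioreq.server;
    if ( s != NULL )
        rc = hvm_send_ioreq(s, &p, 0); /* send while the binding is held */
    else
        rc = X86EMUL_RETRY;

    spin_unlock(&p2m->ioreq.lock);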

--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -318,6 +318,30 @@ struct xen_dm_op_inject_msi {
      uint64_aligned_t addr;
  };
+/*
+ * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
+ *                                      to a specific memory type <type>
+ *                                      for specific accesses <flags>
+ *
+ * For now, flags only accepts the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
+ * which means only write operations are to be forwarded to an ioreq server.
+ * Support for the emulation of read operations can be added when an ioreq
+ * server has such a requirement in the future.
+ */
+#define XEN_DMOP_map_mem_type_to_ioreq_server 15
+
+struct xen_dm_op_map_mem_type_to_ioreq_server {
+    ioservid_t id;      /* IN - ioreq server id */
+    uint16_t type;      /* IN - memory type */
+    uint16_t pad;
Perhaps there was padding needed when this was a hvmop, but
now the padding does exactly the wrong thing.

Right, the padding is useless in this new structure. I will remove it if the other fields do not change (I proposed a change to the flags field in my reply to Andrew's comments on patch 5). Thank you, Jan.
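
I.e. the structure would become something like this (just a sketch; the flags field below is taken from the rest of this patch and may still change as part of the patch 5 discussion):

    struct xen_dm_op_map_mem_type_to_ioreq_server {
        ioservid_t id;      /* IN - ioreq server id */
        uint16_t type;      /* IN - memory type */
        uint32_t flags;     /* IN - types of accesses to be forwarded to the
                               ioreq server, e.g. XEN_DMOP_IOREQ_MEM_ACCESS_WRITE */
    };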

Yu

Jan


