
Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0


  • To: G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 27 Dec 2021 20:04:51 +0100
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Mon, 27 Dec 2021 19:05:33 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Sun, Dec 26, 2021 at 02:06:55AM +0800, G.R. wrote:
> > > Thanks. I've raised this on freebsd-net for advice [0]. IMO netfront
> > > shouldn't receive an mbuf that crosses a page boundary, but if that's
> > > indeed a legit mbuf I will figure out the best way to handle it.
> > >
> > > I have a clumsy patch (below) that might solve this, if you want to
> > > give it a try.
> >
> > Applied the patch and it worked like a charm!
> > Thank you so much for your quick help!
> > Wish you a wonderful holiday!
> 
> I may have spoken too soon...
> With the patch I can attach the iSCSI disk and neither the dom0 nor
> the NAS domU complains this time.
> But when I attempt to mount the attached disk it reports I/O errors randomly.
> By randomly I mean different disks behave differently...
> I don't see any error logs from the kernels this time.
> (most of the iSCSI disks are NTFS filesystems mounted through the
> user-mode fuse library)
> But since I have a local backup copy of the image, I can confirm that
> mounting that backup image does not result in any I/O error.
> Looks like something is still broken here...

Indeed. That patch was likely too simple, and didn't properly handle
splitting mbuf data buffers that cross a page boundary.
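
The underlying problem is that a single mbuf data buffer can start at an
arbitrary offset within a page, so its length alone doesn't tell you how
many tx slots (and grant references) it needs. A rough sketch of that
calculation, as a hypothetical helper that is not part of the patch
(assuming <sys/param.h> for howmany() and FreeBSD's PAGE_MASK, i.e.
PAGE_SIZE - 1):

static inline u_int
slots_for_buf(vm_offset_t data, u_int len)
{
	/* Count the pages touched, accounting for the offset of the
	 * data within its first page. */
	return (howmany((data & PAGE_MASK) + len, PAGE_SIZE));
}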

I have another version based on using sglist, which I think is also a
worthwhile change for netfront in its own right. Can you please give it
a try? I've done a very simple test and it seems fine, but you certainly
have more interesting test cases.
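
To illustrate the core of the approach, here is a minimal, simplified
sketch (sketch_map_packet is a hypothetical function, not the patch
itself; the printf only stands in for building the actual tx requests):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <sys/sglist.h>

static int
sketch_map_packet(struct sglist *sg, struct mbuf *m_head)
{
	u_int i;
	int error;

	/* Collapse the mbuf chain into physical segments. */
	sglist_reset(sg);
	error = sglist_append_mbuf(sg, m_head);
	if (error != 0)
		return (error);

	for (i = 0; i < sg->sg_nseg; i++) {
		struct sglist_seg *seg = &sg->sg_segs[i];

		/* One tx request would be built per physical segment,
		 * granting the page of ss_paddr with the offset and
		 * size taken from the segment. */
		printf("seg %u: mfn %#jx offset %#jx len %zu\n", i,
		    (uintmax_t)atop(seg->ss_paddr),
		    (uintmax_t)(seg->ss_paddr & PAGE_MASK), seg->ss_len);
	}

	return (0);
}

In the patch the per-queue sglist is allocated once at setup time with
MAX_TX_REQ_FRAGS entries, so the per-packet mapping shouldn't need any
allocations in the hot path.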

You will have to apply it on top of a clean tree, without any of the
other patches applied.

Thanks, Roger.
---
diff --git a/sys/dev/xen/netfront/netfront.c b/sys/dev/xen/netfront/netfront.c
index 8dba5a8dc6d5..37ea7b1fa059 100644
--- a/sys/dev/xen/netfront/netfront.c
+++ b/sys/dev/xen/netfront/netfront.c
@@ -33,6 +33,8 @@ __FBSDID("$FreeBSD$");
 #include "opt_inet.h"
 #include "opt_inet6.h"
 
+#include <sys/types.h>
+
 #include <sys/param.h>
 #include <sys/sockio.h>
 #include <sys/limits.h>
@@ -40,6 +42,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/malloc.h>
 #include <sys/module.h>
 #include <sys/kernel.h>
+#include <sys/sglist.h>
 #include <sys/socket.h>
 #include <sys/sysctl.h>
 #include <sys/taskqueue.h>
@@ -199,6 +202,12 @@ struct netfront_txq {
        struct taskqueue        *tq;
        struct task             defrtask;
 
+       struct sglist           *segments;
+       struct mbuf_refcount {
+               struct m_tag    tag;
+               u_int           count;
+       }                       refcount_tag[NET_TX_RING_SIZE + 1];
+
        bool                    full;
 };
 
@@ -301,6 +310,38 @@ xn_get_rx_ref(struct netfront_rxq *rxq, RING_IDX ri)
        return (ref);
 }
 
+#define MTAG_REFCOUNT 0
+
+static void mbuf_grab(uint32_t cookie, struct mbuf *m)
+{
+       struct mbuf_refcount *ref;
+
+       ref = (struct mbuf_refcount *)m_tag_locate(m, cookie, MTAG_REFCOUNT,
+           NULL);
+       KASSERT(ref != NULL, ("Cannot find refcount"));
+       ref->count++;
+}
+
+static void mbuf_release(uint32_t cookie, struct mbuf *m)
+{
+       struct mbuf_refcount *ref;
+
+       ref = (struct mbuf_refcount *)m_tag_locate(m, cookie, MTAG_REFCOUNT,
+           NULL);
+       KASSERT(ref != NULL, ("Cannot find refcount"));
+       KASSERT(ref->count > 0, ("Invalid reference count"));
+
+       if(--ref->count == 0)
+               m_freem(m);
+}
+
+static void tag_free(struct m_tag *t)
+{
+       struct mbuf_refcount *ref = (struct mbuf_refcount *)t;
+
+       KASSERT(ref->count == 0, ("Free mbuf tag with pending refcnt"));
+}
+
 #define IPRINTK(fmt, args...) \
     printf("[XEN] " fmt, ##args)
 #ifdef INVARIANTS
@@ -778,7 +819,7 @@ disconnect_txq(struct netfront_txq *txq)
 static void
 destroy_txq(struct netfront_txq *txq)
 {
-
+       sglist_free(txq->segments);
        free(txq->ring.sring, M_DEVBUF);
        buf_ring_free(txq->br, M_DEVBUF);
        taskqueue_drain_all(txq->tq);
@@ -826,6 +867,11 @@ setup_txqs(device_t dev, struct netfront_info *info,
                for (i = 0; i <= NET_TX_RING_SIZE; i++) {
                        txq->mbufs[i] = (void *) ((u_long) i+1);
                        txq->grant_ref[i] = GRANT_REF_INVALID;
+                       m_tag_setup(&txq->refcount_tag[i].tag,
+                           (unsigned long)txq, MTAG_REFCOUNT,
+                           sizeof(txq->refcount_tag[i]) -
+                           sizeof(txq->refcount_tag[i].tag));
+                       txq->refcount_tag[i].tag.m_tag_free = &tag_free;
                }
                txq->mbufs[NET_TX_RING_SIZE] = (void *)0;
 
@@ -874,10 +920,18 @@ setup_txqs(device_t dev, struct netfront_info *info,
                        device_printf(dev, "xen_intr_alloc_and_bind_local_port failed\n");
                        goto fail_bind_port;
                }
+
+               txq->segments = sglist_alloc(MAX_TX_REQ_FRAGS, M_WAITOK);
+               if (txq->segments == NULL) {
+                       device_printf(dev, "failed to allocate sglist\n");
+                       goto fail_sglist;
+               }
        }
 
        return (0);
 
+fail_sglist:
+       xen_intr_unbind(&txq->xen_intr_handle);
 fail_bind_port:
        taskqueue_drain_all(txq->tq);
 fail_start_thread:
@@ -1041,7 +1095,7 @@ xn_release_tx_bufs(struct netfront_txq *txq)
                if (txq->mbufs_cnt < 0) {
                        panic("%s: tx_chain_cnt must be >= 0", __func__);
                }
-               m_free(m);
+               mbuf_release((unsigned long)txq, m);
        }
 }
 
@@ -1311,7 +1365,7 @@ xn_txeof(struct netfront_txq *txq)
                        txq->mbufs[id] = NULL;
                        add_id_to_freelist(txq->mbufs, id);
                        txq->mbufs_cnt--;
-                       m_free(m);
+                       mbuf_release((unsigned long)txq, m);
                        /* Only mark the txq active if we've freed up at least one slot to try */
                        ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
                }
@@ -1507,22 +1561,6 @@ xn_get_responses(struct netfront_rxq *rxq,
        return (err);
 }
 
-/**
- * \brief Count the number of fragments in an mbuf chain.
- *
- * Surprisingly, there isn't an M* macro for this.
- */
-static inline int
-xn_count_frags(struct mbuf *m)
-{
-       int nfrags;
-
-       for (nfrags = 0; m != NULL; m = m->m_next)
-               nfrags++;
-
-       return (nfrags);
-}
-
 /**
  * Given an mbuf chain, make sure we have enough room and then push
  * it onto the transmit ring.
@@ -1530,16 +1568,22 @@ xn_count_frags(struct mbuf *m)
 static int
 xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
 {
-       struct mbuf *m;
        struct netfront_info *np = txq->info;
        struct ifnet *ifp = np->xn_ifp;
-       u_int nfrags;
-       int otherend_id;
+       u_int nfrags, i;
+       int otherend_id, rc;
+
+       sglist_reset(txq->segments);
+       rc = sglist_append_mbuf(txq->segments, m_head);
+       if (rc != 0) {
+               m_freem(m_head);
+               return (rc);
+       }
 
        /**
         * Defragment the mbuf if necessary.
         */
-       nfrags = xn_count_frags(m_head);
+       nfrags = txq->segments->sg_nseg;
 
        /*
         * Check to see whether this request is longer than netback
@@ -1551,6 +1595,8 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
         * the Linux network stack.
         */
        if (nfrags > np->maxfrags) {
+               struct mbuf *m;
+
                m = m_defrag(m_head, M_NOWAIT);
                if (!m) {
                        /*
@@ -1561,11 +1607,15 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
                        return (EMSGSIZE);
                }
                m_head = m;
+               sglist_reset(txq->segments);
+               rc = sglist_append_mbuf(txq->segments, m_head);
+               if (rc != 0) {
+                       m_freem(m_head);
+                       return (rc);
+               }
+               nfrags = txq->segments->sg_nseg;
        }
 
-       /* Determine how many fragments now exist */
-       nfrags = xn_count_frags(m_head);
-
        /*
         * Check to see whether the defragmented packet has too many
         * segments for the Linux netback driver.
@@ -1604,14 +1654,15 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
         * the fragment pointers. Stop when we run out
         * of fragments or hit the end of the mbuf chain.
         */
-       m = m_head;
        otherend_id = xenbus_get_otherend_id(np->xbdev);
-       for (m = m_head; m; m = m->m_next) {
+       for (i = 0; i < nfrags; i++) {
                netif_tx_request_t *tx;
                uintptr_t id;
                grant_ref_t ref;
                u_long mfn; /* XXX Wrong type? */
+               struct sglist_seg *seg;
 
+               seg = &txq->segments->sg_segs[i];
                tx = RING_GET_REQUEST(&txq->ring, txq->ring.req_prod_pvt);
                id = get_id_from_freelist(txq->mbufs);
                if (id == 0)
@@ -1621,17 +1672,22 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
                if (txq->mbufs_cnt > NET_TX_RING_SIZE)
                        panic("%s: tx_chain_cnt must be <= NET_TX_RING_SIZE\n",
                            __func__);
-               txq->mbufs[id] = m;
+               if (i == 0)
+                       m_tag_prepend(m_head, &txq->refcount_tag[id].tag);
+               mbuf_grab((unsigned long)txq, m_head);
+               txq->mbufs[id] = m_head;
                tx->id = id;
                ref = gnttab_claim_grant_reference(&txq->gref_head);
                KASSERT((short)ref >= 0, ("Negative ref"));
-               mfn = virt_to_mfn(mtod(m, vm_offset_t));
+               mfn = atop(seg->ss_paddr);
                gnttab_grant_foreign_access_ref(ref, otherend_id,
                    mfn, GNTMAP_readonly);
                tx->gref = txq->grant_ref[id] = ref;
-               tx->offset = mtod(m, vm_offset_t) & (PAGE_SIZE - 1);
+               tx->offset = seg->ss_paddr & PAGE_MASK;
+               KASSERT(tx->offset + seg->ss_len <= PAGE_SIZE,
+                   ("mbuf segment crosses a page boundary"));
                tx->flags = 0;
-               if (m == m_head) {
+               if (i == 0) {
                        /*
                         * The first fragment has the entire packet
                         * size, subsequent fragments have just the
@@ -1640,7 +1696,7 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
                         * subtracting the sizes of the other
                         * fragments.
                         */
-                       tx->size = m->m_pkthdr.len;
+                       tx->size = m_head->m_pkthdr.len;
 
                        /*
                         * The first fragment contains the checksum flags
@@ -1654,12 +1710,12 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
                         * so we have to test for CSUM_TSO
                         * explicitly.
                         */
-                       if (m->m_pkthdr.csum_flags
+                       if (m_head->m_pkthdr.csum_flags
                            & (CSUM_DELAY_DATA | CSUM_TSO)) {
                                tx->flags |= (NETTXF_csum_blank
                                    | NETTXF_data_validated);
                        }
-                       if (m->m_pkthdr.csum_flags & CSUM_TSO) {
+                       if (m_head->m_pkthdr.csum_flags & CSUM_TSO) {
                                struct netif_extra_info *gso =
                                        (struct netif_extra_info *)
                                        RING_GET_REQUEST(&txq->ring,
@@ -1667,7 +1723,7 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
 
                                tx->flags |= NETTXF_extra_info;
 
-                               gso->u.gso.size = m->m_pkthdr.tso_segsz;
+                               gso->u.gso.size = m_head->m_pkthdr.tso_segsz;
                                gso->u.gso.type =
                                        XEN_NETIF_GSO_TYPE_TCPV4;
                                gso->u.gso.pad = 0;
@@ -1677,9 +1733,9 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
                                gso->flags = 0;
                        }
                } else {
-                       tx->size = m->m_len;
+                       tx->size = seg->ss_len;
                }
-               if (m->m_next)
+               if (i != nfrags - 1)
                        tx->flags |= NETTXF_more_data;
 
                txq->ring.req_prod_pvt++;




 

