
Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0


  • To: G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 4 Jan 2022 11:25:03 +0100
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 04 Jan 2022 10:25:44 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Dec 31, 2021 at 10:47:57PM +0800, G.R. wrote:
> On Fri, Dec 31, 2021 at 2:52 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >
> > On Thu, Dec 30, 2021 at 11:12:57PM +0800, G.R. wrote:
> > > On Thu, Dec 30, 2021 at 3:07 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Dec 29, 2021 at 11:27:50AM +0100, Roger Pau Monné wrote:
> > > > > On Wed, Dec 29, 2021 at 05:13:00PM +0800, G.R. wrote:
> > > > > > >
> > > > > > > I think this is hitting a KASSERT, could you paste the text printed as
> > > > > > > part of the panic (not just the backtrace)?
> > > > > > >
> > > > > > > Sorry this is taking a bit of time to solve.
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > Sorry that I didn't make it clear in the first place.
> > > > > > It is the same cross boundary assertion.
> > > > >
> > > > > I see. After looking at the code it seems like sglist will coalesce
> > > > > contiguous physical ranges without taking page boundaries into
> > > > > account, which is not suitable for our purpose here. I guess I will
> > > > > either have to modify sglist, or switch to using bus_dma. The main
> > > > > problem with using bus_dma is that it will require bigger changes to
> > > > > netfront I think.
> > > >
> > > > I have a crappy patch to use bus_dma. It's not yet ready for upstream
> > > > but you might want to give it a try to see if it solves the cross page
> > > > boundary issues.
> > > >
> > > I think this version is better.
> >
> > Thanks for all the testing.
> >
> > > It fixed the mbuf cross boundary issue and allowed me to boot from one
> > > disk image successfully.
> >
> > It's good to know it seems to handle splitting mbuf fragments at page
> > boundaries correctly.
> >
> > > But it seems this patch is not stable enough yet and has its own
> > > issue: memory is not properly released?
> >
> > I know. I've been working on improving it this morning and I'm
> > attaching an updated version below.
> >
> Good news.
> With this new patch, the NAS domU can serve iSCSI disks without an OOM
> panic, at least for a little while.
> I'm going to keep it up and running for a while to see if it's stable over
> time.

Thanks again for all the testing. Do you see any difference
performance-wise?

> BTW, an unrelated question:
> What's the current status of HVM domU on top of storage driver domain?
> About 7 years ago, one user on the list was able to get this setup up
> and running with your help (patch).[1]
> When I attempted to reproduce a similar setup two years later, I
> discovered that the patch was not submitted.
> And even with that patch the setup cannot be reproduced successfully.
> We spent some time debugging the problem together[2], but didn't
> find the root cause at that time.
> In case it's still broken and you still have the interest and time, I
> can launch a separate thread on this topic and provide required
> testing environment.

Yes, better as a new thread please.

FWIW, I haven't looked at this in a long time, but I recall some
fixes that were needed to use driver domains with HVM guests, which
require attaching the disk to dom0 so that the device model (QEMU)
can access it.

I would give it a try without using stubdomains and see what you get.
You will need to run `xl devd` inside of the driver domain, so you
will need to install xen-tools on the domU. There's an init script to
launch `xl devd` at boot, it's called 'xendriverdomain'.

Thanks, Roger.



 

