
Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0


  • To: G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 5 Jan 2022 15:33:33 +0100
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 05 Jan 2022 14:34:21 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Jan 05, 2022 at 12:05:39AM +0800, G.R. wrote:
> > > > > But it seems like this patch is not stable enough yet and has its
> > > > > own issue -- memory is not properly released?
> > > >
> > > > I know. I've been working on improving it this morning and I'm
> > > > attaching an updated version below.
> > > >
> > > Good news.
> > > With this new patch, the NAS domU can serve the iSCSI disk without an
> > > OOM panic, at least for a little while.
> > > I'm going to keep it up and running for a while to see if it's stable
> > > over time.
> >
> > Thanks again for all the testing. Do you see any difference
> > performance-wise?
> I'm still on a *debug* kernel build to capture any potential panic --
> none so far -- so no performance testing yet.
> Since I'm a home user with a relatively lightweight workload, I haven't
> observed any difference in daily usage so far.
> 
> I did some quick iperf3 testing just now.

Thanks for doing this.
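
Just so we compare like with like when you re-test: I assume this was a
plain TCP test, something along the lines of the following (the address
and options are only a guess at what you ran):

    # on the server side (domU or dom0)
    iperf3 -s
    # on the client side, 10 second TCP test against the server
    iperf3 -c <server-ip> -t 10

If you used different options (-P for parallel streams, -R for reverse
mode, UDP, ...) it would be good to know, since those can change the
numbers quite a bit.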

> 1. between nas domU <=> Linux dom0 running on an old i7-3770 based box.
> The peak is roughly 12 Gbits/s when domU is the server.
> But I do see a regression down to ~8.5 Gbits/s when I repeat the test
> in quick succession.
> The regression recovers when I leave the system idle for a while.
> 
> When dom0 is the iperf3 server, the transfer rate is much lower, down
> all the way to 1.x Gbits/s.
> Sometimes I can see the following kernel log message repeated during
> the testing, likely contributing to the slowdown:
>              interrupt storm detected on "irq2328:"; throttling interrupt source

I assume the message is in the domU, not the dom0?
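
If it's indeed the FreeBSD domU printing that, the per-source interrupt
rates would help to see which source is storming. Something like the
following in the domU (the irq number is taken from your log, and the
"xn" netfront device name is just a guess):

    # per-source interrupt counters and rates
    vmstat -i | grep -E 'irq2328|xn'
    # threshold above which FreeBSD throttles an interrupt source
    sysctl hw.intr_storm_threshold

Raising hw.intr_storm_threshold would likely just paper over the issue,
but knowing the actual rate would tell us whether netfront is raising an
unreasonable amount of interrupts.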

> Another thing that looks alarming is that the retransmission count is high:
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec   212 MBytes  1.78 Gbits/sec  110    231 KBytes
> [  5]   1.00-2.00   sec   230 MBytes  1.92 Gbits/sec    1    439 KBytes
> [  5]   2.00-3.00   sec   228 MBytes  1.92 Gbits/sec    3    335 KBytes
> [  5]   3.00-4.00   sec   204 MBytes  1.71 Gbits/sec    1    486 KBytes
> [  5]   4.00-5.00   sec   201 MBytes  1.69 Gbits/sec  812    258 KBytes
> [  5]   5.00-6.00   sec   179 MBytes  1.51 Gbits/sec    1    372 KBytes
> [  5]   6.00-7.00   sec  50.5 MBytes   423 Mbits/sec    2    154 KBytes
> [  5]   7.00-8.00   sec   194 MBytes  1.63 Gbits/sec  339    172 KBytes
> [  5]   8.00-9.00   sec   156 MBytes  1.30 Gbits/sec  854    215 KBytes
> [  5]   9.00-10.00  sec   143 MBytes  1.20 Gbits/sec  997   93.8 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  1.76 GBytes  1.51 Gbits/sec  3120             sender
> [  5]   0.00-10.45  sec  1.76 GBytes  1.44 Gbits/sec                  receiver
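
Those retransmissions could well account for part of the slowdown. It
might be worth checking on which side the drops show up, for example
(the interface names below are just examples, adjust to your setup):

    # on the FreeBSD domU
    netstat -s -p tcp | grep -i retrans
    netstat -I xn0 -d
    # on the Linux dom0, for the corresponding backend vif
    ip -s link show vif1.0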

Do you see the same when running the same tests on a debug kernel
without my patch applied? (ie: a kernel you build yourself from the
same baseline, just without my patch applied)

I'm mostly interested in knowing whether the patch itself causes any
regressions from the current state (which might not be great already).
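
In case it helps, getting that baseline should just be a matter of
un-applying the patch from the same tree and rebuilding. Roughly (the
patch file name, job count and kernel config below are placeholders):

    cd /usr/src
    # remove my patch, assuming it was applied with patch(1)
    patch -R < netfront-fix.patch
    make -j4 buildkernel KERNCONF=GENERIC
    make installkernel KERNCONF=GENERIC
    shutdown -r now

Keeping everything else identical (debug options included) makes the
comparison meaningful.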

> 
> 2. between a remote box <=> nas domU, through a 1Gbps ethernet cable.
> It roughly saturates the link when domU is the server, without an
> obvious perf drop.
> When domU is running as the client, the achieved BW is ~30Mbps lower
> than the peak.
> Retransmissions sometimes also show up in this scenario, more severely
> when domU is the client.
> 
> I cannot immediately test with the stock kernel, nor with your patch
> in a release-mode build.
> 
> But judging from the observed imbalance between the inbound and
> outgoing paths, I guess a non-trivial penalty applies?

We should get a baseline using the same sources without my patch
applied.

Thanks, Roger.