[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/console: do not drop serial output from the hardware domain


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 14 Jun 2022 11:38:28 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=EMEGXRZfJV8cRfQIPJiIUOeSoPS/a1AqUpOTX0HMXjM=; b=PeoXy3PqK36k8NKbtlulixtCQA0P79qwdX1XKCRsA4twAv7yJGer4nZgAOFhRde0fJDwvpcESSacMcfmMm3bGjbthJY3CGtu3v2rYNmyCF59wr+6TGjqwOgNbQRKnfvDdV5gMebKKjPnhi1fc1pTNk83XTV2vgv6gGshVjDgdCYIYimI/S6NDqLhzC+L1tsmz5/TEgtJXDfeeDfwAH9tEF+Wfe0puAzxNZdECZ9r2P2f9aSLaU3SHyE7c8Z5S39SCbOmDSIv1nqG98UcXdyK/Ju4YZ5QGf1x8/BK0cq4787Hv5wSFmjw3QLG16KLknWDRhbGKPo0BXjk5SlO3mKAlQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Dw7KAUf88nyrjxfuuJPGCi1uvZtB8cwQov9qagSyZA1PHFWuI3eGbHHM2U5yiYPpjC82goArrVsTS+U+WgG9Thu3FJvayeUeEQ+2JhUtUcSwu9vhMNkmKZI4jXrErD8nZnaonGL7YDU0lck3JUOKcpHzd7LHfzEvbo3SM01HsuJkxaE6BuAJAX9zRXK8JCDETiEQbG8Vm0e3iVbqEkqNh1dS858ilVJAwgVYdJ8agHMaVwaUYcyzJ6AnlyOKW/Y7ffrtBNYpUXo/nzUr8TEzWuhFhACGhd7nG0TzrEx1uXYzeRQbuYtZCcG3/N14oas5krWeASel/kHDsALHKVIRtA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 14 Jun 2022 09:38:47 +0000
  • Ironport-data: A9a23:Vr2w468DpDsXPYN1KCz/DrUDnH+TJUtcMsCJ2f8bNWPcYEJGY0x3y 2UcWzqCaayCZjD2KtFxYYq29UkE65DczNBrHQE/pCw8E34SpcT7XtnIdU2Y0wF+jyHgoOCLy +1EN7Es+ehtFie0Si+Fa+Sn9T8mvU2xbuKU5NTsY0idfic5DnZ44f5fs7Rh2NQw34HlW1rlV e7a+KUzBnf0g1aYDUpMg06zgEsHUCPa4W5wUvQWPJinjXeG/5UnJMt3yZKZdhMUdrJ8DO+iL 9sv+Znilo/vE7XBPfv++lrzWhVirrc/pmFigFIOM0SpqkAqSiDfTs/XnRfTAKtao2zhojx/9 DlCnZu6dCEFYrLAoc0UdUhiLjh9HvZL47CSdBBTseTLp6HHW13F5qw0SW0TY8gf8OsxBnxS/ /sFLjxLdgqEm++93LO8TK9rm9gnK87oeogYvxmMzxmAVapgHc+FHvSMvIAHtNszrpkm8fL2f c0WZCApdB3dSxZOJk0WGNQ1m+LAanzXLGYF9AnJ//RfD277lwBWzLf3D//pZ4aIfexRx2OVp mT38DGsav0dHJnFodafyVqujOLSmSLwWKoJCaa1sPVthTW7xHEXCRAQfUu2p7++kEHWc8lEN 0Ue9y4qrK4z3E+mVN/wW1u/unHslgEYc8pdFas98g7l4rrZ5UOVC3YJShZFacc6r4kmSDoyz FiLktj1Qzt1v9WopWm1876VqXa+PHYTJGpbPyscF1JavJ/kvZ05iQ/JQpB7Cqmpg9bpGDb2h TeXsCw5gLZVhskOv0mmwW36b/uXjsChZmYICs//BApJMisRiFaZWrGV
  • Ironport-hdrordr: A9a23:Rq3RG6Ce+dC485PlHeg+sceALOsnbusQ8zAXPh9KJCC9I/bzqy nxpp8mPH/P5wr5lktQ/OxoHJPwOU80kqQFmrX5XI3SJTUO3VHFEGgM1+vfKlHbak7DH6tmpN 1dmstFeaLN5DpB/KHHCWCDer5PoeVvsprY49s2p00dMT2CAJsQizuRZDzrcHGfE2J9dOcE/d enl4N6jgvlXU5SQtWwB3EDUeSGj9rXlKj+aRpDIxI88gGBgR6h9ba/SnGjr1wjegIK5Y1n3X nOkgT/6Knmm/anyiXE32uWy5hNgtPuxvZKGcTJoMkILTfHjBquee1aKvW/lQFwhNvqxEchkd HKrRtlF8Nv60nJdmXwmhfp0xmI6kdb11bSjXujxVfzq83wQzw3T+Bbg5hCTxff4008+Plhza NixQuixtVqJCKFuB64y8nDVhlsmEbxi2Eli/Qvg3tWVpZbQKNNrLYY4FheHP47bW7HAbgcYa hT5fznlbZrmQvwVQGbgoAv+q3gYp0LJGbJfqBY0fblkQS/nxhCvj4lLYIk7zI9HakGOuh5Dt T/Q9pVfY51P78rhIJGdZA8qJiMexrwqSylChPgHX3XUIc6Blnql7nbpJ0I2cDCQu178HJ1ou WKbG9l
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Jun 14, 2022 at 11:13:07AM +0200, Jan Beulich wrote:
> On 14.06.2022 10:32, Roger Pau Monné wrote:
> > On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote:
> >> On 14.06.2022 08:52, Roger Pau Monné wrote:
> >>> On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote:
> >>>> On 13.06.2022 14:32, Roger Pau Monné wrote:
> >>>>> On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote:
> >>>>>> On 13.06.2022 11:04, Roger Pau Monné wrote:
> >>>>>>> On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote:
> >>>>>>>> On 13.06.2022 10:21, Roger Pau Monné wrote:
> >>>>>>>>> On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote:
> >>>>>>>>>> On 10.06.2022 17:06, Roger Pau Monne wrote:
> >>>>>>>>>>> Prevent dropping console output from the hardware domain, since 
> >>>>>>>>>>> it's
> >>>>>>>>>>> likely important to have all the output if the boot fails without
> >>>>>>>>>>> having to resort to sync_console (which also affects the output 
> >>>>>>>>>>> from
> >>>>>>>>>>> other guests).
> >>>>>>>>>>>
> >>>>>>>>>>> Do so by pairing the console_serial_puts() with
> >>>>>>>>>>> serial_{start,end}_log_everything(), so that no output is dropped.
> >>>>>>>>>>
> >>>>>>>>>> While I can see the goal, why would Dom0 output be (effectively) 
> >>>>>>>>>> more
> >>>>>>>>>> important than Xen's own one (which isn't "forced")? And with this
> >>>>>>>>>> aiming at boot output only, wouldn't you want to stop the 
> >>>>>>>>>> overriding
> >>>>>>>>>> once boot has completed (of which, if I'm not mistaken, we don't
> >>>>>>>>>> really have any signal coming from Dom0)? And even during boot I'm
> >>>>>>>>>> not convinced we'd want to let through everything, but perhaps just
> >>>>>>>>>> Dom0's kernel messages?
> >>>>>>>>>
> >>>>>>>>> I normally use sync_console on all the boxes I'm doing dev work, so
> >>>>>>>>> this request is something that come up internally.
> >>>>>>>>>
> >>>>>>>>> Didn't realize Xen output wasn't forced, since we already have rate
> >>>>>>>>> limiting based on log levels I was assuming that non-ratelimited
> >>>>>>>>> messages wouldn't be dropped.  But yes, I agree that Xen (non-guest
> >>>>>>>>> triggered) output shouldn't be rate limited either.
> >>>>>>>>
> >>>>>>>> Which would raise the question of why we have log levels for 
> >>>>>>>> non-guest
> >>>>>>>> messages.
> >>>>>>>
> >>>>>>> Hm, maybe I'm confused, but I don't see a direct relation between log
> >>>>>>> levels and rate limiting.  If I set log level to WARNING I would
> >>>>>>> expect to not loose _any_ non-guest log messages with level WARNING or
> >>>>>>> above.  It's still useful to have log levels for non-guest messages,
> >>>>>>> since user might want to filter out DEBUG non-guest messages for
> >>>>>>> example.
> >>>>>>
> >>>>>> It was me who was confused, because of the two log-everything variants
> >>>>>> we have (console and serial). You're right that your change is 
> >>>>>> unrelated
> >>>>>> to log levels. However, when there are e.g. many warnings or when an
> >>>>>> admin has lowered the log level, what you (would) do is effectively
> >>>>>> force sync_console mode transiently (for a subset of messages, but
> >>>>>> that's secondary, especially because the "forced" output would still
> >>>>>> be waiting for earlier output to make it out).
> >>>>>
> >>>>> Right, it would have to wait for any previous output on the buffer to
> >>>>> go out first.  In any case we can guarantee that no more output will
> >>>>> be added to the buffer while Xen waits for it to be flushed.
> >>>>>
> >>>>> So for the hardware domain it might make sense to wait for the TX
> >>>>> buffers to be half empty (the current tx_quench logic) by preempting
> >>>>> the hypercall.  That however could cause issues if guests manage to
> >>>>> keep filling the buffer while the hardware domain is being preempted.
> >>>>>
> >>>>> Alternatively we could always reserve half of the buffer for the
> >>>>> hardware domain, and allow it to be preempted while waiting for space
> >>>>> (since it's guaranteed non hardware domains won't be able to steal the
> >>>>> allocation from the hardware domain).
> >>>>
> >>>> Getting complicated it seems. I have to admit that I wonder whether we
> >>>> wouldn't be better off leaving the current logic as is.
> >>>
> >>> Another possible solution (more like a band aid) is to increase the
> >>> buffer size from 4 pages to 8 or 16.  That would likely allow to cope
> >>> fine with the high throughput of boot messages.
> >>
> >> You mean the buffer whose size is controlled by serial_tx_buffer?
> > 
> > Yes.
> > 
> >> On
> >> large systems one may want to simply make use of the command line
> >> option then; I don't think the built-in default needs changing. Or
> >> if so, then perhaps not statically at build time, but taking into
> >> account system properties (like CPU count).
> > 
> > So how about we use:
> > 
> > min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096))
> 
> That would _reduce_ size on small systems, wouldn't it? Originally
> you were after increasing the default size. But if you had meant
> max(), then I'd fear on very large systems this may grow a little
> too large.

See previous followup about my mistake of using min() instead of
max().

On a system with 512 CPUs that would be 512KB, I don't think that's a
lot of memory, specially taking into account that a system with 512
CPUs should have a matching amount of memory I would expect.

It's true however that I very much doubt we would fill a 512K buffer,
so limiting to 64K might be a sensible starting point?

> > Maybe we should also take CPU frequency into account, but that seems
> > too complex for the purpose.
> 
> Why would frequency matter? Other aspects I could see mattering is
> node count and maybe memory size.

Higher frequency likely means faster boot, and faster buffer fill,
because the baudrate of the console is constant.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.