
Re: [Xen-devel] [PATCH] Xen/timer: Disable watchdog during dumping timer queues



>>> On 19.09.16 at 15:57, <tianyu.lan@xxxxxxxxx> wrote:

> 
> On 9/15/2016 10:32 PM, Jan Beulich wrote:
>>>>> On 15.09.16 at 16:16, <tianyu.lan@xxxxxxxxx> wrote:
>>> On 9/13/2016 11:25 PM, Jan Beulich wrote:
>>>> Wait - what is do_invalid_op() doing on the stack? I don't think it
>>>> belongs there, and hence I wonder whether the keypress
>>>> happened after some already fatal event (in which case all bets
>>>> are off anyway).
>>>
>>> It is not clear why do_invalid_op() is on the stack. There is no other
>>> fatal event. The issue disappears when watchdog_timeout is set to 10s.
>>>
>>>>>> Another solution is to schedule a tasklet to run the keyhandler from
>>>>>> the timer handler and to invoke process_pending_softirqs() in
>>>>>> dump_timerq(). This also works, but it requires reworking the
>>>>>> keyhandler mechanism.
>>>>>>
>>>>>> Disabling the watchdog seems simpler, and I found that
>>>>>> dump_registers() deals with the issue the same way.
>>>> That's true. Just that on large machines it defaults to the
>>>> alternative model, for which I'm not sure it actually needs the
>>>> watchdog disabled (as data for a single CPU shouldn't exceed
>>>> the threshold).
>>>>
>>>
>>> It does not seem necessary to disable the watchdog in the alternative
>>> model, since dumping a single CPU's state will not take long.
>>>
>>>
>>> For the issue in the timer info dump handler, is disabling the watchdog
>>> OK with you, or do you have other suggestions for resolving it?
>>
>> Well, without a clear understanding of why the issue occurs (for
>> which I need to refer you back to the questionable stack dump)
>> I'm hesitant to agree to this step, yet ...
> 
> After some research, I found that do_invalid_op() in the stack dump is
> caused by run_in_exception_handler(__ns16550_poll) in ns16550_poll()
> rather than by a fatal event. The timeout issue still exists when
> __ns16550_poll() is run directly from ns16550_poll().

Well, I then still don't see why e.g. dump_domains() doesn't also need
it. Earlier you did say:

  A keyhandler may run in the timer handler, and the following log shows
  the call trace. The timer subsystem runs all expired timers' handlers
  before programming the next timer event. If the keyhandler runs longer
  than the timeout, there is no chance to configure the timer before the
  watchdog triggers and the hypervisor reboots.

The fact that using debug keys may adversely affect the rest of the
system is known. And the nesting of process_pending_softirqs()
inside do_softirq() should, from looking at them, work fine. So I
continue to have trouble seeing the specific reason for the problem
you say you observe.
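
To make the claimed failure mode concrete, here is a minimal toy model in C
of what the quoted description says happens. It is only a sketch and not Xen
code: struct toy_timer, toy_watchdog_would_fire() and the millisecond values
are hypothetical names and numbers made up for illustration. It assumes that
all handlers expired at the start of a batch run back to back, and that the
watchdog is satisfied only once the next timer event can be programmed. It
deliberately ignores process_pending_softirqs(), so it cannot settle whether
the nesting mentioned above avoids the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in Xen. */
struct toy_timer {
    unsigned long expires_ms;  /* simulated expiry time */
    unsigned long runtime_ms;  /* simulated duration of the handler */
};

/*
 * Run every timer that is already expired at now_ms, one after another,
 * and report whether the total time spent before the next timer event
 * could be programmed exceeds the (simulated) watchdog timeout.
 */
static bool toy_watchdog_would_fire(const struct toy_timer *timers, size_t n,
                                    unsigned long now_ms,
                                    unsigned long watchdog_timeout_ms)
{
    unsigned long elapsed_ms = 0;
    size_t i;

    assert(watchdog_timeout_ms > 0);

    for (i = 0; i < n; i++) {
        /* An expired handler runs to completion before the next one. */
        if (timers[i].expires_ms <= now_ms)
            elapsed_ms += timers[i].runtime_ms;
    }

    /* Only at this point would the next timer event be programmed. */
    return elapsed_ms > watchdog_timeout_ms;
}
```

In this model, a batch containing one handler that takes 6000 ms trips a
5000 ms timeout but not a 10000 ms one, which is at least consistent with the
report that the issue disappears with a larger watchdog_timeout.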

And as a separate note - dump_registers() is quite an exception
among the key handlers, and that's for a good reason (as the
comment there says). So I continue to be hesitant to see this
spread to other key handlers.

Jan



 

