Re: [Xen-devel] [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)



On 03/17/2015 11:43 AM, Jan Beulich wrote:
>>>> On 17.03.15 at 12:32, <george.dunlap@xxxxxxxxxxxxx> wrote:
>> On 03/17/2015 11:25 AM, Jan Beulich wrote:
>>>>>> On 17.03.15 at 12:05, <george.dunlap@xxxxxxxxxxxxx> wrote:
>>>> On 03/17/2015 10:54 AM, Jan Beulich wrote:
>>>>> Finally, as said in different contexts earlier, I think unconditionally
>>>>> acquiring locks in dumping routines isn't the best practice. At least
>>>>> in non-debug builds I think these should be try-locks only, skipping
>>>>> the dumping when a lock is busy.
>>>>
>>>> You mean so that we don't block the console if there turns out to be a
>>>> deadlock?
>>>
>>> For example. And also to not unduly get in the way of an otherwise
>>> extremely busy system.
>>
>> I don't understand this last argument.  If you're using the debug keys,
>> you want to know about the state of the system.  I would much rather my
>> system ran 25% slower for the 5 seconds the debug key was dumping
>> information, and have a complete snapshot of the system, than for it to
>> only run 10% slower and to have half the information missing.  The
>> upshot of missing information would likely be that I have to press the
>> debug key 3-4 times in a row, meaning I'd be running 10% slower for 20
>> seconds rather than 25% slower for 5 seconds.
> 
> Yes, I understand this, and in many cases this is the perspective to
> take. Yet I've been in the situation where suggesting the use of
> debug keys to learn something about a (partially) live locked system
> would have had the risk of causing further corruption to it, and
> hence a more careful state dumping approach would have been
> desirable.
> 
>> All in all, I don't think the performance of the debug keys should be a
>> major concern.  The only thing I'd be worried about is making the system
>> as diagnosable as possible if things have already gone pear-shaped
>> (e.g., if there's a deadlock).
> 
> It's not their performance that's of concern, but the effect they
> may have on the performance (or even correctness - see how
> many process_pending_softirqs() calls we had to sprinkle around
> over the years) of other code.

So it sounds like maybe we're actually on the same page, but are using
words slightly differently.  :-)  It sounds like we agree that the
ability to tread carefully on a system which may be having trouble, in
order not to make it worse, is important.  For instance, not wedging the
serial console behind a deadlocked lock, and not further corrupting a
system that has gotten itself wedged in livelock.  Those are things I
would classify under "correctness" and/or "diagnosis".

When I say "performance is not a concern", I mean "it does not concern
me that someone's web page loads 25% slower for the five seconds it
takes to dump the information".  If delaying other parts of the system
causes the system to get wedged or crash, that's obviously a problem.
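
To make the "tread carefully" idea concrete, here is a rough sketch of the
try-lock approach suggested earlier in the thread: the dump routine only
prints if the lock is free, and skips the dump otherwise.  This is not Xen
code.  It uses a POSIX mutex as a stand-in for the hypervisor's spinlock,
and struct sched_state, nr_runnable and dump_sched_state() are hypothetical
names made up for illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a scheduler's private state and its lock. */
struct sched_state {
    pthread_mutex_t lock;
    int nr_runnable;
};

/*
 * Dump the state only if the lock can be taken without waiting.
 * Returns true if the dump happened, false if it was skipped because
 * the lock was busy (possibly deadlocked), so the caller never blocks.
 */
static bool dump_sched_state(struct sched_state *s)
{
    if (pthread_mutex_trylock(&s->lock) != 0) {
        printf("scheduler lock busy, skipping dump\n");
        return false;
    }

    printf("runnable: %d\n", s->nr_runnable);
    pthread_mutex_unlock(&s->lock);
    return true;
}
```

The trade-off is the one discussed above: the console can never get stuck
behind a held lock, but the output may be incomplete and the key may need
pressing again.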

 -George
