[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] BUG_ON() vs ASSERT()



Wednesday, September 14, 2016, 10:35:30 AM, you wrote:

> On Tue, Sep 13, 2016 at 7:16 PM, Andrew Cooper
> <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 12/09/16 16:23, Jan Beulich wrote:
>>> All,
>>>
>>> in
>>> https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg01201.html
>>> and
>>> https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg01210.html
>>> Andrew basically suggests that we should switch away from using
>>> ASSERT() and over to BUG_ON() in perhaps quite broad a set of
>>> cases. And honestly I'm not convinced of this: We've been adding
>>> quite a few ASSERT()s over the last years with the aim of doing
>>> sanity checking in debug builds, without adding overhead to non-
>>> debug builds. I can certainly see possible cases where using
>>> BUG_ON() to prevent further possible damage is appropriate, but
>>> I don't think we should overdo here.
>>
>> I am not advocating switching all ASSERT()s to BUG_ON()s.  That would be
>> silly.
>>
>> However, ASSERT()'s as a bounds check very definitely are dangerous.  If
>> there is any uncertainty about the bounds, the check must not disappear
>> in a release build.  (Better yet, code which copes cleanly with
>> insufficient bounds).
>>
>>
>> For anyone reading this email who hasn't already worked out the details
>> of XSA-186, the data corruption issue is here:
>>
>> static int hvmemul_insn_fetch(...)
>> {
>>     unsigned int insn_off = offset - hvmemul_ctxt->insn_buf_eip;
>>     ...
>>     ASSERT(insn_off + bytes <= sizeof(hvmemul_ctxt->insn_buf));
>>     memcpy(&hvmemul_ctxt->insn_buf[insn_off], p_data, bytes);
>>     ...
>>
>> It is left as an exercise to the reader to work out how to exploit this
>> on a release build of Xen, but it is hopefully obvious that the ASSERT()
>> isn't helpful.  A BUG_ON() in this case would have been a host DoS,
>> which is substantially better than a guest escape.

> This seems quite sensible, and I'm glad Andy clarified.  (Although in
> a lot of these cases, a domain_crash() would be preferable to a
> BUG_ON()  if possible.)

>  -George

Just my two cents, but please try to limit BUG_ON()'s to only really 
unrecoverable cases. Because it is quite a disaster especially on
remote administrated machines to have the host crash on a BUG_ON() 
(with or without auto restart).
 
Try to recover (for instance if necessary by crashing the guest 
instead of the host) as much as you can to make at least the hypervisor and 
dom0 
survive. Put a big fat warning in xl dmesg and taint the hypervisor 
(preferably in a way you could easily read out with monitoring tools),
so you could propagate that to the sysadmin that the system is
now deemed instable, so he can investigate (and report the bug),
triage and schedule a reboot. Instead of only having a hard to debug crashing 
machine with probably no or limited logs to go by. 

In the Linux kernel the problem now is to get a lot of them converted back to
ratelimited warnings instead of BUG_ON()'s.

--
Sander


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.