[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 0/3] xen/arm: Inject an exception to the guest rather than crashing it



On 30/01/18 18:46, Julien Grall wrote:
> Hi,
>
> On 30/01/18 18:29, Andrew Cooper wrote:
>> On 30/01/18 17:00, Julien Grall wrote:
>>>
>>>
>>> On 30/01/18 16:38, Andrew Cooper wrote:
>>>> On 30/01/18 16:14, Julien Grall wrote:
>>>>> Hi all,
>>>>>
>>>>> This small series replaces all call to domain_crash_synchronous by
>>>>> injecting
>>>>> an exception to the guest.
>>>>>
>>>>> This will result to a nicer trace from the guest (no need to
>>>>> manually walk
>>>>> the stack) and give a chance to the guest to give a bit more
>>>>> information on
>>>>> what it was doing.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Julien Grall (3):
>>>>>     xen/arm: io: Distinguish unhandled IO from aborted one
>>>>>     xen/arm: Don't crash domain on bad MMIO emulation
>>>>>     xen/arm: Don't crash the domain on invalid HVC immediate
>>>>
>>>> Thanks.
>>>>
>>>> I don't feel qualified to review these, but some notes.
>>>>
>>>> Patch 1.  s/avodi/avoid/ in the commit message
>>>>
>>>> Patches 2 and 3.  You probably want to convert the printks to
>>>> gdprintk()s, otherwise guests can choke up the ratelimited log.  Doing
>>>> so will also mean that the vcpu will be identified consistently, which
>>>> it isn't currently.
>>> We didn't use g*printk because it would be more confusing to print the
>>> current vCPU in some cases (e.g when accessing the re-distributor of
>>> another vCPU) or does not matter (e.g for ITS).
>>
>> In the former case, you'd want to print both current, and the target
>> vcpu.  The latter still matters what current is if something goes wrong.
>>
>> We have plenty of similar cases in x86, but at the point you are
>> printing an diagnostic message, ignoring current is almost always the
>> wrong think to do.
>
> I will look at it on another series.

Fair enough.

>
>>
>>>
>>> The problem with the debug version is those information are actually
>>> quite useful in non-debug build. We found quite a few issues thanks to
>>> them.
>>>
>>> I think it would make more sense for Xen to provide per-guest
>>> ratelimited than hiding those messages in non-debug build.
>>
>> Per guest is quite a lot more complicated than global, and would still
>> require a global limit to prevent a concerted attack from multiple
>> guests to avoid DoSing the system.
>>
>> Debug vs unilateral is your prerogative as a maintainer, but as you've
>> said yourself, the are used for debugging purposes, which proves my
>> point.
>
> So on x86, you always request the user to reproduce it with debug
> build enable?

That is very specific to what goes wrong, but no, we generally don't
have release-build messages for "the guest screwed up and we hit it
expected way for doing so".  We do (well - are trying to, which is how I
found this domain_crash_sync issue in the first place) get consistent at
printing useful release-build messages for abnormal termination of the
guest.

In terms of making it easier to debug, XenServer always ships with a
release and debug hypervisor built from the same source so customers can
trivially switch into the debug version and collect logs if we think
that is a useful thing to do, but this is only used in a fraction of
cases in the first place.

Anything more complicated is definitely going to require an experienced
human to investigate, at which point they can do whatever they want.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.