Re: [PATCH for-4.17?] x86/paging: return -EINVAL for paging domctls for dying domains
On 09.11.2022 10:45, Edwin Torok wrote:
>> On 9 Nov 2022, at 07:48, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 08.11.2022 18:15, Roger Pau Monné wrote:
>>> On Tue, Nov 08, 2022 at 06:03:54PM +0100, Jan Beulich wrote:
>>>> On 08.11.2022 17:43, Roger Pau Monné wrote:
>>>>> On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
>>>>>> On 08.11.2022 12:38, Roger Pau Monne wrote:
>>>>>>> Like on the Arm side, return -EINVAL when attempting to do a p2m
>>>>>>> operation on dying domains.
>>>>>>>
>>>>>>> The current logic returns 0 and leaves the domctl parameter
>>>>>>> uninitialized for any parameter fetching operations (like the
>>>>>>> GET_ALLOCATION operation), which is not helpful from a toolstack point
>>>>>>> of view, because there's no indication that the data hasn't been
>>>>>>> fetched.
>>>>>>
>>>>>> While I can see how the present behavior is problematic when it comes
>>>>>> to consuming supposedly returned data, ...
>>>>>>
>>>>>>> --- a/xen/arch/x86/mm/paging.c
>>>>>>> +++ b/xen/arch/x86/mm/paging.c
>>>>>>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
>>>>>>>
>>>>>>>      if ( unlikely(d->is_dying) )
>>>>>>>      {
>>>>>>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
>>>>>>> +        gdprintk(XENLOG_INFO,
>>>>>>> +                 "Tried to do a paging domctl op on dying domain %u\n",
>>>>>>>                   d->domain_id);
>>>>>>> -        return 0;
>>>>>>> +        return -EINVAL;
>>>>>>>      }
>>>>>>
>>>>>> ... going from "success" to "failure" here has a meaningful risk of
>>>>>> regressing callers. It is my understanding that it was deliberate to
>>>>>> mimic success in this case (without meaning to assign "good" or "bad"
>>>>>> to that decision).
>>>>>
>>>>> I would assume that was the original intention, yes, albeit the commit
>>>>> message doesn't go into detail about why mimicking success is
>>>>> required; it's quite possible the code relying on this was xend.
>>>>
>>>> Quite possible, but you never know who else has cloned code from there.
>>>>
>>>>>> Can you instead fill the data to be returned in
>>>>>> some simple enough way? I assume a mere memset() isn't going to be
>>>>>> good enough, though (albeit public/domctl.h doesn't explicitly name
>>>>>> any input-only fields, so it may not be necessary to preserve
>>>>>> anything). Maybe zeroing ->mb and ->stats would do?
>>>>>
>>>>> Hm, it still feels kind of wrong. We do return errors elsewhere for
>>>>> operations attempted against dying domains, and that seems all fine,
>>>>> not sure why paging operations need to be different in this regard.
>>>>> Arm does also return -EINVAL in that case.
>>>>>
>>>>> So what about postponing this change to 4.18 in order to avoid
>>>>> surprises, but then taking it in its current form at the start of the
>>>>> development window, so as to have time to detect any issues?
>>>>
>>>> Maybe, but to be honest I'm not convinced. Arm can't really be taken
>>>> for comparison, since the op is pretty new there iirc.
>>>
>>> Indeed, but the tools code paths are likely shared between x86 and
>>> Arm, as the hypercalls are the same.
>>
>> On x86 we have both xc_shadow_control() and (functional)
>> xc_logdirty_control(); on Arm only the former is used, while the latter
>> would also be impacted by your change. Plus you're not accounting for
>> external tool stacks (like xend would be if anyone had cared to forward
>> port it, when - as you said earlier - the suspicion is that the original
>> change was made to "please" xend).
>
> I don't see how returning random uninitialised data (current behaviour) or
> wrong data (all zeroes) is better than returning an explicit error code.
I didn't say anything like that. What I did say is that we cannot lightly
move from "success" to "failure".
> And even if this change happens to break some code, it'd do so with a clear
> error code and in a reproducible way, and then such breakage could easily be
> fixed in the affected piece of code (e.g. a toolstack other than XAPI which
> was not looking for errors from this domctl call).
My experience with error messages (including the conveying of error codes)
by the tool stack is rather poor. I also dare to question the
"reproducible" aspect: The main risk I see here is with a multi-threaded
tool stack where one thread hasn't become / been made properly aware of
another one being in the process of cleaning up after a domain. Its racy
issuing of further domctl-s may very well not be nicely reproducible.
> AFAICT XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION is only used in the OCaml C stubs
> (i.e. used by XAPI/xenopsd), which always check the return value and raise an
> OCaml exception, and in the Python low-level libs, which check the return
> value and raise a Python exception.
But XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION isn't the only sub-op affected here,
is it?
Jan
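The two alternatives weighed in the thread — Jan's suggestion of zeroing the output fields (->mb and ->stats) while keeping the success return, versus the patch's outright -EINVAL — can be sketched roughly as below. The struct definitions are simplified stand-ins for illustration only, not the real Xen layouts (those live in xen/include/public/domctl.h); only the ->mb and ->stats field names are taken from the discussion above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for the Xen types involved; the real
 * definitions live in public/domctl.h and xen/sched.h. */
struct xen_domctl_shadow_op_stats {
    unsigned int fault_count;
    unsigned int dirty_count;
};

struct xen_domctl_shadow_op {
    unsigned int op;
    unsigned int mb;                          /* OUT: allocation in MiB */
    struct xen_domctl_shadow_op_stats stats;  /* OUT: fault/dirty stats */
};

struct domain {
    unsigned int domain_id;
    bool is_dying;
};

/* Alternative A (Jan's suggestion): keep mimicking success for dying
 * domains, but zero the output fields so callers never consume stale
 * uninitialised data. */
static int paging_domctl_dying_zero(struct domain *d,
                                    struct xen_domctl_shadow_op *sc)
{
    if ( d->is_dying )
    {
        sc->mb = 0;
        memset(&sc->stats, 0, sizeof(sc->stats));
        return 0;
    }
    return -1; /* not dying; real sub-op handling elided */
}

/* Alternative B (the patch as posted): fail the operation outright,
 * matching what the Arm side already does. */
static int paging_domctl_dying_einval(const struct domain *d)
{
    return d->is_dying ? -EINVAL : 0;
}
```

Alternative A avoids the regression risk Jan describes (callers that expect success), at the cost of returning data that is well-defined but meaningless; alternative B is honest about the failure but changes observable behaviour for any toolstack that did not check the return value.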