Re: [PATCH for-4.17?] x86/paging: return -EINVAL for paging domctls for dying domains
On 09.11.2022 10:45, Edwin Torok wrote:
>> On 9 Nov 2022, at 07:48, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 08.11.2022 18:15, Roger Pau Monné wrote:
>>> On Tue, Nov 08, 2022 at 06:03:54PM +0100, Jan Beulich wrote:
>>>> On 08.11.2022 17:43, Roger Pau Monné wrote:
>>>>> On Tue, Nov 08, 2022 at 05:14:40PM +0100, Jan Beulich wrote:
>>>>>> On 08.11.2022 12:38, Roger Pau Monne wrote:
>>>>>>> Like on the Arm side, return -EINVAL when attempting to do a p2m
>>>>>>> operation on dying domains.
>>>>>>>
>>>>>>> The current logic returns 0 and leaves the domctl parameter
>>>>>>> uninitialized for any parameter fetching operations (like the
>>>>>>> GET_ALLOCATION operation), which is not helpful from a toolstack point
>>>>>>> of view, because there's no indication that the data hasn't been
>>>>>>> fetched.
>>>>>>
>>>>>> While I can see how the present behavior is problematic when it comes
>>>>>> to consuming supposedly returned data, ...
>>>>>>
>>>>>>> --- a/xen/arch/x86/mm/paging.c
>>>>>>> +++ b/xen/arch/x86/mm/paging.c
>>>>>>> @@ -694,9 +694,10 @@ int paging_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
>>>>>>>
>>>>>>>      if ( unlikely(d->is_dying) )
>>>>>>>      {
>>>>>>> -        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
>>>>>>> +        gdprintk(XENLOG_INFO,
>>>>>>> +                 "Tried to do a paging domctl op on dying domain %u\n",
>>>>>>>                   d->domain_id);
>>>>>>> -        return 0;
>>>>>>> +        return -EINVAL;
>>>>>>>      }
>>>>>>
>>>>>> ... going from "success" to "failure" here has a meaningful risk of
>>>>>> regressing callers. It is my understanding that it was deliberate to
>>>>>> mimic success in this case (without meaning to assign "good" or "bad"
>>>>>> to that decision).
>>>>>
>>>>> I would assume that was the original intention, yes, albeit the commit
>>>>> message doesn't go into detail about why mimicking success is
>>>>> required; it's quite possible the code relying on this was xend.
>>>>
>>>> Quite possible, but you never know who else has cloned code from there.
>>>>
>>>>>> Can you instead fill the data to be returned in
>>>>>> some simple enough way? I assume a mere memset() isn't going to be
>>>>>> good enough, though (albeit public/domctl.h doesn't explicitly name
>>>>>> any input-only fields, so it may not be necessary to preserve
>>>>>> anything). Maybe zeroing ->mb and ->stats would do?
>>>>>
>>>>> Hm, it still feels kind of wrong. We do return errors elsewhere for
>>>>> operations attempted against dying domains, and that seems all fine,
>>>>> not sure why paging operations need to be different in this regard.
>>>>> Arm does also return -EINVAL in that case.
>>>>>
>>>>> So what about postponing this change to 4.18 in order to avoid
>>>>> surprises, but then taking it in its current form at the start of the
>>>>> development window, so as to have time to detect any issues?
>>>>
>>>> Maybe, but to be honest I'm not convinced. Arm can't really be taken
>>>> for comparison, since the op is pretty new there iirc.
>>>
>>> Indeed, but the tools code paths are likely shared between x86 and
>>> Arm, as the hypercalls are the same.
>>
>> On x86 we have both xc_shadow_control() and (functional)
>> xc_logdirty_control(); on Arm only the former is used, while the latter
>> would also be impacted by your change. Plus you're not accounting for
>> external tool stacks (like xend would be if anyone had cared to forward
>> port it, when - as you said earlier - the suspicion is that the original
>> change was made to "please" xend).
>
> I don't see how returning random uninitialised data (current behaviour) or
> wrong data (all zeroes) is better than returning an explicit error code.
I didn't say anything like that. What I did say is that we cannot lightly
move from "success" to "failure".
> And even if this change happens to break some code, it'd do so with a clear
> error code and in a reproducible way, and then such breakage could easily be
> fixed in the affected piece of code (e.g. a toolstack other than XAPI which
> was not looking for errors from this domctl call).
My experience with error messages (including the conveying of error codes)
by the tool stack is rather poor. I also dare to question the
"reproducible" aspect: The main risk I see here is with a multi-threaded
tool stack where one thread hasn't become / been made properly aware of
another one being in the process of cleaning up after a domain. Its racy
issuing of further domctl-s may very well not be nicely reproducible.
> AFAICT XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION is only used in the OCaml C stubs
> (i.e. used by XAPI/xenopsd), which always check the return value and raise an
> OCaml exception, and in the Python low-level libs, which check the return
> value and raise a Python exception.
But XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION isn't the only sub-op affected here,
is it?
Jan
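The two alternatives weighed in the thread — Jan's suggestion of zeroing the output fields (->mb and ->stats) while keeping the success return, versus the patch's outright -EINVAL — can be sketched roughly as below. The struct definitions are simplified stand-ins for illustration only, not the real Xen layouts (those live in xen/include/public/domctl.h); only the ->mb and ->stats field names are taken from the discussion above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for the Xen types involved; the real
 * definitions live in public/domctl.h and xen/sched.h. */
struct xen_domctl_shadow_op_stats {
    unsigned int fault_count;
    unsigned int dirty_count;
};

struct xen_domctl_shadow_op {
    unsigned int op;
    unsigned int mb;                          /* OUT: allocation in MiB */
    struct xen_domctl_shadow_op_stats stats;  /* OUT: fault/dirty stats */
};

struct domain {
    unsigned int domain_id;
    bool is_dying;
};

/* Alternative A (Jan's suggestion): keep mimicking success for dying
 * domains, but zero the output fields so callers never consume stale
 * uninitialised data. */
static int paging_domctl_dying_zero(struct domain *d,
                                    struct xen_domctl_shadow_op *sc)
{
    if ( d->is_dying )
    {
        sc->mb = 0;
        memset(&sc->stats, 0, sizeof(sc->stats));
        return 0;
    }
    return -1; /* not dying; real sub-op handling elided */
}

/* Alternative B (the patch as posted): fail the operation outright,
 * matching what the Arm side already does. */
static int paging_domctl_dying_einval(const struct domain *d)
{
    return d->is_dying ? -EINVAL : 0;
}
```

Alternative A avoids the regression risk Jan describes (callers that expect success), at the cost of returning data that is well-defined but meaningless; alternative B is honest about the failure but changes observable behaviour for any toolstack that did not check the return value.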