[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC 2/8] public/io/netif: add directory for backend parameters



On 02/08/2018 11:13 AM, Wei Liu wrote:
> On Wed, Feb 07, 2018 at 12:10:37PM +0000, Joao Martins wrote:
>> On 02/06/2018 05:12 PM, Wei Liu wrote:
>>> (Three months after you sent this, sorry...)
>>>
>>> On Mon, Nov 06, 2017 at 12:33:06PM +0000, Joao Martins wrote:
>>>> On Mon, Nov 06, 2017 at 10:33:59AM +0000, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Joao Martins [mailto:joao.m.martins@xxxxxxxxxx]
>>>>>> Sent: 02 November 2017 18:06
>>>>>> To: Xen Development List <xen-devel@xxxxxxxxxxxxx>
>>>>>> Cc: Joao Martins <joao.m.martins@xxxxxxxxxx>; Konrad Rzeszutek Wilk
>>>>>> <konrad.wilk@xxxxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>; Wei Liu
>>>>>> <wei.liu2@xxxxxxxxxx>
>>>>>> Subject: [PATCH RFC 2/8] public/io/netif: add directory for backend
>>>>>> parameters
>>>>>>
>>>>>> The proposed directory provides a mechanism for tools to control the
>>>>>> maximum feature set of the device being provisioned by backend.
>>>>>> The parameters/features include offloading features, number of
>>>>>> queues etc.
>>>>>>
>>>>>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>>>>> ---
>>>>>>  xen/include/public/io/netif.h | 16 ++++++++++++++++
>>>>>>  1 file changed, 16 insertions(+)
>>>>>>
>>>>>> diff --git a/xen/include/public/io/netif.h 
>>>>>> b/xen/include/public/io/netif.h
>>>>>> index 2454448baa..a412e4771d 100644
>>>>>> --- a/xen/include/public/io/netif.h
>>>>>> +++ b/xen/include/public/io/netif.h
>>>>>> @@ -161,6 +161,22 @@
>>>>>>   */
>>>>>>
>>>>>>  /*
>>>>>> + * The directory "require" maybe be created in backend path by tools
>>>>>> + * domain to override the maximum feature set that backend provides to
>>>>>> the
>>>>>> + * frontend. The children entries within this directory are features 
>>>>>> names
>>>>>> + * and the correspondent values that should be used backend as defaults
>>>>>> e.g.:
>>>>>> + *
>>>>>> + * /local/domain/X/backend/<domid>/<handle>/require
>>>>>> + * /local/domain/X/backend/<domid>/<handle>/require/multi-queue-
>>>>>> max-queues = "2"
>>>>>> + * /local/domain/X/backend/<domid>/<handle>/require/feature-no-csum-
>>>>>> offload = "1"
>>>>>> + *
>>>>>> + * In the example above, network backend will negotiate up to a maximum
>>>>>> of
>>>>>> + * two queues with frontend plus disabling IPv4 checksum offloading.
>>>>>> + *
>>>>>> + * This directory and its children entries shall only be visible to the 
>>>>>> backend.
>>>>>> + */
>>>>>> +
>>>>>
>>>>> What should happen if the toolstack sets something in 'require' that
>>>>> the backend cannot provide? I don't see anything in your RFC patches
>>>>> to check that the backend has responded appropriately to the keys.
>>>>
>>>> Hmm, you're right that this RFC doesn't handle that properly - but for the
>>>> ones the backend provide I had suggested (albeit not implemented here)
>>>> back in the other thread that we could compare the values of feature in
>>>> "require" with the one announced to the frontend. But well this wouldn't
>>>> cover the non-provided ones, and possibly would fall a bit as a hack.
>>>>
>>>> I could change the format of the entries within "require"
>>>> directory to be e.g. "<id>-<feature-name> = <feature-value>" and the
>>>> acknowledgement entry would come in the form "<id>-status
>>>> = <error code>". Consequently the lack of a "<id>-status" entry would
>>>> have a stronger semantic i.e. unsupported and ignored. The toolstack then 
>>>> would have
>>>> means to check whether the feature was really succesfull set as desired
>>>> or not. But then one question comes to mind: should the backend be
>>>> prevented to init in the event that the features requested fail to be
>>>> set? In which case uevent (on Linux) isn't triggered and xenbus state 
>>>> doesn't
>>>> get changed and toolstack would fail with timeout later on.
>>>>
>>>
>>> I think the backend should not proceed if it can't meet the
>>> requirements. But to be clear I also don't think the timeout behaviour
>>> should be used to determine if the setting is successful because it is
>>> asking other part of the system to pick up the slack and system
>>> administrators would be left in the dark (unless there is easily
>>> accessible message that can be obtained by libxl to return to system
>>> administrators).
>>
>> That timeout behaviour is already there *I think* (or maybe I have the wrong
>> impression)? The alternative is to trigger the uevent and add more logic on 
>> the
> 
> Yes, it is already there. Libxl will wait for the backend to change its
> state for X seconds.
> 
> The difference now is the system administrators can potentially easily
> trigger a timeout due to misconfiguration, while previously it is mostly
> due to the module not getting loaded or some other failures that are not
> the system administrators' fault.
> 
/nods

>> hotplug script to check if the parameters were set according to config, but 
>> OTOH
>> you add more complexity there. Or perhaps we can check that the backend set 
>> to
>> its state to Unknown (or some other state) and that determines the failure - 
>> but
>> still no uevent should be triggered. Unless you had something else in your 
>> mind?
> 
> On the other hand, I don't think adding uevent or whatever other logic
> is a good idea and it is a bit risky to rely on the state of driver to
> determine failure because we don't have a state machine that applies to
> all drivers. 

Agreed.

> We can probably specify a xenstore node in the spec to
> return some error code and let libxl read it. With that model old tools
> work the same (extra node ignored) but new tools can utilise the new
> node. IIRC there could already be some node that can be utilised --
> xenbus_dev_fatal writes message to xenstore, I think.
> 

I almost forgot about xenbus_dev_fatal(). It writes to an "error" entry in the
backend|frontend path with the errno plus error message. But it also changes the
device xenbus state to XenbusClosed. Taking into consideration your earlier
comment you probably meant xenbus_dev_error() instead? netback does allow
Initialising state to be directly into Closing, but others might not be the 
same.

> What do you think?

I like the idea of having a similar "error" entry in the config|require
directory following the same pattern as mentioned in the last paragraph.

Something like:

<backend|frontend path>/config/error = "<errno> <message>"

I would imagine this could be wrapper in a xenbus_config_fatal().

I had suggested a slightly more complicated version of it in a old reply to Paul
(at top of this message) with:

<backend|frontend path>/require/<id>-<feature-name> = "<feature-value>"
<backend|frontend path>/require/<id>-status = "<error code>"

But taking morphing it with your comment could also be something like:

<backend|frontend path>/config/<feature-name> = "<feature-value>"
<backend|frontend path>/config/<feature-name>/error = "<errno> <message>"

But either this option or the config/error global one depend on whether people
here prefer to deliver all configuration errors at once, or one global
"config/error" which would give the first entry with an error. The latter though
could lead to the sysadmin having to recreate the domain multiple times to
see/handle all the errors.

        Joao

P.S. s/require/config

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.