
Re: [Xen-devel] xen-netback: make feature-rx-notify mandatory -- Breaks stubdoms



On 10/12/14 16:20, Ian Campbell wrote:
> On Wed, 2014-12-10 at 15:29 +0000, David Vrabel wrote:
>> On 10/12/14 15:07, Ian Campbell wrote:
>>> On Wed, 2014-12-10 at 14:12 +0000, David Vrabel wrote:
>>>> On 10/12/14 13:42, John wrote:
>>>>> David,
>>>>>
>>>>> This patch you put into 3.18.0 appears to break the latest version of
>>>>> stubdomains. I found this out today when I tried to update a machine
>>>>> to 3.18.0: all of the domUs crashed on start, with dmesg output like
>>>>> this:
>>>>
>>>> Cc'ing the lists and relevant netback maintainers.
>>>>
>>>> I guess the stubdoms are using minios's netfront?  This is something I
>>>> forgot about when deciding if it was ok to make this feature mandatory.
>>>
>>> Oh bum, me too :/
>>>
>>>> The patch cannot be reverted as it's a prerequisite for a critical
>>>> (security) bug fix.  I am also unconvinced that the no-feature-rx-notify
>>>> support worked correctly anyway.
>>>>
>>>> This can be resolved by:
>>>>
>>>> - Fixing minios's netfront to support feature-rx-notify. This should be
>>>> easy but wouldn't help existing Xen deployments.
>>>
>>> I think this is worth doing in its own right, but as you say it doesn't
>>> help existing users.
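
For reference, the minios side of that first option should just be one
more xenstore write during the handshake, next to the existing
tx/rx-ring-ref writes, mirroring what Linux's netfront advertises (plus
checking that the rx refill path actually notifies -- I'd need to look).
Untested sketch; the helper and the error-handling pattern are from
memory of minios's netfront.c:

  /* advertise feature-rx-notify alongside the other frontend features;
   * minios's xenbus_printf() returns an error string (NULL on success) */
  err = xenbus_printf(xbt, nodename, "feature-rx-notify", "%u", 1);
  if (err) {
      message = "writing feature-rx-notify";
      goto abort_transaction;
  }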
>>>
>>>> - Reimplement feature-rx-notify support.  I think the easiest way is to
>>>> queue packets on the guest Rx internal queue with a short expiry time.
>>>
>>> Right, I don't think we especially need to make this case good (so long
>>> as it doesn't reintroduce a security hole!).
>>>
>>> In principle we aren't really obliged to queue at all, but since the
>>> infrastructure for queuing and timing out already exists I suppose it
>>> would be simple enough to implement and a bit less harsh.
>>>
>>> Given we now have XENVIF_RX_QUEUE_BYTES and rx_drain_timeout_jiffies, we
>>> don't have the infinite queue any more. So does the expiry in this case
>>> actually need to be shorter than the norm? Does it cause any extra
>>> issues to keep them around for rx_drain_timeout_jiffies rather than some
>>> shorter time?
>>
>> If the internal guest rx queue fills and the (host) tx queue is stopped,
>> it will take rx_drain_timeout for the thread to wake up and notice if
>> the frontend placed any rx requests on the ring.  This could
>> potentially end up with you shovelling 512k through, stalling for 10 s,
>> putting another 512k through, stalling for 10 s again, and so on.
> 
> Ah, true, that's not so great.
> 
> What if we don't queue at all(*) when rx-notify isn't supported, i.e.
> just drop the packet on the floor in start_xmit if the ring is full?
> Would that be so bad? It would surely be simple...

There needs to be a queue between start_xmit and the rx thread, so
checking the ring state in start_xmit doesn't help here: the internal
queue can fill before the thread wakes and begins to drain it.
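
To make that concrete, the shape of the guest-rx path today is roughly
this (identifiers approximate, purely for illustration):

  /*
   *  xenvif_start_xmit()                   rx kthread
   *    queue skb on internal rx queue        wait for work / ring space
   *    kick the rx kthread            --->   move queued skbs onto the
   *                                          shared ring, notify frontend
   */

i.e. start_xmit only feeds the internal queue, and it's the kthread that
talks to the shared ring.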

netback could complete the request directly in start_xmit, avoiding the
internal queue at the cost of any batching, but I don't think it is a
good idea to add a different data path for this mode.
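
So I'd rather keep the one queue and just let packets expire off it.
Very rough sketch of the short-expiry idea -- the field and helper names
(rx_queue, expires, the drain timeout) are approximate, locking is
elided, and this isn't against any particular tree:

  /* start_xmit: stamp the skb before queueing it on the internal
   * guest-rx queue and kicking the rx kthread; for a frontend without
   * rx-notify the timeout would be whatever short expiry we pick */
  XENVIF_RX_CB(skb)->expires = jiffies + queue->vif->drain_timeout;

  /* rx kthread: before waiting for ring space, drop anything that has
   * sat on the internal queue for too long */
  while ((skb = skb_peek(&queue->rx_queue)) != NULL &&
         time_after(jiffies, XENVIF_RX_CB(skb)->expires)) {
          __skb_unlink(skb, &queue->rx_queue);
          kfree_skb(skb);
  }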

> (*) Not counting the "queue" which is the ring itself.
> 
>> The rx stall detection will also need to be disabled since there would
>> be no way for the frontend to signal rx ready.
> 
> Agreed.
> 
> Could be trivially argued to be safe if we were just dropping packets on
> ring overflow...
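
As for the stall detection, I think disabling it amounts to bailing out
of the stall check when the frontend didn't advertise the feature;
roughly (the flag name is made up):

  /* a frontend without feature-rx-notify has no way to signal that it
   * has posted new rx requests, so never treat its queue as stalled */
  if (!queue->vif->can_rx_notify)
          return false;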

David


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
