Xen project Mailing List

Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

From: SeongJae Park <sjpark@xxxxxxxxxx>

Date: Tue, 17 Dec 2019 08:59:32 +0100

Cc: axboe@xxxxxxxxx, linux-block@xxxxxxxxxxxxxxx, konrad.wilk@xxxxxxxxxx, pdurrant@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, SeongJae Park <sj38.park@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, roger.pau@xxxxxxxxxx

Delivery-date: Tue, 17 Dec 2019 08:00:29 +0000

Ironport-sdr: dQl3FfbwDFzEi5+WHu2uoScZ6eke6A7B/RL+g0Mrj1b2ivor+lL+j1uvfXSJe7e8020yBgpm9Y BP0Qpso9a5ng==

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß" <jgross@xxxxxxxx> wrote: > On 16.12.19 20:48, SeongJae Park wrote: > > On on, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote: > > > >> On 16.12.19 17:15, SeongJae Park wrote: > >>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park <sjpark@xxxxxxxxxx> > >>> wrote: > >>> > >>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park <sjpark@xxxxxxxxxx> > >>>> wrote: > >>>> > >>>>> From: SeongJae Park <sjpark@xxxxxxxxx> > >>>>> > >>> [...] > >>>>> --- a/drivers/block/xen-blkback/xenbus.c > >>>>> +++ b/drivers/block/xen-blkback/xenbus.c > >>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device > >>>>> *dev, > >>>>> } > >>>>> > >>>>> > >>>>> +/* Once a memory pressure is detected, squeeze free page pools for a > >>>>> while. */ > >>>>> +static unsigned int buffer_squeeze_duration_ms = 10; > >>>>> +module_param_named(buffer_squeeze_duration_ms, > >>>>> + buffer_squeeze_duration_ms, int, 0644); > >>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms, > >>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is > >>>>> detected"); > >>>>> + > >>>>> +/* > >>>>> + * Callback received when the memory pressure is detected. > >>>>> + */ > >>>>> +static void reclaim_memory(struct xenbus_device *dev) > >>>>> +{ > >>>>> + struct backend_info *be = dev_get_drvdata(&dev->dev); > >>>>> + > >>>>> + be->blkif->buffer_squeeze_end = jiffies + > >>>>> + msecs_to_jiffies(buffer_squeeze_duration_ms); > >>>> > >>>> This callback might race with 'xen_blkbk_probe()'. The race could > >>>> result in > >>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it > >>>> links > >>>> 'be' to the 'dev'. Please _don't merge_ this patch now! > >>>> > >>>> I will do more test and share results. Meanwhile, if you have any > >>>> opinion, > >>>> please let me know. > > > > I reduced system memory and attached bunch of devices in short time so that > > memory pressure occurs while device attachments are ongoing. Under this > > circumstance, I was able to see the race. > > > >>> > >>> Not only '->blkif', but 'be' itself also coule be a NULL. As similar > >>> concurrency issues could be in other drivers in their way, I suggest to > >>> change > >>> the reclaim callback ('->reclaim_memory') to be called for each driver > >>> instead > >>> of each device. Then, each driver could be able to deal with its > >>> concurrency > >>> issues by itself. > >> > >> Hmm, I don't like that. This would need to be changed back in case we > >> add per-guest quota. > > > > Extending this callback in that way would be still not too hard. We could > > use > > the argument to the callback. I would keep the argument of the callback to > > 'struct device *' as is, and will add a comment saying 'NULL' value of the > > argument means every devices. As an example, xenbus would pass NULL-ending > > array of the device pointers that need to free its resources. > > > > After seeing this race, I am now also thinking it could be better to > > delegate > > detailed control of each device to its driver, as some drivers have some > > complicated and unique relation with its devices. > > > >> > >> Wouldn't a get_device() before calling the callback and a put_device() > >> afterwards avoid that problem? > > > > I didn't used the reference count manipulation operations because other > > similar > > parts also didn't. But, if there is no implicit reference count guarantee, > > it > > seems those operations are indeed necessary. > > > > That said, as get/put operations only adjust the reference count, those will > > not make the callback to wait until the linking of the 'backend' and > > 'blkif' to > > the device (xen_blkbk_probe()) is finished. Thus, the race could still > > happen. > > Or, am I missing something? > > No, I think we need a xenbus lock per device which will need to be > taken in xen_blkbk_probe(), xenbus_dev_remove() and while calling the > callback. I also agree that locking should be used at last. But, as each driver manages its devices and resources in their way, it could have its unique race conditions. And, each unique race condition might have its unique efficient way to synchronize it. Therefore, I think the synchronization should be done by each driver, not by xenbus and thus we should make the callback to be called per-driver. Thanks, SeongJae Park > > > Juergen > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.