Re: [Xen-devel] [Hackathon minutes] PV block improvements



On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote:
> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > On Tue, 25 Jun 2013, Ian Campbell wrote:
> >> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
> >> > On 21/06/13 20:07, Matt Wilson wrote:
> >> > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> >> > >> Hello,
> >> > >>
> >> > >> While working on further block improvements I've found an issue with
> >> > >> persistent grants in blkfront.
> >> > >>
> >> > >> Persistent grants basically allocate grants and then they are never
> >> > >> released, so both blkfront and blkback keep using the same memory pages
> >> > >> for all the transactions.
> >> > >>
> >> > >> This is not a problem in blkback, because we can dynamically choose how
> >> > >> many grants we want to map. On the other hand, blkfront cannot remove
> >> > >> the access to those grants at any point, because blkfront doesn't know
> >> > >> if blkback has these grants mapped persistently or not.
> >> > >>
> >> > >> So if, for example, we start expanding the number of segments in indirect
> >> > >> requests to a value like 512 segments per request, blkfront will
> >> > >> probably try to persistently map 512*32+512 = 16896 grants per device.
> >> > >> That's many more grants than the current default, which is 32*256 = 8192
> >> > >> (if using grant tables v2). This can cause serious problems for other
> >> > >> interfaces inside the DomU, since blkfront basically starts hoarding all
> >> > >> possible grants, leaving other interfaces completely locked.
> >> > >
> >> > > Yikes.
> >> > >
> >> > >> I've been thinking about different ways to solve this, but so far I
> >> > >> haven't been able to find a nice solution:
> >> > >>
> >> > >> 1. Limit the number of persistent grants a blkfront instance can use,
> >> > >> let's say that only the first X used grants will be persistently mapped
> >> > >> by both blkfront and blkback, and if more grants are needed the previous
> >> > >> map/unmap will be used.
> >> > >
> >> > > I'm not thrilled with this option. It would likely introduce some
> >> > > significant performance variability, wouldn't it?
> >> >
> >> > Probably, and it will also be hard to distribute the available
> >> > grants across the different interfaces in a performance-sensible way,
> >> > especially given that once a grant is assigned to an interface it
> >> > cannot be returned to the pool of grants.
> >> >
> >> > So if we had two interfaces with very different usage (one very busy and
> >> > another one almost idle), and distributed the grants equally amongst
> >> > them, one would have a lot of unused grants while the other would suffer
> >> > from starvation.
> >>
> >> I do think we need to implement some sort of reclaim scheme, which
> >> probably does mean a specific request (per your #4). We simply can't
> >> have a device which once upon a time had high throughput but is now
> >> mostly idle continue to tie up all those grants.
> >>
> >> If you make the reuse of grants use an MRU scheme and reclaim the
> >> currently unused tail fairly infrequently and in large batches then the
> >> perf overhead should be minimal, I think.
> >>
> >> I also don't think I would discount the idea of using ephemeral grants
> >> to cover bursts so easily either, in fact it might fall out quite
> >> naturally from an MRU scheme? In that scheme bursting up is pretty cheap
> >> since grant map is relatively inexpensive, and recovering from the burst
> >> shouldn't be too expensive if you batch it. If it turns out to be not a
> >> burst but a sustained level of I/O then the MRU scheme would mean you
> >> wouldn't be recovering them.
> >>
> >> I also think there probably needs to be some tunable per device limit on
> >> the maximum persistent grants, perhaps minimum and maximum pool sizes
> >> tie in with an MRU scheme? If nothing else it gives the admin the
> >> ability to prioritise devices.
> >
> > If we introduce a reclaim call we have to be careful not to fall back
> > to a map/unmap scheme like we had before.
> >
> > The way I see it either these additional grants are useful or not.
> > In the first case we could just limit the maximum amount of persistent
> > grants and be done with it.
> > If they are not useful (they have been allocated for one very large
> > request and not used much after that), could we find a way to identify
> > unusually large requests and avoid using persistent grants for those?
> 
> Isn't it possible that these grants are useful for some periods of
> time, but not for others?  You wouldn't say, "Caching the disk data in
> main memory is either useful or not; if it is not useful (if it was
> allocated for one very large request and not used much after that), we
> should find a way to identify unusually large requests and avoid
> caching it."  If you're playing a movie, sure; but in most cases, the
> cache was useful for a time, then stopped being useful.  Treating the
> persistent grants the same way makes sense to me.

Right, this is what I was trying to suggest with the MRU scheme. If you
are using lots of grants and you keep on reusing them then they remain
persistent and don't get reclaimed. If you are not reusing them for a
while then they get reclaimed. If you make "for a while" big enough then
you should find you aren't unintentionally falling back to a map/unmap
scheme.
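
To make the idea concrete, here is a minimal sketch of such a pool as a toy
model in Python. It is not blkfront code and calls no Xen grant-table
interface. All names (PersistentGrantPool, max_persistent, idle_timeout,
use, reclaim) are hypothetical and chosen only for illustration. Grants stay
persistent while they keep being reused, grants idle for longer than the
timeout are released in one batch, and a per-device cap makes bursts fall
back to plain map/unmap.

```python
from collections import OrderedDict


class PersistentGrantPool:
    """Toy model of a per-device persistent-grant pool (hypothetical names)."""

    def __init__(self, max_persistent, idle_timeout):
        self.max_persistent = max_persistent  # assumed tunable per-device cap
        self.idle_timeout = idle_timeout      # the "for a while" threshold
        self._last_used = OrderedDict()       # gref -> last-use time, oldest first
        self.ephemeral_uses = 0               # requests served by map/unmap instead

    def use(self, gref, now):
        """Record I/O on `gref` at time `now`; return True if it is persistent."""
        if gref in self._last_used:
            self._last_used[gref] = now
            self._last_used.move_to_end(gref)  # most recently used moves to the end
            return True
        if len(self._last_used) < self.max_persistent:
            self._last_used[gref] = now        # new entries are also appended last
            return True
        self.ephemeral_uses += 1               # pool is full: cover the burst ephemerally
        return False

    def reclaim(self, now):
        """Release, in one batch, every grant idle for longer than idle_timeout."""
        reclaimed = []
        while self._last_used:
            gref, last = next(iter(self._last_used.items()))
            if now - last <= self.idle_timeout:
                break                          # all remaining entries are more recent
            self._last_used.popitem(last=False)
            reclaimed.append(gref)
        return reclaimed

    def __len__(self):
        return len(self._last_used)


# Usage: a burst of six grants against a cap of four, then a quiet period.
pool = PersistentGrantPool(max_persistent=4, idle_timeout=10)
for gref in range(6):
    pool.use(gref, now=0)        # grefs 0-3 become persistent, 4-5 are ephemeral
pool.use(0, now=8)               # grefs 0 and 1 keep being reused
pool.use(1, now=8)
reclaimed = pool.reclaim(now=15) # grefs 2 and 3 have been idle for 15 > 10
```

With a large idle_timeout and an infrequent reclaim pass, a sustained
workload never loses its grants, so it does not degrade into map/unmap.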


Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

