[Xen-devel] [PATCH 0 of 1] Deal with broken frontend/backend ring I/O.
Hi.

After running this blkback patch (Don't let in-flight requests defer pending ones...)

http://lists.xensource.com/archives/html/xen-devel/2011-05/msg01968.html

for a while, I guess it's mostly been verified. Unfortunately, it also revealed a great potential to demo old guest bugs.

The 2.6.32 tree used to have a problem with lost notifications during IRQ handler migration, due to a glitch in the dynirq handler logic.

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commitdiff;h=c5783925493e315f91330241546da7915dcc46e3

Blkfront got fixed in stable/v2.6.32.y, but it looks as if at least RHEL6 didn't patch it (yet), so I suspect CentOS and derivatives suffer too.

Xen-blkfront is particularly sensitive to this. Some people seem to report around one or two incidents per week, presumably more on heavily loaded systems. (To reproduce, manually spinning the affinity mask under scattered I/O will trigger it almost immediately.)

That's going to increase. So let's learn to live with that.

The main issue is that even if you know what to blame, there's nothing in place to deal with it. I'd like to propose toolstack support which provides people with a workaround. With minimal kernel support, a watchdog can mostly live in userland, is easy to do, and won't need to clutter backend drivers.

This can hardly be considered a fix for what's essentially a guest problem. But it gives hosts a chance to automate guest recovery until there's an update.

Also, it's nice for debugging. Ring I/O and event races are a constant source of paranoia whenever guests appear to wedge, and I believe it might help to drastically reduce time spent on remote triage in some cases. It can also identify excessively blocking I/O (as opposed to a stuck message dispatch).

Some potential use cases:

- Run occasionally (cron). Alerting on production systems where guest OSes reside in a different administrative domain with no prospect of a quick fix. Might go into distros.

- More frequently, once the machine is known to host guests prone to error. There shouldn't be much of a performance impact anyway, but it might need to be tuned so it doesn't start spamming the console logs.

- Command line test. For people reporting I/O issues, wherever front/backend problems are suspected (or to rule them out). Or to aid driver hacking. Might also go in xen-bugtool.

I chose to drop it into tools/misc. It's rather standalone. It takes a sysfs patch to blkback. I didn't add netback support, but I guess that would look very similar if it ever becomes desirable.

Cheers,
Daniel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel