Re: [Xen-devel] [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/backend ring I/O
On Mon, 2011-06-20 at 12:49 -0400, Ian Jackson wrote:
> Daniel Stodden writes ("[Xen-devel] [PATCH 1 of 1] xen-backwatch: Deal with
> broken frontend/backend ring I/O"):
> > Adds tool support to debug backends which expose I/O ring state in
> > sysfs. Currently supports /sys/devices/xen-backend/vbd-*-*/io_ring
> > nodes for block I/O, where implemented.
>
> Thanks.
>
> > Primary function is to observe ring state make progress over a period
> > of time, then report stuck message queue halves where pending
> > consumer/event are not moving.
>
> This seems to have only one entry in COMMANDS, "check". Is that
> right ?

The <command> thing should allow alternative ways to run it without
breaking existing deployments. I used to think about a 'daemon', but
then found that cron would likely do the job.

> And it doesn't seem to provide a way to specify a particular
> domain to look for ?

I briefly considered it initially, but after testing it just didn't
look so important anymore. :}

Presently, a

  # xen-ringwatch check -v
  RingWatch(vbd-1-51760/io_ring)[IDLE]: RingState(size=32, Req(prod=31, cons=31, event=32), Rsp(prod=31, pvt=31, event=32)): io: complete, req: complete, rsp: complete
  RingWatch(vbd-1-51712/io_ring)[BUSY]: RingState(size=32, Req(prod=143236466, cons=143236466, event=143236467), Rsp(prod=143236459, pvt=143236459, event=143236460)): io: pending, req: complete, rsp: complete

will dump the entire set of running backends, independent of state.

I should point out that there's no significant overhead involved, apart
from the wait period required to come to a conclusion. It's all
glob/read/write/wait, and all VBDs are watched in parallel. But even
with 50 VMs, I expected that at some point people would rather grep
the output instead.

Here's a sample crontab invocation:

  xen-ringwatch check -T 4 --kick | logger -p daemon.crit -t RINGWATCH-ALERT

This remains silent until it actually discovers some watched subset to
.kick(), and then outputs only those rings.

  Jun 20 13:26:59 localhost RINGWATCH-ALERT: RingWatch(vbd-1-51712/io_ring)[STCK]: RingState(size=32, Req(prod=146141561, cons=146141561, event=146141562), Rsp(prod=146141561, pvt=146141561, event=146141530)): io: complete, req: complete, rsp: pending

> I'm happy to take it as-is as it seems like a better-than-nothing tool
> but I just wanted to check I'd understood it, first.

Found that the patch I sent was missing cleanup in some spots (mainly a
program rename, and the verbose variable in __main__ ended up off by
one). Can I sneak in the update attached before you push it?

Also, I never tried the make install target. Does it look okay to you?

Cheers,
Daniel

Attachment: xen-ringwatch.diff

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
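
For readers who would rather post-process the output than grep it, here is a minimal Python sketch that parses the status lines quoted in the message above and compares two snapshots of the same ring. It is not taken from the patch: the regular expression is derived only from the sample lines shown in this message, and the names `Ring`, `parse_line`, `HALVES` and `unmoved_pending` are hypothetical. The rule "a half is flagged pending in both snapshots and none of its counters changed" is an assumed simplification of the "pending consumer/event are not moving" check described in the patch summary, not the tool's actual logic.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Pattern derived from the sample lines in this message (an assumption,
# not a documented output format). search() is used so that a syslog
# prefix such as "Jun 20 ... RINGWATCH-ALERT: " is tolerated.
LINE_RE = re.compile(
    r"RingWatch\((?P<ring>[^)]+)\)\[(?P<state>\w+)\]: "
    r"RingState\(size=(?P<size>\d+), "
    r"Req\(prod=(?P<req_prod>\d+), cons=(?P<req_cons>\d+), "
    r"event=(?P<req_event>\d+)\), "
    r"Rsp\(prod=(?P<rsp_prod>\d+), pvt=(?P<rsp_pvt>\d+), "
    r"event=(?P<rsp_event>\d+)\)\): "
    r"io: (?P<io>\w+), req: (?P<req>\w+), rsp: (?P<rsp>\w+)"
)

# Hypothetical grouping of counters per message queue half.
HALVES = {
    "req": ("req_prod", "req_cons", "req_event"),
    "rsp": ("rsp_prod", "rsp_pvt", "rsp_event"),
}


class Ring(NamedTuple):
    name: str                 # e.g. "vbd-1-51712/io_ring"
    state: str                # e.g. "IDLE", "BUSY", "STCK"
    counters: Dict[str, int]  # ring pointers as printed by the tool
    flags: Dict[str, str]     # "io"/"req"/"rsp" -> "pending" or "complete"


def parse_line(line: str) -> Optional[Ring]:
    """Parse one status line; return None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    g = m.groupdict()
    counters = {k: int(g[k]) for keys in HALVES.values() for k in keys}
    flags = {k: g[k] for k in ("io", "req", "rsp")}
    return Ring(g["ring"], g["state"], counters, flags)


def unmoved_pending(before: Ring, after: Ring) -> List[str]:
    """Return halves flagged pending in both snapshots with unchanged counters."""
    result = []
    for half, keys in HALVES.items():
        pending = (before.flags[half] == "pending"
                   and after.flags[half] == "pending")
        moved = any(before.counters[k] != after.counters[k] for k in keys)
        if pending and not moved:
            result.append(half)
    return result


if __name__ == "__main__":
    import sys
    # Print the ring name and state for every matching line on stdin.
    for raw in sys.stdin:
        ring = parse_line(raw)
        if ring is not None:
            print(ring.name, ring.state, ring.flags)
```

One possible use is to pipe `xen-ringwatch check -v` into the script, or to call `parse_line` on two syslog entries for the same ring taken some time apart and pass the results to `unmoved_pending`.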