
Re: [Xen-devel] [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/backend ring I/O



On Mon, 2011-06-20 at 12:49 -0400, Ian Jackson wrote:
> Daniel Stodden writes ("[Xen-devel] [PATCH 1 of 1] xen-backwatch: Deal with 
> broken frontend/backend ring I/O"):
> > Adds tool support to debug backends which expose I/O ring state in
> > sysfs. Currently supports /sys/devices/xen-backend/vbd-*-*/io_ring
> > nodes for block I/O, where implemented.
> 
> Thanks.
> 
> > Primary function is to observe ring state make progress over a period
> > of time, then report stuck message queue halves where pending
> > consumer/event are not moving.
> 
> This seems to have only one entry in COMMANDS, "check".  Is that
> right ?  

The <command> argument is there so that alternative ways to run it can
be added later without breaking existing deployments. I originally
thought about a 'daemon' command, but then found that cron would likely
do the job.

> And it doesn't seem to provide a way to specify a particular
> domain to look for ?

I briefly considered it initially, but after testing it just didn't look
so important anymore. :}

Presently, a 

# xen-ringwatch check -v 
RingWatch(vbd-1-51760/io_ring)[IDLE]: RingState(size=32, Req(prod=31, cons=31, event=32), Rsp(prod=31, pvt=31, event=32)): io: complete, req: complete, rsp: complete
RingWatch(vbd-1-51712/io_ring)[BUSY]: RingState(size=32, Req(prod=143236466, cons=143236466, event=143236467), Rsp(prod=143236459, pvt=143236459, event=143236460)): io: pending, req: complete, rsp: complete

will dump the entire set of running backends, independent of state.

I should point out there's not really a significant overhead involved,
except for the wait period required to come to a conclusion. It's all
glob/read/write/wait, and all VBDs are watched in parallel. But even with
50 VMs, I anticipated that people would rather just grep the output
instead.
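
For anyone who wants more than grep, here is a minimal Python sketch of
filtering the check output by domain. It is not part of the patch. It
assumes each record arrives on a single line and that the device name
follows the vbd-<domid>-<devid> pattern seen in the sample above; the
helper names parse_watch_line and filter_domain are made up for
illustration.

```python
import re

# Assumed record prefix, taken from the sample output:
#   RingWatch(vbd-<domid>-<devid>/io_ring)[STATE]: ...
LINE_RE = re.compile(
    r"RingWatch\((?P<dev>vbd-(?P<domid>\d+)-(?P<devid>\d+))/io_ring\)"
    r"\[(?P<state>\w+)\]"
)


def parse_watch_line(line):
    """Return (domid, device, state) for a RingWatch record, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return int(m.group("domid")), m.group("dev"), m.group("state")


def filter_domain(lines, domid):
    """Yield only the records whose backend belongs to the given domain."""
    for line in lines:
        parsed = parse_watch_line(line)
        if parsed is not None and parsed[0] == domid:
            yield line
```

Feeding it sys.stdin from a pipe behind "xen-ringwatch check -v" would
give a per-domain view without touching the tool itself.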

Here's a sample crontab invocation:

xen-ringwatch check -T 4 --kick | logger -p daemon.crit -t RINGWATCH-ALERT

This will remain silent until it actually discovers some watched
subset to .kick(), and then outputs only those.

Jun 20 13:26:59 localhost RINGWATCH-ALERT: RingWatch(vbd-1-51712/io_ring)[STCK]: RingState(size=32, Req(prod=146141561, cons=146141561, event=146141562), Rsp(prod=146141561, pvt=146141561, event=146141530)): io: complete, req: complete, rsp: pending
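
To illustrate the idea of watching ring state over a period of time,
here is a rough Python sketch that parses two RingState samples and
reports halves that look pending and did not move in between. This is
not the xen-ringwatch code. The "pending" test (event <= prod) is only
a heuristic inferred from the sample records in this mail, and the names
parse_ring_state, half_pending and stuck_halves are hypothetical.

```python
import re

HALF_RE = re.compile(r"(Req|Rsp)\(([^)]*)\)")   # one queue half and its fields
FIELD_RE = re.compile(r"(\w+)=(\d+)")           # name=value pairs inside a half


def parse_ring_state(text):
    """Parse 'RingState(size=.., Req(..), Rsp(..))' into {'req': {...}, 'rsp': {...}}."""
    state = {}
    for name, body in HALF_RE.findall(text):
        state[name.lower()] = {k: int(v) for k, v in FIELD_RE.findall(body)}
    return state


def half_pending(half):
    # Assumption: the consumer sets 'event' one past what it has seen, so
    # event <= prod suggests unconsumed entries. Index wraparound is ignored.
    return half["event"] <= half["prod"]


def stuck_halves(before, after):
    """Return the halves that are pending and whose counters did not change."""
    return [
        name
        for name in ("req", "rsp")
        if half_pending(after[name]) and before[name] == after[name]
    ]
```

Applied to two identical copies of the STCK record above, this flags
only the response half, which matches the "rsp: pending" verdict in
that record.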

> I'm happy to take it as-is as it seems like a better-than-nothing tool
> but I just wanted to check I'd understood it, first.

Found that the patch I sent was missing cleanup in some spots (mainly a
program rename, and the verbose variable in __main__ ended up off by
one). Can I sneak in the update attached before you push it?

Also, I never tried the make install target. Does it look okay to you?

Cheers,
Daniel

Attachment: xen-ringwatch.diff
Description: Text Data

