WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] [Q] Device error handling discussion -- Was: Is qemu use

On Mon, 6 Oct 2008 10:28:26 +0800
"Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:

> Yuji Shimada <mailto:shimada-yxb@xxxxxxxxxxxxxxx> wrote:
> > On Fri, 26 Sep 2008 12:36:21 +0800
> > "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
> 
> I changed the subject to reflect what's discussed.
> 
> > We have to solve many difficulties to keep the guest domain running.
> >
> > How about the following idea as a first step?
> 
> Yes, agree.
> 
> >
> >    Non-fatal error on I/O device:
> >        - kill the domain with error source function.
> >        - reset the function.
> 
> From the following statement in PCI-E 2.0 section 6.6.2: "Note that Port
> state machines associated with Link functionality including those
> in the Physical and Data Link Layers are not reset by FLR", I'm not
> sure if FLR is the right method to handle the error situation. That's
> the reason I asked how to handle multiple-function devices.

I think a non-fatal error is a transaction-level error, so it does not
require resetting the lower layers. But I am not sure.

> >    Non-fatal error on PCI-PCI bridge.
> >        - kill all domains with the functions under the PCI-PCI bridge.
> >        - reset PCI-PCI bridge and secondary bus.
> >
> >    Fatal error:
> >        - kill all domains with the functions under the same root port.
> >        - reset the link (secondary bus reset on root port).
> 
> Agree. Basically I think the action of "reset PCI-PCI bridge and
> secondary bus" or "reset the link" has been done by AER core
> already. What we need to define is PCI back's error handler. As a first
> step, the error handler will trigger a domain reset; in the future, more
> elegant actions can be defined/implemented. Any ideas?

I basically agree with you.

However, the current AER core does not reset the PCI-PCI bridge and
secondary bus when a non-fatal error occurs on a PCI-PCI bridge. We need
to implement resetting of the PCI-PCI bridge and secondary bus.
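
As a rough illustration only, the first-step policy proposed in this
thread can be written down as a small decision function. This is a sketch
in Python, not Xen or Linux AER code: the names Severity, Source and
recovery_policy are hypothetical, and "reset the function" leaves open
whether FLR is the right mechanism, as questioned above.

```python
from enum import Enum


class Severity(Enum):
    NON_FATAL = "non-fatal"
    FATAL = "fatal"


class Source(Enum):
    IO_FUNCTION = "I/O device function"
    PCI_BRIDGE = "PCI-PCI bridge"


def recovery_policy(severity, source):
    """Return (domains to kill, reset action) for a reported error.

    Hypothetical helper: it only restates the first-step proposal from
    this thread and is not taken from the Xen or Linux AER code.
    """
    if severity is Severity.FATAL:
        # Fatal errors are handled per root port, whatever the source.
        return ("all domains with functions under the same root port",
                "secondary bus reset on the root port (link reset)")
    if source is Source.PCI_BRIDGE:
        # Non-fatal error reported by a PCI-PCI bridge.
        return ("all domains with functions under the PCI-PCI bridge",
                "reset the PCI-PCI bridge and its secondary bus")
    # Non-fatal error reported by an I/O device function.
    return ("the domain that owns the error source function",
            "reset the function")
```

For example, recovery_policy(Severity.NON_FATAL, Source.PCI_BRIDGE)
selects the case that, as noted above, the AER core does not yet cover.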

> >
> > Note: we have to consider preventing the device from destroying other
> > domains' memory.
> 
> Why should we consider destroying other domains' memory? I think VT-d
> should guarantee this.

The device is re-assigned to dom0 when the HVM domain is destroyed. If we
destroy the domain before resetting the device, the I/O device can write
to dom0's memory. On the other hand, we have to stop the guest software
before resetting the device, to prevent the guest software from accessing
the device.
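
The two ordering constraints above (stop the guest before the reset, and
reset before the domain is destroyed) can be checked mechanically. The
following Python sketch is illustrative only: the step names and the
violations helper are hypothetical, and it assumes all three steps appear
in the proposed sequence.

```python
# Ordering implied by the two constraints: stop guest, reset, then destroy.
SAFE_ORDER = ("pause_guest", "reset_device", "destroy_domain")


def violations(steps):
    """Return the constraints that a proposed step ordering breaks.

    Hypothetical helper; `steps` must contain all three step names.
    """
    pos = {name: i for i, name in enumerate(steps)}
    problems = []
    if pos["pause_guest"] > pos["reset_device"]:
        # Guest still runs while the device is being reset.
        problems.append("guest software can access the device during reset")
    if pos["destroy_domain"] < pos["reset_device"]:
        # Destroying the domain re-assigns the device to dom0 before reset.
        problems.append("un-reset device can write to dom0 memory")
    return problems
```

With this, violations(SAFE_ORDER) returns an empty list, while an ordering
that destroys the domain first reports the dom0 memory problem.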


By the way, do you have any plans to implement these functions?
I can provide ideas, but I can't provide code.

Thanks,
--
Yuji Shimada

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel