Xen project Mailing List

RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and pending bit array

To: Jan Beulich <JBeulich@xxxxxxxxxx>

From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>

Date: Thu, 26 Aug 2010 16:41:41 +0800

Accept-language: en-US

Acceptlanguage: en-US

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Konrad

Delivery-date: Thu, 26 Aug 2010 01:42:53 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: ActE7ULIOgPEFaLoR5i0r0lhH94UCwAB/7lg

Thread-topic: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and pending bit array

>-----Original Message----- >From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx] >Sent: Thursday, August 26, 2010 3:07 PM >To: Jiang, Yunhong >Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Konrad Rzeszutek Wilk >Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X >table and >pending bit array > >>>> On 26.08.10 at 08:24, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote: >>>From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx >>>[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jan Beulich >>>An alternative would be to determine and insert the address ranges >>>earlier into mmio_ro_ranges, but that would require a hook in the >>>PCI config space writes, which is particularly problematic in case >>>MMCONFIG accesses are being used. >> >> I noticed you stated in your previous mail that this should be done in >> hypervisor, not tools. Is it because tools is not trusted by xen hypervisor? >> If tools can be trusted, is it possible to achieve this in tools: Tools tell >> xen hypervisor the MMIO range that is read-only to this guest after the guest >> is created, but before the domain is unpaused. > >Doing this from the tools would have the unfortunate side effect that >Dom0 (or the associated stubdom) would continue to need special >casing, which really it shouldn't for this purpose (and all the special >casing in the patch is really just because qemu wants to map writably >the ranges in question, and this can only be removed after the >hypervisor handling changed). Agree that doing this from tools can't remove dom0's write access. Maybe we can combine this together. Tools for normal guest, while your change to msix_capability_init() is for dom0, the flow is followed: 1) When xen boot, xen will scan all PCI hierarchy, and get the MSI-X address. Mark that range to dom0's mmio_ro_range. (Maybe a seperetaed list is needed to track which MMIO range to which device). 2) When a MSI-x interrupt is started, we check the corresponding BAR to see if the range is changed by dom0. If yes, update dom0's mmio_ro_range. We can also check if the assigned domain's mmio_ro_range cover this. 3) When tools create domain, tools will remove the mmio range for the guest. A bit over-complicated? > >Another aspect is that of necessity of denying write access to these >ranges: The guest must be prevented writing to them only once at Although preventing writing access is only needed when MSI-x interrupt enabled, but as your patch stated, we need consider how to handle mapping setup already before the MSI-x enabled (In fact, we are working on removing setup-already mapping for MCA purpose, although not sure if it is accepetable by upstream). Removing the access from beginning will avoid such clean-already-mapped requirement. >least one MSI-X interrupt got enabled. Plus (while not the case with >the current Linux implementation) a (non-Linux or future Linux) >version may choose to (re-)assign device resources only when the >device gets enabled, which would be after the guest was already >launched. I'm a bit confused. Are you talking about guest re-assign device resources or dom0? If you are talking about guest, I think that's easy to handle , and we should anyway do that. Especially it should not impact physical resource. If you are talking aboud dom0, I can't think out while guest enable device will cause re-assignment in dom0. > >>>A second alternative would be to require Dom0 to report all devices >>>(or at least all MSI-X capable ones) regardless of whether they would >>>be used by that domain, and do so after resources got determined/ >>>assigned for them (i.e. a second notification later than the one >>>currently happening from the PCI bus scan would be needed). >> >> Currently Xen knows about the PCI device's resource assignment already when >> system boot, since Xen have PCI information. The only situations that Xen may >> have no idea includes: a) Dom0 re-assign the device resource, may because of >> resource balance; b) VF device for SR-IOV. >> >> I think for situation a, IIRC, xen hypervisor can't handle it, because that >> means all shadow need be rebuild, the MSI information need be updated etc. > >Shadows and p2m table need updating in this case, but MSI information >doesn't afaict (it gets proagated only when the first interrupt is being >set up). Maybe I didn't state my point clearly. What I mean is, since xen hypervisor knows about PCI hierarchy, they can setup the mmio_ro_range when xen boot. The only concerns for this method is situation a and situation b, which will update the PCI device's resource assignment, while xen hypervisor have no idea and can't update mmio_ro_range. >>>@@ -653,7 +653,9 @@ _sh_propagate(struct vcpu *v, >>> } >>> >>> /* Read-only memory */ >>>- if ( p2m_is_readonly(p2mt) ) >>>+ if ( p2m_is_readonly(p2mt) || >>>+ (p2mt == p2m_mmio_direct && >>>+ rangeset_contains_singleton(mmio_ro_ranges, >mfn_x(target_mfn))) ) >>> sflags &= ~_PAGE_RW; >> >> Would it have performance impact if too much mmio rangeset and we need >> search it for each l1 entry update? Or you assume this range will not be >> updated so frequently? > >Yes, performance wise this isn't nice. > >> I'm not sure if we can add one more p2m type like p2m_mmio_ro? And expand >> the p2m_is_readonly to cover this also? PAE xen may have trouble for it, but >> at least it works for x86_64, and some wrap function with #ifdef X86_64 can >> handle the difference. > >That would be a possible alternative, but I'm afraid I wouldn't dare to >make such a change. Which change? Would following code works without much issue? if (p2m_is_readonly(p2mt) || p2m_is_ro_mmio(p2mt, mfn_t(target_mfn))) sflags &= ~_PAGE_RW; #ifdef __x86_64 #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty) |....|p2m_to_mask(p2m_ram_shared)|p2m_to_mask(p2m_mmio_ro) #define p2m_is_ro_mmio() 0 #else #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty) |....|p2m_to_mask(p2m_ram_shared) #define p2m_is_ro_mmio(_t, mfn) (p2mt == p2m_mmio_direct && (rangeset_contains_singleton(mmio_ro_ranges,>mfn_x(target_mfn))) #endif #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPE)) > >>>@@ -707,7 +810,7 @@ static int __pci_enable_msix(struct msi_ >>> return 0; >>> } >>> >>>- status = msix_capability_init(pdev, msi, desc); >>>+ status = msix_capability_init(pdev, msi, desc, nr_entries); >>> return status; >>> } >> >> As stated above, would it be possible to achive this in tools? >> Also if it is possible to place the mmio_ro_ranges to be per domain >> structure? > >As said above, doing it in the tools has down sides, but if done >there and if excluding/special-casing Dom0 (or the servicing >stubdom) it should be possible to make the ranges per-domain. I agree that dom0's writing access should also be avoided . The concern for global mmio_ro_ranges is, the ranges maybe a bit big, if several SR-IOV card populated, each support several VF. But I have no data how bit impact would it be. Thanks --jyh > >Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.