
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through



>>> On 20.03.17 at 02:58, <xuquan8@xxxxxxxxxx> wrote:
> On March 16, 2017 11:32 PM, Jan Beulich wrote:
>>>>> On 16.03.17 at 15:21, <xuquan8@xxxxxxxxxx> wrote:
>>> On March 16, 2017 10:06 PM, Jan Beulich wrote:
>>>>>>> On 16.03.17 at 14:55, <xuquan8@xxxxxxxxxx> wrote:
>>>>> I am trying to pass through a device with a large 8G BAR, such as an
>>>>> NVIDIA M60 (note1, PCIe info as below). It takes about '__15 seconds__'
>>>>> to update the 8G BAR in QEMU::xen_pt_region_update().
>>>>> Specifically, the time goes to xc_domain_memory_mapping(), called from
>>>>> xen_pt_region_update().
>>>>>
>>>>> Digging into xc_domain_memory_mapping(), I find it mainly calls
>>>>> "do_domctl
>>>>> (…case XEN_DOMCTL_memory_mapping…)"
>>>>> to map the MMIO region. Of course, I also found the code comment below
>>>>> 'case XEN_DOMCTL_memory_mapping' saying that this mapping could take a
>>>>> while.
>>>>>
>>>>> my questions:
>>>>> 1. Could we make this MMIO region mapping quicker?
>>>>
>>>
>>> Thanks for your quick reply.
>>>
>>>>Yes, e.g. by using large (2M or 1G) pages. This has been on my todo
>>>>list for quite a while...
>>>>
>>>>> 2. if could not, does it limit by hardware performance?
>>>>
>>>>I'm afraid I don't understand the question. If you mean "Is it limited
>>>>by hw performance", then no, see above. If you mean "Does it limit hw
>>>>performance", then again no, I don't think so (other than the effect
>>>>of having more IOMMU translation levels than really necessary for such
>>>>a large region).
>>>>
>>>
>>> Sorry, my question is "Is it limited by hw performance"...
>>>
>>> I am quite confused. Why does this MMIO mapping take a while?
>>> I guessed it takes a lot of time to set up the p2m / IOMMU entries. That's
>>> why I ask "Is it limited by hw performance".
>>
>>Well, just count the number of page table entries and that of the resulting
>>hypercall continuations. It's the sheer amount of work that's causing the
>>slowness, together with the need for us to use continuations to be on the safe
>>side. There may well be redundant TLB invalidations as well. Since we can do
>>better (by using large pages) I wouldn't call this "limited by hw
>>performance", but of course one may.
>>
> 
> I agree.
> As far as I know, upstream Xen and QEMU do not support passing through
> large-BAR devices (PCIe BAR > 4G), such as the NVIDIA M60.
> However, cloud providers may want to leverage this feature for machine
> learning, etc.
> Is it on your TODO list?

Is what on my todo list? I was assuming large BAR handling to work
so far (Konrad had done some adjustments there quite a while ago,
from all I recall).

Jan
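
[Editorial illustration of the "just count the number of page table entries" remark quoted above. This is only a rough sketch: the 8G BAR size is taken from the thread, the 4K / 2M / 1G page sizes are the usual x86-64 ones, and leaf_entries() is a hypothetical helper written for this note, not a Xen or libxc function. It assumes the region is aligned to the page size in question and ignores intermediate page-table levels and hypercall continuations.]

```python
# Sizes in bytes. The 8 GiB BAR size comes from the thread; the page sizes
# are the standard x86-64 ones (4 KiB, 2 MiB, 1 GiB).
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BAR_SIZE = 8 * GIB
PAGE_SIZES = {"4K": 4 * KIB, "2M": 2 * MIB, "1G": 1 * GIB}


def leaf_entries(region_size: int, page_size: int) -> int:
    """Number of leaf mappings needed to cover region_size.

    Hypothetical helper for illustration only (not a Xen API). Assumes the
    region starts on a page_size boundary; rounds up for partial pages.
    """
    return -(-region_size // page_size)  # ceiling division


if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        print(f"{name} pages: {leaf_entries(BAR_SIZE, size)} leaf entries")
```

[With 4K pages the 8G BAR needs 2,097,152 leaf entries, against 4,096 with 2M pages and 8 with 1G pages. Taking the reported figure of about 15 seconds at face value, that averages out to very roughly 7 microseconds per 4K mapping, which fits the "sheer amount of work" explanation rather than a hardware limit.]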

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel