
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through



>>> On 21.03.17 at 02:53, <xuquan8@xxxxxxxxxx> wrote:
> On March 20, 2017 3:35 PM, Jan Beulich wrote:
>>>>> On 20.03.17 at 02:58, <xuquan8@xxxxxxxxxx> wrote:
>>> On March 16, 2017 11:32 PM, Jan Beulich wrote:
>>>>>>> On 16.03.17 at 15:21, <xuquan8@xxxxxxxxxx> wrote:
>>>>> On March 16, 2017 10:06 PM, Jan Beulich wrote:
>>>>>>>>> On 16.03.17 at 14:55, <xuquan8@xxxxxxxxxx> wrote:
>>>>>>> I am trying to pass through a device with a large 8G BAR, such as
>>>>>>> an nvidia M60 (note 1, pci-e info below). It takes about
>>>>>>> '__15 seconds__' to update the 8G BAR in
>>>>>>> QEMU::xen_pt_region_update()..
>>>>>>> Specifically, it is xc_domain_memory_mapping() in
>>>>xen_pt_region_update().
>>>>>>>
>>>>>>> Digging into xc_domain_memory_mapping(), I find it mainly calls
>>>>>>> "do_domctl
>>>>>>> (…case XEN_DOMCTL_memory_mapping…)"
>>>>>>> to map the mmio region.. of course, the code comment under
>>>>>>> 'case XEN_DOMCTL_memory_mapping' already notes that this mapping
>>>>>>> could take a while.
>>>>>>>
>>>>>>> my questions:
>>>>>>> 1. could we make this mmio region mapping faster?
>>>>>>
>>>>>
>>>>> Thanks for your quick reply.
>>>>>
>>>>>>Yes, e.g. by using large (2M or 1G) pages. This has been on my todo
>>>>>>list for quite a while...
>>>>>>
>>>>>>> 2. if could not, does it limit by hardware performance?
>>>>>>
>>>>>>I'm afraid I don't understand the question. If you mean "Is it
>>>>>>limited by hw performance", then no, see above. If you mean "Does it
>>>>>>limit hw performance", then again no, I don't think so (other than
>>>>>>the effect of having more IOMMU translation levels than really
>>>>>>necessary for so large a region).
>>>>>>
>>>>>
>>>>> Sorry, my question was "Is it limited by hw performance"...
>>>>>
>>>>> I am still confused: why does this mmio mapping take so long?
>>>>> My guess is that it takes a lot of time to set up the p2m / iommu
>>>>> entries. That's why I asked "Is it limited by hw performance".
>>>>
>>>>Well, just count the number of page table entries and that of the
>>>>resulting hypercall continuations. It's the sheer amount of work
>>>>that's causing the slowness, together with the need for us to use
>>>>continuations to be on the safe side. There may well be redundant TLB
>>>>invalidations as well. Since we can do better (by using large
>>>>pages) I wouldn't call this "limited by hw performance", but of course
>>>>one may.
>>>>
>>>
>>> I agree.
>>> As far as I know, xen & qemu upstream do not support passing through
>>> a large-BAR (pci-e BAR > 4G) device such as the nvidia M60. However,
>>> cloud providers may want to leverage this feature for machine
>>> learning etc. Is it on your TODO list?
>>
>>Is what on my todo list?
> 
> support for passing through a large-BAR (pci-e BAR > 4G) device..
> 
>> I was assuming large BAR handling to work so far
>>(Konrad had done some adjustments there quite a while ago, from all I
>>recall).
>>
> 
> 
> _iirc_ what Konrad mentioned was using qemu-trad..

Quite possible (albeit my memory says hvmloader), but the qemu
side (trad or upstream) isn't my realm anyway.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

