
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through



On March 16, 2017 11:32 PM, Jan Beulich wrote:
>>>> On 16.03.17 at 15:21, <xuquan8@xxxxxxxxxx> wrote:
>> On March 16, 2017 10:06 PM, Jan Beulich wrote:
>>>>>> On 16.03.17 at 14:55, <xuquan8@xxxxxxxxxx> wrote:
>>>> I am trying to pass through a device with a large 8G BAR, such as an
>>>> NVIDIA M60 (note1, PCIe info as below). It takes about '__15 seconds__'
>>>> to update the 8G BAR in QEMU's xen_pt_region_update(). Specifically,
>>>> the time is spent in xc_domain_memory_mapping(), called from
>>>> xen_pt_region_update().
>>>>
>>>> Digging into xc_domain_memory_mapping(), I find that it mainly calls
>>>> "do_domctl
>>>> (…case XEN_DOMCTL_memory_mapping…)"
>>>> to map the MMIO region. Of course, I also found the code comment
>>>> below 'case XEN_DOMCTL_memory_mapping' saying that this mapping
>>>> could take a while.
>>>>
>>>> My questions:
>>>> 1. Could we make this MMIO region mapping quicker?
>>>
>>
>> Thanks for your quick reply.
>>
>>>Yes, e.g. by using large (2M or 1G) pages. This has been on my todo
>>>list for quite a while...
>>>
>>>> 2. if could not, does it limit by hardware performance?
>>>
>>>I'm afraid I don't understand the question. If you mean "Is it limited
>>>by hw performance", then no, see above. If you mean "Does it limit hw
>>>performance", then again no, I don't think so (other than the effect
>>>of having more IOMMU translation levels than really necessary for so
>>>large a region).
>>>
>>
>> Sorry, my question is "Is it limited by hw performance"...
>>
>> I am quite confused: why does this MMIO mapping take a while?
>> I guessed that it takes a lot of time to set up the p2m / IOMMU entries.
>> That's why I asked "Is it limited by hw performance".
>
>Well, just count the number of page table entries and that of the resulting
>hypercall continuations. It's the sheer amount of work that's causing the
>slowness, together with the need for us to use continuations to be on the safe
>side. There may well be redundant TLB invalidations as well. Since we can do
>better (by using large pages) I wouldn't call this "limited by hw performance",
>but of course one may.
>

I agree.
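To put rough numbers on "just count the number of page table entries", here is a back-of-the-envelope sketch. It assumes the usual x86 page sizes (4K, 2M, 1G) and a BAR that is aligned to the page size used; the helper name entries_needed is only for illustration and is not anything in Xen or QEMU.

```python
# Back-of-the-envelope count of leaf mappings needed to cover one BAR.
# Assumptions (not from Xen/QEMU source): x86 page sizes of 4 KiB, 2 MiB
# and 1 GiB, and a BAR aligned to the page size in use. Real code would
# also have to handle unaligned heads and tails.

KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30


def entries_needed(bar_bytes, page_bytes):
    """Number of leaf entries needed to map bar_bytes with page_bytes pages."""
    return -(-bar_bytes // page_bytes)  # ceiling division


bar = 8 * GIB  # the 8G BAR discussed in this thread
counts = {
    name: entries_needed(bar, size)
    for name, size in (("4K", 4 * KIB), ("2M", 2 * MIB), ("1G", GIB))
}

for name, n in counts.items():
    print(f"{name} pages: {n} entries")
```

With 4K pages that is 2097152 entries for the 8G BAR, against 4096 with 2M pages or 8 with 1G pages. Taking the 15 seconds I measured, this works out to roughly 7 microseconds per 4K entry on average, continuations included, so the total does look dominated by the amount of work rather than by any single slow step.
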
So far as I know, upstream Xen and QEMU don't support passing through a
device with a large BAR (PCIe BAR > 4G), such as the NVIDIA M60.
However, cloud providers may want to use this feature for machine learning,
etc.
Is it on your TODO list?

Quan










 

