
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest



On 03/06/16 14:09, Lan, Tianyu wrote:
>
>
> On 6/3/2016 7:17 PM, Tian, Kevin wrote:
>>> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
>>> Sent: Friday, June 03, 2016 2:59 AM
>>>
>>> On 02/06/16 16:03, Lan, Tianyu wrote:
>>>> On 5/27/2016 4:19 PM, Lan Tianyu wrote:
>>>>> On 26/05/2016 19:35, Andrew Cooper wrote:
>>>>>> On 26/05/16 09:29, Lan Tianyu wrote:
>>>>>>
>>>>>> To be viable going forwards, any solution must work with
>>>>>> PVH/HVMLite as much as HVM.  This alone rules out qemu as a
>>>>>> viable option.
>>>>>>
>>>>>> From a design point of view, having Xen need to delegate to
>>>>>> qemu to inject an interrupt into a guest seems backwards.
>>>>>>
>>>>>
>>>>> Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu, so
>>>>> the qemu virtual iommu can't work for it. We would have to implement
>>>>> the virtual iommu in Xen itself, right?
>>>>>
>>>>>>
>>>>>> A whole lot of this would be easier to reason about if/when we get
>>>>>> a basic root port implementation in Xen, which is necessary for
>>>>>> HVMLite, and which will make the interaction with qemu rather
>>>>>> cleaner.  It is probably worth coordinating work in this area.
>>>>>
>>>>> The virtual iommu should also sit behind the basic root port in
>>>>> Xen, right?
>>>>>
>>>>>>
>>>>>> As for the individual issue of 288-vcpu support, there are already
>>>>>> issues with 64-vcpu guests at the moment. While it is certainly
>>>>>> fine to remove the hard limit at 255 vcpus, there is a lot of
>>>>>> other work required to even get 128-vcpu guests stable.
>>>>>
>>>>>
>>>>> Could you give some pointers to these issues? We are enabling
>>>>> support for more vcpus, and a guest can basically boot 255 vcpus
>>>>> without IR support. It's very helpful to learn about known issues.
>>>>>
>>>>> We will also add more tests for 128 vcpus into our regular testing
>>>>> to find related bugs. Increasing the max vcpu count to 255 should
>>>>> be a good start.
>>>>
>>>> Hi Andrew:
>>>> Could you give more input about the issues with 64 vcpus and what
>>>> needs to be done to make 128-vcpu guests stable? We hope to do
>>>> something to improve them.
>>>>
>>>> What's the progress of the PCI host bridge in Xen? In your opinion,
>>>> we should do that first, right? Thanks.
>>>
>>> Very sorry for the delay.
>>>
>>> There are multiple interacting issues here.  On the one side, it would
>>> be useful if we could have a central point of coordination on
>>> PVH/HVMLite work.  Roger - as the person who last did HVMLite work,
>>> would you mind organising that?
>>>
>>> For the qemu/xen interaction, the current state is woeful and a tangled
>>> mess.  I wish to ensure that we don't make any development decisions
>>> which makes the situation worse.
>>>
>>> In your case, the two motivations are quite different, so I would
>>> recommend dealing with them independently.
>>>
>>> IIRC, the issue with more than 255 cpus and interrupt remapping is
>>> that you can only use x2apic mode with more than 255 cpus, and IOAPIC
>>> RTEs can't be programmed to generate x2apic interrupts?  In principle,
>>> if you don't have an IOAPIC, are there any other issues to be
>>> considered?  What happens if you configure the LAPICs in x2apic mode,
>>> but have the IOAPIC deliver xapic interrupts?
>>
>> The key is the APIC ID. There is no modification to existing PCI MSI
>> and IOAPIC with the introduction of x2apic: PCI MSI/IOAPIC can only
>> send an interrupt message containing an 8-bit APIC ID, which cannot
>> address >255 cpus. Interrupt remapping supports a 32-bit APIC ID, so
>> it is necessary for enabling >255 cpus with x2apic mode.
>>
>> If the LAPIC is in x2apic mode while interrupt remapping is disabled,
>> the IOAPIC cannot deliver interrupts to all cpus in the system if
>> #cpus > 255.
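
To make that encoding limit concrete, here is a minimal sketch of the
xAPIC-format MSI address as described in the Intel SDM (constant names
are illustrative, not Xen's actual definitions):

    #include <stdint.h>

    #define MSI_ADDR_BASE        0xfee00000u /* fixed 0xFEExxxxx window */
    #define MSI_ADDR_DEST_SHIFT  12          /* bits 19:12 = Dest. ID   */

    static inline uint32_t msi_addr(uint8_t dest_apic_id)
    {
        /* The destination field is only 8 bits wide, so APIC IDs above
         * 255 simply cannot be encoded.  Interrupt remapping sidesteps
         * this by carrying a full 32-bit APIC ID in the IRTE instead. */
        return MSI_ADDR_BASE |
               ((uint32_t)dest_apic_id << MSI_ADDR_DEST_SHIFT);
    }
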
>
> Another key factor: the Linux kernel disables x2apic mode when the max
> APIC ID is > 255 and there is no interrupt remapping capability. The
> reason for this is what Kevin said, so booting >255 cpus relies on
> interrupt remapping.

That is an implementation decision of Linux, not an architectural
requirement.
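
For reference, a hedged paraphrase of that decision (cf.
try_to_enable_x2apic() in arch/x86/kernel/apic/apic.c; names are
simplified, not the literal kernel source):

    static void try_to_enable_x2apic_sketch(bool ir_supports_x2apic)
    {
        if (!ir_supports_x2apic && max_physical_apicid > 255) {
            /* Without remapping, MSI/IOAPIC can only target 8-bit APIC
             * IDs, so some cpus would be unreachable: stay in xapic. */
            x2apic_disable();
            return;
        }
        x2apic_enable();
    }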

We need to carefully distinguish the two (even if it doesn't affect the
planned outcome from Xen's point of view), as Linux is not the only
operating system we virtualise.


One interesting issue in this area is plain, no-frills HVMLite domains,
which have an LAPIC but no IOAPIC, as they have no legacy devices/PCI
bus/etc.  In this scenario, no vIOMMU would be required for x2apic mode,
even if the domain had >255 vcpus.
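
The reason the 8-bit limit doesn't bite there: in x2apic mode the ICR
becomes a single 64-bit MSR whose high half is a full 32-bit destination
APIC ID, so IPIs can reach every vcpu.  A minimal sketch, assuming a
wrmsrl()-style helper as found in the Xen and Linux sources:

    #define MSR_X2APIC_ICR 0x830 /* the ICR, as an MSR, in x2apic mode */

    static inline void x2apic_send_ipi(uint32_t dest_apic_id,
                                       uint8_t vector)
    {
        /* Destination in bits 63:32; the vector, with fixed delivery
         * mode, occupies the low half. */
        wrmsrl(MSR_X2APIC_ICR, ((uint64_t)dest_apic_id << 32) | vector);
    }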

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

