[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 06/10] vpci: Make every domain handle its own BARs



On 13.11.2020 12:06, Julien Grall wrote:
> Hi Jan,
> 
> On 13/11/2020 10:53, Jan Beulich wrote:
>> On 13.11.2020 11:36, Julien Grall wrote:
>>> On 13/11/2020 10:25, Jan Beulich wrote:
>>>> On 13.11.2020 07:32, Oleksandr Andrushchenko wrote:
>>>>> On 11/12/20 4:46 PM, Roger Pau Monné wrote:
>>>>>> On Thu, Nov 12, 2020 at 01:16:10PM +0000, Oleksandr Andrushchenko wrote:
>>>>>>> On 11/12/20 11:40 AM, Roger Pau Monné wrote:
>>>>>>>> On Mon, Nov 09, 2020 at 02:50:27PM +0200, Oleksandr Andrushchenko 
>>>>>>>> wrote:
>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>>>>>> +static uint32_t bar_read_dispatch(const struct pci_dev *pdev, 
>>>>>>>>> unsigned int reg,
>>>>>>>>> +                                  void *data)
>>>>>>>>> +{
>>>>>>>>> +    struct vpci_bar *vbar, *bar = data;
>>>>>>>>> +
>>>>>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>>>>>> +        return bar_read_hwdom(pdev, reg, data);
>>>>>>>>> +
>>>>>>>>> +    vbar = get_vpci_bar(current->domain, pdev, bar->index);
>>>>>>>>> +    if ( !vbar )
>>>>>>>>> +        return ~0;
>>>>>>>>> +
>>>>>>>>> +    return bar_read_guest(pdev, reg, vbar);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void bar_write_dispatch(const struct pci_dev *pdev, unsigned 
>>>>>>>>> int reg,
>>>>>>>>> +                               uint32_t val, void *data)
>>>>>>>>> +{
>>>>>>>>> +    struct vpci_bar *bar = data;
>>>>>>>>> +
>>>>>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>>>>>> +        bar_write_hwdom(pdev, reg, val, data);
>>>>>>>>> +    else
>>>>>>>>> +    {
>>>>>>>>> +        struct vpci_bar *vbar = get_vpci_bar(current->domain, pdev, 
>>>>>>>>> bar->index);
>>>>>>>>> +
>>>>>>>>> +        if ( !vbar )
>>>>>>>>> +            return;
>>>>>>>>> +        bar_write_guest(pdev, reg, val, vbar);
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>> You should assign different handlers based on whether the domain that
>>>>>>>> has the device assigned is a domU or the hardware domain, rather than
>>>>>>>> doing the selection here.
>>>>>>> Hm, handlers are assigned once in init_bars and this function is only 
>>>>>>> called
>>>>>>>
>>>>>>> for hwdom, so there is no way I can do that for the guests. Hence, the 
>>>>>>> dispatcher
>>>>>> I think we might want to reset the vPCI handlers when a devices gets
>>>>>> assigned and deassigned.
>>>>>
>>>>> In ARM case init_bars is called too early: PCI device assignment is 
>>>>> currently
>>>>>
>>>>> initiated by Domain-0' kernel and is done *before* PCI devices are given 
>>>>> memory
>>>>>
>>>>> ranges and BARs assigned:
>>>>>
>>>>> [    0.429514] pci_bus 0000:00: root bus resource [bus 00-ff]
>>>>> [    0.429532] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff]
>>>>> [    0.429555] pci_bus 0000:00: root bus resource [mem 
>>>>> 0xfe200000-0xfe3fffff]
>>>>> [    0.429575] pci_bus 0000:00: root bus resource [mem 
>>>>> 0x30000000-0x37ffffff]
>>>>> [    0.429604] pci_bus 0000:00: root bus resource [mem 
>>>>> 0x38000000-0x3fffffff pref]
>>>>> [    0.429670] pci 0000:00:00.0: enabling Extended Tags
>>>>> [    0.453764] pci 0000:00:00.0: -------------------- 
>>>>> BUS_NOTIFY_ADD_DEVICE
>>>>>
>>>>> < init_bars >
>>>>>
>>>>> [    0.453793] pci 0000:00:00.0: -- IRQ 0
>>>>> [    0.458825] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X 
>>>>> might fail!
>>>>> [    0.471790] pci 0000:01:00.0: -------------------- 
>>>>> BUS_NOTIFY_ADD_DEVICE
>>>>>
>>>>> < init_bars >
>>>>>
>>>>> [    0.471821] pci 0000:01:00.0: -- IRQ 255
>>>>> [    0.476809] pci 0000:01:00.0: Failed to add - passthrough or MSI/MSI-X 
>>>>> might fail!
>>>>>
>>>>> < BAR assignments below >
>>>>>
>>>>> [    0.488233] pci 0000:00:00.0: BAR 14: assigned [mem 
>>>>> 0xfe200000-0xfe2fffff]
>>>>> [    0.488265] pci 0000:00:00.0: BAR 15: assigned [mem 
>>>>> 0x38000000-0x380fffff pref]
>>>>>
>>>>> In case of x86 this is pretty much ok as BARs are already in place, but 
>>>>> for ARM we
>>>>>
>>>>> need to take care and re-setup vPCI BARs for hwdom.
>>>>
>>>> Even on x86 there's no guarantee that all devices have their BARs set
>>>> up by firmware.
>>>>
>>>> In a subsequent reply you've suggested to move init_bars from "add" to
>>>> "assign", but I'm having trouble seeing what this would change: It's
>>>> not Dom0 controlling assignment (to itself), but Xen assigns the device
>>>> towards the end of pci_add_device().
>>>>
>>>>> Things are getting even more
>>>>>
>>>>> complicated if the host PCI bridge is not ECAM like, so you cannot set 
>>>>> mmio_handlers
>>>>>
>>>>> and trap hwdom's access to the config space to update BARs etc. This is 
>>>>> why I have that
>>>>>
>>>>> ugly hack for rcar_gen3 to read actual BARs for hwdom.
>>>>
>>>> How to config space accesses work there? The latest for MSI/MSI-X it'll
>>>> be imperative that Xen be able to intercept config space writes.
>>>
>>> I am not sure to understand your last sentence. Are you saying that we
>>> always need to trap access to MSI/MSI-X message in order to sanitize it?
>>>
>>> If one is using the GICv3 ITS (I haven't investigated other MSI
>>> controller), then I don't believe you need to sanitize the MSI/MSI-X
>>> message in most of the situation.
>>
>> Well, if it's fine for the guest to write arbitrary values to message
>> address and message data,
> 
> The message address would be the doorbell of the ITS that is usually 
> going through the IOMMU page-tables. Although, I am aware of a couple of 
> platforms where the doorbell access (among other address ranges 
> including P2P transaction) bypass the IOMMU. In this situation, we would 
> need a lot more work than just trapping the access.

When you say "The message address would be the doorbell of the ITS" am
I right in understanding that's the designated address to be put there?
What if the guest puts some random different address there?

> Regarding the message data, for the ITS this is an event ID. The HW will 
> then tag each message with the device ID (this prevents spoofing). The 
> tupple (device ID, event ID) is used by the ITS to decide where to 
> inject the event.
> 
> Whether other MSI controllers (e.g. GICv2m) have similar isolation 
> feature will be on the case by case basis.
> 
>> _and_ to arbitrarily enable/disable MSI / MSI-X,
>> then yes, no interception would be needed.
> The device would be owned by the guest, so I am not sure to understand 
> the exact problem of letting it enabling/disabling MSI/MSI-X. Do you 
> mind expanding your thoughts?

Question is - is Xen involved in any way in the handling of interrupts
from such a device? If not, then I guess full control can indeed be
left with the guest.

> Furthermore, you can also control which event is enabled/disabled at the 
> ITS level.

And that's something Xen controls? On x86 we don't have a 2nd level
of controls, so we need to merge Xen's and the guest's intentions in
software to know what to store in hardware.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.