
Re: [Xen-devel] [PATCH v3 for-next 2/3] x86/pt: enable binding of GSIs to a PVH Dom0



>>> On 31.05.17 at 16:29, <roger.pau@xxxxxxxxxx> wrote:
> On Tue, May 30, 2017 at 04:01:00AM -0600, Jan Beulich wrote:
>> >>> On 17.05.17 at 17:15, <roger.pau@xxxxxxxxxx> wrote:
>> > --- a/xen/arch/x86/hvm/irq.c
>> > +++ b/xen/arch/x86/hvm/irq.c
>> > @@ -126,6 +126,49 @@ void hvm_pci_intx_deassert(
>> >      spin_unlock(&d->arch.hvm_domain.irq_lock);
>> >  }
>> >  
>> > +void hvm_gsi_assert(struct domain *d, unsigned int gsi)
>> > +{
>> > +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
>> > +
>> > +    if ( gsi >= hvm_irq->nr_gsis )
>> > +    {
>> > +        ASSERT_UNREACHABLE();
>> > +        return;
>> > +    }
>> > +
>> > +    /*
>> > +     * __hvm_pci_intx_{de}assert uses an array to track the status of each
>> > +     * interrupt line, and Xen does the routing and GSI assertion based on
>> > +     * that. This prevents the same line from triggering multiple times,
>> > +     * which is not available here, and thus Xen needs to rely on
>> > +     * gsi_assert_count in order to know if the GSI is pending or not.
>> > +     */
>> 
>> The "which is not available here" part is at least confusing. I'm not
>> even sure whether the "which" is supposed to refer to the array or
>> something else, because you use the exact same array here.
> 
> I agree, it's confusing and badly worded. How about:
> 
> __hvm_pci_intx_{de}assert uses a bitfield in pci_intx.i to track the
> status of each interrupt line, and Xen does the routing and GSI
> assertion based on that. The value of the pci_intx.i bitmap prevents
> the same line from triggering multiple times, which is not available
> here, and thus Xen needs to rely on gsi_assert_count in order to know
> if the GSI is pending or not.

Well, it's better, but the "which is not available here" still leaves
open what the "which" refers to. How about "The value of the
pci_intx.i bitmap prevents the same line from triggering multiple
times. As we don't use that bitmap for the hardware domain, Xen
needs to ..."?

>> > --- a/xen/drivers/passthrough/io.c
>> > +++ b/xen/drivers/passthrough/io.c
>> > @@ -164,6 +164,20 @@ static void pt_irq_time_out(void *data)
>> >  
>> >      spin_lock(&irq_map->dom->event_lock);
>> >  
>> > +    if ( irq_map->flags & HVM_IRQ_DPCI_IDENTITY_GSI )
>> > +    {
>> > +        struct pirq *pirq = dpci_pirq(irq_map);
>> > +
>> > +        ASSERT(is_hardware_domain(irq_map->dom));
>> > +        /*
>> > +         * Identity mapped, no need to iterate over the guest GSI list to
>> > +         * find other pirqs sharing the same guest GSI.
>> > +         */
>> > +        irq_map->flags |= HVM_IRQ_DPCI_EOI_LATCH;
>> > +        hvm_gsi_deassert(irq_map->dom, pirq->pirq);
>> > +        goto out;
>> > +    }
>> > +
>> >      dpci = domain_get_irq_dpci(irq_map->dom);
>> >      if ( unlikely(!dpci) )
>> >      {
>> > @@ -185,6 +199,7 @@ static void pt_irq_time_out(void *data)
>> >          hvm_pci_intx_deassert(irq_map->dom, digl->device, digl->intx);
>> >      }
>> >  
>> > + out:
>> >      pt_pirq_iterate(irq_map->dom, pt_irq_guest_eoi, NULL);
>> 
>> With the 1:1 mapping, do you really need to go through
>> pt_pirq_iterate() here? I.e. can't you invoke pt_irq_guest_eoi()
>> just once and be done? Or really it probably can be even more
>> straight, as then there also is no point in setting the
>> HVM_IRQ_DPCI_EOI_LATCH flag, but you could rather do
>> directly what pt_irq_guest_eoi() does in its if() body. Otoh I may
>> be missing something here, as I can't see why the code is using
>> pt_pirq_iterate() even before your change.
> 
> I have to admit this is not obvious to me either (or I'm also missing
> something). There are too many translation layers and indirections in
> this code, together with a complete lack of comments.
> 
> Previous to my change, pt_irq_time_out iterates over the list of guest
> devices (digl) bound to a pirq, deasserts the interrupt lines and marks
> all the pirqs bound to the same guest GSI with the EOI_LATCH flag.
> Finally pt_irq_time_out iterates over the list of guest pirqs and
> clears (EOI) all the ones marked as EOI_LATCH.
> 
> I don't really understand the usefulness of the EOI_LATCH flag, can't
> pt_irq_time_out just call pt_irq_guest_eoi and avoid the iteration?
> Something like:
> 
> list_for_each_entry ( digl, &irq_map->digl_list, list )
> {
>     unsigned int guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx);
>     const struct hvm_girq_dpci_mapping *girq;
> 
>     list_for_each_entry ( girq, &dpci->girq[guest_gsi], list )
>     {
>         struct pirq *pirq = pirq_info(irq_map->dom, girq->machine_gsi);
> 
>         pirq_dpci(pirq)->masked = 0;
>         pirq_dpci(pirq)->pending = 0;
>         pirq_guest_eoi(dpci_pirq(pirq));
>     }
>     hvm_pci_intx_deassert(irq_map->dom, digl->device, digl->intx);
> }
> 
> I can of course also do something similar for the identity mapping
> case.

Well, that's basically what I did outline, yes. We may need to do
some archeology here to figure out whether the more indirect
mechanism was once really needed and the need then disappeared
without anyone noticing. I don't think the original author(s) of the
code would still be around to ask, but I didn't check (yet) who they are.

>> > @@ -472,7 +510,27 @@ int pt_irq_create_bind(
>> >                  pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
>> >                                     HVM_IRQ_DPCI_MACH_PCI |
>> >                                     HVM_IRQ_DPCI_GUEST_PCI;
>> > -                share = BIND_PIRQ__WILL_SHARE;
>> > +                if ( !is_hardware_domain(d) )
>> > +                    share = BIND_PIRQ__WILL_SHARE;
>> > +                else
>> > +                {
>> > +                    unsigned int pin;
>> > +                    struct hvm_vioapic *vioapic = gsi_vioapic(d, guest_gsi,
>> > +                                                              &pin);
>> > +
>> > +                    if ( !vioapic )
>> > +                    {
>> > +                        ASSERT_UNREACHABLE();
>> > +                        return -EINVAL;
>> > +                    }
>> > +                    pirq_dpci->flags |= HVM_IRQ_DPCI_IDENTITY_GSI;
>> > +                    /*
>> > +                     * Check if the corresponding vIO APIC pin is
>> > +                     * configured level or edge trigger, level triggered
>> > +                     * interrupts will be marked as shareable.
>> > +                     */
>> > +                    share = vioapic->redirtbl[pin].fields.trig_mode;
>> 
>> As pointed out during prior review (perhaps of another patch of
>> yours) the trigger mode bit is meaningless for masked RTEs. At
>> the very least an ASSERT() needs to be here for that reason,
>> of course provided masked entries can never be seen here.
> 
> pt_irq_create_bind for GSIs will only be called when the hardware
> domain unmasks the RTE, so an ASSERT is the right choice IMHO.

Yes, I think I did see this in later code, so an ASSERT() it is then.

>> > @@ -489,9 +547,15 @@ int pt_irq_create_bind(
>> >                   * IRQ_GUEST is not set. As such we can reset 'dom' directly.
>> >                   */
>> >                  pirq_dpci->dom = NULL;
>> > -                list_del(&girq->list);
>> > -                list_del(&digl->list);
>> > -                hvm_irq_dpci->link_cnt[link]--;
>> > +                if ( !is_hardware_domain(d) )
>> 
>> To be honest I'd prefer if you checked digl and/or girq against NULL
>> here, to avoid someone updating the condition above without
>> updating this one in lock step.
> 
> I've changed the condition to "girq && digl".

How about using || and then ASSERT()ing && inside the if() body?

>> > @@ -638,11 +712,15 @@ int pt_irq_destroy_bind(
>> >      if ( what && iommu_verbose )
>> >      {
>> >          unsigned int device = pt_irq_bind->u.pci.device;
>> > +        char buf[24] = "";
>> >  
>> > -        printk(XENLOG_G_INFO
>> > -               "d%d %s unmap: m_irq=%u dev=%02x:%02x.%u intx=%u\n",
>> > -               d->domain_id, what, machine_gsi, pt_irq_bind->u.pci.bus,
>> > -               PCI_SLOT(device), PCI_FUNC(device), pt_irq_bind->u.pci.intx);
>> > +        if ( !is_hardware_domain(d) )
>> > +            snprintf(buf, ARRAY_SIZE(buf), " dev=%02x.%02x.%u intx=%u",
>> > +                     pt_irq_bind->u.pci.bus, PCI_SLOT(device),
>> > +                     PCI_FUNC(device), pt_irq_bind->u.pci.intx);
>> 
>> Now that this supports Dom0, you also need to log the segment.
> 
> I'm not sure I follow you here. For the Dom0 case all the fields in
> u.pci.* are unused, since Xen does an identity mapping of the GSI, but
> it doesn't know which device it belongs to. That's different for the
> MSI case, but then these fields are not used anyway.

Oh, right, sorry for the noise.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

