[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v8 4/7] VT-d: Use one function to update both remapped and posted IRTE



>>> On 22.02.17 at 08:12, <chao.gao@xxxxxxxxx> wrote:
> On Wed, Feb 22, 2017 at 02:10:25AM -0700, Jan Beulich wrote:
>>>>> On 22.02.17 at 02:53, <chao.gao@xxxxxxxxx> wrote:
>>> On Tue, Nov 22, 2016 at 05:58:56PM +0800, Jan Beulich wrote:
>>>>> @@ -637,7 +657,23 @@ static int msi_msg_to_remap_entry(
>>>>>      remap_rte->address_hi = 0;
>>>>>      remap_rte->data = index - i;
>>>>>
>>>>> -    memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
>>>>> +    if ( !pi_desc )
>>>>> +        memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
>>>>> +    else
>>>>> +    {
>>>>> +        __uint128_t ret;
>>>>> +
>>>>> +        old_ire = *iremap_entry;
>>>>> +        ret = cmpxchg16b(iremap_entry, &old_ire, &new_ire);
>>>>> +
>>>>> +        /*
>>>>> +         * In the above, we use cmpxchg16 to atomically update the 
>>>>> 128-bit IRTE,
>>>>> +         * and the hardware cannot update the IRTE behind us, so the 
>>>>> return value
>>>>> +         * of cmpxchg16 should be the same as old_ire. This ASSERT 
>>>>> validate it.
>>>>> +         */
>>>>> +        ASSERT(ret == old_ire.val);
>>>>> +    }
>>>>
>>>>Could you remind me again please why posted format updates need
>>>>to use cmpxchg16b, but remapped format ones don't? (As a aside,
>>>>with the code structure as you have it you should move the old_irte
>>>>declaration here, or omit it altogether, as you could as well pass
>>>>*iremap_entry directly afaict.)
>>> 
>>> Before feng left, I have asked him about this question. He told me that
>>> the PDA field of posted format IRTE comprises of two parts:
>>> Posted Descritor Address High[127:96] and Low [63:38]. If we want to update
>>> PDA field, do it atomically or disable-update-enable. He also said, it had
>>> been confirmed that cmpxchg16b was supported on all intel platform with 
>>> VT-d PI.
>>> If we assume that updates to remapped format IRTE only is to update either
>>> 64 bit or high 64 bit (except initialition), two 64bit memory write 
>>> operations
>>> is enough. 
>>
>>Two 64-bit memory write operations? Where do you see them? I
>>only see memcpy(), which for the purposes of the code here is
>>supposed to be a black box.
> 
> Ok. I made a mistake here. In ioapic case, before update IRTE, the according
> IOAPIC RTE is masked. So, using a memcpy() is safe. In msi case, there is no
> mask operation. I think only using a memcpy() is unsafe. Do you think so?

memcpy() is unsafe in any event. I was actually trying to
understand why it's not always cmpxchg16b that gets used
here.

>>>>> @@ -668,7 +704,8 @@ int msi_msg_write_remap_rte(
>>>>>
>>>>>      drhd = pdev ? acpi_find_matched_drhd_unit(pdev)
>>>>>                  : hpet_to_drhd(msi_desc->hpet_id);
>>>>> -    return drhd ? msi_msg_to_remap_entry(drhd->iommu, pdev, msi_desc, 
>>>>> msg)
>>>>> +    return drhd ? msi_msg_to_remap_entry(drhd->iommu, pdev,
>>>>> +                                         msi_desc, msg, NULL, 0)
>>>>
>>>>Is this unconditional passing of NULL here really correct?
>>> 
>>> Since two parameters are added to this function, we should think about what
>>> the function does again. the last 2 parameters are optional.
>>> 
>>> If they are not present, just means a physical device driver changes its msi
>>> message. So it notifies iommu to do some changes to IRTE accordingly (the 
>>> driver doesn't
>>> know the format of the live IRTE). This is the case above.
>>> 
>>> If they are present, it means the msi should be delivered to the vcpu with 
>>> the
>>> vector num. To achieve that, the function replaces the old IRTE with a new
>>> posted format IRTE.
>>
>>I don't see how this answers my question. In fact it feels like you,
>>just like Feng, are making assumptions on the conditions under
>>which the function here is being called _at present_. You should,
>>however, make the function work correctly for all possible uses,
>>or add ASSERT()s to clearly expose issues with potential new,
>>future callers.
> 
> Ok. Your suggestion is very good. 
> Every caller tells the function to construct a centain format IRTE. Return to
> your question, I think it is defintely wrong to pass NULL unconditionally.
> We should  pass NULL before the msi is binded to a guest interrupt and pass 
> the
> destination vcpu and vector num after that. At this moment, I can't come up
> with a way to check whether the msi is binded to a guest interrupt and get the
> destination vcpu and vector num only through the struct pci_dev and struct
> msi_desc. Could you give me some suggestion on that or recommend a structure,
> you think, in which we can add some fields to record these information?  

I would hope all information is already available (i.e. stored
somewhere). And I have to admit that it is increasingly annoying
to have to repeatedly, after many weeks of silence, swap back
in all the context here. In particular, I don't have the original
patch in my mailbox anymore (it got purged automatically as it
seems), so recovering context is even more time consuming.

And then, pi_update_irte() is being called when the interrupt
gets bound. gvec, if it changed, would require a rebind, so the
request would again come that route. Perhaps the function
here simply needs to fail if undue changes are being requested
without all necessary data? As suggested before, even adding
ASSERT()s for places where you make assumptions on current
caller behavior would be okay. The only thing that's not okay in
my opinion is to make implicit/hidden assumptions.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.