[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH for-4.13 2/2] x86/ioapic: don't use raw entry reads/writes in clear_IO_APIC_pin



On 08.11.2019 17:07, Roger Pau Monné  wrote:
> On Fri, Nov 08, 2019 at 11:16:26AM +0100, Jan Beulich wrote:
>> On 08.11.2019 10:27, Roger Pau Monné  wrote:
>>> On Thu, Nov 07, 2019 at 04:56:09PM +0100, Jan Beulich wrote:
>>>> On 07.11.2019 16:46, Roger Pau Monné  wrote:
>>>>> On Thu, Nov 07, 2019 at 04:28:56PM +0100, Jan Beulich wrote:
>>>>>> On 07.11.2019 16:06, Roger Pau Monne wrote:
>>>>>>> @@ -530,9 +530,9 @@ static void clear_IO_APIC_pin(unsigned int apic, 
>>>>>>> unsigned int pin)
>>>>>>>       */
>>>>>>>      memset(&entry, 0, sizeof(entry));
>>>>>>>      entry.mask = 1;
>>>>>>> -    __ioapic_write_entry(apic, pin, true, entry);
>>>>>>> +    __ioapic_write_entry(apic, pin, false, entry);
>>>>>>
>>>>>> I may be able to understand why this one can't use raw mode, but as
>>>>>> per above a better overall description is needed.
>>>>>
>>>>> Yes, this is the one that's actually incorrect, but see my reasoning
>>>>> below.
>>>>>
>>>>>>
>>>>>>> -    entry = __ioapic_read_entry(apic, pin, true);
>>>>>>> +    entry = __ioapic_read_entry(apic, pin, false);
>>>>>>>      if (entry.irr)
>>>>>>>          printk(KERN_ERR "IO-APIC%02x-%u: Unable to reset IRR\n",
>>>>>>>                 IO_APIC_ID(apic), pin);
>>>>>>
>>>>>> This read again shouldn't need conversion, as the IRR bit doesn't
>>>>>> get touched (I think) by the interrupt remapping code during the
>>>>>> translation it does.
>>>>>
>>>>> TBH, I think raw mode should only be used by the iommu code in order
>>>>> to setup the entries to point to the interrupt remapping table,
>>>>> everything else shouldn't be using raw mode. While it's true that some
>>>>> of the cases here are safe to use raw mode I would discourage it's
>>>>> usage as it can lead to issues, and this is not a performance critical
>>>>> path anyway.
>>>>
>>>> You also should take the other possible perspective - not using
>>>> raw mode means going through interrupt remapping logic, which
>>>> can (needlessly) trigger errors. I think you want to break the
>>>> patch into a necessary and an optional part. The optional part
>>>> should be discussed separately and deferred until after 4.13.
>>>
>>> IMO generic IO-APIC code has not business playing with raw entries
>>> when interrupt remapping is enabled, the layout of IO-APIC entries in
>>> that case is vendor-specific, and hence the generic IO-APIC code is
>>> not able to parse it.
>>>
>>> For example the code in clear_IO_APIC_pin modifies the mask or the
>>> trigger fields of RAW entries, is there any guarantee that those
>>> fields don't have different meanings/layout when interrupt remapping
>>> is enabled?
>>
>> From an abstract pov there's no such guarantee, but in practice
>> the meaning of the fields doesn't change. You make a good point
>> though nevertheless: For VT-d the trigger mode fields in RTE and
>> IRTE need to match, so the interrupt remapping code needs to see
>> the trigger mode change. See below for a possible alternative
>> patch.
>>
>>> I can split the specific bugfix into a separate patch, but IMO the
>>> code in clear_IO_APIC_pin is not safe.
>>
>> A change is needed, yes, but in particular because of the use of
>> the function from clear_IO_APIC(), in turn called from
>> disable_IO_APIC(), yet in turn used e.g. during emergency
>> shutdown after a crash, I'd like the function to do as simple
>> operations as possible, i.e. specifically avoid going through
>> interrupt remapping code (because its data structures may also
>> be corrupted at that point) unless really needed (hence the
>> alternative patch suggestion below).
> 
> Isn't just masking the entries fine when disabling the IO-APIC, or a
> full wipe of all the entries is required?

Just masking the RTE _should_ be fine (but you never know).

> I would be fine with having a maskall_IO_APIC function that reads
> entries in raw format, sets the mask bit and writes the entry back in
> raw format as long as it's annotated that this is done in order to
> limit as much a possible the chances of hitting corrupted data in the
> crash case.

Right, certainly something we may want to do for 4.14.

>> As an aside, iommu_crash_shutdown() - even if actually doing
>> something, i.e. disabling interrupt remapping - does _not_
>> cause the RTEs to be re-written in non-translated format.
> 
> I'm not sure whether that will work, it's possible that some entries
> can't be translated because they use x2APIC IDs, and thus don't fit in
> a non-translated IO-APIC entry.

And of course there's no expectation that interrupts would still
work, but any inspection (e.g. via dumping) of the RTEs would be
misleading at that point.

>> --- unstable.orig/xen/arch/x86/io_apic.c
>> +++ unstable/xen/arch/x86/io_apic.c
>> @@ -519,8 +519,9 @@ static void clear_IO_APIC_pin(unsigned i
>>      if (entry.irr) {
>>          /* Make sure the trigger mode is set to level. */
>>          if (!entry.trigger) {
>> +            entry = __ioapic_read_entry(apic, pin, false);
>>              entry.trigger = 1;
>> -            __ioapic_write_entry(apic, pin, TRUE, entry);
>> +            __ioapic_write_entry(apic, pin, false, entry);
>>          }
>>          __io_apic_eoi(apic, entry.vector, pin);
>>      }
>> @@ -530,7 +531,7 @@ static void clear_IO_APIC_pin(unsigned i
>>       */
>>      memset(&entry, 0, sizeof(entry));
>>      entry.mask = 1;
>> -    __ioapic_write_entry(apic, pin, TRUE, entry);
>> +    __ioapic_write_entry(apic, pin, false, entry);
>>  
>>      entry = __ioapic_read_entry(apic, pin, TRUE);
>>      if (entry.irr)
> 
> Well, this is certainly better than what's there currently, and should
> fix the issue reported by Sergey, albeit I still think checking the
> irr or the trigger fields of a raw entry when using interrupt
> remapping is not safe future wise.
> 
> There's no guarantee that future interrupt remapping implementations
> don't clobber the non-translated fields with different ones when using
> a remapped IO-APIC entry, and hence while this fixes the current issue
> at hand it seems fragile.
> 
> Anyway, I think this should be fixed ASAP, so if you are happy with
> this version that's fine for me. Do you want me to pick this up and
> rebase it on top of my TRUE/FALSE removal patch, or would you rather
> send it formally standalone?

I'd appreciate you making this a v2 of your original patch, ideally
with a further improved description.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.