
Re: [Xen-devel] [PATCH] arch: arm: vgic-v3: fix GICD_ISACTIVER range



Hi Stefano,

On 21/11/2019 18:50, Stefano Stabellini wrote:
On Tue, 19 Nov 2019, Julien Grall wrote:
Hi Andre,

On 12/11/2019 12:46, Andre Przywara wrote:
On Mon, 11 Nov 2019 11:01:07 -0800 (PST)
Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:
On Sat, 9 Nov 2019, Julien Grall wrote:
On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <sstabellini@xxxxxxxxxx>
wrote:
        On Thu, 7 Nov 2019, Peng Fan wrote:
        > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
        >
        > Signed-off-by: Peng Fan <peng.fan@xxxxxxx>

        Reviewed-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>


To be honest, I am not sure the code is correct. A read of those
registers should return the set of active interrupts. As we always
return 0, this will not reflect the correct state of the GIC.

I know that returning the list of active interrupts is complicated with
the old vGIC, but I don't think silently ignoring it is a good
idea.
The question here is why the guest accessed those registers. What is it
trying to figure out?

I see Linux querying the active state (IRQCHIP_STATE_ACTIVE) at two relevant
points for ARM:
- In kernel/irq/manage.c, in __synchronize_hardirq().
- In KVM's arch timer emulation code.

I think the latter is of no concern (yet), but the first might actually
trigger. At the moment it's beyond me what it actually does, but maybe some
IRQ changes (RT, threaded IRQs?) trigger this now?
It so happens that I am sitting right next to Marc now, so I had a chat with
him about this :). Let me try to summarize the discussion here.

__synchronize_hardirq() is used to ensure that all active interrupts have been
handled before continuing. When sync_chip == false, we only ensure that all
in-progress interrupts (from Linux's PoV) are handled before continuing.

sync_chip == true will additionally ensure that any interrupts that were
acknowledged but not yet marked as in progress by the kernel are also handled.
The assumption is that this is called after the interrupt has been
masked/disabled.

The call to __synchronize_hardirq() in free_irq() (as reported by Peng's
stack trace) was introduced recently (see [1]). It is not entirely clear
whether this would affect anyone using Linux 5.4 or just a limited subset of
users.

Anyhow, this is a genuine bug and I think returning 0 is only papering over
the bug with no long-term resolution. As Marc pointed out, even the old vGIC
in KVM was not spec-compliant, and thankfully this was fixed in the new vGIC.

As I said in a different sub-thread, I would reluctantly be OK with returning
0 as long as we add a warning for *every* access. Long-term, the current
vGIC should really get killed.

I appreciate your intention of raising awareness and getting help in
fixing things in the community, which is the right thing to do. However,
I am doubtful of the usefulness of a warning to achieve the goal. Maybe
it would be best to curate an "open issues" section somewhere, even just
as an email after every release or as part of the release notes, or as a
jira ticket, or a wikipage, or a document under docs/.

The state of brokenness of the vGIC has been documented numerous times on the ML (see [1]) and on JIRA (see XEN-92). And there are probably more issues not reported (Marc mentioned a new one during the week). Yet no one has ever looked at finishing the new vGIC.

So allow me to doubt that the documentation is going to help here...


Actually, an "open issues" document under docs/ could be a good idea for
this and other similar items.

A warning is a blunt instrument that lacks the subtlety necessary to
raise attention in the right way and achieve the goal of getting
help and contributions. In particular, it risks causing problems
and confusion for less knowledgeable users.  Maybe a dprintk warning
message (DEBUG builds only) could avoid some of the issues, while still
gaining the attention of savvy developers who could understand what it means.
But I think that the "open issues" document would be more effective.

See above, this is not the first time the state of the vGIC has been raised. While I agree this is a blunt instrument, all the other options have had no effect so far. So with this solution maybe someone will soon realize there is effort to be put into the core architecture of Xen.

Now, if you suggest a plan (a person, a date of completion...) for properly fixing the vGIC, then I would be more willing to accept the workaround suggested here.

Cheers,

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-02/msg00784.html

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

