[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [xen-unstable-smoke test] 169781: regressions - FAIL



On Thu, 28 Apr 2022, Julien Grall wrote:
> On 28/04/2022 01:47, Stefano Stabellini wrote:
> > On Thu, 28 Apr 2022, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@xxxxxxxxxx>
> > > wrote
> > >        It seems to me that it is acceptable to allocate memory with
> > > interrupt
> > >        disabled during __init. I cannot see any drawbacks with it. I think
> > > we
> > >        should change the ASSERT to only trigger after __init: system_state
> > > ==
> > >        SYS_STATE_active.
> > > 
> > >        What do you think?
> > > 
> > > 
> > > This would solve the immediate problem but not the long term one (i.e cpu
> > > hotplug).
> > > 
> > > So I think it would be better to properly fix it right away.
> > 
> > Yeah, you are right about cpu hotplug. I think both statements are true:
> > 
> > - it is true that this is supposed to work with cpu hotplug and these
> >    functions might be directly affected by cpu hotplug (by a CPU coming
> >    online later on)
> > 
> > - it is also true that it might not make sense to ASSERT at __init time
> >    if IRQs are disabled. There might be other places, not affected by cpu
> >    hotplug, where we do memory allocation at __init time with IRQ
> >    disabled. It might still be a good idea to add the system_state ==
> >    SYS_STATE_active check in the ASSERT, not to solve this specific
> >    problem but to avoid other issues.
> 
> AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled *and*
> multiple CPUs running. So we can't generically relax the check.
> 
> Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS tests
> have passed. So it seems unnecessary to me to preemptively relax the check
> just for Arm.

It is good news that it works already (GICv3 aside) on ARM. If you
prefer not to relax it, I am OK with it (although it makes me a bit
worried about future breakages).

 
> > In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
> > implications of cpu hotplug for LPIs and GICv3 before. Do you envision
> > that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
> > the extra CPU comes online?
> 
> It is already called per-CPU. See gicv3_secondary_cpu_init() ->
> gicv3_cpu_init() -> gicv3_populate_rdist().

Got it, thanks!


> > Today gicv3_lpi_init_rdist is called based on the number of
> > rdist_regions without checking if the CPU is online or offline (I think ?)
> 
> The re-distributors are not banked and therefore accessible by everyone.
> However, in Xen case, each pCPU will only touch its own re-distributor (well
> aside TYPER to figure out the ID).
> 
> The loop in gicv3_populate_rdist() will walk throught all the
> re-distributor to find which one corresponds to the current pCPU. Once we
> found it, we will call gicv3_lpi_init_rdist() to fully initialize the
> re-distributor.
> 
> I don't think we want to populate the memory for each re-distributor in
> advance.

I agree.

Currently we do:

    start_secondary
        [...]
        gic_init_secondary_cpu()
            [...]
            gicv3_lpi_init_rdist()
        [...]
        local_irq_enable();

Which seems to be the right sequence to me. There must be an early boot
phase where interrupts are disabled on a CPU but memory allocations are
possible. If this was x86 with the tlbflush limitation, I would suggest
to have per-cpu memory mapping areas so that we don't have to do any
global tlb flushes with interrupts disabled.

On ARM, we don't have the tlbflush limitation so we could do that but we
wouldn't have much to gain from it.

Also, this seems to be a bit of a special case, because in general we
can move drivers initializations later after local_irq_enable(). But
this is the interrupt controller driver itself -- we cannot move it
after local_irq_enable().

So maybe an ad-hoc solution could be acceptable?

The only one I can think of is to check on system_state ==
SYS_STATE_active now. In the future for CPU hotplug we could have a
per-CPU system_state, like cpu_system_state, and do a similar check.

I am totally open to other ideas, I couldn't come up with anything
better at the moment.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.