
[Xen-devel] Re: xen: IPI interrupts not resumed early enough on suspend/resume



On Mon, 2011-10-03 at 19:42 +0100, Thomas Gleixner wrote:
> On Mon, 3 Oct 2011, Ian Campbell wrote:
> > I can see a few options for how I might go about solving this in a
> > non-hacky way, which approach do you think would be preferable:
> 
> The question is whether you need to disable the IPI interrupt at
> all. If not, we have a flag for that.

We already use that flag for these (I think that is even why it was
added). The issue is that in the resuming domain on the other side,
event channels all start off masked and something needs to unmask them.

> >       * Add "IRQF_RESUME_EARLY", driven from syscore_resume, and use it
> >         for these interrupts.
> 
> That's the preferable solution, as we could use that for PPC as well,
> unless we can move stuff around, so we disable stuff later.

OK
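
As a rough illustration of that option, here is a small userspace model
(not kernel code, and only a sketch). Every name in it (the model_*
functions and MODEL_IRQF_RESUME_EARLY) is hypothetical and made up for
this example. It assumes only what is said above: all event channels
start off masked in the resumed domain, interrupts carrying the flag are
unmasked from the syscore resume stage, and everything else waits for
the later resume stage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, modelled on the IRQF_RESUME_EARLY proposal. */
#define MODEL_IRQF_RESUME_EARLY 0x1u

#define MODEL_NR_IRQS 4

struct model_irq {
    unsigned int flags;  /* MODEL_IRQF_* bits requested at bind time */
    bool masked;         /* true while the event channel is masked */
};

static struct model_irq model_irqs[MODEL_NR_IRQS];

/* Bind an interrupt with the given flags; it starts out unmasked. */
static void model_bind(size_t irq, unsigned int flags)
{
    assert(irq < MODEL_NR_IRQS);
    model_irqs[irq].flags = flags;
    model_irqs[irq].masked = false;
}

/* In the resumed domain every event channel starts off masked. */
static void model_domain_resumed(void)
{
    for (size_t i = 0; i < MODEL_NR_IRQS; i++)
        model_irqs[i].masked = true;
}

/* Unmask either the early-resume set or the remaining set. */
static void model_resume_irqs(bool want_early)
{
    for (size_t i = 0; i < MODEL_NR_IRQS; i++) {
        bool is_early =
            (model_irqs[i].flags & MODEL_IRQF_RESUME_EARLY) != 0;

        if (is_early == want_early)
            model_irqs[i].masked = false;
    }
}

/* Stand-in for the syscore resume stage: flagged interrupts only. */
static void model_syscore_resume(void)
{
    model_resume_irqs(true);
}

/* Stand-in for the later resume stage: everything not flagged. */
static void model_resume_noirq(void)
{
    model_resume_irqs(false);
}
```

In this model an IPI bound with MODEL_IRQF_RESUME_EARLY is deliverable
again as soon as model_syscore_resume() has run, while an ordinary
device interrupt stays masked until model_resume_noirq().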

> >       * register syscore ops for the Xen event channel subsystem to
> >         unmask the IPIs earlier (would probably look a lot like the code
> >         removed by 676dc3cf5bc3).
> 
> I'd like to avoid that.

Sure.

> >       * add syscore_ops to Xen smp subsystem to unmask the specific IPIs
> >         (which it binds at start of day) earlier.
> >       * push dpm_(suspend|resume)_noirq down into stop machine region
> 
> Where is stomp machine used?

It is used by the Xen PV suspend handler, which runs in that context in
order to quiesce the non-boot CPUs (which, unlike native, Xen does not
unplug).

> >       * use something other than stop_machine to quiesce system and move
> >         to cpu0 for suspend (doesn't seem sensible to reproduce that
> >         functionality).
> 
> We already shut down the nonboot cpus on suspend. We could do that
> _before_ we disable devices and the interrupts.

Xen PV suspend uses many of the PM/suspend core code paths, but it does
not include the step which shuts down the non-boot CPUs.

It was a while ago, but IIRC Xen used to unplug the secondary processors
and that was found to lead to larger latencies in the migration and
checkpointing cases (which at their core are a suspend/resume). The
disaster recovery folks in particular care about this latency, since
they want to do rolling checkpoints many times a second.

Ian.

>  
> Raphael ?
> 
> Thanks,
> 
>       tglx


