To: Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Jeremy Fitzhardinge <Jeremy.Fitzhardinge@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: [Xen-devel] xen: IPI interrupts not resumed early enough on suspend/resume
From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Date: Mon, 3 Oct 2011 16:10:26 +0100
Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, linux-kernel <linux-kernel@xxxxxxxxxxxxxxx>
Organization: Citrix Systems, Inc.
Hi Thomas,

Recently I've been chasing an issue where a Xen guest will fail to
resume about 1 time in 100. I eventually managed to bisect this back to
676dc3cf5bc3 "xen: Use IRQF_FORCE_RESUME".

The Xen suspend procedure (drivers/xen/manage.c:do_suspend()) is roughly
(I've omitted some uninteresting parts) as follows:
  dpm_suspend_start()
  dpm_suspend_noirq()
  stop_machine()
   -> xen_suspend()
        syscore_suspend()
        HYPERVISOR_suspend() /* Hypercall, returns on resume */
        xen_irq_resume() /* Re-establishes evtchn<->irq bindings */
        syscore_resume()
  dpm_resume_noirq()
  dpm_resume_end()

The resume process appears to be coming to a halt at the end of the
stop_machine invocation of xen_suspend(), i.e. after syscore_resume()
but before dpm_resume_noirq().

Looking at the stack traces of all VCPUs when this happens, it appears
that they are all idle, which suggests we are missing an event that
would cause a reschedule out of the stop_machine thread back into the
suspending thread.

One of the effects of 676dc3cf5bc3 was to move the unmasking of the
timer and IPI interrupts from xen_irq_resume() (i.e. within the
stop_machine region) to dpm_resume_noirq() (i.e. outside the
stop_machine region). Since the IPI interrupts include the reschedule
IPI, I rather suspect that is the reason for the problem. I added a hack
to unmask the resched* IPIs at xen_irq_resume() time and so far it seems
to fix things, which backs up my gut feeling.
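
As a rough illustration of the ordering problem described above, here is
a small user-space C model (not kernel code). It assumes the losing case
of the race, where leaving the stop_machine region depends on the
reschedule IPI being delivered. The enum and function names are
hypothetical and exist only for this sketch.

```c
#include <stdbool.h>

/* Hypothetical model of the resume phases named in the mail. */
enum resume_phase {
    PHASE_XEN_IRQ_RESUME,    /* inside the stop_machine region  */
    PHASE_SYSCORE_RESUME,    /* inside the stop_machine region  */
    PHASE_DPM_RESUME_NOIRQ,  /* outside the stop_machine region */
    PHASE_COUNT
};

/*
 * Walk the resume phases in order. unmask_phase is the phase during
 * which the reschedule IPI gets unmasked. Assumption for this model:
 * getting from the stop_machine region to dpm_resume_noirq() needs a
 * reschedule IPI, so if it is still masked at that point every VCPU
 * sits idle and the resume never completes.
 */
bool resume_completes(enum resume_phase unmask_phase)
{
    bool resched_unmasked = false;

    for (int p = 0; p < PHASE_COUNT; p++) {
        /* Crossing out of the stop_machine region. */
        if (p == PHASE_DPM_RESUME_NOIRQ && !resched_unmasked)
            return false;   /* stuck: nothing triggers the reschedule */

        if (p == (int)unmask_phase)
            resched_unmasked = true;
    }
    return true;
}
```

In this model, unmasking during PHASE_XEN_IRQ_RESUME (the behaviour
before 676dc3cf5bc3, and of the hack) lets the resume finish, while
unmasking during PHASE_DPM_RESUME_NOIRQ does not. The real failure is a
race seen about 1 time in 100, which this deterministic model does not
capture.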

I can see a few options for how I might go about solving this in a
non-hacky way. Which approach do you think would be preferable?

      * Add "IRQF_RESUME_EARLY", driven from syscore_resume, and use it
        for these interrupts.
      * Register syscore ops for the Xen event channel subsystem to
        unmask the IPIs earlier (would probably look a lot like the code
        removed by 676dc3cf5bc3).
      * Add syscore_ops to the Xen smp subsystem to unmask the specific
        IPIs (which it binds at start of day) earlier.
      * Push dpm_(suspend|resume)_noirq down into the stop_machine
        region.
      * Use something other than stop_machine to quiesce the system and
        move to cpu0 for suspend (doesn't seem sensible to reproduce
        that functionality).
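
To make the syscore-based options a little more concrete, here is a
user-space C mock of the idea: a resume hook that is registered once and
then run from the syscore resume step, so the IPIs are unmasked while
still inside the stop_machine region. Everything prefixed model_ is a
hypothetical stand-in for this sketch. It only mimics the general shape
of the kernel's syscore_ops resume callback and is not the kernel API or
a proposed patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in shaped like a syscore ops entry (resume hook only). */
struct model_syscore_ops {
    void (*resume)(void);
    struct model_syscore_ops *next;
};

static struct model_syscore_ops *model_ops_list;

/* Stand-in for registering syscore ops at start of day. */
void model_register_syscore_ops(struct model_syscore_ops *ops)
{
    ops->next = model_ops_list;
    model_ops_list = ops;
}

/* Stand-in for syscore_resume(): run every registered resume hook. */
void model_syscore_resume(void)
{
    for (struct model_syscore_ops *ops = model_ops_list; ops; ops = ops->next)
        if (ops->resume)
            ops->resume();
}

/* Hypothetical per-IPI mask state; true means masked. */
enum model_ipi { MODEL_IPI_RESCHED, MODEL_IPI_CALLFUNC, MODEL_IPI_COUNT };
bool model_ipi_masked[MODEL_IPI_COUNT] = { true, true };

/* Resume hook: unmask the IPIs that were bound at start of day. */
void model_xen_smp_resume(void)
{
    for (int i = 0; i < MODEL_IPI_COUNT; i++)
        model_ipi_masked[i] = false;
}

struct model_syscore_ops model_xen_smp_syscore_ops = {
    .resume = model_xen_smp_resume,
};
```

In this mock, registering model_xen_smp_syscore_ops and then calling
model_syscore_resume() clears the mask on both IPIs before anything
outside the stop_machine region would run.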

Routing IPIs through the regular IRQ path seems a little bit unusual but
it looks like powerpc does something similar in smp_request_message_ipi
and mpic_request_ipis and that code uses the syscore approach. Does
applying that here too seem sane?

Any preference / advice?

Thanks,
Ian.

