
Re: S3 resume issue in xstate_init


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 17 Aug 2021 16:04:24 +0200
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Tue, 17 Aug 2021 14:04:37 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 17.08.2021 15:48, Marek Marczykowski-Górecki wrote:
> On Tue, Aug 17, 2021 at 02:29:20PM +0100, Andrew Cooper wrote:
>> On 17/08/2021 14:21, Jan Beulich wrote:
>>> On 17.08.2021 15:06, Andrew Cooper wrote:
>>>> Perhaps we want the cpu_down() logic to explicitly invalidate their
>>>> lazily cached values?
>>> I'd rather do this on the cpu_up() path (no point clobbering what may
>>> get further clobbered, and then perhaps not to a value of our liking),
>>> yet then we can really avoid doing this from a notifier and instead do
>>> it early enough in xstate_init() (taking care of XSS at the same time).
> 
> Funny you mention notifiers. Apparently the cpufreq driver does use one to
> initialize things, and fails to do so:
> 
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) CPU0: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
> (XEN) Enabling non-boot CPUs  ...
> (XEN) CPU1: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
> (XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d04024ad2b>] vcpu_runstate_get+0x153/0x244
> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: ffff830049667c50   rcx: 0000000000000001
> (XEN) rdx: 000000321d74d000   rsi: ffff830049667c50   rdi: ffff83025dcc0000
> (XEN) rbp: ffff830049667c40   rsp: ffff830049667c10   r8:  ffff83020511a820
> (XEN) r9:  ffff82d04057ef78   r10: 0180000000000000   r11: 8000000000000000
> (XEN) r12: ffff83025dcc0000   r13: ffff830205118c60   r14: 0000000000000001
> (XEN) r15: 0000000000000010   cr0: 000000008005003b   cr4: 00000000003526e0
> (XEN) cr3: 0000000049656000   cr2: 0000000000000028
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d04024ad2b> (vcpu_runstate_get+0x153/0x244):
> (XEN)  48 8b 14 ca 48 8b 04 02 <4c> 8b 70 28 e9 01 ff ff ff 4c 8d 3d dd 64 32 00
> (XEN) Xen stack trace from rsp=ffff830049667c10:
> (XEN)    0000000000000180 ffff83025dcbd410 ffff83020511bf30 ffff830205118c60
> (XEN)    0000000000000001 0000000000000010 ffff830049667c80 ffff82d04024ae73
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffff830049667cb8 ffff82d0402560a9
> (XEN)    ffff830205118320 0000000000000001 ffff83020511bf30 ffff83025dc7a6f0
> (XEN)    0000000000000000 ffff830049667d58 ffff82d040254cb1 00000001402e9f74
> (XEN)    0000000000000000 ffff830049667d10 ffff82d040224eda 000000000025dc81
> (XEN)    000000321d74d000 ffff82d040571278 0000000000000001 ffff830049667d28
> (XEN)    ffff82d040228b44 ffff82d0403102cf 0000000000000000 ffff82d0402283a4
> (XEN)    ffff82d040459688 ffff82d040459680 ffff82d040459240 0000000000000004
> (XEN)    0000000000000000 ffff830049667d68 ffff82d04025510e ffff830049667db0
> (XEN)    ffff82d040221ba4 0000000000000000 0000000000000001 0000000000000001
> (XEN)    0000000000000000 ffff830049667e00 0000000000000001 ffff82d04058a5c0
> (XEN)    ffff830049667dc8 ffff82d040203867 0000000000000001 ffff830049667df0
> (XEN)    ffff82d040203c51 ffff82d040459400 0000000000000001 0000000000000010
> (XEN)    ffff830049667e20 ffff82d040203e26 ffff830049667ef8 0000000000000000
> (XEN)    0000000000000003 0000000000000200 ffff830049667e50 ffff82d040270bac
> (XEN)    ffff83020116a640 ffff830258ff6000 0000000000000000 0000000000000000
> (XEN)    ffff830049667e70 ffff82d0402056aa ffff830258ff61b8 ffff82d0405701b0
> (XEN)    ffff830049667e88 ffff82d04022963c ffff82d0405701a0 ffff830049667eb8
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04024ad2b>] R vcpu_runstate_get+0x153/0x244
> (XEN)    [<ffff82d04024ae73>] F get_cpu_idle_time+0x57/0x59
> (XEN)    [<ffff82d0402560a9>] F cpufreq_statistic_init+0x191/0x210
> (XEN)    [<ffff82d040254cb1>] F cpufreq_add_cpu+0x3cc/0x5bb
> (XEN)    [<ffff82d04025510e>] F cpufreq.c#cpu_callback+0x27/0x32
> (XEN)    [<ffff82d040221ba4>] F notifier_call_chain+0x6c/0x96
> (XEN)    [<ffff82d040203867>] F cpu.c#cpu_notifier_call_chain+0x1b/0x36
> (XEN)    [<ffff82d040203c51>] F cpu_up+0xaf/0xc8
> (XEN)    [<ffff82d040203e26>] F enable_nonboot_cpus+0x6b/0x1f8
> (XEN)    [<ffff82d040270bac>] F power.c#enter_state_helper+0x152/0x60a
> (XEN)    [<ffff82d0402056aa>] F domain.c#continue_hypercall_tasklet_handler+0x4c/0xb9
> (XEN)    [<ffff82d04022963c>] F tasklet.c#do_tasklet_work+0x76/0xac
> (XEN)    [<ffff82d040229920>] F do_tasklet+0x58/0x8a
> (XEN)    [<ffff82d0402e6607>] F domain.c#idle_loop+0x74/0xdd
> (XEN) 
> (XEN) Pagetable walk from 0000000000000028:
> (XEN)  L4[0x000] = 000000025dce1063 ffffffffffffffff
> (XEN)  L3[0x000] = 000000025dce0063 ffffffffffffffff
> (XEN)  L2[0x000] = 000000025dcdf063 ffffffffffffffff
> (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN) 
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000028
> (XEN) ****************************************
> 
> This is after adding a brutal `this_cpu(xcr0) = 0` to xstate_init().

And presumably again only with "smt=0"? In any event, so that we don't mix
things up, may I ask you to start a new thread for this further issue?

Jan
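Reading the oops: the bytes at RIP, `4c 8b 70 28`, decode to `mov r14, [rax+0x28]`, and the register dump shows rax = 0, so the fault at cr2 = 0x28 is consistent with vcpu_runstate_get() dereferencing a NULL pointer at member offset 0x28 while the cpufreq cpu_up() notifier runs. The thread does not say which structure or member that is. The sketch below only illustrates the arithmetic; `struct hypothetical_obj` and `field_address()` are made up for illustration and are not Xen types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout chosen only so that 'field' sits at offset 0x28.
 * This is NOT a real Xen structure. */
struct hypothetical_obj {
    uint64_t pad[5];   /* 5 * 8 bytes = 0x28 */
    uint64_t field;    /* offset 0x28 */
};

/* Address that 'ptr->field' would access, computed without dereferencing.
 * With ptr == 0 (NULL) the result is just the member offset, which is why
 * a NULL-pointer access shows up as a small faulting address like 0x28. */
static uintptr_t field_address(uintptr_t ptr)
{
    return ptr + offsetof(struct hypothetical_obj, field);
}
```

A faulting address smaller than a page, together with an L1 entry of 0 in the pagetable walk, is the usual signature of such a NULL-plus-offset access.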

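A minimal sketch of the lazy-caching problem discussed earlier in the thread, under stated assumptions: XCR0 is modelled as a plain per-CPU array (`hw_xcr0`), the per-CPU cache as another array (`cached_xcr0`), and S3 as resetting the hardware value to 1 (x87 only, the architectural reset value). All names here are hypothetical and are not Xen's actual code. It shows why a stale cache causes the post-resume write to be skipped, and how invalidating the cache on the cpu_up() path (in an xstate_init()-like function, as suggested above) forces the write. XSS would need the same treatment but is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Simulated hardware XCR0 per CPU (hypothetical stand-in for the register). */
static uint64_t hw_xcr0[NR_CPUS];
/* Lazily cached copy, playing the role of a per-CPU this_cpu(xcr0). */
static uint64_t cached_xcr0[NR_CPUS];
/* Number of real register writes performed. */
static unsigned int hw_writes;

/* Write XCR0 only when the cache says the value differs. */
static void set_xcr0_lazy(unsigned int cpu, uint64_t val)
{
    if (cached_xcr0[cpu] == val)
        return;            /* skipped: trusts that hardware still holds val */
    hw_xcr0[cpu] = val;    /* stands in for XSETBV */
    hw_writes++;
    cached_xcr0[cpu] = val;
}

/* S3: hardware state is lost, the software cache is not. */
static void simulate_s3(unsigned int cpu)
{
    hw_xcr0[cpu] = 1;      /* reset value: x87 only */
}

/* Per-CPU init on the cpu_up() path. With invalidate set, the cache is
 * clobbered first so the following write cannot be skipped. */
static void xstate_init_sketch(unsigned int cpu, uint64_t wanted, bool invalidate)
{
    if (invalidate)
        cached_xcr0[cpu] = 0;
    set_xcr0_lazy(cpu, wanted);
}
```

Using 0 as the invalidation value works because XCR0 bit 0 (x87) must always be set, so 0 never equals a value the code actually wants to load. In the sketch, calling `xstate_init_sketch(1, 0x1f, false)` after `simulate_s3(1)` leaves the hardware at 1, while passing `true` restores 0x1f.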