
Re: Kernel panic in __pci_enable_msix_range on Xen PV with PCI passthrough


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 25 Aug 2021 17:33:54 +0200
  • Cc: linux-pci@xxxxxxxxxxxxxxx, stable@xxxxxxxxxxxxxxx, regressions@xxxxxxxxxxxxxxxx, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
  • Delivery-date: Wed, 25 Aug 2021 15:34:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25.08.2021 17:24, Marek Marczykowski-Górecki wrote:
> On recent kernel I get kernel panic when starting a Xen PV domain with
> PCI devices assigned. This happens on 5.10.60 (worked on .54) and
> 5.4.142 (worked on .136): 
> 
> [   13.683009] pcifront pci-0: claiming resource 0000:00:00.0/0
> [   13.683042] pcifront pci-0: claiming resource 0000:00:00.0/1
> [   13.683049] pcifront pci-0: claiming resource 0000:00:00.0/2
> [   13.683055] pcifront pci-0: claiming resource 0000:00:00.0/3
> [   13.683061] pcifront pci-0: claiming resource 0000:00:00.0/6
> [   14.036142] e1000e: Intel(R) PRO/1000 Network Driver
> [   14.036179] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> [   14.036982] e1000e 0000:00:00.0: Xen PCI mapped GSI11 to IRQ13
> [   14.044561] e1000e 0000:00:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> [   14.045188] BUG: unable to handle page fault for address: ffffc9004069100c
> [   14.045197] #PF: supervisor write access in kernel mode
> [   14.045202] #PF: error_code(0x0003) - permissions violation
> [   14.045211] PGD 18f1c067 P4D 18f1c067 PUD 4dbd067 PMD 4fba067 PTE 80100000febd4075

I'm curious what lives at physical address FEBD4000. The maximum verbosity
hypervisor log may also have a hint as to why this is a read-only PTE.
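
For reference, the PTE value printed in the oops can be decoded by hand.
The sketch below is only an illustration, assuming the standard x86-64
4-level paging bit layout; the helper name decode_pte is made up for this
example and is not a kernel or Xen function. It shows where FEBD4000
comes from and that the writable bit is clear, which matches the
"permissions violation" on a write.

```python
# Minimal sketch: decode an x86-64 page table entry as printed in the oops.
# Bit positions follow the architectural 4-level paging layout; the helper
# name is hypothetical and exists only for this illustration.
PTE_FLAG_BITS = {
    "present": 0,
    "writable": 1,
    "user": 2,
    "pwt": 3,
    "pcd": 4,
    "accessed": 5,
    "dirty": 6,
    "pat": 7,
    "global": 8,
    "nx": 63,
}
PTE_ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51 hold the frame address


def decode_pte(pte):
    """Return (address, flags) for a raw 64-bit PTE value."""
    flags = {name: bool((pte >> bit) & 1) for name, bit in PTE_FLAG_BITS.items()}
    return pte & PTE_ADDR_MASK, flags


phys, flags = decode_pte(0x80100000FEBD4075)
print(hex(phys))
print(flags)
```

Bit 52, which is also set in this value, is one of the software-available
bits and is deliberately left out of the table above.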

> [   14.045227] Oops: 0003 [#1] SMP NOPTI
> [   14.045234] CPU: 0 PID: 234 Comm: kworker/0:2 Tainted: G        W         5.14.0-rc7-1.fc32.qubes.x86_64 #15
> [   14.045245] Workqueue: events work_for_cpu_fn
> [   14.045259] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
> [   14.045271] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
> [   14.045284] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
> [   14.045290] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
> [   14.045296] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
> [   14.045302] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
> [   14.045308] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
> [   14.045313] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
> [   14.045393] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
> [   14.045401] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   14.045407] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
> [   14.045420] Call Trace:
> [   14.045431]  e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
> [   14.045479]  e1000_probe+0x41f/0xdb0 [e1000e]

Otoh, from this it's pretty clear it's not a device Xen may have found
a need to access for its own purposes. If the aforementioned address
covers (or is adjacent to) the MSI-X table of a device driven by this
driver, then it would also be helpful to know how many MSI-X entries the
device reports its table can have.
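
As a rough cross-check, the register dump can be read against the MSI-X
table layout. The sketch below is an illustration only. The constants
mirror the PCI_MSIX_* definitions in Linux's pci_regs.h, but treating RDI
as the mapped table base, RAX as the first Vector Control word written,
and RCX as the loop's end bound is an assumption drawn from the dump, not
something confirmed in this thread. The helper names are hypothetical.

```python
# Minimal sketch: relate the oops registers to the MSI-X table layout.
# Constants mirror Linux's include/uapi/linux/pci_regs.h; the register
# interpretation (RDI = base, RAX = first write, RCX = end bound) is an
# assumption made for this illustration.
PCI_MSIX_FLAGS_QSIZE = 0x07FF    # Message Control: table size field (N - 1)
PCI_MSIX_ENTRY_SIZE = 16         # bytes per MSI-X table entry
PCI_MSIX_ENTRY_VECTOR_CTRL = 12  # offset of Vector Control within an entry


def msix_table_size(control):
    """Number of table entries encoded in the Message Control register."""
    return (control & PCI_MSIX_FLAGS_QSIZE) + 1


def vector_ctrl_addresses(base, nr_entries):
    """Addresses of the Vector Control word of each entry."""
    return [
        base + i * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL
        for i in range(nr_entries)
    ]


def entries_from_loop_bounds(first, end):
    """Entries covered by a loop stepping one table entry at a time."""
    return (end - first) // PCI_MSIX_ENTRY_SIZE


BASE = 0xFFFFC90040691000   # RDI in the dump
FIRST = 0xFFFFC9004069100C  # RAX, which is also the faulting address
END = 0xFFFFC9004069105C    # RCX, assumed to be the loop's end bound

nr_entries = entries_from_loop_bounds(FIRST, END)
addrs = vector_ctrl_addresses(BASE, nr_entries)
print(nr_entries, [hex(a) for a in addrs])
```

Under that reading the fault hits the very first Vector Control write,
and the entry count derived here is only as good as the assumption about
RCX; the device's Message Control register remains the authoritative
source.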

Jan

> [   14.045506]  local_pci_probe+0x42/0x80
> [   14.045515]  work_for_cpu_fn+0x16/0x20
> [   14.045522]  process_one_work+0x1ec/0x390
> [   14.045529]  worker_thread+0x53/0x3e0
> [   14.045534]  ? process_one_work+0x390/0x390
> [   14.045540]  kthread+0x127/0x150
> [   14.045548]  ? set_kthread_struct+0x40/0x40
> [   14.045554]  ret_from_fork+0x22/0x30
> [   14.045565] Modules linked in: e1000e(+) edac_mce_amd rfkill xen_pcifront pcspkr xt_REDIRECT ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack iptable_filter iptable_mangle iptable_raw xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn fuse drm bpf_preload ip_tables overlay xen_blkfront
> [   14.045620] CR2: ffffc9004069100c
> [   14.045627] ---[ end trace 307f5bb3bd9f30b4 ]---
> [   14.045632] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
> [   14.045640] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
> [   14.045652] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
> [   14.045657] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
> [   14.045663] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
> [   14.045668] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
> [   14.045674] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
> [   14.045679] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
> [   14.045698] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
> [   14.045706] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   14.045711] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
> [   14.045718] Kernel panic - not syncing: Fatal exception
> [   14.045726] Kernel Offset: disabled
> 
> I've bisected it down to this commit:
> 
>     commit 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f
>     Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>     Date:   Thu Jul 29 23:51:41 2021 +0200
> 
>         PCI/MSI: Mask all unused MSI-X entries
> 
> I can reliably reproduce it on Xen 4.14 and Xen 4.8, so I don't think
> Xen version matters here.
> 
> Any idea how to fix it?
> 




 

