
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455



On 11/04/2015 22:05, Sander Eikelenboom wrote:
> Saturday, April 11, 2015, 10:22:16 PM, you wrote:
>
>> On 11/04/2015 20:33, Sander Eikelenboom wrote:
>>> Saturday, April 11, 2015, 8:25:52 PM, you wrote:
>>>
>>>> On 11/04/15 18:42, Sander Eikelenboom wrote:
>>>>> Saturday, April 11, 2015, 7:35:57 PM, you wrote:
>>>>>
>>>>>> On 11/04/15 18:25, Sander Eikelenboom wrote:
>>>>>>> Saturday, April 11, 2015, 6:38:17 PM, you wrote:
>>>>>>>
>>>>>>>> On 11/04/15 17:32, Andrew Cooper wrote:
>>>>>>>>> On 11/04/15 17:21, Sander Eikelenboom wrote:
>>>>>>>>>> Saturday, April 11, 2015, 4:21:56 PM, you wrote:
>>>>>>>>>>
>>>>>>>>>>> On 11/04/15 15:11, Sander Eikelenboom wrote:
>>>>>>>>>>>> Friday, April 10, 2015, 8:55:27 PM, you wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/04/15 11:24, Sander Eikelenboom wrote:
>>>>>>>>>>>>>> Hi Andrew,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Finally got some time to figure this out .. and I have narrowed
>>>>>>>>>>>>>> it down to:
>>>>>>>>>>>>>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git
>>>>>>>>>>>>>> commit 7665d6ba98e20fb05c420de947c1750fd47e5c07 "Xen: Use the 
>>>>>>>>>>>>>> ioreq-server API when available"
>>>>>>>>>>>>>> A straight revert of this commit prevents the issue from 
>>>>>>>>>>>>>> happening.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The reason I had a hard time figuring this out was:
>>>>>>>>>>>>>> - I wasn't aware of this earlier, since git pulling the main xen
>>>>>>>>>>>>>>   tree doesn't auto-update the qemu-* trees.
>>>>>>>>>>>>> This has caught me out so many times.  It is very non-obvious 
>>>>>>>>>>>>> behaviour.
>>>>>>>>>>>>>> - So I happened to get this when I cloned a fresh tree to try to
>>>>>>>>>>>>>>   figure out the other issue I was seeing.
>>>>>>>>>>>>>> - After that, checking out previous versions of the main xen tree
>>>>>>>>>>>>>>   didn't resolve this new issue, because the qemu tree doesn't get
>>>>>>>>>>>>>>   auto-updated and is left at "master".
>>>>>>>>>>>>>> - Cloning xen-stable-4.5.0 made it go away .. because that checks out
>>>>>>>>>>>>>>   a specific git://xenbits.xen.org/staging/qemu-upstream-unstable.git
>>>>>>>>>>>>>>   tag, which is not master.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *sigh* 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is tested with xen main tree at last commit 
>>>>>>>>>>>>>> 3a28f760508fb35c430edac17a9efde5aff6d1d5
>>>>>>>>>>>>>> (normal xen-unstable, not the staging branch)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, so I have added some extra debug info (see attached diff), and
>>>>>>>>>>>>>> this is the output when it crashes on something the commit above
>>>>>>>>>>>>>> triggered: the level is out of bounds and the pfn looks fishy too.
>>>>>>>>>>>>>> Complete serial logs from both the bad and the good case (specific
>>>>>>>>>>>>>> commit reverted) are attached.
>>>>>>>>>>>>> Just to confirm, you are positively identifying a qemu changeset 
>>>>>>>>>>>>> as
>>>>>>>>>>>>> causing this crash?
>>>>>>>>>>>>> If so, the qemu change has discovered a pre-existing issue in the
>>>>>>>>>>>>> toolstack pci-passthrough interface.  Whatever qemu is or isn't 
>>>>>>>>>>>>> doing,
>>>>>>>>>>>>> it should not be able to cause a crash like this.
>>>>>>>>>>>>> With this in mind, I need to brush up on my AMD-Vi details.
>>>>>>>>>>>>> In the meantime, can you run with the following patch to identify 
>>>>>>>>>>>>> what
>>>>>>>>>>>>> is going on, domctl wise?  I assume it is the assign_device which 
>>>>>>>>>>>>> is
>>>>>>>>>>>>> failing, but it will be nice to observe the differences between 
>>>>>>>>>>>>> the
>>>>>>>>>>>>> working and failing case, which might offer a hint.
>>>>>>>>>>>> Hrrm, with your patch I end up with a fatal page fault in
>>>>>>>>>>>> iommu_do_pci_domctl:
>>>>>>>>>>>>
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:31.833] ----[ Xen-4.6-unstable  x86_64  
>>>>>>>>>>>> debug=y  Tainted:    C ]----
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:31.857] CPU:    5
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:31.868] RIP:    e008:[<ffff82d08014c52c>] 
>>>>>>>>>>>> iommu_do_pci_domctl+0x2dc/0x740
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:31.894] RFLAGS: 0000000000010256   
>>>>>>>>>>>> CONTEXT: hypervisor
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:31.915] rax: 0000000000000008   rbx: 
>>>>>>>>>>>> 0000000000000800   rcx: ffffffffffebe5ed
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:31.942] rdx: 0000000000000800   rsi: 
>>>>>>>>>>>> 0000000000000000   rdi: ffff830256ef7e38
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:31.968] rbp: ffff830256ef7c98   rsp: 
>>>>>>>>>>>> ffff830256ef7c08   r8:  00000000deadbeef
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:31.995] r9:  00000000deadbeef   r10: 
>>>>>>>>>>>> ffff82d08024e500   r11: 0000000000000282
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.022] r12: 0000000000000000   r13: 
>>>>>>>>>>>> 0000000000000008   r14: 0000000000000000
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.049] r15: 0000000000000000   cr0: 
>>>>>>>>>>>> 0000000080050033   cr4: 00000000000006f0
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.076] cr3: 00000002336a6000   cr2: 
>>>>>>>>>>>> 0000000000000000
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.096] ds: 0000   es: 0000   fs: 0000   
>>>>>>>>>>>> gs: 0000   ss: e010   cs: e008
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.121] Xen stack trace from 
>>>>>>>>>>>> rsp=ffff830256ef7c08:
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.141]    ffff830256ef7c78 
>>>>>>>>>>>> ffff82d08012c178 ffff830256ef7c28 ffff830256ef7c28
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.168]    0000000000000010 
>>>>>>>>>>>> 0000000000000000 0000000000000000 0000000000000000
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.195]    00000000000006f0 
>>>>>>>>>>>> 00007fe300000000 ffff830256eb7790 ffff83025cc6d300
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.222]    ffff82d080330c60 
>>>>>>>>>>>> 00007fe396bab004 0000000000000000 00007fe396bab004
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.249]    0000000000000000 
>>>>>>>>>>>> 0000000000000005 ffff830256ef7ca8 ffff82d08014900b
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.276]    ffff830256ef7d98 
>>>>>>>>>>>> ffff82d080161f2d 0000000000000010 0000000000000000
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.303]    0000000000000000 
>>>>>>>>>>>> ffff830256ef7ce8 ffff82d08018b655 ffff830256ef7d48
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.330]    ffff830256ef7cf8 
>>>>>>>>>>>> ffff82d08018b66a ffff830256ef7d38 ffff82d08012925e
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.357]    ffff830256efc068 
>>>>>>>>>>>> 0000000800000001 800000022e12c167 0000000000000000
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.384]    0000000000000002 
>>>>>>>>>>>> ffff830256ef7e38 0000000800000000 800000022e12c167
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.411]    0000000000000003 
>>>>>>>>>>>> ffff830256ef7db8 0000000000000000 00007fe396780eb0
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.439]    0000000000000202 
>>>>>>>>>>>> ffffffffffffffff 0000000000000000 00007fe396bab004
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.466]    0000000000000000 
>>>>>>>>>>>> 0000000000000005 ffff830256ef7ef8 ffff82d08010497f
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.493]    0000000000000001 
>>>>>>>>>>>> 0000000000100001 800000022e12c167 ffff88001f7ecc00
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.520]    00007fe396780eb0 
>>>>>>>>>>>> ffff88001c849508 0000000e00000007 ffffffff8105594a
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.547]    000000000000e033 
>>>>>>>>>>>> 0000000000000202 ffff88001ece3d40 000000000000e02b
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.574]    ffff830256ef7e28 
>>>>>>>>>>>> ffff82d080194933 000000000000beef ffffffff81bd6c85
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.601]    ffff830256ef7f08 
>>>>>>>>>>>> ffff82d080193edd 0000000b0000002d 0000000000000001
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.628]    0000000100000800 
>>>>>>>>>>>> 00007fe3962abbd0 ffff000a81050001 00007fe39656ce6e
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.655]    00007ffdf2a654f0 
>>>>>>>>>>>> 00007fe39656d0c9 00007fe39656ce6e 00007fe3969a9a55
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.682] Xen call trace:
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.695]    [<ffff82d08014c52c>] 
>>>>>>>>>>>> iommu_do_pci_domctl+0x2dc/0x740
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.718]    [<ffff82d08014900b>] 
>>>>>>>>>>>> iommu_do_domctl+0x17/0x1a
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.739]    [<ffff82d080161f2d>] 
>>>>>>>>>>>> arch_do_domctl+0x2469/0x26e1
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.762]    [<ffff82d08010497f>] 
>>>>>>>>>>>> do_domctl+0x1a1f/0x1d60
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.783]    [<ffff82d080234c6b>] 
>>>>>>>>>>>> syscall_enter+0xeb/0x145
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.804] 
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.813] Pagetable walk from 
>>>>>>>>>>>> 0000000000000000:
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.831]  L4[0x000] = 0000000234075067 
>>>>>>>>>>>> 000000000001f2a8
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.852]  L3[0x000] = 0000000229ad4067 
>>>>>>>>>>>> 0000000000014c49
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.873]  L2[0x000] = 0000000000000000 
>>>>>>>>>>>> ffffffffffffffff 
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.894] 
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.903] 
>>>>>>>>>>>> ****************************************
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.922] Panic on CPU 5:
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.935] FATAL PAGE FAULT
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.948] [error_code=0000]
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.961] Faulting linear address: 
>>>>>>>>>>>> 0000000000000000
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:32.981] 
>>>>>>>>>>>> ****************************************
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:33.000] 
>>>>>>>>>>>> (XEN) [2015-04-11 14:03:33.009] Reboot in five seconds...
>>>>>>>>>>>>
>>>>>>>>>>>> The RIP resolves to the printk added by your patch in:
>>>>>>>>>>>>
>>>>>>>>>>>>     case XEN_DOMCTL_test_assign_device:
>>>>>>>>>>>>         ret = xsm_test_assign_device(XSM_HOOK, 
>>>>>>>>>>>> domctl->u.assign_device.machine_sbdf);
>>>>>>>>>>>>         if ( ret )
>>>>>>>>>>>>             break;
>>>>>>>>>>>>
>>>>>>>>>>>>         seg = domctl->u.assign_device.machine_sbdf >> 16;
>>>>>>>>>>>>         bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>>>>>>>>>         devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>>>>>>>>>
>>>>>>>>>>>>         printk("*** %pv->d%d: 
>>>>>>>>>>>> test_assign_device({%04x:%02x:%02x.%u})\n",
>>>>>>>>>>>>                current, d->domain_id,
>>>>>>>>>>>>                seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>>>>>>>>
>>>>>>>>>>>>         if ( device_assigned(seg, bus, devfn) )
>>>>>>>>>>>>         {
>>>>>>>>>>>>             printk(XENLOG_G_INFO
>>>>>>>>>>>>                    "%04x:%02x:%02x.%u already assigned, or 
>>>>>>>>>>>> non-existent\n",
>>>>>>>>>>>>                    seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>>>>>>>>             ret = -EINVAL;
>>>>>>>>>>>>         }
>>>>>>>>>>>>         break;
>>>>>>>>>>> hmm - 'd' is NULL.  This ought to work better.
>>>>>>>>>>> diff --git a/xen/drivers/passthrough/pci.c 
>>>>>>>>>>> b/xen/drivers/passthrough/pci.c
>>>>>>>>>>> index 9f3413c..85ff1fc 100644
>>>>>>>>>>> --- a/xen/drivers/passthrough/pci.c
>>>>>>>>>>> +++ b/xen/drivers/passthrough/pci.c
>>>>>>>>>>> @@ -1532,6 +1532,11 @@ int iommu_do_pci_domctl(
>>>>>>>>>>>          max_sdevs = domctl->u.get_device_group.max_sdevs;
>>>>>>>>>>>          sdevs = domctl->u.get_device_group.sdev_array;
>>>>>>>>>>>  
>>>>>>>>>>> +        printk("*** %pv->d%d: get_device_group({%04x:%02x:%02x.%u, 
>>>>>>>>>>> %u})\n",
>>>>>>>>>>> +               current, d->domain_id,
>>>>>>>>>>> +               seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>>>>>>>>>>> +               max_sdevs);
>>>>>>>>>>> +
>>>>>>>>>>>          ret = iommu_get_device_group(d, seg, bus, devfn, sdevs, 
>>>>>>>>>>> max_sdevs);
>>>>>>>>>>>          if ( ret < 0 )
>>>>>>>>>>>          {
>>>>>>>>>>> @@ -1558,6 +1563,9 @@ int iommu_do_pci_domctl(
>>>>>>>>>>>          bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>>>>>>>>          devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>>>>>>>>  
>>>>>>>>>>> +        printk("*** %pv: 
>>>>>>>>>>> test_assign_device({%04x:%02x:%02x.%u})\n",
>>>>>>>>>>> +               current, seg, bus, PCI_SLOT(devfn), 
>>>>>>>>>>> PCI_FUNC(devfn));
>>>>>>>>>>> +
>>>>>>>>>>>          if ( device_assigned(seg, bus, devfn) )
>>>>>>>>>>>          {
>>>>>>>>>>>              printk(XENLOG_G_INFO
>>>>>>>>>>> @@ -1582,6 +1590,10 @@ int iommu_do_pci_domctl(
>>>>>>>>>>>          bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>>>>>>>>          devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>>>>>>>>  
>>>>>>>>>>> +        printk("*** %pv->d%d: 
>>>>>>>>>>> assign_device({%04x:%02x:%02x.%u})\n",
>>>>>>>>>>> +               current, d->domain_id,
>>>>>>>>>>> +               seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>>>>>>> +
>>>>>>>>>>>          ret = device_assigned(seg, bus, devfn) ?:
>>>>>>>>>>>                assign_device(d, seg, bus, devfn);
>>>>>>>>>>>          if ( ret == -ERESTART )
>>>>>>>>>>> @@ -1604,6 +1616,10 @@ int iommu_do_pci_domctl(
>>>>>>>>>>>          bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
>>>>>>>>>>>          devfn = domctl->u.assign_device.machine_sbdf & 0xff;
>>>>>>>>>>>  
>>>>>>>>>>> +        printk("*** %pv->d%d: 
>>>>>>>>>>> deassign_device({%04x:%02x:%02x.%u})\n",
>>>>>>>>>>> +               current, d->domain_id,
>>>>>>>>>>> +               seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>>>>>>>>>>> +
>>>>>>>>>>>          spin_lock(&pcidevs_lock);
>>>>>>>>>>>          ret = deassign_device(d, seg, bus, devfn);
>>>>>>>>>>>          spin_unlock(&pcidevs_lock);
>>>>>>>>>> Hi Andrew,
>>>>>>>>>>
>>>>>>>>>> Attached are the serial logs, good (with revert) and bad (without):
>>>>>>>>>>
>>>>>>>>>> Some things that seem strange to me:
>>>>>>>>>> - The numerous calls to get device 08:00.0 assigned ... for 0a:00.0
>>>>>>>>>>   there was only one call each to test-assign and assign.
>>>>>>>>>> - However, these numerous calls are there in both the good and the bad
>>>>>>>>>>   case, so perhaps it's strange and wrong .. but not the cause ..
>>>>>>>>>> - I had a hunch it could be due to 08:00.0 using MSI-X, but when only
>>>>>>>>>>   passing through 0a:00.0, I get the same numerous calls, now for
>>>>>>>>>>   0a:00.0, which uses INTx, so I think it is more related to being the
>>>>>>>>>>   *first* device to be passed through to a guest.
>>>>>>>>> I have also observed this behaviour, but not had time to investigate.
>>>>>>>>> It doesn't appear problematic in the long run, but it is probably a
>>>>>>>>> toolstack issue which wants fixing (if only in the name of efficiency).
>>>>>>>> And just after I sent this email, I realised why.
>>>>>>>> The first device assignment has to build the IO pagetables, which is a
>>>>>>>> long operation and subject to hypercall continuations.  The second
>>>>>>>> device reuses the same pagetables, so it is quick to complete.
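
(To make the "numerous calls" mentioned above less mysterious: a preempted
domctl is effectively re-executed until it finishes, so the entry printk
fires once per chunk of work.  The following is a minimal standalone C
sketch of that shape - it is not the actual Xen code, and the names and
numbers are made up purely for illustration.)

    #include <stdio.h>

    #define ERESTART     85    /* illustrative errno-style value only */
    #define TOTAL_PAGES  2048  /* pretend guest memory size, in pages */
    #define CHUNK        256   /* pages mapped before a preemption check */

    static int populated;

    /* Stands in for arch_iommu_populate_page_table(): do a bounded chunk of
     * work, then ask the caller to re-issue the operation. */
    static int populate_page_table(void)
    {
        int n = 0;

        while (populated < TOTAL_PAGES) {
            populated++;                 /* "map" one page */
            if (!(++n % CHUNK))
                return -ERESTART;        /* preempted: please come back */
        }
        return 0;
    }

    static int assign_device_domctl(void)
    {
        /* The equivalent of this print is the line repeated in the logs. */
        printf("*** assign_device({0000:0a:00.0})\n");
        return populate_page_table();
    }

    int main(void)
    {
        int ret;

        do {
            ret = assign_device_domctl();   /* continuation re-enters the domctl */
        } while (ret == -ERESTART);

        return 0;
    }
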
>>>>>>> So .. is the ioreq patch from Paul involved in providing something used
>>>>>>> in building the pagetables .. and could it have, say, some off-by-one
>>>>>>> resulting in the 0xffffffffffff .. which could lead to the pagetable
>>>>>>> building going berserk, requiring a paging_mode far greater than would
>>>>>>> normally be required .. which gets set .. since that isn't checked
>>>>>>> properly .. leading to things breaking a bit further on when it does get
>>>>>>> checked?
>>>>>> A -1 is slipping in somewhere and ending up in the gfn field.
>>>>>> The result is that update_paging_mode() attempts to construct
>>>>>> IO pagetables to cover a 76-bit address space, which is how level ends up
>>>>>> at 8.  (Note that a level of 7 is reserved, and a level of anything
>>>>>> greater than 4 is implausible on your system.)
>>>>>> I think the crash is collateral damage following on from
>>>>>> update_paging_mode() not properly sanitising its input, but that there
>>>>>> is still some other issue causing -1 to be passed in the first place.
>>>>>> I am still trying to locate where a -1 might plausibly be coming from.
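
(A quick standalone illustration of the arithmetic above - this is not the
AMD-Vi code, just the shape of the level calculation: each extra page-table
level translates 9 more bits of frame number, so an all-ones 64-bit gfn,
plus the 12-bit page offset, i.e. a 76-bit address, needs ceil(64/9) = 8
levels.)

    #include <stdio.h>

    int main(void)
    {
        unsigned long gfn = ~0UL;   /* the -1 that slips into the gfn field */
        unsigned int level = 0;

        /* One more level for every 9 bits of frame number still remaining. */
        while (gfn) {
            gfn >>= 9;
            level++;
        }

        printf("levels required: %u\n", level);   /* prints 8 on x86_64 */
        return 0;
    }
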
>>>>> I have just added some extra debug code to store the values from the start
>>>>> of update_paging_mode() .. so I can print them at the end if the
>>>>> paging_mode goes out of bounds, and do a dump_stack() as well. Hopefully
>>>>> this will confirm it.
>>>> Right - arch_iommu_populate_page_table() is falling over a page
>>>> allocated to the domain which doesn't have a valid gfn.
>>>> The ioreq server allocates itself some guest pages and then shoots them
>>>> out of the physmap as part of setting the server up.  (This is a kludge
>>>> to work around the fact that Xen doesn't have an interface for device
>>>> models etc. to allocate memory on behalf of the domain which will
>>>> strictly never find its way into the guest physmap.)
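
(Again a standalone sketch rather than Xen code, to spell the above out: the
pages stay on the domain's page list after being removed from the physmap,
so the reverse mfn-to-gfn lookup yields all-ones, and the populate loop
feeds that straight into the IOMMU mapping code.  Names and most numbers
below are made up for illustration.)

    #include <stdio.h>

    #define INVALID_GFN (~0UL)

    struct page_info {
        unsigned long mfn;
        unsigned long gfn;   /* stands in for the mfn-to-gfn (M2P) lookup */
    };

    /* Two pages still owned by the domain; the second was shot out of the
     * physmap (as hvm_remove_ioreq_gmfn() does), so it has no valid gfn. */
    static struct page_info page_list[] = {
        { 0x24fbb6, 0xfeff2     },
        { 0x24fbb8, INVALID_GFN },
    };

    static void map_page(unsigned long gfn, unsigned long mfn)
    {
        if (gfn == INVALID_GFN)
            printf("about to break: gfn %lx mfn %lx\n", gfn, mfn);
        else
            printf("mapped gfn %lx -> mfn %lx\n", gfn, mfn);
    }

    int main(void)
    {
        unsigned int i;

        /* Walk every page the domain owns, valid gfn or not - which is what
         * eventually leads to the out-of-range level and the BUG further on. */
        for (i = 0; i < sizeof(page_list) / sizeof(page_list[0]); i++)
            map_page(page_list[i].gfn, page_list[i].mfn);

        return 0;
    }
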
>>>> Can you try this patch and see whether some of the numbers printed out
>>>> start matching up?
>>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>>> index bfde380..d1adfa7 100644
>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>> @@ -534,6 +534,10 @@ static int hvm_map_ioreq_page(
>>>>  static void hvm_remove_ioreq_gmfn(
>>>>      struct domain *d, struct hvm_ioreq_page *iorp)
>>>>  {
>>>> +    printk("*** %s() d%d, page %p, mfn %lx, gfn %lx, va %p\n",
>>>> +           __func__, d->domain_id,
>>>> +           iorp->page, page_to_mfn(iorp->page), iorp->gmfn, iorp->va);
>>>> +
>>>>      guest_physmap_remove_page(d, iorp->gmfn,
>>>>                                page_to_mfn(iorp->page), 0);
>>>>      clear_page(iorp->va);
>>>> diff --git a/xen/drivers/passthrough/x86/iommu.c
>>>> b/xen/drivers/passthrough/x86/iommu.c
>>>> index 9eb8d33..048a1a9 100644
>>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>>> @@ -59,7 +59,16 @@ int arch_iommu_populate_page_table(struct domain *d)
>>>>          if ( has_hvm_container_domain(d) ||
>>>>              (page->u.inuse.type_info & PGT_type_mask) ==
>>>> PGT_writable_page )
>>>>          {
>>>> -            BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page))));
>>>> +            unsigned long mfn = page_to_mfn(page);
>>>> +            unsigned long gfn = mfn_to_gmfn(d, mfn);
>>>> +
>>>> +            BUG_ON(SHARED_M2P(gfn));
>>>> +
>>>> +            if ( gfn == INVALID_MFN )
>>>> +            {
>>>> +                printk("*** %s() d%d, page %p, mfn %lx, gfn %lx - about
>>>> to break\n",
>>>> +                       __func__, d->domain_id, page, mfn, gfn);
>>>> +            }
>>>>              rc = hd->platform_ops->map_page(
>>>>                  d, mfn_to_gmfn(d, page_to_mfn(page)), page_to_mfn(page),
>>>>                  IOMMUF_readable|IOMMUF_writable);
>>> Ok .. so here we go:
>>>
>>> (XEN) [2015-04-11 19:24:59.418] *** hvm_remove_ioreq_gmfn() d1, page 
>>> ffff82e0049f7700, mfn 24fbb8, gfn feff0, va ffff82c00082b000
>>> (XEN) [2015-04-11 19:24:59.452] *** hvm_remove_ioreq_gmfn() d1, page 
>>> ffff82e0049f76e0, mfn 24fbb7, gfn feff1, va ffff82c00082d000
>>> (XEN) [2015-04-11 19:25:00.158] *** d0v5: test_assign_device({0000:0a:00.0})
>>> (XEN) [2015-04-11 19:25:02.221] io.c:429: d1: bind: m_gsi=47 g_gsi=36 
>>> dev=00.00.5 intx=0
>>> (XEN) [2015-04-11 19:25:02.248] *** d0v1->d1: assign_device({0000:0a:00.0})
>>> (XEN) [2015-04-11 19:25:02.268] ?!?!? d1: pci_dev:0000:0a:00.0 
>>> hd->arch.paging_mode:2
>>> (XEN) [2015-04-11 19:25:02.290] *** d0v1->d1: assign_device({0000:0a:00.0})
>>> (XEN) [2015-04-11 19:25:02.310] ?!?!? d1: pci_dev:0000:0a:00.0 
>>> hd->arch.paging_mode:2
>>> (XEN) [2015-04-11 19:25:02.333] *** d0v1->d1: assign_device({0000:0a:00.0})
>>> (XEN) [2015-04-11 19:25:02.353] ?!?!? d1: pci_dev:0000:0a:00.0 
>>> hd->arch.paging_mode:2
>>> (XEN) [2015-04-11 19:25:02.375] *** d0v1->d1: assign_device({0000:0a:00.0})
>>> (XEN) [2015-04-11 19:25:02.395] ?!?!? d1: pci_dev:0000:0a:00.0 
>>> hd->arch.paging_mode:2
>>> <BIG SNIP>
>>> (XEN) [2015-04-11 19:25:45.444] *** d0v1->d1: assign_device({0000:0a:00.0})
>>> (XEN) [2015-04-11 19:25:45.464] ?!?!? d1: pci_dev:0000:0a:00.0 
>>> hd->arch.paging_mode:2
>>> (XEN) [2015-04-11 19:25:45.486] *** d0v1->d1: assign_device({0000:0a:00.0})
>>> (XEN) [2015-04-11 19:25:45.506] ?!?!? d1: pci_dev:0000:0a:00.0 
>>> hd->arch.paging_mode:2
>>> (XEN) [2015-04-11 19:25:45.529] *** d0v1->d1: assign_device({0000:0a:00.0})
>>> (XEN) [2015-04-11 19:25:45.549] ?!?!? d1: pci_dev:0000:0a:00.0 
>>> hd->arch.paging_mode:2
>>> (XEN) [2015-04-11 19:25:45.571] *** arch_iommu_populate_page_table() d1, 
>>> page ffff82e0049f7700, mfn 24fbb8, gfn ffffffffffffffff - about to break
>>> (XEN) [2015-04-11 19:25:45.610] AMD-Vi: ?!?!? amd_iommu_map_page level 
>>> before:3 gfn:0xffffffffffffffff mfn:0x24fbb8 flags:3
>>> (XEN) [2015-04-11 19:25:45.642] AMD-Vi: ?!?!? update_paging_mode level 
>>> before:3 gfn:0xffffffffffffffff 
>>> (XEN) [2015-04-11 19:25:45.669] AMD-Vi: ?!?!? update_paging_mode end: 
>>> paging_mode:6 offset:31 root: old_root_mfn:0x11d55b new_root_mfn:0x11d55a 
>>> gfn:0xffffffffffffffff req_id:0 PTE_PER_TABLE_SIZE:512 
>>> (XEN) [2015-04-11 19:25:45.722] AMD-Vi: ?!?!? update_paging_mode end: 
>>> values at start: paging_mode:3 offset:-1 gfn:0xffffffffffffffff 
>>> (XEN) [2015-04-11 19:25:45.757] AMD-Vi: ?!?!? amd_iommu_map_page level 
>>> after update paging mode:6 gfn:0xffffffffffffffff mfn:0x24fbb8 flags:3
>>> (XEN) [2015-04-11 19:25:45.794] AMD-Vi: ?!?!? amd_iommu_map_page level 
>>> end:6  gfn:0xffffffffffffffff mfn:0x24fbb8 flags:3
>>> (XEN) [2015-04-11 19:25:45.826] *** arch_iommu_populate_page_table() d1, 
>>> page ffff82e0049f76e0, mfn 24fbb7, gfn ffffffffffffffff - about to break
>>> (XEN) [2015-04-11 19:25:45.864] AMD-Vi: ?!?!? amd_iommu_map_page level 
>>> before:6 gfn:0xffffffffffffffff mfn:0x24fbb7 flags:3
>>> (XEN) [2015-04-11 19:25:45.897] AMD-Vi: ?!?!? update_paging_mode level 
>>> before:6 gfn:0xffffffffffffffff 
>>> (XEN) [2015-04-11 19:25:45.924] AMD-Vi: ?!?!? update_paging_mode end: 
>>> paging_mode:8 offset:1 root: old_root_mfn:0x11d554 new_root_mfn:0x11d553 
>>> gfn:0xffffffffffffffff req_id:0 PTE_PER_TABLE_SIZE:512 
>>> (XEN) [2015-04-11 19:25:45.976] AMD-Vi: ?!?!? update_paging_mode end: 
>>> values at start: paging_mode:6 offset:524287 gfn:0xffffffffffffffff 
>>> (XEN) [2015-04-11 19:25:46.013] AMD-Vi: ?!?!? amd_iommu_map_page level 
>>> after update paging mode:8 gfn:0xffffffffffffffff mfn:0x24fbb7 flags:3
>>> (XEN) [2015-04-11 19:25:46.050] AMD-Vi: ?!?!? iommu_pde_from_gfn: domid:1 
>>> table:1 level:8 pfn:0xffffffffffffffff
>>> (XEN) [2015-04-11 19:25:46.079] Xen BUG at iommu_map.c:459
>>> (XEN) [2015-04-11 19:25:46.095] ----[ Xen-4.6-unstable  x86_64  debug=y  
>>> Tainted:    C ]----
>>> (XEN) [2015-04-11 19:25:46.119] CPU:    2
>>> (XEN) [2015-04-11 19:25:46.131] RIP:    e008:[<ffff82d080155d03>] 
>>> iommu_pde_from_gfn+0x82/0x47a
>>> (XEN) [2015-04-11 19:25:46.156] RFLAGS: 0000000000010202   CONTEXT: 
>>> hypervisor
>>> (XEN) [2015-04-11 19:25:46.177] rax: 0000000000000000   rbx: 
>>> 0000000000000008   rcx: 0000000000000000
>>> (XEN) [2015-04-11 19:25:46.203] rdx: ffff830256f20000   rsi: 
>>> 000000000000000a   rdi: ffff82d0802986c0
>>> (XEN) [2015-04-11 19:25:46.230] rbp: ffff830256f27ad8   rsp: 
>>> ffff830256f27a78   r8:  ffff830256f30000
>>> (XEN) [2015-04-11 19:25:46.257] r9:  0000000000000002   r10: 
>>> 0000000000000032   r11: 0000000000000002
>>> (XEN) [2015-04-11 19:25:46.284] r12: ffff82e0023aaa60   r13: 
>>> 000000000024fb00   r14: 00000000000000e9
>>> (XEN) [2015-04-11 19:25:46.311] r15: 00007d2000000000   cr0: 
>>> 0000000080050033   cr4: 00000000000006f0
>>> (XEN) [2015-04-11 19:25:46.337] cr3: 000000025f176000   cr2: 
>>> ffff8000007f6800
>>> (XEN) [2015-04-11 19:25:46.358] ds: 0000   es: 0000   fs: 0000   gs: 0000   
>>> ss: e010   cs: e008
>>> (XEN) [2015-04-11 19:25:46.383] Xen stack trace from rsp=ffff830256f27a78:
>>> (XEN) [2015-04-11 19:25:46.403]    ffff83025f7bd000 ffff830256f27b30 
>>> ffffffffffffffff ffff830200000030
>>> (XEN) [2015-04-11 19:25:46.430]    ffff830256f27ae8 ffff830256f27aa8 
>>> 0000000000000000 ffff83025f7bd000
>>> (XEN) [2015-04-11 19:25:46.457]    ffff82e0049f76e0 000000000024fbb7 
>>> 00000000000000e9 00007d2000000000
>>> (XEN) [2015-04-11 19:25:46.484]    ffff830256f27b98 ffff82d0801562a9 
>>> 0000000000000206 ffff830256f27b08
>>> (XEN) [2015-04-11 19:25:46.511]    000000000024fbb7 0000000000000003 
>>> ffff83025f7bd938 ffffffffffffffff
>>> (XEN) [2015-04-11 19:25:46.538]    ffff83025f7bd000 ffff83025f7bd000 
>>> 000000000024fbb7 0000000000000000
>>> (XEN) [2015-04-11 19:25:46.565]    0000000000000000 0000000000000000 
>>> 0000000000000000 0000000000000000
>>> (XEN) [2015-04-11 19:25:46.592]    0000000000000000 0000000000000000 
>>> ffff83025f7bd938 ffff83025f7bd000
>>> (XEN) [2015-04-11 19:25:46.619]    ffff82e0049f76e0 000000000024fbb7 
>>> 00000000000000e9 00007d2000000000
>>> (XEN) [2015-04-11 19:25:46.646]    ffff830256f27bf8 ffff82d08015a7f8 
>>> 0000000000000000 ffff83025f7bd020
>>> (XEN) [2015-04-11 19:25:46.673]    000000000024fbb7 ffff830256f20000 
>>> ffff830256f27bf8 0000000000000000
>>> (XEN) [2015-04-11 19:25:46.700]    000000000000000a 00007f017c8d1004 
>>> 0000000000000000 ffff83025f7bd000
>>> (XEN) [2015-04-11 19:25:46.727]    ffff830256f27c98 ffff82d08014c6e1 
>>> ffff830200000002 ffff82d08012c178
>>> (XEN) [2015-04-11 19:25:46.754]    0000000000000000 ffff830256f27c28 
>>> 0000000000000001 0000000000000000
>>> (XEN) [2015-04-11 19:25:46.781]    0000000000000000 0000000000000000 
>>> 00007f017c8d1004 0000000000000000
>>> (XEN) [2015-04-11 19:25:46.808]    ffff82d080331034 ffff830256f20000 
>>> 000000000025f176 00007f017c8d1004
>>> (XEN) [2015-04-11 19:25:46.835]    ffff83025f7bd000 00007f017c8d1004 
>>> ffff83025f7bd000 0000000000000005
>>> (XEN) [2015-04-11 19:25:46.863]    ffff830256f27ca8 ffff82d08014900b 
>>> ffff830256f27d98 ffff82d080161f2d
>>> (XEN) [2015-04-11 19:25:46.890]    000000000023468c 0000000000000002 
>>> 0000000000000005 0000000000000001
>>> (XEN) [2015-04-11 19:25:46.917]    ffff82d080331bb8 0000000000000001 
>>> ffff830256f27de8 ffff82d080120c10
>>> (XEN) [2015-04-11 19:25:46.944] Xen call trace:
>>> (XEN) [2015-04-11 19:25:46.956]    [<ffff82d080155d03>] 
>>> iommu_pde_from_gfn+0x82/0x47a
>>> (XEN) [2015-04-11 19:25:46.979]    [<ffff82d0801562a9>] 
>>> amd_iommu_map_page+0x1ae/0x5ec
>>> (XEN) [2015-04-11 19:25:47.002]    [<ffff82d08015a7f8>] 
>>> arch_iommu_populate_page_table+0x164/0x4c3
>>> (XEN) [2015-04-11 19:25:47.028]    [<ffff82d08014c6e1>] 
>>> iommu_do_pci_domctl+0x491/0x740
>>> (XEN) [2015-04-11 19:25:47.051]    [<ffff82d08014900b>] 
>>> iommu_do_domctl+0x17/0x1a
>>> (XEN) [2015-04-11 19:25:47.073]    [<ffff82d080161f2d>] 
>>> arch_do_domctl+0x2469/0x26e1
>>> (XEN) [2015-04-11 19:25:47.095]    [<ffff82d08010497f>] 
>>> do_domctl+0x1a1f/0x1d60
>>> (XEN) [2015-04-11 19:25:47.116]    [<ffff82d080234c6b>] 
>>> syscall_enter+0xeb/0x145
>>> (XEN) [2015-04-11 19:25:47.137] 
>>> (XEN) [2015-04-11 19:25:47.146] 
>>> (XEN) [2015-04-11 19:25:47.155] ****************************************
>>> (XEN) [2015-04-11 19:25:47.174] Panic on CPU 2:
>>> (XEN) [2015-04-11 19:25:47.187] Xen BUG at iommu_map.c:459
>>> (XEN) [2015-04-11 19:25:47.203] ****************************************
>>> (XEN) [2015-04-11 19:25:47.222] 
>>> (XEN) [2015-04-11 19:25:47.231] Reboot in five seconds...
>>>  
>>>
>> Right - does this fix the issue for you?
> Affirmative :)
> It survives and the device seems to work properly as well;
> I will do some more tests tomorrow.
>
> Thanks for tracking it down !

I am not certain that it is the correct way to fix the issue, nor that
the ioreq server code is the only way to trigger it.  There are several
ways to shoot a gfn mapping out of the guest's physmap.

At least we now understand why it happens.  I will defer to others CC'd
on this thread for their opinions on the matter.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

