
[Xen-devel] [PATCH] Re: SMP dom0 with 8 cpus of i386



Keir, Ian,
   With the PCI mmconfig option enabled, and with a PCI Express-enabled
BIOS, the dom0 kernel reads PCI config space through the fix-mapped PCI
mmconfig window.
   The PCI mmconfig space is 256MB in size, and access to it is
implemented differently on i386 and x86_64. On x86_64 the whole 256MB is
mapped into the kernel virtual address space. On i386 that would consume
too much of the kernel's virtual address space, so it is implemented
with a single fix-mapped page. That page is remapped to the physical
address of the target device on every PCI mmconfig access that touches a
new device, as seen in the following code from mmconfig.c:

static inline void pci_exp_set_dev_base(int bus, int devfn)
{
    u32 dev_base = pci_mmcfg_base_addr | (bus << 20) | (devfn << 12);
    if (dev_base != mmcfg_last_accessed_device) {
        mmcfg_last_accessed_device = dev_base;
        set_fixmap_nocache(FIX_PCIE_MCFG, dev_base);
    }
}

static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
              unsigned int devfn, int reg, int len, u32 *value)
{
    unsigned long flags;

    if (!value || (bus > 255) || (devfn > 255) || (reg > 4095))
        return -EINVAL;

    spin_lock_irqsave(&pci_config_lock, flags);

    pci_exp_set_dev_base(bus, devfn);

    switch (len) {
        ...  /* case 1/2/4: readb/readw/readl via the fixmap window */

   At boot time the PCI mmconfig space is accessed thousands of times in
quick succession, which remaps and unmaps the fixmap page continuously
at a very high rate for quite a while. Currently the fix-mapped virtual
addresses for dom0's shared_info page and the PCI mmconfig page are
adjacent in the fixed_addresses enum in fixmap.h:

#ifdef CONFIG_PCI_MMCONFIG
    FIX_PCIE_MCFG,
#endif
    FIX_SHARED_INFO,
    FIX_GNTTAB_BEGIN,
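
If the adjacency is what matters, one quick way to test the hypothesis
would be to pad the enum so that FIX_PCIE_MCFG lands in a different L1
page table page than FIX_SHARED_INFO. A rough sketch only (hypothetical,
not the attached patch; FIX_PCIE_MCFG_PAD is a made-up name, and each L1
covers PTRS_PER_PTE pages of virtual address space):

#ifdef CONFIG_PCI_MMCONFIG
    FIX_PCIE_MCFG,
    /*
     * Hypothetical padding: reserving PTRS_PER_PTE slots here pushes
     * FIX_SHARED_INFO a full L1's worth of pages away from the hot
     * FIX_PCIE_MCFG entry, so their PTEs live in different L1 pages.
     */
    FIX_PCIE_MCFG_PAD = FIX_PCIE_MCFG + PTRS_PER_PTE - 1,
#endif
    FIX_SHARED_INFO,
    FIX_GNTTAB_BEGIN,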

   I suspect this is causing a race condition because of writable page
tables. While accessing PCI mmconfig on i386, the dom0 kernel (CPU 0)
rewrites the PTE for FIX_PCIE_MCFG at a very fast rate. With writable
page tables, PTE updates are deferred: to let the guest write a PTE
directly, Xen temporarily disconnects the L1 page table page by clearing
the L2 entry that points to it. In the SMP case the other CPUs are
taking interrupts (e.g. the timer) at the same time, and the interrupt
handlers access the shared_info page to deliver events such as the timer
event to dom0. The problem is possibly that, because of writable page
tables, the L1 page is disconnected during the mmconfig access, and the
shared_info translation needed for event delivery lives in that same L1
page. All the CPUs are using the same page tables at this time, so while
the PTE is being written, that L1 page is cut off from the page table.
This somehow corrupts the dom0 page tables, and we see the errors.
    I believe this issue does not exist on x86_64 because there each
mmconfig access does not remap the fixmap page, so the race on the
disconnected L1 page does not arise.
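
If the deferred PTE write is really the trigger, one possible direction
(a sketch only, not a tested fix) would be to update the FIX_PCIE_MCFG
slot through an explicit hypercall rather than a direct PTE write, so
that Xen installs the translation itself and the writable-pagetable
machinery never disconnects the L1. This assumes the
HYPERVISOR_update_va_mapping(va, pte, flags) interface of the -xen tree;
pci_exp_set_dev_base_mmu is a hypothetical name:

static inline void pci_exp_set_dev_base_mmu(int bus, int devfn)
{
    u32 dev_base = pci_mmcfg_base_addr | (bus << 20) | (devfn << 12);
    if (dev_base != mmcfg_last_accessed_device) {
        mmcfg_last_accessed_device = dev_base;
        /*
         * Ask Xen to write the L1 entry and flush the local TLB
         * entry; dom0 never writes the PTE directly, so the deferred
         * writable-pagetable path is never entered for this mapping.
         */
        HYPERVISOR_update_va_mapping(
            fix_to_virt(FIX_PCIE_MCFG),
            pfn_pte(dev_base >> PAGE_SHIFT, PAGE_KERNEL_NOCACHE),
            UVMF_INVLPG);
    }
}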
   The workaround currently working for me is to disable PCI_MMCONFIG
for i386 in the xen0 kernel config. Sooner or later other people will
also notice this corruption on SMP boxes with an SMP dom0; I see it once
in a while on a 4-way box.

Can we disable PCI_MMCONFIG for i386 in the xen0 config until we solve
the race condition? The patch for the config is attached.
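
(The attachment itself is not inlined here; the change amounts to
turning the option off in the i386 xen0 kernel config, along these
lines, illustrative rather than the patch verbatim:)

# CONFIG_PCI_MMCONFIG is not set
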
   Since I have this workaround in place, and I am seeing issues with
VMX guests, I am focusing on fixing those now.

Thanks & Regards,
Nitin
---------------------------------------------------------------------
Sr Software Engineer
Open Source Technology Center, Intel Corp
-----Original Message-----
From: Kamble, Nitin A 
Sent: Tuesday, August 30, 2005 10:06 AM
To: Keir Fraser
Cc: xen-devel
Subject: RE: [Xen-devel] Re: SMP dom0 with 8 cpus of i386

> Default but with smp enabled.
Same here. I am seeing the issue inconsistently on a 4-way box. The
8-way system does not have any issue with maxcpus=1; with 8 CPUs it is
consistent. More CPUs make the corruption more likely, and it always
happens at the time of reading/writing the PCI mmconfig space.
  I am debugging it here.
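
(For reference, maxcpus=1 is passed to the dom0 kernel, not to Xen; an
illustrative grub entry, with made-up paths:)

kernel /boot/xen.gz
module /boot/vmlinuz-2.6-xen0 root=/dev/sda1 ro maxcpus=1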

Thanks & Regards,
Nitin

Attachment: nopcimmconfig_i386.patch
Description: nopcimmconfig_i386.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

