On 11/15/2010 03:11 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 15, 2010 at 07:57:47PM +0000, Keir Fraser wrote:
>> On 15/11/2010 19:32, "Jeremy Fitzhardinge" <jeremy@xxxxxxxx> wrote:
>>
>>>> Or, is there much disadvantage, to having a static really big PCI hole? Say
>>>> starting at 1GB? The advantage of this would be the ability to hotplug PCI
>>>> devices to a domU even across save/restore/migrate -- this may not work so
>>>> well if you commit yourself to the hole size of the original host, and the
>>>> restore/migrate target host has a bigger hole!
>>> Well, the other question is whether the devices have to have the same
>>> pfn as mfn within the hole. We're emulating the PCI config space anyway
>>> - couldn't we stick the passthrough PCI space at 3G regardless of where
>>> it is on the real hardware?
> Your thinking is that on the Linux side, any of the pfns that are within
> those System RAM gaps (regardless of whether they are above or below 4GB)
> would be consulted during PTE creation/lookup (xen_pte_val..).
>
> And if those PFNs are within those System RAM gaps, we would store the
> MFN in the P2M list and instead of doing:
> val = ((pteval_t)pfn << PAGE_SHIFT) | flags
>
> we would actually do mfn = pfn_to_mfn(pfn) and stick on the _PAGE_IOMAP flag.
>
> An example patch (compile-tested, not tested in any other way) is attached
> at the end of this email.
Right, it basically depends on dropping _PAGE_IOMAP and populating the
p2m with the correct mapping for both memory and hardware pages.
> How does that work on the Xen side? Does the hypervisor depend on the pages
> that belong to the DOM_IO domain to have a INVALID_MFN value in the mfn_list?
Xen wouldn't care. I don't think it's necessary to explicitly do a
cross-domain mapping with DOM_IO as we currently do; that's overkill
and/or a misunderstanding on my part.
> We do make the PTEs that refer to physical devices belong to the DOM_IO domain..
I think Xen will sort that out for itself when presented with a
hardware/device mfn.
>> Well, I don't know. It sounds pretty sensible to me. :-)
>>
>> Certain virtualisation features disappearing after a save/restore/migrate --
>> or worse, becoming unreliable -- would be a bit sad.
> So having the option of the PCI hole being passed through, and giving
> the tools the value (pci_hole), would mean we could migrate an SR-IOV type
> device from one machine to another. Constructing the PCI hole using
> XENMEM_machine_memory_map could generate different E820 maps for the two
> guests, which would indeed be a bit sad.
>
>
> --- the patch ---
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 50dc626..96a08ef 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -699,7 +699,7 @@ static bool xen_page_pinned(void *ptr)
>
> static bool xen_iomap_pte(pte_t pte)
> {
> - return pte_flags(pte) & _PAGE_IOMAP;
> + return xen_pfn_is_pci(pte_mfn(pte));
> }
I think populating the p2m appropriately in advance is better than this
test; it's OK for prototyping, I guess, but way too expensive to run on
every set_pte.
J
>
> void xen_set_domain_pte(pte_t *ptep, pte_t pteval, unsigned domid)
> @@ -801,11 +801,6 @@ void set_pte_mfn(unsigned long vaddr, unsigned long mfn,
> pgprot_t flags)
> void xen_set_pte_at(struct mm_struct *mm, unsigned long addr,
> pte_t *ptep, pte_t pteval)
> {
> - if (xen_iomap_pte(pteval)) {
> - xen_set_iomap_pte(ptep, pteval);
> - goto out;
> - }
> -
> ADD_STATS(set_pte_at, 1);
> // ADD_STATS(set_pte_at_pinned, xen_page_pinned(ptep));
> ADD_STATS(set_pte_at_current, mm == current->mm);
> @@ -889,19 +884,6 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
> return val;
> }
>
> -static pteval_t iomap_pte(pteval_t val)
> -{
> - if (val & _PAGE_PRESENT) {
> - unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
> - pteval_t flags = val & PTE_FLAGS_MASK;
> -
> - /* We assume the pte frame number is a MFN, so
> - just use it as-is. */
> - val = ((pteval_t)pfn << PAGE_SHIFT) | flags;
> - }
> -
> - return val;
> -}
>
> pteval_t xen_pte_val(pte_t pte)
> {
> @@ -913,8 +895,8 @@ pteval_t xen_pte_val(pte_t pte)
> pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
> }
>
> - if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
> - return pteval;
> + if (xen_pfn_is_pci(pte_mfn(pte)))
> + pteval |= _PAGE_IOMAP;
>
> return pte_mfn_to_pfn(pteval);
> }
> @@ -974,13 +956,14 @@ pte_t xen_make_pte(pteval_t pte)
> * mappings are just dummy local mappings to keep other
> * parts of the kernel happy.
> */
> - if (unlikely(pte & _PAGE_IOMAP) &&
> - (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> - pte = iomap_pte(pte);
> - } else {
> + if ((unlikely(pte & _PAGE_IOMAP) &&
> + (xen_initial_domain() || addr >= ISA_END_ADDRESS)) ||
> + (unlikely(xen_pfn_is_pci(PFN_UP(addr)))))
> + pte |= _PAGE_IOMAP;
> + else
> pte &= ~_PAGE_IOMAP;
> - pte = pte_pfn_to_mfn(pte);
> - }
> +
> + pte = pte_pfn_to_mfn(pte);
>
> return native_make_pte(pte);
> }
> @@ -1037,10 +1020,8 @@ void xen_set_pud(pud_t *ptr, pud_t val)
>
> void xen_set_pte(pte_t *ptep, pte_t pte)
> {
> - if (xen_iomap_pte(pte)) {
> + if (xen_iomap_pte(pte))
> xen_set_iomap_pte(ptep, pte);
> - return;
> - }
>
> ADD_STATS(pte_update, 1);
> // ADD_STATS(pte_update_pinned, xen_page_pinned(ptep));
> @@ -1058,10 +1039,8 @@ void xen_set_pte(pte_t *ptep, pte_t pte)
> #ifdef CONFIG_X86_PAE
> void xen_set_pte_atomic(pte_t *ptep, pte_t pte)
> {
> - if (xen_iomap_pte(pte)) {
> + if (xen_iomap_pte(pte))
> xen_set_iomap_pte(ptep, pte);
> - return;
> - }
>
> set_64bit((u64 *)ptep, native_pte_val(pte));
> }
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 5a1f22d..bb424e3 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -196,6 +196,31 @@ unsigned long xen_find_max_pfn(void)
> xen_raw_printk("E820 max_pfn = %ld (nr_pages: %ld)\n", max_pfn,
> xen_start_info->nr_pages);
> return max_pfn;
> }
> +
> +int xen_pfn_is_pci(unsigned long pfn)
> +{
> + static struct e820entry map[E820MAX] __initdata;
> + int rc, op, i;
> + struct xen_memory_map memmap;
> + unsigned long long addr = PFN_PHYS(pfn);
> + memmap.nr_entries = E820MAX;
> + set_xen_guest_handle(memmap.buffer, map);
> +
> + op = xen_initial_domain() ?
> + XENMEM_machine_memory_map :
> + XENMEM_memory_map;
> + rc = HYPERVISOR_memory_op(op, &memmap);
> + BUG_ON(rc);
> +
> + for (i = 0; i < memmap.nr_entries; i++) {
> + unsigned long long end = map[i].addr + map[i].size;
> + if (map[i].type != E820_RAM)
> + continue;
> + if (addr >= map[i].addr && addr < end)
> + return 0;
> + }
> + return 1;
> +}
> /**
> * machine_specific_memory_setup - Hook for machine specific memory setup.
> **/
> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
> index eee2045..f859b04 100644
> --- a/include/xen/xen-ops.h
> +++ b/include/xen/xen-ops.h
> @@ -31,4 +31,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
> extern phys_addr_t xen_extra_mem_start;
> unsigned long xen_find_max_pfn(void);
>
> +int xen_pfn_is_pci(unsigned long pfn);
> +
> #endif /* INCLUDE_XEN_OPS_H */
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel