Re: [Xen-devel] RFC: AMD support for paging
At 08:06 -0800 on 15 Feb (1329293205), Andres Lagar-Cavilla wrote:
> > Where will p2m access bit be stored? Please note that bit 52 - bit 58
> > for pte must be zero for p2m_ram_rw pages. For iommu pte, only bit 63
> > and bit 1 - bit 8 are free to use if PR bit = 1.
>
> Wei,
>
> Why *must* bits 52-58 be zero for p2m_ram_rw pages? Or is it just the
> current convention?
>
> I propose limiting p2m type to bits 52-55, and storing p2m access on bits
> 56-58. So, p2m_ram_rw pages with a non-zero access would break the "must
> be zero" requirement/convention.
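>
> As a rough sketch of that split (the mask names here are only for
> illustration, not what the patch would actually use):
>
>     #define P2M_TYPE_SHIFT    52
>     #define P2M_TYPE_MASK     (0xfUL << P2M_TYPE_SHIFT)   /* bits 52-55: p2m type */
>     #define P2M_ACCESS_SHIFT  56
>     #define P2M_ACCESS_MASK   (0x7UL << P2M_ACCESS_SHIFT) /* bits 56-58: p2m access */
>
>     /* Pack the software-defined type and access into the l1e's free bits. */
>     static inline unsigned long p2m_pack_type_access(p2m_type_t t, p2m_access_t a)
>     {
>         return (((unsigned long)t << P2M_TYPE_SHIFT) & P2M_TYPE_MASK) |
>                (((unsigned long)a << P2M_ACCESS_SHIFT) & P2M_ACCESS_MASK);
>     }
>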
Eh, let's step back a bit from this. The problem is that the IOMMU
can't handle these bits being set, but the IOMMU can't handle paged-out
or CoW memory either (because there's no way to cause the peripheral to
retry a DMA that faulted).
So can we just say that any VM that's going to use paging, sharing
or access can't have a unified p2m and IOMMU table? That's a bit ugly
since the admin will have to make the choice, but it seems like the
hardware is forcing us into it.
Maybe the sensible choice is to default to _not_ sharing p2m and IOMMU
tables, and have that be something an expert user can turn on (and which
disables the other features)?
Should we have a hypervisor-level interlock between PCI passthrough and
paging/sharing/access anyway? Doing both is unlikely to lead to happiness.
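Something along these lines in the domctl path is what I have in mind (just
a sketch; the need_iommu() test and its placement are illustrative, not a
worked-out patch):

    /* Refuse to enable paging/sharing/access on a domain that already
     * has devices assigned; the device-assignment path would need the
     * mirror-image check. */
    rc = -EXDEV;
    if ( need_iommu(d) )
        break;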
Tim.
> Right now, without any considerations for paging, we're storing the p2m
> type in bits 52-58 when PR=1. That p2m type can be non-zero. p2m_ram_ro,
> p2m_mmio_dm, p2m_logdirty, p2m_populate_on_demand are all par for the
> course. Given your statement above, and table 14 in the IOMMU manual, how
> is this even working? Or is it not working?
>
> Would a violation of these rules cause a VMEXIT_SHUTDOWN?
>
> Thanks a lot for the info!
> Andres
> >
> > Thanks,
> > Wei
> >
> >> An alternative to enabling mem_access on AMD processors would be to limit
> >> the access types to 3 bits. This would force disabling support for two
> >> modes. I'm inclined to disable two out of X, W and WX. I don't think
> >> they
> >> make sense without R permissions.
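> >> Purely as an illustration (not a final choice), a 3-bit encoding could
> >> keep the following subset of the p2m_access_t values, dropping e.g.
> >> p2m_access_x and p2m_access_w:
> >>
> >>     p2m_access_n, p2m_access_r, p2m_access_rw, p2m_access_wx,
> >>     p2m_access_rx, p2m_access_rwx, p2m_access_rx2rw  /* 7 values, 3 bits */
> >>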
> >> Thanks!
> >> Andres
> >>
> >>>
> >>> Thanks,
> >>> Wei
> >>>
> >>>>
> >>>> Finally, the triple fault. Maybe I'm missing something obvious.
> >>>> Comments
> >>>> welcome.
> >>>>
> >>>> Patch inline below, thanks!
> >>>> Andres
> >>>>
> >>>> Enable AMD support for paging.
> >>>>
> >>>> Signed-off-by: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
> >>>> Signed-off-by: Adin Scannell <adin@xxxxxxxxxxx>
> >>>>
> >>>> diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/mem_event.c
> >>>> --- a/xen/arch/x86/mm/mem_event.c
> >>>> +++ b/xen/arch/x86/mm/mem_event.c
> >>>> @@ -537,10 +537,6 @@ int mem_event_domctl(struct domain *d, x
> >>>> if ( !hap_enabled(d) )
> >>>> break;
> >>>>
> >>>> - /* Currently only EPT is supported */
> >>>> - if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
> >>>> - break;
> >>>> -
> >>>> rc = -EXDEV;
> >>>> /* Disallow paging in a PoD guest */
> >>>> if ( p2m->pod.entry_count )
> >>>> diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/p2m-pt.c
> >>>> --- a/xen/arch/x86/mm/p2m-pt.c
> >>>> +++ b/xen/arch/x86/mm/p2m-pt.c
> >>>> @@ -53,6 +53,20 @@
> >>>> #define P2M_BASE_FLAGS \
> >>>> (_PAGE_PRESENT | _PAGE_USER | _PAGE_DIRTY | _PAGE_ACCESSED)
> >>>>
> >>>> +#ifdef __x86_64__
> >>>> +/* l1e_from_pfn is not designed to have INVALID_MFN stored. The
> >>>> 0xff..ff
> >>>> + * value tramples over the higher-order bits used for flags (NX,
> >>>> p2mt,
> >>>> + * etc.) This happens for paging entries. Thus we do this clip/unclip
> >>>> + * juggle for l1 entries only (no paging superpages!) */
> >>>> +#define EFF_MFN_WIDTH (PADDR_BITS-PAGE_SHIFT) /* 40 bits */
> >>>> +#define clipped_mfn(mfn) ((mfn) & ((1UL << EFF_MFN_WIDTH) - 1))
> >>>> +#define unclip_mfn(mfn) ((clipped_mfn((mfn)) == INVALID_MFN) ? \
> >>>> + INVALID_MFN : (mfn))
> >>>> +#else
> >>>> +#define clipped_mfn(mfn) (mfn)
> >>>> +#define unclip_mfn(mfn) (mfn)
> >>>> +#endif /* __x86_64__ */
> >>>> +
> >>>> static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
> >>>> {
> >>>> unsigned long flags;
> >>>> @@ -77,6 +91,9 @@ static unsigned long p2m_type_to_flags(p
> >>>> case p2m_invalid:
> >>>> case p2m_mmio_dm:
> >>>> case p2m_populate_on_demand:
> >>>> + case p2m_ram_paging_out:
> >>>> + case p2m_ram_paged:
> >>>> + case p2m_ram_paging_in:
> >>>> default:
> >>>> return flags;
> >>>> case p2m_ram_ro:
> >>>> @@ -168,7 +185,7 @@ p2m_next_level(struct p2m_domain *p2m, m
> >>>> shift, max)) )
> >>>> return 0;
> >>>>
> >>>> - /* PoD: Not present doesn't imply empty. */
> >>>> + /* PoD/paging: Not present doesn't imply empty. */
> >>>> if ( !l1e_get_flags(*p2m_entry) )
> >>>> {
> >>>> struct page_info *pg;
> >>>> @@ -384,8 +401,9 @@ p2m_set_entry(struct p2m_domain *p2m, un
> >>>> 0, L1_PAGETABLE_ENTRIES);
> >>>> ASSERT(p2m_entry);
> >>>>
> >>>> - if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) )
> >>>> - entry_content = l1e_from_pfn(mfn_x(mfn),
> >>>> + if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) ||
> >>>> + (p2mt == p2m_ram_paged) || (p2mt == p2m_ram_paging_in) )
> >>>> + entry_content = l1e_from_pfn(clipped_mfn(mfn_x(mfn)),
> >>>> p2m_type_to_flags(p2mt,
> >>>> mfn));
> >>>> else
> >>>> entry_content = l1e_empty();
> >>>> @@ -393,7 +411,7 @@ p2m_set_entry(struct p2m_domain *p2m, un
> >>>> if ( entry_content.l1 != 0 )
> >>>> {
> >>>> p2m_add_iommu_flags(&entry_content, 0,
> >>>> iommu_pte_flags);
> >>>> - old_mfn = l1e_get_pfn(*p2m_entry);
> >>>> + old_mfn = unclip_mfn(l1e_get_pfn(*p2m_entry));
> >>>> }
> >>>> /* level 1 entry */
> >>>> p2m->write_p2m_entry(p2m, gfn, p2m_entry, table_mfn,
> >>>> entry_content, 1);
> >>>> @@ -615,11 +633,12 @@ pod_retry_l1:
> >>>> sizeof(l1e));
> >>>>
> >>>> if ( ret == 0 ) {
> >>>> + unsigned long l1e_mfn = unclip_mfn(l1e_get_pfn(l1e));
> >>>> p2mt = p2m_flags_to_type(l1e_get_flags(l1e));
> >>>> - ASSERT(l1e_get_pfn(l1e) != INVALID_MFN || !p2m_is_ram(p2mt));
> >>>> + ASSERT( (l1e_mfn != INVALID_MFN || !p2m_is_ram(p2mt)) ||
> >>>> + (l1e_mfn == INVALID_MFN && p2m_is_paging(p2mt)) );
> >>>>
> >>>> - if ( p2m_flags_to_type(l1e_get_flags(l1e))
> >>>> - == p2m_populate_on_demand )
> >>>> + if ( p2mt == p2m_populate_on_demand )
> >>>> {
> >>>> /* The read has succeeded, so we know that the mapping
> >>>> * exits at this point. */
> >>>> @@ -641,7 +660,7 @@ pod_retry_l1:
> >>>> }
> >>>>
> >>>> if ( p2m_is_valid(p2mt) || p2m_is_grant(p2mt) )
> >>>> - mfn = _mfn(l1e_get_pfn(l1e));
> >>>> + mfn = _mfn(l1e_mfn);
> >>>> else
> >>>> /* XXX see above */
> >>>> p2mt = p2m_mmio_dm;
> >>>> @@ -783,18 +802,26 @@ pod_retry_l2:
> >>>> pod_retry_l1:
> >>>> if ( (l1e_get_flags(*l1e) & _PAGE_PRESENT) == 0 )
> >>>> {
> >>>> + p2m_type_t l1t = p2m_flags_to_type(l1e_get_flags(*l1e));
> >>>> /* PoD: Try to populate */
> >>>> - if ( p2m_flags_to_type(l1e_get_flags(*l1e)) ==
> >>>> p2m_populate_on_demand )
> >>>> + if ( l1t == p2m_populate_on_demand )
> >>>> {
> >>>> if ( q != p2m_query ) {
> >>>> if ( !p2m_pod_demand_populate(p2m, gfn,
> >>>> PAGE_ORDER_4K,
> >>>> q) )
> >>>> goto pod_retry_l1;
> >>>> } else
> >>>> *t = p2m_populate_on_demand;
> >>>> + } else {
> >>>> + if ( p2m_is_paging(l1t) )
> >>>> + {
> >>>> + *t = l1t;
> >>>> + /* No need to unclip due to check below */
> >>>> + mfn = _mfn(l1e_get_pfn(*l1e));
> >>>> + }
> >>>> }
> >>>>
> >>>> unmap_domain_page(l1e);
> >>>> - return _mfn(INVALID_MFN);
> >>>> + return (l1t == p2m_ram_paging_out) ? mfn : _mfn(INVALID_MFN);
> >>>> }
> >>>> mfn = _mfn(l1e_get_pfn(*l1e));
> >>>> *t = p2m_flags_to_type(l1e_get_flags(*l1e));
> >>>> @@ -914,7 +941,7 @@ static void p2m_change_type_global(struc
> >>>> flags = l1e_get_flags(l1e[i1]);
> >>>> if ( p2m_flags_to_type(flags) != ot )
> >>>> continue;
> >>>> - mfn = l1e_get_pfn(l1e[i1]);
> >>>> + mfn = unclip_mfn(l1e_get_pfn(l1e[i1]));
> >>>> gfn = i1 + (i2 + (i3
> >>>> #if CONFIG_PAGING_LEVELS >= 4
> >>>> + (i4 * L3_PAGETABLE_ENTRIES)
> >>>> @@ -923,7 +950,7 @@ static void p2m_change_type_global(struc
> >>>> * L2_PAGETABLE_ENTRIES) *
> >>>> L1_PAGETABLE_ENTRIES;
> >>>> /* create a new 1le entry with the new type */
> >>>> flags = p2m_type_to_flags(nt, _mfn(mfn));
> >>>> - l1e_content = l1e_from_pfn(mfn, flags);
> >>>> + l1e_content = l1e_from_pfn(clipped_mfn(mfn),
> >>>> flags);
> >>>> p2m->write_p2m_entry(p2m, gfn, &l1e[i1],
> >>>> l1mfn, l1e_content, 1);
> >>>> }
> >>>> @@ -1073,7 +1100,7 @@ long p2m_pt_audit_p2m(struct p2m_domain
> >>>> entry_count++;
> >>>> continue;
> >>>> }
> >>>> - mfn = l1e_get_pfn(l1e[i1]);
> >>>> + mfn = unclip_mfn(l1e_get_pfn(l1e[i1]));
> >>>> ASSERT(mfn_valid(_mfn(mfn)));
> >>>> m2pfn = get_gpfn_from_mfn(mfn);
> >>>> if ( m2pfn != gfn &&
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Xen-devel mailing list
> >>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >>>> http://lists.xensource.com/xen-devel
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> >
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel