Xen project Mailing List

[Xen-devel] [PATCH] x86/ctxt-switch: Document and improve GDT handling

To: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Thu, 4 Jul 2019 18:57:32 +0100


Cc: Juergen Gross <jgross@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>

Delivery-date: Thu, 04 Jul 2019 17:57:49 +0000


List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

write_full_gdt_ptes() has a latent bug. Using virt_to_mfn() and iterating with (mfn + i) is wrong, because of PDX compression. The context switch path only functions correctly because NR_RESERVED_GDT_PAGES is 1.

As this is exceedingly unlikely to change moving forward, drop the loop rather than inserting a BUILD_BUG_ON(NR_RESERVED_GDT_PAGES != 1). With the loop dropped, write_full_gdt_ptes() becomes more obviously a poor name, so rename it to update_xen_slot_in_full_gdt().

Furthermore, calling virt_to_mfn() in the context switch path is a lot of wasted cycles for a result which is constant after boot.

Begin by documenting how Xen handles the GDTs across context switch. From this, we observe that load_full_gdt() is completely independent of the current CPU, and load_default_gdt() only gets passed the current CPU's regular GDT.

Add two extra per-cpu variables which cache the L1e for the regular and compat GDT, calculated in cpu_smpboot_alloc()/trap_init() as appropriate, so update_xen_slot_in_full_gdt() doesn't need to waste time performing the same calculation on every context switch.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Wei Liu <wl@xxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
CC: Juergen Gross <jgross@xxxxxxxx>

Slightly RFC. I'm fairly confident this is better, but Juergen says that some of his scheduling perf tests notice large differences from subtle changes in __context_switch(), so it would be useful to get some numbers from this change.

The delta from this change is:

  add/remove: 2/0 grow/shrink: 1/1 up/down: 320/-127 (193)
  Function                            old     new   delta
  cpu_smpboot_callback               1152    1456    +304
  per_cpu__gdt_table_l1e                -       8      +8
  per_cpu__compat_gdt_table_l1e         -       8      +8
  __context_switch                   1238    1111    -127
  Total: Before=3339227, After=3339420, chg +0.01%

I'm not overly happy about the special case in trap_init(), but I can't think of a better place to put this.
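To see why iterating with (mfn + i) goes wrong under PDX compression, here is a toy model. It assumes, purely for illustration, that bits 4 to 7 of every valid PFN are always zero, so a PDX-style conversion squeezes them out; the masks, the shift and all helper names below are hypothetical and far smaller than anything Xen would compute at boot. As we understand Xen's x86-64 directmap, virtually adjacent pages are adjacent in PDX space, not necessarily in MFN space, and the model shows how far apart those can be.

```c
#include <assert.h>

/*
 * Hypothetical toy layout: bits 4..7 of every valid PFN are assumed to be
 * zero, so they are squeezed out when converting a PFN to a PDX.
 */
#define PDX_BOTTOM_MASK 0x0FUL    /* low PFN bits, kept as-is */
#define PDX_HOLE_MASK   0xF0UL    /* bits assumed always zero */
#define PDX_HOLE_SHIFT  4         /* width of the hole */
#define PDX_TOP_MASK    (~0xFFUL) /* high PFN bits, shifted down */

static unsigned long pfn_to_pdx(unsigned long pfn)
{
    assert((pfn & PDX_HOLE_MASK) == 0); /* a valid PFN never uses the hole */
    return (pfn & PDX_BOTTOM_MASK) | ((pfn & PDX_TOP_MASK) >> PDX_HOLE_SHIFT);
}

static unsigned long pdx_to_pfn(unsigned long pdx)
{
    return (pdx & PDX_BOTTOM_MASK) | ((pdx << PDX_HOLE_SHIFT) & PDX_TOP_MASK);
}

/* What the removed loop effectively did: assume MFNs are contiguous. */
static unsigned long nth_page_mfn_naive(unsigned long first_mfn, unsigned int i)
{
    return first_mfn + i;
}

/* Contiguity holds in PDX space, so step there and convert back. */
static unsigned long nth_page_mfn_pdx(unsigned long first_mfn, unsigned int i)
{
    return pdx_to_pfn(pfn_to_pdx(first_mfn) + i);
}
```

For a two-page buffer starting at MFN 0x00F, nth_page_mfn_naive(0x00F, 1) yields 0x010, a frame inside the assumed hole, whereas nth_page_mfn_pdx(0x00F, 1) yields 0x100. The patch avoids the question altogether: with a single reserved GDT page there is nothing to iterate over, and the L1e is derived once from virt_to_mfn() of that page.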

Also, it should now be very obvious to people that Xen's current GDT handling for non-PV vcpus is a recipe for subtle bugs, if we ever manage to execute a stray mov/pop %sreg instruction. We really ought to have Xen's regular GDT in an area where slots 0-13 are either mapped to the zero page, or not present, so we don't risk loading a non-faulting garbage selector.

---
 xen/arch/x86/domain.c      | 52 ++++++++++++++++++++++++++++++----------------
 xen/arch/x86/smpboot.c     |  4 ++++
 xen/arch/x86/traps.c       | 10 +++++++++
 xen/include/asm-x86/desc.h |  2 ++
 4 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 84cafbe558..147f96a09e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1635,23 +1635,42 @@ static void _update_runstate_area(struct vcpu *v)
         v->arch.pv.need_update_runstate_area = 1;
 }
 
+/*
+ * Overview of Xen's GDTs.
+ *
+ * Xen maintains per-CPU compat and regular GDTs which are both a single page
+ * in size.  Some content is specific to each CPU (the TSS, the per-CPU marker
+ * for #DF handling, and optionally the LDT).  The compat and regular GDTs
+ * differ by the layout and content of the guest accessible selectors.
+ *
+ * The Xen selectors live from 0xe000 (slot 14 of 16), and need to always
+ * appear in this position for interrupt/exception handling to work.
+ *
+ * A PV guest may specify GDT frames of their own (slots 0 to 13).  Room for a
+ * full GDT exists in the per-domain mappings.
+ *
+ * To schedule a PV vcpu, we point slot 14 of the guest's full GDT at the
+ * current CPU's compat or regular (as appropriate) GDT frame.  This is so
+ * that the per-CPU parts still work correctly after switching pagetables and
+ * loading the guest's full GDT into GDTR.
+ *
+ * To schedule Idle or HVM vcpus, we load a GDT base address which causes the
+ * regular per-CPU GDT frame to appear with selectors at the appropriate
+ * offset.
+ */
 static inline bool need_full_gdt(const struct domain *d)
 {
     return is_pv_domain(d) && !is_idle_domain(d);
 }
 
-static void write_full_gdt_ptes(seg_desc_t *gdt, const struct vcpu *v)
+static void update_xen_slot_in_full_gdt(const struct vcpu *v, unsigned int cpu)
 {
-    unsigned long mfn = virt_to_mfn(gdt);
-    l1_pgentry_t *pl1e = pv_gdt_ptes(v);
-    unsigned int i;
-
-    for ( i = 0; i < NR_RESERVED_GDT_PAGES; i++ )
-        l1e_write(pl1e + FIRST_RESERVED_GDT_PAGE + i,
-                  l1e_from_pfn(mfn + i, __PAGE_HYPERVISOR_RW));
+    l1e_write(pv_gdt_ptes(v) + FIRST_RESERVED_GDT_PAGE,
+              !is_pv_32bit_vcpu(v) ? per_cpu(gdt_table_l1e, cpu)
+                                   : per_cpu(compat_gdt_table_l1e, cpu));
 }
 
-static void load_full_gdt(const struct vcpu *v, unsigned int cpu)
+static void load_full_gdt(const struct vcpu *v)
 {
     struct desc_ptr gdt_desc = {
         .limit = LAST_RESERVED_GDT_BYTE,
@@ -1661,11 +1680,12 @@ static void load_full_gdt(const struct vcpu *v, unsigned int cpu)
     lgdt(&gdt_desc);
 }
 
-static void load_default_gdt(const seg_desc_t *gdt, unsigned int cpu)
+static void load_default_gdt(unsigned int cpu)
 {
     struct desc_ptr gdt_desc = {
         .limit = LAST_RESERVED_GDT_BYTE,
-        .base  = (unsigned long)(gdt - FIRST_RESERVED_GDT_ENTRY),
+        .base  = (unsigned long)(per_cpu(gdt_table, cpu) -
+                                 FIRST_RESERVED_GDT_ENTRY),
     };
 
     lgdt(&gdt_desc);
@@ -1678,7 +1698,6 @@ static void __context_switch(void)
     struct vcpu *p = per_cpu(curr_vcpu, cpu);
     struct vcpu *n = current;
     struct domain *pd = p->domain, *nd = n->domain;
-    seg_desc_t *gdt;
 
     ASSERT(p != n);
     ASSERT(!vcpu_cpu_dirty(n));
@@ -1718,15 +1737,12 @@ static void __context_switch(void)
 
     psr_ctxt_switch_to(nd);
 
-    gdt = !is_pv_32bit_domain(nd) ? per_cpu(gdt_table, cpu) :
-                                    per_cpu(compat_gdt_table, cpu);
-
     if ( need_full_gdt(nd) )
-        write_full_gdt_ptes(gdt, n);
+        update_xen_slot_in_full_gdt(n, cpu);
 
     if ( need_full_gdt(pd) &&
          ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd)) )
-        load_default_gdt(gdt, cpu);
+        load_default_gdt(cpu);
 
     write_ptbase(n);
 
@@ -1739,7 +1755,7 @@ static void __context_switch(void)
 
     if ( need_full_gdt(nd) &&
          ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
-        load_full_gdt(n, cpu);
+        load_full_gdt(n);
 
     if ( pd != nd )
         cpumask_clear_cpu(cpu, pd->dirty_cpumask);
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 730fe141fa..004285d14c 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -985,6 +985,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
     if ( gdt == NULL )
         goto out;
     per_cpu(gdt_table, cpu) = gdt;
+    per_cpu(gdt_table_l1e, cpu) =
+        l1e_from_pfn(virt_to_mfn(gdt), __PAGE_HYPERVISOR_RW);
     memcpy(gdt, boot_cpu_gdt_table, NR_RESERVED_GDT_PAGES * PAGE_SIZE);
     BUILD_BUG_ON(NR_CPUS > 0x10000);
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
@@ -992,6 +994,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
     per_cpu(compat_gdt_table, cpu) = gdt = alloc_xenheap_pages(order, memflags);
     if ( gdt == NULL )
         goto out;
+    per_cpu(compat_gdt_table_l1e, cpu) =
+        l1e_from_pfn(virt_to_mfn(gdt), __PAGE_HYPERVISOR_RW);
     memcpy(gdt, boot_cpu_compat_gdt_table, NR_RESERVED_GDT_PAGES * PAGE_SIZE);
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 8097ef3bf5..25b4b47e5e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -97,7 +97,9 @@ DEFINE_PER_CPU(uint64_t, efer);
 static DEFINE_PER_CPU(unsigned long, last_extable_addr);
 
 DEFINE_PER_CPU_READ_MOSTLY(seg_desc_t *, gdt_table);
+DEFINE_PER_CPU_READ_MOSTLY(l1_pgentry_t, gdt_table_l1e);
 DEFINE_PER_CPU_READ_MOSTLY(seg_desc_t *, compat_gdt_table);
+DEFINE_PER_CPU_READ_MOSTLY(l1_pgentry_t, compat_gdt_table_l1e);
 
 /* Master table, used by CPU0. */
 idt_entry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
@@ -2059,6 +2061,14 @@ void __init trap_init(void)
         }
     }
 
+    /* Cache {,compat_}gdt_table_l1e now that physical relocation is done. */
+    this_cpu(gdt_table_l1e) =
+        l1e_from_pfn(virt_to_mfn(boot_cpu_gdt_table),
+                     __PAGE_HYPERVISOR_RW);
+    this_cpu(compat_gdt_table_l1e) =
+        l1e_from_pfn(virt_to_mfn(boot_cpu_compat_gdt_table),
+                     __PAGE_HYPERVISOR_RW);
+
     percpu_traps_init();
 
     cpu_init();
diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h
index 85e83bcefb..e565727dc0 100644
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -206,8 +206,10 @@ struct __packed desc_ptr {
 
 extern seg_desc_t boot_cpu_gdt_table[];
 DECLARE_PER_CPU(seg_desc_t *, gdt_table);
+DECLARE_PER_CPU(l1_pgentry_t, gdt_table_l1e);
 extern seg_desc_t boot_cpu_compat_gdt_table[];
 DECLARE_PER_CPU(seg_desc_t *, compat_gdt_table);
+DECLARE_PER_CPU(l1_pgentry_t, compat_gdt_table_l1e);
 
 extern void load_TR(void);
 
-- 
2.11.0

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
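The overview comment's statement that, for Idle and HVM vcpus, Xen loads "a GDT base address which causes the regular per-CPU GDT frame to appear with selectors at the appropriate offset" reduces to simple arithmetic. The sketch below assumes FIRST_RESERVED_GDT_PAGE is 14 and NR_RESERVED_GDT_PAGES is 1 (matching "slot 14 of 16" and the commit message), 4 KiB pages and 8-byte descriptors; the macro bodies are a reconstruction rather than a copy of Xen's headers, and default_gdt_base() and selector_to_linear() are hypothetical helpers. It uses plain integers instead of seg_desc_t pointers so that stepping below the start of the frame stays well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, reconstructed for illustration (see lead-in). */
#define GDT_PAGE_SIZE            4096ULL
#define GDT_DESC_SIZE            8ULL   /* sizeof(seg_desc_t) */
#define FIRST_RESERVED_GDT_PAGE  14ULL
#define NR_RESERVED_GDT_PAGES    1ULL
#define FIRST_RESERVED_GDT_BYTE  (FIRST_RESERVED_GDT_PAGE * GDT_PAGE_SIZE)
#define FIRST_RESERVED_GDT_ENTRY (FIRST_RESERVED_GDT_BYTE / GDT_DESC_SIZE)
#define LAST_RESERVED_GDT_BYTE \
    ((FIRST_RESERVED_GDT_PAGE + NR_RESERVED_GDT_PAGES) * GDT_PAGE_SIZE - 1)

/*
 * Mirrors the base computed by load_default_gdt(): back off by
 * FIRST_RESERVED_GDT_ENTRY descriptors so that the per-CPU frame
 * starts at byte offset 0xe000 of the GDT.
 */
static uint64_t default_gdt_base(uint64_t percpu_gdt_va)
{
    return percpu_gdt_va - FIRST_RESERVED_GDT_ENTRY * GDT_DESC_SIZE;
}

/* Address of the descriptor a selector names (TI and RPL bits dropped). */
static uint64_t selector_to_linear(uint64_t gdt_base, uint16_t sel)
{
    return gdt_base + (sel & 0xfff8u);
}
```

With these values, selector 0xe000 resolves to the first byte of the per-CPU frame and 0xe008 to the descriptor after it, while the limit 0xefff ends exactly at the last byte of the reserved page. Selectors 0 to 0xdfff remain within the limit but resolve to whatever happens to be mapped below the frame, which is the "non-faulting garbage selector" risk raised in the notes.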
