Xen project Mailing List

Re: [Xen-devel] [PATCH v9 13/15] x86/microcode: Synchronize late microcode loading

To: Chao Gao <chao.gao@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>

Date: Mon, 19 Aug 2019 11:27:36 +0100

Authentication-results: esa6.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none; spf=None smtp.pra=sergey.dyasli@xxxxxxxxxx; spf=Pass smtp.mailfrom=sergey.dyasli@xxxxxxxxxx; spf=None smtp.helo=postmaster@xxxxxxxxxxxxxxx

Autocrypt: addr=sergey.dyasli@xxxxxxxxxx; keydata= mQINBFtMVHEBEADc/hZcLexrB6vGTdGqEUsYZkFGQh6Z1OO7bCtM1go1RugSMeq9tkFHQSOc 9c7W9NVQqLgn8eefikIHxgic6tGgKoIQKcPuSsnqGao2YabsTSSoeatvmO5HkR0xGaUd+M6j iqv3cD7/WL602NhphT4ucKXCz93w0TeoJ3gleLuILxmzg1gDhKtMdkZv6TngWpKgIMRfoyHQ jsVzPbTTjJl/a9Cw99vuhFuEJfzbLA80hCwhoPM+ZQGFDcG4c25GQGQFFatpbQUhNirWW5b1 r2yVOziSJsvfTLnyzEizCvU+r/Ek2Kh0eAsRFr35m2X+X3CfxKrZcePxzAf273p4nc3YIK9h cwa4ZpDksun0E2l0pIxg/pPBXTNbH+OX1I+BfWDZWlPiPxgkiKdgYPS2qv53dJ+k9x6HkuCy i61IcjXRtVgL5nPGakyOFQ+07S4HIJlw98a6NrptWOFkxDt38x87mSM7aSWp1kjyGqQTGoKB VEx5BdRS5gFdYGCQFc8KVGEWPPGdeYx9Pj2wTaweKV0qZT69lmf/P5149Pc81SRhuc0hUX9K DnYBa1iSHaDjifMsNXKzj8Y8zVm+J6DZo/D10IUxMuExvbPa/8nsertWxoDSbWcF1cyvZp9X tUEukuPoTKO4Vzg7xVNj9pbK9GPxSYcafJUgDeKEIlkn3iVIPwARAQABtChTZXJnZXkgRHlh c2xpIDxzZXJnZXkuZHlhc2xpQGNpdHJpeC5jb20+iQJOBBMBCgA4FiEEkI7HMI5EbM2FLA1L Aa+w5JvbyusFAltMVHECGwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQAa+w5JvbyuuQ JBAAry/oRK6m0I+ck1Tarz9a1RrF73r1YoJUk5Bw+PSxsBJOPp3vDeAz3Kqw58qmBXeNlMU4 1cqAxFxCCKMtER1gpmrKWBA1/H1ZoBRtzhaHgPTQLyR7LB1OgdpgwEOjN1Q5gME8Pk21y/3N cG5YBgD/ZHbq8nWS/G3r001Ie3nX55uacGk/Ry175cS48+asrerShKMDNMT1cwimo9zH/3Lm RTpWloh2dG4jjwtCXqB7s+FEE5wQVCpPp9p55+9pPd+3DXmsQEcJ/28XHo/UJW663WjRlRc4 wgPwiC9Co1HqaMKSzdPpZmI5D4HizWH8jF7ppUjWoPapwk4dEA7Al0vx1Bz3gbJAL8DaRgQp H4j/16ifletfGUNbHJR2vWljZ5SEf2vMVcdubf9eFUfBF/9OOR1Kcj1PISP8sPhcP7oCfFtH RcxXh1OStrRFtltJt2VlloKXAUggdewwyyD4xl9UHCfI4lSexOK37wNSQYPQcVcOS1bl4NhQ em6pw2AC32NsnQE5PmczFADDIpWhO/+WtkTFeE2HHfAn++y3YDtKQd7xes9UJjQNiGziArST l6Zrx4/nShVLeYRVW76l27gI5a8BZLWwBVRsWniGM50OOJULvSag7kh+cjsrXXpNuA4rfEoB Bxr7pso9e5YghupDc8XftsYd7mlAgOTCAC8uZme5Ag0EW0xUcQEQAMKi97v3DwwPgYVPYIbQ JAvoMgubJllC9RcE0PQsE6nEKSrfOT6Gh5/LHOXLbQI9nzU/xdr6kMfwbYVTnZIY/SwsLrJa gSKm64t11MjC1Vf03/sncx1tgI7nwqMMIAYLsXnQ9X/Up5L/gLO2YDIPxrQ6g4glgRYPT53i r6/hTz3dlpqyPCorpuF+WY7P2ujhlFlXCAaD6btPPM/9LZSmI0xS4aCBLH+pZeCr0UGSMhsX JYN0QRLjfsIDGyqaXVH9gwV2Hgsq6z8fNPQlBc3IpDvfXa1rYtgldYBfG521L3wnsMcKoFSr R5dpH7Jtvv5YBuAk8r571qlMhyAmVKiEnc+RonWl503D5bAHqNmFNjV248J5scyRD/+BcYLI 2CFG28XZrCvjxq3ux5hpmg2fCu+y98h6/yuwB/JhbFlDOSoluEpysiEL3R5GTKbxOF664q5W fiSObxNONxs86UtghqNDRUJgyS0W6TfykGOnZDVYAC9Gg8SbQDta1ymA0q76S/NG2MrJEOIr 1GtOr/UjNv2x4vW56dzX/3yuhK1ilpgzh1q504ETC6EKXMaFT8cNgsMlk9dOvWPwlsIJ249+ PizMDFGITxGTIrQAaUBO+HRLSBYdHNrHJtytkBoTjykCt7M6pl7l+jFYjGSw4fwexVy0MqsD AZ2coH82RTPb6Q7JABEBAAGJAjYEGAEKACAWIQSQjscwjkRszYUsDUsBr7Dkm9vK6wUCW0xU cQIbDAAKCRABr7Dkm9vK6+9uD/9Ld3X5cvnrwrkFMddpjFKoJ4yphtX2s+EQfKT6vMq3A1dJ tI7zHTFm60uBhX6eRbQow8fkHPcjXGJEoCSJf8ktwx/HYcBcnUK/aulHpvHIIYEma7BHry4x L+Ap7oBbBNiraS3Wu1k+MaX07BWhYYkpu7akUEtaYsCceVc4vpYNITUzPYCHeMwc5pLICA+7 VdI1rrTSAwlCtLGBt7ttbvaAKN4dysiN+/66Hlxnn8n952lZdG4ThPPzafG50EgcTa+dASgm tc6HaQAmJiwb4iWUOoUoM+udLRHcN6cE0bQivyH1bqF4ROeFBRz00MUJKvzUynR9E50F9hmd DOBJkyM3Z5imQ0RayEkRHhlhj7uECaojnUeewq4zjpAg2HTSMkdEzKRbdMEyXCdQXFnSCmUB 5yMIULuDbOODWo3EufExLjAKzIRWEKQ/JidLzO6hrhlQffsJ7MPTU+Hg7WxqWfn4zhuUcIQB SlkiRMalSiJITC2jG7oQRRh9tyNaDMkKzTbeFtHKRmUUAuhE0LBXP8Wc+5W7b3WOf2SO8JMR 4TqDZ0K06s66S5fOTW0h56iCCxTsAnRvM/tA4SERyRoFs/iTqJzboskZY0yKeWV4/IQxfOyC YwdU3//zANM1ZpqeE/8lnW/kx+fyzVyEioLSwkjDvdG++4GQ5r6PHQ7BbdEWhA==

Cc: "sergey.dyasli@xxxxxxxxxx >> Sergey Dyasli" <sergey.dyasli@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Ashok Raj <ashok.raj@xxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Borislav Petkov <bp@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>

Delivery-date: Mon, 19 Aug 2019 10:27:51 +0000

Ironport-sdr: qSIKGOXnZmwbKbALyKciegCVhSBddOBAiGkIU46sGiDaMUUx6uAY3+g1eno8On8o+sI51PhQod GZhpSGn0xsjQyZv25A297iBiyFi98f6wvOFQ7sqKikaChtOoEY88HhDK2LVusEU7Nw2VwlJCTX LY5fzcvpvjqGD4L0iz0tNBKK5QwbPoc7f98CTmRasbFEXA273j8jZV8Ltx9hHarDTt9+ZRyjSw aasOgLCh+tgqYmmL1RmBcCcFvd/JcRSfU2z0YbluTT759L8g5D/R9TSgv1iA2ekbhKdmZ83xSv x30=

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Openpgp: preference=signencrypt

On 19/08/2019 02:25, Chao Gao wrote: > This patch ports microcode improvement patches from linux kernel. > > Before you read any further: the early loading method is still the > preferred one and you should always do that. The following patch is > improving the late loading mechanism for long running jobs and cloud use > cases. > > Gather all cores and serialize the microcode update on them by doing it > one-by-one to make the late update process as reliable as possible and > avoid potential issues caused by the microcode update. > > Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx> > Tested-by: Chao Gao <chao.gao@xxxxxxxxx> > [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff] > [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7] > Cc: Kevin Tian <kevin.tian@xxxxxxxxx> > Cc: Jun Nakajima <jun.nakajima@xxxxxxxxx> > Cc: Ashok Raj <ashok.raj@xxxxxxxxx> > Cc: Borislav Petkov <bp@xxxxxxx> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > Cc: Jan Beulich <jbeulich@xxxxxxxx> > --- > Changes in v9: > - log __buildin_return_address(0) when timeout > - divide CPUs into three logical sets and they will call different > functions during ucode loading. The 'control thread' is chosen to > coordinate ucode loading on all CPUs. Since only control thread would > set 'loading_state', we can get rid of 'cmpxchg' stuff in v8. > - s/rep_nop/cpu_relax > - each thread updates its revision number itself > - add XENLOG_ERR prefix for each line of multi-line log messages > > Changes in v8: > - to support blocking #NMI handling during loading ucode > * introduce a flag, 'loading_state', to mark the start or end of > ucode loading. > * use a bitmap for cpu callin since if cpu may stay in #NMI handling, > there are two places for a cpu to call in. bitmap won't be counted > twice. > * don't wait for all CPUs callout, just wait for CPUs that perform the > update. We have to do this because some threads may be stuck in NMI > handling (where cannot reach the rendezvous). > - emit a warning if the system stays in stop_machine context for more > than 1s > - comment that rdtsc is fine while loading an update > - use cmpxchg() to avoid panic being called on multiple CPUs > - Propagate revision number to other threads > - refine comments and prompt messages > > Changes in v7: > - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int. > - reword the comment above microcode_update_cpu() to clearly state that > one thread per core should do the update. > --- > xen/arch/x86/microcode.c | 289 > +++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 267 insertions(+), 22 deletions(-) > > diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c > index bdd9c9f..91f9e81 100644 > --- a/xen/arch/x86/microcode.c > +++ b/xen/arch/x86/microcode.c > @@ -30,18 +30,52 @@ > #include <xen/smp.h> > #include <xen/softirq.h> > #include <xen/spinlock.h> > +#include <xen/stop_machine.h> > #include <xen/tasklet.h> > #include <xen/guest_access.h> > #include <xen/earlycpio.h> > +#include <xen/watchdog.h> > > +#include <asm/delay.h> > #include <asm/msr.h> > #include <asm/processor.h> > #include <asm/setup.h> > #include <asm/microcode.h> > > +/* > + * Before performing a late microcode update on any thread, we > + * rendezvous all cpus in stop_machine context. The timeout for > + * waiting for cpu rendezvous is 30ms. It is the timeout used by > + * live patching > + */ > +#define MICROCODE_CALLIN_TIMEOUT_US 30000 > + > +/* > + * Timeout for each thread to complete update is set to 1s. It is a > + * conservative choice considering all possible interference. > + */ > +#define MICROCODE_UPDATE_TIMEOUT_US 1000000 > + > static module_t __initdata ucode_mod; > static signed int __initdata ucode_mod_idx; > static bool_t __initdata ucode_mod_forced; > +static unsigned int nr_cores; > + > +/* > + * These states help to coordinate CPUs during loading an update. > + * > + * The semantics of each state is as follow: > + * - LOADING_PREPARE: initial state of 'loading_state'. > + * - LOADING_CALLIN: CPUs are allowed to callin. > + * - LOADING_ENTER: all CPUs have called in. Initiate ucode loading. > + * - LOADING_EXIT: ucode loading is done or aborted. > + */ > +static enum { > + LOADING_PREPARE, > + LOADING_CALLIN, > + LOADING_ENTER, > + LOADING_EXIT, > +} loading_state; > > /* > * If we scan the initramfs.cpio for the early microcode code > @@ -190,6 +224,16 @@ static DEFINE_SPINLOCK(microcode_mutex); > DEFINE_PER_CPU(struct cpu_signature, cpu_sig); > > /* > + * Count the CPUs that have entered, exited the rendezvous and succeeded in > + * microcode update during late microcode update respectively. > + * > + * Note that a bitmap is used for callin to allow cpu to set a bit multiple > + * times. It is required to do busy-loop in #NMI handling. > + */ > +static cpumask_t cpu_callin_map; > +static atomic_t cpu_out, cpu_updated; > + > +/* > * Return a patch that covers current CPU. If there are multiple patches, > * return the one with the highest revision number. Return error If no > * patch is found and an error occurs during the parsing process. Otherwise > @@ -232,6 +276,34 @@ bool microcode_update_cache(struct microcode_patch > *patch) > return true; > } > > +/* Wait for a condition to be met with a timeout (us). */ > +static int wait_for_condition(int (*func)(void *data), void *data, > + unsigned int timeout) > +{ > + while ( !func(data) ) > + { > + if ( !timeout-- ) > + { > + printk("CPU%u: Timeout in %pS\n", > + smp_processor_id(), __builtin_return_address(0)); > + return -EBUSY; > + } > + udelay(1); > + } > + > + return 0; > +} > + > +static int wait_cpu_callin(void *nr) > +{ > + return cpumask_weight(&cpu_callin_map) >= (unsigned long)nr; > +} > + > +static int wait_cpu_callout(void *nr) > +{ > + return atomic_read(&cpu_out) >= (unsigned long)nr; > +} > + > /* > * Load a microcode update to current CPU. > * > @@ -265,37 +337,155 @@ static int microcode_update_cpu(const struct > microcode_patch *patch) > return err; > } > > -static long do_microcode_update(void *patch) > +static int slave_thread_fn(void) > +{ > + unsigned int cpu = smp_processor_id(); > + unsigned int master = cpumask_first(this_cpu(cpu_sibling_mask)); > + > + while ( loading_state != LOADING_CALLIN ) > + cpu_relax(); > + > + cpumask_set_cpu(cpu, &cpu_callin_map); > + > + while ( loading_state != LOADING_EXIT ) > + cpu_relax(); > + > + /* Copy update revision from the "master" thread. */ > + this_cpu(cpu_sig).rev = per_cpu(cpu_sig, master).rev; > + > + return 0; > +} > + > +static int master_thread_fn(const struct microcode_patch *patch) > +{ > + unsigned int cpu = smp_processor_id(); > + int ret = 0; > + > + while ( loading_state != LOADING_CALLIN ) > + cpu_relax(); > + > + cpumask_set_cpu(cpu, &cpu_callin_map); > + > + while ( loading_state != LOADING_ENTER ) > + cpu_relax(); If I'm reading it right, this will wait forever in case when... > + > + /* > + * If an error happened, control thread would set 'loading_state' > + * to LOADING_EXIT. Don't perform ucode loading for this case > + */ > + if ( loading_state == LOADING_EXIT ) > + return ret; > + > + ret = microcode_ops->apply_microcode(patch); > + if ( !ret ) > + atomic_inc(&cpu_updated); > + atomic_inc(&cpu_out); > + > + while ( loading_state != LOADING_EXIT ) > + cpu_relax(); > + > + return ret; > +} > + > +static int control_thread_fn(const struct microcode_patch *patch) > { > - unsigned int cpu; > + unsigned int cpu = smp_processor_id(), done; > + unsigned long tick; > + int ret; > > - /* Store the patch after a successful loading */ > - if ( !microcode_update_cpu(patch) && patch ) > + /* Allow threads to call in */ > + loading_state = LOADING_CALLIN; > + smp_mb(); > + > + cpumask_set_cpu(cpu, &cpu_callin_map); > + > + /* Waiting for all threads calling in */ > + ret = wait_for_condition(wait_cpu_callin, > + (void *)(unsigned long)num_online_cpus(), > + MICROCODE_CALLIN_TIMEOUT_US); > + if ( ret ) { > + loading_state = LOADING_EXIT; > + return ret; > + } ...this condition holds. Have you actually tested this case? > + > + /* Let master threads load the given ucode update */ > + loading_state = LOADING_ENTER; > + smp_mb(); > + > + ret = microcode_ops->apply_microcode(patch); > + if ( !ret ) > + atomic_inc(&cpu_updated); > + atomic_inc(&cpu_out); > + > + tick = rdtsc_ordered(); > + /* Waiting for master threads finishing update */ > + done = atomic_read(&cpu_out); > + while ( done != nr_cores ) > { > - spin_lock(&microcode_mutex); > - microcode_update_cache(patch); > - spin_unlock(&microcode_mutex); > - patch = NULL; > + /* > + * During each timeout interval, at least a CPU is expected to > + * finish its update. Otherwise, something goes wrong. > + * > + * Note that RDTSC (in wait_for_condition()) is safe for threads to > + * execute while waiting for completion of loading an update. > + */ > + if ( wait_for_condition(wait_cpu_callout, > + (void *)(unsigned long)(done + 1), > + MICROCODE_UPDATE_TIMEOUT_US) ) > + panic("Timeout when finished updating microcode (finished > %u/%u)", > + done, nr_cores); > + > + /* Print warning message once if long time is spent here */ > + if ( tick && rdtsc_ordered() - tick >= cpu_khz * 1000 ) > + { > + printk(XENLOG_WARNING > + "WARNING: UPDATING MICROCODE HAS CONSUMED MORE THAN 1 > SECOND!\n"); > + tick = 0; > + } > + done = atomic_read(&cpu_out); > } > > - if ( microcode_ops->end_update ) > - microcode_ops->end_update(); > + /* Mark loading is done to unblock other threads */ > + loading_state = LOADING_EXIT; > + smp_mb(); > > - cpu = cpumask_next(smp_processor_id(), &cpu_online_map); > - if ( cpu < nr_cpu_ids ) > - return continue_hypercall_on_cpu(cpu, do_microcode_update, patch); > + return ret; > +} > > - /* Free the patch if no CPU has loaded it successfully. */ > - if ( patch ) > - microcode_free_patch(patch); > +static int do_microcode_update(void *patch) > +{ > + unsigned int cpu = smp_processor_id(); > + /* > + * "master" thread is the one with the lowest thread id among all > siblings > + * thread in a core or a compute unit. It is chosen to load a microcode > + * update. > + */ > + unsigned int master = cpumask_first(this_cpu(cpu_sibling_mask)); > + int ret; > > - return 0; > + /* > + * The control thread set state to coordinate ucode loading. Master > threads > + * load the given ucode patch. Slave threads just wait for the completion > + * of the ucode loading process. > + */ > + if ( cpu == cpumask_first(&cpu_online_map) ) > + ret = control_thread_fn(patch); > + else if ( cpu == master ) > + ret = master_thread_fn(patch); > + else > + ret = slave_thread_fn(); > + > + if ( microcode_ops->end_update ) > + microcode_ops->end_update(); > + > + return ret; > } > > int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long > len) > { > int ret; > void *buffer; > + unsigned int cpu, updated; > struct microcode_patch *patch; > > if ( len != (uint32_t)len ) > @@ -314,11 +504,18 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) > buf, unsigned long len) > goto free; > } > > + /* cpu_online_map must not change during update */ > + if ( !get_cpu_maps() ) > + { > + ret = -EBUSY; > + goto free; > + } > + > if ( microcode_ops->start_update ) > { > ret = microcode_ops->start_update(); > if ( ret != 0 ) > - goto free; > + goto put; > } > > patch = parse_blob(buffer, len); > @@ -326,19 +523,67 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) > buf, unsigned long len) > { > ret = PTR_ERR(patch); > printk(XENLOG_INFO "Parsing microcode blob error %d\n", ret); > - goto free; > + goto put; > } > > if ( !patch ) > { > printk(XENLOG_INFO "No ucode found. Update aborted!\n"); > ret = -EINVAL; > - goto free; > + goto put; > + } > + > + cpumask_clear(&cpu_callin_map); > + atomic_set(&cpu_out, 0); > + atomic_set(&cpu_updated, 0); > + loading_state = LOADING_PREPARE; > + > + /* Calculate the number of online CPU core */ > + nr_cores = 0; > + for_each_online_cpu(cpu) > + if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) ) > + nr_cores++; > + > + printk(XENLOG_INFO "%u cores are to update their microcode\n", nr_cores); > + > + /* > + * We intend to disable interrupt for long time, which may lead to > + * watchdog timeout. > + */ > + watchdog_disable(); > + /* > + * Late loading dance. Why the heavy-handed stop_machine effort? > + * > + * - HT siblings must be idle and not execute other code while the other > + * sibling is loading microcode in order to avoid any negative > + * interactions cause by the loading. > + * > + * - In addition, microcode update on the cores must be serialized until > + * this requirement can be relaxed in the future. Right now, this is > + * conservative and good. > + */ > + ret = stop_machine_run(do_microcode_update, patch, NR_CPUS); > + watchdog_enable(); > + > + updated = atomic_read(&cpu_updated); > + if ( updated > 0 ) > + { > + spin_lock(&microcode_mutex); > + microcode_update_cache(patch); > + spin_unlock(&microcode_mutex); > } > + else > + microcode_free_patch(patch); > > - ret = continue_hypercall_on_cpu(cpumask_first(&cpu_online_map), > - do_microcode_update, patch); > + if ( updated && updated != nr_cores ) > + printk(XENLOG_ERR "ERROR: Updating microcode succeeded on %u cores > and failed\n" > + XENLOG_ERR "on other %u cores. A system with differing > microcode \n" > + XENLOG_ERR "revisions is considered unstable. Please reboot > and do not\n" > + XENLOG_ERR "load the microcode that triggersthis warning!\n", > + updated, nr_cores - updated); > > + put: > + put_cpu_maps(); > free: > xfree(buffer); > return ret; > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.