[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] RFC x86/msr: Use WRMSRNS $imm when available
On 11/08/2025 9:16 am, Jan Beulich wrote: > On 09.08.2025 00:20, Andrew Cooper wrote: >> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> >> --- >> CC: Jan Beulich <JBeulich@xxxxxxxx> >> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx> >> >> This is on top of the FRED series for the wrmsrns() cleanup, but otherwise >> unrelated. >> >> The code generation isn't entirely ideal >> >> Function old new delta >> init_fred 255 274 +19 >> vmx_set_reg 248 256 +8 >> enter_state_helper.cold 1014 1018 +4 >> __start_xen 8893 8897 +4 >> >> but made worse by the the prior codegen for wrmsrns(MSR_STAR, ...) being mad: >> >> mov $0xc0000081,%ecx >> mov $0xe023e008,%edx >> movabs $0xe023e00800000000,%rax >> cs wrmsr >> >> The two sources of code expansion come from the compiler not being able to >> construct %eax and %edx separately, and not being able propagate constants. >> >> Loading 0 is possibly common enough to warrant another specialisation where >> we >> can use "a" (0), "d" (0) and forgo the MOV+SHR. >> >> I'm probably overthinking things. The addition will be in the noise in >> practice, and Intel are sure the advantage of MSR_IMM will not be. > It's not entirely clear to me what the overall effects are now with your > 02/22 reply on the FRED series. The delta gets larger now that 2/22 isn't quite as bad as it was. > Nevertheless a nit or two here. > >> --- a/xen/arch/x86/include/asm/msr.h >> +++ b/xen/arch/x86/include/asm/msr.h >> @@ -38,9 +38,46 @@ static inline void wrmsrl(unsigned int msr, uint64_t val) >> wrmsr(msr, lo, hi); >> } >> >> +/* >> + * Non-serialising WRMSR with a compile-time constant index, when available. >> + * Falls back to plain WRMSRNS, or to a serialising WRMSR. >> + */ >> +static always_inline void __wrmsrns_imm(uint32_t msr, uint64_t val) >> +{ >> + /* >> + * For best performance, WRMSRNS %r64, $msr is recommended. For >> + * compatibility, we need to fall back to plain WRMSRNS, or to WRMSR. >> + * >> + * The combined ABI is awkward, because WRMSRNS $imm takes a single r64, >> + * whereas WRMSR{,NS} takes a split edx:eax pair. >> + * >> + * Always use WRMSRNS %rax, $imm, because it has the most in common with >> + * the legacy forms. When MSR_IMM isn't available, emit setup logic for >> + * %ecx and %edx too. >> + */ >> + alternative_input_2( >> + "mov $%c[msr], %%ecx\n\t" > Simply %[msr] here? > > And then, might it make sense to pull out this and ... > >> + "mov %%rax, %%rdx\n\t" >> + "shr $32, %%rdx\n\t" >> + ".byte 0x2e; wrmsr", >> + >> + /* WRMSRNS %rax, $msr */ >> + ".byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]", >> X86_FEATURE_MSR_IMM, >> + >> + "mov $%c[msr], %%ecx\n\t" > ... this, to ... > >> + "mov %%rax, %%rdx\n\t" >> + "shr $32, %%rdx\n\t" >> + ".byte 0x0f,0x01,0xc6", X86_FEATURE_WRMSRNS, >> + >> + [msr] "i" (msr), "a" (val) : "rcx", "rdx"); > [msr] "i" (msr), "a" (val), "c" (msr) : "rdx"); > > allowing the compiler to actually know what's put in %ecx? That'll make > original and 2nd replacement code 10 bytes, better balancing with the 9 > bytes of the 1st replacement. And I'd guess that the potentially dead > MOV to %ecx would be hidden in the noise as well. I considered that, but what can the compiler do as a result of knowing %ecx? That said, we do need an RDMSR form (which I desperately want to make foo = rdmsr(MSR_BAR) but my cleanup series from 2019 got nowhere), and in a read+write case I suppose the compiler could deduplicate the setup of %ecx. A dead mov $imm, %ecx probably is as close to a nop as makes no difference. ~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |