|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/PV: avoid indirect call for I/O emulation quirk hook
On 18.01.2024 12:04, Andrew Cooper wrote:
> On 17/01/2024 9:37 am, Jan Beulich wrote:
>> --- a/xen/arch/x86/ioport_emulate.c
>> +++ b/xen/arch/x86/ioport_emulate.c
>> @@ -8,11 +8,10 @@
>> #include <xen/sched.h>
>> #include <xen/dmi.h>
>>
>> -unsigned int (*__read_mostly ioemul_handle_quirk)(
>> - uint8_t opcode, char *io_emul_stub, struct cpu_user_regs *regs);
>> +bool __ro_after_init ioemul_handle_quirk;
>>
>> -static unsigned int cf_check ioemul_handle_proliant_quirk(
>> - u8 opcode, char *io_emul_stub, struct cpu_user_regs *regs)
>> +unsigned int ioemul_handle_proliant_quirk(
>> + uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs)
>> {
>> static const char stub[] = {
>> 0x9c, /* pushf */
>
> Something occurred to me.
>
> diff --git a/xen/arch/x86/ioport_emulate.c b/xen/arch/x86/ioport_emulate.c
> index 23cba842b22e..70f94febe255 100644
> --- a/xen/arch/x86/ioport_emulate.c
> +++ b/xen/arch/x86/ioport_emulate.c
> @@ -13,7 +13,7 @@ bool __ro_after_init ioemul_handle_quirk;
> unsigned int ioemul_handle_proliant_quirk(
> uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs)
> {
> - static const char stub[] = {
> + const char stub[] = {
> 0x9c, /* pushf */
> 0xfa, /* cli */
> 0xee, /* out %al, %dx */
>
> is an improvement, confirmed by bloat-o-meter:
>
> add/remove: 0/1 grow/shrink: 1/0 up/down: 1/-9 (-8)
> Function old new delta
> ioemul_handle_proliant_quirk 58 59 +1
> stub 9 - -9
>
> The reason is that we've got a 9 byte object that's decomposed into two
> rip-relative accesses. i.e. we've got more pointer than data in this case.
I wouldn't mind this as a separate change, but I don't see how it would
fit right here.
> But this adjustment seems to tickle a GCC bug. With that change in
> place, GCC emits:
>
> <ioemul_handle_proliant_quirk>:
> 48 83 ec 10 sub $0x10,%rsp
> ...
> 48 83 c4 10 add $0x10,%rsp
> c3 retq
>
> i.e. we get a stack frame (space at least, no initialisation) despite
> the object having been converted entirely to instruction immediates.
>
> Or in other words, there's a further 12 byte saving available when GCC
> can be persuaded to not even emit the stack frame.
>
> What is even more weird is that I see this GCC-10, and a build of gcc
> master from last week, but not when I try to reproduce in
> https://godbolt.org/z/PnachbznW so there's probably some other setting
> used by Xen which tickles this bug.
Yeah, I've observed such pointless frame allocation elsewhere as well,
so far without being able what exactly triggers it.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |