|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/PV: avoid indirect call for I/O emulation quirk hook
On 18/01/2024 11:09 am, Jan Beulich wrote:
> On 18.01.2024 12:04, Andrew Cooper wrote:
>> On 17/01/2024 9:37 am, Jan Beulich wrote:
>>> --- a/xen/arch/x86/ioport_emulate.c
>>> +++ b/xen/arch/x86/ioport_emulate.c
>>> @@ -8,11 +8,10 @@
>>> #include <xen/sched.h>
>>> #include <xen/dmi.h>
>>>
>>> -unsigned int (*__read_mostly ioemul_handle_quirk)(
>>> - uint8_t opcode, char *io_emul_stub, struct cpu_user_regs *regs);
>>> +bool __ro_after_init ioemul_handle_quirk;
>>>
>>> -static unsigned int cf_check ioemul_handle_proliant_quirk(
>>> - u8 opcode, char *io_emul_stub, struct cpu_user_regs *regs)
>>> +unsigned int ioemul_handle_proliant_quirk(
>>> + uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs)
>>> {
>>> static const char stub[] = {
>>> 0x9c, /* pushf */
>> Something occurred to me.
>>
>> diff --git a/xen/arch/x86/ioport_emulate.c b/xen/arch/x86/ioport_emulate.c
>> index 23cba842b22e..70f94febe255 100644
>> --- a/xen/arch/x86/ioport_emulate.c
>> +++ b/xen/arch/x86/ioport_emulate.c
>> @@ -13,7 +13,7 @@ bool __ro_after_init ioemul_handle_quirk;
>> unsigned int ioemul_handle_proliant_quirk(
>> uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs)
>> {
>> - static const char stub[] = {
>> + const char stub[] = {
>> 0x9c, /* pushf */
>> 0xfa, /* cli */
>> 0xee, /* out %al, %dx */
>>
>> is an improvement, confirmed by bloat-o-meter:
>>
>> add/remove: 0/1 grow/shrink: 1/0 up/down: 1/-9 (-8)
>> Function old new delta
>> ioemul_handle_proliant_quirk 58 59 +1
>> stub 9 - -9
>>
>> The reason is that we've got a 9 byte object that's decomposed into two
>> rip-relative accesses. i.e. we've got more pointer than data in this case.
> I wouldn't mind this as a separate change, but I don't see how it would
> fit right here.
I'm not suggesting changing this patch. I just linked here because I
noticed it because of this patch.
We've got similar patterns elsewhere, so I was intending to do a patch
covering all of them.
>
>> But this adjustment seems to tickle a GCC bug. With that change in
>> place, GCC emits:
>>
>> <ioemul_handle_proliant_quirk>:
>> 48 83 ec 10 sub $0x10,%rsp
>> ...
>> 48 83 c4 10 add $0x10,%rsp
>> c3 retq
>>
>> i.e. we get a stack frame (space at least, no initialisation) despite
>> the object having been converted entirely to instruction immediates.
>>
>> Or in other words, there's a further 12 byte saving available when GCC
>> can be persuaded to not even emit the stack frame.
>>
>> What is even more weird is that I see this GCC-10, and a build of gcc
>> master from last week, but not when I try to reproduce in
>> https://godbolt.org/z/PnachbznW so there's probably some other setting
>> used by Xen which tickles this bug.
> Yeah, I've observed such pointless frame allocation elsewhere as well,
> so far without being able what exactly triggers it.
Ok - more experimentation required, I guess.
~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |