[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 08/11] x86/shadow: reduce effort of hash calculation
On 06.01.2023 03:03, Andrew Cooper wrote: > On 05/01/2023 4:05 pm, Jan Beulich wrote: >> The "n" input is a GFN value and hence bounded by the physical address >> bits in use on a system. > > The one case where this isn't obviously true is in sh_audit(). It comes > from a real MFN in the system, not a GFN, which will have the same > property WRT PADDR_BITS. I'm afraid I was more wrong with that than just for the audit case. Only FL1 shadows use GFNs. All other shadows us MFNs. I'll update the sentence. >> The hash quality won't improve by also >> including the upper always-zero bits in the calculation. To keep things >> as compile-time-constant as they were before, use PADDR_BITS (not >> paddr_bits) for loop bounding. This reduces loop iterations from 8 to 5. > > While this is all true, you'll get a much better improvement by not > forcing 'n' onto the stack just to access it bytewise. Right now, the > loop looks like: > > <shadow_hash_insert>: > 48 83 ec 10 sub $0x10,%rsp > 49 89 c9 mov %rcx,%r9 > 41 89 d0 mov %edx,%r8d > 48 8d 44 24 08 lea 0x8(%rsp),%rax > 48 8d 4c 24 10 lea 0x10(%rsp),%rcx > 48 89 74 24 08 mov %rsi,0x8(%rsp) > 0f 1f 80 00 00 00 00 nopl 0x0(%rax) > /-> 0f b6 10 movzbl (%rax),%edx > | 48 83 c0 01 add $0x1,%rax > | 45 69 c0 3f 00 01 00 imul $0x1003f,%r8d,%r8d > | 41 01 d0 add %edx,%r8d > | 48 39 c1 cmp %rax,%rcx > \-- 75 ea jne ffff82d0402efda0 > <shadow_hash_insert+0x20> > > > which doesn't even have a compile-time constant loop bound. It's > runtime calculated by the second lea constructing the upper pointer bound. > > Given this further delta: > > diff --git a/xen/arch/x86/mm/shadow/common.c > b/xen/arch/x86/mm/shadow/common.c > index 4a8bcec10fe8..902c749f2724 100644 > --- a/xen/arch/x86/mm/shadow/common.c > +++ b/xen/arch/x86/mm/shadow/common.c > @@ -1397,13 +1397,12 @@ static unsigned int shadow_get_allocation(struct > domain *d) > typedef u32 key_t; > static inline key_t sh_hash(unsigned long n, unsigned int t) > { > - unsigned char *p = (unsigned char *)&n; > key_t k = t; > int i; > > BUILD_BUG_ON(PADDR_BITS > BITS_PER_LONG + PAGE_SHIFT); > - for ( i = 0; i < (PADDR_BITS - PAGE_SHIFT + 7) / 8; i++ ) > - k = p[i] + (k << 6) + (k << 16) - k; > + for ( i = 0; i < (PADDR_BITS - PAGE_SHIFT + 7) / 8; i++, n >>= 8 ) > + k = (uint8_t)n + (k << 6) + (k << 16) - k; > > return k % SHADOW_HASH_BUCKETS; > } > > the code gen becomes: > > <shadow_hash_insert>: > 41 89 d0 mov %edx,%r8d > 49 89 c9 mov %rcx,%r9 > b8 05 00 00 00 mov $0x5,%eax > /-> 45 69 c0 3f 00 01 00 imul $0x1003f,%r8d,%r8d > | 40 0f b6 d6 movzbl %sil,%edx > | 48 c1 ee 08 shr $0x8,%rsi > | 41 01 d0 add %edx,%r8d > | 83 e8 01 sub $0x1,%eax > \-- 75 e9 jne ffff82d0402efd8b > <shadow_hash_insert+0xb> > > with an actual constant loop bound, and not a memory operand in sight. > This form (even at 8 iterations) will easily execute faster than the > stack-spilled form. Oh, yes, good idea. Will adjust. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |