
Re: -mno-tls-direct-seg-refs support in glibc for i386 PV Xen



On 27.05.2020 15:39, Andrew Cooper wrote:
> On 27/05/2020 14:03, Florian Weimer wrote:
>> I'm about to remove nosegneg support from upstream glibc, special builds
>> that use -mno-tls-direct-seg-refs, and the ability to load different
>> libraries built in this mode automatically, when the Linux kernel tells
>> us to do that.  I think the intended effect is that these special builds
>> do not use operands of the form %gs:(%eax) when %eax has the MSB set
>> because that had a performance hit with paravirtualization on 32-bit
>> x86.  Instead, the thread pointer is first loaded from %gs:0, and the
>> actual access does not use a segment prefix.
>>
>> Before doing that, I'd like to ask if anybody is still using this
>> feature?
>>
>> I know that we've been carrying nosegneg libraries for many years, in
>> some cases even after we stopped shipping 32-bit kernels. 8-/ The
>> feature has always been rather poorly documented, and the way the
>> dynamic loader selects those nosegneg library variants is still very
>> bizarre.
> 
> I wasn't even aware of this feature, or that there was a problem wanting
> fixing.
> 
> That said, I have found:
> 
> # 32-bit x86 does not perform well with -ve segment accesses on Xen.
> CFLAGS-$(CONFIG_X86_32) += $(call cc-option,$(CC),-mno-tls-direct-seg-refs)
> 
> in one of our makefiles.
> 
> Why does the MSB make any difference?  %gs still needs to remain intact
> so the thread pointer can be pulled out, so there is nothing that Xen or
> Linux can do in the way of lazy loading.
> 
> Beyond that, it's straight-up segment base semantics in x86.  There will
> be a 1-cycle AGU delay from a non-zero base, but that has nothing to do
> with Xen and applies to all segment based TLS accesses on x86, and you'll win
> that back easily through reduced register pressure.
> 
> Are there any further details on the perf problem claim?  I find it
> suspicious.

To guard the hypervisor area, 32-bit Xen reduced the limits of guest
usable segment descriptors. While this works fine for flat ones (you
just chop off some space at the top), there's no way to represent a
full segment with a non-zero base. You can have the descriptor map
only the [base,XenBase) part or the [0,base) one. Hence Xen, from its
#GP handler, flipped the descriptor between the two options depending
on whether the current access was to the positive or negative part of
the TLS seg. (An in-practice use of expand down segments, as you'll
surely notice.)
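
As a rough illustration of the flipping described above, here is a small
user-space model in C. It is only a sketch under stated assumptions, not
Xen's actual code: the names (tls_seg, seg_allows, seg_access) are
hypothetical, XEN_BASE is an illustrative value for the start of the
hypervisor area, and the "#GP handler" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed start of the hypervisor area; illustrative value only. */
#define XEN_BASE 0xF5800000u

/* Which half of the TLS segment the descriptor currently maps. */
enum seg_half { SEG_POSITIVE, SEG_NEGATIVE };

struct tls_seg {
    uint32_t base;       /* thread pointer, i.e. the segment base */
    enum seg_half half;  /* half currently covered by the descriptor */
    unsigned flips;      /* number of simulated #GP-driven flips */
};

static struct tls_seg tls_seg_init(uint32_t base)
{
    /* The model only makes sense for a non-zero base below Xen. */
    assert(base != 0 && base < XEN_BASE);
    struct tls_seg s = { base, SEG_POSITIVE, 0 };
    return s;
}

/* Would the descriptor, as currently set up, permit this offset? */
static bool seg_allows(const struct tls_seg *s, uint32_t off)
{
    if (s->half == SEG_POSITIVE)
        return off < XEN_BASE - s->base;  /* linear [base, XEN_BASE) */
    return off >= 0u - s->base;           /* linear [0, base), expand-down */
}

/*
 * Simulate one %gs-relative access at offset `off`.  If the current
 * descriptor rejects it but the other half would accept it, flip the
 * descriptor (this stands in for the #GP handler) and count the flip.
 * Returns false if neither half covers the offset, i.e. the access
 * would land in the hypervisor area.
 */
static bool seg_access(struct tls_seg *s, uint32_t off, uint32_t *linear)
{
    if (!seg_allows(s, off)) {
        struct tls_seg probe = *s;
        probe.half = (s->half == SEG_POSITIVE) ? SEG_NEGATIVE : SEG_POSITIVE;
        if (!seg_allows(&probe, off))
            return false;
        s->half = probe.half;
        s->flips++;
    }
    *linear = s->base + off;  /* wraps modulo 2^32 for negative offsets */
    return true;
}
```

In this model, code that alternates between %gs:0 and a negative offset
such as (uint32_t)-4 pays one flip per change of direction, while code
that only ever reads offset 0 through the segment (which is what
-mno-tls-direct-seg-refs arranges, per Florian's description) never
triggers a flip.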

Jan



 

