
Re: [Xen-devel] [PATCH v2] x86/mm: Suppresses vm_events caused by page-walks


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Wed, 19 Sep 2018 14:41:57 +0100
  • Cc: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, xen-devel@xxxxxxxxxxxxx, aisaila@xxxxxxxxxxxxxxx
  • Delivery-date: Wed, 19 Sep 2018 13:42:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 19/09/18 09:53, Jan Beulich wrote:
>>>> On 18.09.18 at 20:20, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 18/09/18 11:17, Jan Beulich wrote:
>>>>>> On 18.09.18 at 11:47, <aisaila@xxxxxxxxxxxxxxx> wrote:
>>>> On Thu, 2018-09-13 at 08:17 -0600, Jan Beulich wrote:
>>>>>>>> On 12.09.18 at 11:47, <aisaila@xxxxxxxxxxxxxxx> wrote:
>>>>>> The original version of the patch emulated the current instruction
>>>>>> (which, as a side-effect, emulated the page-walk as well), however
>>>>>> we
>>>>>> need finer-grained control. We want to emulate the page-walk, but
>>>>>> still
>>>>>> get an EPT violation event if the current instruction would trigger
>>>>>> one.
>>>>>> This patch performs just the page-walk emulation.
>>>>> Rather than making this basically a revision log, could you please
>>>>> focus
>>>>> on what you actually want to achieve?
>>>>>
>>>>> As to the title: "Suppress ..." please.
>>>>>
>>>>>> @@ -149,6 +151,10 @@ guest_walk_tables(struct vcpu *v, struct
>>>>>> p2m_domain *p2m,
>>>>>>      ar_and &= gflags;
>>>>>>      ar_or  |= gflags;
>>>>>>  
>>>>>> +    if ( set_ad && set_ad_bits(&l4p[guest_l4_table_offset(va)].l4,
>>>>>> +                               &gw->l4e.l4, false) )
>>>>>> +        accessed = true;
>>>>> It is in particular this seemingly odd (and redundant with what's
>>>>> done
>>>>> later in the function) which needs thorough explanation.
>>>> On this patch I've followed Andrew Cooper's suggestion on how to set
>>>> A/D Bits:
>>>>
>>>> "While walking down the levels, set any missing A bits and remember if we
>>>> set any.  If we set A bits, consider ourselves complete and exit back to
>>>> the guest.  If no A bits were set, and the access was a write (which we
>>>> know from the EPT violation information), then set the leaf D bit."
>>>>
>>>> If I misunderstood the comment please clarify.
>>> It doesn't look to me as if you misunderstood anything, but only Andrew
>>> can say for sure. However, none of this was in the description of your
>>> patch (neither as part of the description, nor as code comment), and I
>>> think you'd even have to greatly extend on this in order to explain to
>>> everyone why the resulting behavior is still architecturally correct. In no
>>> case should you assume anyone reading your patch (now or in the
>>> future) has participated in the earlier discussion.
>> The problem we have is that, while we know the EPT Violation was for a
>> write of an A or D bit to a write-protected guest pagetable, we don't
>> know if it was the A or the D bit which was attempting to be set.
>>
>> Furthermore (without emulating the instruction, which is what we are
>> trying to avoid), we can't reconstruct the access.
>>
>> Access bits are only written if they were missing before, but may be set
>> speculatively.  Dirty bits are only set when a write is retired.  From a
>> practical point of view, the pipeline sets A and D bits as separate actions.
>>
>> Following this logic (and assuming for now a single vcpu), if we get a
>> GPT EPT Violation, and there are missing access bits on the walk, then
>> the fault is definitely from setting an access bit.
> Definitely?

Yes

>  Is there anything guaranteeing architecturally that an access
> bit related EPT violation would be delivered earlier than any other one
> on that same or a lower page table level?

No, but why does that matter?

Architecturally defined or not, we know that the action the processor
was trying to perform was to set an A/D bit, because we got a vmexit
telling us so.

>  It doesn't matter much for
> the implementation (because of it being permissible to set the A bits
> speculatively, as you also say further down, and any other violation
> then re-occurring after exiting back to the guest once the A bits are
> all set), but since we're discussing here what exactly the patch
> description should contain, I think I'd prefer this to be fully correct there.
>
> Or wait - I think I can agree with "definitely", provided you further
> restrict the context: "..., if we get a GPT EPT Write Violation ...". But
> from what I can tell the patch's change to p2m_mem_access_check()
> doesn't apply (or pass on) any of these qualifications at all.

I've not looked at the patch in detail yet.  I'm tempted to suggest
rearranging guest_walk_tables() to just set the access bits on the
descent, rather than at the end.  This matches how some hardware behaves
when pulling entries into the paging structure cache.
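
As a rough, untested sketch of the algorithm quoted above (single vcpu
assumed): this is a toy model over a flat array of entries, not the real
guest_walk_tables().  The names fixup_ad_bits() and enum ad_fixup are made
up for illustration; only the A/D bit positions are the architectural x86
ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_ACCESSED (1u << 5)  /* x86 Accessed bit */
#define PTE_DIRTY    (1u << 6)  /* x86 Dirty bit */

/* Hypothetical result type: which fixup (if any) the walk performed. */
enum ad_fixup { AD_NOTHING, AD_SET_ACCESSED, AD_SET_DIRTY };

/*
 * Toy model: walk[0] is the top-level entry, walk[levels - 1] is the leaf.
 * 'write' comes from the EPT violation information.
 */
static enum ad_fixup fixup_ad_bits(uint32_t *walk, unsigned int levels,
                                   bool write)
{
    bool set_accessed = false;
    unsigned int i;

    assert(levels > 0);

    /* Set any missing A bits on the descent and remember whether we did. */
    for ( i = 0; i < levels; i++ )
    {
        if ( !(walk[i] & PTE_ACCESSED) )
        {
            walk[i] |= PTE_ACCESSED;
            set_accessed = true;
        }
    }

    /* Some A bit was missing: consider the fault handled, exit to guest. */
    if ( set_accessed )
        return AD_SET_ACCESSED;

    /* All A bits were already present, so a write wanted the leaf D bit. */
    if ( write && !(walk[levels - 1] & PTE_DIRTY) )
    {
        walk[levels - 1] |= PTE_DIRTY;
        return AD_SET_DIRTY;
    }

    return AD_NOTHING;
}
```

A write to a fully unaccessed mapping would take two passes through this:
the first sets the A bits, the re-executed access faults again, and the
second sets the leaf D bit.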

>
>>  Set all access bits
>> and call it done.  If we get a GPT EPT Violation and all access bits
>> were set, then it was definitely from setting the Dirty bit.
>>
>> For multi-vcpu scenarios, things get racy.  Setting all the Access bits
>> is safe because it's a speculative action, but a speculative load on
>> one vcpu can race with a write (to a read-only mapping) on the other
>> vcpu, and would trick this algorithm into setting the dirty bit when the
>> write would have faulted (and not set the dirty bit).
>>
>> Do we have numbers on how many of the GPT EPT Violations are for (only)
>> access sets, and how many are for dirty sets?  Would the first half of
>> the algorithm (which is definitely not racy) still be a net perf win?
> Does Windows make use of A bits at all? I'd expect most OSes to
> simply set them right away, and actively make use of the D bits.

What gives you the expectation that OSes wouldn't use A bits?

For paging out, the best candidates are non-accessed, non-dirty pages,
because their contents can be discarded immediately and reread from disk
at a later point.

~Andrew


 

