In case we can detect single-threaded guest processes (by checking
whether we can account for all root page table uses locally on the vCPU
that's running), there's no point in issuing a sync IPI upon an L4 entry
update, as no other vCPU of the guest will have that page table loaded.

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
---
This will apply cleanly only on top of all of the previously posted
follow-ups to the Meltdown band-aid, but it wouldn't be difficult to
move it ahead of some or all of them.

On my test system, this improves kernel build times by only 0.5...1%, but
the effect may well be bigger on larger systems. Of course, no
improvement is expected for heavily multi-threaded guests/processes.
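
For illustration only, the accounting behind the new condition can be
modelled in isolation. This is a minimal sketch, not the Xen code: struct
l4_state and l4_update_needs_sync() are hypothetical names, and the fields
merely stand in for the type count, the pinned flag, and the two root
tables of the current vCPU that the patch inspects. The constant 1 models
the type reference accounted to the lock held during the update.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the patch looks at (not a Xen type). */
struct l4_state {
    unsigned long type_count;       /* models type_info & PGT_count_mask */
    bool pinned;                    /* models type_info & PGT_pinned */
    unsigned long mfn;              /* the L4 page being updated */
    unsigned long guest_table;      /* current vCPU's kernel-mode root table */
    unsigned long guest_table_user; /* current vCPU's user-mode root table */
};

/*
 * Returns true when some type reference cannot be explained by the lock
 * held for the update, the pinned status, or the current vCPU's own root
 * tables, i.e. when another vCPU may have the page table loaded.
 */
static bool l4_update_needs_sync(const struct l4_state *s)
{
    unsigned long local = 1 /* reference accounted to the held lock */
                          + s->pinned
                          + (s->guest_table == s->mfn)
                          + (s->guest_table_user == s->mfn);

    return s->type_count > local;
}
```

With a pinned L4 that is loaded only as the current vCPU's kernel-mode
table, a type count of 3 is fully accounted for and no sync is requested;
a count of 4 leaves one reference unexplained, so the sync is still issued.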

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3683,8 +3683,18 @@ long do_mmu_update(
                     rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
                                       cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    if ( !rc )
-                        sync_guest = !cpu_has_no_xpti;
+                    /*
+                     * No need to sync if all uses of the page can be accounted
+                     * to the lock we hold, its pinned status, and uses on this
+                     * (v)CPU.
+                     */
+                    if ( !rc && !cpu_has_no_xpti &&
+                         ((page->u.inuse.type_info & PGT_count_mask) >
+                          (1 + !!(page->u.inuse.type_info & PGT_pinned) +
+                           (pagetable_get_pfn(curr->arch.guest_table) == mfn) +
+                           (pagetable_get_pfn(curr->arch.guest_table_user) ==
+                            mfn))) )
+                        sync_guest = true;
                 case PGT_writable_page:

