
[RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support



I've long wanted to get stacktraces when profiling Xen. Without them all
you see is e.g. the address of memcpy, and without knowing which
function called it you can't optimize the caller.

Once you have stacktraces, even a simple timer based profile at a low
(prime) frequency can show hotspots that would be optimization
candidates, and these can be visualized as flamegraphs. Even if the
samples don't always hit within the same function, and individually the
functions would be too small to be noticeable, the samples should still
hit in one of the parents if that parent is a bottleneck.
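
As a rough illustration of that aggregation (not code from this series),
the sketch below collapses stack samples into the "folded" text format
that flamegraph tools consume, and counts how often each function
appears anywhere on a stack. The function names and samples are made up
for the example.

```python
from collections import Counter


def fold_stacks(samples):
    """Aggregate stack samples into flamegraph 'folded' lines.

    Each sample is a list of frames ordered leaf-first (the order perf
    prints callchains); the folded format is root-first, joined by ';',
    followed by the number of samples with that exact stack.
    """
    counts = Counter(";".join(reversed(stack)) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def inclusive_counts(samples):
    """Count in how many samples each function appears on the stack."""
    counts = Counter()
    for stack in samples:
        counts.update(set(stack))  # set(): count recursion once per sample
    return counts


# Hypothetical samples: memcpy is reached through two different callers.
samples = [
    ["memcpy", "copy_a", "do_hypercall"],
    ["memcpy", "copy_b", "do_hypercall"],
    ["memcpy", "copy_a", "do_hypercall"],
    ["idle_loop"],
]

for line in fold_stacks(samples):
    print(line)
```

In this made-up data no single caller of memcpy dominates, but the
inclusive count for the common parent covers all three samples, which is
the effect described above.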

Example flamegraphs produced using these patches:
 * workload: an otherwise idle VM migrated on localhost by XAPI in a loop:
   https://cdn.jsdelivr.net/gh/edwintorok/xen@pmustack-coverletter/docs/tmp/migrate-localhost.svg?x=473.2&y=2197&s=null
 * workload: VM migrated between 2 hosts by XAPI (NFS storage):
   https://cdn.jsdelivr.net/gh/edwintorok/xen@pmustack-coverletter/docs/tmp/migrate-send.svg?x=950.6&y=2197
   https://cdn.jsdelivr.net/gh/edwintorok/xen@pmustack-coverletter/docs/tmp/migrate-receive.svg?x=906.6&y=869

There might be other approaches that could be tried in the future, e.g.
Last Branch Record (LBR), but:
 * although both Intel and AMD support it, AFAIK Xen doesn't support it
 on AMD yet
 * there is a hardware limit to how deep it can be (~32?)
 * LBR may need some additional configuration to enable it to trace the
  hypervisor
 * Intel PMU is completely broken on the system I tried it on, so I
  would've had to first fix that

This is very early experimental work. I thought I'd share it to get
feedback on:
 * the desired ABI additions in pmu.h and arch-x86/pmu.h
 * any bugs you may spot
 * if anyone wants to port the python symbol lookup to perf itself
 (actually latest perf ships a flamegraph.py too)

It also starts to become useful enough to spot performance hotspots in
Xen, e.g. the rwlock.c scaling issue with large CONFIG_NR_CPUS, or
unexpected page faults in 'unmap_page_range' (spotted by Andrew).

It builds on top of:
 * the existing VPMU support, documented by Boris Ostrovsky in this thread: 
https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg03244.html
 * a python script by Andriy to post-process the perf output

Steps to enable:
 1. ensure that you've got a build of Xen with CONFIG_FRAME_POINTER=y.
 Debug builds would have it, but for performance testing creating a
 release build with frame pointers enabled is recommended.

 2. Apply both the Linux and Xen patches.
    I tested on top of Linux ~6.6.22 and Xen 4.21+
    (5c798ac8854af3528a78ca5a622c9ea68399809b)

 3. ensure that VPMU is enabled in Xen, e.g. a GRUB line like:
 ```
 multiboot2 /boot/xen.efi dom0_mem=4288M,max:4288M crashkernel=256M,below=4G console=vga vga=mode-0x0311 watchdog=0 vpmu=on dom0_vcpus_pin
 ```
 On a XenServer system that can be achieved by:

 ```
 /opt/xensource/libexec/xen-cmdline --set-xen watchdog=0
 /opt/xensource/libexec/xen-cmdline --set-xen vpmu=on
 /opt/xensource/libexec/xen-cmdline --delete-xen dom0_max_vcpus=1-16
 /opt/xensource/libexec/xen-cmdline --set-xen dom0_vcpus_pin
 reboot
 ```

 4. On every boot, enable the desired vPMU features:
 ```
 echo 9 >/sys/hypervisor/pmu/pmu_features
 echo all >/sys/hypervisor/pmu/pmu_mode
 ```

 5. Record a trace, e.g. a simple timer based stacktrace, useful for
 initial investigation with a flamegraph:
 ```
 perf kvm --host --guest record -a -F 97 -g
 ```

 Or if you also want to trace userspace:
 ```
 perf kvm --host --guest record -a -F 97 --call-graph=dwarf
 ```

 6. Look at the report: perf kvm --host --guest report.
  This will contain hex addresses for now, but a script can be used to
  resolve them.

 7. Use the provided Python script (xen/tools/pyperf.py) to post-process
  the perf output, and look at the symbolized result.
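
The symbol lookup in step 7 amounts to mapping each raw address to the
nearest symbol at or below it. The following is a minimal sketch of that
idea, not the pyperf.py from this series. It assumes an nm-style symbol
table ("address type name" per line); the addresses and symbol names
below are invented for illustration.

```python
import bisect


def load_symbols(lines):
    """Parse nm-style 'address type name' lines into sorted parallel lists."""
    table = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        table.append((int(parts[0], 16), parts[2]))
    table.sort()
    addrs = [addr for addr, _ in table]
    names = [name for _, name in table]
    return addrs, names


def symbolize(addrs, names, addr):
    """Return 'name+0xoffset' for the closest symbol at or below addr."""
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return hex(addr)  # below the first symbol: leave as raw hex
    return f"{names[i]}+{addr - addrs[i]:#x}"


# Hypothetical symbol table; a real one would come from the Xen build.
SYMBOLS = """\
ffff82d040200000 T _start
ffff82d040201000 T memcpy
ffff82d040201080 T memset
"""

addrs, names = load_symbols(SYMBOLS.splitlines())
print(symbolize(addrs, names, 0xFFFF82D040201010))
```

A nearest-symbol lookup like this has no notion of symbol sizes, so an
address past the end of the last symbol is still attributed to it; a
real script would want to bound the lookup to the image's text range.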

Caveats:
 * x86-only for now
 * only tested on AMD EPYC 8124P
 * Xen PMU support was broken to begin with on Xeon Silver 4514Y, so I
 wasn't able to test there ('perf top' fails to parse samples). I'll
 try to figure out what is wrong there separately
 * for now I edit the release config in xen.spec to enable frame
 pointers. Eventually it might be useful to have a 3rd build variant:
 release-fp. Or teach Xen to produce/parse ORC or SFrame formats without
 requiring frame pointers.
 * perf produces raw hex addresses, and a python script is used to
 post-process it and obtain symbols. Eventually perf should be updated
 to do this processing itself (there was an old patch for Linux 3.12 by
 Borislav Petkov)
 * I've only tested capturing Dom0 stack traces. Linux doesn't support
  guest stacktraces yet (it can only look up the guest RIP)
 * the Linux patch will need to be forward-ported to master before
  submission
 * All the caveats for using regular VPMU apply, except for the lack of
  stacktraces, which is fixed here!
    * Dom0 must run hard pinned on all host CPUs
    * Watchdog must be disabled
    * not security supported
    * x86 only
    * Secure Boot needs to be disabled

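
To illustrate why frame pointers matter for the caveat above, here is a
small simulation of walking a frame-pointer chain. It is not the code
from the vpmu.c patch; it only models the standard x86-64 convention
where [rbp] holds the caller's rbp and [rbp + 8] holds the return
address. The memory contents, addresses and the walk_frame_pointers name
are made up for the example.

```python
def walk_frame_pointers(read_u64, rip, rbp, stack_lo, stack_hi, max_depth=64):
    """Follow the saved-RBP chain and collect return addresses.

    read_u64(addr) returns the 64-bit value stored at addr.
    The trace starts with the sampled rip, followed by one return
    address per frame, innermost first.
    """
    trace = [rip]
    while len(trace) < max_depth:
        # Stop if the frame pointer leaves the stack or is misaligned.
        if not (stack_lo <= rbp <= stack_hi - 16) or rbp % 8:
            break
        next_rbp = read_u64(rbp)
        trace.append(read_u64(rbp + 8))
        # Stacks grow down, so a caller's frame must be at a higher address.
        if next_rbp <= rbp:
            break
        rbp = next_rbp
    return trace


# Hypothetical stack contents: address -> 64-bit value.
memory = {
    0x1000: 0x1020, 0x1008: 0xAAAA,  # innermost frame
    0x1020: 0x1040, 0x1028: 0xBBBB,  # its caller
    0x1040: 0x0,    0x1048: 0xCCCC,  # outermost frame
}

trace = walk_frame_pointers(memory.__getitem__, 0x9999, 0x1000, 0x1000, 0x1050)
print([hex(addr) for addr in trace])
```

Without frame pointers rbp is just another general purpose register, so
a walk like this has nothing reliable to follow, which is why the other
option mentioned above is unwind metadata such as ORC or SFrame.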

Edwin Török (10):
  pmu.h: add a BUILD_BUG_ON to ensure it fits within one page
  arch-x86/pmu.h: document current memory layout for VPMU
  arch-x86/pmu.h: convert ascii art drawing to Unicode
  vpmu.c: factor out register conversion
  pmu.h: introduce a stacktrace area
  arch-x86/pmu.h: convert ascii art diagram to Unicode
  arch-x86/vpmu.c: store guest registers when domain_id == DOMID_XEN
  pmu.h: expose a hypervisor stacktrace feature
  vpmu.c hypervisor stacktrace support in vPMU
  xen/tools/pyperf.py: example script to parse perf output

 xen/arch/x86/cpu/vpmu.c           | 130 ++++++++++++++++++++------
 xen/arch/x86/cpu/vpmu_amd.c       |   2 +-
 xen/arch/x86/cpu/vpmu_intel.c     |   2 +-
 xen/arch/x86/include/asm/vpmu.h   |   1 +
 xen/include/public/arch-arm.h     |   1 +
 xen/include/public/arch-ppc.h     |   1 +
 xen/include/public/arch-riscv.h   |   1 +
 xen/include/public/arch-x86/pmu.h | 101 ++++++++++++++++++++-
 xen/include/public/pmu.h          |  41 ++++++++-
 xen/tools/pyperf.py               | 146 ++++++++++++++++++++++++++++++
 10 files changed, 395 insertions(+), 31 deletions(-)
 create mode 100644 xen/tools/pyperf.py

-- 
2.47.1




 

