# Tracing QEMU-KVM Interactions But default in linux you can figure out how many times and for what reasons there is a VM Exit from a VM into the kvm kernel module. However given the ubiquity of vhost and the ability of kvm to emulate most device models directly in the kernel, most of those VM exits do not result in a transition from host kernel into the QEMU. The transitions from VM -> kvm -> QEMU are typically the most expensive. Here we try to figure out how many of the VM Exits result in the invocation of QEMU. ## Tracking VM-KVM Interactions This can be done very simply with perf ``` sudo ./perf stat -e 'kvm:*' -a sleep 1s ``` This will give you statistics collected over 1s for all VM exits (across VMs) You can also do this directly and get the full trace logs via ``` echo 1 > /sys/kernel/debug/tracing/events/kvm/enable cat /sys/kernel/debug/tracing/trace_pipe ``` ## Background All *primary* exits from kvm to QEMU are via the return path of KVM_RUN ioctl. Fortunately for us QEMU already has built in instrumentation for tracing. We will leverage that to selectively trace the return path of KVM_RUN and record the exit reasons There is a second class of KVM-QEMU interaction that go through the ioeventfs,irqfd mechanism which we will instrument later. ### Code that triggers tracing of the (explicit) kvm to qemu exit https://github.com/qemu/qemu/blob/7e56accdaf35234b69c33c85e4a44a5d56325e53/accel/kvm/kvm-all.c#L2041 ### QEMU Tracing Documentation http://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/devel/tracing.txt;hb=HEAD https://fossies.org/linux/qemu/docs/tracing.txt # ftrace backend Enabling this backend allows you to trace both the VM-KVM events as well the KVM-QEMU events. ### Build QEMU with tracking with ftrace backend ``` ./configure --disable-bluez --disable-brlapi --disable-bzip2 --disable-curl --disable-curses --disable-debug-tcg --disable-fdt --disable-glusterfs --disable-gtk --disable-libiscsi --disable-libnfs --disable-libssh2 --disable-libusb --disable-linux-aio --disable-lzo --disable-opengl --disable-qom-cast-debug --disable-rbd --disable-rdma --disable-sdl --disable-seccomp --disable-slirp --disable-snappy --disable-spice --disable-strip --disable-tcg-interpreter --disable-tcmalloc --disable-tools --disable-tpm --disable-usb-redir --disable-uuid --disable-vnc --disable-vnc-jpeg --disable-vnc-png --disable-vnc-sasl --disable-vte --disable-xen --enable-attr --enable-cap-ng --enable-kvm --enable-virtfs --target-list=x86_64-softmmu --extra-cflags="-fno-semantic-interposition -O3 -falign-functions=32" --datadir=/usr/share/qemu-lite --libdir=/usr/lib64/qemu-lite --libexecdir=/usr/libexec/qemu-lite --enable-vhost-net --disable-docs --enable-trace-backends=ftrace ``` ### Launch QEMU with tracing enabled Enable the tracking of just the kvm_run exit. ``` rm -f /tmp/event echo kvm_run_exit > /tmp/events sudo ../x86_64-softmmu/qemu-system-x86_64 -trace events=/tmp/events -machine pc-lite,accel=kvm,kernel_irqchip,nvdimm -cpu host -m 256,maxmem=1G,slots=2 -smp 2 -no-user-config -nodefaults -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -kernel ./vmlinux-4.9.34-63.1.container -append "reboot=k panic=1 rw tsc=reliable no_timer_check noreplace-smp root=/dev/pmem0p1 init=/usr/lib/systemd/systemd initcall_debug rootfstype=ext4 rootflags=dax,data=ordered dhcp rcupdate.rcu_expedited=1 clocksource=kvm-clock console=hvc0 single iommu=false " -device virtio-serial-pci,id=virtio-serial0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -nographic -object memory-backend-file,id=mem0,share,mem-path=./clear-16160-containers.img,size=235929600 -device nvdimm,memdev=mem0,id=nv0 -no-reboot -device virtio-net-pci,netdev=net0,mac=DE:AD:BE:EF:07:83 -netdev tap,id=net0,ifname=net0,script=no,downscript=no,queues=2,vhost=off -qmp tcp:localhost:4444,server,nowait ``` ### Tracing I/O interactions The kvm_run exit only records the explicit control exits. However for handing I/O QEMU-KVM irqfd and ioeventfd interaction mechanism. Unfortunately there does not exit a direct way to tap into the ioeventfd. However we can trace the actuall callback functions ultimately invoked. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d34e6b175e61821026893ec5298cc8e7558df43a http://blog.allenx.org/2015/07/05/kvm-irqfd-and-ioeventfd ## Obtained the combined trace logs If you had enabled kvm tracing before, you will see a mix of KVM exits as well as KVM to QEMU transistion traces. ``` cat /sys/kernel/debug/tracing/trace_pipe ``` ## Simple Backend The following example shows how to trace using the simple backend. This is another method that is simpler and directly creates trace files on disk. ### Build QEMU with tracing ``` ./configure --disable-bluez --disable-brlapi --disable-bzip2 --disable-curl --disable-curses --disable-debug-tcg --disable-fdt --disable-glusterfs --disable-gtk --disable-libiscsi --disable-libnfs --disable-libssh2 --disable-libusb --disable-linux-aio --disable-lzo --disable-opengl --disable-qom-cast-debug --disable-rbd --disable-rdma --disable-sdl --disable-seccomp --disable-slirp --disable-snappy --disable-spice --disable-strip --disable-tcg-interpreter --disable-tcmalloc --disable-tools --disable-tpm --disable-usb-redir --disable-uuid --disable-vnc --disable-vnc-jpeg --disable-vnc-png --disable-vnc-sasl --disable-vte --disable-xen --enable-attr --enable-cap-ng --enable-kvm --enable-virtfs --target-list=x86_64-softmmu --extra-cflags="-fno-semantic-interposition -O3 -falign-functions=32" --datadir=/usr/share/qemu-lite --libdir=/usr/lib64/qemu-lite --libexecdir=/usr/libexec/qemu-lite --enable-vhost-net --disable-docs --enable-trace-backends=simple ``` ### Launch QEMU with tracing enabled ``` rm -f /tmp/event echo kvm_run_exit > /tmp/events sudo ../x86_64-softmmu/qemu-system-x86_64 -trace events=/tmp/events -machine pc-lite,accel=kvm,kernel_irqchip,nvdimm -cpu host -m 256,maxmem=1G,slots=2 -smp 2 -no-user-config -nodefaults -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -kernel ./vmlinux-4.9.34-63.1.container -append "reboot=k panic=1 rw tsc=reliable no_timer_check noreplace-smp root=/dev/pmem0p1 init=/usr/lib/systemd/systemd initcall_debug rootfstype=ext4 rootflags=dax,data=ordered dhcp rcupdate.rcu_expedited=1 clocksource=kvm-clock console=hvc0 single iommu=false " -device virtio-serial-pci,id=virtio-serial0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -nographic -object memory-backend-file,id=mem0,share,mem-path=./clear-16160-containers.img,size=235929600 -device nvdimm,memdev=mem0,id=nv0 -no-reboot -device virtio-net-pci,netdev=net0,mac=DE:AD:BE:EF:07:83 -netdev tap,id=net0,ifname=net0,script=no,downscript=no,queues=2,vhost=off -qmp tcp:localhost:4444,server,nowait ``` ### Analyze the Trace Events ``` simpletrace.py trace-events-all trace-6440 &> /tmp/xxx ``` Note: - The trace-events-all file will be generated at the time of qemu compilation - The trace-xyz file will be generated at qemu execution ## KVM-QEMU Exit Reasons This will help you co-relate exit reasons to the trace logs. Teh most common ones will be for IO (2), as these are used for device modeling. https://github.com/qemu/qemu/blob/0748b3526e8cb78b9cd64208426bfc3d54a72b04/linux-headers/linux/kvm.h#L182 ``` #define KVM_EXIT_UNKNOWN 0 #define KVM_EXIT_EXCEPTION 1 #define KVM_EXIT_IO 2 #define KVM_EXIT_HYPERCALL 3 #define KVM_EXIT_DEBUG 4 #define KVM_EXIT_HLT 5 #define KVM_EXIT_MMIO 6 #define KVM_EXIT_IRQ_WINDOW_OPEN 7 #define KVM_EXIT_SHUTDOWN 8 #define KVM_EXIT_FAIL_ENTRY 9 #define KVM_EXIT_INTR 10 #define KVM_EXIT_SET_TPR 11 #define KVM_EXIT_TPR_ACCESS 12 #define KVM_EXIT_S390_SIEIC 13 #define KVM_EXIT_S390_RESET 14 #define KVM_EXIT_DCR 15 /* deprecated */ #define KVM_EXIT_NMI 16 #define KVM_EXIT_INTERNAL_ERROR 17 #define KVM_EXIT_OSI 18 #define KVM_EXIT_PAPR_HCALL 19 #define KVM_EXIT_S390_UCONTROL 20 #define KVM_EXIT_WATCHDOG 21 #define KVM_EXIT_S390_TSCH 22 #define KVM_EXIT_EPR 23 #define KVM_EXIT_SYSTEM_EVENT 24 #define KVM_EXIT_S390_STSI 25 #define KVM_EXIT_IOAPIC_EOI 26 #define KVM_EXIT_HYPERV 27 ``` # Tracing all events All events supported by QEMU can be found in the trace-events-all file. You can extract them all (or a subset) and feed them via a script such as the one below ``` cat trace-events-all | grep "(" | grep -v "#" | cut -f 1 -d "(" > /tmp/events ```