Improving the performance of a Windows Guest on KVM/QEMU

Table of Contents
1. Preamble & Prerequisites
1.1. Prerequisites
1.2. Reference Setup
1.3. Kernel Versions
2. Guest Optimizations
2.1. CPU Mode & Topology
2.2. CPU Pinning
2.3. Enlightenments
2.4. Clock Optimizations
3. Host Optimizations
3.1. CPU Isolation
4. I/O Optimizations
4.1. Virtio Drivers
4.2. Disable Caching for RAW Disks
4.3. Enabling TRIM
5. Other Improvements
5.1. Improving the Boot Time of Your Machine
5.2. Enabling Hugepages
6. Notes
6.1. Conclusion
6.2. Sources

Last Update: 2022-09-26

1 — Preamble & Prerequisites

Without any tuning, your VM may still be usable, but you are likely to experience stuttering, high CPU usage, and slow interrupts and I/O. If you want to get closer to bare-metal speeds, you will need to make several changes to both the guest and the host.

1.1 — Prerequisites

Mandatory
- Working Windows 10 or Windows 11 guest
- CPU supporting VT-d or AMD-Vi
- Kernel 5.8+
- QEMU 6+
- Libvirt 7+
- KVM
Optional
- Working GPU passthrough
- Kernel 5.15+ (best performance)
- Kernel 5+ compiled with voluntary preemption
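
Before going further, it is worth confirming that the host actually exposes what the list above requires. A few quick checks (exact output varies by distribution and hardware):

# CPU virtualization extensions (VT-x / AMD-V)
lscpu | grep -i virtualization
# KVM modules loaded and the /dev/kvm device present
lsmod | grep kvm
ls -l /dev/kvm
# IOMMU (VT-d / AMD-Vi) initialized at boot
sudo dmesg | grep -i -e DMAR -e IOMMU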

1.2 — Reference Setup

The reference setup is described in detail, along with instructions on how to reproduce it, in this article.

Hardware
CPU: i9-9880H
GPU: Quadro T1000 Mobile
IGPU: Intel UHD 630
Software
Distribution: Manjaro XFCE
Libvirt: 7.9.0
QEMU: 6.1.0
Kernel: 5.15.2
Kernel Parameters:
- preempt=voluntary
- intel_iommu=on
- iommu.passthrough=1
- vfio-pci.ids=10de:1b80
Virtualization Setup
Disk: RAW & NVME PCI Passthrough
Hypervisor: KVM
Chipset: Q35
Firmware: UEFI x86_64 | OVMF
CPU: Host passthrough with manual topology, no emulation
IGPU: GVT-G Passthrough
GPU: PCIe Passthrough
Network: virtio NAT, Linked
Input Method: Spice
Display: Looking Glass B4 & Spice W/O Graphics
Video: None
Controllers: USB, PCIe, PCI, VirtIO

1.3 — Kernel Versions

As of December 2021, I found that the kernel offering the best performance and the easiest setup is version 5.15.2. Note that kernel versions 5.11+ have had issues with GVT-g/GVT-d and tend to freeze up.

2 — Guest Optimizations

2.1 — CPU Mode & Topology

The CPU mode you pick is one of the biggest factors in CPU-related performance. If you disable CPU emulation and pass the CPU as-is to the VM using the “host-passthrough” mode, performance for CPU-bound tasks will be as close to bare metal as it can get. The topology should reflect the physical cores and threads you intend to give the guest (7 cores with 2 threads each in the reference setup).

<domain type='kvm' ...>
  ...
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' dies='1' cores='7' threads='2'/>
  </cpu>
  ...
</domain>
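
To fill in the topology element, check how many cores and threads the host actually has; lscpu reports this directly:

# Sockets, cores per socket and threads per core on the host
lscpu | grep -E 'Socket|Core|Thread'

On the reference i9-9880H (8 cores / 16 threads), 7 cores with 2 threads each go to the guest, leaving one physical core for the host and the emulator, as the pinning in the next section shows.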

2.2 — CPU Pinning

The easiest and most significant improvement so far comes from using CPU pinning in conjunction with the host-passthrough CPU mode. The idea is to dedicate some cores to virtualization housekeeping such as I/O and emulation, some to your host, and the rest to your virtual machine. It is also possible to isolate the pinned cores so that the host never schedules work on them, but I did not find this necessary as long as I avoided running anything intensive on the host while my VMs were running.

Physical core #1
- Virtual core 1 = 0
- Virtual core 2 = 8
Physical core #2
- Virtual core 1 = 1
- Virtual core 2 = 9
...
Physical core #8
- Virtual core 1 = 7
- Virtual core 2 = 15
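
The mapping above is specific to the reference CPU. To see which logical CPUs share a physical core on your own machine, the CORE column of lscpu is the quickest check:

# Show each logical CPU and the physical core it belongs to
lscpu --extended=CPU,CORE,SOCKET
# Alternatively, list the sibling threads of a given core
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

In the configuration below, logical CPUs 1-7 and 9-15 are pinned to the 14 vCPUs, while CPUs 0 and 8 (physical core #1) are left to the emulator and the I/O thread.
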
<domain type='kvm' ...>
  ...
  <vcpu placement='static'>14</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='4'/>
    <vcpupin vcpu='4' cpuset='5'/>
    <vcpupin vcpu='5' cpuset='6'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='9'/>
    <vcpupin vcpu='8' cpuset='10'/>
    <vcpupin vcpu='9' cpuset='11'/>
    <vcpupin vcpu='10' cpuset='12'/>
    <vcpupin vcpu='11' cpuset='13'/>
    <vcpupin vcpu='12' cpuset='14'/>
    <vcpupin vcpu='13' cpuset='15'/>
    <emulatorpin cpuset='0,8'/>
    <iothreadpin iothread='1' cpuset='0,8'/>
  </cputune>
  ...
</domain>
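
Once the guest is up you can confirm from the host that the pinning is in effect (win10 is a placeholder for your domain name):

# Current vCPU-to-host-CPU affinity
virsh vcpupin win10
# Affinity of the emulator threads
virsh emulatorpin win10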

2.3 — Enlightenments

One way to dramatically improve your Windows guest's performance is to enable Hyper-V enlightenments. Now that Code 43 is no longer an issue, we can tell Windows that it's running in a virtual machine and let it use code paths optimized for virtualized contexts.

<domain type='kvm' ...>
  ...
  <features>
    <acpi/>
    <apic/>
    <pae/>
    <hyperv>
      ...
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      ...
    </hyperv>
  </features>
  <clock ...>
    ...
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
    ...
  </clock>
  ...
</domain>
2.4 — Clock Optimizations

The clock element complements the hypervclock timer shown above: the RTC catches up on missed ticks, the PIT delays them, and the costly-to-emulate HPET stays disabled.

<clock offset='localtime'>
  ...
  <timer name='rtc' tickpolicy='catchup'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
  ...
</clock>

3 — Host Optimizations

3.1 — CPU Isolation

If you are experiencing micro-stutters or missed frames at higher resolutions (4K or 4K ultrawide), it can be beneficial to isolate the cores responsible for the virtualization I/O and emulator processes from both host and guest processes.

# Create the hook scripts directory if it doesn't exist
sudo mkdir -p /etc/libvirt/hooks/qemu.d/
# Download the script file (GitHub link)
curl -L -o isolate-cpus.sh https://bit.ly/3Gai56Q
# Set the script permissions
chmod +rx isolate-cpus.sh && sudo chown root:root isolate-cpus.sh
# Move it to the hooks folder
sudo mv isolate-cpus.sh /etc/libvirt/hooks/qemu.d/
# Configure your reserved cores and physical cpu topology
sudo nano /etc/libvirt/hooks/qemu.d/isolate-cpus.sh
# See the comments in the script file for more instructions
--------------------------------------------------------------------
reserved=YOUR_CORES_RESERVED_FOR_HOST
cores=YOUR_CORE_RANGE_FOR_PHYSICAL_CPU
--------------------------------------------------------------------
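
If you would rather see what such a hook boils down to, a common approach is to restrict the host's systemd slices to the cores reserved for it while the guest runs, and widen them again when it stops, so the pinned guest cores are left alone. A minimal sketch, assuming the qemu.d dispatcher passes the libvirt operation as the second argument and that CPUs 0 and 8 are reserved for the host on a 16-thread CPU:

#!/bin/bash
# Sketch of a dynamic CPU isolation hook. Assumes $2 carries the libvirt
# operation (started/release) and that the host keeps CPUs 0 and 8.

reserved="0,8"   # cores reserved for the host
all="0-15"       # full range of host CPUs

case "$2" in
  started)
    # Guest started: confine host workloads to the reserved cores
    systemctl set-property --runtime -- user.slice   AllowedCPUs="$reserved"
    systemctl set-property --runtime -- system.slice AllowedCPUs="$reserved"
    systemctl set-property --runtime -- init.scope   AllowedCPUs="$reserved"
    ;;
  release)
    # Guest stopped: give all cores back to the host
    systemctl set-property --runtime -- user.slice   AllowedCPUs="$all"
    systemctl set-property --runtime -- system.slice AllowedCPUs="$all"
    systemctl set-property --runtime -- init.scope   AllowedCPUs="$all"
    ;;
esac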

4 — I/O Optimizations

4.1 — Using VirtIO drivers for disks & network devices

Using VirtIO drivers is a must to improve the performance of your machine. So far we have been using an emulated network adapter and an emulated SATA disk, which is not optimal: emulating physical devices is much more demanding than running drivers designed for virtualization. Combined with CPU pinning and I/O threads, switching to the proper drivers improves the overall performance of the machine.

One caveat: if you switch the boot disk's bus to VirtIO before Windows has the storage driver installed, the guest will fail to boot. A common workaround is to install the VirtIO drivers in the guest first, force the next start into Safe Mode so the driver gets loaded, switch the disk bus to VirtIO, and finally restore normal boot once the guest comes up correctly.

Before switching the disk bus (run in an elevated prompt inside the guest):
bcdedit /set {default} safeboot network

After the first successful boot on the VirtIO bus:
bcdedit /deletevalue {default} safeboot
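
For reference, once the devices are switched over, the relevant parts of the domain XML look roughly like this (the source path and network name are placeholders):

<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/path/to/guest-disk.img'/>
  <target dev='vda' bus='virtio'/>
</disk>
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
</interface>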

4.2 — Disable caching for RAW disks

You can gain some additional performance by going through the libvirt interface and disabling caching for your RAW disks.
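
In the domain XML this corresponds to the cache attribute on the disk's driver element, along these lines:

<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  ...
</disk>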

4.3 — Enabling TRIM support for your VirtIO RAW disks

To enable TRIM support, select your VirtIO disk in virt-manager, open the Advanced options drawer, then the Performance options, and set “Discard mode” to “unmap”.
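
The XML equivalent is the discard attribute on the same driver element:

<driver name='qemu' type='raw' cache='none' discard='unmap'/>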

5 — Other Improvements

5.1 — Improving the boot time of your machine

With QEMU 5 or 6 and kernel 5.6+ (still true as of 5.10), boot time grows dramatically with the amount of RAM you assign to the guest. With 24 GB passed to one of my guests I experienced wait times of 2 to 3 minutes, which is unacceptable. The cause is that most Linux distributions ship with preemption enabled for all processes, which makes QEMU's RAM allocation at startup extremely slow. The fix is to switch to voluntary preemption; QEMU is then no longer preempted during allocation, which greatly speeds up VM boot. With 24 GB I now get cold boot times of 10 to 15 seconds.

CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# Edit your grub file
$ sudo nano /etc/default/grub
# Locate the line containing GRUB_CMDLINE_LINUX_DEFAULT="..."
# Add the following value to it
GRUB_CMDLINE_LINUX_DEFAULT="... preempt=voluntary ..."
# Save the file, quit nano and update grub
$ sudo update-grub
# Reboot
$ sudo reboot
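
After rebooting you can check that the parameter was picked up and, if your kernel exposes its configuration, that voluntary preemption is built in (config locations vary by distribution):

# The boot parameter should show up here
grep -o 'preempt=[a-z]*' /proc/cmdline
# Check the kernel's preemption options, wherever your distribution exposes the config
zgrep -E 'CONFIG_PREEMPT(_VOLUNTARY)?=' /proc/config.gz 2>/dev/null \
  || grep -E 'CONFIG_PREEMPT(_VOLUNTARY)?=' /boot/config-$(uname -r)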

5.2 — Enabling hugepages

https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#Huge_memory_pages
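
The linked page covers the details; in short, you reserve huge pages on the host and tell libvirt to back the guest's memory with them. A minimal sketch for a 24 GB guest using 1 GiB pages (page size and count are assumptions to adapt to your RAM and CPU):

# Kernel parameters reserving 24 x 1 GiB huge pages at boot
default_hugepagesz=1G hugepagesz=1G hugepages=24

<domain type='kvm' ...>
  ...
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  ...
</domain>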

6 — Notes

6.1 — Conclusion

As you have seen, there are many things you can do to improve your guest's performance. I personally stopped here because I was satisfied with the results, but there are further steps that could be taken: notably huge pages, which have helped many users even though their usefulness in the context of GPU passthrough is debatable, and core isolation, which prevents the host from running tasks on the cores passed to your VM.

6.2 — Sources

https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF
