The Biggest Hurdle of Traditional GPU Passthrough
If you’re running Proxmox, you’re likely familiar with PCI Passthrough. It’s the fastest way to bring hardware power into a virtual machine (VM). However, after six months of running a lab with over 10 VMs for everything from Home Assistant to Stable Diffusion, I realized a paradox: the 1:1 ratio. If you own an RTX 3060 12GB and only use it to accelerate the UI of one Windows VM, more than 10GB of VRAM sits idle.
It’s like buying a truck just to deliver a letter. Other VMs can’t touch this graphics power unless you add a new card. That’s where NVIDIA vGPU (Virtual GPU) comes in to solve this waste. This technology allows you to split a physical card into multiple independent “slices,” where each VM receives a portion of VRAM based on actual needs.
Comparing GPU Sharing Methods
| Feature | PCI Passthrough | API Intercept (rGPU) | NVIDIA vGPU |
|---|---|---|---|
| Resource Allocation | Entire card for 1 VM | Shared | Hard Partitioning |
| Real-world Performance | ~100% (Native) | Significantly reduced due to latency | Maintains ~95-98% |
| Stability | Excellent | Frequent crashes | Enterprise-grade |
| Supported Card Types | All card types | Software-limited | Quadro/Tesla (or RTX unlock) |
Among these three methods, vGPU offers the best balance. It’s as stable as Passthrough but as flexible as the way we virtualize CPU cores.
Why vGPU Is the Top Choice for AI/ML
GPU demand in AI projects is often bursty. When you’re writing code or preparing data, the GPU is mostly idle. It only really heats up when you hit the “Training” command. With vGPU, you can allocate 4GB of VRAM to each of 4 developers from a single 16GB card. Instead of spending 40 million VND on four workstations, you only need to invest in one sufficiently powerful Proxmox server.
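The capacity planning behind that claim is plain integer division, because vGPU uses hard partitioning with fixed-size slices and no overcommit. A throwaway sketch using the same figures:

```shell
# vGPU hard partitioning: fixed-size slices, no overcommit possible.
card_vram_gb=16   # total VRAM on the card
slice_gb=4        # VRAM allocated per developer VM
slices=$(( card_vram_gb / slice_gb ))
echo "$slices VMs at ${slice_gb} GB each"   # 4 VMs at 4 GB each
```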
Although NVIDIA’s documentation says vGPU is only for the Enterprise line, the vgpu-unlock-rs tool has changed the game. It makes consumer RTX cards present themselves to the vGPU Manager just like the specialized Tesla line.
Real-world Deployment on Proxmox 8
Don’t forget to enable IOMMU in your BIOS before starting. Without this step, all subsequent configurations will be useless.
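Besides the BIOS switch, the kernel also needs to be told to use the IOMMU. A typical entry for a GRUB-booted Proxmox host is shown below; an Intel CPU is assumed here (on AMD the IOMMU is usually enabled by default, or pass amd_iommu=on instead):

```
# /etc/default/grub on a GRUB-booted Proxmox host (Intel CPU assumed)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# Apply and reboot, then verify it took effect:
#   update-grub && reboot
#   dmesg | grep -e DMAR -e IOMMU
```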
Step 1: Install the Host Driver
We need NVIDIA’s dedicated vGPU Manager driver for the host. Do not use the stock `apt install nvidia-driver` package from the standard Linux repositories.
# Update the environment
apt update && apt upgrade -y
# Install dependencies and kernel headers
apt install -y build-essential dkms pve-headers-$(uname -r)
# Execute the vGPU driver installer
chmod +x NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm.run
./NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm.run
Step 2: Unlocking Consumer Card Potential
If you own an RTX 3000 or 4000 series card, this step is mandatory. The unlock tool tricks the system into thinking this is a virtualization-supported card.
# Download the unlock tool from GitHub
git clone https://github.com/mbilker/vgpu-unlock-rs.git
cd vgpu-unlock-rs
# Compile using Rust
cargo build --release
# Move the library into the system
cp target/release/libvgpu_unlock_rs.so /usr/lib/
Next, declare your card’s parameters at /etc/vgpu_unlock/config.toml so the driver can identify it correctly.
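Alongside config.toml, vgpu-unlock-rs reads per-profile overrides from profile_override.toml in the same directory. A sketch with illustrative values only; the key names follow the project README and can change between versions:

```
# /etc/vgpu_unlock/profile_override.toml — values are examples only
[profile.nvidia-233]
num_displays = 1
display_width = 1920
display_height = 1080
max_pixels = 2073600      # 1920 * 1080
cuda_enabled = 1          # allow CUDA workloads on this profile
frl_enabled = 0           # lift the frame-rate limiter
```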
Step 3: Check Mdev Profiles
Restart the machine and run the following command to see the available GPU “slices”:
mdevctl types | grep nvidia
The results will show profiles like nvidia-233 (equivalent to 1GB of VRAM). Note this profile name down so you can assign it to your virtual machine.
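Each profile prints as a small block of metadata. The framebuffer size, instance count, and resolution below are purely illustrative and depend on your card, driver version, and unlock overrides:

```
0000:01:00.0
  nvidia-233
    Available instances: 12
    Device API: vfio-pci
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880
```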
Step 4: Allocate to Virtual Machines
Go to the Proxmox GUI, select VM -> Hardware -> Add -> PCI Device. In the MDev Type field, choose the appropriate capacity. For example: allocate 4GB for image processing tasks or 8GB to run LLMs like Llama-3-8B.
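If you prefer the CLI, the GUI action is equivalent to a one-line qm command, which ends up as a hostpci entry in the VM’s config file. The VM ID 100 and PCI address 0000:01:00.0 below are placeholders for your own values:

```
# Run on the Proxmox host:
qm set 100 --hostpci0 0000:01:00.0,mdev=nvidia-233

# Resulting entry in /etc/pve/qemu-server/100.conf:
hostpci0: 0000:01:00.0,mdev=nvidia-233
```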
Client-side Setup (Guest VM)
In the virtual machine, you must install the vGPU guest (“Client”) driver; regular Game Ready drivers will not work. Once installed, you also need to configure licensing, because without a valid license the driver throttles performance after a grace period.
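On a Linux guest, the license client is configured through /etc/nvidia/gridd.conf (Windows guests use the NVIDIA Control Panel instead). A minimal sketch, where the server address and port are placeholders for your own license server:

```
# /etc/nvidia/gridd.conf — address and port are placeholders
ServerAddress=192.168.1.50
ServerPort=7070
FeatureType=1    # 1 = NVIDIA vGPU license
```

Restart the nvidia-gridd service after editing the file so the new settings are picked up.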
# Check the results
nvidia-smi
If the status table shows the correct VRAM you allocated, the system is ready for work.
Important Notes for Real-world Operation
After 6 months of continuous use, I’ve drawn three main lessons:
- Temperature Management: When multiple VMs are rendering at once, the card will heat up very quickly. Be proactive in adjusting the fan curve on the Proxmox host, as automatic mode can sometimes be slow to react.
- Backup Risks: Proxmox cannot snapshot the GPU memory state of a mediated device. Shut down the VM before backing up to avoid corrupted graphics state.
- Kernel Updates: Every time Proxmox ships a kernel patch, the NVIDIA driver might stop loading. Always be ready to run the dkms install command to restore the connection.
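The recovery routine after a kernel update is short, assuming the module was registered with DKMS when the .run installer was executed (its --dkms flag does this). Only the header-name computation below actually runs; the apt and dkms commands are shown as comments because they need root on the host:

```shell
# The header package must match the newly booted kernel:
headers="pve-headers-$(uname -r)"
echo "$headers"

# Then, as root on the Proxmox host:
#   apt install -y "$headers"
#   dkms autoinstall    # rebuilds all registered DKMS modules
```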
Leveraging vGPU helped me thoroughly optimize old hardware. I can provide Cloud Gaming for kids while maintaining a professional AI testing environment without spending a penny on expensive Cloud services. Good luck with your configuration!

