Setting Up Fedora for Machine Learning: Installing CUDA, PyTorch, and ‘Surviving’ on Wayland

Fedora tutorial - IT technology blog

Fixing NVIDIA Errors at 2 AM: When Wayland Refuses to Cooperate

Last night, I almost smashed my computer because of that familiar line: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. I had just upgraded to Fedora 40 to try its shiny new 6.8 kernel, but things didn’t go as planned. After installing the driver, I got a black screen. On Wayland, the lag was so bad that the cursor skipped across the screen, while PyTorch desperately searched for a GPU.

I’ve been using Fedora as my main dev machine for over two years. New package versions land very quickly, often only a few days behind Arch Linux. However, turning Fedora into a stable deep learning machine on Wayland is a real challenge. After dozens of OS reinstalls, I’ve settled on a standard process so you don’t have to stay up all night like I did.

Why Choose Fedora Over Ubuntu?

AI practitioners usually default to Ubuntu. In the Fedora vs. Ubuntu debate, though, Fedora has a big advantage: it ships recent kernels, which helps you get the most out of new hardware such as RTX 40-series cards or 14th Gen Intel CPUs. The catch is that Fedora defaults to Wayland, a modern display protocol with a history of “not playing well” with the proprietary NVIDIA driver.

A stable ML setup rests on three pillars:

  • NVIDIA Driver (RPM Fusion): Never use the .run installer from NVIDIA’s website. DNF doesn’t manage it, so it will break your system after kernel updates.
  • CUDA Toolkit: NVIDIA’s compiler (nvcc) and libraries for GPU parallel computing.
  • Miniconda/Mamba: Helps isolate environments and prevents breaking the system Python.

Step 1: Enable RPM Fusion Repositories

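Don’t start from an out-of-date system. Fedora updates packages constantly, and if the running kernel and its headers (kernel-devel) don’t match, the driver module will fail to build. Bring everything up to date first, and reboot if a new kernel was installed. If downloads are slow, raising max_parallel_downloads in /etc/dnf/dnf.conf speeds DNF up. The dnf-plugins-core package provides the config-manager command used in Step 3.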

sudo dnf update -y
sudo dnf install dnf-plugins-core -y

Next, enable RPM Fusion. This community repository is the safest and most widely used source for NVIDIA drivers on Fedora:

sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
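The $(rpm -E %fedora) part expands to your Fedora release number, so the same command works on every release. As a sketch of what actually gets downloaded, here is a Python version that reads the release from /etc/os-release. fedora_release and rpmfusion_urls are hypothetical helpers, and reading VERSION_ID assumes it matches %fedora (it should on a regular Fedora install).

```python
from __future__ import annotations

import platform

BASE = "https://mirrors.rpmfusion.org"


def fedora_release() -> str:
    """Return the Fedora release number (e.g. "40") from /etc/os-release.

    Assumed to match what `rpm -E %fedora` prints on a Fedora system.
    platform.freedesktop_os_release() requires Python 3.10+.
    """
    info = platform.freedesktop_os_release()
    if info.get("ID") != "fedora":
        raise RuntimeError(f"not a Fedora system (ID={info.get('ID')!r})")
    return info["VERSION_ID"]


def rpmfusion_urls(release: str) -> list[str]:
    """Build the free and nonfree RPM Fusion release-package URLs."""
    return [
        f"{BASE}/{repo}/fedora/rpmfusion-{repo}-release-{release}.noarch.rpm"
        for repo in ("free", "nonfree")
    ]
```

On a Fedora machine, print("sudo dnf install " + " ".join(rpmfusion_urls(fedora_release()))) prints the equivalent install command.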

Step 2: Install NVIDIA Drivers Like a Pro

Many people reboot immediately after installation. That’s a mistake. Fedora needs time to build kernel modules via akmods. Install the driver suite and supporting CUDA libraries:

sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda -y

Extremely Important Note: After the command finishes, go grab a cup of coffee. Wait about 3-5 minutes for the module to build in the background, then check the status with:

modinfo -F version nvidia

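If the terminal prints a version (e.g., 555.xx), you can safely reboot. If it reports that the module was not found, the build is probably still running, so wait a little longer. If you use Secure Boot, the module must be signed before the kernel will load it. The quickest workaround is to disable Secure Boot in your BIOS/UEFI settings, but keep in mind that turning it back on will block the unsigned module again.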
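If you’d rather not guess when the build is done, the check can be scripted. A minimal sketch in Python, assuming modinfo is on your PATH and exits non-zero until the module exists. nvidia_module_version and wait_for_module are hypothetical helpers, and the 10-minute timeout and 15-second interval are arbitrary choices.

```python
import subprocess
import time
from typing import Callable, Optional


def nvidia_module_version() -> Optional[str]:
    """Return the nvidia module version, or None if it isn't built yet."""
    result = subprocess.run(
        ["modinfo", "-F", "version", "nvidia"],
        capture_output=True,
        text=True,
    )
    version = result.stdout.strip()
    return version if result.returncode == 0 and version else None


def wait_for_module(
    probe: Callable[[], Optional[str]] = nvidia_module_version,
    timeout: float = 600.0,
    interval: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll probe() until it returns a version or the timeout expires."""
    deadline = clock() + timeout
    while True:
        version = probe()
        if version:
            return version
        if clock() >= deadline:
            return None  # still no module after the timeout
        sleep(interval)
```

Calling wait_for_module() blocks until modinfo reports a version (then it is safe to reboot) or returns None after the timeout. The probe, sleep and clock parameters exist only so the loop can be exercised without a real GPU.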

Step 3: Install CUDA Toolkit Without Conflicts

I never use dnf install cuda. That metapackage from NVIDIA’s repo also pulls in NVIDIA’s own driver packages, which conflict with the RPM Fusion driver and often end in a black screen. The cleanest way is to install only the toolkit.

sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora40/x86_64/cuda-fedora40.repo
sudo dnf install cuda-toolkit -y

Configure environment variables in ~/.bashrc:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Run source ~/.bashrc to apply the changes immediately, then run nvcc --version to confirm the toolkit is on your PATH.
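The ${VAR:+:${VAR}} part is what keeps these lines safe when a variable is empty: the colon and old value are appended only if the variable is already set and non-empty. A small Python equivalent makes the rule explicit. prepend_path and with_cuda_paths are hypothetical helpers, and /usr/local/cuda is assumed to be the toolkit prefix, as in the lines above.

```python
from typing import Mapping, Optional


def prepend_path(directory: str, current: Optional[str]) -> str:
    """Python equivalent of bash's  DIR${VAR:+:${VAR}}.

    ${VAR:+...} expands only when VAR is set and non-empty, so no stray ':'
    (an empty entry, which PATH treats as the current directory) is added.
    """
    return f"{directory}:{current}" if current else directory


def with_cuda_paths(env: Mapping[str, str], cuda_home: str = "/usr/local/cuda") -> dict:
    """Return a copy of env with the CUDA bin/lib64 directories prepended."""
    new_env = dict(env)
    new_env["PATH"] = prepend_path(f"{cuda_home}/bin", env.get("PATH"))
    new_env["LD_LIBRARY_PATH"] = prepend_path(
        f"{cuda_home}/lib64", env.get("LD_LIBRARY_PATH")
    )
    return new_env
```

For example, subprocess.run(["nvcc", "--version"], env=with_cuda_paths(os.environ)) lets a script find nvcc even before ~/.bashrc has been reloaded, because on Linux subprocess searches the PATH of the env it is given.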

Step 4: Manage Environments with Miniconda

Running pip install against the global (system) Python is asking for trouble. Fedora’s own tools, dnf included, rely on the system Python packages, and sooner or later a conflicting library will break them. Use Miniconda to keep everything isolated, a principle that also matters for professional Python deployment.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Create a separate environment for your AI projects:

conda create -n ai_lab python=3.10
conda activate ai_lab
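To turn “never pip install into the system Python” into something a script can enforce, you can check where the interpreter lives before installing anything. A minimal sketch, assuming the environment was activated with conda activate (which sets CONDA_PREFIX); in_active_conda_env and require_conda_env are hypothetical helpers.

```python
import os
import sys
from typing import Mapping


def in_active_conda_env(
    prefix: str = sys.prefix, environ: Mapping[str, str] = os.environ
) -> bool:
    """True if this interpreter lives in the currently activated conda env.

    `conda activate` sets CONDA_PREFIX; if sys.prefix points elsewhere
    (e.g. /usr for Fedora's system Python), pip would install there instead.
    """
    conda_prefix = environ.get("CONDA_PREFIX", "")
    return bool(conda_prefix) and os.path.realpath(prefix) == os.path.realpath(conda_prefix)


def require_conda_env() -> None:
    """Stop before any pip install could touch the system Python."""
    if not in_active_conda_env():
        raise SystemExit(
            f"{sys.prefix} is not an active conda env; run `conda activate ai_lab` first"
        )
```

Calling require_conda_env() at the top of a setup script makes it exit early when run from Fedora’s /usr Python instead of the ai_lab environment.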

Step 5: Install PyTorch and Handle Wayland Lag

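PyTorch’s CUDA support on Linux is mature. Install the stable version (the default Linux wheels on PyPI ship their own CUDA runtime libraries, so PyTorch itself only needs the driver from Step 2; the toolkit matters when you compile CUDA code yourself):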

pip3 install torch torchvision torchaudio

Check if the GPU is “connected”:

python -c "import torch; ok = torch.cuda.is_available(); print(f'CUDA: {ok}'); print(f'Card: {torch.cuda.get_device_name(0) if ok else None}')"

If your RTX card’s name appears, you’re 99% of the way there.
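PyTorch’s answer depends on both the driver and your Python environment, so it can help to ask the driver directly. The sketch below runs nvidia-smi with its --query-gpu and --format=csv options and parses the result; GpuInfo, parse_gpu_csv and query_gpus are illustrative names, not an existing API.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class GpuInfo:
    name: str
    driver_version: str
    memory_mib: int


QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text: str) -> List[GpuInfo]:
    """Parse one 'name, driver, memory' line per GPU (memory in MiB with nounits)."""
    gpus = []
    for line in text.strip().splitlines():
        # rsplit from the right: the driver and memory fields never contain commas
        name, driver, memory = (field.strip() for field in line.rsplit(",", 2))
        gpus.append(GpuInfo(name, driver, int(memory)))
    return gpus


def query_gpus() -> List[GpuInfo]:
    """Run nvidia-smi; raises if it is missing or can't talk to the driver."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

If query_gpus() lists your card but torch.cuda.is_available() prints False, the problem is most likely in the Python environment; if nvidia-smi itself fails, look at the driver first.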

Pro Tip for Jupyter Notebook on Wayland

When running Jupyter on Wayland, browsers often flicker, usually because they are running through XWayland instead of natively. Force Chrome/Edge to run as a native Wayland app by setting the Preferred Ozone platform flag to Wayland in chrome://flags (edge://flags in Edge).

Don’t forget to register the environment as a Jupyter kernel:

pip install jupyterlab ipywidgets ipykernel
python -m ipykernel install --user --name ai_lab --display-name "Python 3.10 (AI)"
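If the notebook still can’t see the GPU, check which Python the registered kernel actually launches. A minimal sketch, assuming the default per-user Jupyter data directory on Linux (~/.local/share/jupyter, with no JUPYTER_DATA_DIR or XDG_DATA_HOME override); parse_kernel_spec and kernel_target are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Tuple

# Assumption: default location used by `ipykernel install --user` on Linux
KERNELS_DIR = Path.home() / ".local" / "share" / "jupyter" / "kernels"


def parse_kernel_spec(text: str) -> Tuple[str, str]:
    """Return (display_name, interpreter) from the contents of a kernel.json."""
    spec = json.loads(text)
    # argv[0] is the Python executable Jupyter launches for this kernel
    return spec["display_name"], spec["argv"][0]


def kernel_target(name: str, kernels_dir: Path = KERNELS_DIR) -> Tuple[str, str]:
    """Read kernels_dir/<name>/kernel.json and return its display name and interpreter."""
    return parse_kernel_spec((kernels_dir / name / "kernel.json").read_text())
```

print(kernel_target("ai_lab")) should show an interpreter inside the ai_lab environment (under ~/miniconda3/envs/ai_lab/ if Miniconda went to its default location). If it points at /usr/bin/python3, the kernel was registered from the wrong environment.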

When Everything Breaks: Rescue Tactics

After updating Fedora with dnf upgrade, nvidia-smi will sometimes stop working. Don’t panic. Usually, the NVIDIA module simply hasn’t been rebuilt for the new kernel yet. Force the rebuild, then regenerate the initramfs:

sudo akmods --force
sudo dracut -f

Reboot your machine, and everything should be back on track. Fedora is actually very powerful for ML once you understand how it manages packages.
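If nvidia-smi still fails after the reboot, it is worth checking which kernels actually have a built module. A minimal sketch, assuming akmods installs the module under /lib/modules/<kernel-release>/extra/nvidia/ (the usual RPM Fusion layout, but verify it on your system); the helper names are hypothetical.

```python
import platform
from pathlib import Path
from typing import Iterable, List, Optional

# Assumption: akmods-built modules live at
#   /lib/modules/<kernel-release>/extra/nvidia/nvidia.ko*
MODULES_ROOT = Path("/lib/modules")


def kernels_from_module_paths(paths: Iterable[Path]) -> List[str]:
    """Map .../<kernel>/extra/nvidia/nvidia.ko* paths to sorted kernel releases."""
    # parents[0] = nvidia/, parents[1] = extra/, parents[2] = <kernel-release>/
    return sorted({Path(p).parents[2].name for p in paths})


def kernels_with_nvidia(root: Path = MODULES_ROOT) -> List[str]:
    """Kernel releases under root that have an nvidia module installed."""
    return kernels_from_module_paths(root.glob("*/extra/nvidia/nvidia.ko*"))


def running_kernel_has_nvidia(
    root: Path = MODULES_ROOT, release: Optional[str] = None
) -> bool:
    """True if a module exists for the running kernel (same string as uname -r)."""
    return (release or platform.release()) in kernels_with_nvidia(root)
```

If running_kernel_has_nvidia() returns False while kernels_with_nvidia() lists only older kernels, the rebuild for the new kernel hasn’t happened, and the akmods command above is the fix.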

Conclusion

Setting up a Linux workflow is never easy, especially with the NVIDIA and Wayland combo. However, once everything is configured correctly, you’ll have a blazing-fast system that stays up to date with the latest AI tooling. Hopefully, this guide helps you stay sane and saves you hours of pointless debugging.
