I recently went down a rabbit hole trying to fix the one thing that ruins a home media experience: the loading spinner. What started as a simple storage migration turned into a deep dive into NFS protocol versions, kernel tuning, and the intricacies of GPU passthrough in virtualized environments.
The Initial Setup: A Recipe for Latency
My original Jellyfin instance was living in an Ubuntu VM provisioned with 6 vCPUs and 6GB of RAM. At the time, I wasn't using any hardware acceleration, meaning the CPU was doing the heavy lifting for every transcode. To keep things stable, I had a self-imposed "720p ceiling": anything higher would choke the vCPUs and make multiple simultaneous streams unwatchable.
The real trouble started when I migrated my media storage to a dedicated NAS over NFS. Suddenly, even simple streams would take an eternity to buffer.
The NFSv3 vs. NFSv4.2 Performance Gap
My homelab is wired for Gigabit, and I could verify near-wire speeds (900+ Mb/s) between the VM and the NAS using raw TCP tests. However, actual file transfers over the NFS mount were abysmal, topping out at around 100 Mb/s.
It turned out the client had quietly fallen back to NFSv3. While v3 is a classic, it lacks the modern optimizations in the v4.x spec. Forcing the client to NFSv4.2 and tweaking the mount options, specifically using nconnect to open multiple parallel TCP connections to the server, significantly improved both transfer speeds and Jellyfin loading times.
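If you want to confirm what the client actually negotiated before and after the change, the live mount options tell you (nfsstat ships with nfs-common on Ubuntu):

# Show the negotiated NFS version (look for vers=3 or vers=4.2) and mount options
nfsstat -m
# or pull the same info straight from the kernel
grep nfs /proc/mounts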
To force the upgrade, I updated my /etc/fstab with the following configuration:
# Example /etc/fstab entry for high-speed NFSv4.2
192.168.90.30:/export/Entertainment /data1 nfs4 rw,nconnect=8,rsize=524288,wsize=524288,hard,proto=tcp,noatime,_netdev 0 0
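A reboot will pick this up, but if nothing is using the share you can apply it in place, roughly like this:

# Remount /data1 so the new options take effect, then confirm vers=4.2
sudo umount /data1
sudo mount -a
nfsstat -m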
Validating the Throughput
To ensure the bottleneck wasn't the network itself, I used iperf3 to test the raw pipe between the VM and the NAS.
On the NAS (Server):
iperf3 -s
On the VM (Client):
iperf3 -c 192.168.90.30 -P 4
Note: The -P 4 flag uses parallel streams to better simulate the nconnect behavior.
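Since streaming is mostly the VM reading from the NAS, it's worth testing that direction too; iperf3's reverse flag covers it:

# Reverse mode: the NAS sends, the VM receives
iperf3 -c 192.168.90.30 -P 4 -R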
Once the network was cleared, I benchmarked the actual disk I/O over the NFS mount using dd. This confirmed that moving to v4.2 pushed me past that 100 Mb/s ceiling:
# Testing write speed
dd if=/dev/zero of=/data1/testfile bs=1M count=1024 conv=fdatasync status=progress
# Testing read speed
dd if=/data1/testfile of=/dev/null bs=1M status=progress
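One caveat with the read test: if the file you just wrote is still sitting in the client's page cache, you'll be measuring RAM rather than the NFS mount. Dropping the caches first keeps the numbers honest (requires root):

# Flush the page cache so the read actually traverses the NFS mount
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches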
Moving to the "Power Node" with Docker
While the NFS tuning helped, the lack of hardware transcoding was still a bottleneck. I decided to move Jellyfin into a Docker container on a more robust node. This new host runs an OpenMediaVault (OMV) NAS as a VM, but critically, the underlying storage is SSD-backed and the node has a dedicated GPU.
I set up a Docker-in-VM environment with PCIe GPU passthrough, making the NVIDIA hardware available to the containerized Jellyfin instance.
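Before wiring up the container, it's worth a quick sanity check that the passthrough actually worked, first on the VM and then through Docker. A rough sketch, assuming the NVIDIA Container Toolkit is installed (the CUDA image tag below is just an example):

# On the Docker host VM: the passed-through GPU should be listed
nvidia-smi
# Through Docker: the same GPU should be visible inside a container
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi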
The Missing Library & GPU Transcoding Fix
Initial attempts at hardware transcoding threw cryptic errors. Digging through the logs, I realized I’d made a few mistakes: I hadn't explicitly enabled the video capability in the Docker resources, and the transcoder was failing because it couldn't load nvcuvid.
Installing libnvidia-encode on the host VM and performing a clean reboot was the final piece of the puzzle.
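On Ubuntu the package name tracks the driver branch, so the exact version below is an assumption; match it to whatever nvidia-smi reports on the host:

# Userspace NVENC library for the installed driver branch (550 is an example)
sudo apt install libnvidia-encode-550
sudo reboot

With that in place, here is the docker-compose.yml that finally got everything humming: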
services:
  jellyfin:
    image: jellyfin/jellyfin
    container_name: jellyfin
    user: 1000:1000
    ports:
      - 8096:8096/tcp   # web UI
      - 7359:7359/udp   # client auto-discovery
    volumes:
      - /data1:/media           # media library mounted at /data1 on the host
      - jellyfin-cache:/cache
      - jellyfin-config:/config
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - compute
                - utility
                - video   # required for NVENC/NVDEC transcoding
    restart: unless-stopped

volumes:
  jellyfin-cache:
    external: true
  jellyfin-config:
    external: true
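Bringing the stack up and confirming the GPU is actually being used is a one-liner each (container name as defined above):

docker compose up -d
# While a stream is transcoding, ffmpeg should appear as a process in the GPU table
docker exec -it jellyfin nvidia-smi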
The Result
The difference is night and day. With the combination of NFSv4.2 throughput, SSD IOPS, and NVENC hardware acceleration, streams now load almost instantly. Even high-bitrate 4K files that require real-time transcoding to lower resolutions start playing without hesitation.
The loading spinner is officially a thing of the past.