Edge AI isn’t optional anymore. In 2026, running inference locally cuts latency from seconds to milliseconds, slashes cloud costs by 60-80%, and keeps sensitive data where it belongs. But standard Kubernetes is a 1GB+ behemoth that chokes on edge hardware. K3s fixes this. At less than 100MB, it brings full orchestration to NVIDIA Jetson devices and edge servers, turning them into production-ready AI clusters. Here’s the complete setup: from bare Jetson Nano to serving real-time object detection in under 30 minutes.
What You’ll Need
Hardware-wise, any NVIDIA Jetson works: Nano ($99), Xavier NX ($399), or Orin ($899+). Already running a GPU server? That works too. Software: Ubuntu 20.04+ and NVIDIA drivers (JetPack handles this on Jetson devices). You’ll need basic Kubernetes knowledge – pods, deployments, kubectl commands. If you’ve run Docker, you’re ready.
Install NVIDIA Container Toolkit First
This is critical. The NVIDIA Container Toolkit lets containers access your GPU. Install it BEFORE K3s – K3s will auto-detect it and configure GPU support automatically. Skip this order and you’ll fight manual configuration hell.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
(The old nvidia-docker apt repository and apt-key are deprecated – the libnvidia-container repo above is the current source. And there's no Docker to restart: K3s brings its own containerd.)
Verify with nvidia-smi. You should see your GPU, driver version, and CUDA version. No output? Fix your drivers before continuing.
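If you'd rather script that check than eyeball the table, nvidia-smi's query flags emit parseable CSV. The check_gpu helper below is our own sketch, not NVIDIA tooling – only the --query-gpu/--format flags are standard:

```shell
# check_gpu: read one CSV line of GPU info from stdin; fail if there is none.
check_gpu() {
  line=""
  IFS= read -r line
  if [ -z "$line" ]; then
    echo "no GPU visible" >&2
    return 1
  fi
  echo "GPU OK: $line"
}

# Real usage on the host (requires working drivers):
#   nvidia-smi --query-gpu=name,driver_version,memory.total \
#     --format=csv,noheader | check_gpu

# Offline demo with a mocked CSV line:
echo "Orin, 540.4.0, 32768 MiB" | check_gpu
```

Wire the real pipeline into provisioning scripts so a bad driver install fails loudly instead of surfacing later as unschedulable pods.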
Install K3s with GPU Auto-Detection
K3s installation is embarrassingly simple – one command. Since we installed Container Toolkit first, K3s automatically configures containerd for GPU access. No flags needed.
curl -sfL https://get.k3s.io | sh -
Wait 30 seconds. Check cluster status:
sudo kubectl get nodes
Your node should show Ready. K3s is running. But Kubernetes doesn’t know about your GPU yet – that’s next.
Configure GPU Runtime Access
Containerd needs explicit configuration to expose the NVIDIA runtime to pods. One caveat: K3s runs its own embedded containerd, and recent releases detect nvidia-container-runtime and register the runtime automatically – check /var/lib/rancher/k3s/agent/etc/containerd/config.toml for an nvidia entry first. If it's not there, create /etc/containerd/config.d/99-nvidia.toml:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
SystemdCgroup = true
Restart K3s to apply:
sudo systemctl restart k3s
The runtime handler is now registered as nvidia. Note that the containerd config alone doesn't create the Kubernetes-side RuntimeClass object – that takes a separate one-line manifest. Pods requesting the nvidia RuntimeClass get GPU access. Clean abstraction – GPU scheduling doesn’t pollute your pod specs.
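The RuntimeClass itself is stock Kubernetes API, nothing K3s-specific – create it with a manifest like this:

```yaml
# RuntimeClass mapping the name pods reference to the containerd handler
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia   # must match the runtime name registered in containerd
```

Save it as runtimeclass.yaml and apply with kubectl apply -f runtimeclass.yaml.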
Deploy NVIDIA Device Plugin
The NVIDIA device plugin advertises GPUs as schedulable Kubernetes resources. Without it, kubectl has no idea your node has a GPU.
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.13.0/nvidia-device-plugin.yml
Verify GPU nodes:
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
You should see 1 (or more) under the GPU column. If it’s blank, check device plugin logs: kubectl logs -n kube-system -l name=nvidia-device-plugin-ds.
Run Your First GPU Workload
Time to test. Create gpu-test.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  runtimeClassName: nvidia
  restartPolicy: Never   # one-shot command; don't restart after it exits
  containers:
  - name: cuda
    image: nvidia/cuda:11.8.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
Deploy and check logs:
kubectl apply -f gpu-test.yaml
sleep 5
kubectl logs gpu-test
You should see nvidia-smi output showing your GPU from inside the container. GPU model, driver version, CUDA version – all visible. That’s your edge cluster running GPU-accelerated workloads.
Deploy Real AI Inference
Hello world is done. Let’s run actual AI. Here’s a TensorFlow Serving deployment requesting GPU access:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-inference
  template:
    metadata:
      labels:
        app: tf-inference
    spec:
      runtimeClassName: nvidia
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest-gpu
        command: ["python", "-c", "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"]
        resources:
          limits:
            nvidia.com/gpu: 1
Deploy it:
kubectl apply -f tf-inference.yaml
kubectl logs -f deployment/tf-inference
TensorFlow should detect your GPU and print the physical device info. Note that this one-shot command exits immediately, so the Deployment will restart it in a loop – fine for a smoke test. For real use, swap the Python command for a long-running server (TensorFlow Serving, Triton, custom FastAPI) and you’ve got production inference at the edge.
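Once a real server is running, expose it inside the cluster with a Service. This sketch assumes TensorFlow Serving's default REST port, 8501, and reuses the app: tf-inference label from the Deployment above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tf-inference
spec:
  selector:
    app: tf-inference    # routes to the Deployment's pods
  ports:
  - name: rest
    port: 8501           # TF Serving's default REST API port
    targetPort: 8501
```

In-cluster clients can then POST predict requests to http://tf-inference:8501 – the exact path depends on your serving stack and model name.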
When Things Break: Troubleshooting Checklist
GPU not detected by Kubernetes?
- Run nvidia-smi on the host. No output = driver issue, not Kubernetes.
- Check Container Toolkit: dpkg -l | grep nvidia-container-toolkit. Not installed? Go back to step one.
- Verify the device plugin is running: kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds.
- Check node GPU capacity: kubectl describe node | grep nvidia.com/gpu. Should show allocatable GPUs.
Pod can’t access GPU?
- Add runtimeClassName: nvidia to your pod spec. Missing this is the #1 cause.
- Specify a resource limit: nvidia.com/gpu: 1 under resources.limits.
- Check GPU availability: kubectl describe node shows allocated vs. available. Another pod hogging it?
CUDA version mismatch errors?
Your NVIDIA driver supports specific CUDA versions. Check the NVIDIA CUDA compatibility matrix. Upgrade drivers or downgrade your container image CUDA version.
Production Hardening: What’s Next
You’ve got GPU workloads running. To productionize:
Monitoring: Deploy Prometheus with DCGM Exporter to track GPU utilization, temperature, memory. Edge clusters running blind fail silently.
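As a sketch, a static Prometheus scrape job for DCGM Exporter might look like the fragment below. Port 9400 is the exporter's default; the target hostname is a placeholder for your node's address, and in a real cluster you'd likely use Kubernetes service discovery instead of static targets:

```yaml
scrape_configs:
- job_name: dcgm
  static_configs:
  - targets: ['edge-node-1:9400']   # replace with your node's address
```

Key metrics to alert on: DCGM_FI_DEV_GPU_UTIL for utilization and DCGM_FI_DEV_GPU_TEMP for temperature – passively cooled Jetsons throttle hard.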
Multi-node clusters: Add worker nodes with curl -sfL https://get.k3s.io | K3S_URL=https://server-ip:6443 K3S_TOKEN=your-token sh - (the join token lives at /var/lib/rancher/k3s/server/node-token on the server). High availability matters at the edge – hardware fails.
Model serving: Integrate Triton Inference Server for production-grade multi-model serving with dynamic batching and model versioning.
Security: Apply network policies. Edge deployments are exposed – default-allow is a liability. Use pod security standards (restricted profile minimum).
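A minimal starting point is a default-deny ingress policy, then allowlist only the traffic your inference clients need. This is stock Kubernetes NetworkPolicy, and K3s ships a network policy controller that enforces it:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}   # applies to every pod in the namespace
  policyTypes:
  - Ingress         # no ingress rules listed = all inbound traffic denied
```

Apply it per namespace, then layer on narrowly scoped allow rules for your serving ports.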
K3s isn’t a lightweight Kubernetes – it’s a better Kubernetes for edge. Full orchestration, 1% of the bloat. GPU acceleration isn’t optional in 2026, and neither is running it where your data lives. This setup moves you from cloud-dependent to edge-capable in 30 minutes. The rest is just scale.

