Self-Hosted LLMs on FABRIC

Overview

This artifact provides a Jupyter notebook and supporting scripts to deploy a self-hosted Large Language Model (LLM) inference stack on the FABRIC testbed. The stack consists of vLLM (high-throughput GPU inference engine), LiteLLM (OpenAI-compatible proxy with web UI and key management), and Nginx (reverse proxy with TLS), all deployed via Docker Compose.

Features

  • Automatic GPU detection — Discovers available GPU sites (Tesla T4, RTX 6000, A30, A40) and auto-selects a compatible model and vLLM configuration.
  • Two access modes — Public IP via FABNetv4Ext for external access, or SSH tunnel via FABNetv4 when public IPs are scarce.
  • OpenAI-compatible API — Works with any tool or library that supports the OpenAI API (Python SDK, curl, LangChain, etc.).
  • Secure by default — Generates unique authentication tokens (HF_TOKEN, LiteLLM master key, admin password) per deployment.
  • Docker IPv6 support — Enables IPv6 networking in Docker so containers can reach external resources (e.g., Hugging Face model downloads) even without a public IPv4 address.
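
Because the stack exposes an OpenAI-compatible API, any HTTP client can talk to it. The sketch below builds a chat-completions request using only the Python standard library; the IP address, key, and served model name are placeholders for the values the notebook prints after deployment.

```python
import json

API_BASE = "http://198.51.100.7"  # placeholder: node's public IP or tunnel endpoint
API_KEY = "sk-placeholder"        # placeholder: LiteLLM master key from the notebook

def chat_completion_request(prompt, model="microsoft/phi-2"):
    """Build (but do not send) an OpenAI-compatible /v1/chat/completions request."""
    return {
        "url": f"{API_BASE}/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = chat_completion_request("Say hello in one sentence.")
print(req["url"])  # http://198.51.100.7/v1/chat/completions
```

To actually send it, POST `body` to `url` with those headers (e.g. via `urllib.request` or the OpenAI Python SDK pointed at `base_url=f"{API_BASE}/v1"`).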

Included Components

  • self_hosted_llms.ipynb — End-to-end notebook: site selection, slice creation, networking, Docker/GPU setup, LLM stack deployment, usage examples, and cleanup
  • node_tools/enable_docker.sh — Installs Docker Engine with IPv6 support (Ubuntu 20–24, Rocky 8–9, Debian 11–12)
  • node_tools/enable_nvidia_docker.sh — Installs CUDA and nvidia-container-toolkit for GPU passthrough
  • node_tools/setup-netplan-multihomed.sh — Configures multihomed networking with policy-based routing for FABNetv4Ext

GPU & Model Compatibility

  • Tesla T4 / RTX 6000 — Turing (compute capability 7.5) — Microsoft Phi-2 2.7B (FP16)
  • A30 / A40 — Ampere (compute capability 8.0+) — GPT-OSS 20B (mxfp4)
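
The auto-selection rule above amounts to a mapping from CUDA compute capability to a model configuration. A minimal sketch of that logic follows; the Hugging Face model IDs and option names are illustrative assumptions, not necessarily the exact strings the notebook uses.

```python
def select_model(compute_capability):
    """Map a GPU's (major, minor) compute capability to a model choice.

    Mirrors the table above: Turing (7.5) gets Phi-2 in FP16; Ampere (8.0+)
    gets GPT-OSS 20B in mxfp4. Model IDs and quantization strings are
    illustrative assumptions.
    """
    major, minor = compute_capability
    if (major, minor) >= (8, 0):
        return {"model": "openai/gpt-oss-20b", "quantization": "mxfp4"}
    if (major, minor) >= (7, 5):
        return {"model": "microsoft/phi-2", "dtype": "float16"}
    raise ValueError(f"No supported model for compute capability {major}.{minor}")

print(select_model((7, 5))["model"])  # microsoft/phi-2
print(select_model((8, 6))["model"])  # openai/gpt-oss-20b
```

On the node itself, the (major, minor) pair could come from `torch.cuda.get_device_capability()` or `nvidia-smi --query-gpu=compute_cap`.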

Prerequisites

  • A FABRIC account with a valid project and an authentication token
  • Completion of the Configure Environment notebook
  • Familiarity with the Hello, FABRIC notebook

Architecture

  ┌─────────────────────────────────────────────────────┐
  │  FABRIC Site                                        │
  │  ┌───────────────────────────────────────────────┐  │
  │  │  Node: ai  (Ubuntu 24, GPU-equipped)          │  │
  │  │                                               │  │
  │  │  Docker Services:                             │  │
  │  │    Nginx (:80)  ──▶  LiteLLM  ──▶  vLLM       │  │
  │  │                                               │  │
  │  │  FABNetv4 (internal, always)                  │  │
  │  │  FABNetv4Ext (public IP, optional)            │  │
  │  └───────────────────────────────────────────────┘  │
  └─────────────────────────────────────────────────────┘
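
Clients always enter through Nginx on port 80; what differs between the two access modes is the address they use. The helper below sketches the resulting API base URLs (the local tunnel port is an illustrative assumption; in tunnel mode you would first forward it to the node's FABNetv4 address over SSH).

```python
def api_base_url(mode, public_ip=None, tunnel_port=8080):
    """Return the OpenAI-compatible base URL for each access mode.

    'public' -> FABNetv4Ext: reach Nginx directly on the node's public IP.
    'tunnel' -> FABNetv4: reach a local SSH tunnel forwarding to Nginx :80.
    The tunnel port is an illustrative assumption.
    """
    if mode == "public":
        if public_ip is None:
            raise ValueError("public mode needs the node's public IP")
        return f"http://{public_ip}/v1"
    if mode == "tunnel":
        return f"http://localhost:{tunnel_port}/v1"
    raise ValueError(f"unknown access mode: {mode!r}")

print(api_base_url("public", public_ip="198.51.100.7"))  # http://198.51.100.7/v1
print(api_base_url("tunnel"))                            # http://localhost:8080/v1
```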
  

Artifact Metadata

Version 1 (created March 3, 2026)
URN: urn:fabric:contents:renci:e5a69df4-d6bc-4497-869c-cc3aa59d015c
Last updated: March 16, 2026
University of North Carolina at Chapel Hill — kthare10@email.unc.edu