Self-Hosted LLMs on FABRIC
Overview
This artifact provides a Jupyter notebook and supporting scripts to deploy a self-hosted Large Language Model (LLM) inference stack on the FABRIC testbed. The stack consists of vLLM (high-throughput GPU inference engine), LiteLLM (OpenAI-compatible proxy with web UI and key management), and Nginx (reverse proxy with TLS), all deployed via Docker Compose.
Features
- Automatic GPU detection — Discovers available GPU sites (Tesla T4, RTX 6000, A30, A40) and auto-selects a compatible model and vLLM configuration.
- Two access modes — Public IP via FABNetv4Ext for external access, or SSH tunnel via FABNetv4 when public IPs are scarce.
- OpenAI-compatible API — Works with any tool or library that supports the OpenAI API (Python SDK, curl, LangChain, etc.).
- Secure by default — Generates unique authentication tokens (HF_TOKEN, LiteLLM master key, admin password) per deployment.
- Docker IPv6 support — Enables IPv6 networking in Docker so containers can reach external resources (e.g., Hugging Face model downloads) even without a public IPv4 address.
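Because the proxy speaks the OpenAI API, any OpenAI client can talk to it. As a minimal illustration, the sketch below builds an OpenAI-style chat-completions request using only the standard library; the host address, API key, and model name are placeholders, not values from this artifact — substitute the endpoint and LiteLLM master key that the notebook prints for your deployment.

```python
import json

# Build an OpenAI-style chat-completions request for the self-hosted proxy.
# Host, key, and model below are placeholders -- use the values the
# notebook prints for your own slice.
def build_chat_request(base_url, api_key, model, prompt):
    url = f"{base_url}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_chat_request(
    "http://203.0.113.10",        # placeholder public IP or tunnel endpoint
    "sk-litellm-master-key",      # placeholder LiteLLM key
    "microsoft/phi-2",
    "Hello!",
)
```

The resulting tuple can be sent with `urllib.request`, `requests`, or `curl`; the same payload shape works against any OpenAI-compatible endpoint, which is exactly what makes the stack usable from the Python SDK, LangChain, and similar tools.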
Included Components
| Component | Description |
|---|---|
| `self_hosted_llms.ipynb` | End-to-end notebook: site selection, slice creation, networking, Docker/GPU setup, LLM stack deployment, usage examples, and cleanup |
| `node_tools/enable_docker.sh` | Installs Docker Engine with IPv6 support (Ubuntu 20–24, Rocky 8–9, Debian 11–12) |
| `node_tools/enable_nvidia_docker.sh` | Installs CUDA and nvidia-container-toolkit for GPU passthrough |
| `node_tools/setup-netplan-multihomed.sh` | Configures multihomed networking with policy-based routing for FABNetv4Ext |
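The three shell scripts are run on the node in a natural order: Docker first, then GPU support, then the optional public-IP routing. The sketch below only encodes that ordering as a command list a notebook cell might hand to the node; the exact invocation and flags in `self_hosted_llms.ipynb` may differ.

```python
# Hypothetical ordering of the node_tools setup steps, inferred from the
# component table above; the notebook's actual cells may differ.
setup_commands = [
    "chmod +x node_tools/*.sh",
    "sudo node_tools/enable_docker.sh",             # Docker Engine + IPv6
    "sudo node_tools/enable_nvidia_docker.sh",      # CUDA + container toolkit
    "sudo node_tools/setup-netplan-multihomed.sh",  # FABNetv4Ext routing
]

def as_shell_script(commands):
    """Join the steps into one fail-fast script body, e.g. for node.execute()."""
    return "\n".join(["#!/bin/bash", "set -e", *commands])

script = as_shell_script(setup_commands)
```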
GPU & Model Compatibility
| GPU | Architecture | Auto-Selected Model |
|---|---|---|
| Tesla T4 / RTX 6000 | Turing (compute cap 7.5) | Microsoft Phi-2 2.7B (FP16) |
| A30 / A40 | Ampere (compute cap 8.0+) | GPT-OSS 20B (mxfp4) |
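The compatibility table amounts to a small lookup: Turing GPUs (compute capability 7.5) fall back to an FP16 model, while Ampere-class GPUs (8.0+) can run the larger mxfp4-quantized model. The sketch below only encodes the table; the model identifiers and the A40's exact compute capability are assumptions, and the notebook's real selection logic may differ.

```python
# Sketch of GPU -> model auto-selection based on the compatibility table.
# Compute capabilities: table states 7.5 for Turing and "8.0+" for Ampere;
# the A40's 8.6 is an assumption, as are the exact model identifiers.
GPU_CATALOG = {
    "Tesla T4": 7.5,
    "RTX 6000": 7.5,
    "A30": 8.0,
    "A40": 8.6,
}

def select_model(gpu_name):
    cc = GPU_CATALOG[gpu_name]
    if cc >= 8.0:
        # Ampere-class GPUs get the larger mxfp4-quantized model
        return ("gpt-oss-20b", "mxfp4")
    # Turing (7.5) falls back to the smaller FP16 model
    return ("microsoft/phi-2", "fp16")
```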
Prerequisites
- FABRIC account with a valid project and authentication token
- Completion of the Configure Environment notebook
- Familiarity with the Hello, FABRIC notebook
Architecture
┌─────────────────────────────────────────────────────┐
│ FABRIC Site                                         │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │ Node: ai (Ubuntu 24, GPU-equipped)            │  │
│  │                                               │  │
│  │ Docker Services:                              │  │
│  │   vLLM ──▶ LiteLLM ──▶ Nginx (:80)            │  │
│  │                                               │  │
│  │ FABNetv4    (internal, always)                │  │
│  │ FABNetv4Ext (public IP, optional)             │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
Last Updated
March 16, 2026, 8:09 p.m.
Versions
| Version | Created | URN | Downloads |
|---|---|---|---|
| 2026-03-03 | March 3, 2026, 5:15 p.m. | urn:fabric:contents:renci:e5a69df4-d6bc-4497-869c-cc3aa59d015c | 2 |
Authors
University of North Carolina at Chapel Hill
— kthare10@email.unc.edu