Self-Hosted LLMs on FABRIC

Overview

This artifact provides a Jupyter notebook and supporting scripts to deploy a self-hosted Large Language Model (LLM) inference stack on the FABRIC testbed. The stack consists of vLLM (high-throughput GPU inference engine), LiteLLM (OpenAI-compatible proxy with web UI and key management), and Nginx (reverse proxy with TLS), all deployed via Docker Compose.

Features

  • Automatic GPU detection — Discovers available GPU sites (Tesla T4, RTX 6000, A30, A40) and auto-selects a compatible model and vLLM configuration.
  • Two access modes — Public IP via FABNetv4Ext for external access, or SSH tunnel via FABNetv4 when public IPs are scarce.
  • OpenAI-compatible API — Works with any tool or library that supports the OpenAI API (Python SDK, curl, LangChain, etc.).
  • Secure by default — Generates unique authentication tokens (HF_TOKEN, LiteLLM master key, admin password) per deployment.
  • Docker IPv6 support — Enables IPv6 networking in Docker so containers can reach external resources (e.g., Hugging Face model downloads) even without a public IPv4 address.
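
Because the stack exposes an OpenAI-compatible API, any HTTP client can talk to it. The sketch below builds a chat-completions request using only the Python standard library; the IP address, key, and served model name are placeholders for the values the notebook prints after deployment.

```python
import json

API_BASE = "http://198.51.100.7"  # placeholder: node's public IP or tunnel endpoint
API_KEY = "sk-placeholder"        # placeholder: LiteLLM master key from the notebook

def chat_completion_request(prompt, model="microsoft/phi-2"):
    """Build (but do not send) an OpenAI-compatible /v1/chat/completions request."""
    return {
        "url": f"{API_BASE}/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = chat_completion_request("Say hello in one sentence.")
print(req["url"])  # http://198.51.100.7/v1/chat/completions
```

To actually send it, POST `body` to `url` with those headers (e.g. via `urllib.request` or the OpenAI Python SDK pointed at `base_url=f"{API_BASE}/v1"`).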

Included Components

  • self_hosted_llms.ipynb — End-to-end notebook: site selection, slice creation, networking, Docker/GPU setup, LLM stack deployment, usage examples, and cleanup
  • node_tools/enable_docker.sh — Installs Docker Engine with IPv6 support (Ubuntu 20–24, Rocky 8–9, Debian 11–12)
  • node_tools/enable_nvidia_docker.sh — Installs CUDA and nvidia-container-toolkit for GPU passthrough
  • node_tools/setup-netplan-multihomed.sh — Configures multihomed networking with policy-based routing for FABNetv4Ext

GPU & Model Compatibility

  • Tesla T4 / RTX 6000 — Turing (compute capability 7.5) — Microsoft Phi-2 2.7B (FP16)
  • A30 / A40 — Ampere (compute capability 8.0+) — GPT-OSS 20B (mxfp4)
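
The auto-selection rule above amounts to a mapping from CUDA compute capability to a model configuration. A minimal sketch of that logic follows; the Hugging Face model IDs and option names are illustrative assumptions, not necessarily the exact strings the notebook uses.

```python
def select_model(compute_capability):
    """Map a GPU's (major, minor) compute capability to a model choice.

    Mirrors the table above: Turing (7.5) gets Phi-2 in FP16; Ampere (8.0+)
    gets GPT-OSS 20B in mxfp4. Model IDs and quantization strings are
    illustrative assumptions.
    """
    major, minor = compute_capability
    if (major, minor) >= (8, 0):
        return {"model": "openai/gpt-oss-20b", "quantization": "mxfp4"}
    if (major, minor) >= (7, 5):
        return {"model": "microsoft/phi-2", "dtype": "float16"}
    raise ValueError(f"No supported model for compute capability {major}.{minor}")

print(select_model((7, 5))["model"])  # microsoft/phi-2
print(select_model((8, 6))["model"])  # openai/gpt-oss-20b
```

On the node itself, the (major, minor) pair could come from `torch.cuda.get_device_capability()` or `nvidia-smi --query-gpu=compute_cap`.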

Prerequisites

  • A FABRIC account with a valid project and an authentication token
  • Completion of the Configure Environment notebook
  • Familiarity with the Hello, FABRIC notebook

Architecture

  ┌─────────────────────────────────────────────────────┐
  │  FABRIC Site                                        │
  │  ┌───────────────────────────────────────────────┐  │
  │  │  Node: ai  (Ubuntu 24, GPU-equipped)          │  │
  │  │                                               │  │
  │  │  Docker Services:                             │  │
  │  │    Nginx (:80)  ──▶  LiteLLM  ──▶  vLLM       │  │
  │  │                                               │  │
  │  │  FABNetv4 (internal, always)                  │  │
  │  │  FABNetv4Ext (public IP, optional)            │  │
  │  └───────────────────────────────────────────────┘  │
  └─────────────────────────────────────────────────────┘
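
Clients always enter through Nginx on port 80; what differs between the two access modes is the address they use. The helper below sketches the resulting API base URLs (the local tunnel port is an illustrative assumption; in tunnel mode you would first forward it to the node's FABNetv4 address over SSH).

```python
def api_base_url(mode, public_ip=None, tunnel_port=8080):
    """Return the OpenAI-compatible base URL for each access mode.

    'public' -> FABNetv4Ext: reach Nginx directly on the node's public IP.
    'tunnel' -> FABNetv4: reach a local SSH tunnel forwarding to Nginx :80.
    The tunnel port is an illustrative assumption.
    """
    if mode == "public":
        if public_ip is None:
            raise ValueError("public mode needs the node's public IP")
        return f"http://{public_ip}/v1"
    if mode == "tunnel":
        return f"http://localhost:{tunnel_port}/v1"
    raise ValueError(f"unknown access mode: {mode!r}")

print(api_base_url("public", public_ip="198.51.100.7"))  # http://198.51.100.7/v1
print(api_base_url("tunnel"))                            # http://localhost:8080/v1
```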
  

Artifact Metadata

Version 1 (created March 3, 2026)
URN: urn:fabric:contents:renci:e5a69df4-d6bc-4497-869c-cc3aa59d015c
Last updated: March 16, 2026
University of North Carolina at Chapel Hill — kthare10@email.unc.edu