ITFROMZERO - Share to be shared!

vLLM

Posted in AI

Guide to Deploying LLM Inference with vLLM on Linux: Boosting Throughput and Saving VRAM

Posted by admin, March 16, 2026
This article shares practical experience deploying LLM inference with vLLM on Linux to boost throughput and save VRAM. It walks through installation, configuration of the key parameters, and performance testing and monitoring, helping you optimize your system.
Read More
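As a taste of what the guide covers, a minimal launch sketch, assuming vLLM is already installed (`pip install vllm`): the model name is an illustrative assumption, and the flags shown are the main throughput/VRAM knobs vLLM exposes.

```shell
# Launch an OpenAI-compatible vLLM server (model name is illustrative).
# --gpu-memory-utilization: fraction of VRAM vLLM may claim for weights + KV cache
# --max-model-len: cap the context length to shrink the KV-cache reservation
# --max-num-seqs: maximum number of sequences batched concurrently per step
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192 \
  --max-num-seqs 128 \
  --port 8000
```

Lowering `--max-model-len` is often the quickest VRAM win, since the KV cache is sized from it; raising `--max-num-seqs` trades per-request latency for aggregate throughput.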
Posted in AI

Deploying AI Models on Your Own Server: Self-Hosting to Protect Sensitive Data

Posted by admin, March 7, 2026
A guide to self-hosting AI models (llama.cpp, vLLM) on your own server to protect sensitive data and avoid legal risks associated with cloud AI. Covers security configuration with Nginx reverse proxy, firewall rules, Docker Compose, and Python integration.
Read More
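The Python integration the article describes can be sketched as a small client for a self-hosted, OpenAI-compatible endpoint (vLLM and llama.cpp's server both expose one). This is a minimal sketch using only the standard library; the base URL and model name are illustrative assumptions, not values from the article.

```python
import json
import urllib.request

# Illustrative endpoint behind an Nginx reverse proxy (assumption).
BASE_URL = "https://llm.internal.example/v1"

def build_chat_request(prompt: str, model: str = "local-model",
                       max_tokens: int = 256) -> dict:
    """Build the JSON body for a POST to /chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send_chat_request(prompt: str) -> dict:
    """POST the request to the self-hosted endpoint and return the reply."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Sensitive data never leaves your infrastructure:
    print(build_chat_request("Summarize our internal incident report."))
```

Because the endpoint speaks the OpenAI wire format, existing client code can usually be pointed at the self-hosted server by changing only the base URL.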
Copyright 2026 — ITFROMZERO. All rights reserved.
Privacy Policy | Terms of Service | Contact: [email protected]