vLLM Benchmarking Basics: Local AI Homelab Series


This guide walks through some basic benchmarks for vLLM on your local AI server. It assumes you have already followed the prior guides in this series and have a functional vLLM install to run these benchmarks against. You will be running two processes, not one. If you are running Ubuntu you may need to put sudo in front of the apt commands. If you have not yet downloaded Qwen3.5-27B as safetensors, the first run of the serve command will download it for you.

Log in to your vLLM server and install screen.

apt update && apt install screen -y

Here are a few screen commands you will need.

screen -S <name> : Create a new named screen session

screen -list : List session PIDs and names

screen -r <name or pid> : Resume a detached session (find it with -list)

Ctrl+a, then d : Detach from the screen session and drop back to your prior shell

Create a new screen named vllm-runner:

screen -S vllm-runner

You are now in this screen's shell, so you will see your view reset.

Enter your venv (adjust the path to match where you created it in the prior guides):

source /path/to/your/venv/bin/activate

Run your vLLM Server.

You will eventually be running two processes. This one is the longer-lived of the two, and running it in a screen lets you log out of your user session without stopping the process inside the screen, in this case vLLM. (If you copy and paste the command, double-check the double dashes; some sites mangle them into en dashes.)

vllm serve Qwen/Qwen3.5-27B --served-model-name qwen35-27b --port 9876 --gpu-memory-utilization 0.92 --enable-chunked-prefill --max-num-seqs 128 --block-size 64 --tensor-parallel-size 4 --disable-custom-all-reduce
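If you are curious how those flags carve up VRAM, here is a rough back-of-the-envelope sketch. The per-GPU memory (24 GB) and FP16/BF16 weights are assumptions for illustration only; substitute your own card's numbers.

```python
# Rough sketch of how the serve flags above divide GPU memory.
# The 24 GB card and 2-byte (FP16/BF16) weights are ASSUMPTIONS for
# illustration; plug in your own hardware's values.

GPU_MEM_GB = 24            # per-GPU VRAM (assumption)
UTILIZATION = 0.92         # --gpu-memory-utilization
TENSOR_PARALLEL = 4        # --tensor-parallel-size
PARAMS_B = 27              # parameter count in billions, from the model name
BYTES_PER_PARAM = 2        # FP16/BF16: ~2 GB per billion parameters

usable_per_gpu = GPU_MEM_GB * UTILIZATION           # vLLM will claim this much
weights_total = PARAMS_B * BYTES_PER_PARAM          # total weight size in GB
weights_per_gpu = weights_total / TENSOR_PARALLEL   # weights shard across GPUs
kv_cache_budget = usable_per_gpu - weights_per_gpu  # roughly what is left for KV cache

print(f"usable per GPU:  {usable_per_gpu:.2f} GB")
print(f"weights per GPU: {weights_per_gpu:.2f} GB")
print(f"KV cache budget: {kv_cache_budget:.2f} GB")
```

This is only a first-order estimate; activations, CUDA graphs, and framework overhead also eat into the budget, but it explains why 0.92 utilization with 4-way tensor parallel leaves headroom for a model of this size.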

Watch the log output until it prints “INFO:     Application startup complete.”

Now press Ctrl+a, then d. This detaches you from the screen session and drops you back to your prior shell, leaving the vLLM runner up.
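Before benchmarking, you can sanity-check that the endpoint is actually serving. Here is a minimal Python sketch; the host, port, and served model name mirror the commands in this guide and are assumptions you should adjust to your own server.

```python
import json
import urllib.request

# Host/port and model name come from the serve command in this guide;
# adjust BASE_URL to your own server's address.
BASE_URL = "http://192.168.1.88:9876"
MODEL = "qwen35-27b"   # must match --served-model-name

def build_payload(prompt: str, max_tokens: int = 32) -> dict:
    """Build a request body for the OpenAI-compatible /v1/completions route."""
    return {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}

def send(prompt: str) -> str:
    """POST a completion request and return the generated text."""
    req = urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# Show the payload we would send (call send(...) yourself once the server is up).
print(json.dumps(build_payload("Hello, my name is"), indent=2))
```

If the server responds with a completion, the benchmark below should run cleanly against the same URL.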

Back in your original shell, activate your venv again. You do not need another screen for the benchmark; it exits on its own after printing its stats.

Run the benchmark to test your performance. Run it a second time and take that second reading; it is closer to what you can expect when batching against an already warmed-up endpoint.

vllm bench serve \
  --backend vllm \
  --base-url http://192.168.1.88:9876 \
  --endpoint /v1/completions \
  --model qwen35-27b \
  --tokenizer Qwen/Qwen3.5-27B \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 512 \
  --num-prompts 50 \
  --request-rate inf
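To make sense of the headline throughput number the benchmark prints, here is the arithmetic behind it. The wall-clock duration below is a made-up example; plug in the benchmark duration your run actually reports.

```python
# How the benchmark flags above translate into token counts.
# duration_s is a HYPOTHETICAL example value; replace it with the
# "Benchmark duration" your own run prints.

NUM_PROMPTS = 50          # --num-prompts
INPUT_LEN = 1024          # --random-input-len
OUTPUT_LEN = 512          # --random-output-len
duration_s = 40.0         # hypothetical wall-clock time in seconds

total_output_tokens = NUM_PROMPTS * OUTPUT_LEN           # tokens generated
total_tokens = NUM_PROMPTS * (INPUT_LEN + OUTPUT_LEN)    # generated + prefilled
output_tok_per_s = total_output_tokens / duration_s
total_tok_per_s = total_tokens / duration_s

print(f"total output tokens: {total_output_tokens}")
print(f"output tokens/s:     {output_tok_per_s:.1f}")
print(f"total tokens/s:      {total_tok_per_s:.1f}")
```

With --request-rate inf all 50 prompts are fired at once, so this run measures fully batched throughput rather than latency under a trickle of requests.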