This guide covers a basic way to run some simple benchmarks with vLLM on your local AI server. It assumes you have already followed THE PRIOR GUIDES to get to this point: you need a functional vLLM install to run these benchmarks against. You will be running two processes, not one. On Ubuntu you may need to prefix the apt commands with sudo. If you have not yet downloaded qwen3.5-27b as safetensors, this process will do that for you on the first run.
Log in to your vLLM server and install screen.
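On a Debian/Ubuntu-style system, installing screen might look like this (add sudo if you are not root):

```shell
# Assumes an apt-based distribution, per the Ubuntu note above
sudo apt update
sudo apt install -y screen
```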
Here are a few screen commands you will need:
-S <name> : create a new session named <name>
-list : list sessions (PID and name)
-r <name> : resume a detached session from the -list output
Ctrl+a, then d : detach from the session and drop back to your prior shell
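Put together, the commands above can be sketched as follows, using the session name from this guide:

```shell
screen -S vllm-runner   # create a new session named vllm-runner
screen -list            # list running sessions (PID and name)
screen -r vllm-runner   # resume a detached session
# Ctrl+a, then d: detach and return to your prior shell
```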
Create a new screen named vllm-runner.
You are now in that screen's shell, so your view will reset.
Enter your venv
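A typical activation; the path here is an assumption from the prior guides, so adjust it to wherever you created your venv:

```shell
# Path is a placeholder; change it to match your setup
source ~/vllm-venv/bin/activate
```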
Run your vLLM Server.
You will eventually be running two processes. This one will likely be the longer-lived of the two, and running it inside a screen lets you log out of your user session without stopping the process running in that screen, in this case vLLM. (You may need to retype the double dashes if they do not copy cleanly.)
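A minimal sketch of the serve command; the model identifier is a placeholder, and the host/port flags shown are common defaults rather than anything this guide mandates:

```shell
# <your-model> is a placeholder; substitute the model you downloaded
vllm serve <your-model> --host 0.0.0.0 --port 8000
```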
Watch the log output scroll by until it prints "INFO: Application startup complete."
Now press Ctrl+a, then d. This detaches you from the screen session back to your prior shell while leaving the vLLM runner up.
Activate your venv again in this shell. You do not need another screen for this step; the benchmark exits on its own after printing its stats.
Run your benchmark script to test your performance. Run it a second time and take that second reading as closer to what you can expect when batching against an already-active endpoint.
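One possible shape for that benchmark run, assuming the benchmark_serving.py script from the vLLM repository; the script from your prior guide, the venv path, and the flags shown here may all differ from your setup:

```shell
# Re-activate the venv in this shell first (path is an assumption)
source ~/vllm-venv/bin/activate

# Flags are illustrative; point --model at the model your server loaded
python benchmark_serving.py \
  --backend vllm \
  --host 127.0.0.1 --port 8000 \
  --model <your-model> \
  --num-prompts 100
```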