


10/19/2025 – Changed PSU from Corsair HX1500i to ASROCK RACK 1600
10/15/2024 – Added performance table for a few LLMs from the DGX comparison video
Server (AMD EPYC) Build Benchmarks
Okay, I am not going to make you read through tables to get to the bottom line. Here is a nice chart outlining this competition, broken down by prompt processing and token generation.
Ollama vs Llama.cpp – Prompt Processing
Ollama vs Llama.cpp – Token Generation
The base of the server variant of this AI rig is what I used for the $2000 DeepSeek R1 671b video, and at 4 tokens per second, it was surprising. I have since been able to tune the pure CPU-only performance of DeepSeek R1 671b to 6 TPS on the same rig, which is somewhat exciting. Joining the Quad 3090 build fold of awesomeness is also my new favorite budget flavor: the Ryzen 5 9600X based B650 setup! It is cheaper, like a lot cheaper. RAM prices have really gotten crazy.
Parts and Price Details
PRICE CATEGORY: $3,892 (Ryzen) to $5,175 (EPYC)
GPU VRAM: 96 GB (4x 24 GB)
PRICE/GB/VRAM: $40.54 to $53.91
Ryzen 9600X Quad 3090 Build Video: https://www.youtube.com/watch?v=So7tqRSZ0s8
EPYC 7702 Quad 3090 1 Yr Review Video: https://www.youtube.com/watch?v=gNexETeCLko
GPU ALTERNATIVES: Very many, actually, in the EPYC variant. You can even hook up 2 extra GPUs if your power envelope allows, or use nvidia-smi to set your GPUs' wattage maximums per the Ollama setup guide's crontab. You would ideally need around a 2 kW PSU or a dual-PSU setup.
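For reference, capping GPU power with nvidia-smi looks like the snippet below. This is an illustrative config fragment, not the exact crontab from the Ollama setup guide; 250 W is an example value, so check your card's supported range first with `nvidia-smi -q -d POWER`.

```shell
# Example crontab entry (edit with `crontab -e` as root):
# enable persistence mode, then cap ALL GPUs to 250 W at every boot.
# Add `-i 0` (etc.) to target a single GPU instead of all of them.
@reboot /usr/bin/nvidia-smi -pm 1 && /usr/bin/nvidia-smi -pl 250
```

Power limits reset on reboot, which is why the guide puts this in a crontab rather than running it once by hand.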
There are two versions of this build you can run, and the choice mostly comes down to use case. If your use case is primarily inference for models that fit fully in VRAM, rather than CPU-offloaded huge LLMs, you can save a LOT: like $1,300 off the server package. The Ryzen build also appears somewhat faster, around 10% on token generation and 12% on prompt processing, but that needs further research. Here are the components that differ between the two. (Prices are approximate)
Ryzen Quad GPU Build Parts
4x PCIe 3.0 risers (just get high-quality risers)
Refer to the video for a full overview of how this impacts the use cases for the rig.
EPYC Quad GPU Build Parts
The current price delta at the time of writing is $1,283.00, which is pretty significant. That lowers roughly $5.2K to $3.9K, given the same GPUs/PSU/frame/storage are used. The GPUs are the most notable of those shared common components.
Shared Common Components
Total Cost of Ownership
This leads to a total system cost for the Ryzen alternative build of $620 + $3,272 = $3,892
The total system cost for the AMD EPYC variation is $1,903 + $3,272 = $5,175
That is a total system cost difference in price of $5,175 – $3,892 = $1,283
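As a quick sanity check of the arithmetic above, the totals and the resulting price per GB of VRAM can be recomputed from the subtotals quoted in this post:

```python
# Figures taken straight from the post; "shared" covers GPUs/PSU/frame/storage.
shared = 3272          # common components, including the 4x RTX 3090s
ryzen_unique = 620     # Ryzen 9600X / B650 specific parts
epyc_unique = 1903     # EPYC 7702 / server board specific parts

ryzen_total = ryzen_unique + shared   # 3892
epyc_total = epyc_unique + shared     # 5175
delta = epyc_total - ryzen_total      # 1283

vram_gb = 96  # 4x 24 GB RTX 3090
print(ryzen_total, epyc_total, delta)
print(round(ryzen_total / vram_gb, 2), "to",
      round(epyc_total / vram_gb, 2), "per GB of VRAM")
```

That works out to roughly $40.54/GB for the Ryzen build and $53.91/GB for the EPYC build.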
Here is the first video I did after I built the original Quad 3090 rig. It has a lot of assembly alpha you need if you want to do a quad 3090 build. Many of the learnings in it also apply to other open-air rig frame builds.
OG Server Video – https://www.youtube.com/watch?v=JN4EhaM7vyw
AI Server Tips and Tricks
GPU Rack Modified Mounting Guide
Running an LLM locally on, say, Ollama or llama.cpp does not require massive GPU/RAM PCIe bandwidth. A single lane at Gen 3 speeds can be fine for chat-style inference. If, however, you want to train, fine-tune, or generate images or video, you do need as much PCIe bandwidth as you can get, which makes a server-grade motherboard a good option. You can run one to six GPUs at full PCIe Gen 4 x16 bandwidth with something like this AMD EPYC. Basically, for those workloads, multiple GPUs running at whatever the latest and greatest PCIe gen is would be ideal.
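A back-of-the-envelope calculation shows why the narrow link survives chat inference but hurts training. The throughput numbers below are approximate effective rates (not exact spec figures), and the per-step traffic is an illustrative assumption:

```python
# Rough effective throughputs in GB/s (approximate, not spec-exact):
GEN3_X1 = 0.985    # PCIe 3.0 x1, i.e. a cheap mining-style riser
GEN4_X16 = 31.5    # PCIe 4.0 x16, i.e. a server-grade slot

def transfer_seconds(gigabytes: float, link_gbps: float) -> float:
    """Time to move `gigabytes` across a link of `link_gbps` GB/s."""
    return gigabytes / link_gbps

# Chat inference: weights cross the bus ONCE at model load, then the KV
# cache and activations stay resident on the GPU.
weights_gb = 24.0  # e.g. a quantized model filling one RTX 3090
print(f"one-time load, Gen3 x1:  {transfer_seconds(weights_gb, GEN3_X1):.1f} s")
print(f"one-time load, Gen4 x16: {transfer_seconds(weights_gb, GEN4_X16):.2f} s")

# Training/fine-tuning: gradients and activations cross the bus EVERY step,
# so the slow link is paid thousands of times instead of once.
per_step_gb = 2.0  # illustrative inter-GPU traffic per optimizer step
steps = 1000
slow = transfer_seconds(per_step_gb, GEN3_X1) * steps / 60
fast = transfer_seconds(per_step_gb, GEN4_X16) * steps / 60
print(f"1000 steps, Gen3 x1:  {slow:.1f} min of bus time")
print(f"1000 steps, Gen4 x16: {fast:.1f} min of bus time")
```

The one-time load cost on a x1 riser is an annoyance measured in seconds; the same link under a training loop becomes a wall-clock bottleneck, which is the whole case for the server board.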