Local AI Home Server Build at a Mid-Range $750 Price

When you click on links to various merchants on this site and make a purchase, this can result in this site earning a commission. Affiliate programs and affiliations include, but are not limited to, the eBay Partner Network. As an Amazon Associate I earn from qualifying purchases. #ad #promotions

Getting as much VRAM as possible is always a challenge, and hitting 24GB total under $750 is a genuinely good result. Matching that capacity with performance typical of the class, while keeping the machine useful for other workloads, is an important consideration. The GPU market has been fairly crazy recently, with so many options, poor availability, and price inflation. The reasons folks still prefer Nvidia GPUs over other makers (for good reasons) are shrinking, but I do think the best current value per GB of VRAM that can hit 24GB is this combination. The 3060 12GB is an interesting GPU that really punched above its old MSRP. It sports 12GB of GDDR6 VRAM on a 192-bit bus able to supply 360GB/s of crucial-for-inference bandwidth. Most cards use a single 8-pin power connector and draw up to 170 watts at full load. Due to bad design choices, the 4060 Ti 16GB lineup was disappointing for inference because of its lower memory bandwidth, which made the 4070 12GB the next real "step up" in the Ada generation. Pairing two 3060s with the venerable HP Z440 feels right, as that platform can really grow and perform for a low price: the supported Xeon lineup ranges from 4 to 22 cores, and the board supports PCIe bifurcation.

MID RANGE AI SERVER

PRICE CATEGORY: $630 to $750

VRAM: 24GB

PRICE/GB/VRAM:  $26.88 to $31.25

REVIEW VIDEO: https://youtu.be/3XC8BA5UNBs

GPU ALTERNATIVES: One could conceivably fit in a 3090 24GB GPU (which needs 2x 8-pin connectors), set a 300W power limit, and not take a horrible performance hit. I also thought it would be funny to run four Tesla P4s with blowers, but I don't think it would be a good idea to have all the PCIe slots drawing that much power. Plus, since the P4s need blowers that are not cheap, it doesn't come out as a better deal. It would also be much, much slower.

ITEM        DESC                              QTY  PRICE  SUB    LINK
CHASSIS     HP Z440                           1    $100   $100
RAM         16GB DDR4 (cheap RAM options)     0    $15    $0
MOBO        included                          -    -      $0
CPU         included                          -    -      $0
CPU COOLER  included                          -    -      $0
GPU         3060 12GB                         2    $250   $500
PSU         included                          -    -      $0
NVMe/SSD    Samsung 870 EVO 500GB             1    $35    $35    https://geni.us/evo-870-ssd-500GB
ACCY        6-to-8 pin adapter (2 pack)       1    $10    $10    GPU Adapter 6pin-8pin
TOTAL                                                     $645


$750 Local AI Server Tips and Tricks

  • The 700W PSU is required. You can usually confirm it from listing pictures if the chassis is open, but it may be a good thing to ask the seller as well.
  • You can use Xeon E5-16xx and E5-26xx v3 and v4 CPUs. I would opt for a v4 for better power efficiency and 2400MT/s DDR4 support.
  • Avoid getting into a situation where you need to re-flash the BIOS if at all possible. I forget the exact details of what I did, but mine was basically dead after an update.
  • You will need to use 6-to-8 pin adapters; they come in a two pack.
  • The 700W PSU supports 2x 6-pin PCIe power connectors.
  • If you unplug the chunky front USB 3 cable, the Front USB Warning screen will prevent booting until you press Enter to get past it. I found a video showing someone bridging pins 4 and 4 (as viewed from the bottom of the case looking down into it), with the bottom row offset since it has no "pin 1" position. I used an old motherboard jumper and it does work!
  • You need to be on a modern BIOS version for PCIe bifurcation to work. If the option is missing from your PCIe advanced section, that could be the issue.
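Once both cards are in, a quick sanity check confirms that both GPUs enumerated and what PCIe link width each negotiated (this assumes the NVIDIA driver is already installed; the query fields below are standard nvidia-smi ones):

```shell
# List each GPU with its negotiated PCIe link width (16, 8, etc.)
nvidia-smi --query-gpu=index,name,pcie.link.width.current --format=csv
```

If a card shows a narrower link than expected, re-seat it or check which physical slot it landed in.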

If you want to limit the power draw on the GPUs, you can edit your crontab as follows. Type:

crontab -e

then add in or replace the @reboot nvidia-smi lines at the top (you followed the prior Proxmox GPU Passthrough LXC guide, yes?):

@reboot nvidia-smi --pm 1
@reboot nvidia-smi --pl 150
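After a reboot you can verify the limit actually stuck on both cards (again using standard nvidia-smi query fields):

```shell
# Show the enforced power limit per GPU; both should report 150.00 W
nvidia-smi --query-gpu=index,power.limit --format=csv
```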

Local LLM Performance Benchmarks

QNT  MODEL              PARAMS  CTX   RESPONSE t/s  PROMPT t/s  VRAM % OBSERVED  SPECIAL ENV            NOTES
Q4   Mistral Small 3.1  24B     4096  broken        broken      broken           OLLAMA_NUM_PARALLEL=2  just broken
Q8   Gemma 3            12B     4096  19.67         335         60               OLLAMA_NUM_PARALLEL=2
Q8   Cogito V1 Preview  14B     4096  16.65         1558        55               OLLAMA_NUM_PARALLEL=2
Q4   Gemma 3            27B     2048  6             403         72               OLLAMA_NUM_PARALLEL=2  weird split bug maybe
Q8   Deepcoder Preview  14B     4096  17            1650        71               OLLAMA_NUM_PARALLEL=2
Q4   QwQ                32B     4096  11.92         1260        84               OLLAMA_NUM_PARALLEL=2

I use the following approach to decide what a "good" combination of B value (that's the parameter count, by the way) and quant level is. I basically never go for full FP16 precision. I try to get a quant of Q5 or higher, and since what's usually available is Q4 or Q8, that means I pick Q8. This choice weighs heavily on how much room is left for the context window, however, so don't use it for context-hungry tasks, such as programming when you need to feed in a codebase. Also, the more standardized an acceptable answer to a given question or prompt would be, the less sensitive the task is to lower quants like Q4; creativity is best served by higher quants. Some models hold up fine at the next step down in B params, and some do very badly. All of these are factors you need to tune around, and with practice you get good at predicting which combinations will work best for you.
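As a rough sketch of that sizing rule, you can ballpark a model's weight footprint from its parameter count and quant bits. The 20% overhead factor here is my own rough assumption for runtime buffers, and context/KV cache comes on top of it:

```shell
# Rough VRAM estimate for model weights:
#   params (billions) * bits-per-weight / 8 bits-per-byte, plus ~20% overhead.
# Context/KV cache is NOT included and grows with the ctx setting.
estimate_vram() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f GB\n", p * b / 8 * 1.2 }'
}

estimate_vram 14 8   # 14B model at Q8 -> ~16.8 GB, splits across 2x 12GB
estimate_vram 27 4   # 27B model at Q4 -> ~16.2 GB
```

Anything that lands near the 24GB total after adding context is a candidate for trouble, which lines up with the higher observed VRAM numbers in the table above.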