When you click on links to various merchants on this site and make a purchase, this can result in this site earning a commission. Affiliate programs and affiliations include, but are not limited to, the eBay Partner Network. As an Amazon Associate I earn from qualifying purchases. #ad #promotions



MID RANGE AI SERVER
PRICE CATEGORY: $630 to $750
VRAM: 24 GB
PRICE PER GB OF VRAM: $26.25 to $31.25
REVIEW VIDEO: https://youtu.be/3XC8BA5UNBs
GPU ALTERNATIVES: One could conceivably fit a 3090 24GB GPU that has 2x 8-pin connectors, set a 300W power limit, and not take a horrible performance hit. I also thought it would be funny to run 4x Tesla P4s with blowers, but I don't think it would be a good idea to have all the PCIe slots drawing that much power. Plus, since they need blowers that are not cheap, it doesn't come out as a better deal. It would also be much, much slower.
$750 Local AI Server Tips and Tricks
If you want to limit the power draw on the GPUs, you can edit your crontab as follows. Type:
crontab -e
and add or replace the @reboot nvidia-smi line at the top (you followed the prior Proxmox GPU Passthrough LXC guide, yes?)
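As a sketch, a crontab entry like the one below caps GPU power at boot. The 300W value matches the 3090 power-limit suggestion above; the exact wattage and whether you target specific GPU indices (the -i flag) are assumptions to adjust for your cards:

```shell
# Runs once at boot: enable persistence mode, then cap the power limit.
# 300W is an assumed value for a 3090; other cards need different limits.
@reboot /usr/bin/nvidia-smi -pm 1 && /usr/bin/nvidia-smi -pl 300
```

You can confirm the limit took effect after a reboot with `nvidia-smi -q -d POWER`.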
Local LLM Performance Benchmarks
I use the following approach to decide what a "good" combination of B value (that is, billions of parameters) and quant level is. I basically never go for full FP16 precision. I try to get a quant of Q5 or higher, which in practice usually means the choice is between Q4 and Q8, so I pick Q8. This eats into the VRAM available for the context window very heavily, however, so don't use this rule for context-hungry tasks such as programming, where you may need to feed in a codebase. Also, the more standardized an acceptable answer to a given prompt would be, the less sensitive the task is to lower quants like Q4; creativity is best achieved with higher quants. Some models hold up at the next step down in B params, some do very badly. All of these are factors you need to tune around, so you can get good at predicting which combinations will work best for you.
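A rough rule of thumb for the B-value-versus-quant tradeoff above: model weights take about params × bits ÷ 8 bytes, plus overhead for the KV cache and runtime buffers. The helper and the 20% overhead factor below are assumptions for illustration, not measurements from this build:

```python
def model_vram_gb(params_b: float, quant_bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for a quantized LLM.

    params_b: parameter count in billions (the "B" value).
    quant_bits: bits per weight (4 for Q4, 8 for Q8, 16 for FP16).
    overhead: assumed 20% extra for KV cache and runtime buffers.
    """
    weights_gb = params_b * quant_bits / 8  # 1B params at 8-bit is ~1 GB
    return weights_gb * overhead

# On a 24GB card, a 13B model at Q8 (~15.6 GB) leaves room for context,
# while the same model at FP16 (~31.2 GB) would not fit at all.
print(round(model_vram_gb(13, 8), 1))
print(round(model_vram_gb(13, 16), 1))
```

Estimates like this only tell you whether a combination can fit; how much quality drops at Q4 versus Q8 still has to be tested per model.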