DeepSeek V3 0324 Local AI Running on AMD EPYC with llama.cpp and Unsloth GGUF

When you click on links to various merchants on this site and make a purchase, this can result in this site earning a commission. Affiliate programs and affiliations include, but are not limited to, the eBay Partner Network. As an Amazon Associate I earn from qualifying purchases. #ad #promotions

Once more into the DeepSeek we adventure, this time with the latest V3 checkpoint, 0324! Let's get a tokens-per-second check on the Unsloth GGUFs, CPU ONLY, on the EPYC Rome 7702.

Thankfully, Unsloth already has some workable GGUF files out, so today we are going to hook you up with the steps to get a complete system up and running locally. At first glance, the sizes they provide go up to a Q4 at 4.5 bpw, which they rate as great quality. Unsloth has good directions, but they leave out a few of the download steps, which I will also cover here.

Yes, this model is a bit larger than R1, but because it is non-reasoning it should take much less time to answer and be much more usable… in theory. Reports of it also being BETTER than R1 are very welcome, but let's get some tests in on the tokens per second you might expect. The rig is the one I always use for these tasks: the EPYC 7702 with 512GB of DDR4-2400 (IKR) RAM.
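Why the DDR4-2400 matters: token generation on a CPU is usually limited by how fast the weights can be streamed out of RAM. The sketch below is a back-of-envelope ceiling, assuming all 8 memory channels of the EPYC 7702 are populated, 8 bytes per transfer per channel, roughly 37B active parameters per token (DeepSeek V3's published MoE figure), and 4.5 bits per weight. Real throughput should land well below this number, since sustained bandwidth is lower than theoretical peak and attention and KV cache reads add traffic on top of the weights.

```python
# Rough ceiling on CPU token generation, assuming it is bound by RAM bandwidth.
# Assumptions (not measured in this post): 8 populated channels, 8 bytes per
# transfer, ~37e9 active parameters read once per generated token.


def peak_bandwidth_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second)."""
    return channels * mts * 1e6 * bytes_per_transfer / 1e9


def tg_ceiling_tps(bandwidth_gbs: float, active_params: float, bpw: float) -> float:
    """Upper bound on tokens/s if every token must stream the active weights from RAM."""
    bytes_per_token = active_params * bpw / 8
    return bandwidth_gbs * 1e9 / bytes_per_token


bw = peak_bandwidth_gbs(channels=8, mts=2400)  # DDR4-2400 across 8 channels
ceiling = tg_ceiling_tps(bw, active_params=37e9, bpw=4.5)
print(f"peak bandwidth ~{bw:.1f} GB/s, generation ceiling ~{ceiling:.1f} tok/s")
```

Swapping in `mts=3200` shows what faster DIMMs would buy in theory; the ceiling scales linearly with bandwidth and inversely with bits per weight.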

Base Local Ai Rig Specs

GPU Rack Frame https://geni.us/GPU_Rack_Frame

Gigabyte MZ32-AR0 Motherboard https://geni.us/mz32-ar0_motherboard

512GB DDR4 2400 RAM KIT https://geni.us/512GB_DDR4_RAM

PCIe4 Risers (x4) https://geni.us/PCIe4_Riser_Cable

EPYC 7702 https://geni.us/EPYC_7702p

Corsair H170i ELITE CAPELLIX https://geni.us/iCUE_H170i_Capellix

SP3/TR4/5 Retention Kit https://geni.us/ELITE_RetentionKit

ARCTIC MX-4 Thermal Paste https://geni.us/Arctic_ThermalPaste

CORSAIR HX1500i PSU https://geni.us/Corsair_HX1500iPSU

990 EVO Plus 2TB NVMe https://geni.us/EVO_PLUS_990_2TB

4i SFF-8654 to 4i SFF-8654 (x4) https://geni.us/SFF8654_to_SFF8654

HDD Rack Screws for Fans https://geni.us/HDD_RackScrews

BIOS Settings

Performance Testing Results

Deepseek V3 Q4

MoE Bits 4.5 — Q4_K_XL — Running RAM Usage 390GB

| model | size | params | backend | ngl | threads | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| deepseek2 671B Q4_K - Medium | 377.07 GiB | 671.03 B | CUDA | 99 | 124 | pp512 | 21.16 ± 0.80 |
| deepseek2 671B Q4_K - Medium | 377.07 GiB | 671.03 B | CUDA | 99 | 124 | tg128 | 3.39 ± 0.11 |


Deepseek V3 Q3

MoE Bits 3.5 — Q3_K_XL — Running RAM Usage 228GB

| model | size | params | backend | ngl | threads | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| deepseek2 671B Q3_K - Medium | 298.78 GiB | 671.03 B | CUDA | 99 | 124 | pp512 | 22.56 ± 0.04 |
| deepseek2 671B Q3_K - Medium | 298.78 GiB | 671.03 B | CUDA | 99 | 124 | tg128 | 3.79 ± 0.09 |


Deepseek V3 Q2

MoE Bits 2.71 — Q2_K_XL — Running RAM Usage XXXGB

| model | size | params | backend | ngl | threads | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| deepseek2 671B Q2_K - Medium | 230.59 GiB | 671.03 B | CUDA | 99 | 124 | pp512 | 23.66 ± 0.06 |
| deepseek2 671B Q2_K - Medium | 230.59 GiB | 671.03 B | CUDA | 99 | 124 | tg128 | 4.07 ± 0.08 |


Deepseek V3 Q2 (IQ2_XXS)

MoE Bits 2.42 — IQ2_XXS — Running RAM Usage XXXGB

| model | size | params | backend | ngl | threads | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| deepseek2 671B IQ2_XXS - 2.0625 bpw | 203.63 GiB | 671.03 B | CUDA | 99 | 124 | pp512 | 17.49 ± 0.02 |
| deepseek2 671B IQ2_XXS - 2.0625 bpw | 203.63 GiB | 671.03 B | CUDA | 99 | 124 | tg128 | 4.07 ± 0.10 |
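To compare the four runs side by side, the sketch below converts each file size into an average bits-per-weight figure and estimates the wall-clock time for one 512-token prompt plus a 128-token reply at the measured pp512/tg128 rates. It assumes llama-bench's "GiB" means 2^30 bytes and "B" means 10^9 parameters; the timing ignores model load, sampling overhead, and any slowdown as the context grows.

```python
# Effective bits per weight and a rough prompt+reply time, using the
# size, pp512 and tg128 numbers from the llama-bench tables.
GIB = 2**30
PARAMS = 671.03e9  # "params" column (671.03 B)

# quant: (size in GiB, pp512 tokens/s, tg128 tokens/s)
RESULTS = {
    "Q4_K_XL": (377.07, 21.16, 3.39),
    "Q3_K_XL": (298.78, 22.56, 3.79),
    "Q2_K_XL": (230.59, 23.66, 4.07),
    "IQ2_XXS": (203.63, 17.49, 4.07),
}


def effective_bpw(size_gib: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter across the whole GGUF."""
    return size_gib * GIB * 8 / params


def exchange_seconds(pp_tps: float, tg_tps: float, prompt: int = 512, reply: int = 128) -> float:
    """Seconds to ingest `prompt` tokens and then generate `reply` tokens."""
    return prompt / pp_tps + reply / tg_tps


for quant, (size, pp, tg) in RESULTS.items():
    print(f"{quant:8s} {effective_bpw(size):.2f} bpw, "
          f"~{exchange_seconds(pp, tg):.0f} s per 512-in/128-out")
```

The averages come out above the "MoE Bits" labels (about 4.83 vs 4.5 for Q4_K_XL), presumably because Unsloth's dynamic quants keep the non-MoE tensors at higher precision.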

Setup and Installation Instructions

Llama.cpp build: 53af4dba (4959)
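As a rough sketch of the download-and-benchmark loop, the Python below fetches a single quant with `huggingface_hub.snapshot_download` and assembles a `llama-bench` command matching the settings in the tables (124 threads, `-ngl 99`, pp512, tg128). The repo id `unsloth/DeepSeek-V3-0324-GGUF`, the `UD-<quant>` file pattern, the local folder name, the `./build/bin/llama-bench` path, and the helper names are assumptions for illustration, not taken from this post; check them against the Hugging Face file list and your own llama.cpp build.

```python
# Hypothetical helpers: download one Unsloth quant and build a llama-bench
# command with the same settings as the results tables.
# Assumed (not from the post): repo id, "UD-<quant>" naming, folder, binary path.
import subprocess
from pathlib import Path

REPO_ID = "unsloth/DeepSeek-V3-0324-GGUF"  # assumed Hugging Face repo id


def allow_pattern(quant: str) -> str:
    """Glob matching one quant's shards, assuming Unsloth's "UD-<quant>" naming."""
    return f"*UD-{quant}*"


def download_quant(quant: str, local_dir: str = "DeepSeek-V3-0324-GGUF") -> str:
    """Fetch only the shards for `quant`; returns the local folder path."""
    # Imported here so the other helpers work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=[allow_pattern(quant)],
    )


def first_shard(model_dir: str, quant: str) -> Path:
    """llama.cpp opens a split GGUF from its first shard (...-00001-of-000NN.gguf)."""
    shards = sorted(Path(model_dir).rglob(f"*{quant}*-00001-of-*.gguf"))
    if not shards:
        raise FileNotFoundError(f"no first {quant} shard under {model_dir}")
    return shards[0]


def llama_bench_cmd(model_path, threads: int = 124, ngl: int = 99,
                    n_prompt: int = 512, n_gen: int = 128,
                    binary: str = "./build/bin/llama-bench") -> list[str]:
    """Arguments mirroring the tables: 124 threads, ngl 99, pp512 and tg128."""
    return [binary, "-m", str(model_path), "-t", str(threads),
            "-ngl", str(ngl), "-p", str(n_prompt), "-n", str(n_gen)]


def run_benchmark(quant: str) -> None:
    """Download `quant` (hundreds of GB) and run llama-bench on its first shard."""
    model_dir = download_quant(quant)
    cmd = llama_bench_cmd(first_shard(model_dir, quant))
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `run_benchmark("Q2_K_XL")` would pull the roughly 230 GiB of Q2_K_XL shards, then print and run the command; assuming the same naming, `Q4_K_XL`, `Q3_K_XL`, or `IQ2_XXS` cover the other tables.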