Nvidia 5090, 5080, 5070 and DIGITS for Local AI Server Inference

The newest generation of Nvidia GeForce RTX GPUs looks very good for the MSRPs and specs. First, let me ask you to suspend, for now, the marketing speak we heard from CES yesterday. Many of those numbers are vague and literally just the largest number for large-number's sake. That is not to diminish the impact, however, as the performance gains, and especially the VRAM size gains, are substantial.

TOPICS WE WILL COVER

  • My Notes on the 50X0 GPU Announcement
  • My Notes on the DIGITS Announcement
  • How Do NVIDIA 50 Series GPUs compare to 40X0 and 30X0?
  • What does this mean for used 40 series cards?
  • What does this mean for used 30 series cards?
  • What does this mean for all other cards?

Quick 50 series Specs

5070 – $549 (leave this for the gamers)

5070ti – $749 (Interesting and likely realistic to get)

5080 – $999 (leave this for the gamers)

5090 – $1999 (VERY interesting, likely impossible to get)

DIGITS Household Supercomputer – $3000 (May launch; still lacks the RAM bandwidth details we need)

https://www.nvidia.com/en-us/geforce/news/rtx-50-series-graphics-cards-gpu-laptop-announcements/

https://www.nvidia.com/en-us/project-digits/

MY NOTES ON THE TRAD GPUs

  • The 5090 is going to be unattainable, but you still have to try to get one. Good luck to us all!
    • If you can get one, you’re likely limited to one per person.
      • Mixing and matching is going to suck, as the PCIe generations are so different.
      • You need PCIe Gen 5 gear, and if you’re thinking of training you’ll need Threadripper, an Intel workstation platform, or the latest EPYC 9005-series PCIe 5 chips.

        • This assumes we would still use GPUs for training; all signs point to YES.
    • The sniper-to-scalper pipeline will be prolific, based on past experience.
      • I likely won’t try hard to get one in any online store. Maybe I get lucky, but standing in a line just sounds vintage and fun, right… RIGHT!
    • 32GB is amazing. I expect very crazy pricing from scalpers.
    • WHY Nvidia priced this at $1,999 is a fair question. Pricing will always leave some folks vocally unhappy no matter what, so that part is not surprising. However, they could have safely gone higher as well. Maybe they thought $2K would help suppress scalper upside; maybe it does and I am wrong. I don’t think it will. I fully expect to see $5K 5090s on secondary markets.
    • Valuation: the 24GB -> 32GB VRAM jump is worth roughly +$500, and the ~25% actual performance uplift roughly +$400. That is what I was calculating as the likely value added on top of the prior generation 4090’s MSRP base. Without a doubt, I think the value delivered is in reality closer to that of a $2,400-MSRP GPU (see the sketch after this list).
    • The 2-slot FE card has the AMAZING extra benefit of not covering your adjacent slots. Honestly, that is worth at least another $100, making it roughly $2,500 of GPU value at $1,999. Let’s hope this FE isn’t a meltdown model.
  • Disregard all hype. Filter it out. There will be an onslaught of it for the next 2 months.

  • Skip the 5080 if it’s 16GB
    • it is for gamers
    • we might get an interesting Super edition in a year
  • 5070ti
    • This is the card you end up getting in February, when the reality of the 5090 being unobtainable hits.
    • We could see interesting form factors, assuming the 5090 really is a 2-slot GPU. Could we get a 5070ti in a 1-slot format?
      • highly unlikely
  • This is similar to the Ampere launch, which featured a top-end VRAM expansion.
    • PCIe 3 –> PCIe 4
    • 11GB on the 2080ti, then 24GB on the 3090
    • Those two things gave a massive generational uplift! Some might see tick-tock cycle similarities, but it’s nuanced.
  • Ampere to Ada was less impressive
    • PCIe 4 –> PCIe 4
    • 24GB on the 3090, and they stayed at 24GB on the 4090
    • It also killed NVLINK
  • Expect it could be a really long time before we find out about 5060s
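
A toy sketch of the valuation math from the notes above, using the 4090’s $1,599 launch MSRP as the base. The premiums are my gut-feel numbers, not market data, and summing them lands a touch above the ~$2,500 figure:

```python
# Rough 5090 value-add math: gut-feel premiums stacked on the 4090 base MSRP.
BASE_4090_MSRP = 1599    # 4090 launch MSRP (prior-generation base)
vram_premium   = 500     # 24GB -> 32GB VRAM jump
perf_premium   = 400     # ~25% raw performance uplift
two_slot_bonus = 100     # 2-slot FE leaves adjacent PCIe slots usable

estimated_value = BASE_4090_MSRP + vram_premium + perf_premium + two_slot_bonus
MSRP_5090 = 1999
print(f"Estimated value: ${estimated_value} vs MSRP ${MSRP_5090}"
      f" -> ${estimated_value - MSRP_5090} of headroom for scalpers")
```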

MODIFIED 4090 Nvidia Driver to enable P2P mode

https://github.com/tinygrad/open-gpu-kernel-modules

NOTES ON PROJECT DIGITS

This one is more complex. There is a LOT of devil in these details, and it hinges on the LPDDR5X that comes in that system, specifically the bandwidth it will provide. It is likely at the performance level of system RAM, not VRAM. The form factor is VERY compelling, but it also speaks to the physics of inference workloads. Hopefully it lands at some balanced slice of high performance, near that of unified memory. My first-blush take, however, is that if that were the goal, it would likely be GDDR7 instead.

The lauded stat of running a 200B-parameter model sounds fantastic, and it is absolutely achievable in 128GB of RAM at quant 4, which is the configuration they have been presenting consistently. The reality, though, can be pretty slow: a gulf of 7 tokens per second versus 1 token per second is very imaginable here, given the Q4 performance we see reported. Nvidia no doubt has a LOT of tricks and optimizations they could use on the software side, but current physics still applies. I think this device really challenges Apple at a hardware level. Is this why the rumors of Apple discontinuing unified memory are around? I initially thought those were about pushing users to offload data to their cloud and prop up MRR, but maybe they had an idea this was in the pipeline.
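
To ground that, here is a minimal back-of-envelope sketch, assuming a plain 4-bit weight footprint and a decode pass that streams every weight once per token (both simplifications). The bandwidth figures are placeholder guesses, since Nvidia has published no DIGITS memory numbers:

```python
# Does a 200B model fit in 128GB at Q4, and how fast could decode be if it
# is purely memory-bandwidth bound?
PARAMS = 200e9                # 200B parameters
BYTES_PER_PARAM_Q4 = 0.5      # 4-bit weights, ignoring quantization overhead
model_gb = PARAMS * BYTES_PER_PARAM_Q4 / 1e9
print(f"Q4 weight footprint: ~{model_gb:.0f} GB (fits in 128 GB)")

# Naive ceiling: every weight is read once per generated token, so
# tokens/sec <= memory bandwidth / model size.
for label, bw_gb_s in [("SYS-RAM-class guess", 100),
                       ("unified-class guess", 700)]:
    print(f"{label}: ~{bw_gb_s / model_gb:.0f} tokens/sec ceiling")
```

That is exactly the 1 versus 7 tokens-per-second gulf described above, and it is set almost entirely by the memory bandwidth we still don’t know.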

I doubt you will get the ability to connect more than 2, given the interconnect’s 80-gigabit capacity. Dashing those thoughts is a decent idea, I think. RDMA is a wonderful addition, which shows they are crafting the stack down to the silicon with care, imho.
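
To put that skepticism in numbers, a quick sketch of the link-versus-local-memory gap; the local bandwidth is the same placeholder class of guess as above:

```python
# Why chaining many DIGITS boxes is unlikely to scale: the interconnect is
# an order of magnitude slower than local memory.
link_gb_s = 80 / 8        # 80 Gbit/s stated capacity -> 10 GB/s
local_bw_gb_s = 275       # placeholder LPDDR5X-class guess
print(f"Link: {link_gb_s:.0f} GB/s vs local ~{local_bw_gb_s} GB/s "
      f"({local_bw_gb_s / link_gb_s:.0f}x gap)")

# Two units still make sense for capacity: 2 x 128 GB holds a ~405B model
# at Q4 if layers are split pipeline-style so only activations cross the link.
print(f"405B @ Q4: ~{405e9 * 0.5 / 1e9:.0f} GB across {2 * 128} GB total")
```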

Either way, at $3K this is the ultimate high-end mini-PC desktop box at the moment. I really want one (no, I lie, I want 2), but is it worth $3K from a performance-per-dollar standpoint? Very unlikely.

They look fucking amazing! Gold computers all of a sudden make the most sense vs silver computers. You get it, Apple?

Are you better off getting a 5090 at the MSRP of $1,999? That is a really good question, and it hinges on the bandwidth numbers we have yet to see.
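
One minimal way to frame that question, reusing the bandwidth-bound ceiling from the earlier sketch. The 5090’s 1,792 GB/s is a published spec, while the DIGITS figure remains a placeholder guess:

```python
# Fit-vs-speed framing: the 5090 is far faster for anything that fits in
# 32 GB; DIGITS only wins by holding models the 5090 cannot.
DEVICES = {
    "5090":   {"mem_gb": 32,  "bw_gb_s": 1792},  # published spec
    "DIGITS": {"mem_gb": 128, "bw_gb_s": 275},   # placeholder guess
}

def decode_ceiling(model_gb, dev):
    """Bandwidth-bound tokens/sec ceiling, or None if the model doesn't fit."""
    return dev["bw_gb_s"] / model_gb if model_gb <= dev["mem_gb"] else None

for model_gb in (20, 100):   # roughly a 40B and a 200B model at Q4
    for name, dev in DEVICES.items():
        tps = decode_ceiling(model_gb, dev)
        print(f"{model_gb} GB model on {name}: "
              f"{f'~{tps:.0f} tok/s' if tps else 'does not fit'}")
```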

How Do NVIDIA 50 Series GPUs compare to 40X0 and 30X0?

What does this mean for used 40 series cards?

On prices – the top-end 4090 is going to be under initial price pressure, both used and new, during the lead-up to the inevitable reality that the 5090s somehow all get snagged by scalpers. Expect a 20–35% reduction off current prices depending on card quality/manufacturer, which trends back toward MSRP from today’s inflated levels. It gets hit the hardest. Feels bad as an owner of 2x 4090s who didn’t sell before this announcement, but even if I had, it likely wouldn’t have been enough to buy 1 scalped 5090.

The 4080 and the non-ti 4070 are not the typical market for home AI inference, due to their $/GB of VRAM or the VRAM available. Basically, that is gamer stuff, and I have no opinion.

The 4070ti 16GB is going to be under slight sales pressure until the February launch of the 5070ti, mostly on new stock. Expect up to 10% reductions there. Owners really love their 4070ti cards, but the used market will be interesting to watch.

The 4060ti 16GB is a very loved card at its recent prices. It is still one of the cheapest modern rides to 16GB-VRAM land, and it likely holds fairly firm in its pricing as a result.

What does this mean for used 30 series cards?

3090s will be under some pressure, but they are still, and will remain, the cheapest modern-ish way to hit 24GB of VRAM. I’d expect to see them return to the $800 price point used and refurbished (used with extra words), but for inference they hold a lot of value. Gamers selling off their 3090s is where I think we see supply pressure, likely as singles. Since I own 4, I can say I have no intention of getting rid of mine unless I can somehow snag multiple 5090s at near MSRP, which is a fun dream but not reality.
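
For a rough feel of why the 3090 holds its place, a $/GB-of-VRAM comparison. The $800 3090 target and $1,999 5090 MSRP come from this post; the other prices are placeholder ballparks, not live market data:

```python
# Dollars per GB of VRAM: the crude metric behind most home-inference buys.
cards = [
    ("3090 (used)",  800, 24),   # target price from this post
    ("4060ti 16GB",  450, 16),   # placeholder street price
    ("4090 (used)", 1500, 24),   # placeholder post-correction price
    ("5090 (MSRP)", 1999, 32),
]
for name, price, vram in sorted(cards, key=lambda c: c[1] / c[2]):
    print(f"{name}: ${price / vram:.0f}/GB ({vram} GB @ ${price})")
```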

What does this mean for all other cards?

Inference is in the tokenizer of the beholder. This is why I consistently talk about my expectations for performance when I am running inference workloads. Some things are just too slow for me; other things are faster than I need, which is always fine unless I am paying a premium for it. We will see more selloff, but small and old can and does work fairly decently if it meets your expectations. We will see the normal used selling of stock that has already been depleting over time.