
Local AI Models on the Quadro P2000 – Homelab Testing Gemma, Qwen2, SmolLM, Phi 3.5, Llama 3.1



The longtime homelab favorite Quadro P2000 is a 5 GB GPU that is still fairly capable and already sits in a lot of home servers, but how does it handle running local LLMs? It actually works, and the performance is not what you might expect. This is a must-watch if you already have a P2000 in your system. I cover some tips and tricks, possible use cases, and the tradeoffs of this Nvidia GPU.
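If you want to reproduce tokens-per-second numbers like the ones discussed in the video on your own P2000, one quick option is to query the local Ollama API and read back the generation counters it returns. The sketch below is only an illustration, assuming a default Ollama install listening on localhost:11434 and a model you have already pulled; the model name and prompt are placeholders, not the exact ones used in the video.

```python
# Minimal sketch: time a single generation against a local Ollama server and
# report tokens/second from the counters in the response. Assumes Ollama is
# running on its default port and the model has already been pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "gemma2:2b"          # placeholder: any model that fits in 5 GB VRAM
PROMPT = "Explain what a homelab is in two sentences."

payload = json.dumps({"model": MODEL, "prompt": PROMPT, "stream": False}).encode()
req = urllib.request.Request(OLLAMA_URL, data=payload,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# eval_count is the number of generated tokens; eval_duration is in nanoseconds.
tokens = result["eval_count"]
seconds = result["eval_duration"] / 1e9
print(f"{MODEL}: {tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tok/s")
```

Note that with a non-streaming call these counters cover only the generation phase; prompt processing is reported separately in prompt_eval_duration.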

Nvidia Quadro P2000 GPU

Additional Home AI Server Videos
AI Benchmarks Dual 4090s and 1070ti –
AI Benchmarks 3090s and 3060s + VRAM Testing –
Ollama AI Home Server Build –
Bare Metal Open WebUI AI Server Setup Guide –
Proxmox LXC Docker AI Setup + GPU Passthrough –
Results Table –

Chapters
0:00 AI Home Server Low Power GPU
1:10 Ollama Model Shopping
2:08 Microsoft Phi 3.5 3.7b
6:22 SmolLM 1.7b
8:04 Google Gemma 2b
10:06 Low Wattage GPU AI Models for Homelabs
11:14 Qwen2 7b
13:20 Llama 3.1 Q3 8b
15:20 Use Cases, Tips, and Tricks
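The chapter list above runs from a 1.7B model up to an 8B Llama 3.1 at Q3 quantization, so whether a given download fits in the P2000's 5 GB of VRAM is the main constraint. A rough back-of-the-envelope check, not anything measured in the video, is weight bytes (parameters × bits per weight / 8) plus KV cache plus runtime overhead; all the constants below are illustrative assumptions.

```python
# Rough VRAM-fit estimate for a quantized local model. All constants here are
# ballpark assumptions (GQA-style KV cache, default context, driver overhead),
# not measurements from the video.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     ctx_tokens: int = 2048,
                     kv_bytes_per_token: float = 0.13e6,  # ~8B-class model, fp16 KV
                     overhead_gb: float = 0.5) -> float:
    """Very rough upper bound; actual usage depends on the quant scheme,
    context length, and how many layers Ollama offloads to the GPU."""
    weights_gb = params_billion * bits_per_weight / 8   # 1e9 params * bits / 8 bytes
    kv_gb = ctx_tokens * kv_bytes_per_token / 1e9
    return weights_gb + kv_gb + overhead_gb

# Example: Llama 3.1 8B at roughly Q3 (~3.9 bits/weight on average)
print(f"{estimate_vram_gb(8.0, 3.9):.1f} GB")  # ~4.7 GB -> a tight fit on a 5 GB card
```

By that estimate the 2B and 3–4B models leave comfortable headroom on 5 GB, while an 8B model only squeezes in at aggressive quantization and short context, which is why the quant level matters so much on this card.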

Be sure to 👍✅Subscribe✅👍 for more content like this!

Join this channel

Please share this video to help spread the word and drop a comment below with your thoughts or questions. Thanks for watching!

Digital Spaceport Website 🌐

🛒Shop (Channel members get a 3% or 5% discount)
Check out the shop for great deals on hardware and merch.

*****
As an Amazon Associate I earn from qualifying purchases.

When you click on links to various merchants on this site and make a purchase, this can result in this site earning a commission. Affiliate programs and affiliations include, but are not limited to, the eBay Partner Network.
*****

#digitalspaceport





Comments

  1. Anything above 10 T/s is amazing when you consider that's 600 T/min, or roughly 500 words/min (as if anyone could type that fast). Small language models are fun and useful for experimenting locally and privately.

  2. I'm curious about using retro motherboards: an ASUS X99 board with a PLX switch and an overclocked Xeon E5-1660 v3 (8 cores, an easy 4 GHz). That board can potentially run 6 x8 slots plus 1 x16, or 4 x16 slots. Start plugging in "cheap" GPUs and see how well the memory stacks up. The expensive version: 4× 3060 12GB (or more, running at x8).

  3. I'm using a pair of 3090s for serious AI work but have also tried out the P4000 I have in a server. As you can imagine from these results, it's certainly capable of running smaller LLMs at a decent speed but the accuracy lets it down vs the larger models. Where it does work for me is in creating batches of images with Stable Diffusion/ComfyUI at low power use when I'm busy with other things and don't need them in a hurry.
