A note that this setup runs a 671B model in Q4 quantization at 3-4 TPS; running a Q8 would need something beefier. To run a 671B model in the original Q8 at 6-8 TPS you’d need a dual-socket EPYC server motherboard with 768GB of RAM.
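Those RAM numbers track the model size pretty directly: Q8 is roughly one byte per parameter and Q4 roughly half a byte, plus some headroom for context. A quick back-of-the-envelope (weights only, no overhead):

$ awk 'BEGIN { printf "671B params: ~%.0f GB at Q8, ~%.0f GB at Q4\n", 671*1.0, 671*0.5 }'
671B params: ~671 GB at Q8, ~336 GB at Q4

which is why Q8 wants a 768GB dual-socket box while Q4 squeezes into a lot less.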
It seems you can run a DeepSeek model very well with a ~$400 GPU. It’s not cheap, but it’s very accessible compared to buying 6 RTX 4080s ($1,500 each) to reach 96GB of VRAM. Most motherboards will also handle two GPUs, so you can get a second of the same card later to double your VRAM.
It’s not gonna be the full model like in this video but it’s still advanced enough for some tasks apparently. Instead of the 671B model you’ll get something like 32 billion parameters (quantized), or an 8B model unquantized.
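If you just want to poke at those distills, the lowest-effort way is probably something like ollama, which pulls a pre-quantized build for you (tags from memory, double-check the model library page before pulling):

$ ollama run deepseek-r1:8b     # the small distill, fits most gaming GPUs
$ ollama run deepseek-r1:32b    # the 32B distill, ~20GB, wants a 24GB card or CPU offload

ollama will spill layers to the CPU automatically if the model doesn’t fit in VRAM, it’ll just be slower.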
Actually I tried with my 2019 GPU and I could run a model; it was a bit slow but nothing major. But it was not a huge model either, and you’re also limited in context size for bigger tasks. I think DeepSeek especially, because it’s so efficient, is much easier to run. Even the full DeepSeek model takes up less than a terabyte of space - of course that’s a lot in absolute terms, but it’s pretty much what any SSD comes with nowadays (and an HDD that size costs almost nothing now).
The API access is like 50 cents per 1M tokens (roughly 750,000 words) you put into it in the off-hours too. It’s so, so cheap that soon API access will probably be too cheap to even keep track of, and we’ll see entirely free cloud models.
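For anyone curious what using it looks like, the API is OpenAI-compatible, so a plain curl is enough (you need a key from their platform, set here in a DEEPSEEK_API_KEY variable as a placeholder; model name as in their docs):

$ curl https://api.deepseek.com/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
    -d '{"model": "deepseek-chat", "messages": [{"role": "user", "content": "Hello"}]}'

The response comes back with a usage block listing prompt and completion token counts, which is what the per-million pricing is billed against.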
Those models are the Qwen models finetuned by DeepSeek, so there’s no real comparison to the original DeepSeek V3 and R1. And considering how much Qwen has been releasing lately, I’d say anyone thinking about running the distilled versions you talked about might as well try the default Qwen ones as well. Qwen3 30B-A3B is very decent for older machines: it’s a MoE with only 3B active parameters, which makes it quite fast, and a Q4_K_M can probably fit in some 20GB of RAM/VRAM (I can run Qwen3 30B-A3B-Instruct-UD-Q3_K_XL with 16GB of RAM and some offloaded to the SSD via swap at 8+ tokens/second).
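If anyone wants to try that route, a llama.cpp invocation looks roughly like this (the GGUF filename, offload count and context size are placeholders to adjust for your own RAM/VRAM):

$ llama-cli -m Qwen3-30B-A3B-Instruct-Q4_K_M.gguf -ngl 20 -c 8192 -p "Explain MoE models in two sentences"

-ngl (--n-gpu-layers) is how many layers go to the GPU; the rest stay in system RAM, and since llama.cpp mmaps the model file the OS can page what doesn’t fit from disk, which is where the swap-and-8-tokens/second setup above comes from.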
ooh now I’m stressed I’m gonna have to download and try 20 different models to find the one I like best haha. Do you know some that are good for coding tasks? I also do design stuff (the AI helps walk through the design thinking process with me); it’s kind of a reasoning/logical task but it’s also highly specialized.
There is a new Qwen3 Coder 30B-A3B that looks to be good, and people were talking about GLM 4.5 32B, but I haven’t used local models for code much so I can’t give a good answer.
Fwiw I can run the DeepSeek 7B model with an RX 6600 (~$200, should be cheaper nowadays) relatively fast, though I haven’t used it for complicated tasks.
Sure, but you really have to watch out for what kind of hard drives you’re buying. There are a lot of SMR drives out there that are sold as regular drives, and the only way to tell is to look through their data sheets. I find that “regular” HDDs (CMR/PMR) cost more now than SMR drives of similar capacity and spindle speed (probably because nobody wants the SMR ones lol).
SMRs are meant for data storage, aren’t they? Which is not to say they can’t write at all, they just don’t have write speeds as high.
For AI models specifically the file just lives on the HDD; it gets loaded into VRAM (and spills over to the CPU and RAM if you don’t have enough VRAM) when you use it. For everything else then probably yeah, I don’t even know what kind of HDDs I have lol. It seems difficult to find 7200 rpm ones over 5400, but tbh with the prices of SSDs nowadays I’m probably going to replace my last HDDs with SSDs. If you’re looking for 10TB or huge archival sizes then it’s probably still worth getting an HDD, but for a 1-2TB drive it makes more sense to go SSD I think.
Well, if you’re downloading, copying or creating large models that are several hundred GBs, you’re going to want a normal drive. SMRs have a small staging area, and once that is full the drive has to start re-ordering the data on the platters. Once your drive is in the process of re-ordering, your write speeds are going to make it look like a failing floppy disk. I had a large file copy operation (>1TB) to a RAID pool of SMRs take something like 16 hours. And I also found out that my backup drive is SMR because it took several days to do a full backup from scratch, which is what made me look up its detailed specs.
It always starts out looking great, but eventually the staging area will get full and then your CPU will spend most of its time twiddling its thumbs until a chunk of staging area becomes available again; repeat until the operation is done. The greatness of Shitty Magnetic Recording.
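If you want to see that cliff for yourself, a long sustained sequential write will expose it; something like fio with a size well past the drive’s staging cache (the path and size here are just placeholders):

$ fio --name=smr-test --filename=/mnt/bigdisk/testfile --rw=write --bs=1M --size=200G --direct=1

Throughput looks healthy for the first tens of GB, then drops off hard once the staging area fills and the drive starts rewriting whole zones in the background.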
If you want to know what recording method your drive uses, you grab the model number from:
# smartctl -a /dev/YOURDRIVE | grep "Device Model"
and look that up in a search engine. It should either lead you to a data sheet or the manufacturer’s website where they list the specifications.
If durable and large SSDs were more affordable where I am I’d slowly replace all my spinning rust. But right now HDDs are overall still the better option for me, at least for mass storage.
Oh yeah, I once tried a small local 8B LLM too, I can’t remember if it was DeepSeek’s but I think it was, but it was writing at like one token every 2-3 seconds, and after like 5 minutes, seeing the first message still not done, I realized it was too much to ask of my poor GT 1030. I also heard about the cheap API, many people were delighted, since ChatGPT’s cost much, much more than that. Let’s hope DeepSeek’s API becomes free soon! Even now, I’m assuming you can get days’ worth of conversation with just 1€.
I estimated that to translate ProleWiki from English into 5 languages (the API charges per input token and output token, i.e. what you feed it -> the English content, and what it outputs -> the translated content) it would cost us at most $50 with the DeepSeek API. ChatGPT is so expensive I didn’t even try; it was going to be in the hundreds of dollars lol. The output per 1M tokens with DeepSeek is 50 cents in the off-hours (easy, just run your code during the off-hours automatically) and GPT’s is $1.60 for their “mini” model, which is still more than 3x as expensive.
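For a sense of how that estimate works, here’s the arithmetic with the ~50 cents/1M off-hours figure from above and some made-up token counts (the 20M/25M numbers below are placeholders, not ProleWiki’s actual size):

$ awk 'BEGIN {
    input_mtok  = 20    # millions of input tokens (English source) - a guess
    output_mtok = 25    # millions of output tokens (the 5 translations) - a guess
    price       = 0.50  # $ per 1M tokens, the off-hours figure mentioned above
    printf "rough cost: $%.2f\n", (input_mtok + output_mtok) * price
}'
rough cost: $22.50

so even fairly generous token counts land comfortably under that $50 ceiling.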
There are other Chinese models coming along; I think Xiaomi is making one. They’re also innovating in image and video generation models, but on the text side, one that came out shortly after DeepSeek is the one someone said was too cheap to meter (because it literally uses so few resources to run that it makes no sense to even keep track of usage!), but I haven’t heard more about it since.