Note that this setup runs the 671B model in Q4 quantization at 3-4 TPS; running it in Q8 would need something beefier. To run the 671B model in the original Q8 at 6-8 TPS you’d need a dual-socket EPYC server motherboard with 768GB of RAM.
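
For a rough sense of where those numbers come from, here is a back-of-envelope sketch (weights only; KV cache and runtime overhead add more on top, so treat it as a lower bound):

    # Back-of-envelope weight memory for a 671B-parameter model at
    # different quantization levels (weights only; real requirements
    # are somewhat higher once you add KV cache and overhead).
    PARAMS = 671e9  # total parameters

    for name, bits in [("Q8", 8), ("Q4", 4)]:
        gib = PARAMS * bits / 8 / 2**30
        print(f"{name}: ~{gib:.0f} GiB of weights")

    # Q8: ~625 GiB -> hence the dual-socket EPYC build with 768GB of RAM
    # Q4: ~312 GiB -> why Q4 fits in a much smaller machine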

    • FuckBigTech347@lemmygrad.ml · 1 day ago

      DW. In like two years, companies will start throwing out similar machines. Just keep an eye on second-hand markets and dumpsters.

  • Kawasaki@lemmygrad.ml · 2 days ago

    Amazing! People a few months ago were talking about how AI had hit a “wall”, but recently it seems like that was never the case, as more and more advanced AI becomes accessible to everyone, even locally as in this article (still a bit pricey $$$, and it requires you to be a little tech-savvy, but still!)

    I would like to hijack this comment though to ask your opinion about AGI, ASI, the Singularity and Fully Automated Luxury (Gay Space) Communism. Do you think that if this event (ASI/Technological Singularity) really were to happen, it would help in our fight for communism or deepen the class inequalities?

    I’m personally pretty positive about it, China seems to be the most likely candidate to achieve this level of technology anyway.

    • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP · 2 days ago

      I really like the take from Wang Jian, who founded Alibaba Cloud. His view is that AGI is a meaningless term. In practice, it’s a gradient where model capabilities keep improving along different dimensions, and the models keep becoming more useful.

      In terms of the whole singularity thing, it’s certainly not out of the realm of possibility. For example, stuff like this is already happening where the discovery of better models is becoming automated. The question is where things start to plateau.

      Overall, I’m fairly optimistic as well. I think it’s almost certain that China will drive most of the progress because they have the industries to apply this tech. We already see automation in factories, robots increasingly being used to do manual labour, self-driving trucks, etc. It’s entirely likely that a lot of hard jobs will be automated within a decade or so.

      At the same time, I do expect this tech will have negative consequences in capitalist societies where it will displace labour and drive unemployment. I’d argue that deepening inequalities will necessarily lead to further radicalization of the workers, and would help convince people that capitalism is not sustainable.

    • CriticalResist8@lemmygrad.ml · 2 days ago

      It seems you can run a DeepSeek model very well with a ~$400 GPU. It’s not cheap, but it’s very accessible compared to needing six RTX 4080s (~$1,500 each) to reach 96GB of VRAM. Most motherboards will also handle two GPUs, so you can add a second of the same card later to double your VRAM.

      It’s not gonna be the full model like in this video but it’s still advanced enough for some tasks apparently. Instead of the 671B model you’ll get something like 32 billion parameters (quantized), or 8B parameters unquantized.

      Actually I tried with my 2019 GPU and I could run a model; it was a bit slow but nothing major. It wasn’t a huge model either, though, and you’re also limited in context size for bigger tasks. I think DeepSeek especially, because it’s so efficient, is much easier to run. Even the full DeepSeek model takes up less than a terabyte of space - of course that’s a lot in absolute numbers, but it’s pretty much what any SSD comes with nowadays (and an HDD that size costs almost nothing now).

      The API access is like 50 cents per 1M tokens you put into it (roughly 750k words) in the off-hours too. It’s so cheap that API access will probably soon be too cheap to even keep track of, and we’ll see entirely free cloud models.
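
      As a rough sketch of what using it looks like (assuming DeepSeek’s documented OpenAI-compatible endpoint and the deepseek-chat model name; both could change, and the key here is just a placeholder):

          # Minimal sketch of a DeepSeek API call through its OpenAI-compatible
          # endpoint, using the `openai` Python package. Model name, endpoint
          # and pricing can change; check the current docs before relying on this.
          import os
          from openai import OpenAI

          client = OpenAI(
              api_key=os.environ["DEEPSEEK_API_KEY"],  # your key, set however you like
              base_url="https://api.deepseek.com",     # OpenAI-compatible endpoint
          )

          response = client.chat.completions.create(
              model="deepseek-chat",
              messages=[{"role": "user", "content": "Summarise this paragraph: ..."}],
          )

          print(response.choices[0].message.content)
          print(response.usage)  # prompt/completion token counts, for tracking cost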

      • KrasnaiaZvezda@lemmygrad.ml · 1 day ago

        It’s not gonna be the full model like in this video but it’s still advanced enough for some tasks apparently.

        Those models are Qwen models finetuned by DeepSeek, so there’s no real comparison to the original DeepSeek V3 and R1. And considering how much Qwen has been releasing lately, I’d say anyone thinking about running the distilled versions you talked about might as well try the stock Qwen ones too, with Qwen 30B-A3B being very decent for older machines: it’s a MoE with only 3B active parameters, which can be quite fast, and a Q4_K_M quant can probably fit in some 20GB of RAM/VRAM (I can run Qwen3 30B-A3B-Instruct-UD-Q3_K_XL with 16GB of RAM and some of it offloaded to SSD swap, at 8+ tokens/second).
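
        For anyone wanting to try something like that, here is a minimal sketch using the llama-cpp-python bindings; the GGUF filename is illustrative, and n_gpu_layers just controls how many layers get offloaded to the GPU, so tune it to whatever VRAM you have:

            # Minimal sketch: run a quantized GGUF model with partial GPU offload
            # via llama-cpp-python. The model path is an example; use whichever
            # quant actually fits your RAM/VRAM budget.
            from llama_cpp import Llama

            llm = Llama(
                model_path="Qwen3-30B-A3B-Instruct-UD-Q3_K_XL.gguf",  # example file
                n_gpu_layers=20,  # layers kept on the GPU; the rest stay in system RAM
                n_ctx=8192,       # context window; bigger costs more memory
            )

            out = llm.create_chat_completion(
                messages=[{"role": "user", "content": "Explain MoE models in two sentences."}]
            )
            print(out["choices"][0]["message"]["content"])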

        • CriticalResist8@lemmygrad.ml · 20 hours ago

          Ooh, now I’m stressed I’m gonna have to download and try 20 different models to find the one I like best haha. Do you know some that are good for coding tasks? I also do design stuff (the AI helps walk through the design thinking process with me); it’s kind of a reasoning/logical task but it’s also highly specialized.

      • FuckBigTech347@lemmygrad.ml · 23 hours ago

        (and an HDD that size costs almost nothing now)

        Sure, but you really have to watch out for what kind of hard drives you’re buying. There are a lot of SMR drives out there that are sold as regular drives, and the only way to tell is to look through their data sheets. I find that “regular” HDDs (CMR/PMR) now cost more than SMR drives of similar capacity and spindle speed (probably because nobody wants the SMR ones lol).

        • CriticalResist8@lemmygrad.ml · 20 hours ago

          SMRs are meant for bulk data storage, aren’t they? Which is not to say they can’t write at all, just that their write speeds aren’t as high.

          For AI models specifically the file just lives on the HDD; it gets loaded into VRAM (and spills over into RAM/CPU if you don’t have enough VRAM for it) when you use it. For everything else then probably yeah, I don’t even know what kind of HDDs I have lol. It seems difficult to find 7200 RPM drives compared to 5400 RPM ones, but tbh with the prices of SSDs nowadays I’m probably going to replace my last HDDs with SSDs. If you’re looking for 10TB or huge archival sizes then it’s probably still worth getting an HDD, but for a 1-2TB drive it makes more sense to go SSD I think.

          • FuckBigTech347@lemmygrad.ml · 13 hours ago

            Well, if you’re downloading, copying or creating large models that are several hundred GBs, you’re going to want a normal drive. SMRs have a small staging area, and once that is full the drive has to start re-ordering the data on the platters. Once your drive is in the middle of re-ordering, your write speeds are going to make it look like a failing floppy disk. I had a large file copy operation (>1TB) to a RAID pool of SMRs take like 16 hours. And I also found out that my backup drive is SMR because it took several days to do a full backup from scratch, which caused me to look up its detailed specs.

            It always starts out looking great, but eventually the staging area gets full and then your CPU spends most of its time twiddling its thumbs until a chunk of staging area becomes available again; repeat until the operation is done. The greatness of Shitty Magnetic Recording.

            If you want to know what recording method your drive uses, you grab the model number from:

            # smartctl -a /dev/YOURDRIVE | grep "Device Model"
            

            and look that up in a search engine. It should either lead you to a data sheet or the manufacturer’s website where they list the specifications.

            If durable and large SSDs were more affordable where I am I’d slowly replace all my spinning rust. But right now HDDs are overall still the better option for me, at least for mass storage.

      • 小莱卡@lemmygrad.ml · 2 days ago

        Fwiw I can run the DeepSeek 7B model with an RX 6600 (~$200, should be cheaper nowadays) relatively fast, tho I haven’t used it for complicated tasks.

      • Kawasaki@lemmygrad.ml · 2 days ago

        Oh yeah, I once tried a small 8B LLM locally too. I can’t remember if it was DeepSeek’s but I think it was, and it was writing at like one token every 2-3 seconds; after like 5 minutes, seeing the first message still not done, I realized it was too much to ask of my poor GT 1030. I also heard about the cheap API, many people were delighted, since ChatGPT’s cost much much more than that. Let’s hope DeepSeek’s API becomes free soon! Even now, I’m assuming you can get days’ worth of conversation with just 1€.

        • CriticalResist8@lemmygrad.ml · 2 days ago

          I estimated that to translate ProleWiki from English into 5 languages (the API charges per input token and output token, i.e. what you feed it -> the English content, and what it outputs -> the translated content), it would cost us at most $50 with the DeepSeek API. ChatGPT is so expensive I didn’t even try; it was going to be in the hundreds of dollars lol. The output per 1M tokens with DeepSeek is 50 cents in the off-hours (easy, just run your code during the off-hours automatically) and GPT’s is $1.60 for their “mini” model, which is still over 3x as expensive.
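
          Roughly how that estimate works out, using the off-peak price of about $0.50 per 1M tokens from this thread and a purely hypothetical wiki size:

              # Back-of-envelope translation cost using ~$0.50 per 1M tokens
              # for both input and output (off-peak, per this thread). The
              # source size below is hypothetical.
              PRICE_PER_TOKEN = 0.50 / 1_000_000  # dollars

              source_tokens = 10_000_000  # hypothetical size of the English wiki
              languages = 5

              input_cost = source_tokens * languages * PRICE_PER_TOKEN
              output_cost = source_tokens * languages * PRICE_PER_TOKEN  # translations ~same length

              print(f"~${input_cost + output_cost:.0f} total")  # ~$50 under these assumptions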

          There are other Chinese models coming along; I think Xiaomi is making one. They’re also innovating in image and video generation models, not just text models. One that came out shortly after DeepSeek is the one someone described as too cheap to meter (because it literally uses so few resources to run that it makes no sense to even track usage!), but I haven’t heard more about it since.

    • queermunist she/her@lemmy.ml · 2 days ago

      The “wall” they’re talking about is orthodox AI not getting better despite being fed even more data. DeepSeek sidesteps this by making multiple smaller models that can be switched between for different tasks, instead of the orthodox method of trying to make one “general intelligence” model that works for everything.

      • CriticalResist8@lemmygrad.ml · 2 days ago

        Just reasoning alone almost destroyed the western AI industry overnight. They scrambled to make their own reasoning models in less than a week, but it changed everything.

      • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP · 2 days ago

        I think the next big idea could be models dynamically training sub-models on demand. There are approaches like HRM being explored that already require far less training data and far fewer parameters. Another avenue being explored focuses on creating reusable memory components, as seen with MemOS. It blurs the line between training and operational modes, where the model just continuously learns. What we might see is models that create an agent to learn a new task, and once it’s learned it can be used and shared going forward.

        From what we know, human intelligence is also structured hierarchically, where the brain has regions responsible for different tasks like vision processing, and then there’s a high level reasoning system built on top of that.

  • CriticalResist8@lemmygrad.ml · 2 days ago

    btw do you recommend running a quantized higher-parameter model (locally) or a lower-parameter but unquantized one, if I had to pick between the two?

    • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP · 2 days ago

      I find higher parameter counts tend to produce better output, but it depends on what you’re doing too. For example, for stuff like code generation, accuracy is more important, so even a smaller model that’s not quantized might do better. It also depends on the specific model.
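
      One way to see why it’s mostly a quality trade-off rather than a hardware one: the two options in the question take up roughly the same space (weights only, ignoring KV cache):

          # Weights-only footprint of the two options being compared:
          # a 32B model quantized to ~4 bits vs an 8B model at 16-bit precision.
          def weights_gib(params, bits):
              return params * bits / 8 / 2**30

          print(f"32B @ Q4  : ~{weights_gib(32e9, 4):.0f} GiB")   # ~15 GiB
          print(f"8B  @ FP16: ~{weights_gib(8e9, 16):.0f} GiB")   # ~15 GiB
          # Similar footprint, so the choice usually comes down to which one
          # gives better output for your task.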