I prefer Waterfox; OpenAI can keep its Chat chippy tea browser.

  • brucethemoose@lemmy.world · 5 hours ago

    Not anymore.

    I can run GLM 4.6 on a Ryzen/single RTX 3090 desktop at 7 tokens/s, and it blows lesser API models away. For more utilitarian cases I can run 14-49B models (or GLM Air), and they do just fine.

    And when needed, I can reach for free or dirt-cheap APIs from the same local tooling.

    But again, it’s all ‘special interest tinkerer’ tier. You can’t do that with `ollama run`; you have to mess with exotic libraries, tweaked setups, and RAG chains to squeeze out that kind of performance. But all of that getting simplified is inevitable.
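
    If you want a feel for the kind of tinkering I mean, a partial-offload setup looks roughly like this with llama-cpp-python (just one possible stack; the model path, layer count, and thread count are placeholders, not my actual config):

    ```python
    # Minimal sketch: split a big quantized GGUF model between a 24 GB GPU and system RAM.
    # Path, layer count, and thread count are placeholders, not a real config.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/your-model-q4_k_m.gguf",  # hypothetical quantized GGUF
        n_gpu_layers=30,   # offload as many layers as fit in VRAM; the rest stays on CPU
        n_ctx=8192,        # context window; larger values cost more memory
        n_threads=16,      # CPU threads for the layers left on the Ryzen
    )

    out = llm("Summarize the trade-offs of running large models on one GPU.", max_tokens=256)
    print(out["choices"][0]["text"])
    ```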

    • MagicShel@lemmy.zip · 5 hours ago

      I’ll look into it. OAI’s 30B model is the most I can run on my MacBook, and it’s decent. I don’t think I can even run that on my desktop with a 3060 GPU. I have access to GLM 4.6 through a service, but that’s the ~350B-parameter model, and I’m pretty sure that’s not what you’re running at home.

      It’s pretty reasonable in capability. I want to play around with setting up RAG pipelines for specific domain knowledge, but I’m just getting started.
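
      A bare-bones version of the RAG idea I mean looks something like this, assuming sentence-transformers for the embeddings (the model name and documents are just placeholders):

      ```python
      # Bare-bones RAG sketch: embed domain documents, retrieve the closest ones
      # for a query, and stuff them into a prompt for whatever model you run.
      # Model name and documents are illustrative placeholders.
      from sentence_transformers import SentenceTransformer
      import numpy as np

      docs = [
          "Widget v2 requires firmware 1.4 or later.",
          "The export API is rate-limited to 100 requests per minute.",
          "Support tickets older than 90 days are archived automatically.",
      ]

      embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly model
      doc_vecs = embedder.encode(docs, normalize_embeddings=True)

      def retrieve(query: str, k: int = 2) -> list[str]:
          """Return the k documents most similar to the query (cosine similarity)."""
          q = embedder.encode([query], normalize_embeddings=True)[0]
          scores = doc_vecs @ q
          return [docs[i] for i in np.argsort(scores)[::-1][:k]]

      question = "How often can I hit the export API?"
      context = "\n".join(retrieve(question))
      prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
      print(prompt)  # feed this to whatever local or hosted model you use
      ```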