I don’t really want companies or anyone else deciding what I’m allowed to see or learn. Are there any AI assistants out there that won’t say “sorry, I can’t talk to you about that” if I mention something modern companies don’t want us to see?

  • SpicyTaint@lemmy.world · 1 day ago

    If you have a good enough NVIDIA card, probably a 1080 Ti or better, download KoboldCpp and a .gguf model from Hugging Face and run it locally.

    The quality is directly tied to your GPU’s VRAM size and how big a model you can load into it, so don’t expect the same results as an LLM running in a data center. For example, I can load a 20 GB .gguf model into a 3090 with 24 GB of VRAM.
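
    Not from the thread, just a rough sketch of the same idea in code: KoboldCpp is the easiest way to do this, but if you’d rather script it, llama-cpp-python loads the same .gguf files. The model filename below is made up; grab whatever fits in your VRAM.

    ```python
    from llama_cpp import Llama

    # Hypothetical quantized model downloaded from Hugging Face; pick one whose
    # file size fits in your card's VRAM with some headroom for the context.
    llm = Llama(
        model_path="models/some-20b-Q4_K_M.gguf",
        n_gpu_layers=-1,   # -1 = push every layer onto the GPU
        n_ctx=4096,        # context window; bigger contexts also eat VRAM
    )

    out = llm("Explain what a .gguf file is in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])
    ```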

    • Cease@mander.xyz · 11 hours ago

      Actually, that’s not 100% true: you can offload a portion of the model into system RAM, which saves VRAM, so you don’t have to spend money on a crazy GPU to run a decent model; it just takes a bit longer. I personally can wait a minute for a detailed answer instead of needing it in 5 seconds, but of course YMMV.
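
      For anyone wondering what that looks like in code: this isn’t how KoboldCpp itself is configured (it has a GPU layers setting in its launcher), just a rough llama-cpp-python sketch of partial offloading, with a made-up model filename.

      ```python
      from llama_cpp import Llama

      # Only part of the model goes to the GPU; the remaining layers stay in
      # system RAM and run on the CPU, which is why generation gets slower.
      llm = Llama(
          model_path="models/some-13b-Q5_K_M.gguf",  # hypothetical file
          n_gpu_layers=20,   # raise until VRAM is nearly full but not over
          n_ctx=4096,
      )

      print(llm("Why is the sky blue?", max_tokens=128)["choices"][0]["text"])
      ```

      The trade-off is exactly as described above: fewer layers on the GPU means less VRAM needed but more time per token.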

      • SpicyTaint@lemmy.world · 6 minutes ago

        Is there a general term for the setting that offloads the model into RAM? I’d love to be able to load larger models.

        I thought CUDA was just supposed to treat VRAM and regular RAM as one resource, but that doesn’t seem to be correct.