I don’t really want companies or anyone else deciding what I’m allowed to see or learn. Are there any AI assistants out there that won’t say “sorry, I can’t talk to you about that” if I mention something modern companies don’t want us to see?
Actually, that's not 100% true. You can offload part of the model into system RAM instead of keeping all of it in VRAM, which saves you from buying a crazy expensive GPU and still lets you run a decent model. It just takes a bit longer. Personally, I can wait a minute for a detailed answer instead of needing it in 5 seconds, but of course YMMV.
Is there a general term for the setting that offloads the model into RAM? I’d love to be able to load larger models.
I thought CUDA was supposed to just treat VRAM and regular RAM as one resource, but that doesn’t seem to be correct.
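As far as I know there isn't one universal name, but in llama.cpp-based tools the setting is usually called GPU layer offloading: the `--n-gpu-layers` (`-ngl`) option, or `n_gpu_layers` in llama-cpp-python. You choose how many of the model's layers go into VRAM, and the rest stay in system RAM and run on the CPU. CUDA does have managed (unified) memory, but a program has to opt into it, so it doesn't happen automatically. Here is a rough sketch for guessing a starting value. The function name `layers_that_fit` and the `reserve_gb` headroom are made up for illustration, and it assumes every layer is about the same size, which is only approximately true.

```python
def layers_that_fit(model_size_gb, n_layers, vram_gb, reserve_gb=1.5):
    """Estimate how many model layers fit in VRAM.

    Assumptions (not exact): all layers are roughly the same size, and
    reserve_gb of VRAM is kept free for the context cache and scratch
    buffers. Treat the result as a starting point, not a guarantee.
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")

    per_layer_gb = model_size_gb / n_layers      # average size of one layer
    usable_gb = max(vram_gb - reserve_gb, 0.0)   # VRAM left after headroom

    # Never report more layers than the model actually has.
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    # Hypothetical numbers: a 40 GB model file with 80 layers on a 24 GB card.
    print(layers_that_fit(40, 80, 24, reserve_gb=2))
```

If the model still runs out of memory with that number, lower it a few layers at a time. Longer contexts need more headroom, so the right value depends on your settings.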