I have one server with a cheap AMD Instinct MI50. They go for very little on eBay, and the HBM2 gives them really good memory bandwidth. They worked fine with Ollama until support was dropped recently for some weird reason, but a lot of other software still runs on them, and older models still work fine on old Ollama versions.
The other one runs an RTX 3060 12GB, which I use for models that only work on NVIDIA hardware, like Whisper speech recognition.
I tend to use the same models for everything so I don’t have to wait for model loading. Mainly uncensored ones, so they don’t choke when someone says something slightly sexual. I’m in some very open communities, and the standard models are pretty useless there with all their prudishness.
For the frontend I use OpenWebUI, and I also run scripts directly against the models.
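For scripting against the models, a minimal sketch of hitting a local Ollama server’s `/api/generate` endpoint might look like this (the host, port, and model name here are assumptions, the defaults for a stock Ollama install):

```python
import json
import urllib.request

# Default Ollama host/port; adjust if your server listens elsewhere.
OLLAMA_HOST = "http://localhost:11434"

def build_request(model, prompt, host=OLLAMA_HOST):
    """Build the HTTP request for a non-streaming generate call."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def ask(model, prompt):
    """Send the prompt and return the model's text response."""
    req = build_request(model, prompt)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With a model pulled locally you would call something like `ask("llama3", "Say hi")`; the same pattern works from cron jobs or bots without going through a frontend at all.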
This is the way.
…Except for Ollama. It’s starting to enshittify, and I would not recommend it.