Hi all, i am quite an old fart, so i just recently got excited about self hosting an AI, some LLM…
What i want to do is:
- chat with it
- eventually integrate it into other services, where needed
I read about OLLAMA, but it’s all unclear to me.
Where do i start, preferably with containers (but “bare metal”) is also fine?
(i already have a linux server rig with all the good stuff on it, from immich to forjeio to the arrs and more, reverse proxy, Wireguard and the works, i am looking for input on AI/LLM, what to self host and such, not general selfhosting hints)
There’s another community for this: !localllama@sh.itjust.works
Though we mostly discuss the news and specific questions there, beginner questions are a bit more rare.I think you already got a lot of good answers here, LMStudio, OpenWebUI, LocalAI…
I’d like to add KoboldCpp that’s kind of made for gaming/dialogue, but it can do everything. And from my experience it’s very easy to set up and bundles everything into one program.You can host ollama and open-webui on container. If you want to wire search you can connect open-webui to playwright (also container) and searxng (also container) and llm will search the web for answers
I just started using https://lemonade-server.ai/
It has so far been pretty effortless and would be good if you are new to selfhosting
I read about OLLAMA, but it’s all unclear to me.
There’s really nothing more to it than the initial instructions tell you. Literally just a “curl -fsSL https://ollama.com/install.sh | sh”. Then you’re just a “ollama run qwen3:14b” away from having a chat with the model in your terminal.
That’s the “chat with it”-part done.After that you can make it more involved by serving the model via API, manually adding .gguf quantizations (usually smaller or special-purpose modified bootleg versions of big published models) to your Ollama library with a modelcard, ditching Ollama altogether for a different environment or, the big upgrade, giving your chats a shiny frontend in the form of Open-WebUI.
Openwebui is awesome and allows u to use it as an api for all the models u have it hooked up to. Can point it at ollama or any openai api compatible endpoint (like open routers)
Sounds like you already know what you need to know to host Ollama in a Docker container. Ollama is an LLM “engine” - you can interact with LLM models via a CLI or you can integrate them into other services via an API.
To have a web page chat like ChatGPT or others, I installed OpenWebU. I love it! A friend of mine likes LMStudio, which i think is a desktop app, but I don’t know anything about it.
+1 LM Studio, so easy to use and so powerful
LM Studio is a beast.
One of these projects might be of interest to you:
Do note that CPU inference is quite a lot slower than GPU or the well known SAAS providers. I currently like the quantized deepseek models as the best balance between quality of replies and inference time when not using GPU.
there’s a good tutorial to host ollama and a vector database here