They’re OK. It’s kinda amazing how much lossy data can be compressed into a 12GB model. They don’t really start to get comparable to the large frontier models until 70B+ parameters, and you would need serious hardware to run those.
Oh I mean with setup, like I can download ollama and a basic model fine enough and get a generic chat bot. But if I want something that can scan through PDFs with direct citation (like the Nvidia LLM) or play a character then suddenly its all git repositiories and cringey youtube tutorials.
They’re OK. It’s kinda amazing how much lossy data can be compressed into a 12GB model. They don’t really start to get comparable to the large frontier models until 70B+ parameters, and you would need serious hardware to run those.
Oh I mean with setup, like I can download ollama and a basic model fine enough and get a generic chat bot. But if I want something that can scan through PDFs with direct citation (like the Nvidia LLM) or play a character then suddenly its all git repositiories and cringey youtube tutorials.
That would be nice. I don’t know how to do that other than programming it yourself with something like langgraph.