• 4 Posts
  • 843 Comments
Joined 1 year ago
Cake day: March 22nd, 2024

  • “When Foreign Companies who are building extremely complex products, machines, and various other ‘things,’ come into the United States with massive Investments, I want them to bring their people of expertise for a period of time to teach and train our people how to make these very unique and complex products, as they phase out of our Country, and back into their land,” the president posted on Truth Social, adding “I don’t want to frighten off or disincentivize Investment into America by outside Countries or Companies. We welcome them, we welcome their employees, and we are willing to proudly say we will learn from them.”

    You know what’s missing there?

    An apology.



  • And any divergence from that is “ruining games” or “being woke,” to the point that we don’t even GET those games outside of the rare case of a game nobody cared about becoming popular.

    I would argue the origin is sales: the publisher wants the sex appeal to sell, so that’s what they put in the game. Early ‘bro’ devs may have been part of this, but the directive from up top is the crux of it.

    And that got so normalized that it became what gamers expect. Now they whine like toddlers when anyone tries to change it, but that’s an existing problem conservative movements jumped on after the fact.


    TL;DR: the root cause is billionaires.

    Like always.



  • Yeah. But it also messes stuff up from the llama.cpp baseline, hides or doesn’t support some features and optimizations, and definitely doesn’t support the more efficient iq_k quants from ik_llama.cpp or its specialized MoE offloading.

    And that’s not even getting into the various controversies around ollama (like broken GGUFs or indications they’re going closed source in some form).

    …It just depends on how much performance you want to squeeze out and how much time you want to spend on the endeavor. Small LLMs are kinda marginal though, so IMO that squeezing only matters if you really want to try; otherwise you’re probably better off spending a few bucks on an API that doesn’t log requests. Either way the client side looks about the same; rough sketch below.
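    That’s because llama.cpp-style servers and most paid APIs expose an OpenAI-compatible endpoint. A minimal sketch in Python, assuming a local server; the URL, port, and model name are placeholders, not anything a specific tool dictates:

    ```python
    # Minimal sketch: query an OpenAI-compatible chat endpoint.
    # Assumptions: a llama.cpp-style server (or hosted API) at BASE_URL;
    # local servers usually ignore the model name.
    import requests

    BASE_URL = "http://localhost:8080/v1"    # placeholder
    API_KEY = "not-needed-for-local"         # a hosted API would need a real key

    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "local-model",          # placeholder
            "messages": [
                {"role": "user", "content": "Explain MoE offloading in two sentences."}
            ],
            "max_tokens": 256,
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
    ```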






  • At risk of getting more technical, ik_llama.cpp has a good built-in web UI:

    https://github.com/ikawrakow/ik_llama.cpp/

    Getting more technical, it’s also way better than ollama: you can run way smarter models on the same hardware.

    For reference, I’m running GLM-4 (667 GB of raw weights) on a single RTX 3090/Ryzen gaming rig, at reading speed, with pretty low quantization distortion.
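    The trick is keeping the attention and dense layers on the GPU while the huge MoE expert tensors sit in system RAM. A rough sketch of that kind of launch, wrapped in Python just for illustration; the model path, the tensor-override rule, and the exact flag names are assumptions, so check `llama-server --help` in your own build, since forks differ:

    ```python
    # Rough sketch of a hybrid GPU/CPU launch for a big MoE model.
    # Assumptions: a llama-server binary built from ik_llama.cpp (or llama.cpp),
    # a placeholder GGUF path, and an override-tensor rule that keeps the
    # expert tensors in system RAM -- verify every flag against your build.
    import subprocess

    cmd = [
        "./build/bin/llama-server",
        "-m", "/models/model-IQ4_K.gguf",   # placeholder model/quant
        "-ngl", "99",                        # offload everything that fits onto the GPU
        "-ot", "exps=CPU",                   # assumption: keep MoE expert tensors in RAM
        "-c", "16384",                       # context window
        "--host", "127.0.0.1",
        "--port", "8080",
    ]
    subprocess.run(cmd, check=True)
    ```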

    And if you want a ‘look this up on the internet for me’ assistant (which you need for them to be truly useful), you need another docker project as well.

    …That’s just how LLM self-hosting is now. It’s simply too hardware-intensive and ad hoc to be easy, smart, and cheap all at once. You can indeed host a small ‘default’ LLM without much tinkering, but it’s going to be pretty dumb, and pretty slow on ollama defaults (the quick check below is one way to measure that).
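    If you do go the easy route, it’s worth sanity-checking the speed instead of guessing. A quick throughput check, assuming whatever server you stood up exposes an OpenAI-compatible endpoint and reports a usage field (llama.cpp-style servers generally do); the URL and model name are placeholders:

    ```python
    # Quick-and-dirty tokens/sec check against a local OpenAI-compatible server.
    # Assumptions: placeholder URL and model name; the "usage" field may be
    # missing on some servers, in which case we only report wall-clock time.
    import time
    import requests

    URL = "http://localhost:8080/v1/chat/completions"   # placeholder

    start = time.time()
    resp = requests.post(
        URL,
        json={
            "model": "local-model",   # placeholder
            "messages": [{"role": "user", "content": "Write three sentences about GPUs."}],
            "max_tokens": 200,
        },
        timeout=300,
    ).json()
    elapsed = time.time() - start

    tokens = resp.get("usage", {}).get("completion_tokens")
    if tokens:
        print(f"{tokens} tokens in {elapsed:.1f}s ({tokens / elapsed:.1f} tok/s)")
    else:
        print(f"Response took {elapsed:.1f}s; server did not report token usage.")
    ```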