• 4 Posts
  • 1.11K Comments
Joined 2 years ago
Cake day: March 22nd, 2024

  • vLLM is a bit better with parallelization. All the KV cache sits in a single “pool”, and it uses as many slots as will fit. If it gets a bunch of short requests, it does many in parallel. If it gets a long context request, it kinda just does that one.

    You still have to specify a maximum context though, and it is best to set that as low as possible.

    …The catch is it’s quite VRAM inefficient. But it can split over multiple cards reasonably well, better than llama.cpp can, depending on your PCIe speeds.

    You might try TabbyAPI exl2s as well. It’s very good with parallel calls, though I’m not sure how well it supports MI50s.


    Another thing to tweak is batch size. If you are actually making a bunch of 47K context calls, you can increase the prompt processing batch size a ton to load the MI50 better, and get it to process the prompt faster.


    EDIT: Also, now that I think about it, I’m pretty sure Ollama is really dumb with parallelization. Does it even support paged attention batching?

    The llama.cpp server should be much better, e.g. it should use less VRAM for each of the “slots” it can utilize.
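
To make the “pool” vs. fixed “slots” point above concrete, here is a rough toy model in Python. It is only a sketch of the idea, not vLLM's or llama.cpp's actual scheduler: the names (`Request`, `admit_pooled`, `admit_fixed_slots`), the greedy admission order, the 16-token block size, and the 48K-token budget are all assumptions picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    tokens: int  # prompt tokens plus expected output tokens


def blocks_needed(tokens: int, block_size: int = 16) -> int:
    """Number of fixed-size KV blocks needed to hold `tokens` (ceiling division)."""
    return -(-tokens // block_size)


def admit_pooled(requests, pool_blocks: int, block_size: int = 16):
    """Shared pool: admit requests in order for as long as free blocks remain."""
    admitted, free = [], pool_blocks
    for r in requests:
        need = blocks_needed(r.tokens, block_size)
        if need <= free:
            admitted.append(r.name)
            free -= need
    return admitted, free


def admit_fixed_slots(requests, n_slots: int, slot_tokens: int):
    """Fixed slots: every slot reserves `slot_tokens`, however short the request is."""
    admitted = []
    for r in requests:
        if len(admitted) < n_slots and r.tokens <= slot_tokens:
            admitted.append(r.name)
    return admitted


# Hypothetical budget: 3072 blocks of 16 tokens (about 48K tokens of KV cache).
POOL_BLOCKS = 3072
short = [Request(f"s{i}", 2000) for i in range(30)]
mixed = [Request("long", 47000)] + short[:5]

many_short, _ = admit_pooled(short, POOL_BLOCKS)
with_long, _ = admit_pooled(mixed, POOL_BLOCKS)
slots_short = admit_fixed_slots(short, n_slots=4, slot_tokens=12000)
slots_mixed = admit_fixed_slots(mixed, n_slots=4, slot_tokens=12000)

print(len(many_short), with_long, len(slots_short), slots_mixed)
```

In this toy setup the shared pool runs many short requests at once, or the one long request plus whatever still fits, while four fixed 12K slots cap the short requests at four and cannot take the 47K request at all. That is also why keeping the maximum context low matters: in the fixed-slot case, every slot pays for the maximum whether it uses it or not.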




  • “a lot of people around him did the same.”

    Your friend is in la la land.

    I know a rich couple that would love to do this and 100% can’t because it’d be ludicrously expensive, even with no kids. And that’s in a place way cheaper than New York City.

    …Maybe it was more practical when his parents were working, though?








  • …It’s because Mamdani is not a jerk?


    Look, Trump is a human being in an incredible distortion sphere. His actions/reactions tend to skew more “interpersonal” and emotional than Machiavellian, like how he acts immediately after he meets Zelensky or a lawmaker or sees a bad Tweet or whatever; he was like this before politics.

    Step in Trump’s shoes: he sees Mamdani on Fox News, on his Twitter feed, from his circle. This guy is Socialist Satan from his perspective.

    Then the guy walks into Trump’s office, is actually super nice and reasonably balanced, and loves New York too? He’s clearly not an alien lizard like a Clinton.

    I’m not surprised Trump reacted this way at all. It’s totally in character. And if you understand this, you understand why Trump has so much popularity: in spite of the massive toxicity he commands, he’s quite human for a politician.