2025-07-27 11:33:39

Where this will make sense is inference: we barely fit Q8-quantized Qwen3 Coder and Kimi K2 instances on our H200s. Kimi K2 at Q8 left no room for a KV cache for the context. Could these models fit on a single 8xB200 instance? Probably; we'll try this week.
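A rough back-of-the-envelope version of that fit check, as a sketch only. Everything numeric here is an assumption, not something stated in the post: about 480B total parameters for Qwen3 Coder, about 1T for Kimi K2, 141 GB per H200, 192 GB per B200, 8 GPUs per node, and exactly 8 bits per weight. The helper names are made up for illustration.

```python
# Rough fit check: weight memory at Q8 vs. total GPU memory on one node.
# All figures below are approximate assumptions, not measurements.

def weights_gb(params_billion: float, bits_per_weight: float = 8.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def headroom_gb(params_billion: float, gpu_gb: float,
                num_gpus: int = 8, bits_per_weight: float = 8.0) -> float:
    """Memory left for KV cache, activations and runtime overhead."""
    return gpu_gb * num_gpus - weights_gb(params_billion, bits_per_weight)


# Assumed total parameter counts, in billions.
MODELS = {"Qwen3 Coder": 480.0, "Kimi K2": 1000.0}
# Assumed memory per GPU in GB, for an 8-GPU node.
NODES = {"8xH200": 141.0, "8xB200": 192.0}

for model, params in MODELS.items():
    for node, gpu_gb in NODES.items():
        left = headroom_gb(params, gpu_gb)
        print(f"{model} @ Q8 on {node}: {left:.0f} GB left after weights")
```

Under these assumptions Kimi K2 leaves on the order of a hundred GB across eight H200s before any runtime overhead, and real Q8 formats store per-block scales on top of the 8 bits, so the practical headroom is smaller still. That is in line with having no room left for a KV cache.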


