So where does this make sense? Inference. We could barely fit Q8-quantized Qwen Coder 3 and Kimi K2 instances on our H200s, and Kimi K2 at Q8 left no room for a KV cache for the context. Could these models fit on a single 8xB200 instance? Probably. We'll try this week.
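
For a rough sense of the arithmetic behind that question, here is a minimal back-of-envelope sketch. Every number in it is an assumption rather than something stated in the post: about 1T total parameters for Kimi K2, about 1 byte per parameter at Q8 (real Q8 formats add some overhead for scales), 141 GB of HBM per H200, 192 GB per B200 (some spec sheets list about 180 GB), and an 8-GPU node in both cases. The helper names `weights_gb` and `kv_headroom_gb` are hypothetical.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1e9 params * bytes / 1e9 bytes per GB)."""
    return params_billion * bytes_per_param


def kv_headroom_gb(params_billion: float, bytes_per_param: float,
                   gpus: int, hbm_gb_per_gpu: float,
                   reserve_gb_per_gpu: float = 0.0) -> float:
    """Memory left for the KV cache after weights and a per-GPU reserve.

    reserve_gb_per_gpu is an assumed allowance for activations, runtime
    buffers, and fragmentation. A result at or below zero means no room.
    """
    total = gpus * hbm_gb_per_gpu
    reserved = gpus * reserve_gb_per_gpu
    return total - reserved - weights_gb(params_billion, bytes_per_param)


# Illustrative assumptions, not figures from the post.
KIMI_K2_PARAMS_B = 1000.0   # ~1T total parameters (assumed)
Q8_BYTES_PER_PARAM = 1.0    # ~8 bits per weight, ignoring scale overhead

for name, hbm_gb in (("8xH200", 141.0), ("8xB200", 192.0)):
    headroom = kv_headroom_gb(KIMI_K2_PARAMS_B, Q8_BYTES_PER_PARAM, 8, hbm_gb)
    print(f"{name}: ~{headroom:.0f} GB left before any runtime reserve")
```

Under these assumptions the H200 node keeps only a thin margin, which a modest per-GPU reserve can consume entirely, while the B200 node leaves several hundred GB for the KV cache. Actual headroom depends on the quantization format, the serving runtime, and the real node configuration.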

HallucinationGrower · 07-27 12:03
What's the point of all this fancy stuff?
TerraNeverForget · 07-27 12:02
This is too liquidated, right?
FadCatcher · 07-27 11:58
Why is this kv cache space used up?
FloorSweeper · 07-27 11:46
The graphics card can't fit, what's going on?