Gist: Intercom’s AI team introduces Low-Rank Key-Value (KV) attention, a technique that cuts KV cache memory by roughly 45–53% while also improving model performance and training efficiency. The post frames it as a new attention mechanism for Fin’s model architecture.
Signal reason: Primary subject is a new technical capability and model architecture change.
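The core idea behind low-rank KV attention can be sketched briefly: instead of caching full per-head keys and values for every token, the model caches one shared low-rank latent per token and reconstructs K and V from it with up-projections. A minimal NumPy sketch follows; the dimensions, weight names, and the rank choice (picked so the cache roughly halves, in line with the reported ~45–53% savings) are illustrative assumptions, not details from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- not from the post.
d_model, n_heads, d_head = 1024, 8, 64
rank = 512          # low-rank latent width, chosen to halve the cache
seq_len = 16

# Standard attention caches K and V per token: 2 * n_heads * d_head floats.
# Low-rank KV attention caches only a `rank`-wide latent per token and
# reconstructs K and V from it on demand.
W_down = rng.standard_normal((d_model, rank)) / np.sqrt(d_model)
W_k_up = rng.standard_normal((rank, n_heads * d_head)) / np.sqrt(rank)
W_v_up = rng.standard_normal((rank, n_heads * d_head)) / np.sqrt(rank)

x = rng.standard_normal((seq_len, d_model))

# Only this latent is stored in the KV cache.
kv_latent = x @ W_down                   # (seq_len, rank)

# Full K and V are rebuilt from the cached latent when attention runs.
K = kv_latent @ W_k_up                   # (seq_len, n_heads * d_head)
V = kv_latent @ W_v_up                   # (seq_len, n_heads * d_head)

full_cache = seq_len * 2 * n_heads * d_head   # floats for standard K+V cache
lowrank_cache = seq_len * rank                # floats for the latent cache
print(f"cache reduction: {1 - lowrank_cache / full_cache:.0%}")
```

With these toy numbers the latent cache is half the size of a standard K+V cache; the exact savings in the post would depend on the real rank and head configuration.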
