Do Language Models Track Entities Across State Changes?

Abstract

Entity tracking (ET), the ability to keep track of states, is a fundamental skill that underlies complex reasoning. An increasing amount of work investigates how transformer language models (LMs) solve entity binding without state changes; however, there is limited understanding of how non-toy LMs address ET problems of realistic difficulty expressed in natural language. To this end, we investigate the mechanisms underlying ET in more complex scenarios featuring multiple state-changing operations. We find that LMs do not build world states incrementally across tokens or local states across layers, but simply retrieve and aggregate relevant information at the last token, once the query becomes evident. We further investigate the mechanisms of individual operations (PUT, REMOVE, MOVE) to characterize this non-incremental ET mechanism. Surprisingly, LMs implement the REMOVE operation with a fragile global suppression tag; this global removal mechanism predicts various failure modes that we confirm behaviorally. We provide a mechanistic solution, nullifying this tag, that partially addresses this issue. Overall, our findings reveal that language models solve a fundamentally sequential task using a non-sequential strategy, illustrating how interpretability methods can be leveraged to deepen insights from behavioral evaluations, from predicting failures to explaining successes.

Authors
  • Tang, Zilu
  • Zhao, Qiao
  • Franco, Gabriel
  • Wijaya, Derry
  • Mueller, Aaron
  • Schuster, Sebastian
  • Kim, Najoung
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
Forty-Third International Conference on Machine Learning (ICML 2026)
Divisions
Data Mining and Machine Learning
Subjects
Artificial Intelligence
Language Processing
Event Location
Seoul, South Korea
Event Type
Conference
Event Dates
6 Jul - 11 Jul 2026
Date
2026