AI That Can’t Forget: How Memory Limits Affect AI Responses

Many of us have experienced something like this.

You use an AI system to help prepare a strategic project for the launch of a new medication.

At the beginning of the conversation, you provide the following information:

“The newly approved indication is for patients at high cardiovascular risk.”

Later, you update it:

“Actually, the recommendation now includes patients at moderate risk as well.”

Finally, you ask the system:

“What is the current indication for the drug?”

But the AI responds with the older information — even though the updated information appears immediately before the question.

At first glance, this may seem like a minor glitch. However, recent research suggests this behavior is not an occasional bug. Instead, it reveals a structural limitation in the memory mechanisms of modern large language models (LLMs).

A study published in 2025 by Wang and Sun investigated this exact phenomenon. Their findings show that advanced language models often struggle to “forget” outdated information, even when they receive clear updates in the conversation [1].

This effect is known as PROACTIVE INTERFERENCE.

And it can directly affect applications such as:

  • AI copilots used in marketing strategy
  • clinical data analysis
  • scientific content generation
  • AI assistants integrated into CRM or omnichannel platforms

 

1. Working Memory Limits in Large Language Models

To understand why this happens, we need to examine a fundamental challenge in modern language models: working memory limitations.

The temporary memory that a model can use while generating a response is its context window, whose size is measured in tokens. Even as context windows grow larger (32k, 128k, or even more than one million tokens), models still make errors when retrieving information that is clearly present in the conversation.

The Wang and Sun study suggests that the real limitation may not simply be context length, but rather something called PROACTIVE INTERFERENCE [1].

This phenomenon has long been studied in human cognitive psychology, and now evidence suggests it also occurs in Transformer-based* language models.

*Transformer models are a neural network architecture introduced in 2017 that revolutionized natural language processing (NLP). Their key mechanism — self-attention — allows models to evaluate relationships between words across long sequences of text. This architecture forms the backbone of most modern LLMs.

 

2. What Researchers Previously Believed

Historically, errors in language model memory were explained in a simpler way. The prevailing assumption was that models failed because they could not locate the relevant information within very long contexts. Evaluations built on this assumption are often referred to as “needle in a haystack” tests.

In these tests:

  • a critical piece of information (the “needle”) is placed somewhere in a long document (the “haystack”)
  • the model must retrieve it later

If the model failed, researchers concluded that it could not effectively search long contexts.

But Wang and Sun proposed a different hypothesis. Perhaps the issue is not finding information, but rather managing multiple similar pieces of information that compete with each other.

 

3. Proactive Interference

In cognitive psychology, proactive interference occurs when older memories interfere with the retrieval of newer ones.

A familiar example: imagine you’ve changed your password several times. When trying to remember the current password, your brain often retrieves one of the older versions instead.

This happens because multiple memories compete with one another.

The researchers asked a simple question: Could something similar happen inside language models?

 

4. The Experiment

To test this hypothesis, the researchers created a benchmark called PI-LLM. The setup is simple. The model receives a sequence of updates about the same attribute.

Example:

city = Paris
city = London
city = Berlin
city = Tokyo

The model is then asked: “What is the current city?”

The correct answer is Tokyo, because it is the most recent update.

Here is the key detail: the correct information is placed immediately before the question. This means the model does not need to search through the context.

If the model makes a mistake, the problem is not distance — it is interference from earlier information.
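The setup above can be sketched in a few lines of Python. This is a simplified, single-key illustration based only on the example in this article, not the benchmark's actual code or prompt format; the function names `build_update_prompt` and `classify_answer` are hypothetical.

```python
def build_update_prompt(key, values, question=None):
    """Return (prompt, expected_answer) for one key that is updated several times.

    The last element of `values` is the current value. It is written
    directly before the question, as in the setup described above.
    """
    if not values:
        raise ValueError("values must contain at least one update")
    lines = [f"{key} = {value}" for value in values]
    question = question or f"What is the current {key}?"
    prompt = "\n".join(lines) + "\n\n" + question
    return prompt, values[-1]


def classify_answer(answer, values):
    """Label a model answer as 'current', 'stale', or 'other' (hypothetical scoring rule)."""
    text = answer.lower()
    if values[-1].lower() in text:
        return "current"
    if any(value.lower() in text for value in values[:-1]):
        return "stale"  # the answer repeats an earlier, outdated value
    return "other"


cities = ["Paris", "London", "Berlin", "Tokyo"]
prompt, expected = build_update_prompt("city", cities)
print(prompt)
print("expected:", expected)
```

Increasing the length of `values` while keeping the question unchanged is how the amount of interference would be varied in this sketch.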

 

5. What Happens When Interference Increases

Researchers gradually increased the number of previous updates. For example: 3 updates, 10 updates, 50 updates, 100 updates.

The results were striking. Model accuracy declines steadily as the number of competing updates increases. Even more interesting, the decline is approximately log-linear: accuracy falls roughly in proportion to the logarithm of the number of competing updates, rather than in proportion to the number itself.
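Read literally, an approximately log-linear decline can be written as follows. The symbols are assumptions introduced here for illustration: n is the number of competing updates, and a and b are model-specific constants whose values this article does not report.

```latex
\mathrm{accuracy}(n) \approx a - b \log n, \qquad b > 0
```

Under this form, and only within the range where it stays between 0 and 1, going from 10 to 100 updates would cost about as much accuracy as going from 1 to 10, because \log 100 - \log 10 = \log 10 - \log 1.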

Figure 1 illustrates this effect. Each line represents a different language model. The horizontal axis shows the number of updates competing for the same information. The vertical axis shows the probability of retrieving the most recent update.

 

Figure 1. Relationship between the number of competing updates and the accuracy with which large language models retrieve the most recent information. Each line (m1–m6) represents a different model evaluated in the study. As the number of successive updates associated with the same key increases, the models’ ability to retrieve the latest value declines progressively, following an approximately log-linear pattern. This result highlights how proactive interference, well known in human psychology, also affects the working memory of modern LLMs, causing earlier information to compete with more recent updates. Adapted from Wang & Sun (2025) [1].

 

As interference grows, accuracy drops significantly — demonstrating the impact of proactive interference on LLM working memory.

In many cases, the model returns an older value, such as: “city = London”, even though the most recent update is Tokyo.

This pattern closely resembles what researchers observe in human memory experiments.

 

6. Do Larger Models Solve the Problem?

The researchers evaluated models of different sizes, from small models to very large ones.

Larger models do show better initial performance.

However, the same accuracy decline still occurs as interference increases.

This suggests the limitation is not simply a matter of model size.

Instead, it may be related to the core architecture of Transformer models.

 

7. Why This Happens in Transformer Models

In Transformers, each token can attend to many other tokens in the context through the self-attention mechanism.

When multiple similar pieces of information appear in the context, they all continue competing for attention. Crucially, the model lacks a built-in mechanism to actively suppress or erase outdated information.

As a result, earlier representations can continue influencing the final output.

In simple terms: The model remembers too much — but struggles to forget.
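A toy calculation can make the idea of competition for attention more concrete. The sketch below is not the analysis from the study and not how a real Transformer scores tokens (real models use many layers, heads, and learned scores). It assumes a single query, gives every outdated update the same score, and gives the newest update a made-up `recency_bonus`; all names and numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_on_latest(num_updates, recency_bonus=1.0):
    """Share of attention that lands on the newest update in this toy setting.

    All older updates get a score of 0.0 (they match the key equally well);
    the newest update gets an extra `recency_bonus`. Both values are invented.
    """
    scores = [0.0] * (num_updates - 1) + [recency_bonus]
    return softmax(scores)[-1]


for n in (3, 10, 50, 100):
    print(n, round(weight_on_latest(n), 3))
```

Because nothing removes the older entries, each of them keeps a share of the softmax weight, so the share left for the newest value shrinks as more updates accumulate.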

 

8. Implications for AI Applications

This phenomenon has important implications for real-world AI systems.

For example:

  • Long conversational agents

When many updates about the same topic appear during a conversation, the model may retrieve outdated information.

  • Coding assistants

If a variable is redefined multiple times, the model may rely on an older value.

  • RAG* systems

When multiple similar documents are retrieved, semantic competition may occur.

*RAG: Retrieval-Augmented Generation. RAG systems are AI architectures that combine a Large Language Model (LLM) with an information retrieval system—such as internal databases, document repositories, or the web. The goal is to: (1) retrieve relevant information from a knowledge base; (2) enrich the user’s query and the model’s context with this retrieved data; and (3) enable the LLM to generate responses that are more accurate, evidence-based, and grounded in reliable sources—helping reduce the risk of “hallucinations,” or answers that are incorrect or fabricated.

In other words, simply increasing context size does not fully solve the memory problem.
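One precaution an application might take is to track updates outside the model and pass only the current state into the prompt, so that outdated values never enter the context. This is an assumption offered for illustration, not a technique this article attributes to Wang and Sun, and it only applies when updates arrive in a structured form. The function names and the example values are hypothetical.

```python
def latest_values(updates):
    """Collapse an ordered list of (attribute, value) updates to the final state.

    Later updates overwrite earlier ones; attributes keep their first-seen order.
    """
    state = {}
    for attribute, value in updates:
        state[attribute] = value
    return state


def render_state(state):
    """Format the current state as 'attribute = value' lines for a prompt."""
    return "\n".join(f"{attribute} = {value}" for attribute, value in state.items())


updates = [
    ("indication", "patients at high cardiovascular risk"),
    ("indication", "patients at high or moderate cardiovascular risk"),
]
print(render_state(latest_values(updates)))
```

Only the final line for each attribute reaches the model in this sketch, which removes the competing older values rather than asking the model to ignore them.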

 

9. Why This Matters for AI in Healthcare and Pharma

For teams deploying AI across marketing, medical affairs, analytics, or omnichannel engagement, this limitation is important to understand.

As organizations increasingly rely on AI for:

  • scientific content generation
  • marketing strategy support
  • medical communication
  • data analysis

understanding how AI systems handle competing information becomes critical.

Improving memory robustness in language models will be essential for building more reliable AI tools in healthcare and life sciences.

 

References:

  1. Wang C, Sun JV. Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length. 2025 [cited 2026 Mar 12]; Available from: https://openreview.net/pdf?id=y8jS7mDurI

How to cite this article:

KACHI. AI That Can’t Forget: How Memory Limits Affect AI Responses. São Paulo: KACHI Comunicação Científica, March 12, 2026. Available from: https://www.kachi.com.br/blog/ .