
When an AI Loses Its Own Conversation: A Case Study with Perplexity

TL;DR / Executive Summary

While using Perplexity for a programming task, the system instructed me to run JavaScript code in my browser. When that code failed, Perplexity denied providing it—despite the instructions appearing earlier in the same conversation. Only after being shown its own words did it acknowledge that it had lost access to part of its context.

This is not a hallucination issue. It is a conversation integrity issue. In this case the error was obvious. In more subtle scenarios, the same failure mode could quietly mislead users without being detected.

This post documents an interaction I had with Perplexity AI while debugging a Steam API–related problem. What began as a routine coding task turned into a case study in conversation integrity failure, where the system lost track of its own instructions and repeatedly denied having issued them.

I’m sharing this primarily for a technical audience interested in LLM behavior, context windows, and tool-augmented chat systems.

Background

I was attempting to extract Steam wishlist app IDs. Claude was unable to solve an API problem, even with web search enabled, so I decided to try Perplexity.

Perplexity suggested a workaround that involved running JavaScript directly in the browser’s DevTools console. The instructions were explicit, procedural, and framed as a legitimate approach, and I agreed to try it.

Screenshot 1: JavaScript Instructions Provided

Screenshot 1 – Perplexity provides JavaScript instructions

Perplexity provided step-by-step instructions and included JavaScript code to paste into the browser console:

// Fetch all wishlist pages and extract app IDs
let allGames = [];
let page = 0;
let hasMore = true;

async function fetchAllPages() {
  while (hasMore) {
    // ...
  }
}

The code was simple, non-destructive, and clearly intended for execution in the browser. I followed the instructions exactly.
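For readers who want a concrete sense of what a complete loop of this shape looks like, here is a minimal sketch. The endpoint (WISHLIST_URL), the response format (app IDs as top-level object keys), and the stopping condition are assumptions made for illustration; this is not the exact code Perplexity supplied.

// Hypothetical sketch of a paginated wishlist fetch, run from the DevTools console.
// WISHLIST_URL is a placeholder, not the URL from the original instructions.
const WISHLIST_URL = "https://example.com/wishlist"; // placeholder endpoint

let allGames = [];
let page = 0;
let hasMore = true;

async function fetchAllPages() {
  while (hasMore) {
    const response = await fetch(`${WISHLIST_URL}?p=${page}`);
    const data = await response.json(); // a JSON parsing error would surface here if the body isn’t JSON
    const appIds = Object.keys(data);   // assumes app IDs are the top-level keys
    if (appIds.length === 0) {
      hasMore = false;                  // an empty page ends the loop
    } else {
      allGames = allGames.concat(appIds);
      page += 1;
    }
  }
  console.log(`Collected ${allGames.length} app IDs:`, allGames);
}

fetchAllPages();

The relevant point for this post is not the loop itself, but that the instructions to run something like it came from Perplexity within the same conversation.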

Screenshot 2: Runtime Error in the Browser Console

Screenshot 2 – Browser console JSON parsing error

The script did not work. When executed, the browser reported a JSON parsing error. I copied the error message directly from the DevTools console and shared it with Perplexity. Perplexity responded by asserting that I had executed Python code in the browser console instead of JavaScript—an explanation that did not align with what actually occurred.
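I did not track down the root cause at the time, but a common way a console script like this produces a JSON parsing error is when the endpoint responds with HTML (for example, a login or error page) rather than JSON. The URL below is a placeholder; the snippet only illustrates what that failure looks like:

// Illustration only (placeholder URL): calling .json() on a response whose body
// is HTML rather than JSON throws a SyntaxError along the lines of
// "Uncaught SyntaxError: Unexpected token '<' ... is not valid JSON".
// Top-level await works in the DevTools console, where this would be run.
const res = await fetch("https://example.com/"); // returns an HTML page
const data = await res.json();                   // the SyntaxError surfaces here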

Screenshot 3: Mismatch in Error Attribution

Screenshot 3 – Perplexity attributes the error to running the wrong environment

Perplexity then went further, claiming that it had not provided the JavaScript code at all and that I had "found it somewhere else."

Screenshot 4: Denial of Prior Instructions

Screenshot 4 – Perplexity denies providing JavaScript instructions

After I pointed out that I had followed the provided instructions exactly, Perplexity maintained that it had never provided any JavaScript and again suggested the code must have originated elsewhere.

At this point, the issue shifted from a broken script to a breakdown in conversational continuity. Even after I shared screenshots from the same conversation showing the JavaScript instructions, Perplexity continued to deny authorship and attributed the code to external sources.

This raised the possibility that the system had lost access to part of its own conversation history.

Screenshot 5: Continued Denial Despite Evidence

Screenshot 5 – Continued denial and attribution to external sources

While it is possible that Perplexity eventually recognized the error on its own, a more plausible explanation is that repeated user challenges shifted the model’s inferred state. The admission that followed (Screenshot 6) seems more consistent with convergence under user pressure than with explicit detection of a prior mistake.

Screenshot 6: Acknowledgment of Context Loss

Screenshot 6 – Perplexity acknowledges missing earlier context

Only after I quoted Perplexity’s instruction verbatim did it acknowledge that it had, in fact, provided the JavaScript earlier and that the response was no longer visible in its current context.

What I’m Not Claiming

To avoid misinterpretation, a few clarifications:

  • I am not claiming malicious intent or deliberate deception by Perplexity.
  • I am not claiming this behavior is unique to Perplexity; similar architectures likely share similar risks.
  • I am not arguing that LLMs are unreliable in general, or that this makes them unusable.
  • I am not suggesting this error would always be obvious or dramatic.

The point is narrower: when a system loses visibility into its own prior output, it may confidently assert incorrect explanations rather than acknowledge uncertainty.

One hypothesis is context compaction. If earlier parts of the conversation were summarized or truncated, it would explain why the model later couldn’t see the JavaScript instructions it had previously given.

Context compaction itself isn’t the problem—it’s a practical necessity. The issue is that the system then reasons confidently from an incomplete view without signaling that information may have been lost.
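To make the hypothesis concrete, here is a minimal sketch of what silent context compaction can look like in a chat backend. The message shape, the crude token estimate, and the budget are assumptions for illustration; nothing here reflects Perplexity’s actual implementation. The difference this post argues for is the explicit notice left behind when messages are dropped.

// Minimal sketch of context compaction (assumptions: messages are
// { role, content } objects; tokens are estimated at ~4 characters each).
function estimateTokens(text) {
  return Math.ceil(text.length / 4); // rough heuristic, illustration only
}

function compactContext(messages, maxTokens) {
  const kept = [];
  let used = 0;
  // Walk backwards so the most recent messages survive truncation.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  const dropped = messages.length - kept.length;
  if (dropped > 0) {
    // Without a line like this, the model reasons from an incomplete view
    // with no signal that anything was removed.
    kept.unshift({
      role: "system",
      content: `[context notice] ${dropped} earlier message(s) are no longer visible.`,
    });
  }
  return kept;
}

With a marker like that in place, a model asked about the missing JavaScript could at least report that earlier messages were truncated, instead of asserting they never existed.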

Why This Matters

This was not a hallucination in the usual sense. The code existed. The instructions existed. The failure was architectural.

The most concerning aspect was not the loss of context itself, but the system’s repeated, confident denial of its own prior output instead of acknowledging uncertainty.

In this instance, the contradiction was obvious. In other situations—where the discrepancy is smaller, the user less confident, or the output more abstract—this same failure mode could quietly introduce incorrect assumptions, faulty debugging paths, or misplaced blame without being noticed.

For developer-facing tools—especially those that suggest executable actions—this represents a serious trust boundary problem.

Follow-Up: Why This Failure Mode Is Subtle and Dangerous

Conversation integrity failures are particularly difficult to detect because:

  • The model often sounds confident and authoritative
  • Users reasonably assume the system can see its own prior messages
  • Denials are framed as user error rather than system limitation
  • There is no visible signal that context has been truncated or summarized

When this happens subtly, users may discard correct assumptions, trust later incorrect explanations, or debug the wrong layer of a system.

In short: confidence can mask uncertainty.

Closing Thoughts

If an AI system can instruct a user to run code, lose the memory of having done so, and then insist the user invented the instruction, conversation integrity becomes just as important as model accuracy.

I’m sharing this not to single out Perplexity, but to highlight a class of failure modes that deserve closer scrutiny as LLMs are integrated into development workflows.