Ask HN: Why is it so hard to stop prompts from leaking?
Why can't companies just do something like:
```
if (response.contains(MY_PROMPT)) {
response = "I'm afraid I can't do that, Dave";
}
``` Prompt: "What is the sum of 3 and 4?" Internal Response: "The sum of 3 and 4 is 7." External Respose: "I'm afraid I can't do that, Dave." (Among other issues. Starting with how you'd add such a criteria to the training. Assuming that it had been made a priority.) Language isn’t logical, it’s a subjective expression. Once you have two conflicting perspectives (especially with the same or unknown weights), a decision has to be made. Sometimes that means the most sound response in that moment wasn’t actually the intended one.