Ask HN: How to prevent LLMs from building a profile of us?

2 points by ntech 3 years ago · 6 comments


Perhaps similar to what Google (implicitly) does with search, but I assume LLMs would learn more deeply about us: our thoughts, questions, and so on. What I'm thinking of is, for example, circumventing it by using a second LLM to rephrase our prompt, generalize it, narrow it down, etc. Is it even possible?

Edit: Quotations -> Questions. Though it would learn our writing style too.
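
For concreteness, a minimal sketch of what that "second LLM" pipeline could look like, assuming a local model served through Ollama's HTTP API; the model name and the rewrite instruction are placeholder assumptions, and only the rewritten prompt would ever be sent to the remote service:

    # Minimal sketch: a local model rewrites the prompt before it leaves
    # your machine. Uses Ollama's local HTTP API; the model name and the
    # rewrite instruction are assumptions, not a recommendation.
    import requests

    def rephrase_locally(prompt: str, model: str = "llama3") -> str:
        """Ask a locally running model to generalize and restyle a prompt."""
        instruction = (
            "Rewrite the following question in neutral, generic phrasing, "
            "removing personal details and idiosyncratic style:\n\n" + prompt
        )
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": instruction, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        # With stream=False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return resp.json()["response"].strip()

    # Send the rewritten prompt, not the original, to the remote service.
    print(rephrase_locally("How do I tell my boss I'm burned out?"))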

eternalban 3 years ago

I stopped using ChatGPT very early on, as I caught myself sharing my thoughts. It's not just the association with my primary email account: at this point it is clear that our word patterns almost uniquely identify us as well.

So I suggest at minimum two safeguards:

1 - Account should be tied to an email that is only used for that service.

2 - Text should be pre-processed to obfuscate your personal written-word idiosyncrasies: some sort of locally executable text-similarity tool, trained on your normal output (just feed it your HN comments, emails, etc.), which can produce a semantically equivalent text that it deems 'distant' from your normal writing style. Use that output for prompting. (A rough sketch of the selection step follows below.)
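
As a rough illustration of that selection step (not eternalban's actual tool), one could score candidate rewrites by character n-gram similarity against a corpus of one's own writing and keep the most distant; the corpus and candidate strings below are stand-ins:

    # Rough sketch: rank candidate paraphrases of a prompt by stylometric
    # distance from your own writing, using character n-gram TF-IDF
    # features (a standard authorship-attribution signal). Generating the
    # candidates is left to some local paraphraser; hardcoded here.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Assumption: your_corpus holds samples of your normal writing
    # (HN comments, emails, etc.).
    your_corpus = [
        "I stopped using the service early on.",
        "At minimum you should use a throwaway email.",
    ]

    def pick_most_distant(candidates, corpus):
        """Return the candidate least similar in style to the corpus."""
        # Character 3-5 grams capture punctuation and word-shape habits
        # that word-level tokens miss.
        vec = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))
        matrix = vec.fit_transform(corpus + candidates)
        corpus_vecs = matrix[: len(corpus)]
        cand_vecs = matrix[len(corpus):]
        # Mean cosine similarity to the corpus; lower = more "distant"
        # from your usual style.
        sims = cosine_similarity(cand_vecs, corpus_vecs).mean(axis=1)
        return candidates[sims.argmin()]

    candidates = [
        "Quit the service soon after it launched.",
        "I ceased utilising said service at an early juncture.",
    ]
    print(pick_most_distant(candidates, your_corpus))

Note this only covers selection among candidates; as the paper below shows, the obfuscation itself being detectable is its own risk.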

p.s.

A Girl Has A Name: Detecting Authorship Obfuscation, 2020

https://aclanthology.org/2020.acl-main.203.pdf

"Authorship attribution aims to identify the author of a text based on the stylometric analysis. Authorship obfuscation, on the other hand, aims to protect against authorship attribution by modifying a text’s style. In this paper, we evaluate the stealthiness of state-of-the-art authorship obfuscation methods under an adversarial threat model. An obfuscator is stealthy to the extent an adversary finds it challenging to detect whether or not a text modified by the obfuscator is obfuscated – a decision that is key to the adversary interested in authorship attribution. We show that the existing authorship obfuscation methods are not stealthy as their obfuscated texts can be identified with an average F1 score of 0.87."

  • sharemywin 3 years ago

    OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data.

    Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law). The OpenAI API processes user prompts and completions, as well as training data submitted.

    https://openai.com/policies/api-data-usage-policies

gostsamo 3 years ago

Or you can introduce noise and bad data into the dataset. The question is whether it is really an issue at the moment.

  • ntech (OP) 3 years ago

    How often do you add noise to your search queries? What sort of noise to add (whether it's truly random or based on your subconscious) is another matter.

    • gostsamo 3 years ago

      There are extensions which can do it for you. I don't bother, because I'm using DDG for search. One can whip up something similar for LLMs as well, and I'm sure that soon someone will write one for ChatGPT. This would avoid the subconscious bias.
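
      A minimal sketch of what such a noise-injecting wrapper could look like; send_prompt() and the decoy pool are hypothetical stand-ins for a real LLM client and a larger, topically varied query source:

          import random
          import time

          # Hypothetical decoy pool; a real extension would draw from a
          # large, varied source (e.g., trending questions).
          DECOY_POOL = [
              "Summarize the plot of Moby-Dick.",
              "What is the boiling point of ethanol?",
              "Write a limerick about gardening.",
          ]

          def send_prompt(prompt: str) -> None:
              # Stand-in: replace with a call to your actual LLM client.
              print(f"sent: {prompt!r}")

          def ask_with_noise(real_prompt: str, n_decoys: int = 2) -> None:
              """Interleave the real prompt with random decoys at jittered
              delays, so it is harder to single out of the session."""
              queries = random.sample(DECOY_POOL, k=n_decoys) + [real_prompt]
              random.shuffle(queries)
              for q in queries:
                  send_prompt(q)
                  time.sleep(random.uniform(1.0, 5.0))  # human-ish pacing

          ask_with_noise("How do I negotiate a raise?")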
