Is Bard trained on Gmail? Depends what the meaning of the word “is” is

skiff.com

11 points by purplesnowflake 3 years ago · 4 comments

smoldesu 3 years ago

> AI researcher Kate Crawford was quick to ask Bard itself where its dataset came from. The answer caught her attention: Bard said one of its data sources was Gmail.

Did they find anything? There's a lot of hand-wringing at the start, then a big focus on how Google can't deny that emails are in their training data. Then they finish by interviewing Bard. Google's response makes sense, given that they're working with multi-terabyte language files. It has probably seen Gmail contents in the form of naturally published emails that just get picked up with other data. Claiming otherwise would be confidently wrong.

It would be interesting if they had a "Q_rsqrt in Copilot" moment here, but they don't. There seems to be no evidence that Google uses private data in Bard.

> Society should be having a robust discussion on these questions, but this is not possible if such discussion is inhibited by key players like Google.

How is Google inhibiting this discussion?

  • streethassle 3 years ago

    > ...there's an impulse to consult Bard on its origin precisely because of the lack of transparency from the real authority on the issue: Google. That we're tempted to probe the language model for substantive answers on matters of public interest merely underlines Google's failure to communicate them on their own.

    > LLMs are incredibly powerful tools that could transform our lives for the better. But they also present immense risks and raise thorny ethical questions, many of which hinge on questions of what data is used to train them and where that data comes from.

    The article's claim is that they're inhibiting it via their lack of transparency on training data.

version_five 3 years ago

The whole asking Bard thing towards the end is completely meaningless and I'd argue irresponsible. They even say

> But of course, the observation that Bard consistently makes these claims can’t be seen as evidence one way or the other

and then go on to quote a bunch of stuff Bard said.

If I had to speculate, it sounds like it could have used anonymized Gmail data (they could have some kind of PII removal tool that they run first; that's common, though I wouldn't trust it too much), or something is being pretrained on Gmail and fine-tuned on something else (hard to see a reason for that). Anyway, Google is acting suspicious, but pretending the chatbot's "opinion" has any bearing is disingenuous.
