Ask HN: How do AI companies collect data to train models?

1 point by piotrke 2 years ago · 1 comment · 1 min read

From the Internet, obviously, but how? Are they crawling every website out there, enumerating them by IP address or domain name? Do they piggyback on Google? Or is there some all-Internet data store where they can just download the latest 'Internet data' dump?

richardjam73 2 years ago

They use datasets like Common Crawl.
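
For anyone wondering what "using Common Crawl" looks like in practice, here is a rough sketch, not a description of any particular company's pipeline. It assumes Common Crawl's public URL index at index.commoncrawl.org and its data host at data.commoncrawl.org; the crawl ID `CC-MAIN-2024-10` is only an example, and the helper names `index_query_url` and `record_location` are made up for illustration. The idea is that you ask the index where a page's capture lives, then fetch just that slice of the archive file with an HTTP Range request instead of crawling the site yourself.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoints for the Common Crawl index and archive files.
INDEX_HOST = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"


def index_query_url(crawl_id: str, url_pattern: str) -> str:
    """Build an index query for one crawl (crawl_id such as 'CC-MAIN-2024-10')."""
    query = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{query}"


def record_location(index_line: str) -> tuple[str, str]:
    """Turn one JSON line from the index into (archive URL, Range header value)."""
    entry = json.loads(index_line)
    start = int(entry["offset"])
    end = start + int(entry["length"]) - 1  # HTTP byte ranges are inclusive
    return f"{DATA_HOST}/{entry['filename']}", f"bytes={start}-{end}"
```

Each line the index returns is a JSON object that includes `filename`, `offset`, and `length` fields. Passing one of those lines to `record_location` gives the URL and `Range` header needed to download a single compressed WARC record, which then has to be decompressed and parsed separately.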
