Does Sturgeon's Law Apply to Datasets?
In various fields, we often encounter the principle of Sturgeon's Law, which suggests that "90% of everything is crud."
When it comes to datasets, how much of this holds true?
With the proliferation of LLMs, are we seeing an overwhelming number of low-quality, irrelevant datasets?
Curious to hear HN's thoughts on this.