Cynddl

Karma: 1,125
Created: 12 years ago

Recent Submissions

  1. Measuring What Matters: Construct Validity in Large Language Model Benchmarks (arxiv.org)
  2. AI Capabilities May Be Overhyped on Bogus Benchmarks, Study Finds (gizmodo.com)
  3. AI's capabilities may be exaggerated by flawed tests, according to new study (nbcnews.com)
  4. Experts find flaws in tests that check AI safety and effectiveness (theguardian.com)
  5. Measuring What Matters: Construct Validity in Large Language Model Benchmarks (oxrml.com)
  6. The quiet software tooling Renaissance (pdx.su)
  7. Facial recognition works better in the lab than on the street, researchers show (theregister.com)
  8. We Shouldn't Trust Facial Recognition's Glowing Test Scores (techpolicy.press)
  9. Training language models to be warm and empathetic makes them less reliable (arxiv.org)
  10. AI's limited understanding of gender puts health equity at risk (oii.ox.ac.uk)
