Cynddl
- Karma: 1,125
- Created: 12 years ago
Recent Submissions
- 1. Measuring What Matters: Construct Validity in Large Language Model Benchmarks (arxiv.org)
- 2. AI Capabilities May Be Overhyped on Bogus Benchmarks, Study Finds (gizmodo.com)
- 3. AI's capabilities may be exaggerated by flawed tests, according to new study (nbcnews.com)
- 4. Experts find flaws in tests that check AI safety and effectiveness (theguardian.com)
- 5. Measuring What Matters: Construct Validity in Large Language Model Benchmarks (oxrml.com)
- 6. The quiet software tooling Renaissance (pdx.su)
- 7. Facial recognition works better in the lab than on the street, researchers show (theregister.com)
- 8. We Shouldn't Trust Facial Recognition's Glowing Test Scores (techpolicy.press)
- 9. Training language models to be warm and empathetic makes them less reliable (arxiv.org)
- 10. AI's limited understanding of gender puts health equity at risk (oii.ox.ac.uk)