I used Claude to match 200 Clinical Trials to 700 PubMed Papers
100% precision on the Claude Code run seems undersold. If you're building a systematic review, a false match (claiming paper X studied trial Y when it didn't) could corrupt your conclusions in ways a miss wouldn't. Would be curious if the authors have a domain reason for weighting recall over precision here.
Personally, I think Claude Code played it a little too safe here, which is why we didn't put more emphasis on its precision. Note that 100% precision is also easy to achieve in this case: only match trials with papers that explicitly mention those trials, e.g. via a regex on the trial ID. So clearly we have to pay attention to both precision and recall. We just happened to go with F1 as the more or less canonical measure that takes both into account, but I agree that, depending on your use case, you may care about other measures of accuracy.
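To make the trade-off concrete, here's a minimal sketch of that trivial high-precision baseline and the metrics involved. The data, helper names, and the assumption that trials are identified by ClinicalTrials.gov NCT numbers (e.g. `NCT01234567`) appearing in paper text are all illustrative, not the authors' actual pipeline:

```python
import re

# Assumption: trials are keyed by NCT IDs and papers are plain text.
NCT_RE = re.compile(r"NCT\d{8}")

def regex_matches(trial_ids, papers):
    """Match a trial to a paper only if the paper explicitly
    contains the trial's NCT ID -- the 'play it safe' baseline."""
    pairs = set()
    for trial_id in trial_ids:
        for paper_id, text in papers.items():
            if trial_id in NCT_RE.findall(text):
                pairs.add((trial_id, paper_id))
    return pairs

def precision_recall_f1(predicted, gold):
    tp = len(predicted & gold)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: two gold matches, only one mentioned explicitly.
papers = {
    "pmid1": "Results of the trial (NCT00000001) show ...",
    "pmid2": "A follow-up study of the same cohort ...",  # no ID given
}
gold = {("NCT00000001", "pmid1"), ("NCT00000002", "pmid2")}
pred = regex_matches(["NCT00000001", "NCT00000002"], papers)
p, r, f1 = precision_recall_f1(pred, gold)
# The baseline never emits a wrong pair (p = 1.0) but misses any
# paper that doesn't spell out the ID (r = 0.5), dragging F1 down.
```

The point is that the regex baseline's precision is perfect by construction, so precision alone can't distinguish it from a system that also finds the implicit matches; F1 penalizes the missed half.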