Ask HN: Is anyone with AI expertise analyzing JFK files?
The JFK Files, some of which were just released today, are thousands of text-based PDF files. They seem like a really good match for the capabilities of current LLMs.
Is there anyone who's making a serious attempt at extracting data from and analyzing the JFK files?

Yes: JFK Assassination Files Chatbot

Okay, this looks pretty good.

Also just found https://github.com/amasad/jfk_files, although it's only 1,000 files or so.

A few are mentioned here, but my main intent in writing this was to warn: don't put much weight on any AI analysis. I built one with a graph-DB-based RAG approach at https://www.jfksearch.ai

Exactly what use case do you think LLMs would help with? They might help people who don't know the exact terms or synonyms they want to search for, but they don't make logical inferences or detect contradictions. If anything, there's a risk that they'll inject bias from the blog posts, fiction books, and conspiracy stories present in all the training data.

Yeah, I expect it'd mostly be useful for OCR and search. These are hard-to-read PDF files, and there are a lot of them. I found a few projects related to using AI with The JFK Files, but they all seem old or uninteresting, which is why I'm asking here.

Some prior discussion prompted by "Why LLMs Suck at OCR": https://news.ycombinator.com/item?id=42966958

I've tested Gemini 2.0 Flash on a bunch of the JFK Files PDFs and it's excellent, even on extremely blurry typewriter scans that are difficult for me to decipher. It's incredible. I'm sure there are cases where it will fail, but just OCRing 90% of the files would be a big win.

I think one useful use case would be having an LLM compare today's release with what has been released in the past, so one could focus on what was actually newly released (or redacted).

Yes, and the conclusion is that the JFK files released today are the exact same set released under Biden with a new cover sheet on the package.

Is this "conclusion" based on a single social media post? I'd expect that at the very least this release includes fewer redactions. But I have yet to see anyone do the analysis.