Ask HN: Why Isn't Anyone Building a Global Brain for LLMs?
LLMs thrive on high-quality data. While features like web search, file uploads, and integrations with diverse data sources are useful, there seems to be no system for sharing and accessing information between LLM users.
Take my case as an example: I lend money to businesses. Ideally, if someone asks ChatGPT a question like "How can I get B2B financing?" I’d want them to discover me and then be able to ask follow-up questions about financing options, eligibility criteria, required documents, and the process. From there, I could step in and provide personalized assistance.
While web search exists, the internet wasn’t built for the LLM era, and appearing in search results isn’t guaranteed. I’ve also thought about creating an AI agent, but discoverability remains the issue.
Wouldn't a new information system tailored for LLMs be valuable? Imagine an open ecosystem where structured and unstructured data can be shared and accessed by LLMs on behalf of users, powering apps and tools that unlock opportunities and save time.
Am I missing something here, or does this idea have merit?
How would you build this (which costs both infra and human time to build and operate for the collection of knowledge) only to have every commercial entity soak it up for private gain? It would be swank to have something like Wikipedia, Stack Overflow, etc., where you could compound returns on teaching the system, but someone has to pay for it, and lots of folks don't want their work (free or otherwise) to just go into Big Tech or AI startup valuations and returns. Novel training data is precious, and democratization of generative AI outcomes is not a function of time, but of effort and resources.
I was considering creating a data cooperative, democratically owned and governed by its members. To ensure sustainability, revenue could come from AI companies and other stakeholders paying for data access, much like how news publishers are compensated today. What do you think?
Solid foundation, you could be a better version of what Reddit is trying to do. I wish you success.
Look for instance at MCP servers: https://github.com/modelcontextprotocol/servers Commercial price/deal servers I suspect are just around the corner.
> Commercial price/deal servers I suspect are just around the corner.
I agree. I think there's going to be a big trend towards paid APIs and paid access to data.
Wow, thank you so much! I came across the Model Context Protocol framework but didn't take the time to explore it in depth. And right now Agent.ai is gaining traction (500k+ users).
The question is for whom that idea has merit. For sales and marketing folks, yeah, absolutely they would love to be able to just push their marketing spend towards something that would buy LLM users' eyeballs. For everyone else, this sounds like just a new iteration of unwanted ads in our face, and would probably breed the same level of resentment, annoyance, and ultimately blockers to avoid it all.
I completely agree with you. Establishing clear guidelines on how data is used and who can access it would be essential. This concept has value beyond just sales and marketing. For instance, if someone is seeking funding today, where can they find a comprehensive list of potential funders, determine their eligibility, understand the application process, and know what documents are required? Of course, spam could be an issue. To maintain a trustworthy and reliable environment, a robust reputation system, similar to Google's PageRank, could evaluate individuals and their contributions. It would replicate the way we naturally seek advice from trusted individuals, but on a global, digital scale. What do you think?
Love this! I wrote a post just recently you might like: Show HN: Comind – A cognitive layer for the ATProtocol/Bluesky social network | https://news.ycombinator.com/item?id=42975350
I'm confused why search + LLM wouldn't work for this?
The issue isn't search or LLMs, it's the data sources. I believe we can create something better than the traditional web for sharing and accessing information in the LLM era. What do you think?
Hmm, I think the issue with any large data set is indexing + querying. At https://usefind.ai/ this is a fundamental problem we've been trying to tackle. What kind of structure do you think would work?
You’re right. Did you check CircleMind? Could it be useful in your case? https://www.ycombinator.com/companies/circlemind I checked your project, that’s really good, and I saw that you’re working on Q&A, which will be very useful!
Hmm, I'll check out the PageRank stuff. IMO RAG isn't super great. I think RAG overall has been oversold, since embeddings KNN hasn't proven to be super accurate.
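For anyone curious what the PageRank-style reputation idea from this thread might look like in practice, here is a minimal sketch. It is an assumption-heavy illustration, not anyone's actual system: the `reputation_scores` function, the endorsement graph, and the contributor names are all hypothetical, and the damping factor of 0.85 is simply the value commonly used with PageRank. The idea is that a contributor's score depends on who vouches for them, weighted by the voucher's own score.

```python
def reputation_scores(endorsements, damping=0.85, iterations=50):
    """PageRank-style power iteration over a hypothetical endorsement graph.

    `endorsements` maps each contributor to the contributors they vouch for.
    """
    nodes = set(endorsements)
    for targets in endorsements.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start with equal reputation for everyone.
    scores = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every contributor receives a small baseline share.
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = endorsements.get(node, [])
            if targets:
                # Split this contributor's score among those they vouch for.
                share = damping * scores[node] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Contributors who vouch for nobody spread their score evenly.
                share = damping * scores[node] / n
                for other in nodes:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical data: three contributors who vouch for each other, plus an
# account that vouches for someone but that nobody vouches for in return.
endorsements = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "spammer": ["alice"],
}

if __name__ == "__main__":
    scores = reputation_scores(endorsements)
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

In this toy graph the account nobody vouches for ends up with only the baseline share. A real system would also need defenses against accounts that vouch for each other in a ring, which plain PageRank does not handle on its own.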