Why AutoGPT engineers ditched vector databases

109 points by DSemba 3 years ago · 58 comments


« AutoGPT engineers » seem to also generate their articles with an LLM, making their documentation awful to grok. For instance, after showing 2 commands, we have to suffer this: "Forge your future! The forge is your innovation lab. All the boilerplate code is already handled, letting you channel all your creativity into building a revolutionary agent. It's more than a starting point, it's a launchpad for your ideas. In our exploration today, we’ve covered the essentials of working with AutoGPT projects. We began by laying out the groundwork, ensuring you have all the right tools in place. From there, we delved into the specifics of building an effective AutoGPT agent. Trust me, with the right steps, it becomes a straightforward process."

The doc is littered with those paragraphs. Remove the fat! Get to the point! YOLO, that’s a freaking waste of life cycles!

amelius 3 years ago

Don't read the intermediate representation. The idea is that you use an LLM to summarize those comments into human readable text.
- penjelly 3 years ago
  
  Nobody else finds this ridiculous? That just means that when I search for the actual source of truth (when the LLM eventually hallucinates), it takes longer to find the answer I'm looking for. This fluff doesn't add any value for anyone, not even the LLM.
- workingjubilee 3 years ago
  
  aw geez, now we have to compile docs, too?!
  - gryfft 3 years ago
    
    "Don't you have a machine that puts food into the mouth and pushes it down?"
    --Khrushchev, satirically inquiring regarding the implausible number of labor-saving devices exhibited on the American side of the Kitchen Debate[1].
    1. https://en.wikipedia.org/wiki/Kitchen_Debate
rowanG077 3 years ago

I doubt it's generated with an LLM. It's much too easy to get LLMs to generate much higher quality text than the one you quoted.
- clarionbell 3 years ago
  
  You're thinking of good LLMs; that's probably not what worked on this.
- shapefrog 3 years ago
  
  I am pretty sure you can get there by doing a print, scan, OCR, then translating it to and from a foreign language, and voila.
- make3 3 years ago
  
  probably generated with a 7B model
uoaei 3 years ago

Ironic that boilerplate code is removed but boilerplate copy is introduced to replace it.

datadrivenangel 3 years ago

Summary: They stopped using vector databases because the performance benefit simply didn't matter compared to how long the LLMs took to respond, and you should focus on using technology to solve problems and not pick the trendy option.

But that never got anyone promoted.

cbsmith 3 years ago

You'd be surprised... I've made a pretty good career of identifying cases where simpler, more efficient designs produce better or equivalent results.
coffeebeqn 3 years ago

So it seems like they still use vectors - they just replaced the search (however that works) with a dot product operation? I mean from a vector point of view that makes total sense
- dartos 3 years ago
  
  Searching in LLM land is finding the cosine similarity of two high dimensional vectors. Vector databases try optimizing that operation.
  Maybe they found search by plain old dot product faster
  - akomtu 3 years ago
    
    Cosine what? Isn't it just dot product?
    
    dartos 3 years ago
    
    There are a few “distance” metrics that are used.
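    AFAIK cosine similarity (or cosine distance) is a common one. It's a dot product divided by the lengths of the two vectors, so on normalized embeddings it gives the same result as a plain dot product.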
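
For reference on the cosine vs. dot product question: cosine similarity is the dot product divided by the two vector lengths, so on unit-length vectors the two give the same number. A minimal pure-Python sketch, with made-up 2-D vectors standing in for real embeddings:

```python
import math

def dot(a, b):
    """Sum of position-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Dot product scaled by both vector lengths."""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

# Toy vectors chosen for easy arithmetic (both have length 5).
a = [3.0, 4.0]
b = [4.0, 3.0]

print(dot(a, b))                # 24.0
print(cosine_similarity(a, b))  # 0.96

# After normalizing, the plain dot product matches the cosine similarity
# (up to floating-point rounding).
print(dot(normalize(a), normalize(b)))
```

This is why stores that normalize embeddings at insert time can rank with a bare dot product at query time.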
rvz 3 years ago

More like hype-driven development and magpies making decisions based on how shiny the bracelet is.

benterix 3 years ago

Has anyone ever managed to generate anything useful with AutoGPT? I made several attempts and, apart from wasting some money on GPT-4 API calls, it never produced anything usable. Whereas if I manually enter prompts in ChatGPT, I can often produce simpler projects from beginning to end, if I partition them into logically independent parts.

coffeebeqn 3 years ago

No. It always gets stuck on a wrong path and can’t get out
kinlan 3 years ago

Unfortunately I've not. For the tasks that I've used these types of tools on I found ChatGPT's "Advanced Data Analysis" mode significantly more useful.
dartos 3 years ago

Me neither.
penjelly 3 years ago

No.

jondwillis 3 years ago

I have been working on a system of agents over at https://github.com/agi-merge/waggle-dance - I already split problems up into subtasks for agents to work on independently. I give agents access to vector databases, using a simple global key for now, but soon a context/parent/child key. Access to the vector DBs is proxied via tools (agents have to “call” saveMemory or retrieveMemory). I also check for looping/repetition FREQUENTLY using in-memory vector databases of the langchain agent callback events.

My opinion on this: eh, who cares? AutoGPT and similar are non-standard use cases for Vector DBs right now, and Vector DBs are useful for RAG.
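
The looping/repetition check described above can be illustrated with a small in-memory similarity test. This is a hedged sketch, not waggle-dance's actual code: `embed` is a toy word-count stand-in for a real embedding model, and `LoopDetector`, its window size, and its threshold are hypothetical names and values chosen for illustration.

```python
import math
from collections import Counter, deque

def embed(text):
    """Toy stand-in for a real embedding model: a bag of word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class LoopDetector:
    """Flags an event that is near-identical to one seen recently."""

    def __init__(self, window=20, threshold=0.9):
        self.recent = deque(maxlen=window)  # only the last `window` events
        self.threshold = threshold          # assumed cutoff, tune per model

    def is_repeat(self, event_text):
        vec = embed(event_text)
        repeated = any(cosine(vec, old) >= self.threshold for old in self.recent)
        self.recent.append(vec)
        return repeated

detector = LoopDetector()
events = [
    "search the web for vector database benchmarks",
    "write summary to notes.txt",
    "search the web for vector database benchmarks",
]
flags = [detector.is_repeat(e) for e in events]
print(flags)  # [False, False, True]
```

With real embeddings the threshold would need tuning, since paraphrased actions score lower than exact repeats.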

jasfi 3 years ago

What's your assessment of the biggest blocking issues for something like this to be practically useful? From what I've seen of AutoGPT things seem to fall apart, in that the goals never quite seem to be achieved once anything more than basic research is requested.
- tudorw 3 years ago
  
  It can do stuff, sort of, like helping me to create ; https://github.com/tudorw/Ai_MegaList/blob/main/AI_Sector_Br...
  After a lot of trial and error, I managed to keep it somewhat on track by using a CSV file, something like;
  "An expert manipulate .csv files, read the first URL from the first line, 2nd column of 'raw.csv', pass the URL to browse_website the questions 'summarize the activities, highlight any investment, funding or patents mentioned', regardless of the results or failure, write the data quote delimited and in columns where appropriate to a new line of 'complete.csv', pass the URL to google and summarize the answer to the question 'does this organisation have a good reputation, from reliable sources', regardless of the results or failure, write the data quote delimited and in columns where appropriate to a new line of 'complete.csv', remove the 1 line from 'raw.csv' you have processed, repeat the process until 'raw.csv' has no more URL in it"
  On the plus side, it was very quick to iterate; 'programming' in words is an exercise in linguistics, and its ability to scrape from any site was impressive. On the downside, it really struggled to stay on task, and even when things seemed to be working well, random behaviour was normal, so it might just decide to delete the csv as a short cut...
  On Windows its ability to engage PowerShell was equally enlightening and terrifying... As an exercise in instructing an AI it was interesting, and I'd certainly try again if the requirements fitted.
  I think it's a credit to the team that they explored options for vector storage then retreated in the name of complexity, it's a good reason.
  - andai 3 years ago
    
    Are you referring to Waggle Dance or AutoGPT?
    
    tudorw 3 years ago
    
    AutoGPT

dmezzetti 3 years ago

It doesn't have to be a one or the other choice. For example, txtai supports a number of different ANN backends including a simple NumPy implementation (https://neuml.github.io/txtai/embeddings/configuration/ann/). There is value in the plumbing to vectorize data, normalize embeddings and find matches.

It's a good idea to find a solution that enables starting simple and scaling up as needed without having to fully rewrite the code.

Havoc 3 years ago

That does make sense for now.

Surely we’re going to see a fairly exponential increase in these requirements, though?

The cheaper the compute gets/scales the more sense it makes to hammer problems with more agents/tries so scale of needed agent memory also goes up.

I’d have gone with “it’s already implemented so just leave it be”

kromem 3 years ago

Where vector databases would make more sense for a project like AutoGPT would be in centralizing distributed memory as a 3rd party service.

If a model on one computer could report memories to a centralized service that could be searched by new instances so work didn't need to be replicated, I'd fully expect that 3rd party service to be running a vector DB.

But in reality, the issues of trust and poisoning the well are too pertinent to see enough centralized consolidation of memory to justify it for a project like this.

I've seen some discussions around E2E encrypted LLM chains, and I could definitely see a 3rd party memory layer as part of that, though I'd suppose it would need to be a plug-in at the model provider and not at the client anyways.

Fannon 3 years ago

Wouldn't a vector DB be nice to have, so you can use it directly for search?

I understand the article's argument. But I can imagine waiting so long for an LLM to respond that I would actually prefer to search a vector database over my "additional information layer" and find the relevant information myself. In that case, a vector DB would serve two purposes, and that could change the calculation of whether it is worth the added complexity.

Not an expert here, just a question that came to mind - it might be based on wrong assumptions.

blackkettle 3 years ago

I think the article is a near miss on the right idea. The important point is that a _dedicated_ vector database is probably overkill and not justified for most real-world use cases.
But a multi-modal database that also supports embeddings in hybrid mode or _in addition_ to standard retrieval techniques is both still very useful, and probably sufficient.
What that means to me is that it is yet another vote in favor of less optimized but far more versatile and robust solutions like: OpenSearch, Elastic, and PostgreSQL. [when I say 'less optimized' I'm only referring to their current vectordb plugins, not the rest of the machinery]
OpenSearch and Postgres are phenomenal, robust, OSS tools, and the only lingering downside seems to be that their vectordb implementations are still a bit less optimized for large collections - but that probably doesn't matter in practice.
visarga 3 years ago

The similarity operation is just a dot product, practically a sum over position wise products of two vectors. A for loop with an addition and a multiply inside. That's all you need after embedding. You can use np.dot() to get exact similarity scores and better retrieval with very fast times for under 100K vectors.
You only want the approximate nearest neighbour method for millions of vectors and above. Even that is easy to do with off-the-shelf libraries for a local index. It only gets complicated when you want fast insertion and distributed access.
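
The exact, brute-force search described above can be sketched in a few lines of NumPy. This is an illustration under assumptions: the embeddings are random stand-ins for real model output, and `top_k` is a hypothetical helper name, not something from AutoGPT.

```python
import numpy as np

def top_k(query, embeddings, k=3):
    """Exact nearest neighbours: one dot product per stored vector."""
    scores = embeddings @ query                 # shape (n,), a matrix-vector product
    k = min(k, len(scores))
    idx = np.argpartition(-scores, k - 1)[:k]   # the k best, in no particular order
    idx = idx[np.argsort(-scores[idx])]         # sort just those k, best first
    return idx, scores[idx]

# Random unit vectors standing in for real embeddings (assumption).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 64)).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Query with a stored vector, so its own row should rank first.
query = embeddings[42]
idx, scores = top_k(query, embeddings, k=3)
print(idx, scores)
```

Using `argpartition` avoids sorting all scores, which matters little at this size but keeps the cost close to the single matrix-vector product.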
- Fannon 3 years ago
  
  Ok got it now, thanks!
IanCal 3 years ago

I think that just comes down to speed. The utility is identical, just when you're doing it for one user you can deal with a large problem (for you) in the most naive way possible in a few seconds. I would bet good money that there's a slightly more complex but still very basic step up that would get seconds down to something that feels instantaneous.
montebicyclelo 3 years ago

They do still carry out search, but using a brute force approach (dot product over all the embeddings) instead of vector db. Their point is that they aren't likely to generate enough messages for brute force to be an issue. (Even over 100s of thousands of embeddings brute force is pretty fast.)
- kiviuq 3 years ago
  
  > Even over 100s of thousands
  * number of languages, no?

visarga 3 years ago

I was expecting the authors to talk about superficial encoding of information (words and phrases) as opposed to their meaning or implications. For example the text "The general idea behind Q-learning and SARSA" won't embed the same as "reinforcement". Or "Count the letters in this phrase." won't embed the same as "27".

This can cause RAG systems to fail to retrieve, fail to connect fragmented information, or fail to infer the result of a process. My theory is that information needs to be digested by an LLM and augmented before indexing in a RAG system. Embeddings are just searching at surface level. That is why I thought AutoGPT had difficulties, one of the reasons at least.

Maybe we can have LLMs preprocess the material to expose the deeper layers of meaning. And we need to reindex everything when we discover we are interested in an aspect we didn't explicitly expose for retrieval. Study, then index, and sometimes study again.

ilaksh 3 years ago

They are still using something like a vector DB when it is appropriate. It's just a very simple version built in to the system.

luke-stanley 3 years ago

In a just world, this would be added to the title before I clicked the link.
pedrovhb 3 years ago

I wouldn't say that. It seems analogous to saying a program uses a simple version of a document-oriented database built in to the system when in truth it's just using dictionaries. Sure, conceptually they do kind of the same thing sometimes and you might even persist it to disk, but it's a bit of a stretch to call it "something like a DB" imo.
omneity 3 years ago

At this point I wonder why they don’t simply use faiss. Insisting on using np is unnecessarily low level here imo.
datadrivenangel 3 years ago

Numpy is also my favorite Vector DB.

sytelus 3 years ago

Is there any open source vector DB implementation fast enough to, say, create embeddings for 100M documents locally within a few hours and find ranked matches in under a second? I tried ChromaDb and it is super slow, basically unusable.

mirzap 3 years ago

Vector DBs don't create embeddings; they store them. As the article points out, the LLM's slow response time dwarfs any performance gain a vector DB could add.
- singularity2001 3 years ago
  
  You can easily create embeddings locally though, with small (L)LMs. Three lines of code using Hugging Face. I don't understand the point of this article.
  - mirzap 3 years ago
    
    Look at it this way: you have a web app that can handle 10 req/s. It does not matter if you add a database behind it that can handle 10,000 req/s; you are still limited to 10 req/s. You're not gaining any performance benefit. I always wondered why you would need a dedicated vector DB.
    
    singularity2001 3 years ago
    
    vector DBs have much higher 'semantic' recall than classical search engines if you want to ask questions about your documents or previous discussions.
- luke-stanley 3 years ago
  
  Actually, some vector DBs can generate embeddings as well as store them; Chroma in particular uses SentenceTransformers models.
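
The bottleneck argument in this sub-thread (10 req/s in front of 10,000 req/s) comes down to simple arithmetic. A rough sketch: the throughput figures are the ones from the comment above, while the latency numbers are hypothetical and only there to show the proportions.

```python
def pipeline_throughput(stage_rates):
    """A serial pipeline sustains only what its slowest stage can handle."""
    return min(stage_rates)

def request_latency(stage_seconds):
    """Latency of one request is the sum of the stages it passes through."""
    return sum(stage_seconds)

# Figures from the comment above.
web_app_rps = 10
database_rps = 10_000
print(pipeline_throughput([web_app_rps, database_rps]))  # 10

# Hypothetical latencies (assumptions, not measurements).
llm_seconds = 5.0            # one LLM completion
brute_force_seconds = 0.050  # exact search over all embeddings
vector_db_seconds = 0.005    # approximate search in a dedicated vector DB

slow = request_latency([llm_seconds, brute_force_seconds])
fast = request_latency([llm_seconds, vector_db_seconds])
saved_share = (slow - fast) / slow
print(f"Latency saved by the faster search: {saved_share:.1%}")
```

Under these assumed numbers a 10x faster search removes less than 1% of the end-to-end latency, which is the point the article makes.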
malux85 3 years ago

Generating the embeddings is by far the slowest part of that, but it’s embarrassingly parallel, so if you have $ it can be done that quickly.
When I worked for Dubai airport, I was tasked with building a vector similarity search that was highly optimised for query speed. I ended up holding the vectors in memory (in a numpy array) and using scipy to do cosine similarity. After tweaking and optimising I could get about 1.2 million vectors per second per core. Again, this is embarrassingly parallel, so if you have more vectors, chunk them to fit your hardware and you should get more or less linear scaling per core.
If you want a hand writing this let me know.
(Also, there are a lot of caveats here; for example, they did not need to update the vectors, and it was an extremely read-heavy usage pattern.)
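
The in-memory, chunked approach described above might look roughly like the following. This is a sketch under assumptions, not the original system: it uses plain NumPy instead of scipy, random vectors instead of real data, and the function names and chunk size are hypothetical. Because the vectors are never updated, they are normalized once up front so each query only needs dot products.

```python
import numpy as np

def normalize_rows(matrix):
    """Scale every row to unit length, once, at load time."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

def chunked_best_match(query, unit_vectors, chunk_size=4096):
    """Cosine-similarity search, scanning the array one chunk at a time."""
    q = query / np.linalg.norm(query)
    best_idx, best_score = -1, -np.inf
    for start in range(0, len(unit_vectors), chunk_size):
        chunk = unit_vectors[start:start + chunk_size]
        scores = chunk @ q                   # cosine, since rows and q are unit length
        i = int(np.argmax(scores))
        if scores[i] > best_score:
            best_idx, best_score = start + i, float(scores[i])
    return best_idx, best_score

# Random data standing in for real vectors (assumption).
rng = np.random.default_rng(1)
vectors = normalize_rows(rng.normal(size=(20_000, 32)))

# A rescaled copy of a stored vector: cosine similarity ignores length.
query = vectors[12_345] * 2.5
best_idx, best_score = chunked_best_match(query, vectors)
print(best_idx, best_score)
```

Each chunk is independent, so the loop body could be handed to separate worker processes and the per-chunk winners merged at the end.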
- opdahl 3 years ago
  
  Actually if you set it up right it doesn't cost you anything more to do it if it's almost fully parallel. It doesn't matter if you paid for one GPU instance for 500 hours or 50 for 10 hours. The cost is about the same. You can also more confidently use spot pricing to reduce the cost.
opdahl 3 years ago

I do about 100 million embeddings using around 50 GPU instances and feed them into Qdrant. Takes about 12 hours. Very happy with the result and performance as long as you have the option to have a very large memory instance running.
- andre-z 3 years ago
  
  Hello from Qdrant. Would like to hear more about your use case. If not yet connected. https://www.linkedin.com/in/zayarni
  - opdahl 3 years ago
    
    Hey Andre. I've actually gotten a lot of great help talking to your co-founder Andrey on your discord channel and he helped me out a lot with making it work at our scale. Super happy with it and it's working in production with no hiccups.
batmansmk 3 years ago

PostgreSQL will match your requirements, although you won’t load that fast, simply because generating the embeddings takes longer than that, regardless of your storage engine.

iandanforth 3 years ago

Relevant tweet from Karpathy:

https://twitter.com/karpathy/status/1647374645316968449

archibaldJ 3 years ago

I asked them about this in their Discord. They didn’t give me straightforward answers. That’s when I knew I was right from the start that this whole thing was a show.

omneity 3 years ago

Is AutoGPT used in a production/productive setting? That would put some perspective on the situations where these insights are applicable.
