MiniGPT-4 Inference on CPU
I know it's not the main point of this, but... so many multimodal models now take frozen vision encoders and language decoders and weld them together with a projection layer! I wanna grab the EVA02-CLIP-E image encoder and the Llama-2 33B model and do the same, I bet that'd be fun :D
Just to be clear, the Q-Former isn't necessary. LLaVA is just a projection layer.
It's not just a projection layer here but also a Q-Former. In this case it was already trained for that specific vision encoder, but if you change the encoder you would need to train a Q-Former from scratch.
Not for MiniGPT-4, but it's just a projection layer for many others (like LLaVA). The Q-Former isn't a necessary part of the equation.
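For anyone curious what "just a projection layer" means in practice, here's a minimal PyTorch sketch. The class name and dimensions are illustrative (e.g. a ViT with 1024-d patch features feeding an LLM with a 4096-d hidden size), not the actual sizes any particular model uses:

```python
import torch
import torch.nn as nn

class VisionToLLMProjection(nn.Module):
    """Maps frozen vision-encoder features into the LLM's token-embedding space."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, vision_dim) from the frozen encoder
        # returns:      (batch, num_patches, llm_dim), used as "soft" image tokens
        return self.proj(vision_feats)

# Only the projection is trained; the vision encoder and the LLM stay frozen.
proj = VisionToLLMProjection()
dummy_feats = torch.randn(1, 256, 1024)   # stand-in for encoder output
image_tokens = proj(dummy_feats)          # prepend these to the text embeddings
print(image_tokens.shape)                 # torch.Size([1, 256, 4096])
```

MiniGPT-4 additionally puts a (pretrained, frozen) Q-Former between the encoder and the projection; LLaVA-style models skip it and train the linear layer alone.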
I am not an ML expert. I want to know how to add my own documents without sending them off to a 3rd party.
Use a local LLM like Llama 2.
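A rough sketch of what "chat with my own documents, fully local" can look like, assuming llama-cpp-python and sentence-transformers; the model filename and document chunks are placeholders, and real setups usually add proper chunking and a vector store:

```python
# Embed document chunks locally, retrieve by cosine similarity, and pass the
# best matches to a local LLM. Nothing leaves your machine.
import numpy as np
from sentence_transformers import SentenceTransformer   # pip install sentence-transformers
from llama_cpp import Llama                              # pip install llama-cpp-python

embedder = SentenceTransformer("all-MiniLM-L6-v2")       # small, runs fine on CPU
llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf",    # hypothetical local model file
            n_ctx=2048)

# Pretend these came from your own files, split into chunks.
chunks = ["Our refund policy allows returns within 30 days.",
          "Support is available Monday to Friday, 9am-5pm."]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def answer(question: str, top_k: int = 1) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q_vec                           # cosine similarity (vectors are normalized)
    context = "\n".join(chunks[i] for i in np.argsort(scores)[::-1][:top_k])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
    out = llm(prompt, max_tokens=128)
    return out["choices"][0]["text"].strip()

print(answer("How long do I have to return an item?"))
```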
Not heard of MiniGPT-4. Why that name? Is it claiming to be specifically a GPT-4 competitor?
MiniGPT-4 is a multimodal model. The name (a bad one, IMO) is a reference to GPT-4's multimodal capability.
They should rename it to ManyGPT.
There's more info here: https://github.com/Vision-CAIR/MiniGPT-4 (Linked in the Readme of the repo.)
Any data on inference speed? I’ve found that the non-quantized model was much faster on GPU than the quantized versions, since the quantized ones ran at lower GPU utilization.
It's a RAM tradeoff. If you have enough GPU RAM to load the non-quantized model, it may be faster.
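Back-of-envelope on why it's a RAM tradeoff: weight memory scales with bits per parameter. A rough sketch for a 7B-parameter model (weights only; activations and KV cache add more on top):

```python
# Approximate weight memory for a 7B-parameter model at different precisions.
PARAMS = 7e9
for name, bits in [("fp16 (non-quantized)", 16), ("int8", 8), ("int4", 4)]:
    gib = PARAMS * bits / 8 / 2**30
    print(f"{name:>22}: ~{gib:.1f} GiB of weights")
# fp16 (non-quantized): ~13.0 GiB of weights
#                 int8: ~6.5 GiB of weights
#                 int4: ~3.3 GiB of weights
```

So quantization is mostly about fitting the model into the RAM/VRAM you have; if the full-precision weights already fit, the dequantization overhead can make the quantized version slower.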