Ask HN: GPT-Vision or Llava for Videos

1 point by vanguardanon 2 years ago · 0 comments · 1 min read

I'm looking for a model that takes a video as input and outputs a caption describing what is happening in it. I've searched Hugging Face and similar sites, but the only thing I can find is X-CLIP from Microsoft, and that only does video classification. It doesn't write its own captions.

No comments yet.
