Ask HN: GPT-Vision or Llava for Videos

1 points by vanguardanon 2 years ago · 0 comments · 1 min read


I'm looking for a model that takes a video as input and outputs a caption describing what is happening in it. I've looked on Hugging Face and elsewhere, but the only thing I can find is XCLIP from Microsoft, and that only does video classification; it doesn't write its own caption.

