
# AskVideos-VideoCLIP

Joint Video-Text embeddings for search, classification and more.



- AskVideos-VideoCLIP is a language-grounded video embedding model.
- The model produces a single context-aware embedding for each video clip.
- 16 frames are sampled from each video clip to generate the video embedding (see the sampling sketch below).
- The model is trained with contrastive and captioning losses to ground the video embeddings in text.
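As a rough illustration of the frame-sampling step, here is a minimal sketch. Uniform sampling and the use of OpenCV are assumptions for illustration; the repo's own video loader may work differently.

```python
# A minimal sketch of the 16-frame sampling step described above.
# Uniform sampling is an assumption; the repo's loader may differ.
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int = 16) -> np.ndarray:
    """Sample `num_frames` evenly spaced RGB frames from a video clip."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV decodes BGR; convert to RGB for the model input.
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)  # shape: (num_frames, H, W, 3)
```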

## Pre-trained & Fine-tuned Checkpoints

| Checkpoint | Link |
| --- | --- |
| AskVideos-VideoCLIP-v0.1 | link |
| AskVideos-VideoCLIP-v0.2 | link |
| AskVideos-VideoCLIP-v0.3 | link |

The demo is also available to run on Colab.

| Model | Colab link |
| --- | --- |
| AskVideos-VideoCLIP-v0.1 | link |
| AskVideos-VideoCLIP-v0.2 | link |

## Usage

### Environment Preparation

First, install ffmpeg:

```bash
apt update
apt install ffmpeg
```

Then, create and activate a conda environment:

```bash
conda create -n askvideosclip python=3.9
conda activate askvideosclip
```

Then, install the requirements:

```bash
pip3 install -U pip
pip3 install -r requirements.txt
```

## How to Run Demo Locally
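As a rough sketch of how the joint embeddings can be used once the environment is set up, the snippet below scores candidate text labels against a video embedding. `load_model`, `embed_video`, and `embed_text` are hypothetical names for illustration, not the repo's documented API; the scoring itself is standard cosine similarity.

```python
# A hypothetical usage sketch for zero-shot video classification.
# load_model, embed_video, and embed_text are ASSUMED names, not the
# repo's documented API; only the cosine-similarity scoring is standard.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score a video-text pair; higher means a better match."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(video_emb: np.ndarray, label_embs: dict) -> str:
    """Return the label whose text embedding is closest to the video embedding."""
    return max(label_embs, key=lambda name: cosine_similarity(video_emb, label_embs[name]))

# Intended flow (names are assumptions):
#   model = load_model("AskVideos-VideoCLIP-v0.1")
#   video_emb = model.embed_video(sample_frames("clip.mp4"))
#   label_embs = {t: model.embed_text(t) for t in ["cooking", "sports", "news"]}
#   print(classify(video_emb, label_embs))
```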

## Star History

*Star History Chart (image)*

## Terms of Use

AskVideos code and models are distributed under the Apache 2.0 license.

## Acknowledgement

This model is inspired by the Video-LLaMA Video-Qformer model.

## Citation

```bibtex
@misc{askvideos2024videoclip,
  title        = {AskVideos-VideoCLIP: Language-grounded video embeddings},
  author       = {AskVideos},
  year         = {2024},
  howpublished = {GitHub},
  url          = {https://github.com/AskYoutubeAI/AskVideos-VideoCLIP}
}
```