Show HN: Summary Cat, a YouTube Video Summary Generator
summarycat.comHello HN!
Please check out Summary Cat (https://www.summarycat.com). It uses OpenAI's GPT-3.5 to summarize YouTube transcripts.
Please note that it only works for
- *English* videos.
- videos that are not too long in length.
I'd appreciate any feedbacks, criticisms, or feature requests!! You can also find my contact info in my profile. Thank you in advance.------------Technical Details---------------
Tech Stack
- Frontend: HTML/CSS
- Backend: Python/Flask
APIs: - For grabbing YouTube's transcripts: I used youtube-transcript-api (https://pypi.org/project/youtube-transcript-api/)
- For summarizing the transcripts: I used OpenAI's GPT-3.5-turbo-16k: https://platform.openai.com/docs/guides/gpt.
- I used GPT-3.5 because GPT-4 is quite a lot more expensive (roughly 10X).
My Prompt (Super Simple!) - "please summarize the following text into a few paragraphs:" + the full transcript.
Thoughts about GPT-4 vs GPT-3.5-Turbo-16k for Summary Cat - GPT-4 was 20% better for "summary quality"
- GPT-4 feels 50% faster
- However, GPT-4 is about 10X as expensive as GPT-3.5
- Winner: GPT-3.5-Turbo-16k Just used this to clear out my watch later list without having to watch anything. Nice!
Only note I have at this time is that it seemed to time out or hang or something on a long video (>2h) -- I'm guessing that there might be limitations to how much transcript you can chuck into GPT, it might be worth throwing an error of some sort in that scenario rather than the forever load
E: Seen you've asked for an example to the other person mentioning this. In my case it was this video https://www.youtube.com/watch?v=hFL6qRIJZ_Y
I think your suspicion might be correct: long videos exceeds GPT token limitation (16,385 tokens in my case of GPT-3.5-turbo-16k).
Thanks for your suggestion about how to address it.
'Clean up your watch later list" is a neat use case that might be worth supporting directly in some way.
Ah ha! Great point!
No worries! Keep up the great work :)
I tested it with two videos, the first one it does the summary quite well: https://youtu.be/Cy-NgpRN1FU, I love how it mentions the dogs name is Ernie, that made me smile :)
But in the second video https://www.youtube.com/watch?v=NBFyvOV7fz8 the app keeps mentioning things like: "The text discusses...", but the content is not a text, it's a video.
Really cool app, it's really quick too!
> the content is not a text, it's a video
To be fair, OP did say that they summarise the YouTube transcript. So OpenAI GPT receives text.
But if OP didn’t do so already maybe they could start the OpenAI system prompt with something like “you are summarising transcripts of YouTube videos” and possibly it could help to make the summary refer to the material as video.
Pretty nice! Very useful idea, especially for videos on my watchlist I never get to because I feel they're too long.
Would love if I could ask follow up questions. Would be awesome to ask "Is X also explained?" and get a little summary back with the timestamp so I can jump to that point in the video.
Also it feels a bit slow and doesn't really give feedback whether it's making progress. That would be a good UX improvement.
How many tokens do you allow per session? I've been thinking about creating a similar app, but I'm a little bit concerned about the unintended costs.
Hello! Thanks for the question. I do not myself restrict tokens/session. The model I am using GPT-3.5-Turbo-16k (https://platform.openai.com/docs/models/gpt-3-5), allows max 16,385 tokens in total per input/output.
So far, I found that each ~10 minute video uses around 1000 tokens. It costs me about 3 cents to summarize, which is not too bad as I don't have many users, and users haven't been requesting summaries for super long videos yet.
If this sites gets a lot of interest, I might start restricting something :)..
GPT 3.5 Pricing: https://openai.com/pricing
Awesome work! I used it to summarize an hour long podcast I had been meaning to watch and it worked fabulously. What's amazing is that the transcript is auto-generated and of a conversation between two individuals without any indication of who's actually speaking. Yet GPT-3.5 is able to make sense of it.
Out of curiosity I downloaded the transcript myself with `youtube_transcript_api --format text` and counted the tokens via ttok [0], it was a tad over 16k. So what does your site do in that case? Is the transcript truncated?
Two videos that give a 500 internal server error in the Network tab and an infinite spinner:
https://www.youtube.com/watch?v=GuiTN4tOBr4 (edit: this has no captions so maybe it's expected, but a proper error would be better)
https://www.youtube.com/watch?v=iShzzAK9zxk (edit: this may be because I marked the subtitles as UK English)
So it must have captions for videos for this to work?
Not bad - fed in one of my videos and it’s surprisingly readable. Are you using a particular prompt? Would you be willing to share it?
More than happy to share!
----------- My prompt is super simple. It is "please summarize the following text into a few paragraphs:" + the full transcript. -----------
Seriously that's it!
Oh boy, wait until the "Prompt Engineers" get a hold of this one.
Sorry, could you please explain what you mean? I am not really quite getting it. What might happen if a Prompt Engineer get a hold of it?
GPT-4's answer: StevenNunez is humorously suggesting that "Prompt Engineers," who are people skilled in crafting effective prompts for AI models, might find bing_dai's simple prompt too basic. They may propose complex and intricately designed prompts to enhance the output quality or add more context to it.
I think GPT-4 didn't quite get it but StevenNunez is making fun of the overly complex prompts people sometimes use (which, to be fair, was more important before instruct and chat tuning)
Seems to be an arms race between youtube forcing creators to make videos 8 mins long min to be able to get mid roll ads and people coming up with ways to summarize the transcript.
Idea for the future: Use the summarize to re-cut the videos to the most important parts. Like a super to the point tiktok style video that is nothing but dopamine being injected into your veins. There seems to already be "auto podcast clipper ai agents" out there but nothing for consumers to use. those are more video editor adjacent. If anyone wants to work on something like this, lemme know.
I agree that this is happening " an arms race between youtube forcing creators to make videos 8 mins long min to be able to get mid roll ads and people coming up with ways to summarize the transcript." Along the same line: I have been thinking about how my Summary Cat might mean for the content creators. How would it impact their income?
Your "use the summarizer to re-cut the videos" is fantastic!
Couldn't be worse than the adblock users right now.
Plugged in this meme video and it gave me the "As a AI I can't...": https://www.youtube.com/watch?v=NlZzftmtGJY
Are you using celery for your async workers? Cool project!
It hang on non - english video. I tried this one: https://youtu.be/B4kRwlHTcLM?si=3kp3pvQ4M4l6eRTT Otherwise, brilliant
It literally says in the original post that it only works for English videos.
Why though? From a user perspective, it should return a 5xx, not hang endlessly
You are right. The site doesn't handle non-English videos, but that is on the roadmap. Thanks so much!
Interesting, very cool!
However, how does it do on videos where there's not a lot of speaking? Any plans to do actual video (image) processing?
Thanks for the question... Any sample videos you are thinking of?
Summary Cat doesn't work for videos where there's not a lot of speaking. I am hoping to build a bit more on text-rich videos first, so I do not plan to do actual video (image) processing any time soon.
I will keep that in mind!!
I am thinking about music videos, where the lyrics don't describe the video necessarily. For instance, I am curious about what exactly is the story in this music video: https://www.youtube.com/watch?v=pruKV1chnHA&ab_channel
It's hanging for everything I try.
I suggest a progress bar rather than a spinny thingy. Give the user some sense that a conclusion is on the horizon.
From my own experiments, I think you'll get better summaries with a prompt like "This is a transcription of a youtube video. Please etc etc etc". Context seems to help.
I tried to do something similar, but I could only get transcripts for videos with transcript files attached, which isnt a huge number of videos. How did you get around this?
Hi, I used this Python library (https://pypi.org/project/youtube-transcript-api/) to get transcripts. It works great.
Is GPT-4 performance better enough paying would be worth it?
Edit: Thank you!
Hello, GPT-4 is not worth it in my experience so far!
I would say, GPT-4 is - 20% better at "summary quality" - feels 50% faster - BUT, 10X as expensive.
So using GPT-3.5 was the right choice for me at this point.
Looks great, it gave a quick response. Are you putting the whole transcript in context? Have you encountered issues with transcripts that are too large?
Giving it either a long (2.5hr) video, or a non-youtube URL (e.g. an invidious link) appears to leave it spinning forever - no error message.
If you don't mind, could you share an example of such long video? I'd love to debug it.
As for non-Youtube URL: I indeed do not handle that error right now :D. Thanks so much for bringing it up!
This is the 2.5hr video i tried: https://www.youtube.com/watch?v=JGIGA8taN-M
This 1hr video works (but I note you don't seem to be caching the output?) https://www.youtube.com/watch?v=0s9fpFPAC94
This <1hr video seemed to crash the system: https://www.youtube.com/watch?v=VV949D8AUKU
That one works for me.
This tool finally makes Thunderf00t videos watchable, I mean readable, without wasting 30 minutes. Thanks.
It should always append the message "This could have been a blog post" to everything it summarizes.
A well made video can be far better than a blog post.
Would be nice to add a textarea to give it more specific instructions or to change the summarization prompt.
Also try www.askYouTube.ai for q&a across multiple videos!
Nice!
For those interested in comparing, https://www.summarize.tech/ also builds summaries from YouTube videos but includes an overview, then a summary of each 5 min segment
Holy wow, this is FAST. I wonder if both videos I used were cached. How do they do it so fast?
Totally missed what this was supposed to do and tried to get a summary of a video discussing some music with captions. Got back garbage. Thought it might process the text from the frames. Shrug. Good idea for the use case you intended tho!
The off centre spinning wheel bothers me too much.
That's so evil. Please correct the alignment!