Hacking YouTube with an MP4
realkeyboardwarrior.github.io
If you've played around with video formats long enough, you'll have seen something like this. This is the basis for most speed-change "filters". Only the high-end ones do any kind of pixel-based motion estimation so that super slo-mo does not look like a slide show.
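As a rough illustration, both approaches exist as ffmpeg filters (the parameters here are just one plausible choice, not anyone's production settings):

# Naive speed change: retime the existing frames, no new pixels (audio dropped for brevity)
ffmpeg -i in.mp4 -vf "setpts=4*PTS" -an slow.mp4

# Motion-compensated interpolation: estimate motion and synthesize in-between frames
ffmpeg -i in.mp4 -vf "minterpolate=fps=60:mi_mode=mci" -an smooth60.mp4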
Also, it's not uncommon to get odd frame rates in the containers. Even something as "innocent" as listing the frame rate as 29.97 vs 30000/1001 will affect timing (depending on usage). The variations on 23.976 are fun too: 24000/1001, 2997/125.
The muxer is an important step. When using software decoders, things can be a lot more flexible. Back when shiny round discs were popular, there were verifiers that ensured your muxed data was correct. When your decoders are in hardware, there is a very strict set of parameters the input is expected to follow. Any deviation means the hardware cannot play the video. In the early days, "cheaper" DVD software had issues with the muxing.
Video editors (or at least Adobe Premiere) have similar problems: they ignore the timestamps entirely, and any clips you import into them will desynchronize unless you've either recorded them from a known-good source with a constant timebase, or re-encoded them at a constant frame rate.
Video timestamps are weird. Years ago I routinely pulled event VODs from an HLS source and re-uploaded them to YouTube. To speed up downloading, I downloaded the MPEG-TS segments in parallel and assembled them with FFmpeg. Initially I used the basic and familiar concat demuxer during assembly. The results were fine locally. Months in, a visitor told me that all my VODs had subtle yet frequent stutters. It turned out the videos played perfectly fine in any libavcodec-based (i.e. FFmpeg-based) video player, and still played fine even after libavcodec re-encoding, yet once they went through YouTube's encoder, which AFAIK was also libavcodec-derived, subtle stutters appeared at segment boundaries. I then switched to the hls demuxer during assembly and the YouTube problem went away. I never got to the bottom of this, so to this day it's still a mystery to me.
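For reference, the two assembly paths look roughly like this (file names are placeholders; this is a sketch of the general approach, not the exact commands used):

# concat demuxer: stitch downloaded .ts segments listed in a text file
printf "file 'seg00.ts'\nfile 'seg01.ts'\n" > segments.txt
ffmpeg -f concat -safe 0 -i segments.txt -c copy vod.mp4

# hls demuxer: let ffmpeg read the playlist itself, which handles
# per-segment timestamps instead of synthesizing new ones
ffmpeg -i playlist.m3u8 -c copy vod.mp4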
This is what I love about tech. I have 15 years experience as a software developer but I have no clue what any of these words mean. Amazing you can have such specialised knowledge about something.
Your comment gave me perspective on how far down the rabbit hole my media server has taken me, as I was nodding along to everything the parent poster said having encountered similar issues with FFmpeg in the past.
Niche knowledge really can creep up on you over the years as you gradually encounter problems and work to solve them a few hours at a time.
While I have not been this deep into the inner workings, I have personally skimmed that rabbit hole when I started a side project that was essentially grep, but for (mainly) MKV files.
The idea was that you could "grep" for specific text in the subtitles and automatically create a clip of every occurrence of the text (by looking at the subtitle timing and padding it in both directions).
The biggest source of my frustration was that I was unable to get the clipping to work exactly as I wanted, where the start or end of the clip would seemingly drift back and forth. That was until I realized it boiled down to how the different seek modes in ffmpeg handled keyframes.
I still haven't gotten the clipping to work exactly as I want, but I figured doing two passes might be the way to go: the first pass would do a fuzzy match and ensure there is enough extra on both ends of the desired clip, and the second pass could re-encode the fuzzy-matched clip to shuffle the keyframes around, allowing more accurate clipping.
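A sketch of that two-pass idea in ffmpeg (timestamps and file names are made up): -ss before -i with stream copy snaps to keyframes, which is exactly the drift described above, so the second pass re-encodes and cuts precisely.

# Pass 1: fast keyframe-aligned cut with generous padding, no re-encode
ffmpeg -ss 00:12:30 -i episode.mkv -t 20 -c copy fuzzy.mkv

# Pass 2: re-encode the padded clip; decoding makes the cut frame-accurate
ffmpeg -ss 5 -i fuzzy.mkv -t 10 -c:v libx264 -c:a aac exact.mp4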
It's tricky. Subs aren't always brilliantly timed either.
I did something similar many moons ago to auto create summary videos based on changing sentiment in the subtitles for the BBC. Worked... interestingly.
If you want to nail scene changes, one thing you can do is look for sudden changes in the histogram frame by frame. It'll change pretty smoothly as people move, cameras pan, etc., but there's a discontinuity when there's a cut. One issue, though, is that there are a lot of camera cuts! It's surprising how many there are that you don't really notice.
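ffmpeg's select filter exposes a per-frame scene-change score built on a similar frame-difference idea (the 0.4 threshold is just a common starting point, not a magic number):

# Print a showinfo line (including pts_time) for each frame whose scene score exceeds 0.4
ffmpeg -i in.mp4 -vf "select='gt(scene,0.4)',showinfo" -f null - 2>&1 | grep pts_time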
I've seen many strange mp4s and webms floating around various discord communities. Some crash your client at a fitting moment in the video, some appear to be thousands of hours long, some appear to be seconds long but are actually hours long, some even loop! somehow.
Do you still have copies of them? Could you send them to jtunney@gmail.com? I'd like to set up a web page hosting MPEG torture tests, since there doesn't appear to be one already. This is actually a very common practice for things like RFCs written for text-based protocols. We should ideally have more accessible information online that helps video software authors harden their implementations against these sorts of busy beaver attacks.
I even saw videos that play something entirely different the second time you play it!
On some sites with a video duration limit that don't do transcoding, at least those that allow VP8 WebM uploads, you can change a few bytes in the input to report a false duration and upload longer videos. If you're uploading audio only, with a static image, you can sometimes upload hours of audio before you hit the file size limit.
I have no idea how YouTube's backend works, but I thought it would be useful to share here that with ffmpeg one can use the argument -vsync drop to generate fresh timestamps based on the frame rate.
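A minimal invocation, assuming stream copy so nothing gets re-encoded (output name is a placeholder; newer ffmpeg versions spell the same option -fps_mode drop):

ffmpeg -i in.mp4 -vsync drop -c copy retimed.mp4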
It's almost like we didn't learn from the days of MP3. I have several MP3s that, in certain players, appear to be half an hour long despite being only 2 minutes. My best guess was that they were assumed to be CBR, despite nothing about MP3 implying CBR… (there's no flag or anything that says "this is a VBR file", CBR files are just special…)
Nowadays it's mostly moot since MP3 is obsolete.
> MP3 is obsolete
What should we be using instead for lossy audio?
The current state of the art is Opus, but HE-AAC is also superior, and then there's always the appeal of lossless, which is a lot more practical than it once was.
HE-AAC is only useful at low bitrates though (below 64 kb/s), and supposedly never reaches transparency. Above that, you should use AAC-LC (or, of course, Opus if you can).
Vorbis is also notable as a better format than MP3, although that too is made obsolete by Opus.
A format which cannot deliver quality is not state of the art. Opus is the Internet Explorer 6 of musical and video formats.
Opus is an audio format, not a video format. Opus is better than MP3. Wouldn't MP3 actually be the Internet Explorer 6 of audio formats?
https://sound.stackexchange.com/questions/26167/opus-vs-mp3-...
What on earth are you talking about? For lossy formats, there is currently nothing better than Opus in actual use.
Are you mixing up Opus with something else? By many metrics, it is better at delivering quality than just about any other lossy audio codec.
AAC is far superior on a technical basis, as is Ogg Vorbis.
On the basis of "can it play in my car", MP3 is the only winner. My car's player has one of those baseline decoding chips that can only do MP3.
That's true, MP3 is by far the most widely supported lossy audio format (except presumably MP1/MP2, since MP3 decoders have to support them), so it will live on for a long time, although Opus is the best one nowadays. Just like with PNG and JPEG for images, which will live on for a long time even though we have WebP, AVIF and JPEG XL. And AVC will probably live on for a long time even though we have HEVC, VP9 and AV1.
I mean, what I want is for the car to have an audio input that I plug a cable into. There's no reason for the car to be decoding audio at all.
"Now you have two problems." Specifically the steering wheel controls won't work and I'd have to deal with charging the second device.
My controls work even when I use the aux cable.
Same here. And it doesn't even do that very well. Imagine spending 15k or more on a brand new car in 2021 just to realize that the sound tech is borrowed from a $5 MP3 player from the early 2000s.
Almost exactly the same situation as me. The worst part is it doesn't sort the directory entries! It displays them in the same order they were written to the directory (i.e. usually random). Luckily there is https://fatsort.sourceforge.io/
This is why I prefer cars where the stereo is replaceable.
My Diamond Rio PMP300 would play VBR but would shit the bed on displaying duration and seeking because it assumed CBR, as you suggested. When VBR was new this was a pretty familiar situation, and old-school MP3 encoding standards for share sites would give the option to stick to CBR for that reason - they'd specify --alt-preset standard for VBR and a couple of CBR options, generally around 256 kbps.
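Those share-site rules mapped to LAME invocations along these lines (historical flags; modern LAME spells the VBR preset -V 2):

lame --alt-preset standard in.wav out.mp3   # VBR, averaging around 200 kbps
lame -b 256 in.wav out.mp3                  # plain 256 kbps CBR for picky players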
Plenty of obsolete audio files on my phone.
What's the bug here? It looks like you fooled the container with an incorrect timecode, and then when it was uploaded to YouTube, the file was re-encoded into a sane format. I don't really see an attack here, nor do I see a mitigation.
It seems like it sort of counts as an amplification DoS. Enough people uploading smallish videos that unravel into terabytes could probably create an issue. It's bypassing YouTube's limits of 256 GB / 12 hours.
I would guess YouTube will do some sort of fix or sanity check.
That makes sense, thank you. I'd assume a data engineer at Google somewhere has a small yellow light that goes off whenever someone exceeds those limits, but FAANG infrastructure never fails to disappoint me.
More like a graph that a single person generally can't hope to move unless they have a following on the level of xcow. If someone burns a tire in the middle of the rain forest... can anyone tell until it's 50,000 people doing it?
What is xcow?
I think they might've meant xqcow https://en.wikipedia.org/wiki/XQc
Strange reference. Is that just meant to be an arbitrary celebrity or does xqcow have some particular relevance here I'm missing?
Most people would probably name-drop him if asked to list off the 10 biggest streamers they could think of. I wouldn't consider his ilk a household name, but I'm just playing it as it lies.
What proportion of people can name any 10 'streamers'?
The issue with "expensive to calculate" values like the duration of media (for example, with variable encodings) is that the encoder tries to help others avoid rematerializing these values by saving its calculation in some metadata. The problem is that consumers then have to "trust" the encoder; this post demonstrates a non-malicious case, but perhaps there are more malicious cases (like the vulnerability in Android's libstagefright years ago).
For example, I wrote an iTunes-in-the-browser web app; I needed to know durations of songs to display them. MP3 doesn't include these in metadata IIRC, so I needed to pre-process them with ffmpeg just to have duration data. I wasn't doing anything with that other than displaying it. But it would have been nice to just have that info in the metadata.
> For example, I wrote an iTunes-in-the-browser web app; I needed to know durations of songs to display them. MP3 doesn't include these in metadata IIRC, so I needed to pre-process them with ffmpeg just to have duration data.
This jogged my memory from (part of) the first thing I ever built in a general purpose programming language, all of probably 20 years ago! I was doing exactly this: using ffmpeg to get duration metadata from MP3s.
My memory was fuzzy so I looked it up, which (surprisingly!) confirmed what I remembered. MP3s may include metadata (ID3) which may include duration (or start/end times).
I knew my input source (it was me, my music, my MP3 conversions), so I was able to rely on the metadata directly. IIRC I even processed it on demand in my first naive version, which was “slow” but not nearly as slow as stuff I’d complain about today.
I ran into a similar issue when I tried to generate a podcast RSS feed from a website whose built-in feed didn't go back far enough. I was trying to do HTTP range requests on the MP3 files to save bandwidth and just fetch their metadata. Sure enough, mostly no duration, and if the encoder did put it in a custom field, it was usually different from what VLC reported.
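If you can afford to read the whole file, ffmpeg/ffprobe will give you a duration you don't have to take on trust (file name is a placeholder):

# Duration as claimed by the container/metadata (fast, but trusts the encoder)
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 song.mp3

# Duration by actually decoding everything (slow, but ground truth - read the final time= value)
ffmpeg -i song.mp3 -f null - 2>&1 | grep time=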
You could do something like a zip bomb, I guess. YouTube would just have to do some validation of the file before adding it to the pipeline.
A zip bomb is a perfectly valid file.
I can set up a broken service that outputs a gajillion lines of the same error to syslog, creating terabytes of logs, zip all that into a few megabytes, and that'd be a valid zip that'd fill up most modern laptops and servers.
A surveillance camera video, with a very high frame rate when motion is detected and a very low frame rate when not (low frame rate -> timelapse), can be a perfectly valid video, taking a few gigabytes in this format and a few terabytes when converted to a fixed 60 fps.
Zip files that contain themselves are infinitely large when recursively decompressed, so that's much worse than a log file which is merely easy to compress.
Infinitely large doesn't mean anything when your disk space is limited.
If your drive is 500 GB, there is no practical difference between a 10 TB log file, a 10 PB zip file, or an infinite zip bomb... once the disk is full, the unzipping stops.
Narrowly true, except it's trivial to scan a very large archive without actually storing the entire thing, whereas if you tried to do the same thing with a zip quine you'd eventually run out of memory. Zip quines are strictly worse.
Came across a video on YouTube recently that I think may be misreporting its length due to this issue:
https://www.youtube.com/watch?v=5Grsvyt5xps
The video is 22 minutes but it's reported at nearly 3 hours in length.
But OP's video is just a video with a very low frame rate (reported as 0.030 FPS by `mediainfo`). There is nothing broken about it.
Just because its file size is small does not mean it can't be 15 hours long. (One of the author's takeaways is "[t]he size of a video file is not a proper indicator for how long it is" - but even without this hack you couldn't rely on that, since videos can have whatever bitrate.)
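You can make such a file with perfectly ordinary tooling; a sketch (frame rate, duration, and file names are arbitrary, and the flags may need tuning per ffmpeg version):

# One frame every 30 seconds for an hour: a tiny file with a long duration
ffmpeg -loop 1 -framerate 1/30 -t 3600 -i still.png -c:v libx264 -pix_fmt yuv420p sparse.mp4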
I saw this one misreported as 22 minutes in Firefox and Discord. Only 1:43 in reality: https://www.youtube.com/watch?v=RerbrfVd1nI
> Herbie Hancock on Miles: Don't play the butter notes!
One of my favorite "breaking YouTube" (JPEG, really) demos was the slow-motion glitter video.
I spent time skipping back and forth in the video looking for the side by side(raw vs yt) until it dawned on me. I may in fact be an idiot.
> Regards, Google Security Bot
So it's basically a compression bomb? Like those small zip files that can expand to gigantic sizes?
42.zip, yep. I have a copy if anyone wants it.
Never heard of this - TL;DR on how it works?
From https://www.unforgettable.dk/ :
"The file contains 16 zipped files, which again contains 16 zipped files, which again contains 16 zipped files, which again contains 16 zipped, which again contains 16 zipped files, which contain 1 file, with the size of 4.3GB. So, if you extract all files, you will most likely run out of space :-)"
Why recursively extract zip files? Well, maybe a security tool is trying to inspect or process the zip file contents.
I thought this was well explained. The title is a bit clickbaity, but it got me to click.
I'm interested in learning more about the MP4 format. Where can I read more? Is there a canonical read that everyone but me knows about?
OP seems to have some kind of file explorer UI for it - also interested in that.
MP4 Inspector (Windows)
The MP4 format is fundamentally pretty easy, at least the box structure. But there have been so many overlapping standards that the MP4 format is also really messy. Aaand you need to pay to get access to the specs of the format.
"This is clickbait-y enough that I fell for it" is uh. Not exactly an endorsement? It seems like kind of the opposite of what you'd want to encourage?
I didn't like the title either. On the other hand, I doubt the fellow expected much of an audience, but it hit HN's front page. Also, the article is pretty great.
Remember zip quines? Ah, good old days.
"Hacking YouTube" is a stretch description ...
Yes:
> To the best of my knowledge, the impact was rather low because their transcoders are set up in such a way that they will eventually give up on a file if it takes too many resources.
Recently been playing with this. Using FFmpeg to generate videos from a series of stills, I assumed a frame rate of 1 and a fixed video length would be suitable... Turns out a lot of players are very particular about how they like their files to be set up. Windows couldn't open the file, VLC could; Google couldn't generate thumbnails, but could show the video. Playing them on a Pi led to more fun and games.
In the end I just encoded them the 'correct' way, but it was eye-opening to the wildness going on in video files. I just assumed I would be able to set a duration and a frame rate, and things would "work".
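The widely compatible incantation looks something like this (a sketch; the input pattern and rates are assumptions): read stills in at 1 fps, duplicate frames up to a conventional output rate, and force yuv420p, which many hardware players insist on.

ffmpeg -framerate 1 -i img%03d.png -r 30 -c:v libx264 -pix_fmt yuv420p out.mp4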
And wait until you get to the edit lists part of the MP4 specs. Some real powerful stuff in that.
With ffmpeg, use the command below to calculate the maximum timestamps:
ffmpeg -i INPUT -map 0:v -map 0:a -enc_time_base -1 -c copy -f null -
Note the time= value at the end of the process.
I'm a bit surprised about the lack of a financial reward - this could realistically be used to take down processing servers in a rather simple DDoS attack.
Earlier this year people were setting false video metadata to bypass TikTok's duration limit and upload very long videos.
Zip bombed Youtube. Nice.
looks like Discord is vulnerable to this too, oopsie
Not Discord, but the default player is vulnerable to many different crash shenanigans. I get them sent to me all the time to look into, and it's usually just people using bogus timestamps, bogus seek times, or concatenating multiple videos of different resolutions/rates that the player can't handle. If there was a way to get Discord to spawn VLC for playing videos by default, this would be less of a problem.
> get discord to spawn VLC
So rather than loading the bogus videos in a sandboxed Chromium instance, you want to load them in an unsandboxed VLC instance? I smell eventual RCE.
Yes.
- VLC has decades of battle hardening and entirely discards all the aforementioned nonsense. In a perfect world, both Discord and VLC would be sandboxed themselves, but I accept that this world is far from perfect. Discord could at least sanitize anything that strays from a filename when passed to VLC.
- Discord is already vulnerable to crashes from multimedia. This has been a long running problem that has not been resolved by sandboxing in Electron. The folks at Discord will not be able to resolve this with code changes in Electron AFAIK. If you can crash it, there is potential for an RCE. What that RCE can effectively accomplish will entirely depend on sandboxing boundaries external to the application, not sandboxing within the application.
In reference to sandboxing, I could make a document that explains how to enable the OS-wide sandboxing features of Windows 10 [1] VirtualSecureMode / DeviceGuard / CredentialGuard and Linux SELinux / AppArmor. I don't have one for macOS. I should add, don't enable the Windows 10 security features if you depend on any virtualization outside of Hyper-V. Enabling those will break all hypervisors that don't rhyme with Hyper-V.
I should add that my solution for Discord is to not preview videos or play them in the client. I click on the links and VLC plays them but that is not the default behavior of the application.
[1] - https://techcommunity.microsoft.com/t5/iis-support-blog/wind...
Aren't quite a few Android security fixes every month related to the media framework? Are those not severe in a browser context because it's sandboxed?
We don't transcode video, so no.
I presume you're Discord eng. You must do some sort of pass or parse of it, because every now and then I'll upload something and it will fail to process and result in what I'll call "the sad Discord poop"…
We try to grab the first frame to show a preview. But if we can't for whatever reason, that's when the sad poop appears :(
The player is malfunctioning anyway, similar to those videos that report a short runtime and then go on forever, which get passed around quite frequently.
This is how the video element works in Chromium. I suspect it looks at the same metadata field. Beyond leading to a bit of absurd UI state, though, it's not the same kind of issue this post describes, which deals with trying to transcode these kinds of videos - something that could multiply storage utilization on the backend.
Folks, just because the author wasn't wearing a Guy Fawkes mask with a black hoodie and made no mention of gaining access to the Central Meme Database, that doesn't mean they weren't hacking.
They were hacking around with MP4 muxers and YouTube. This is definitely the hacker spirit. The word doesn't need to be re-appropriated by Hollywood caricatures.
Does anyone else really use "hack" in the way HN uses it, ie with its original meaning?
For your average person, a hacker is a person in a Guy Fawkes mask with a black hoodie who steals your Facebook password.
For people in the industry, a "hack" is code that works but might be a placeholder or potentially dangerous. The author would want to write a better version of it, but perhaps isn't able to due to time or design constraints.
I don't think anyone other than HN uses "hacker" the way it was meant; perhaps it is time to catch up with the times.
Literally everyone who uses the word "lifehack"
To be fair, thanks to lifehacks, growth hacks and whatever, it seems "hack" is slowly fading back into its original meaning. At least from my POV.
Growth hacking