I’m trying to figure out how best to compress my videos for streaming over the web. I’m using FFMPEG for compression, and it has dozens of potentially-useful flags. Looking at random guides and forum posts online, there’s obviously a lot of cargo-culting of compression parameters, and it’s not clear at all to me what the best choices are.
Since there’s no authoritative guide to picking options, and in any case it’d vary widely depending on the nature of the videos that you’re encoding, I’m left running experiments to see what works best for me. I’ve just finished 1,193 different test encodings of the same 30 second video, all in an attempt to figure out what the best settings are for my uses.
Will this apply to your video? Maybe, maybe not. But the technique is generally useful, and a number of the results were surprising. You can test with your own content and see what works for you.
Testing Details
First, let’s lay out what I’m actually testing.
I’m running a nightly GPL build of FFMPEG from https://github.com/BtbN/FFmpeg-Builds/releases, built on March 13, 2025 at 05:44. I’m running it on an AMD Threadripper 5975WX, with 32 cores (64 threads) and an nVidia RTX 4090. Tests were conducted against files on a local NVMe SSD (Samsung 990 Pro), so disk latency shouldn’t be a substantial part of any of the metrics.
The file that I’m compressing is the first 30 seconds of the second video from Deception Pass that I posted a couple weeks ago. It was recorded with a Panasonic GH6 via a Blackmagic Video Assist as a BRAW file, and then rendered down into a 5376x3024 4:2:2 10-bit DNxHR HQX file via DaVinci Resolve. This was then downscaled via FFMPEG into a 30 second long, 60 FPS, 1920x1080, 4:4:4 10-bit DNxHD file, totaling 3.1 GB. This is about as close as I can come to producing a source video with no artifacts from earlier compression cycles.
Once I had the test video, I started testing various compression flags to see how they performed. For each set of flags, I encoded a video and then calculated the VMAF and VMAF NEG scores. I then recorded the flags, file size, encoding time, user CPU use for encoding, and VMAF scores for each encoding.
(VMAF is Netflix’s internally-developed compressed video quality scoring system, and is probably the best single metric for automatically judging compressed video quality. It’s not perfect, but it’s pretty good and far, far superior to hand-reviewing over 1,000 .mp4 files. VMAF NEG is intended to help filter out “enhancements” that degrade quality while boosting the VMAF score. I was hoping that it’d tell me something about the weird issues that I keep seeing with nVidia-rendered H.265 files, but it seems to like them slightly better. Oh well.)
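My actual test script isn’t published yet, so here is only a rough sketch of what one encode-and-score step could look like. The helper names (`encode_cmd`, `vmaf_cmd`) and file names are made up for illustration, and it assumes an FFMPEG build that includes the `libvmaf` filter with the built-in `vmaf_v0.6.1` and `vmaf_v0.6.1neg` models. Depending on the build, you may also need explicit scale or format filters so both inputs match.

```python
import shlex


def encode_cmd(source: str, output: str, flags: str) -> list[str]:
    """Build an ffmpeg command that encodes `source` with the given flag string."""
    return ["ffmpeg", "-y", "-i", source, *shlex.split(flags), output]


def vmaf_cmd(encoded: str, reference: str, log_path: str, neg: bool = False) -> list[str]:
    """Build an ffmpeg command that scores `encoded` against `reference`.

    The libvmaf filter treats its first input as the distorted video and
    its second input as the reference. `log_path` should be a simple file
    name; special characters would need filter-graph escaping.
    """
    model = "vmaf_v0.6.1neg" if neg else "vmaf_v0.6.1"
    graph = (
        f"[0:v][1:v]libvmaf=model=version={model}"
        f":log_fmt=json:log_path={log_path}"
    )
    return ["ffmpeg", "-i", encoded, "-i", reference,
            "-lavfi", graph, "-f", "null", "-"]


if __name__ == "__main__":
    # Hypothetical file names; print the commands rather than running them.
    flags = "-c:v libx265 -preset slow -crf 20.6"
    print(shlex.join(encode_cmd("source.mov", "test.mp4", flags)))
    print(shlex.join(vmaf_cmd("test.mp4", "source.mov", "vmaf.json")))
    print(shlex.join(vmaf_cmd("test.mp4", "source.mov", "vmaf_neg.json", neg=True)))
```

Each command list can be handed to `subprocess.run` and timed to get the wall-clock numbers; the VMAF score then comes out of the JSON log.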
For each set of flags, I varied the quality settings (-crf for libx265 and -cq:v for hevc_nvenc) to find the lowest quality setting that would get me a VMAF of at least 95. This should let me compare various compression flags on a relatively equal basis.
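The search for that setting can be sketched as a bisection over the quality value. This is an illustration rather than my exact procedure: `find_highest_crf` and the `measure_vmaf` callback are hypothetical names, the 10 to 40 search range is arbitrary, and it assumes that VMAF only goes down as -crf (or -cq:v) goes up.

```python
from typing import Callable


def find_highest_crf(measure_vmaf: Callable[[float], float],
                     lo: float = 10.0, hi: float = 40.0,
                     target: float = 95.0, step: float = 0.1) -> float:
    """Return the largest CRF on a `step` grid whose VMAF is >= target.

    `measure_vmaf(crf)` is expected to encode at that CRF and return the
    resulting VMAF score. Assumes the score falls as CRF rises.
    """
    # Search over integer grid indices to avoid floating-point drift.
    lo_i, hi_i = round(lo / step), round(hi / step)
    if measure_vmaf(lo_i * step) < target:
        raise ValueError("even the lowest CRF in range misses the target")
    while lo_i < hi_i:
        mid = (lo_i + hi_i + 1) // 2  # round up so the loop always shrinks
        if measure_vmaf(mid * step) >= target:
            lo_i = mid       # still good enough: try lower quality
        else:
            hi_i = mid - 1   # too lossy: back off
    return round(lo_i * step, 6)


if __name__ == "__main__":
    # Stand-in for a real encode + VMAF run: a made-up linear quality curve.
    fake_curve = lambda crf: 105.33 - 0.5 * crf
    print(find_highest_crf(fake_curve))
```

With a 0.1 step over that range, this needs roughly ten encodes per flag combination instead of a linear sweep.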
I’m trying to find the best possible flags for compressing my videos, where “best” means the highest quality with the lowest bit rate, all compressed in a reasonable amount of time. Various runs ranged from 3.84s up to 453 seconds, and the resulting VMAF 95 files ranged from 4.2 to 12.3 MB.
TL;DR
For my uses,
-g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.6
gave the best results for this sample file. Using nVidia’s hevc_nvenc
gives slightly larger file sizes in about half of the rendering time,
but the videos produced are visually
inferior.
Since most of my videos are basically similar (essentially landscape photography in video form, with slowly flowing water, falling leaves, etc.), I expect that the same general flags will work well for them. I’ll probably want to adjust the -crf value at least per-resolution, as it’s unlikely that 480p and 2160p will be optimal with the same number. Longer-term I’m debating if I want to calculate VMAF scores per-video, per-resolution or just pick a static set of -crf values and use those by default.
If I was encoding noisy video, or rapidly changing content (like sports or anything with a moving camera), or if I had scene changes, then I’d probably want to retest to see if different settings worked better.
Interestingly, I found that 10-bit H.265 is almost always smaller than 8-bit H.265 for this source file. Since (in theory) the 10-bit file contains 25% more information, and the lower-order bits are noisier than the 8 high-order bits, I’d expect the 10-bit file to be larger, but this was never the case. In addition, there wasn’t a substantial advantage to chroma subsampling in my case; 4:4:4 10-bit H.265 files were rarely substantially larger than 4:2:0 10-bit files, and were sometimes quite a bit smaller.
libx265 results
libx265 is the standard open-source H.265 encoder in FFMPEG. It’s fairly slow but seems to get the job done.
-preset
First, let’s look at a very basic libx265 encoding,
using
-c:v libx265 -preset <speed> -crf <quality>.
First, the most important metric – the output size. Using
-preset veryslow produced the smallest file, but by a tiny
margin. -preset slow was less than 0.2% larger, and was
actually smaller than -preset slower. Using
-preset medium or faster produced substantially larger
files.
Here are the -crf values needed to achieve a VMAF score
of 95 for each preset. I adjusted the CRF setting until the VMAF was
just over 95, to equalize for quality:
Just to make it clear why I kept changing the -crf value
for each preset, here are the VMAF scores for a constant
-crf 20 for all presets:
So, when running with a constant -crf setting, faster presets produce lower-quality output, which shouldn’t be surprising. By adjusting the -crf for each -preset until we reach a VMAF of 95, we can judge the various presets on the basis of their file size and how long they take to compress, and get more of an apples-to-apples comparison.
When it comes to the amount of time needed to compress,
veryslow and slower are aptly named, while
medium through veryfast were all similar,
possibly because it took ffmpeg a while to read and decode
the 3.1 GB source file.
For this set of settings, -preset slow or
-preset medium are the best two options, depending on how
you value size vs compute time.
-tune fastdecode
The next setting I looked at was -tune fastdecode. There
are a few other -tune options, but they’re mostly geared
towards either specific testing scenarios or specific types of input
video, while fastdecode is intended to make the player’s
work easier. I expected that this would make output files slightly
larger.
Okay, I didn’t see that coming. Adding
-tune fastdecode dropped output sizes 200 kB or so. It also
made a small improvement in encoding time.
GOP intervals
Next, I experimented with changing the GOP interval in the generated video. Analysis showed that the bulk of the bytes in the video were in I frames, and the B and P frames were relatively small. I think increasing the GOP interval to 10 seconds (or 600 frames) should be fine for my use.
Assuming that it still streams right, fastdecode plus
10-second GOPs seems like a nice win.
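For other frame rates, the GOP flags follow directly from the frame rate and the desired interval. A tiny sketch (the `gop_flags` helper is just an illustrative name):

```python
def gop_flags(fps: float, gop_seconds: float) -> list[str]:
    """Return ffmpeg flags that pin the keyframe interval to a fixed length."""
    frames = round(fps * gop_seconds)
    # Setting -g and -keyint_min to the same value asks for evenly spaced
    # keyframes rather than letting the encoder add extra ones.
    return ["-g", str(frames), "-keyint_min", str(frames)]


if __name__ == "__main__":
    print(" ".join(gop_flags(60, 10)))  # the 60 FPS, 10-second case used here
```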
-pix_fmt
My source video was a 4:4:4 10-bit file, so libx265
defaults to producing a 4:4:4 10-bit H.265 file. In theory, reducing the
video to 4:2:2 or 4:2:0, or dropping from 10-bit to 8-bit video
should reduce the output size.
This… didn’t happen.
I have no clue why 4:2:2 is larger than either 4:4:4 or 4:2:0. This isn’t the result that I’d expect.
My best guesses why 10-bit encodings are smaller than 8-bit encodings all involve either banding or dithering, but I’d love to see an authoritative explanation from someone.
Overall results
Here’s the full set of VMAF=95 results for libx265,
sorted by size. Note that -pix_fmt yuv444p10le and
-preset medium are defaults and may not always be
shown.
| flags | kbytes | walltime (s) | vmaf | vmaf_neg | mbps |
|---|---|---|---|---|---|
| -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.6 | 4232 | 38.64 | 95.03862 | 93.301121 | 1.129 |
| -pix_fmt yuv420p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.7 | 4240 | 29.03 | 95.047369 | 93.320013 | 1.131 |
| -pix_fmt yuv420p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 20.7 | 4320 | 32.54 | 95.008211 | 93.333366 | 1.152 |
| -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 20.5 | 4392 | 44.77 | 95.040188 | 93.362489 | 1.171 |
| -pix_fmt yuv422p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.8 | 4392 | 34.44 | 95.029819 | 93.28896 | 1.171 |
| -c:v libx265 -preset slow -tune fastdecode -crf 20.7 | 4540 | 39.15 | 95.040546 | 93.32068 | 1.211 |
| -pix_fmt yuv422p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 20.7 | 4572 | 38.79 | 95.013026 | 93.344707 | 1.219 |
| -c:v libx265 -preset veryslow -crf 20.7 | 4780 | 298.81 | 95.020114 | 93.364579 | 1.275 |
| -c:v libx265 -preset slow -crf 20.5 | 4788 | 45.34 | 95.032299 | 93.404406 | 1.277 |
| -pix_fmt yuv420p -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 19.6 | 4788 | 25.17 | 95.030429 | 93.472752 | 1.277 |
| -c:v libx265 -preset slower -crf 20.7 | 4892 | 171.46 | 95.044693 | 93.39301 | 1.305 |
| -pix_fmt yuv420p -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 19.2 | 5324 | 27.98 | 95.016273 | 93.545946 | 1.42 |
| -pix_fmt yuv422p -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 19.7 | 5372 | 30.4 | 95.040415 | 93.473104 | 1.433 |
| -pix_fmt yuv420p10le -c:v libx265 -crf 18.4 | 5796 | 16.7 | 95.005216 | 93.597107 | 1.546 |
| -c:v libx265 -preset superfast -crf 16 | 5856 | 9.15 | 95.030784 | 93.522659 | 1.562 |
| -pix_fmt yuv422p -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 19.4 | 5864 | 32.92 | 95.009275 | 93.526384 | 1.564 |
| -c:v libx265 -tune fastdecode -crf 18.3 | 5912 | 17.42 | 95.013498 | 93.545759 | 1.577 |
| -c:v libx265 -crf 18.1 | 6124 | 23.7 | 95.001819 | 93.60139 | 1.633 |
| -c:v libx265 -preset medium -crf 18.1 | 6124 | 23.57 | 95.001819 | 93.60139 | 1.633 |
| -c:v libx265 -preset fast -crf 17.8 | 6280 | 22.4 | 95.000127 | 93.636264 | 1.675 |
| -c:v libx265 -preset faster -crf 17.6 | 6412 | 21.48 | 95.01327 | 93.639918 | 1.71 |
| -c:v libx265 -preset veryfast -crf 17.6 | 6416 | 21.52 | 95.012366 | 93.638633 | 1.711 |
| -pix_fmt yuv420p -c:v libx265 -tune fastdecode -crf 17.8 | 6752 | 14.51 | 95.007574 | 93.525252 | 1.801 |
| -pix_fmt yuv422p10le -c:v libx265 -crf 18.4 | 7252 | 18.94 | 95.012661 | 93.610947 | 1.934 |
| -pix_fmt yuv422p -c:v libx265 -tune fastdecode -crf 17.8 | 8812 | 14.95 | 95.005922 | 93.549463 | 2.35 |
| -pix_fmt yuv420p -c:v libx265 -crf 16.9 | 8956 | 14.83 | 95.018216 | 93.699527 | 2.388 |
| -pix_fmt yuv422p -c:v libx265 -crf 17 | 11388 | 17.17 | 95.024472 | 93.708545 | 3.037 |
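The mbps column appears to be derived from the file size and the 30-second clip length, treating a kbyte as 1,000 bytes. That is inferred from the numbers in the table rather than taken from the test script, and the `mbps` helper name is just for illustration:

```python
def mbps(kbytes: float, duration_s: float = 30.0) -> float:
    """Average bitrate in megabits per second for a file of `kbytes` kB."""
    kilobits = kbytes * 8          # 8 bits per byte
    return kilobits / duration_s / 1000


if __name__ == "__main__":
    for size in (4232, 6124, 11388):
        print(size, round(mbps(size), 3))
```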
libx265 summary
Given these results, -preset slow -tune fastdecode with 10-second GOPs and 4:4:4 10-bit seems like the obvious choice. 4:2:0 10-bit encodes a bit faster and might have a compatibility advantage, although developer.mozilla.org implies that 4:4:4 is generally supported.
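One way to make the size-versus-time trade-off explicit is to keep only the runs that no other run beats on both file size and encode time. The sketch below does that for a handful of rows copied from the table above; the `Run` type, the `pareto_front` helper and the short labels are illustrative, and since every row is already at VMAF 95, quality is treated as equal.

```python
from typing import NamedTuple


class Run(NamedTuple):
    label: str
    kbytes: int
    seconds: float


def pareto_front(runs: list[Run]) -> list[Run]:
    """Keep runs that no other run beats on both size and encode time."""
    front: list[Run] = []
    best_time = float("inf")
    # Walk from smallest to largest; a run survives only if it is faster
    # than every smaller run seen so far.
    for run in sorted(runs, key=lambda r: (r.kbytes, r.seconds)):
        if run.seconds < best_time:
            front.append(run)
            best_time = run.seconds
    return front


RUNS = [
    Run("slow fastdecode g600 4:4:4 10-bit", 4232, 38.64),
    Run("slow fastdecode g600 yuv420p10le", 4240, 29.03),
    Run("veryslow", 4780, 298.81),
    Run("slow fastdecode g600 yuv420p", 4788, 25.17),
    Run("superfast", 5856, 9.15),
    Run("medium", 6124, 23.57),
    Run("fastdecode yuv420p", 6752, 14.51),
]

if __name__ == "__main__":
    for run in pareto_front(RUNS):
        print(f"{run.kbytes:>6} kB  {run.seconds:>7.2f} s  {run.label}")
```

Anything not on that list is both larger and slower than some other option, so it only makes sense to pick it for reasons other than size and time, such as compatibility.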
hevc_nvenc results
nVidia’s hardware encoder doesn’t seem to produce as good of results
as libx265, but it’s hard to argue with the performance.
hevc_nvenc has far more config flags than
libx265, but the majority of them seem fairly
special-purpose to me. They may be useful for tuning individual videos
but probably aren’t worth it in general. This made testing it
more difficult, as there were more scenarios to look at.
-preset
nVidia’s presets are named p1 through p7,
with p7 the slowest.
So, these are uniformly terrible compared to the libx265 results. With default options, libx265 produced a 6,124 kB file, compared to the best-case 8,900 kB file here.
These are all downright zippy compared to libx265; the slow preset there took 45.34s, roughly 4.5x as long as p7.
-rc vbr
In general, we want to use variable bitrate encoding. This appears to
be the default for hevc_nvenc when used with
-cq:v; adding -rc vbr gave identical file
sizes and VMAF scores to runs that didn’t use an
-rc flag.
Most of my test runs have the flag included anyway even though it’s effectively a no-op.
-tune uhq
Recent versions of FFMPEG added an “ultra-high quality” -tune uhq option, which arrived with version 12.2 of nVidia’s Video Codec SDK. Turning it on drastically improved the results:
So, with -tune uhq, the worst preset is better
than the best without -tune uhq.
It’s a little bit slower, but not terrible:
So p5 takes a bit of a hit, but the drop in file size is
mostly worth the extra time.
Given the numbers overall, I’m going to concentrate on
-preset p7 -tune uhq from here on out.
-pix_fmt
Unlike libx265, dropping to 10-bit 4:2:0 actually helps
reduce the size of the output. nVidia only supports H.265 4:2:2 on RTX
5xxx and newer GPUs, so I can’t test it on my RTX 4090.
Like libx265, 8-bit 4:2:0 is substantially worse than
10-bit 4:2:0.
GOP
Increasing the GOP length helps nVidia’s encoder, just like it helps
libx265:
So, with this we can get hevc_nvenc down to 4,384 kB,
compared to 4,232 kB for libx265. Since this only takes
14.36s to encode vs 38.64s for libx265, I would prefer to
use hevc_nvenc. I’m perfectly willing to spend a few
percent more storage and bandwidth in exchange for a 60% drop in
compression time.
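The arithmetic behind that trade-off, using the two best rows from the results tables (the `percent_change` helper is just an illustrative name):

```python
def percent_change(old: float, new: float) -> float:
    """Signed percent change going from `old` to `new`."""
    return (new - old) / old * 100.0


# Best libx265 run vs. best hevc_nvenc run at VMAF 95.
size_penalty = percent_change(4232, 4384)    # file size in kB
time_saving = percent_change(38.64, 14.36)   # wall time in seconds

if __name__ == "__main__":
    print(f"size: {size_penalty:+.1f}%  time: {time_saving:+.1f}%")
```

That works out to about 3.6% more bytes for a bit over 60% less encoding time.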
Unfortunately, as mentioned earlier, the nVidia-encoded version of my test video is missing a bunch of details that the software-encoded version retained, and raising the quality via -cq:v won’t get them back. I still haven’t found the flag that will disable whatever is going on under the hood, if it even exists.
Overall results
Here’s the full set of VMAF=95 results for hevc_nvenc, sorted by size. Note that -pix_fmt yuv444p10le and -preset p4 are defaults and may not always be shown.
| flags | kbytes | walltime (s) | vmaf | vmaf_neg | mbps |
|---|---|---|---|---|---|
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -g 600 -keyint_min 600 -cq:v 33.4 | 4384 | 14.36 | 95.013901 | 93.685362 | 1.169 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 33.7 | 4876 | 14.62 | 95.030828 | 93.720213 | 1.3 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p6 -tune uhq -cq:v 33.7 | 4884 | 14.37 | 95.028366 | 93.716392 | 1.302 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p5 -tune uhq -cq:v 33.2 | 5016 | 14.35 | 95.005851 | 93.712047 | 1.338 |
| -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 33.7 | 5076 | 14.86 | 95.015688 | 93.698684 | 1.354 |
| -rc vbr -c:v hevc_nvenc -preset p6 -tune uhq -cq:v 33.7 | 5080 | 14.6 | 95.011742 | 93.69536 | 1.355 |
| -rc vbr -c:v hevc_nvenc -preset p5 -tune uhq -cq:v 33.2 | 5260 | 14.61 | 95.001358 | 93.711044 | 1.403 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p4 -tune uhq -cq:v 33.4 | 5664 | 6.61 | 95.01387 | 93.716526 | 1.51 |
| -pix_fmt yuv420p -g 600 -keyint_min 600 -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 32.7 | 5688 | 14.35 | 95.016777 | 93.672057 | 1.517 |
| -rc vbr -c:v hevc_nvenc -preset p4 -tune uhq -cq:v 33.2 | 5948 | 7.36 | 95.031411 | 93.730912 | 1.586 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p3 -tune uhq -cq:v 33.4 | 6140 | 6.35 | 95.114432 | 93.820198 | 1.637 |
| -rc vbr -c:v hevc_nvenc -preset p3 -tune uhq -cq:v 33.4 | 6236 | 7.36 | 95.077299 | 93.769911 | 1.663 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p1 -tune uhq -cq:v 34.2 | 6340 | 5.1 | 95.149738 | 93.848934 | 1.691 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p2 -tune uhq -cq:v 34.2 | 6340 | 5.11 | 95.149738 | 93.848934 | 1.691 |
| -pix_fmt yuv420p -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 32.7 | 6428 | 14.35 | 95.082293 | 93.75149 | 1.714 |
| -rc vbr -c:v hevc_nvenc -preset p1 -tune uhq -cq:v 34.2 | 6456 | 6.85 | 95.105326 | 93.798717 | 1.722 |
| -rc vbr -c:v hevc_nvenc -preset p2 -tune uhq -cq:v 34 | 6456 | 6.61 | 95.105326 | 93.798717 | 1.722 |
| -rc vbr -c:v hevc_nvenc -preset p6 -cq:v 26.2 | 8900 | 9.35 | 95.207754 | 93.911661 | 2.373 |
| -rc vbr -c:v hevc_nvenc -preset p7 -cq:v 26.2 | 8912 | 10.1 | 95.219151 | 93.919735 | 2.377 |
| -rc vbr -c:v hevc_nvenc -preset p5 -cq:v 26.2 | 8924 | 4.85 | 95.164084 | 93.87303 | 2.38 |
| -rc vbr -c:v hevc_nvenc -preset p4 -cq:v 26.2 | 8960 | 4.36 | 95.158709 | 93.86267 | 2.389 |
| -rc vbr -c:v hevc_nvenc -cq:v 26.2 | 8960 | 4.34 | 95.158709 | 93.86267 | 2.389 |
| -c:v hevc_nvenc -cq:v 26.2 | 8960 | 4.35 | 95.158709 | 93.86267 | 2.389 |
| -rc vbr -c:v hevc_nvenc -preset p3 -cq:v 26.9 | 9808 | 4.1 | 95.201238 | 93.849491 | 2.615 |
| -rc vbr -c:v hevc_nvenc -preset p2 -cq:v 27.9 | 10092 | 3.86 | 95.033091 | 93.659736 | 2.691 |
| -rc vbr -c:v hevc_nvenc -preset p1 -cq:v 27.7 | 12276 | 3.86 | 95.156093 | 93.794733 | 3.274 |
Conclusions
A few conclusions, based on this one video and my intended uses. YMMV.
- Ignoring speed, libx265 is generally better. It produces better quality videos, has fewer fiddly options, and produces smaller files. Even using libx265 with the default flags isn’t terrible, and switching to -preset slow gets you within about 13% of the best results that I’ve found so far.
- If you care about quality or size at all with hevc_nvenc, then always turn on -tune uhq and adjust your quality parameters. It’s dramatically better than hevc_nvenc’s defaults, which are pretty bad. (For p7, VMAF 95 happens at -cq:v 26.2 without uhq and -cq:v 33.7 with uhq. So you can’t just enable uhq without tweaking other settings and expect the best results.)
- For both encoders, there isn’t a huge difference between the moderately-fast encoder presets. Using -preset fast vs -preset faster or -preset veryfast is only a few percent change. Similarly, p1 through p4 are fairly similar if -tune uhq is enabled.
- Adjusting the GOP size for streaming, especially with low-motion video, makes a big difference.
- Chroma subsampling doesn’t make much of a difference, and may hurt size more than it helps.
- Dropping from 10-bit to 8-bit H.265 is a bad idea and hurts compression ratios.
Future Work
I’m absolutely confident that I’m missing a few
moderately-useful flags here, but I doubt that they’ll make more than a
few percent difference. I’ve done some testing with a handful of
additional hevc_nvenc flags that didn’t show any difference
at all. I’ve omitted them for now, but I may retest later.
Let me know if you disagree with any of this or have suggestions.
I’ll be testing H.264 and AV1 in a week or two.
Finally, I’ll drop a link to my VMAF testing script once I’ve had a chance to clean it up a bit.