Benchmarking FFMPEG's H.265 Options


I’m trying to figure out how best to compress my videos for streaming over the web. I’m using FFMPEG for compression, and it has dozens of potentially-useful flags. Looking at random guides and forum posts online, there’s obviously a lot of cargo-culting of compression parameters, and it’s not clear at all to me what the best choices are.

Since there’s no authoritative guide to picking options, and in any case it’d vary widely depending on the nature of the videos that you’re encoding, I’m left running experiments to see what works best for me. I’ve just finished 1,193 different test encodings of the same 30 second video, all in an attempt to figure out what the best settings are for my uses.

Will this apply to your video? Maybe, maybe not. But the technique is generally useful, and a number of the results were surprising. You can test with your own content and see what works for you.

Testing Details

First, let’s lay out what I’m actually testing.

I’m running a nightly GPL build of FFMPEG from https://github.com/BtbN/FFmpeg-Builds/releases, built on March 13, 2025 at 05:44. I’m running it on an AMD Threadripper 5975WX, with 32 cores (64 threads) and an nVidia RTX 4090. Tests were conducted against files on a local NVMe SSD (Samsung 990 Pro), so disk latency shouldn’t be a substantial part of any of the metrics.

The file that I’m compressing is the first 30 seconds of the second video from Deception Pass that I posted a couple weeks ago. It was recorded with a Panasonic GH6 via a Blackmagic Video Assist as a BRAW file, and then rendered down into a 5376x3024 4:2:2 10-bit DNxHR HQX file via DaVinci Resolve. This was then downscaled via FFMPEG into a 30 second long, 60 FPS, 1920x1080, 4:4:4 10-bit DNxHD file, totaling 3.1 GB. This is about as close as I can come to producing a source video with no artifacts from earlier compression cycles.

Once I had the test video, I started testing various compression flags to see how they performed. For each set of flags, I encoded a video and then calculated its VMAF and VMAF NEG scores. I then recorded the flags, file size, encoding time, user CPU use for encoding, and VMAF scores for each encoding.

VMAF is Netflix’s internally-developed compressed video quality scoring system, and is probably the best single metric for automatically judging compressed video quality. It’s not perfect, but it’s pretty good and far, far superior to hand-reviewing over 1,000 .mp4 files.

VMAF NEG is intended to help filter out “enhancements” that degrade quality while boosting the VMAF score. I was hoping that it’d tell me something about the weird issues that I keep seeing with nVidia-rendered H.265 files, but it seems to like them slightly better. Oh well.
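The encode-then-score loop described above can be sketched roughly as follows. This is a minimal sketch, not the actual test script: the helper names and file names are hypothetical, it assumes an ffmpeg build that includes the libvmaf filter, and it only produces the plain VMAF score (VMAF NEG needs a different model selection, which isn't shown). If the encoded file and the reference differ in pixel format, extra scale or format filters may be needed.

```python
import shlex


def encode_cmd(source, output, flags):
    """Build the ffmpeg argument list for one test encode."""
    # `flags` is one row from the results tables, for example
    # "-c:v libx265 -preset slow -crf 20.6".
    return ["ffmpeg", "-y", "-i", source, *shlex.split(flags), output]


def vmaf_cmd(encoded, reference, log_path):
    """Build the ffmpeg argument list that scores `encoded` against `reference`."""
    # libvmaf expects the distorted (encoded) video as the first input
    # and the pristine reference as the second.
    vmaf_filter = f"libvmaf=log_fmt=json:log_path={log_path}"
    return ["ffmpeg", "-i", encoded, "-i", reference,
            "-lavfi", vmaf_filter, "-f", "null", "-"]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    flags = "-g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.6"
    print(shlex.join(encode_cmd("source.mov", "test.mp4", flags)))
    print(shlex.join(vmaf_cmd("test.mp4", "source.mov", "vmaf.json")))
```

Each list can be handed to subprocess.run as-is; timing the first call gives the wall time, and the size of the output file gives the size column.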

For each set of flags, I varied the quality settings (-crf for libx265 and -cq:v for hevc_nvenc) to find the lowest quality setting that would get me a VMAF of at least 95. This should let me compare various compression flags on a relatively equal basis.
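Hunting for that quality setting by hand gets tedious, so here is one way it could be automated: a bisection over the -crf (or -cq:v) value in steps of 0.1. This is a sketch under assumptions rather than the method actually used for these results. find_crf and measure_vmaf are hypothetical names, and the search assumes that VMAF only goes down as the quality number goes up, which is roughly but not strictly true in practice.

```python
def find_crf(measure_vmaf, lo_tenths=100, hi_tenths=400, target=95.0):
    """Return the highest CRF (in 0.1 steps) whose VMAF is still >= target.

    measure_vmaf(crf) should encode at that CRF and return the VMAF score.
    Bounds are given in tenths of a CRF unit so the search runs on integers.
    """
    if measure_vmaf(lo_tenths / 10) < target:
        raise ValueError("even the lowest CRF in range misses the target")
    # Invariant: lo_tenths always meets the target.
    while lo_tenths < hi_tenths:
        mid = (lo_tenths + hi_tenths + 1) // 2
        if measure_vmaf(mid / 10) >= target:
            lo_tenths = mid          # still good enough, try lower quality
        else:
            hi_tenths = mid - 1      # too lossy, back off
    return lo_tenths / 10


if __name__ == "__main__":
    # Stand-in for a real encode + VMAF run: a made-up linear quality curve.
    fake = lambda crf: 95.0 + (20.65 - crf)
    print(find_crf(fake))
```

With a range of 10.0 to 40.0 this needs about ten encodes per flag set instead of a full sweep.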

I’m trying to find the best possible flags for compressing my videos, where “best” means the highest quality with the lowest bit rate, all compressed in a reasonable amount of time. Various runs ranged from 3.84 seconds up to 453 seconds, and the resulting VMAF 95 files ranged from 4.2 to 12.3 MB.

TL;DR

For my uses, -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.6 gave the best results for this sample file. Using nVidia’s hevc_nvenc gives slightly larger file sizes in under half of the encoding time (14.36s vs 38.64s), but the videos produced are visually inferior.

Since most of my videos are basically similar (essentially landscape photography in video form, with slowly flowing water, falling leaves, etc.), I expect that the same general flags will work well for them. I’ll probably want to adjust the -crf value at least per-resolution, as it’s unlikely that 480p and 2160p will be optimal with the same number. Longer-term I’m debating if I want to calculate VMAF scores per-video, per-resolution or just pick a static set of -crf values and use those by default.

If I was encoding noisy video, or rapidly changing content (like sports or anything with a moving camera), or if I had scene changes, then I’d probably want to retest to see if different settings worked better.

Interestingly, I found that 10-bit H.265 is almost always smaller than 8-bit H.265 for this source file. Since (in theory) the 10-bit file contains 25% more information, and the lower-order bits are noisier than the 8 high-order bits, I’d expect the 10-bit file to be larger, but this was never the case. In addition, there wasn’t a substantial advantage to chroma subsampling in my case; 4:4:4 10-bit H.265 files were rarely substantially larger than 4:2:0 10-bit files, and were sometimes quite a bit smaller.

libx265 results

libx265 is the standard open-source H.265 encoder in FFMPEG. It’s fairly slow but seems to get the job done.

-preset

First, let’s look at a very basic libx265 encoding, using -c:v libx265 -preset <speed> -crf <quality>.

First, the most important metric – the output size. Using -preset veryslow produced the smallest file, but by a tiny margin. -preset slow was less than 0.2% larger, and was actually smaller than -preset slower. Using -preset medium or faster produced substantially larger files.

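Here are the -crf values needed to achieve a VMAF score of 95 for each preset. I adjusted the CRF setting until the VMAF was just over 95, to equalize for quality. From the overall results table below: veryslow 20.7, slower 20.7, slow 20.5, medium 18.1, fast 17.8, faster 17.6, veryfast 17.6, and superfast 16.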

Just to make it clear why I kept changing the -crf value for each preset, here are the VMAF scores for a constant -crf 20 for all presets:

So, when running with a constant -crf setting, faster presets produce lower-quality output, which shouldn’t be surprising. By adjusting the -crf for each -preset until we reach a VMAF of 95, we can judge the various presets on the basis of their file size and how long they take to compress, and get more of an apples-to-apples comparison.

When it comes to the amount of time needed to compress, veryslow and slower are aptly named, while medium through veryfast were all similar, possibly because it took ffmpeg a while to read and decode the 3.1 GB source file.

For this set of settings, -preset slow or -preset medium are the best two options, depending on how you value size vs compute time.
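One way to make the size-versus-time call a little less subjective is a Pareto filter: keep only the presets that no other preset beats on both file size and wall time. The sketch below uses the plain -c:v libx265 -preset rows from the overall results table further down. pareto_front is a hypothetical helper, and each row is a single timing run, so treat the output as a hint rather than a verdict.

```python
# (preset, size in kB, wall time in seconds), copied from the results table.
PRESETS = [
    ("veryslow", 4780, 298.81),
    ("slower", 4892, 171.46),
    ("slow", 4788, 45.34),
    ("medium", 6124, 23.57),
    ("fast", 6280, 22.4),
    ("faster", 6412, 21.48),
    ("veryfast", 6416, 21.52),
    ("superfast", 5856, 9.15),
]


def pareto_front(rows):
    """Return the names of rows that are not beaten on both size and time."""
    front = []
    best_size = float("inf")
    # Walk from fastest to slowest; a slower row only earns its place
    # if it is smaller than everything faster than it.
    for name, size, seconds in sorted(rows, key=lambda r: (r[2], r[1])):
        if size < best_size:
            front.append(name)
            best_size = size
    return front


if __name__ == "__main__":
    print(pareto_front(PRESETS))
```

On these rows the filter keeps superfast, slow, and veryslow. The medium row drops out because the superfast row at -crf 16 came out both smaller and faster, which may be worth a retest.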

-tune fastdecode

The next setting I looked at was -tune fastdecode. There are a few other -tune options, but they’re mostly geared towards either specific testing scenarios or specific types of input video, while fastdecode is intended to make the player’s work easier. I expected that this would make output files slightly larger.

Okay, I didn’t see that coming. Adding -tune fastdecode dropped output sizes by 200 kB or so (for example, from 4,788 kB to 4,540 kB with -preset slow). It also made a small improvement in encoding time.

GOP intervals

Next, I experimented with changing the GOP interval in the generated video. Analysis showed that the bulk of the bytes in the video were in I frames, and the B and P frames were relatively small. I think increasing the GOP interval to 10 seconds (or 600 frames) should be fine for my use.

Assuming that it still streams right, fastdecode plus 10-second GOPs seems like a nice win.
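For reference, the arithmetic behind the 10-second GOP is simple enough to sketch. Both helpers below are hypothetical names. The keyframe count assumes one keyframe per full GOP with no extra scene-cut keyframes, and the 250-frame figure is x265’s documented default keyframe interval, assuming that default is what applies when -g isn’t given.

```python
import math


def gop_flags(gop_seconds, fps):
    """Turn a GOP length in seconds into ffmpeg -g / -keyint_min arguments."""
    frames = round(gop_seconds * fps)
    return ["-g", str(frames), "-keyint_min", str(frames)]


def keyframes_in_clip(clip_seconds, fps, gop_frames):
    """Count keyframes in a clip, assuming one at the start of every GOP."""
    total_frames = round(clip_seconds * fps)
    return math.ceil(total_frames / gop_frames)


if __name__ == "__main__":
    print(gop_flags(10, 60))
    print(keyframes_in_clip(30, 60, 600))  # 10-second GOPs
    print(keyframes_in_clip(30, 60, 250))  # assumed x265 default interval
```

Under those assumptions the 30-second test clip goes from 8 keyframes to 3, which lines up with I frames holding the bulk of the bytes.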

-pix_fmt

My source video was a 4:4:4 10-bit file, so libx265 defaults to producing a 4:4:4 10-bit H.265 file. In theory, reducing the video to 4:2:2 or 4:2:0, or dropping from 10-bit to 8-bit video should reduce the output size.

This… didn’t happen.

I have no clue why 4:2:2 is larger than either 4:4:4 or 4:2:0. This isn’t the result that I’d expect.

My best guesses why 10-bit encodings are smaller than 8-bit encodings all involve either banding or dithering, but I’d love to see an authoritative explanation from someone.
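To put numbers on the “in theory” expectation, here is a sketch of how much raw data each pixel format carries before any compression. The helper names are hypothetical. It only counts uncompressed samples, so it says nothing about what the encoder does with them, which is exactly where the surprise comes from.

```python
# Chroma samples per luma sample, summed over both chroma planes.
CHROMA_RATIO = {"4:4:4": 2.0, "4:2:2": 1.0, "4:2:0": 0.5}


def bits_per_pixel(chroma, bit_depth):
    """Uncompressed bits per pixel for a chroma layout and bit depth."""
    return (1.0 + CHROMA_RATIO[chroma]) * bit_depth


def raw_mbps(width, height, fps, chroma, bit_depth):
    """Uncompressed bitrate in megabits per second."""
    return width * height * fps * bits_per_pixel(chroma, bit_depth) / 1e6


if __name__ == "__main__":
    for chroma, depth in [("4:4:4", 10), ("4:2:2", 10), ("4:2:0", 10), ("4:2:0", 8)]:
        print(chroma, depth, bits_per_pixel(chroma, depth),
              round(raw_mbps(1920, 1080, 60, chroma, depth), 2))
```

At 1920x1080 and 60 FPS, 4:4:4 10-bit carries 30 bits per pixel against 12 for 4:2:0 8-bit, so on paper the 8-bit 4:2:0 input is 2.5x smaller before the encoder ever sees it.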

Overall results

Here’s the full set of VMAF=95 results for libx265, sorted by size. Note that -pix_fmt yuv444p10le and -preset medium are defaults and may not always be shown.

| flags | size (kB) | wall time (s) | VMAF | VMAF NEG | bitrate (Mbps) |
|---|---|---|---|---|---|
| -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.6 | 4232 | 38.64 | 95.03862 | 93.301121 | 1.129 |
| -pix_fmt yuv420p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.7 | 4240 | 29.03 | 95.047369 | 93.320013 | 1.131 |
| -pix_fmt yuv420p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 20.7 | 4320 | 32.54 | 95.008211 | 93.333366 | 1.152 |
| -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 20.5 | 4392 | 44.77 | 95.040188 | 93.362489 | 1.171 |
| -pix_fmt yuv422p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.8 | 4392 | 34.44 | 95.029819 | 93.28896 | 1.171 |
| -c:v libx265 -preset slow -tune fastdecode -crf 20.7 | 4540 | 39.15 | 95.040546 | 93.32068 | 1.211 |
| -pix_fmt yuv422p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 20.7 | 4572 | 38.79 | 95.013026 | 93.344707 | 1.219 |
| -c:v libx265 -preset veryslow -crf 20.7 | 4780 | 298.81 | 95.020114 | 93.364579 | 1.275 |
| -c:v libx265 -preset slow -crf 20.5 | 4788 | 45.34 | 95.032299 | 93.404406 | 1.277 |
| -pix_fmt yuv420p -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 19.6 | 4788 | 25.17 | 95.030429 | 93.472752 | 1.277 |
| -c:v libx265 -preset slower -crf 20.7 | 4892 | 171.46 | 95.044693 | 93.39301 | 1.305 |
| -pix_fmt yuv420p -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 19.2 | 5324 | 27.98 | 95.016273 | 93.545946 | 1.42 |
| -pix_fmt yuv422p -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 19.7 | 5372 | 30.4 | 95.040415 | 93.473104 | 1.433 |
| -pix_fmt yuv420p10le -c:v libx265 -crf 18.4 | 5796 | 16.7 | 95.005216 | 93.597107 | 1.546 |
| -c:v libx265 -preset superfast -crf 16 | 5856 | 9.15 | 95.030784 | 93.522659 | 1.562 |
| -pix_fmt yuv422p -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 19.4 | 5864 | 32.92 | 95.009275 | 93.526384 | 1.564 |
| -c:v libx265 -tune fastdecode -crf 18.3 | 5912 | 17.42 | 95.013498 | 93.545759 | 1.577 |
| -c:v libx265 -crf 18.1 | 6124 | 23.7 | 95.001819 | 93.60139 | 1.633 |
| -c:v libx265 -preset medium -crf 18.1 | 6124 | 23.57 | 95.001819 | 93.60139 | 1.633 |
| -c:v libx265 -preset fast -crf 17.8 | 6280 | 22.4 | 95.000127 | 93.636264 | 1.675 |
| -c:v libx265 -preset faster -crf 17.6 | 6412 | 21.48 | 95.01327 | 93.639918 | 1.71 |
| -c:v libx265 -preset veryfast -crf 17.6 | 6416 | 21.52 | 95.012366 | 93.638633 | 1.711 |
| -pix_fmt yuv420p -c:v libx265 -tune fastdecode -crf 17.8 | 6752 | 14.51 | 95.007574 | 93.525252 | 1.801 |
| -pix_fmt yuv422p10le -c:v libx265 -crf 18.4 | 7252 | 18.94 | 95.012661 | 93.610947 | 1.934 |
| -pix_fmt yuv422p -c:v libx265 -tune fastdecode -crf 17.8 | 8812 | 14.95 | 95.005922 | 93.549463 | 2.35 |
| -pix_fmt yuv420p -c:v libx265 -crf 16.9 | 8956 | 14.83 | 95.018216 | 93.699527 | 2.388 |
| -pix_fmt yuv422p -c:v libx265 -crf 17 | 11388 | 17.17 | 95.024472 | 93.708545 | 3.037 |
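The bitrate column follows directly from the size column and the 30-second clip length. The sketch below reproduces it. The function name is hypothetical, and treating a kilobyte as 1,000 bytes is an assumption that happens to match the table, not something stated in the test setup.

```python
def mbps(kbytes, seconds=30.0):
    """Average bitrate in megabits per second for a file of `kbytes` kB."""
    # kB -> kilobits (x8), spread over the clip length, then kilobits -> megabits.
    return round(kbytes * 8 / seconds / 1000, 3)


if __name__ == "__main__":
    print(mbps(4232))   # smallest libx265 file in the table
    print(mbps(11388))  # largest libx265 file in the table
```

The same calculation works for any clip length, which helps when comparing against a streaming bandwidth budget.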

libx265 summary

Given these results, -preset slow -tune fastdecode with 10-second GOPs and 4:4:4 10-bit seems like the obvious choice. 4:2:0 10-bit encodes a bit faster and might have a compatibility advantage, although developer.mozilla.org implies that 4:4:4 is generally supported.

hevc_nvenc results

nVidia’s hardware encoder doesn’t seem to produce results as good as libx265’s, but it’s hard to argue with the performance. hevc_nvenc has far more config flags than libx265, but the majority of them seem fairly special-purpose to me. They may be useful for tuning individual videos but probably aren’t worth it in general. This made testing it more difficult, as there were more scenarios to look at.

-preset

nVidia’s presets are named p1 through p7, with p7 the slowest.

So, these are uniformly terrible compared to the libx265 results. With default options, libx265 produced a 6,124 kB file, compared to the best-case 8,900 kB file here.

These are all downright zippy compared to libx265; the slow preset there took 45.34s, about 4.5x as long as p7 (10.1s).

-rc vbr

In general, we want to use variable bitrate encoding. This appears to be the default for hevc_nvenc when used with -cq:v; adding -rc vbr gave identical file sizes and VMAF scores to runs that didn’t use an -rc flag.

Most of my test runs have the flag included anyway even though it’s effectively a no-op.

-tune uhq

Version 12.2 of nVidia’s Video Codec SDK added an “ultra-high quality” tuning mode, which FFMPEG exposes as -tune uhq. Turning it on drastically improved the results:

So, with -tune uhq, the worst preset is better than the best without -tune uhq.

It’s a little bit slower, but not terrible:

So p5 takes a bit of a hit, but the drop in file size is mostly worth the extra time.

Given the numbers overall, I’m going to concentrate on -preset p7 -tune uhq from here on out.

-pix_fmt

Unlike with libx265, dropping to 10-bit 4:2:0 actually helps reduce the size of the output. nVidia only supports H.265 4:2:2 on RTX 5xxx and newer GPUs, so I can’t test it on my RTX 4090.

Like libx265, 8-bit 4:2:0 is substantially worse than 10-bit 4:2:0.

GOP

Increasing the GOP length helps nVidia’s encoder, just like it helps libx265:

So, with this we can get hevc_nvenc down to 4,384 kB, compared to 4,232 kB for libx265. Since this only takes 14.36s to encode vs 38.64s for libx265, I would prefer to use hevc_nvenc. I’m perfectly willing to spend a few percent more storage and bandwidth in exchange for a 60% drop in compression time.

Unfortunately, as mentioned earlier, the nVidia-encoded version of my test video is missing a bunch of details that the software-encoded version retained, and raising the quality via -cq:v won’t get them back. I still haven’t found the flag that will disable whatever is going on under the hood, if it even exists.
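The size and time trade-off quoted above is easy to check from the two best rows of the results tables. A minimal sketch, with percent_change as a hypothetical helper:

```python
def percent_change(new, old):
    """Signed percentage change going from `old` to `new`."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Best hevc_nvenc row vs best libx265 row from the results tables.
    size_delta = percent_change(4384, 4232)    # file size in kB
    time_delta = percent_change(14.36, 38.64)  # wall time in seconds
    print(f"size: {size_delta:+.1f}%  time: {time_delta:+.1f}%")
```

That works out to about 3.6% more bytes for about 62.8% less encoding time, on this one clip.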

Overall results

Here’s the full set of VMAF=95 results for hevc_nvenc, sorted by size. Note that -pix_fmt yuv444p10le and -preset p4 are defaults and may not always be shown.

| flags | size (kB) | wall time (s) | VMAF | VMAF NEG | bitrate (Mbps) |
|---|---|---|---|---|---|
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -g 600 -keyint_min 600 -cq:v 33.4 | 4384 | 14.36 | 95.013901 | 93.685362 | 1.169 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 33.7 | 4876 | 14.62 | 95.030828 | 93.720213 | 1.3 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p6 -tune uhq -cq:v 33.7 | 4884 | 14.37 | 95.028366 | 93.716392 | 1.302 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p5 -tune uhq -cq:v 33.2 | 5016 | 14.35 | 95.005851 | 93.712047 | 1.338 |
| -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 33.7 | 5076 | 14.86 | 95.015688 | 93.698684 | 1.354 |
| -rc vbr -c:v hevc_nvenc -preset p6 -tune uhq -cq:v 33.7 | 5080 | 14.6 | 95.011742 | 93.69536 | 1.355 |
| -rc vbr -c:v hevc_nvenc -preset p5 -tune uhq -cq:v 33.2 | 5260 | 14.61 | 95.001358 | 93.711044 | 1.403 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p4 -tune uhq -cq:v 33.4 | 5664 | 6.61 | 95.01387 | 93.716526 | 1.51 |
| -pix_fmt yuv420p -g 600 -keyint_min 600 -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 32.7 | 5688 | 14.35 | 95.016777 | 93.672057 | 1.517 |
| -rc vbr -c:v hevc_nvenc -preset p4 -tune uhq -cq:v 33.2 | 5948 | 7.36 | 95.031411 | 93.730912 | 1.586 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p3 -tune uhq -cq:v 33.4 | 6140 | 6.35 | 95.114432 | 93.820198 | 1.637 |
| -rc vbr -c:v hevc_nvenc -preset p3 -tune uhq -cq:v 33.4 | 6236 | 7.36 | 95.077299 | 93.769911 | 1.663 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p1 -tune uhq -cq:v 34.2 | 6340 | 5.1 | 95.149738 | 93.848934 | 1.691 |
| -pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p2 -tune uhq -cq:v 34.2 | 6340 | 5.11 | 95.149738 | 93.848934 | 1.691 |
| -pix_fmt yuv420p -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 32.7 | 6428 | 14.35 | 95.082293 | 93.75149 | 1.714 |
| -rc vbr -c:v hevc_nvenc -preset p1 -tune uhq -cq:v 34.2 | 6456 | 6.85 | 95.105326 | 93.798717 | 1.722 |
| -rc vbr -c:v hevc_nvenc -preset p2 -tune uhq -cq:v 34 | 6456 | 6.61 | 95.105326 | 93.798717 | 1.722 |
| -rc vbr -c:v hevc_nvenc -preset p6 -cq:v 26.2 | 8900 | 9.35 | 95.207754 | 93.911661 | 2.373 |
| -rc vbr -c:v hevc_nvenc -preset p7 -cq:v 26.2 | 8912 | 10.1 | 95.219151 | 93.919735 | 2.377 |
| -rc vbr -c:v hevc_nvenc -preset p5 -cq:v 26.2 | 8924 | 4.85 | 95.164084 | 93.87303 | 2.38 |
| -rc vbr -c:v hevc_nvenc -preset p4 -cq:v 26.2 | 8960 | 4.36 | 95.158709 | 93.86267 | 2.389 |
| -rc vbr -c:v hevc_nvenc -cq:v 26.2 | 8960 | 4.34 | 95.158709 | 93.86267 | 2.389 |
| -c:v hevc_nvenc -cq:v 26.2 | 8960 | 4.35 | 95.158709 | 93.86267 | 2.389 |
| -rc vbr -c:v hevc_nvenc -preset p3 -cq:v 26.9 | 9808 | 4.1 | 95.201238 | 93.849491 | 2.615 |
| -rc vbr -c:v hevc_nvenc -preset p2 -cq:v 27.9 | 10092 | 3.86 | 95.033091 | 93.659736 | 2.691 |
| -rc vbr -c:v hevc_nvenc -preset p1 -cq:v 27.7 | 12276 | 3.86 | 95.156093 | 93.794733 | 3.274 |

Conclusions

A few conclusions, based on this one video and my intended uses. YMMV.

  • Ignoring speed, libx265 is generally better. It produces better quality videos, has fewer fiddly options, and produces smaller files. Even using libx265 with the default flags isn’t terrible, and switching to -preset slow gets you within about 13% of the best results that I’ve found so far (4,788 kB vs 4,232 kB).
  • If you care about quality or size at all with hevc_nvenc, then always turn on -tune uhq and adjust your quality parameters. (For p7, VMAF 95 happens at -cq:v 26.2 without uhq and -cq:v 33.7 with uhq, so you can’t just enable uhq without tweaking other settings and expect the best results.) It’s dramatically better than hevc_nvenc’s defaults, which are pretty bad.
  • For both encoders, there isn’t a huge difference between the moderately-fast encoder presets. Using -preset fast vs -preset faster or -preset veryfast is only a few percent change. Similarly, p1 through p4 are fairly similar if -tune uhq is enabled.
  • Adjusting the GOP size for streaming, especially with low-motion video, makes a big difference.
  • Chroma subsampling doesn’t make much of a difference, and may hurt size more than it helps.
  • Dropping from 10-bit to 8-bit H.265 is a bad idea and hurts compression ratios.

Future Work

I’m absolutely confident that I’m missing a few moderately-useful flags here, but I doubt that they’ll make more than a few percent difference. I’ve done some testing with a handful of additional hevc_nvenc flags that didn’t show any difference at all. I’ve omitted them for now, but I may retest later.

Let me know if you disagree with any of this or have suggestions.

I’ll be testing H.264 and AV1 in a week or two.

Finally, I’ll drop a link to my VMAF testing script once I’ve had a chance to clean it up a bit.