Benchmarking FFMPEG's H.265 Options

I’m trying to figure out how best to compress my videos for streaming over the web. I’m using FFMPEG for compression, and it has dozens of potentially-useful flags. Looking at random guides and forum posts online, there’s obviously a lot of cargo-culting of compression parameters, and it’s not clear at all to me what the best choices are.

Since there’s no authoritative guide to picking options, and in any case it’d vary widely depending on the nature of the videos that you’re encoding, I’m left running experiments to see what works best for me. I’ve just finished 1,193 different test encodings of the same 30 second video, all in an attempt to figure out what the best settings are for my uses.

Will this apply to your video? Maybe, maybe not. But the technique is generally useful, and a number of the results were surprising. You can test with your own content and see what works for you.

Testing Details

First, let’s lay out what I’m actually testing.

I’m running a nightly GPL build of FFMPEG from https://github.com/BtbN/FFmpeg-Builds/releases, built on March 13, 2025 at 05:44. I’m running it on an AMD Threadripper 5975WX, with 32 cores (64 threads) and an nVidia RTX 4090. Tests were conducted against files on a local NVMe SSD (Samsung 990 Pro), so disk latency shouldn’t be a substantial part of any of the metrics.

The file that I’m compressing is the first 30 seconds of the second video from Deception Pass that I posted a couple weeks ago. It was recorded with a Panasonic GH6 via a Blackmagic Video Assist as a BRAW file, and then rendered down into a 5376x3024 4:2:2 10-bit DNxHR HQX file via DaVinci Resolve. This was then downscaled via FFMPEG into a 30 second long, 60 FPS, 1920x1080, 4:4:4 10-bit DNxHD file, totaling 3.1 GB. This is about as close as I can come to producing a source video with no artifacts from earlier compression cycles.

Once I had the test video, I started testing various compression flags to see how they performed. For each set of flags, I encoded a video and then calculated the VMAF and VMAF NEG scores. I then recorded the flags, file size, encoding time, user CPU use for encoding, and VMAF scores for each encoding.

VMAF is Netflix’s internally-developed compressed video quality scoring system, and is probably the best single metric for automatically judging compressed video quality. It’s not perfect, but it’s pretty good and far, far superior to hand-reviewing over 1,000 .mp4 files.

VMAF NEG is intended to help filter out “enhancements” that degrade quality while boosting the VMAF score. I was hoping that it’d tell me something about the weird issues that I keep seeing with nVidia-rendered H.265 files, but it seems to like them slightly better. Oh well.
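For reference, here’s roughly how scores like these can be computed with FFMPEG’s libvmaf filter. This is a minimal sketch, not the script used for these tests: the file names are placeholders, it assumes an FFMPEG build that includes libvmaf with its built-in models, and it assumes the JSON log stores the pooled score under the key vmaf.

```python
import json

def vmaf_command(distorted, reference, log_path, model="vmaf_v0.6.1"):
    """Build an ffmpeg command line that scores `distorted` against `reference`.

    Pass model="vmaf_v0.6.1neg" for the VMAF NEG variant.
    """
    # The libvmaf filter takes the encoded file as its first input and the
    # pristine source as its second. Keep log_path free of ':' characters
    # (or escape them), since ':' separates filter options.
    filt = f"libvmaf=model=version={model}:log_fmt=json:log_path={log_path}"
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", filt, "-f", "null", "-"]

def pooled_mean(log_text, metric="vmaf"):
    """Pull the mean score out of libvmaf's JSON log (assumed layout)."""
    return json.loads(log_text)["pooled_metrics"][metric]["mean"]

# Hypothetical file names and a hand-written log snippet, for illustration only.
cmd = vmaf_command("encoded.mp4", "source.mov", "vmaf.json")
sample_log = '{"pooled_metrics": {"vmaf": {"mean": 95.03862}}}'
print(cmd)
print(pooled_mean(sample_log))
```

Both inputs need the same resolution and frame rate for the comparison to mean anything, so the encoded file should be scored against the same 1080p source it was made from.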

For each set of flags, I varied the quality settings (-crf for libx265 and -cq:v for hevc_nvenc) to find the lowest-quality setting (that is, the highest -crf or -cq:v value) that would still get me a VMAF of at least 95. This should let me compare various compression flags on a relatively equal basis.
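One way to automate that search is a bisection over the quality value. The sketch below is an illustration, not the script used for these tests; measure_vmaf is a hypothetical callback that encodes at the given value and returns the score, and the search assumes that VMAF never goes up as -crf (or -cq:v) goes up.

```python
def highest_crf_at_target(measure_vmaf, lo=10.0, hi=40.0, target=95.0):
    """Find the largest CRF, to one decimal place, whose VMAF is >= target.

    `measure_vmaf(crf)` is a caller-supplied function that encodes at the
    given CRF and returns the VMAF score. Assumes VMAF never rises as CRF rises.
    """
    lo_t, hi_t = round(lo * 10), round(hi * 10)  # work in tenths of a CRF step
    if measure_vmaf(lo_t / 10) < target:
        raise ValueError("even the lowest CRF misses the target")
    if measure_vmaf(hi_t / 10) >= target:
        return hi_t / 10
    # Invariant: lo_t meets the target, hi_t does not.
    while hi_t - lo_t > 1:
        mid_t = (lo_t + hi_t) // 2
        if measure_vmaf(mid_t / 10) >= target:
            lo_t = mid_t
        else:
            hi_t = mid_t
    return lo_t / 10

# Stand-in for a real encode-and-score step, for illustration only.
def fake_vmaf(crf):
    return 95.0 + (20.65 - crf)

best = highest_crf_at_target(fake_vmaf)
print(best)
```

With a 10 to 40 search range and 0.1 steps, this needs about a dozen encodes per set of flags, which matters when a single veryslow encode takes minutes.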

I’m trying to find the best possible flags for compressing my videos, where “best” means the highest quality with the lowest bit rate, all compressed in a reasonable amount of time. Encoding times ranged from 3.84 seconds up to 453 seconds, and the resulting VMAF 95 files ranged from 4.2 to 12.3 MB.

TL;DR

For my uses, -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.6 gave the best results for this sample file. Using nVidia’s hevc_nvenc gives slightly larger file sizes in well under half of the rendering time, but the videos produced are visually inferior.
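Spelled out as a full command, those flags look something like the sketch below. The input and output file names are placeholders, and audio handling is left at FFMPEG’s defaults since it isn’t part of these tests.

```python
import shlex

def x265_command(src, dst, crf=20.6, gop_frames=600):
    """Assemble the ffmpeg invocation for the settings recommended above."""
    return [
        "ffmpeg", "-i", src,
        "-g", str(gop_frames), "-keyint_min", str(gop_frames),
        "-c:v", "libx265", "-preset", "slow", "-tune", "fastdecode",
        "-crf", str(crf),
        dst,
    ]

# Placeholder file names, for illustration only.
cmd = x265_command("source.mov", "output.mp4")
print(shlex.join(cmd))
```

Building the command as a list makes it easy to pass to subprocess.run and to vary one flag at a time when testing.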

Since most of my videos are basically similar (essentially landscape photography in video form, with slowly flowing water, falling leaves, etc.), I expect that the same general flags will work well for them. I’ll probably want to adjust the -crf value at least per-resolution, as it’s unlikely that 480p and 2160p will be optimal with the same number. Longer-term I’m debating if I want to calculate VMAF scores per-video, per-resolution or just pick a static set of -crf values and use those by default.

If I was encoding noisy video, or rapidly changing content (like sports or anything with a moving camera), or if I had scene changes, then I’d probably want to retest to see if different settings worked better.

Interestingly, I found that 10-bit H.265 is almost always smaller than 8-bit H.265 for this source file. Since (in theory) the 10-bit file contains 25% more information, and the lower-order bits are noisier than the 8 high-order bits, I’d expect the 10-bit file to be larger, but that isn’t what happened. In addition, there wasn’t a substantial advantage to chroma subsampling in my case; 4:4:4 10-bit H.265 files were rarely substantially larger than 4:2:0 10-bit files, and were sometimes quite a bit smaller.
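To put numbers on the “25% more information” intuition, here’s a small sketch of the raw (uncompressed) data behind each pixel format. The helper names are made up for illustration, and it counts only the bits that carry picture data, ignoring any padding used when 10-bit samples are stored in 16-bit words.

```python
# Average samples per pixel: one luma sample plus two chroma planes that are
# full resolution (4:4:4), half width (4:2:2), or half width and height (4:2:0).
SAMPLES_PER_PIXEL = {"444": 3.0, "422": 2.0, "420": 1.5}

def raw_bits_per_pixel(chroma, bit_depth):
    """Bits of picture data per pixel before any compression."""
    return SAMPLES_PER_PIXEL[chroma] * bit_depth

def raw_megabytes(width, height, fps, seconds, chroma, bit_depth):
    """Uncompressed size of a clip in MB (1 MB = 1,000,000 bytes)."""
    pixels = width * height * fps * seconds
    return pixels * raw_bits_per_pixel(chroma, bit_depth) / 8 / 1e6

for chroma in ("444", "422", "420"):
    for depth in (10, 8):
        print(chroma, depth, raw_bits_per_pixel(chroma, depth))

# The 30-second, 60 FPS, 1920x1080 test clip as 4:4:4 10-bit: roughly 14 GB raw.
print(raw_megabytes(1920, 1080, 60, 30, "444", 10))
```

On raw data alone, 4:2:0 8-bit carries 12 bits per pixel against 30 for 4:4:4 10-bit, which is why the encoded sizes going the other way is surprising.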

libx265 results

libx265 is the standard open-source H.265 encoder in FFMPEG. It’s fairly slow but seems to get the job done.

-preset

First, let’s look at a very basic libx265 encoding, using -c:v libx265 -preset <speed> -crf <quality>.

First, the most important metric – the output size. Using -preset veryslow produced the smallest file, but by a tiny margin. -preset slow was less than 0.2% larger, and was actually smaller than -preset slower. Using -preset medium or faster produced substantially larger files.

Here are the -crf values needed to achieve a VMAF score of 95 for each preset. I adjusted the CRF setting until the VMAF was just over 95, to equalize for quality:

Just to make it clear why I kept changing the -crf value for each preset, here are the VMAF scores for a constant -crf 20 for all presets:

So, when running with a constant -crf setting, faster presets produce lower-quality output. Which shouldn’t be surprising. By adjusting the -crf for each -preset until we reach a VMAF of 95, we can judge the various presets on the basis of their file size and how long they take to compress and get more of an apples-to-apples comparison.

When it comes to the amount of time needed to compress, veryslow and slower are aptly named, while medium through veryfast were all similar, possibly because it took ffmpeg a while to read and decode the 3.1 GB source file.

For this set of settings, -preset slow or -preset medium are the best two options, depending on how you value size vs compute time.

-tune fastdecode

The next setting I looked at was -tune fastdecode. There are a few other -tune options, but they’re mostly geared towards either specific testing scenarios or specific types of input video, while fastdecode is intended to make the player’s work easier. I expected that this would make output files slightly larger.

Okay, I didn’t see that coming. Adding -tune fastdecode dropped output sizes 200 kB or so. It also made a small improvement in encoding time.

GOP intervals

Next, I experimented with changing the GOP interval in the generated video. Analysis showed that the bulk of the bytes in the video were in I frames, and the B and P frames were relatively small. I think increasing the GOP interval to 10 seconds (or 600 frames) should be fine for my use.

Assuming that it still streams right, fastdecode plus 10-second GOPs seems like a nice win.
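The 600 in -g 600 -keyint_min 600 is just the GOP length in seconds multiplied by the frame rate. A tiny sketch (the function name is made up for illustration) for working out the flags at other frame rates:

```python
def gop_flags(seconds, fps):
    """Return ffmpeg flags that pin the keyframe interval to `seconds`."""
    frames = round(seconds * fps)
    # -g is the maximum distance between keyframes, -keyint_min the minimum;
    # setting both to the same value asks for evenly spaced keyframes.
    return ["-g", str(frames), "-keyint_min", str(frames)]

print(gop_flags(10, 60))      # the 60 FPS test clip
print(gop_flags(10, 29.97))   # the same 10 seconds at 29.97 FPS
```

The tradeoff is that a player can only start decoding at a keyframe, so longer GOPs mean coarser seek points.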

-pix_fmt

My source video was a 4:4:4 10-bit file, so libx265 defaults to producing a 4:4:4 10-bit H.265 file. In theory, reducing the video to 4:2:2 or 4:2:0, or dropping from 10-bit to 8-bit video should reduce the output size.

This… didn’t happen.

I have no clue why 4:2:2 is larger than either 4:4:4 or 4:2:0. This isn’t the result that I’d expect.

My best guesses why 10-bit encodings are smaller than 8-bit encodings all involve either banding or dithering, but I’d love to see an authoritative explanation from someone.

Overall results

Here’s the full set of VMAF=95 results for libx265, sorted by size. Note that -pix_fmt yuv444p10le and -preset medium are defaults and may not always be shown.

flags	size (kB)	walltime (s)	VMAF	VMAF NEG	bitrate (Mbps)
`-g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.6`	4232	38.64	95.03862	93.301121	1.129
`-pix_fmt yuv420p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.7`	4240	29.03	95.047369	93.320013	1.131
`-pix_fmt yuv420p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 20.7`	4320	32.54	95.008211	93.333366	1.152
`-g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 20.5`	4392	44.77	95.040188	93.362489	1.171
`-pix_fmt yuv422p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 20.8`	4392	34.44	95.029819	93.28896	1.171
`-c:v libx265 -preset slow -tune fastdecode -crf 20.7`	4540	39.15	95.040546	93.32068	1.211
`-pix_fmt yuv422p10le -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 20.7`	4572	38.79	95.013026	93.344707	1.219
`-c:v libx265 -preset veryslow -crf 20.7`	4780	298.81	95.020114	93.364579	1.275
`-c:v libx265 -preset slow -crf 20.5`	4788	45.34	95.032299	93.404406	1.277
`-pix_fmt yuv420p -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 19.6`	4788	25.17	95.030429	93.472752	1.277
`-c:v libx265 -preset slower -crf 20.7`	4892	171.46	95.044693	93.39301	1.305
`-pix_fmt yuv420p -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 19.2`	5324	27.98	95.016273	93.545946	1.42
`-pix_fmt yuv422p -g 600 -keyint_min 600 -c:v libx265 -preset slow -tune fastdecode -crf 19.7`	5372	30.4	95.040415	93.473104	1.433
`-pix_fmt yuv420p10le -c:v libx265 -crf 18.4`	5796	16.7	95.005216	93.597107	1.546
`-c:v libx265 -preset superfast -crf 16`	5856	9.15	95.030784	93.522659	1.562
`-pix_fmt yuv422p -g 600 -keyint_min 600 -c:v libx265 -preset slow -crf 19.4`	5864	32.92	95.009275	93.526384	1.564
`-c:v libx265 -tune fastdecode -crf 18.3`	5912	17.42	95.013498	93.545759	1.577
`-c:v libx265 -crf 18.1`	6124	23.7	95.001819	93.60139	1.633
`-c:v libx265 -preset medium -crf 18.1`	6124	23.57	95.001819	93.60139	1.633
`-c:v libx265 -preset fast -crf 17.8`	6280	22.4	95.000127	93.636264	1.675
`-c:v libx265 -preset faster -crf 17.6`	6412	21.48	95.01327	93.639918	1.71
`-c:v libx265 -preset veryfast -crf 17.6`	6416	21.52	95.012366	93.638633	1.711
`-pix_fmt yuv420p -c:v libx265 -tune fastdecode -crf 17.8`	6752	14.51	95.007574	93.525252	1.801
`-pix_fmt yuv422p10le -c:v libx265 -crf 18.4`	7252	18.94	95.012661	93.610947	1.934
`-pix_fmt yuv422p -c:v libx265 -tune fastdecode -crf 17.8`	8812	14.95	95.005922	93.549463	2.35
`-pix_fmt yuv420p -c:v libx265 -crf 16.9`	8956	14.83	95.018216	93.699527	2.388
`-pix_fmt yuv422p -c:v libx265 -crf 17`	11388	17.17	95.024472	93.708545	3.037
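The last column (megabits per second) follows directly from the file size and the 30-second clip length. The sketch below reproduces it, assuming that a kilobyte in the table means 1,000 bytes; that’s an inference from the numbers, since it’s the convention that matches the listed values.

```python
def mbps(kbytes, seconds=30.0):
    """Average bitrate in megabits per second for a file of `kbytes` kB."""
    return round(kbytes * 8 / seconds / 1000, 3)

# Smallest, default-settings, and largest rows from the table above.
for size in (4232, 6124, 11388):
    print(size, mbps(size))
```

The same arithmetic works for estimating bandwidth at other clip lengths by changing the seconds argument.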

libx265 summary

Given these results, -preset slow -tune fastdecode with 10-second GOPs and 4:4:4 10-bit seems like the obvious choice. That said, 4:2:0 10-bit encodes a bit faster and might have a compatibility advantage, although developer.mozilla.org implies that 4:4:4 is generally supported.

hevc_nvenc results

nVidia’s hardware encoder doesn’t seem to produce results as good as libx265’s, but it’s hard to argue with the performance. hevc_nvenc has far more config flags than libx265, but the majority of them seem fairly special-purpose to me. They may be useful for tuning individual videos but probably aren’t worth it in general. This made testing it more difficult, as there were more scenarios to look at.

-preset

nVidia’s presets are named p1 through p7, with p7 the slowest.

So, these are uniformly terrible compared to the libx265 results. With default options, libx265 produced a 6,124 kB file, compared to the best-case 8,900 kB file here.

These are all downright zippy compared to libx265; the slow preset there took 45.34s, more than 4x as long as p7’s 10.1s.

-rc vbr

In general, we want to use variable bitrate encoding. This appears to be the default for hevc_nvenc when used with -cq:v; adding -rc vbr gave identical file sizes and VMAF scores to runs that didn’t use an -rc flag.

Most of my test runs have the flag included anyway even though it’s effectively a no-op.

-tune uhq

Version 12.2 of nVidia’s Video Codec SDK added an “ultra-high quality” tuning mode, which FFMPEG exposes as -tune uhq. Turning it on drastically improved the results:

So, with -tune uhq, the worst preset is better than the best without -tune uhq.

It’s a little bit slower, but not terrible:

So p5 takes a bit of a hit, but the drop in file size is mostly worth the extra time.

Given the numbers overall, I’m going to concentrate on -preset p7 -tune uhq from here on out.

pix_fmt

Unlike libx265, dropping to 10-bit 4:2:0 actually helps reduce the size of the output. nVidia only supports H.265 4:2:2 on RTX 5xxx and newer GPUs, so I can’t test it on my RTX 4090.

Like libx265, 8-bit 4:2:0 is substantially worse than 10-bit 4:2:0.

GOP

Increasing the GOP length helps nVidia’s encoder, just like it helps libx265:

So, with this we can get hevc_nvenc down to 4,384 kB, compared to 4,232 kB for libx265. Since this only takes 14.36s to encode vs 38.64s for libx265, I would prefer to use hevc_nvenc. I’m perfectly willing to spend a few percent more storage and bandwidth in exchange for a 60% drop in compression time.

Unfortunately, as mentioned earlier, the nVidia-encoded version of my test video is missing a bunch of details that the software-encoded version retained, and adjusting -cq:v won’t get them back. I still haven’t found the flag that will disable whatever is going on under the hood, if it even exists.
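The size-versus-time tradeoff between the two best runs is easy to check from the table figures. A small sketch of the arithmetic, using a made-up helper name:

```python
def percent_change(new, old):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0

# Best hevc_nvenc run vs. best libx265 run, from the results tables.
size_change = percent_change(4384, 4232)    # file size in kB
time_change = percent_change(14.36, 38.64)  # wall-clock seconds

print(f"size: {size_change:+.1f}%")   # about +3.6%
print(f"time: {time_change:+.1f}%")   # about -62.8%
```

So on paper it’s a few percent more storage for a bit over 60% less encoding time, which is why the missing detail is the deciding factor rather than the numbers.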

Overall results

Here’s the full set of VMAF=95 results for hevc_nvenc, sorted by size. Note that -pix_fmt yuv444p10le and -preset p4 are defaults and may not always be shown.

flags	size (kB)	walltime (s)	VMAF	VMAF NEG	bitrate (Mbps)
`-pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -g 600 -keyint_min 600 -cq:v 33.4`	4384	14.36	95.013901	93.685362	1.169
`-pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 33.7`	4876	14.62	95.030828	93.720213	1.3
`-pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p6 -tune uhq -cq:v 33.7`	4884	14.37	95.028366	93.716392	1.302
`-pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p5 -tune uhq -cq:v 33.2`	5016	14.35	95.005851	93.712047	1.338
`-rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 33.7`	5076	14.86	95.015688	93.698684	1.354
`-rc vbr -c:v hevc_nvenc -preset p6 -tune uhq -cq:v 33.7`	5080	14.6	95.011742	93.69536	1.355
`-rc vbr -c:v hevc_nvenc -preset p5 -tune uhq -cq:v 33.2`	5260	14.61	95.001358	93.711044	1.403
`-pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p4 -tune uhq -cq:v 33.4`	5664	6.61	95.01387	93.716526	1.51
`-pix_fmt yuv420p -g 600 -keyint_min 600 -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 32.7`	5688	14.35	95.016777	93.672057	1.517
`-rc vbr -c:v hevc_nvenc -preset p4 -tune uhq -cq:v 33.2`	5948	7.36	95.031411	93.730912	1.586
`-pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p3 -tune uhq -cq:v 33.4`	6140	6.35	95.114432	93.820198	1.637
`-rc vbr -c:v hevc_nvenc -preset p3 -tune uhq -cq:v 33.4`	6236	7.36	95.077299	93.769911	1.663
`-pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p1 -tune uhq -cq:v 34.2`	6340	5.1	95.149738	93.848934	1.691
`-pix_fmt yuv420p10le -rc vbr -c:v hevc_nvenc -preset p2 -tune uhq -cq:v 34.2`	6340	5.11	95.149738	93.848934	1.691
`-pix_fmt yuv420p -rc vbr -c:v hevc_nvenc -preset p7 -tune uhq -cq:v 32.7`	6428	14.35	95.082293	93.75149	1.714
`-rc vbr -c:v hevc_nvenc -preset p1 -tune uhq -cq:v 34.2`	6456	6.85	95.105326	93.798717	1.722
`-rc vbr -c:v hevc_nvenc -preset p2 -tune uhq -cq:v 34`	6456	6.61	95.105326	93.798717	1.722
`-rc vbr -c:v hevc_nvenc -preset p6 -cq:v 26.2`	8900	9.35	95.207754	93.911661	2.373
`-rc vbr -c:v hevc_nvenc -preset p7 -cq:v 26.2`	8912	10.1	95.219151	93.919735	2.377
`-rc vbr -c:v hevc_nvenc -preset p5 -cq:v 26.2`	8924	4.85	95.164084	93.87303	2.38
`-rc vbr -c:v hevc_nvenc -preset p4 -cq:v 26.2`	8960	4.36	95.158709	93.86267	2.389
`-rc vbr -c:v hevc_nvenc -cq:v 26.2`	8960	4.34	95.158709	93.86267	2.389
`-c:v hevc_nvenc -cq:v 26.2`	8960	4.35	95.158709	93.86267	2.389
`-rc vbr -c:v hevc_nvenc -preset p3 -cq:v 26.9`	9808	4.1	95.201238	93.849491	2.615
`-rc vbr -c:v hevc_nvenc -preset p2 -cq:v 27.9`	10092	3.86	95.033091	93.659736	2.691
`-rc vbr -c:v hevc_nvenc -preset p1 -cq:v 27.7`	12276	3.86	95.156093	93.794733	3.274
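If you want to slice these tables yourself, each row can be parsed with a few lines of code. This is a sketch that assumes the rows are tab-separated with the flags wrapped in backticks, as they appear here; the type and function names are made up for illustration.

```python
from typing import NamedTuple

class Result(NamedTuple):
    flags: str
    kbytes: int
    walltime: float
    vmaf: float
    vmaf_neg: float
    mbps: float

def parse_row(line):
    """Turn one tab-separated table row into a Result."""
    flags, kbytes, walltime, vmaf, vmaf_neg, mbps = line.rstrip("\n").split("\t")
    return Result(flags.strip("`"), int(kbytes), float(walltime),
                  float(vmaf), float(vmaf_neg), float(mbps))

def encoder(result):
    """Name of the video encoder, taken from the value after -c:v."""
    tokens = result.flags.split()
    return tokens[tokens.index("-c:v") + 1]

rows = [
    "`-c:v hevc_nvenc -cq:v 26.2`\t8960\t4.35\t95.158709\t93.86267\t2.389",
    "`-c:v libx265 -crf 18.1`\t6124\t23.7\t95.001819\t93.60139\t1.633",
]
results = [parse_row(r) for r in rows]
smallest = min(results, key=lambda r: r.kbytes)
print(encoder(smallest), smallest.kbytes)
```

From there it’s straightforward to sort by walltime instead of size, or to filter on a particular flag.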

Conclusions

A few conclusions, based on this one video and my intended uses. YMMV.

Ignoring speed, libx265 is generally better. It produces better quality videos, has fewer fiddly options, and produces smaller files. Even using libx265 with the default flags isn’t terrible, and switching to -preset slow gets you within about 13% of the best results that I’ve found so far (4,788 kB vs 4,232 kB).
If you care about quality or size at all with hevc_nvenc, then always turn on -tune uhq and adjust your quality parameters. It’s dramatically better than hevc_nvenc’s defaults, which are pretty bad. (For p7, VMAF 95 happens at -cq:v 26.2 without uhq and -cq:v 33.7 with uhq, so you can’t just enable uhq without tweaking other settings and expect the best results.)
For both encoders, there isn’t a huge difference between the moderately-fast encoder presets. Using -preset fast vs -preset faster or -preset veryfast is only a few percent change. Similarly, p1 through p4 are fairly similar if -tune uhq is enabled.
Adjusting the GOP size for streaming, especially with low-motion video, makes a big difference.
Chroma subsampling doesn’t make much of a difference, and may hurt size more than it helps.
Dropping from 10-bit to 8-bit H.265 is a bad idea and hurts compression ratios.

Future Work

I’m absolutely confident that I’m missing a few moderately-useful flags here, but I doubt that they’ll make more than a few percent difference. I’ve done some testing with a handful of additional hevc_nvenc flags that didn’t show any difference at all. I’ve omitted them for now, but I may retest later.

Let me know if you disagree with any of this or have suggestions.

I’ll be testing H.264 and AV1 in a week or two.

Finally, I’ll drop a link to my VMAF testing script once I’ve had a chance to clean it up a bit.
