Becoming a Video Engineer

I’ve recently shared that I’ll be leaving Snap, my professional home for the last 8 years. I was a founding engineer of the team that ran the video transcoding service in 2019, and the last OG still on the team.

One of the questions I’ve gotten most often recently is how a generalist engineer enters the media space. So, here we go. My role was quite broad: I’ve worked on media capture, ingestion, uploads, backend transcoding, downloads, video quality selection, media playback, CDN selection, and more. Doing this for 850 million monthly active users is no easy feat, but it’s been so much fun. Not all of it will apply to you, but here’s my advice based on what I’ve learned.

There are different archetypes of engineers working on “video” or, more broadly, “media” (which includes still images, audio, etc.). Some engineers work on distributed systems that happen to handle video at scale. They could be handling stock prices, Uber rides, or casket deliveries to funeral homes. They don’t care that much; they just run a distributed service and know how to operate it well. Their video knowledge may be limited to naming the tools it uses, knowing that it is computationally intensive, and knowing which hardware choices are efficient. I’m passionate about this topic and it’s been a large part of my role, but there’s already a lot of literature; start with Google’s Site Reliability Engineering book if it’s new to you. Even large companies often use third parties to “transcode in the cloud”, that is, a SaaS such as Mux (mux.com) that frees them from having to operate the services themselves.

Getting Started with Video Processing

I’ll assume you are not one of those engineers, and that you’d like to learn the ropes of video processing. We can zoom in on backend transcoding, which is the part that tends to be most obscure to the generalist engineer. I’ll make oversimplifications that hurt, but hopefully they make the material more digestible. I’ll focus on VOD (video on demand), as opposed to live streaming.

The first take-away to engrave in your brain is that video that travels over the internet needs to be compressed, full stop. The only questions are when and how.
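
To get a sense of why, here’s a rough back-of-envelope (assuming 1080p at 30 frames per second, stored as 8-bit 4:2:0, which works out to 1.5 bytes per pixel):

1920 × 1080 pixels × 1.5 bytes × 30 fps ≈ 93 MB/s ≈ 750 Mbps of raw video

That’s far more than a typical consumer connection can sustain, while a decent streaming encode of the same content is only a few Mbps.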

Side note: Video engineers love lingo. It makes us feel smarter. Don’t be discouraged; I’ll translate it for you. We don’t say “compressed” video unless we are talking to lay people; video engineers say “encoded” or “transcoded”.

When to Compress Video

We can compress it either before it is uploaded (on the client), after it’s been uploaded to the backend, or both. “Client” can mean many things: a mobile application, a web page, a software program like OBS, or a dedicated hardware encoder.

If a video has been compressed on the device and then uploaded to be shared later, you may consider not transcoding it again on the backend. After all, transcoding is costly, and the video might lose some fine details in the process.

When you expect multiple users to play the video (think YouTube), it is almost always advisable to encode on the backend. First, you’ll know exactly the properties of the video that your viewers will attempt to play. Every device encodes videos differently. At scale you will find corrupt timestamps, extravagant bitrates, and many other issues. These will manifest differently in different players. If you don’t encode your inputs to deliver homogeneous videos to your customers, they will soon have unpredictable experiences and say your service sucks. But your service was not even involved!

Second, encoding on the backend will give you a chance to reduce the bitrate, resulting in lower data fees and a better experience for the viewers. Viewers will load videos faster and preserve their precious data plans. Third, you’ll have the chance to encode in multiple qualities to cater to a diversity of devices and variable network conditions.

Enter FFmpeg

Let’s talk now about how. FFmpeg: this is it. It’s the open-source, free, and powerful Swiss Army knife for all your encoding needs. It can read any video under the sun, use any encoder of choice, and output in every conceivable format. FFmpeg works as a wrapper around different libraries (libavcodec, libavformat); you can assume any encoder and format you’ll need will be supported.

FFmpeg’s usability is daunting at first, but the basics are actually fairly easy to crack.

Start with this:

ffmpeg -i input.mp4 -vf scale=1280:720 output.mp4

There are three components, which keeps it simple:

  • Input file (-i input.mp4)
  • Video filter (-vf scale=1280:720), in this case resizing to 720p resolution
  • Output file (output.mp4)

The command can get more complex, but it will preserve the structure “ffmpeg {input} {parameters} {output}”:

ffmpeg -i input.mp4 \
   -c:v libx264 \
   -preset slower \
   -crf 23 \
   -vf scale=1280:-2 \
   -c:a aac -b:a 192k \
   -movflags +faststart \
   -profile:v high -level:v 4.1 \
   -maxrate 5M -bufsize 10M \
   output.mp4

  • -c:v libx264: x264, a well-known open source encoder that encodes H.264 video
  • -preset slower: Higher quality encoding (options range from ultrafast to veryslow)
  • -crf 23: Constant Rate Factor for quality (lower = better, range 0-51, 23 is default)
  • -vf scale=1280:-2: Resize to 720p while maintaining aspect ratio
  • -c:a aac: Use AAC audio codec
  • -b:a 192k: Set audio bitrate to 192kbps
  • -movflags +faststart: Enables streaming by moving metadata to file start
  • -profile:v high: Use high profile H.264 encoding
  • -level:v 4.1: Set H.264 level compatibility
  • -maxrate 5M: Maximum bitrate of 5 Mbps
  • -bufsize 10M: Rate-control buffer size of 10 megabits (used together with -maxrate to limit bitrate spikes)
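
Once you have an output, it’s worth checking what you actually produced. FFmpeg ships with ffprobe, which dumps a file’s container and stream properties; here’s a minimal sketch (using the output file from the example above):

ffprobe -v error -show_format -show_streams -of json output.mp4

You’ll see each stream’s codec, profile, level, resolution, bitrate, and duration; the same command is also the quickest way to look at what devices actually upload to you.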

It used to be very time-consuming to create or even interpret all the flags of such a command, but ChatGPT or Claude are really helpful. For example, “what’s -crf 23, and how do I select an optimal value?”

The other cool thing is that FFmpeg can take a number of inputs, combine them in different ways, and produce an output. Or take one input and produce multiple outputs (for example, at different resolutions). Or even take multiple inputs, mix and match them, and produce multiple outputs.
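
As a tiny sketch of the one-input, multiple-outputs case (the file names and quality settings here are just placeholders), each output file picks up the options that immediately precede it:

ffmpeg -i input.mp4 \
   -vf scale=1280:-2 -c:v libx264 -crf 23 -c:a aac 720p.mp4 \
   -vf scale=640:-2 -c:v libx264 -crf 28 -c:a aac 360p.mp4

FFmpeg decodes the input once and encodes each rendition separately, which is the basis of the multiple qualities mentioned earlier.
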
Even though LLMs can help, I would recommend an easy book such as FFMPEG - From Zero to Hero or FFmpeg Basics: Multimedia handling with a fast audio and video encoder. They read like cookbooks that are easy to follow, and you’ll get a very good intuition of what can be achieved. Then use LLMs for more targeted questions.

Eventually you may want to go deeper and use the libraries behind FFmpeg directly, but for most practical cases the FFmpeg command line is all you need. When you are ready, do yourself a favor and follow Leandro Moreira’s tutorial. It’s incredibly useful for understanding how video works and peeking into the black box just enough to get a sense of what’s inside.

All this barely scratches the surface, but it will enable you to get your bearings. There are many cans of worms we haven’t even opened yet: what codecs to use when, trade-offs, packaging and distributing these videos, etc. Maybe another day.

The Devil is in the Details

One last word: media handling is about the details. Watching a video, noticing the details, tweaking different settings. Can you notice the differences? What if you play them on a phone instead of a computer monitor? How do the colors change? Test something with your users, run an A/B test, and go deep into the metrics you observe: the ones you expected and the ones you didn’t. Maybe something was fantastic on paper but takes much longer to start playback, which ruins the experience. Is it somehow fixable? How, or why not?

There’s a lot of grunt work that makes the difference between good and great. Between meets and exceeds. It’s not for everyone. I for one love it.

Resources and Community

Now, you are just getting started! Here’s what you can do:

  1. First, familiarize yourself with the resources I’ve pointed to, and read https://awesome.video
  2. Join the video-dev community. It’s full of amazing, generous people. Get into their Slack and make yourself at home
  3. Get to know many of them at a yearly conference called Demuxed. Best of all, the recordings are on YouTube. Just watch them! There are sessions about all of the topics I listed at the beginning: on-device encoding, uploads, CDNs, encoding, FFmpeg, etc.
  4. Attend video meetups; they’re full of really interesting, wonderful people

I’ve had three teachers: an awesome mentor (CkH), the Demuxed conference and community, and working at it every day until I figured things out. Find the approach that works best for you!