Reference Guide

FFmpeg Intro

A practical introduction to FFmpeg — what it is, where it came from, and how to use it for the most common audio and video tasks. Built around real command examples you can adapt as you go.

Updated 2026-05-08

1. What is FFmpeg?

FFmpeg is a free, all-in-one command-line tool for working with audio and video. It can record, convert, resize, crop, cut, combine, extract frames, measure quality — basically anything you'd want to do with a media file, without ever opening a GUI.

You don't need to memorize the exact commands — but it's useful to know which options exist, so you recognize what's possible and can look up the right syntax when you need it. Most people keep a small list of recipes (like this page) and adapt them as they go.

Official site: ffmpeg.org — downloads, news and the source code.
Official documentation: ffmpeg.org/documentation — full reference for every command, flag, and filter.
Companion tools: ffprobe (inspect files) and ffplay (quick playback) ship in the same bundle.

Tools and services built on FFmpeg

Even if this is the first time you've heard of it, chances are you've already used FFmpeg without knowing — it's the engine behind a huge slice of the media ecosystem:

Tool | What it is | How it uses FFmpeg
VLC media player | Popular open-source video player | Uses libavcodec / libavformat (FFmpeg's libraries) for decoding most formats
OBS Studio | Streaming & screen-recording app used by Twitch/YouTube creators | Bundles FFmpeg for recording, encoding and muxing output
Shotcut | Open-source video editor | Built on top of FFmpeg's libraries for import, export and rendering
Audacity | Audio editor | Optional FFmpeg plugin enables import/export of MP3, M4A, WMA, etc.

So when you learn FFmpeg directly, you're learning the same engine that's already running quietly inside dozens of products you use every day.

2. A bit of history

FFmpeg has been around longer than YouTube. Knowing where it came from explains why it shows up in so many places today.

  1. 2000
    Created by French programmer Fabrice Bellard. Initial focus: a fast, minimal MPEG video encoder/decoder.
  2. 2003
    Bellard handed over maintenance to Michael Niedermayer, who led the project for the next decade-plus.
  3. 2004–2010
    Rapid growth — libavcodec and libavformat become the de-facto open-source codec stack, adopted by VLC and many other media tools.
  4. 2017+
    Native support added for modern codecs (HEVC/H.265, VP9, AV1), hardware acceleration, and quality metrics (PSNR, SSIM, VMAF).
  5. Today
    Powers a huge portion of the world's media infrastructure — browsers, streaming platforms, broadcast pipelines, mobile apps, and more. See the projects using FFmpeg wiki page.

What does the name mean?

"MPEG" refers to the Moving Picture Experts Group, the standards body behind MPEG-1, MPEG-2, MPEG-4 and many of the video formats FFmpeg was originally built to handle. The "FF" part is more interesting — for years there was lively speculation on the mailing list about what those letters really meant, until Fabrice Bellard himself stepped in to settle it: in a 2006 ffmpeg-devel post he confirmed the original meaning is simply "Fast Forward", the playback-control symbol on tape decks and remotes.

Fun fact: the project's mascot is a stylized zigzag pattern derived from a zigzag scan used in JPEG/MPEG block encoding.

3. Installing it

FFmpeg ships as a single executable. Pick the route for your platform — the rest of this guide uses the same commands across all of them.

Windows

  1. Go to gyan.dev/ffmpeg/builds — the recommended Windows build provider (also linked from the official ffmpeg.org/download page). The page offers two build streams: a release build (based on the latest stable FFmpeg release — recommended for most users) and a git master build (built from the development branch — gets new features and fixes earlier, but slightly less battle-tested).
  2. Pick a build (see the table below), download the .7z or .zip, and unzip it to a folder of your choice.
  3. Add the unzipped ffmpeg/bin folder to your PATH environment variable — this lets you run ffmpeg directly from any cmd or PowerShell window, without having to type its full path each time.

Essentials vs Full — which build do I want?

Build | Size | Includes | Use it when…
Essentials | ~30 MB | The common codecs (H.264/x264, H.265/x265, AAC, Opus, VP9, AV1) and basic filters. No GPL-only or research libraries. | You just want to convert / cut / capture / play files. 90% of users.
Full | ~150 MB | Everything in Essentials plus extras like libvmaf, libtensorflow, frei0r, chromaprint, additional decoders and analyzers. | VMAF, ML-based filters, or any of the niche libraries are needed.

Linux & macOS

FFmpeg is available through every major package manager (apt, dnf, pacman, Homebrew, MacPorts, …). For install instructions per distro and Mac, plus source builds and nightly releases, see the official downloads page at ffmpeg.org/download.

Verify the install: run ffmpeg -version. If it prints a version banner, you're set. The configuration line tells you which features your build supports (e.g. --enable-libvmaf, --enable-nvenc) — handy when a feature later seems missing.

4. Key concepts

The shape of an FFmpeg command

Every FFmpeg command reads roughly the same way:

ffmpeg [global opts]  -i INPUT  [input opts]  [filters]  [output opts]  OUTPUT
  • -i introduces an input. You can have multiple inputs (e.g. video + audio + reference for VMAF).
  • Options that come before -i apply to the input. Options after all inputs apply to the output.
  • The output filename comes last. No flag — just the path.
Order matters! -ss 30 -i input.mp4 seeks before decoding (fast). -i input.mp4 -ss 30 seeks after decoding (slow but precise).

Streams inside a container

A media file is a container that holds one or more streams — video, audio, subtitles, sometimes data. A single .mp4 might have one video stream and two audio tracks (e.g. original + dubbed); an audio file can have an embedded cover image as a video stream. FFmpeg addresses streams using INPUT:TYPE:INDEX:

  • 0:v:0 — input file 0, first video stream
  • 0:a:0 — input file 0, first audio stream
  • 0:a:1 — input file 0, second audio stream (e.g. the dub)
  • 1:s:0 — input file 1, first subtitle stream

The -map flag explicitly chooses which streams end up in the output. Without it, FFmpeg auto-picks the "best" stream of each type — usually the right call, occasionally a surprising one.
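
A minimal -map sketch (filenames are hypothetical, and the snippet guards itself so it is a no-op on a machine without ffmpeg or the input file): keep the first video stream and the second audio track (the dub from the example above) and stream-copy them:

```shell
# Keep video stream 0 and audio stream 1 (the dub); drop everything else.
# Guarded: skips cleanly when ffmpeg or the input file is missing.
if command -v ffmpeg >/dev/null 2>&1 && [ -f movie.mp4 ]; then
  ffmpeg -y -i movie.mp4 -map 0:v:0 -map 0:a:1 -c copy dubbed_only.mp4
else
  echo "skipped: ffmpeg or movie.mp4 not available"
fi
```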

Container vs. codec

The container is the file's wrapper (.mp4, .mkv, .webm, …). The codec is the algorithm that compresses the audio or video data inside it. They are independent choices:

  • The output filename's extension determines the container.
  • -c:v / -c:a determines the codec inside.

Each container only supports certain codecs. For example, .webm only accepts VP8 / VP9 / AV1 video and Vorbis / Opus audio. Mismatched combinations fail with errors like "could not find tag for codec X in stream Y." When in doubt, MP4 + H.264 + AAC is the universal-compatibility default.

Beyond basic compatibility, many flags only work with specific encoders or containers — even though their names look generic. A few common gotchas:

  • -crf (constant quality) is supported by libx264, libx265, and libvpx-vp9. NVIDIA's h264_nvenc uses -cq instead — same idea, different flag name.
  • -preset slow makes sense for libx264 (values range from ultrafast to veryslow). NVENC also accepts -preset, but with values like p1 through p7 — same option name, completely different value scheme.
  • -tune zerolatency, -tune film, etc. are libx264/libx265-specific. Hardware encoders ignore them.
  • -movflags +faststart (moves the index to the beginning of the file for streamable MP4) only applies to MP4 and MOV containers. Silently ignored elsewhere.
  • Trying to put H.264 in a .webm output is rejected outright — the container's spec doesn't allow it.

The takeaway: when copying a command from somewhere, check that its flags actually apply to the encoder and container you're using. The errors appendix lists the typical failure messages.

Output is optional

Not every command needs an output file. Replacing the output with -f null - tells FFmpeg to process the input but throw the result away. Useful when you only care about the side effects — analysis logs (VMAF, cropdetect), validating a file, or testing a filter graph without writing anything to disk.

5. Flag cheat-sheet

Flag | What it does | Example
-i | Input file or device | -i input.mp4
-c:v | Video codec | -c:v libx264
-c:a | Audio codec | -c:a aac
-c copy | Stream copy (no re-encode) | -c copy
-b:v / -b:a | Video / audio bitrate | -b:v 2500k
-s | Resolution | -s 1920x1080
-r | Framerate (FPS) | -r 60
-pix_fmt | Pixel format | -pix_fmt yuv420p
-ac | Audio channel count | -ac 1 (mono)
-t | Duration | -t 30 (30 seconds)
-ss / -to | Seek to / stop at | -ss 00:01:00 -to 00:01:30
-vf | Video filter chain | -vf crop=500:500
-filter_complex | Multi-input filter graph | -filter_complex hstack=inputs=2
-vframes | Number of video frames to write | -vframes 1
-y | Overwrite output without asking | -y
-hide_banner | Quieter output | -hide_banner
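
The time-based flags (-ss, -to, -t) accept either plain seconds or HH:MM:SS timestamps. Converting between the two is simple arithmetic; a quick sanity-check in the shell:

```shell
# Convert an HH:MM:SS timestamp to plain seconds.
ts="00:02:39"
secs=$(echo "$ts" | awk -F: '{ print $1*3600 + $2*60 + $3 }')
echo "$secs"   # 159
```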

6. First commands

Inspect a file

ffprobe -i input.mp4 -hide_banner
Explain this command
  • ffprobe — Companion tool to FFmpeg that reads media files and prints information about them.
  • -i input.mp4 — The file to inspect.
  • -hide_banner — Skip the build/version banner so the output focuses on the file's metadata.

Convert to a different container / codec

ffmpeg -i input.avi -c:v libx264 -c:a aac output.mp4
Explain this command
  • -i input.avi — Source file. The container (.avi) is detected automatically.
  • -c:v libx264 — Video codec: H.264 via the libx264 encoder.
  • -c:a aac — Audio codec: AAC.
  • output.mp4 — Output filename. The .mp4 extension determines the container.

Resize to a specific resolution

ffmpeg -i input.mp4 -vf scale=1280:720 -b:v 1500k output_720p.mp4
Explain this command
  • -i input.mp4 — Input file.
  • -vf scale=1280:720 — Video filter: resize the picture to 1280×720.
  • -b:v 1500k — Target video bitrate: 1.5 Mbps.
  • output_720p.mp4 — Output file.
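
If you only know the target width, the matching height is width × source_height / source_width. A quick sketch of the arithmetic (FFmpeg can also compute it for you: scale=1280:-2 picks a matching even height automatically):

```shell
# Height that preserves a 1920x1080 source's aspect ratio at width 1280.
src_w=1920; src_h=1080; new_w=1280
new_h=$(( new_w * src_h / src_w ))
echo "${new_w}x${new_h}"   # 1280x720
```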

Extract the audio track

ffmpeg -i input.mp4 -q:a 0 -map a audio.mp3
Explain this command
  • -i input.mp4 — Input file (with both video and audio).
  • -q:a 0 — Highest variable-bitrate quality for a lossy encoder like MP3 (0 = best, 9 = worst). Has no effect on uncompressed outputs like WAV.
  • -map a — Pick all audio streams from the input; drop video.
  • audio.mp3 — Audio-only output file.

Remove the audio track

ffmpeg -i input.mp4 -c copy -an output.mp4
Explain this command
  • -i input.mp4 — Input file.
  • -c copy — Copy streams without re-encoding: fast and lossless.
  • -an — Drop the audio (a = audio, n = none).
  • output.mp4 — Video-only output file.

7. Capturing from a device (Windows)

List capture devices

ffmpeg -list_devices true -f dshow -i dummy
Explain this command
  • -list_devices true — Print available capture devices instead of recording.
  • -f dshow — Use Windows DirectShow as the input format.
  • -i dummy — Required placeholder input; not actually read.

Show options for one device

Replace Video Capture Device / Audio Capture Device below with the exact name -list_devices printed for your hardware (e.g. a webcam, capture card, or HDMI grabber).

ffmpeg -list_options true -f dshow -i video="Video Capture Device"
ffmpeg -list_options true -f dshow -i audio="Audio Capture Device"
Explain this command
  • -list_options true — Print supported settings (resolutions, framerates, pixel formats) for the selected device.
  • -f dshow — DirectShow input.
  • -i video="…" / -i audio="…" — Device to query: video or audio variant.

Record 30 seconds of HDMI input (video + audio, GPU-encoded)

ffmpeg -rtbufsize 150M -f dshow ^
  -i video="Video Capture Device":audio="Audio Capture Device" ^
  -c:v h264_nvenc -s 1920x1080 -r 60 -b:v 2500k -pix_fmt yuv420p ^
  -ac 1 -b:a 128k -profile:a aac_main ^
  -t 30 "C:\temp\recording.mp4"
Explain this command
  • -rtbufsize 150M — Real-time input buffer; prevents frame drops at high data rates.
  • -f dshow — DirectShow input format.
  • -i video=…:audio=… — Combined video + audio source; both grabbed from the same dshow command.
  • -c:v h264_nvenc — NVIDIA hardware H.264 encoder.
  • -s 1920x1080 — Output resolution.
  • -r 60 — Framerate: 60 fps.
  • -b:v 2500k — Target video bitrate: 2.5 Mbps.
  • -pix_fmt yuv420p — Pixel format: broad-compat 4:2:0 chroma.
  • -ac 1 — One audio channel (mono).
  • -b:a 128k — Audio bitrate: 128 kbps.
  • -profile:a aac_main — AAC profile.
  • -t 30 — Stop after 30 seconds.
  • "C:\temp\recording.mp4" — Output file path.
What's happening: dshow input → NVENC h264 video at 1080p60 / 2.5 Mbps → mono AAC audio at 128 kbps → stop after 30s.

Real device names tend to be longer and include the manufacturer / port index. As an example, a Magewell Pro Capture Quad HDMI card with the second port selected might appear in -list_devices as Video (00-1 Pro Capture Quad HDMI), and the same command would look like:

ffmpeg -rtbufsize 150M -f dshow ^
  -i video="Video (00-1 Pro Capture Quad HDMI)":audio="Audio (00-1 Pro Capture Quad HDMI)" ^
  -c:v h264_nvenc -s 1920x1080 -r 60 -b:v 2500k -pix_fmt yuv420p ^
  -ac 1 -b:a 128k -profile:a aac_main ^
  -t 30 "C:\temp\recording.mp4"
Explain the device-name part
  • Video (00-1 Pro Capture Quad HDMI) — The exact device name as printed by -list_devices, including the parentheses. Must be quoted in full because of the spaces.
  • 00-1 — Card / port index. 00 = the first card in the system; -1 = its second port (zero-indexed). A four-port capture card would expose 00-0 through 00-3.
  • Pro Capture Quad HDMI — Vendor / model name as the driver registers it.
  • Audio (00-1 Pro Capture Quad HDMI) — The audio side of the same physical input: same manufacturer + port index, but registered as an audio device. Many capture cards expose video and audio as two separate dshow devices that you have to combine yourself.

All other flags work identically to the explainer for the previous command.

Snap one frame from a webcam

ffmpeg -f dshow -i video="Integrated Camera" -vframes 1 output.png
Explain this command
  • -f dshow — DirectShow input.
  • -i video="Integrated Camera" — Built-in laptop webcam (the standard Windows device name).
  • -vframes 1 — Capture exactly one frame, then stop.
  • output.png — Save the still as a PNG.

8. Cut, crop, snapshot

Cut without re-encoding (keyframe-precision, super fast)

ffmpeg -ss 00:02:29 -to 00:02:39 -i input.mp4 -c copy clip.mp4
Explain this command
  • -ss 00:02:29 — Seek to start time. Placed before -i = fast keyframe-aligned input seek.
  • -to 00:02:39 — Stop at this absolute timestamp.
  • -i input.mp4 — Source file.
  • -c copy — Copy streams as-is, no re-encoding: near-instantaneous.
  • clip.mp4 — Output. Cut points snap to the nearest keyframe.

Cut with re-encoding (frame-precise)

ffmpeg -i input.mp4 -ss 180 -to 190 -c:v libx264 -crf 0 clip_precise.mp4
Explain this command
  • -i input.mp4 — Source file.
  • -ss 180 — Seek to t=180s. Placed after -i = decode-and-discard until that frame (slower, exact).
  • -to 190 — Stop at t=190s.
  • -c:v libx264 — Re-encode video with x264 (required for frame-exact cuts).
  • -crf 0 — Lossless quality (CRF 0).
  • clip_precise.mp4 — Output.

Crop a 500×500 area starting at (100,100)

ffmpeg -i input.mp4 -filter:v crop=500:500:100:100 cropped_corner.mp4
Explain this command
  • -i input.mp4 — Source file.
  • -filter:v crop=W:H:X:Y — Video filter. Here W=500, H=500, X=100, Y=100: a 500×500 box starting 100px from the top-left.
  • cropped_corner.mp4 — Output.

Crop 500×500 from the center

ffmpeg -i input.mp4 -filter:v crop=500:500 cropped_center.mp4
Explain this command
  • -filter:v crop=500:500 — Same crop filter, but with X and Y omitted; defaults to centering the crop window in the frame.
  • cropped_center.mp4 — Output.
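
The centering is just arithmetic: the filter fills in x=(in_w-out_w)/2 and y=(in_h-out_h)/2. Worked through for a 1920×1080 source:

```shell
# Offsets the centered crop uses implicitly for a 1920x1080 input.
in_w=1920; in_h=1080; out_w=500; out_h=500
x=$(( (in_w - out_w) / 2 ))
y=$(( (in_h - out_h) / 2 ))
echo "crop=${out_w}:${out_h}:${x}:${y}"   # crop=500:500:710:290
```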

Detect black bars (so you know what to crop)

ffmpeg -i input.mp4 -vf cropdetect=limit=120 -f null -
Explain this command
  • -i input.mp4 — Source file.
  • -vf cropdetect=limit=120 — Analyze each frame for black borders. limit = brightness threshold (0–255); pixels below it count as "black".
  • -f null - — Discard the output stream; we only care about the suggested crop values printed to the log.

Side-by-side comparison of two videos

ffmpeg -i video_left.mp4 -i video_right.mp4 -filter_complex hstack=inputs=2 sidebyside.mp4
Explain this command
  • -i video_left.mp4 — First input, placed on the left.
  • -i video_right.mp4 — Second input, placed on the right.
  • -filter_complex hstack=inputs=2 — Horizontally stack the two inputs. (For top/bottom use vstack.)
  • sidebyside.mp4 — Combined output.

Take a snapshot (first I-frame at second 1)

ffmpeg -ss 1 -i input.mp4 -vf select="eq(pict_type\,I)" -vframes 1 snapshot.jpg
Explain this command
  • -ss 1 — Skip the first 1 second of the input (fast seek).
  • -i input.mp4 — Source file.
  • -vf select="eq(pict_type\,I)" — Keep only I-frames (keyframes). The backslash escapes the comma so it's not parsed as a filter separator.
  • -vframes 1 — Take exactly one matching frame.
  • snapshot.jpg — Save as JPG.

9. Working with individual frames

Heads up: FFmpeg numbers frames starting at 0. So if you want frames 1, 60, 120, 240 — ask for 0, 59, 119, 239.
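
The off-by-one is easier to avoid if you think in start times: frame n at framerate r starts at n/r seconds. A quick check of where frame 239 (the 240th) lands at 60 fps:

```shell
# Zero-indexed frame number -> start time in milliseconds at a given fps.
fps=60
frame=239    # the 240th frame
ms=$(( frame * 1000 / fps ))
echo "frame ${frame} starts at ${ms} ms"   # frame 239 starts at 3983 ms
```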

Extract specific frames as BMP

ffmpeg -hide_banner -i input.mp4 ^
  -vf select="eq(n\,0)+eq(n\,59)+eq(n\,119)+eq(n\,239)" ^
  -vframes 4 -fps_mode passthrough -y "frame_%d.bmp"
Explain this command
  • -hide_banner — Skip the build/version banner.
  • -i input.mp4 — Source file.
  • -vf select="eq(n\,0)+eq(n\,59)+…" — Select specific frames by zero-indexed number. Breaking the expression down:
      • select=… — keep only frames where the inner expression is true.
      • eq(a, b) — equality function; returns 1 if a equals b, otherwise 0.
      • n — built-in variable holding the current frame's zero-indexed number (0 for the first frame, 1 for the second, …).
      • \, — a comma escaped with a backslash. Inside FFmpeg's filter syntax a bare , is a filter separator, so the comma between eq's arguments must be escaped.
      • + — logical OR; joins multiple conditions so several frames match.
    So eq(n\,0) means "this frame is frame 0" (the very first one), eq(n\,59) means "this frame is frame 59" (the 60th one), and the full expression keeps only frames 0, 59, 119, and 239.
  • -vframes 4 — Stop after writing 4 frames.
  • -fps_mode passthrough — Preserve input timestamps (don't try to fill gaps).
  • -y — Overwrite outputs without asking.
  • "frame_%d.bmp" — Output pattern. %d = sequence number (1, 2, 3, 4).

Build a video from a sequence of images

ffmpeg -i "frame_%d.bmp" -y -c:v libx264 -pix_fmt yuv420p -crf 16 output.mp4
Explain this command
-i "frame_%d.bmp"Input image sequence. %d matches numbered frames (frame_1.bmp, frame_2.bmp, …).
-yOverwrite output without asking.
-c:v libx264Encode with x264 (H.264).
-pix_fmt yuv420pUniversal-compat pixel format.
-crf 16High visual quality (lower CRF = better; range 0–51).
output.mp4Resulting video file.

Seek + step a few frames forward

ffmpeg -hide_banner -ss 249.3 -i input.mp4 -vf select="eq(n\,2)" -vframes 1 -y snapshot.png
Explain this command
  • -ss 249.3 — Seek to 249.3 seconds before decoding (fast).
  • -i input.mp4 — Source file.
  • -vf select="eq(n\,2)" — From the seek point, advance 2 frames forward and select that one.
  • -vframes 1 — Take exactly one frame.
  • -y snapshot.png — Overwrite output as PNG.

10. CPU vs GPU encoding

Encoding is the slow part. If your GPU has a hardware encoder, you can offload it with one flag swap.

Hardware | H.264 codec | H.265 codec
CPU (any) | libx264 | libx265
NVIDIA | h264_nvenc | hevc_nvenc
AMD | h264_amf | hevc_amf
Intel QuickSync | h264_qsv | hevc_qsv
ffmpeg -i input.mp4 -c:v libx264     output.mp4   # CPU
ffmpeg -i input.mp4 -c:v h264_amf    output.mp4   # AMD GPU
ffmpeg -i input.mp4 -c:v h264_nvenc  output.mp4   # NVIDIA GPU
Explain this command
  • -i input.mp4 — Source file (same in all three).
  • -c:v libx264 — Software CPU encoder. Best quality per bit, slowest.
  • -c:v h264_amf — AMD GPU H.264 encoder (Advanced Media Framework).
  • -c:v h264_nvenc — NVIDIA GPU H.264 encoder (NVENC).
  • output.mp4 — Output file.
Rule of thumb: CPU = best quality per bit, slower. GPU = much faster, slightly larger files at the same visual quality. For real-time capture, always use GPU.

11. Metadata and FFprobe

Add custom metadata fields

You can attach standard tags (title, artist, comment, year) or arbitrary key/value pairs of your own. With -c copy the streams aren't re-encoded — only the container's tag table is rewritten, so this is essentially instantaneous.

ffmpeg -i input.mp4 ^
  -metadata title="My Recording" ^
  -metadata artist="Studio Team" ^
  -metadata comment="Original capture, take 3" ^
  -metadata custom_field="value-of-your-choice" ^
  -movflags +use_metadata_tags -c copy output.mp4
Explain this command
  • -i input.mp4 — Source file.
  • -metadata title="…" — Set / override the standard title tag.
  • -metadata artist="…" — Standard artist tag.
  • -metadata comment="…" — Standard comment tag.
  • -metadata custom_field="…" — An arbitrary key/value pair you make up. Useful for tagging files with project- or workflow-specific info.
  • -movflags +use_metadata_tags — Required for non-standard tags to be preserved in MP4 / MOV containers.
  • -c copy — Don't re-encode; just rewrite the container's tag table.
  • output.mp4 — Tagged output file.

Read metadata back

ffmpeg -i output.mp4 -hide_banner
Explain this command
  • -i output.mp4 — The file to inspect. With no output specified, FFmpeg just prints what it knows about the input and stops.
  • -hide_banner — Skip the build/version header so the metadata stands out.

Per-frame info with ffprobe

ffprobe -i input.mp4 -show_frames
Explain this command
  • ffprobe — Inspector tool; prints structured info about a media file.
  • -i input.mp4 — File to analyze.
  • -show_frames — Dump one block per frame: picture type (I/P/B), PTS, size, pixel format, and more.

Each frame block tells you the picture type (I/P/B), PTS, size, pixel format, etc. — useful when you're debugging encoder behaviour or hunting a specific keyframe.

12. VMAF — measuring video quality

VMAF (Video Multimethod Assessment Fusion) is Netflix's perceptual quality metric. Score range: 0–100. Higher = closer to the reference. It correlates with what humans actually perceive much better than PSNR alone.

Prerequisites

  • An FFmpeg build compiled with --enable-libvmaf. Check with: ffmpeg -h filter=libvmaf
  • A VMAF model file, e.g. vmaf_v0.6.1.json (available from the Netflix/vmaf GitHub).
  • Two videos to compare: a distorted one (e.g. a re-encoded version) and the original/reference.
Important: the reference (original) video must be the second input to FFmpeg. Distorted = first input.

Basic VMAF score

ffmpeg -i distorted.mp4 -i original.mp4 ^
  -lavfi libvmaf=model_path="C\\:/ffmpeg/model/vmaf_v0.6.1.json" -f null -
Explain this command
  • -i distorted.mp4 — First input: the encode being scored.
  • -i original.mp4 — Second input: the reference (must be the second input).
  • -lavfi libvmaf=… — Apply the libvmaf filter to compare the two inputs.
  • model_path="C\\:/…/vmaf_v0.6.1.json" — Path to the VMAF model file. Windows colons are escaped as \\: because : is a filter-graph separator.
  • -f null - — Don't write an output file; only the score in the log matters.

Save scores to CSV

ffmpeg -i distorted.mp4 -i original.mp4 ^
  -lavfi libvmaf=model_path="C\\:/ffmpeg/model/vmaf_v0.6.1.json":log_path="C\\:/out/vmaf.csv":log_fmt=csv ^
  -f null -
Explain this command
model_path="…"VMAF model file (same as basic).
log_path="C\\:/out/vmaf.csv"Where to write the per-frame log.
log_fmt=csvLog format. csv is easiest to import into a spreadsheet; json and xml also work.

Faster scan (subsample every N frames)

ffmpeg -i distorted.mp4 -i original.mp4 ^
  -lavfi libvmaf=model_path="C\\:/ffmpeg/model/vmaf_v0.6.1.json":log_path="C\\:/out/vmaf.csv":log_fmt=csv:n_subsample=20 ^
  -f null -
Explain this command
  • n_subsample=20 — Score every 20th frame instead of every frame: roughly 20× faster, with minimal impact on the aggregate score.

VMAF + SSIM + PSNR all in one pass

ffmpeg -i distorted.mp4 -i original.mp4 ^
  -lavfi libvmaf=model_path="C\\:/ffmpeg/model/vmaf_v0.6.1.json":log_path="C\\:/out/vmaf.csv":log_fmt=csv:n_subsample=20:ssim=true:psnr=true ^
  -f null -
Explain this command
  • ssim=true — Also compute SSIM (Structural Similarity) for every frame.
  • psnr=true — Also compute PSNR (Peak Signal-to-Noise Ratio).

All three metrics land in the same CSV alongside the VMAF columns.
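
To turn the per-frame log into a single number, average the score column. A sketch with awk; the rows below are made-up sample data, and the column layout varies between libvmaf versions, so adjust the column index to match your log's header:

```shell
# Average a per-frame score column from a libvmaf CSV log.
# Sample rows are fabricated for illustration; check your log's header.
cat > vmaf_sample.csv <<'EOF'
Frame,vmaf
0,95.2
1,93.8
2,94.0
EOF
awk -F, 'NR > 1 { sum += $2; n++ } END { printf "mean VMAF: %.1f\n", sum / n }' vmaf_sample.csv
# prints: mean VMAF: 94.3
```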

Reading the results

VMAF score | Rough interpretation
95–100 | Visually indistinguishable from source
80–94 | Excellent — most viewers won't notice impairments
60–79 | Good — some artifacts visible on close inspection
40–59 | Fair — noticeable degradation
< 40 | Poor — clearly degraded
Gotchas:
  • Both inputs must have the same resolution and framerate. Pre-scaling or resampling may be needed.
  • Frame counts should match — drift between inputs causes sync issues that distort the score.
  • The Windows path escaping (C\\:/ffmpeg/...) is required because the colon is a filter-graph separator.

A. Appendix

Additional reference material. Each appendix entry is self-contained — useful when you need a deeper look at a specific topic beyond the core walkthrough above.

A.1 Common error messages

A reference for the FFmpeg errors you're most likely to bump into, with what they actually mean and the usual fix.

Message | What it means | Typical fix
moov atom not found | The MP4 file is missing its index — almost always because the recording was interrupted before being properly finalized. | Recover with untrunc using a healthy reference file from the same encoder, or re-record. The file is otherwise unreadable.
Invalid data found when processing input | FFmpeg can't make sense of the bytes — wrong format detection, corrupted header, or partial download. | Force the format with -f <fmt>, or verify the file with ffprobe. Re-download / re-export if the source is bad.
Unknown encoder 'libx264' | Your FFmpeg build was compiled without that codec library. | Switch to a GPL-enabled build (e.g. the gyan.dev "Full" build). See section 3.
No such filter: 'libvmaf' | The build doesn't include libvmaf. | Use the gyan.dev "Full" build. Confirm with ffmpeg -h filter=libvmaf.
Output #0 does not contain any stream | Your filter graph or stream-mapping options removed every stream from the output. | Check -map, -vn, -an flags. Run without filters first to confirm the input has streams.
Error initializing filter 'X' … / filter 'X' not found | Filter syntax error — usually a misplaced colon, comma, or unescaped Windows path. | On Windows, escape colons in paths inside filter graphs as C\\:/path/.... See the VMAF section.
Conversion failed! | Generic catch-all printed at the end. The real error is one of the lines above it. | Scroll up in the output. Re-run with -loglevel verbose if the cause is unclear.
real-time buffer [...] too full or near too full | During dshow capture, FFmpeg can't drain the device buffer fast enough. | Increase -rtbufsize (e.g. -rtbufsize 256M) or reduce input resolution / framerate.
Permission denied / Operation not permitted | Output file is locked (open in another app), in a write-protected location, or being overwritten while in use. | Close anything that has the file open, or write to a different path. Add -y to overwrite without prompting.
Past duration X too large | Timestamps are non-monotonic — usually because of a bad input or a seek that lands mid-GOP. | Add -fflags +genpts, or seek with -ss before -i for keyframe-aligned seeking.
height/width not divisible by 2 | H.264 / H.265 require even dimensions. Common after cropping. | Round to even: -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" or pick a crop with even values.
When in doubt: add -loglevel verbose (or debug) to your command. FFmpeg's default output suppresses the line that actually explains what failed.

A.2 Performance tuning & presets

Encoder speed, file size, and quality form a triangle — you can pick any two. Below are the knobs that actually matter for libx264 and libx265, which are the encoders most people use.

The -preset flag

Controls how hard the encoder works to find the best compression. Slower presets produce smaller files at the same visual quality, at the cost of CPU time. Output of all presets is decodable everywhere; only the encoder's effort changes.

Preset | Relative speed | File size at same quality | When to use
ultrafast | ~10× faster than medium | ~1.7× larger | Live capture, real-time recording.
superfast | ~7× | ~1.5× | Real-time-ish workflows.
veryfast | ~4× | ~1.25× | Streaming, fast bulk transcodes. Common live-streaming default.
faster | ~2× | ~1.10× | Daily-driver fast preset.
fast | ~1.3× | ~1.05× | Marginal speed gain over medium.
medium (default) | 1× (baseline) | 1× (baseline) | Reasonable balance — the FFmpeg default.
slow | ~0.7× | ~0.95× | Quality-focused offline encoding.
slower | ~0.4× | ~0.92× | Archival, master encodes.
veryslow | ~0.2× | ~0.90× | Maximum compression, when CPU time is free.

Numbers are approximate and depend heavily on content — see the FFmpeg H.264 encoding wiki for the canonical guidance.

CRF vs bitrate

Two ways to tell the encoder how much quality you want:

  • CRF (Constant Rate Factor) — variable bitrate, constant perceptual quality. Best for archival and general delivery. Range 0–51; lower = better.
    • 0 = mathematically lossless (huge files)
    • 17–18 = "visually lossless" for most viewers
    • 23 = libx264 default — good balance
    • 28 = smaller, noticeable artifacts
    • 32+ = clearly degraded
  • Target bitrate (-b:v 4M) — fixed average bitrate. Use when you have a streaming budget or need a predictable file size.
  • Two-pass — analyze first, then encode. Slowest, but gives the best quality at a fixed target file size.
# CRF — pick a quality, accept whatever bitrate it produces
ffmpeg -i input.mp4 -c:v libx264 -preset slow -crf 20 -c:a aac output.mp4

# Single-pass bitrate — pick a bitrate, accept whatever quality
ffmpeg -i input.mp4 -c:v libx264 -preset slow -b:v 4M -c:a aac output.mp4

# Two-pass — best quality for an exact target file size
ffmpeg -y -i input.mp4 -c:v libx264 -preset slow -b:v 4M -pass 1 -an -f mp4 NUL
ffmpeg    -i input.mp4 -c:v libx264 -preset slow -b:v 4M -pass 2 -c:a aac output.mp4
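
The -b:v value for a size-targeted encode falls out of the file-size equation: kilobits total = MB × 8192, divided by the duration in seconds, minus the audio's share. A sketch with example numbers (a 90-minute file, 700 MB target, 128 kbps audio):

```shell
# Back out -b:v for an exact target size (the two-pass use case).
target_mb=700; duration_s=5400; audio_kbps=128
video_kbps=$(( target_mb * 8192 / duration_s - audio_kbps ))
echo "-b:v ${video_kbps}k"   # -b:v 933k
```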

The -tune flag

Optional content-aware adjustment for libx264 / libx265. Pick the one that matches your input:

  • film — live-action, default-ish
  • animation — cartoons / anime; favours flat regions
  • grain — preserves film grain (don't smooth it away)
  • stillimage — slideshows
  • fastdecode — for low-power playback devices
  • zerolatency — disable look-ahead for live streaming / video calls

CPU vs GPU encoders

Hardware encoders (h264_nvenc, h264_qsv, h264_amf) are much faster than libx264 — often 5–20× — but produce ~10–25% larger files for the same visual quality. The right choice depends on the workload:

  • Real-time capture or live streaming → GPU encoder, every time.
  • Bulk batch transcoding where you have idle GPU → GPU encoder.
  • Archival / mastering / final delivery → libx264 or libx265 with a slow preset and a CRF.

A.3 Glossary

The terms that come up over and over once you start working with video. Knowing these makes documentation, error messages, and forum threads much easier to read.

Term | Meaning
Codec | The algorithm that encodes and decodes media. Examples: H.264, H.265, AV1, AAC, Opus, MP3.
Container | The file format that wraps streams + metadata together. Examples: .mp4, .mkv, .webm. Independent from the codec inside.
Stream | A single track inside a container — usually one video, one or more audio, optional subtitles.
Mux / Demux | To mux = combine streams into a container. To demux = pull streams out. No re-encoding involved.
Transcode | Decode + re-encode. Slow. Used to change codec or quality.
Transmux | Copy streams into a different container without re-encoding. Fast — equivalent to -c copy.
Frame | A single image inside a video stream.
I-frame (keyframe) | Self-contained image, decodable on its own. Largest type.
P-frame | "Predicted" — only stores the difference from previous frames. Smaller than I-frames.
B-frame | "Bi-directional" — predicted from frames before and after. Smallest, most compressed.
GOP | "Group of Pictures" — the sequence between two keyframes (e.g. I P P B P B P … I). Shorter GOPs = better seeking, larger files.
Bitrate | Bits per second of compressed data. Higher = bigger file, generally better quality.
CBR / VBR / CRF | Constant / Variable / Constant Rate Factor — three ways to control the bitrate-vs-quality trade-off. See A.2.
Resolution | Width × height in pixels (e.g. 1920×1080).
Frame rate (FPS) | Frames per second. Common values: 24 (film), 30, 60.
Pixel format (pix_fmt) | How color and luminance bits are laid out per pixel. yuv420p is the universal default for 8-bit H.264.
PTS / DTS | Presentation / Decode Time Stamp. PTS = when a frame should display. DTS = when it should be decoded. They differ when B-frames are present.
Filter graph | A pipeline of operations applied to a stream. Specified via -vf (single video chain) or -filter_complex (multi-input/output).
Hardware acceleration | Using dedicated GPU silicon (NVENC, QSV, VAAPI, AMF) for encode/decode. Much faster than CPU; slightly larger files at same quality.
Lossy / Lossless | Lossy = original cannot be exactly recovered (e.g. H.264, AAC). Lossless = bit-perfect reconstruction (e.g. FLAC, FFV1, ALAC).
Profile / Level | Codec sub-specifications that limit which features a decoder must support. E.g. main, high for H.264.
Chroma subsampling | Storing color information at lower resolution than brightness. 4:2:0 (the default) uses 1/4 the color samples; the eye barely notices.