A practical introduction to FFmpeg — what it is, where it came from, and how to use it for the most common audio and video tasks. Built around real command examples you can adapt as you go.
FFmpeg is a free, all-in-one command-line tool for working with audio and video. It can record, convert, resize, crop, cut, combine, extract frames, measure quality — basically anything you'd want to do with a media file, without ever opening a GUI.
You don't need to memorize the exact commands — but it's useful to know which options exist, so you recognize what's possible and can look up the right syntax when you need it. Most people keep a small list of recipes (like this page) and adapt them as they go.
Official site: ffmpeg.org — downloads, news and the source code. Official documentation: ffmpeg.org/documentation — full reference for every command, flag, and filter. Companion tools: ffprobe (inspect files) and ffplay (quick playback) ship in the same bundle.
Tools and services built on FFmpeg
Even if this is the first time you've heard of it, chances are you've already used FFmpeg without knowing — it's the engine behind a huge slice of the media ecosystem:
Tool
What it is
How it uses FFmpeg
VLC media player
Popular open-source video player
Uses libavcodec / libavformat (FFmpeg's libraries) for decoding most formats
OBS Studio
Streaming & screen-recording app used by Twitch/YouTube creators
Bundles FFmpeg for recording, encoding and muxing output
Shotcut
Open-source video editor
Built on top of FFmpeg's libraries for import, export and rendering
Audacity
Audio editor
Optional FFmpeg plugin enables import/export of MP3, M4A, WMA, etc.
So when you learn FFmpeg directly, you're learning the same engine that's already running quietly inside dozens of products you use every day.
2. A bit of history
FFmpeg has been around longer than YouTube. Knowing where it came from explains why it shows up in so many places today.
2000
Created by French programmer Fabrice Bellard. Initial focus: a fast, minimal MPEG video encoder/decoder.
2003
Bellard handed over maintenance to Michael Niedermayer, who led the project for the next decade-plus.
2004–2010
Rapid growth — libavcodec and libavformat become the de-facto open-source codec stack, adopted by VLC and many other media tools.
2017+
Native support added for modern codecs (HEVC/H.265, VP9, AV1), hardware acceleration, and quality metrics (PSNR, SSIM, VMAF).
Today
Powers a huge portion of the world's media infrastructure — browsers, streaming platforms, broadcast pipelines, mobile apps, and more. See the projects using FFmpeg wiki page.
What does the name mean?
"MPEG" refers to the Moving Picture Experts Group, the standards body behind MPEG-1, MPEG-2, MPEG-4 and many of the video formats FFmpeg was originally built to handle. The "FF" part is more interesting — for years there was lively speculation on the mailing list about what those letters really meant, until Fabrice Bellard himself stepped in to settle it: in a 2006 ffmpeg-devel post he confirmed the original meaning is simply "Fast Forward", the playback-control symbol on tape decks and remotes.
Fun fact: the project's mascot is a stylized zigzag pattern derived from a zigzag scan used in JPEG/MPEG block encoding.
3. Installing it
FFmpeg ships as a single executable. Pick the route for your platform — the rest of this guide uses the same commands across all of them.
Windows
Go to gyan.dev/ffmpeg/builds — the recommended Windows build provider (also linked from the official ffmpeg.org/download page). The page offers two build streams: a release build (based on the latest stable FFmpeg release — recommended for most users) and a git master build (built from the development branch — gets new features and fixes earlier, but slightly less battle-tested).
Pick a build (see the table below), download the .7z or .zip, and unzip it to a folder of your choice.
Add the unzipped ffmpeg/bin folder to your PATH environment variable — this lets you run ffmpeg directly from any cmd or PowerShell window, without having to type its full path each time.
Essentials vs Full — which build do I want?
Build
Size
Includes
Use it when…
Essentials
~30 MB
The common codecs (H.264/x264, H.265/x265, AAC, Opus, VP9, AV1) and basic filters. No GPL-only or research libraries.
You just want to convert / cut / capture / play files. 90% of users.
Full
~150 MB
Everything in Essentials plus extras like libvmaf, libtensorflow, frei0r, chromaprint, additional decoders and analyzers.
You need VMAF, ML-based filters, or any of the other niche libraries.
Linux & macOS
FFmpeg is available through every major package manager (apt, dnf, pacman, Homebrew, MacPorts, …). For install instructions per distro and Mac, plus source builds and nightly releases, see the official downloads page at ffmpeg.org/download.
Verify the install: run ffmpeg -version. If it prints a version banner, you're set. The configuration line tells you which features your build supports (e.g. --enable-libvmaf, --enable-nvenc) — handy when a feature later seems missing.
4. Anatomy of a command
-i introduces an input. You can have multiple inputs (e.g. video + audio + reference for VMAF).
Options that come before an -i apply to that input. Options after all inputs apply to the output.
The output filename comes last. No flag — just the path.
Order matters! -ss 30 -i input.mp4 seeks before decoding (fast). -i input.mp4 -ss 30 seeks after decoding (slow but precise).
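A quick sketch of the difference (filenames and timestamps are placeholders):
ffmpeg -ss 00:00:30 -i input.mp4 -t 10 -c copy clip.mp4
ffmpeg -i input.mp4 -ss 00:00:30 -t 10 -c:v libx264 -c:a aac clip.mp4
The first jumps straight to the nearest keyframe and stream-copies ten seconds — nearly instant, but the cut point can be slightly off. The second decodes from the start, so the cut is frame-accurate at the cost of speed.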
Streams inside a container
A media file is a container that holds one or more streams — video, audio, subtitles, sometimes data. A single .mp4 might have one video stream and two audio tracks (e.g. original + dubbed); an audio file can have an embedded cover image as a video stream. FFmpeg addresses streams using INPUT:TYPE:INDEX:
0:v:0 — input file 0, first video stream
0:a:0 — input file 0, first audio stream
0:a:1 — input file 0, second audio stream (e.g. the dub)
1:s:0 — input file 1, first subtitle stream
The -map flag explicitly chooses which streams end up in the output. Without it, FFmpeg auto-picks the "best" stream of each type — usually the right call, occasionally a surprising one.
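For example, to keep the first video stream and only the dubbed audio track (a sketch, assuming input.mp4 has two audio streams as above):
ffmpeg -i input.mp4 -map 0:v:0 -map 0:a:1 -c copy output.mp4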
Container vs. codec
The container is the file's wrapper (.mp4, .mkv, .webm, …). The codec is the algorithm that compresses the audio or video data inside it. They are independent choices:
The output filename's extension determines the container.
-c:v / -c:a determines the codec inside.
Each container only supports certain codecs. For example, .webm only accepts VP8 / VP9 / AV1 video and Vorbis / Opus audio. Mismatched combinations fail with errors like "could not find tag for codec X in stream Y." When in doubt, MP4 + H.264 + AAC is the universal-compatibility default.
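As a sketch, converting anything to that default (filenames are placeholders):
ffmpeg -i input.mkv -c:v libx264 -c:a aac output.mp4
If the streams inside are already MP4-compatible, -c copy instead of the two codec flags just rewraps them into the new container without re-encoding.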
Beyond basic compatibility, many flags only work with specific encoders or containers — even though their names look generic. A few common gotchas:
-crf (constant quality) is supported by libx264, libx265, and libvpx-vp9. NVIDIA's h264_nvenc uses -cq instead — same idea, different flag name.
-preset slow makes sense for libx264 (values: ultrafast → veryslow). NVENC also accepts -preset, but with values like p1 through p7 — same option name, completely different value scheme.
-tune zerolatency, -tune film, etc. are libx264/libx265-specific. Hardware encoders ignore them.
-movflags +faststart (moves the index to the beginning of the file for streamable MP4) only applies to MP4 and MOV containers. Silently ignored elsewhere.
Trying to put H.264 in a .webm output is rejected outright — the container's spec doesn't allow it.
The takeaway: when copying a command from somewhere, check that its flags actually apply to the encoder and container you're using. The errors appendix lists the typical failure messages.
Output is optional
Not every command needs an output file. Replacing the output with -f null - tells FFmpeg to process the input but throw the result away. Useful when you only care about the side effects — analysis logs (VMAF, cropdetect), validating a file, or testing a filter graph without writing anything to disk.
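For example, a quick integrity check that decodes the whole file and prints only the errors it hits (a sketch):
ffmpeg -v error -i input.mp4 -f null -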
5. Flag cheat-sheet
Flag
What it does
Example
-i
Input file or device
-i input.mp4
-c:v
Video codec
-c:v libx264
-c:a
Audio codec
-c:a aac
-c copy
Stream copy (no re-encode)
-c copy
-b:v / -b:a
Video / audio bitrate
-b:v 2500k
-s
Resolution
-s 1920x1080
-r
Framerate (FPS)
-r 60
-pix_fmt
Pixel format
-pix_fmt yuv420p
-ac
Audio channel count
-ac 1 (mono)
-t
Duration
-t 30 (30 seconds)
-ss / -to
Seek to / stop at
-ss 00:01:00 -to 00:01:30
-vf
Video filter chain
-vf crop=500:500
-filter_complex
Multi-input filter graph
-filter_complex hstack=inputs=2
-vframes
Number of video frames to write
-vframes 1
-y
Overwrite output without asking
-y
-hide_banner
Quieter output
-hide_banner
6. First commands
Inspect a file
ffprobe -i input.mp4 -hide_banner
Explain this command
ffprobe
Companion tool to FFmpeg that reads media files and prints information about them.
-i input.mp4
The file to inspect.
-hide_banner
Skip the build/version banner so the output focuses on the file's metadata.
Extract the audio
ffmpeg -i input.mp4 -map 0:a audio.wav
Explain this command
-i input.mp4
Input file.
-map 0:a
Pick all audio streams from the input; drop video.
audio.wav
Audio-only output file.
Remove the audio track
ffmpeg -i input.mp4 -c copy -an output.mp4
Explain this command
-i input.mp4
Input file.
-c copy
Copy streams without re-encoding — fast and lossless.
-an
Drop the audio (a = audio, n = none).
output.mp4
Video-only output file.
7. Capturing from a device (Windows)
List capture devices
ffmpeg -list_devices true -f dshow -i dummy
Explain this command
-list_devices true
Print available capture devices instead of recording.
-f dshow
Use Windows DirectShow as the input format.
-i dummy
Required placeholder input — not actually read.
Show options for one device
Replace Video Capture Device / Audio Capture Device below with the exact name -list_devices printed for your hardware (e.g. a webcam, capture card, or HDMI grabber).
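As a sketch: dshow's -list_options true prints the resolutions, framerates, and pixel formats a device supports, and a 30-second recording command matching the flag-by-flag breakdown below might look like this (the 256M buffer size and the explicit -c:a aac are assumptions; device names are the placeholders mentioned above):
ffmpeg -f dshow -list_options true -i video="Video Capture Device"
ffmpeg -rtbufsize 256M -f dshow -i video="Video Capture Device":audio="Audio Capture Device" -c:v h264_nvenc -s 1920x1080 -r 60 -b:v 2500k -pix_fmt yuv420p -c:a aac -ac 1 -b:a 128k -profile:a aac_main -t 30 "C:\temp\recording.mp4"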
-rtbufsize
Real-time input buffer — prevents frame drops at high data rates.
-f dshow
DirectShow input format.
-i video=…:audio=…
Combined video + audio source — both grabbed from the same dshow command.
-c:v h264_nvenc
NVIDIA hardware H.264 encoder.
-s 1920x1080
Output resolution.
-r 60
Framerate — 60 fps.
-b:v 2500k
Target video bitrate — 2.5 Mbps.
-pix_fmt yuv420p
Pixel format — broad-compat 4:2:0 chroma.
-ac 1
One audio channel (mono).
-b:a 128k
Audio bitrate — 128 kbps.
-profile:a aac_main
AAC profile.
-t 30
Stop after 30 seconds.
"C:\temp\recording.mp4"
Output file path.
What's happening: dshow input → NVENC h264 video at 1080p60 / 2.5 Mbps → mono AAC audio at 128 kbps → stop after 30s.
Real device names tend to be longer and include the manufacturer / port index. As an example, a Magewell Pro Capture Quad HDMI card with the second port selected might appear in -list_devices as Video (00-1 Pro Capture Quad HDMI), and the same command would look like:
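A sketch with those names dropped in (all other flags as in the recording command above):
ffmpeg -rtbufsize 256M -f dshow -i video="Video (00-1 Pro Capture Quad HDMI)":audio="Audio (00-1 Pro Capture Quad HDMI)" -c:v h264_nvenc -s 1920x1080 -r 60 -b:v 2500k -pix_fmt yuv420p -c:a aac -ac 1 -b:a 128k -profile:a aac_main -t 30 "C:\temp\recording.mp4"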
Video (00-1 Pro Capture Quad HDMI)
The exact device name as printed by -list_devices, including the parentheses. Must be quoted in full because of the spaces.
00-1
Card / port index. 00 = the first card in the system; -1 = its second port (zero-indexed). A four-port capture card would expose 00-0 through 00-3.
Pro Capture Quad HDMI
Vendor / model name as the driver registers it.
Audio (00-1 Pro Capture Quad HDMI)
The audio side of the same physical input — same manufacturer + port index, but registered as an audio device. Many capture cards expose video and audio as two separate dshow devices that you have to combine yourself.
All other flags work identically to the explainer for the previous command.
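Extracting specific frames. A sketch of the kind of command the breakdown below describes — it grabs four individual frames as bitmaps (the input filename is a placeholder; the frame numbers come from the breakdown):
ffmpeg -i input.mp4 -vf "select=eq(n\,0)+eq(n\,59)+eq(n\,119)+eq(n\,239)" -vframes 4 -fps_mode passthrough -y "frame_%d.bmp"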
Select specific frames by zero-indexed number. Breaking the expression down:
select=… — keep only frames where the inner expression is true.
eq(a, b) — equality function; returns 1 if a equals b, otherwise 0.
n — built-in variable holding the current frame's zero-indexed number (0 for the first frame, 1 for the second, …).
\, — a comma escaped with a backslash. Inside FFmpeg's filter syntax a bare , is a filter separator, so the comma between eq's arguments must be escaped.
+ — logical OR; joins multiple conditions so several frames match.
So eq(n\,0) means "this frame is frame 0" (the very first one), eq(n\,59) means "this frame is frame 59" (the 60th one), and the full expression keeps only frames 0, 59, 119, and 239.
-vframes 4
Stop after writing 4 frames.
-fps_mode passthrough
Preserve input timestamps (don't try to fill gaps).
-y
Overwrite outputs without asking.
"frame_%d.bmp"
Output pattern. %d = sequence number (1, 2, 3, 4).
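Choosing an encoder. The breakdown below compares three H.264 encoders; the surrounding command is identical apart from the -c:v value. A minimal sketch (filenames are placeholders, and each encoder only works if your build and hardware support it):
ffmpeg -i input.mp4 -c:v libx264 output.mp4
ffmpeg -i input.mp4 -c:v h264_amf output.mp4
ffmpeg -i input.mp4 -c:v h264_nvenc output.mp4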
-c:v libx264
Software CPU encoder. Best quality per bit, slowest.
-c:v h264_amf
AMD GPU H.264 encoder (Advanced Media Framework).
-c:v h264_nvenc
NVIDIA GPU H.264 encoder (NVENC).
output.mp4
Output file.
Rule of thumb: CPU = best quality per bit, slower. GPU = much faster, slightly larger files at the same visual quality. For real-time capture, always use GPU.
11. Metadata and FFprobe
Add custom metadata fields
You can attach standard tags (title, artist, comment, year) or arbitrary key/value pairs of your own. With -c copy the streams aren't re-encoded — only the container's tag table is rewritten, so this is essentially instantaneous.
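A sketch of such a command (the tag names and values are made-up examples):
ffmpeg -i input.mp4 -metadata title="My clip" -metadata project="client-demo" -movflags +use_metadata_tags -c copy output.mp4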
-metadata key=value
An arbitrary key/value pair you make up. Useful for tagging files with project- or workflow-specific info.
-movflags +use_metadata_tags
Required for non-standard tags to be preserved in MP4 / MOV containers.
-c copy
Don't re-encode — just rewrite the container's tag table.
output.mp4
Tagged output file.
Read metadata back
ffmpeg -i output.mp4 -hide_banner
Explain this command
-i output.mp4
The file to inspect. With no output specified, FFmpeg just prints what it knows about the input and stops.
-hide_banner
Skip the build/version header so the metadata stands out.
Per-frame info with ffprobe
ffprobe -i input.mp4 -show_frames
Explain this command
ffprobe
Inspector tool — prints structured info about a media file.
-i input.mp4
File to analyze.
-show_frames
Dump one block per frame: picture type (I/P/B), PTS, size, pixel format, and more.
Each frame block tells you the picture type (I/P/B), PTS, size, pixel format, etc. — useful when you're debugging encoder behaviour or hunting a specific keyframe.
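For example, to print just the picture type and timestamp of every video frame — handy for spotting keyframes — a sketch:
ffprobe -v error -select_streams v:0 -show_frames -show_entries frame=pict_type,pts_time -of csv input.mp4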
12. VMAF — measuring video quality
VMAF (Video Multimethod Assessment Fusion) is Netflix's perceptual quality metric. Score range: 0–100. Higher = closer to the reference. It correlates with what humans actually perceive much better than PSNR alone.
Prerequisites
An FFmpeg build compiled with --enable-libvmaf. Check with: ffmpeg -h filter=libvmaf
A VMAF model file, e.g. vmaf_v0.6.1.json (available from the Netflix/vmaf GitHub).
Two videos to compare: a distorted one (e.g. a re-encoded version) and the original/reference.
Important: the reference (original) video must be the second input to FFmpeg. Distorted = first input.
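A sketch of the comparison (filenames are placeholders; depending on the libvmaf version in your build, the model option is spelled model_path=… in older builds or model=path=… in newer ones, and recent builds can fall back to a bundled default model if it's omitted):
ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi libvmaf=model_path=vmaf_v0.6.1.json -f null -
On Windows, remember to escape the colon in an absolute model path inside the filter graph (see the errors appendix).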
Additional reference material. Each appendix entry is self-contained — useful when you need a deeper look at a specific topic beyond the core walkthrough above.
A.1 Common error messages
A reference for the FFmpeg errors you're most likely to bump into, with what they actually mean and the usual fix.
Message
What it means
Typical fix
moov atom not found
The MP4 file is missing its index — almost always because the recording was interrupted before being properly finalized.
Recover with untrunc using a healthy reference file from the same encoder, or re-record. The file is otherwise unreadable.
Invalid data found when processing input
FFmpeg can't make sense of the bytes — wrong format detection, corrupted header, or partial download.
Force the format with -f <fmt>, or verify the file with ffprobe. Re-download / re-export if the source is bad.
Unknown encoder 'libx264'
Your FFmpeg build was compiled without that codec library.
Switch to a GPL-enabled build (e.g. the gyan.dev "Full" build). See section 3.
No such filter: 'libvmaf'
The build doesn't include libvmaf.
Use the gyan.dev "Full" build. Confirm with ffmpeg -h filter=libvmaf.
Output #0 does not contain any stream
Your filter graph or stream-mapping options removed every stream from the output.
Check -map, -vn, -an flags. Run without filters first to confirm the input has streams.
Error initializing filter 'X' … / filter 'X' not found
Filter syntax error — usually a misplaced colon, comma, or unescaped Windows path.
On Windows, escape colons in paths inside filter graphs as C\\:/path/.... See VMAF section.
Conversion failed!
Generic catch-all printed at the end. The real error is one of the lines above it.
Scroll up in the output. Re-run with -loglevel verbose if the cause is unclear.
real-time buffer [...] too full or near too full
During dshow capture, FFmpeg can't drain the device buffer fast enough.
Increase -rtbufsize, lower the capture resolution or framerate, or switch to a faster (hardware) encoder.
Permission denied
Output file is locked (open in another app), in a write-protected location, or being overwritten while in use.
Close anything that has the file open, or write to a different path. Add -y to overwrite without prompting.
Past duration X too large
Timestamps are non-monotonic — usually because of a bad input or seek that lands mid-GOP.
Add -fflags +genpts, or seek with -ss before -i for keyframe-aligned seeking.
height/width not divisible by 2
H.264 / H.265 require even dimensions. Common after cropping.
Round to even: -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" or pick a crop with even values.
When in doubt: add -loglevel verbose (or debug) to your command. FFmpeg's default output suppresses the line that actually explains what failed.
A.2 Performance tuning & presets
Encoder speed, file size, and quality form a triangle — you can pick any two. Below are the knobs that actually matter for libx264 and libx265, which are the encoders most people use.
The -preset flag
Controls how hard the encoder works to find the best compression. Slower presets produce smaller files at the same visual quality, at the cost of CPU time. Output of all presets is decodable everywhere; only the encoder's effort changes.
Preset
Relative speed
File size at same quality
When to use
ultrafast
~10× faster than medium
~1.7× larger
Live capture, real-time recording.
superfast
~7×
~1.5×
Real-time-ish workflows.
veryfast
~4×
~1.25×
Streaming, fast bulk transcodes. Common live-streaming default.
faster
~2×
~1.10×
Daily-driver fast preset.
fast
~1.3×
~1.05×
Marginal speed gain over medium.
medium (default)
1×
1× (baseline)
Reasonable balance — the FFmpeg default.
slow
~0.7×
~0.95×
Quality-focused offline encoding.
slower
~0.4×
~0.92×
Archival, master encodes.
veryslow
~0.2×
~0.90×
Maximum compression, when CPU time is free.
Numbers are approximate and depend heavily on content — see the FFmpeg H.264 encoding wiki for the canonical guidance.
CRF vs bitrate
Three ways to tell the encoder how much quality (or file size) you want:
CRF (Constant Rate Factor) — variable bitrate, constant perceptual quality. Best for archival and general delivery. Range 0–51; lower = better.
0 = mathematically lossless (huge files)
17–18 = "visually lossless" for most viewers
23 = libx264 default — good balance
28 = smaller, noticeable artifacts
32+ = clearly degraded
Target bitrate (-b:v 4M) — fixed average bitrate. Use when you have a streaming budget or need a predictable file size.
Two-pass — analyze first, then encode. Slowest, but gives the best quality at a fixed target file size.
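Sketches of all three (filenames, the CRF value, and the 4M target are arbitrary examples):
ffmpeg -i input.mp4 -c:v libx264 -preset slow -crf 18 -c:a copy crf.mp4
ffmpeg -i input.mp4 -c:v libx264 -b:v 4M -c:a aac abr.mp4
ffmpeg -y -i input.mp4 -c:v libx264 -b:v 4M -pass 1 -an -f null -
ffmpeg -i input.mp4 -c:v libx264 -b:v 4M -pass 2 -c:a aac twopass.mp4
The last two lines are the two passes of a two-pass encode: the first analyzes the input and writes a log file, the second uses that log to hit the target bitrate as accurately as possible.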
The -tune flag
Optional content-aware adjustment for libx264 / libx265. Pick the one that matches your input:
film — live-action, default-ish
animation — cartoons / anime; favours flat regions
grain — preserves film grain (don't smooth it away)
stillimage — slideshows
fastdecode — for low-power playback devices
zerolatency — disable look-ahead for live streaming / video calls
CPU vs GPU encoders
Hardware encoders (h264_nvenc, h264_qsv, h264_amf) are much faster than libx264 — often 5–20× — but produce ~10–25% larger files for the same visual quality. The right choice depends on the workload:
Real-time capture or live streaming → GPU encoder, every time.
Bulk batch transcoding where you have idle GPU → GPU encoder.
Archival / mastering / final delivery → libx264 or libx265 with a slow preset and a CRF.
A.3 Glossary
The terms that come up over and over once you start working with video. Knowing these makes documentation, error messages, and forum threads much easier to read.
Term
Meaning
Codec
The algorithm that encodes and decodes media. Examples: H.264, H.265, AV1, AAC, Opus, MP3.
Container
The file format that wraps streams + metadata together. Examples: .mp4, .mkv, .webm. Independent from the codec inside.
Stream
A single track inside a container — usually one video, one or more audio, optional subtitles.
Mux / Demux
To mux = combine streams into a container. To demux = pull streams out. No re-encoding involved.
Transcode
Decode + re-encode. Slow. Used to change codec or quality.
Transmux
Copy streams into a different container without re-encoding. Fast — equivalent to -c copy.
Frame
A single image inside a video stream.
I-frame (keyframe)
Self-contained image, decodable on its own. Largest type.
P-frame
"Predicted" — only stores the difference from previous frames. Smaller than I-frames.
B-frame
"Bi-directional" — predicted from frames before and after. Smallest, most compressed.
GOP
"Group of Pictures" — the sequence between two keyframes (e.g. I P P B P B P … I). Shorter GOPs = better seeking, larger files.
Bitrate
Bits per second of compressed data. Higher = bigger file, generally better quality.
CBR / VBR / CRF
Constant / Variable / Constant Rate Factor — three ways to control the bitrate-vs-quality trade-off. See A.2.
Resolution
Width × height in pixels (e.g. 1920×1080).
Frame rate (FPS)
Frames per second. Common values: 24 (film), 30, 60.
Pixel format (pix_fmt)
How color and luminance bits are laid out per pixel. yuv420p is the universal default for 8-bit H.264.
PTS / DTS
Presentation / Decode Time Stamp. PTS = when a frame should display. DTS = when it should be decoded. Differ when B-frames are present.
Filter graph
A pipeline of operations applied to a stream. Specified via -vf (single video chain) or -filter_complex (multi-input/output).
Hardware acceleration
Using dedicated GPU silicon (NVENC, QSV, VAAPI, AMF) for encode/decode. Much faster than CPU; slightly larger files at same quality.
Lossy / Lossless
Lossy = original cannot be exactly recovered (e.g. H.264, AAC). Lossless = bit-perfect reconstruction (e.g. FLAC, FFV1, ALAC).
Profile / Level
Codec sub-specifications that limit which features a decoder must support — e.g. baseline, main, and high for H.264.
Chroma subsampling
Storing color information at lower resolution than brightness. 4:2:0 (default) uses 1/4 the color samples; the eye barely notices.