Captions for YouTube Shorts: Hook in 2 Seconds, Keep Them to the End
Vertical safe zones, on-mute autoplay, the 2-second hook rule, and how burned-in captions help your Shorts beat the 60-second swipe.
TL;DR
YouTube Shorts autoplays muted, in vertical, and lives or dies inside the first two seconds. Burned-in captions placed inside the safe zones are the single biggest lever to keep viewers from swiping past your hook.
Shorts is a swipe economy
A YouTube Shorts viewer makes a decision in well under two seconds. They are scrolling vertically, with the sound off, looking for any reason to keep their thumb still.
If your hook is in the audio, you've already lost.
This is why captions on Shorts behave differently from captions on long-form YouTube. They aren't just an accessibility layer: they are the primary hook surface, and they need to be designed like a thumbnail, not like a subtitle file.
The 2-second rule
Treat your first frame and your first caption as one composition:
- 0.0–0.3s: opening frame and first 3–5 caption words appear at the same instant.
- 0.3–1.0s: caption finishes the hook sentence ("Why your Shorts get 100 views").
- 1.0–2.0s: hook delivers a payoff line that earns the watch.
If the viewer doesn't read a meaningful sentence by second 2, the swipe happens. The caption is what they read. Period.
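If you already have timed caption cues, the timeline above can be linted before export. The sketch below is a rough proxy, not a measure of whether the hook is any good: the `Cue` type, the `check_hook` name, and the "at least two cues start before second 2" test are assumptions made for illustration. Only the 0.3s, 2.0s, and 3–5 word thresholds come from the rule itself.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float
    text: str


def check_hook(cues, first_by=0.3, payoff_by=2.0):
    """Return a list of timing problems in the opening captions (empty if none)."""
    if not cues:
        return ["no captions found"]
    ordered = sorted(cues, key=lambda c: c.start)
    problems = []

    # The first caption should land with the opening frame.
    first = ordered[0]
    if first.start > first_by:
        problems.append(
            f"first caption appears at {first.start:.2f}s (target: by {first_by:.1f}s)"
        )

    # The opening card should be a short burst, not a full sentence.
    n_words = len(first.text.split())
    if not 3 <= n_words <= 5:
        problems.append(f"first caption has {n_words} words (target: 3-5)")

    # Assumption: a payoff line exists if a second cue starts before the deadline.
    started = [c for c in ordered if c.start < payoff_by]
    if len(started) < 2:
        problems.append(f"no payoff caption starts before {payoff_by:.1f}s")
    return problems


good = [
    Cue(0.0, 0.4, "Why your Shorts"),
    Cue(0.4, 1.0, "get 100 views"),
    Cue(1.0, 2.0, "Your hook is audio-only"),
]
late = [Cue(1.0, 2.5, "Hey everyone welcome back to the channel")]

print(check_hook(good))  # []
for problem in check_hook(late):
    print(problem)
```

The second example fails all three checks: the caption arrives late, runs long, and leaves no room for a payoff line.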
Hook caption styling that works
- Weight: 700–900. Anything lighter disappears against busy footage.
- Size: 7–9% of frame height. On a 1920px-tall vertical, that's roughly 140–170px tall.
- Stroke or background plate: a 2–3px outline or a semi-opaque pill background ensures contrast on every frame.
- Color accents: highlight the one word that delivers the hook (the number, the verb, or the brand) in a contrasting color. Don't rainbow the whole sentence.
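To turn the size and accent rules into numbers for any export resolution, a small helper is enough. This is a minimal sketch: the function names and the two hex colors are placeholders chosen for the example, not values from any editor or platform. On a 1920px-tall frame the exact 7–9% band works out to 134–173px, which the figures above round off.

```python
def caption_height_px(frame_height, low=0.07, high=0.09):
    """Return the (min, max) caption text height in pixels for a frame height."""
    return round(frame_height * low), round(frame_height * high)


def accent_one_word(text, accent_word, base="#FFFFFF", accent="#FFE600"):
    """Return (word, color) pairs with only the accent word highlighted.

    The colors are placeholder values; pick ones that contrast with your footage.
    """
    target = accent_word.lower()
    return [
        (word, accent if word.strip(".,!?").lower() == target else base)
        for word in text.split()
    ]


print(caption_height_px(1920))  # (134, 173)
print(accent_one_word("Why your Shorts get 100 views", "100"))
```

Keeping the accent to a single word is a deliberate constraint here: the helper has no way to highlight more than one target, which matches the "don't rainbow the whole sentence" rule.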
Vertical safe zones on Shorts
YouTube's Shorts UI is busy. The right edge has likes, comments, share, remix, and subscribe. The bottom has the channel name, title, and audio attribution. Your captions need to live in the middle band.
Practical safe-zone rules:
- Keep the top 10% clear for the small "Shorts" badge.
- Keep the bottom 22–25% clear for title and CTA UI.
- Keep the right 12% clear for action buttons.
- Center your captions in the remaining vertical band, roughly between 35% and 70% of the frame height.
If your captions sit too low, the title and channel handle will overlap them. If they sit too high, they land outside the area where the eye naturally rests, and viewers may swipe before they find them.
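The percentages above translate directly into pixel rectangles. The sketch below assumes a 1080×1920 export and uses the conservative 25% bottom margin. `Rect`, `safe_zone`, `caption_band`, and `fits` are illustrative names, and the margins are this article's rules of thumb rather than official YouTube measurements.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    left: int
    top: int
    right: int
    bottom: int


def safe_zone(width=1080, height=1920, top=0.10, bottom=0.25, right=0.12):
    """Area left over after reserving the top, bottom, and right UI margins."""
    return Rect(
        left=0,
        top=round(height * top),
        right=round(width * (1 - right)),
        bottom=round(height * (1 - bottom)),
    )


def caption_band(height=1920, start=0.35, end=0.70):
    """Vertical pixel range (top, bottom) recommended for caption text."""
    return round(height * start), round(height * end)


def fits(box, zone):
    """True if the caption box lies entirely inside the zone."""
    return (
        box.left >= zone.left
        and box.top >= zone.top
        and box.right <= zone.right
        and box.bottom <= zone.bottom
    )


zone = safe_zone()
print(zone)            # Rect(left=0, top=192, right=950, bottom=1440)
print(caption_band())  # (672, 1344)

centered = Rect(left=90, top=800, right=930, bottom=1100)
too_low = Rect(left=90, top=1500, right=930, bottom=1700)
print(fits(centered, zone), fits(too_low, zone))  # True False
```

Running the same check against the bounding box of every caption card catches the two common failures: text sliding under the title at the bottom, and long lines running into the action buttons on the right.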
Pacing inside a 60-second window
Shorts can now run up to 3 minutes, but the format still rewards 30–60 second pacing. That gives you, conservatively, 80–120 spoken words per Short, which is tight.
Caption pacing should match:
- Short bursts of 3–5 words at a time, swapped every 0.6–1.2 seconds.
- One full thought per "card": no sentence wraps mid-card.
- Avoid filler words ("um", "you know", "kind of"), both for the audio edit and the captions.
When captions are paced correctly, the viewer's eye does almost no work. That low-effort consumption is exactly what the algorithm and the swipe threshold reward.
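The card rules can be applied mechanically to a transcript. The following is a minimal sketch, assuming plain punctuated text as input: it strips a small hard-coded filler list, splits on sentence boundaries so no card spans two sentences, and divides each sentence into evenly sized cards of at most five words. The filler list and function names are illustrative, and blunt phrase removal will also delete legitimate uses such as "what kind of camera", so review the output by hand.

```python
import math
import re

# Illustrative filler list; extend or trim it for your own speech patterns.
FILLERS = re.compile(r"\b(?:um|uh|you know|kind of)\b,?\s*", re.IGNORECASE)


def split_sentence(words, max_words=5):
    """Split one sentence into the fewest cards of at most max_words, evenly sized."""
    n_cards = math.ceil(len(words) / max_words)
    base, extra = divmod(len(words), n_cards)
    cards, i = [], 0
    for k in range(n_cards):
        size = base + (1 if k < extra else 0)  # spread the remainder over early cards
        cards.append(" ".join(words[i:i + size]))
        i += size
    return cards


def to_cards(transcript, max_words=5):
    """Turn a transcript into caption cards that never span two sentences."""
    cleaned = FILLERS.sub("", transcript)
    cards = []
    for sentence in re.split(r"(?<=[.!?])\s+", cleaned.strip()):
        words = sentence.split()
        if words:
            cards.extend(split_sentence(words, max_words))
    return cards


print(to_cards("Um, your hook is kind of stuck in the audio. Put it on screen."))
# ['your hook is stuck', 'in the audio.', 'Put it on screen.']
```

Splitting evenly, rather than filling each card to five words, avoids a one-word leftover card at the end of a sentence: any sentence of three or more words produces cards of at least three words.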
YouTube's auto-captions vs. burned-in
YouTube does ship auto-captions on Shorts, and you can add a CC track when you upload. The catch: most viewers never turn captions on. Because playback defaults to muted with CC off, your auto-caption track is invisible to the majority of viewers.
Burned-in captions sidestep that entirely. They render as part of the video frame, every viewer sees them, and they survive cross-posting to TikTok, Reels, and X without any platform-specific re-upload flow.
The right combo for most channels:
- Burned-in captions for hook and key payoff lines (always visible).
- Optional CC track for full transcript accessibility and SEO.
This is exactly the export shape Kaptionly produces by default: you edit the transcript once, style the caption look once, and export an MP4 ready for Shorts, Reels, and TikTok in a single render.
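Outside of any particular tool, the same two-track setup can be approximated with an SRT file and ffmpeg. The sketch below formats timed cues as SRT, which can be uploaded as the CC track, and builds an ffmpeg command that burns a subtitle file into the frames using ffmpeg's `subtitles` filter (this needs an ffmpeg build with libass). The file names are hypothetical, the command is only constructed and not executed, and paths containing special characters would need extra escaping inside the filter string.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(blocks)


def burn_in_command(video, srt, output):
    """Return an ffmpeg argument list that renders the SRT cues into the video."""
    return ["ffmpeg", "-i", video, "-vf", f"subtitles={srt}", "-c:a", "copy", output]


hook_cues = [(0.0, 0.4, "Why your Shorts"), (0.4, 1.0, "get 100 views")]
print(to_srt(hook_cues))

# Hypothetical file names, shown for illustration only.
print(" ".join(burn_in_command("short.mp4", "hook.srt", "short_captioned.mp4")))
```

One way to follow the split described above is to keep two cue lists: a short one with the hook and payoff lines for the burn-in pass, and the full transcript for the uploaded CC track. Styling applied this way is whatever the subtitle renderer defaults to, so heavier caption designs still need a dedicated editor or a styled subtitle format.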
Short checklist before you upload a Short
- Caption is on screen by frame 1, not frame 30.
- Hook payoff word is in a contrasting color.
- No caption text in the bottom 22% of the frame.
- Lines are 3–5 words, not full sentences.
- Brand names and numbers are spelled correctly.
Quick FAQ
How big should captions be on YouTube Shorts?
Aim for caption text at roughly 7–9% of frame height, about 140–170px tall on a 1920px vertical canvas. That keeps them readable on a phone in motion without dominating the frame.
Where should captions sit on a Short?
Center them vertically between roughly 35% and 70% of the frame height. The top 10% holds the Shorts badge, and the bottom 22–25% gets covered by the title, channel, and CTA UI.
Should I burn captions in or use the CC track?
Burn the hook and key payoff lines into the video itself, since most Shorts viewers never toggle CC on. You can still upload an SRT or VTT for the full transcript so YouTube can index it.
How fast should captions change?
Swap caption text every 0.6–1.2 seconds in groups of 3–5 words. Long, slow caption blocks lose viewers in a swipe-heavy feed.