VideoText workflow guide

TTML to SRT Converter

Convert TTML, DFXP, or EBU-TT subtitle files to SRT format. Used for Netflix, broadcast, and enterprise video workflows. Free, browser-based.
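
The conversion itself can be sketched in a few lines of Python. This is a minimal illustration, not VideoText's implementation: it assumes every cue is a <p> element carrying `begin` and `end` attributes in clock-time form (HH:MM:SS.mmm), and it does not handle offset times such as "1.5s", frame or tick counts, `dur` attributes, or <br/> elements nested inside a <span>. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.split("}")[-1]


def clock_to_ms(value: str) -> int:
    """Parse a clock time such as '00:01:23.456' into milliseconds."""
    hours, minutes, seconds = value.split(":")
    total = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return round(total * 1000)


def ms_to_srt(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (comma before the millis)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def cue_text(p: ET.Element) -> str:
    """Collect the text of a <p>, turning direct <br/> children into line breaks."""
    parts = [p.text or ""]
    for child in p:
        if local_name(child.tag) == "br":
            parts.append("\n")
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    lines = [line.strip() for line in "".join(parts).split("\n")]
    return "\n".join(line for line in lines if line)


def ttml_to_srt(ttml: str) -> str:
    """Convert a TTML document string into SRT text, numbering cues from 1."""
    root = ET.fromstring(ttml)
    paragraphs = [el for el in root.iter() if local_name(el.tag) == "p"]
    blocks = []
    for index, p in enumerate(paragraphs, start=1):
        start = ms_to_srt(clock_to_ms(p.attrib["begin"]))
        end = ms_to_srt(clock_to_ms(p.attrib["end"]))
        blocks.append(f"{index}\n{start} --> {end}\n{cue_text(p)}")
    return "\n\n".join(blocks) + "\n"
```

Matching on the local tag name keeps the sketch indifferent to which TTML namespace the file declares, which is one simple way to accept TTML, DFXP, and EBU-TT input with the same code path.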

Where subtitle workflows break in real production

  • Reading speed is the invisible subtitle constraint: 14–17 characters per second is the readable range for most viewers; anything above 21 CPS causes comprehension failure even when the transcript text is accurate. Most automated subtitle tools generate lines without CPS checks.
  • Burned captions create a permanent workflow commitment — any timing correction or text fix after encoding requires re-encoding the full video. For long-form content this is hours of lost render time, which is why burn decisions need to happen after QA, not before.
  • Platform subtitle requirements are not interchangeable: TikTok displays a 2-line maximum with specific font rendering; YouTube accepts up to 1,500 SRT blocks and requires UTF-8 encoding; Instagram Reels ignores soft subtitle tracks on autoplay entirely, making burned captions the only reliable option for Reels.

From raw video to export-ready subtitle file

1. Generate subtitle file with word-level timestamps

Upload the video or paste a public URL. Word-level timestamp alignment produces more accurate line breaks than sentence-level alignment — each subtitle break falls at a natural pause rather than a word boundary mid-phrase.

2. Review CPS on every subtitle line

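Flag any line above 17 CPS, then split it into shorter cues that each get enough time on screen, or condense the wording. An 8-word subtitle in a 1.2-second window hits approximately 24 CPS or more, depending on word length, which is unreadable for most viewers. Merge subtitle lines shorter than 0.8 seconds, which display too briefly to register.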
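
A CPS review pass can be sketched as follows. The 17 CPS and 0.8-second thresholds come from the step above; the counting convention (spaces count, line breaks do not) is an assumption, since tools differ on this, and the function names and sample cues are hypothetical.

```python
def cps(text: str, start_s: float, end_s: float) -> float:
    """Characters per second, counting spaces but not line breaks."""
    chars = len(text.replace("\n", ""))
    return chars / (end_s - start_s)


def review(cues, max_cps=17.0, min_duration_s=0.8):
    """Return (cue_number, issue) pairs for cues that need attention.

    `cues` is a list of (start_s, end_s, text) tuples.
    """
    flagged = []
    for number, (start_s, end_s, text) in enumerate(cues, start=1):
        if end_s - start_s < min_duration_s:
            flagged.append((number, "too_short"))  # candidate for merging
        elif cps(text, start_s, end_s) > max_cps:
            flagged.append((number, "too_fast"))   # candidate for splitting
    return flagged


cues = [
    (0.0, 1.0, "Thanks for joining us today"),  # 27 characters in 1.0 s
    (1.5, 4.0, "Let's get started"),
    (4.5, 5.0, "OK"),
]
print(review(cues))
```

The duration check runs first so that a zero-length cue is reported as too short instead of triggering a division by zero.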

3. Check for timing overlap and minimum duration

Subtitle overlap, where the next block starts before the previous one ends, causes display flicker in most players. Any gap shorter than 0.1 seconds between adjacent subtitles should either be closed, so the two blocks run back to back, or widened to prevent rendering artifacts.
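
The overlap and gap checks can be sketched like this. Times are integer milliseconds to avoid floating-point comparisons, the 100 ms threshold mirrors the 0.1-second figure above, and treating a gap of exactly zero as acceptable is an assumption. The function name is hypothetical.

```python
def timing_issues(cues, min_gap_ms=100):
    """Report overlaps and too-short gaps between adjacent cues.

    `cues` is a list of (start_ms, end_ms) tuples sorted by start time.
    Returns (index_of_later_cue, issue, size_ms) tuples.
    """
    issues = []
    for i in range(1, len(cues)):
        previous_end = cues[i - 1][1]
        start = cues[i][0]
        gap = start - previous_end
        if gap < 0:
            issues.append((i, "overlap", -gap))
        elif 0 < gap < min_gap_ms:
            issues.append((i, "short_gap", gap))
    return issues


cues = [(0, 1000), (900, 2000), (2050, 3000), (3000, 4000), (4500, 5000)]
print(timing_issues(cues))
```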

4. Select platform-appropriate export format

Export SRT for YouTube, Vimeo, and video editors. Export VTT for HTML5 web players and streaming platforms that support caption positioning. Choose burned-caption output for Instagram Reels, TikTok clips, and any social context where autoplay without sound is expected.

SRT, VTT, and burned caption outputs

SRT subtitle file

Standard timed subtitle format: sequence number, timestamp pair (00:00:00,000 --> 00:00:00,000, with a comma before the milliseconds), and one or two lines of caption text per block. Supported by YouTube, Vimeo, LinkedIn, most video editors, and accessibility workflows.

VTT subtitle stream

WebVTT format with period timestamp separators (00:00:00.000 --> 00:00:00.000). Used by HTML5 video players, streaming services, and accessibility platforms. Unlike SRT, VTT supports cue settings for positioning, text styling, and karaoke-style highlighting through inline timestamps.

Burned captions

Permanently embedded caption text rendered directly into the video frame — cannot be toggled off by the viewer. Required for platforms that strip external caption tracks. Font rendering, shadow depth, and vertical position all need QA review after encoding because video compression can degrade caption legibility.

Creators and teams running subtitle workflows

Video creators publishing to multiple platforms

Generate SRT for YouTube upload, export burned captions for Instagram Reels, and produce VTT for course platform embeds — all from the same subtitle pass without reformatting timing.

Accessibility compliance teams

Produce synchronized captions in support of the WCAG 2.1 captioning success criteria. WCAG itself does not set a characters-per-second limit, so CPS validation checks each line against the reading-speed threshold in your own caption guidelines before the video goes live.

Post-production editors

Export SRT or VTT for import into Premiere, DaVinci Resolve, or Final Cut Pro. Accurate word-level timestamps eliminate the need to manually re-sync caption timing after import.

Subtitle edge cases that cause QA failure

CPS violation — splitting required

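Eight words spoken in 1.2 seconds: "we need to fix this before the deadline" is 39 characters including spaces, or about 32 CPS. Splitting the line inside the same 1.2-second window does not lower the rate. To reach the readable 14–17 CPS range the text needs roughly 2.3 to 2.8 seconds on screen, so either extend the display time into the following pause or condense the wording.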

Subtitle timing overlap

A block ending at 00:01:23,500 overlaps the next block if that block starts at 00:01:23,200. The 300 ms overlap causes display flicker and rendering failure in most SRT players. A subtitle validator catches this before upload.
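
One possible repair policy, shown here as a sketch, is to pull the earlier block's end time back until a minimum gap separates it from the next block. This is an assumption about how to resolve the overlap, not the only option, and trimming shortens the earlier block's display time, so CPS should be rechecked afterwards. The function name and the 100 ms default are hypothetical.

```python
def resolve_overlaps(cues, min_gap_ms=100):
    """Trim each cue's end so it finishes min_gap_ms before the next cue starts.

    `cues` is a list of (start_ms, end_ms, text) tuples sorted by start time.
    """
    fixed = []
    for start, end, text in cues:
        if fixed and fixed[-1][1] > start - min_gap_ms:
            prev_start, _, prev_text = fixed[-1]
            # Never move the end before the cue's own start.
            new_end = max(prev_start, start - min_gap_ms)
            fixed[-1] = (prev_start, new_end, prev_text)
        fixed.append((start, end, text))
    return fixed


# 00:01:20,000 to 00:01:23,500, followed by a cue starting at 00:01:23,200
cues = [(80_000, 83_500, "first"), (83_200, 85_000, "second")]
print(resolve_overlaps(cues))
```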

Mobile frame crop

A two-line subtitle with 44+ characters per line at default font sizes clips at the bottom of mobile video frames. The safe area for mobile subtitles is a single line, 38 characters maximum, positioned at 85% vertical height.
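
A line-length check for the mobile safe area can be sketched with the standard library's `textwrap` module. The 38-character limit comes from the paragraph above; the function names are hypothetical, and each returned chunk is meant to become its own single-line cue with its own timing.

```python
import textwrap


def is_mobile_safe(text: str, max_chars: int = 38) -> bool:
    """True when the cue is a single line within the character limit."""
    lines = text.splitlines()
    return len(lines) == 1 and len(lines[0]) <= max_chars


def split_for_mobile(text: str, max_chars: int = 38) -> list:
    """Re-wrap a cue into single-line chunks that each fit the limit."""
    flattened = " ".join(text.split())  # collapse line breaks and extra spaces
    return textwrap.wrap(flattened, width=max_chars)


cue = "we need to fix this before the deadline\nand ship the release"
print(is_mobile_safe(cue))
print(split_for_mobile(cue))
```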

Burned caption rendering after encode

Font stroke weight, drop shadow depth, and subtitle vertical position all shift slightly after H.264 encoding. Captions that look correct in preview may become harder to read in the encoded file — requires a post-encode QA pass before publishing.

Platform-specific subtitle requirements

YouTube SRT requirements

UTF-8 encoding required. Maximum 1,500 subtitle blocks per file. Timestamp format: 00:00:00,000 (comma separator, not period). Files over 1,500 blocks need splitting before upload. YouTube auto-syncs uploaded SRT to audio — small timing offsets are corrected automatically.
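
A pre-upload check for these two requirements might look like the sketch below. The 1,500-block limit is taken from the paragraph above; renumbering each output file from 1 and leaving timestamps untouched are assumptions, and the function names are hypothetical.

```python
import re


def is_utf8(data: bytes) -> bool:
    """True when the raw file bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def split_srt(srt: str, max_blocks: int = 1500) -> list:
    """Split SRT text into parts of at most max_blocks blocks each."""
    blocks = [b for b in re.split(r"\n\s*\n", srt.strip()) if b.strip()]
    parts = []
    for offset in range(0, len(blocks), max_blocks):
        renumbered = []
        for number, block in enumerate(blocks[offset:offset + max_blocks], start=1):
            lines = block.splitlines()
            # Replace the original sequence number; keep timing and text lines.
            renumbered.append("\n".join([str(number)] + lines[1:]))
        parts.append("\n\n".join(renumbered) + "\n")
    return parts
```

Calling `split_srt(text)` on a file that is already within the limit returns a single part, so the function is safe to run on every export.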

TikTok caption behavior

TikTok generates its own auto-captions and displays them over uploaded SRT files in most cases. Burned captions are the reliable method for ensuring accurate captions appear on TikTok content. SRT upload works but TikTok may override it.

Instagram Reels caption handling

Instagram does not display soft subtitle tracks during autoplay in Feed or Reels. Burned captions are the only reliable method for ensuring captions appear for silent autoplay viewers. Instagram's built-in auto-caption feature can also be used after upload, but accuracy varies.

VTT vs SRT encoding difference

VTT uses period timestamp separators (00:00:00.000) while SRT uses commas (00:00:00,000). Swapping the separator character breaks parsing in most players. VTT files must begin with the WEBVTT header line — SRT files must not.
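
The two differences named above are enough to sketch a basic SRT-to-VTT conversion. This minimal example only rewrites the separator on timing lines and prepends the header; it assumes well-formed SRT input and does not translate any styling tags. The function name is hypothetical.

```python
import re

# Matches an SRT timestamp so the comma can be swapped for a period.
TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert SRT text to WebVTT text."""
    lines = []
    for line in srt.strip().splitlines():
        if "-->" in line:
            # Only touch timing lines so commas in caption text survive.
            line = TIMESTAMP.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


print(srt_to_vtt("1\n00:00:01,000 --> 00:00:02,500\nHello, world\n"))
```

The SRT sequence numbers are left in place because WebVTT accepts them as optional cue identifiers.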

Subtitle workflow questions answered

What is CPS and why does it matter for subtitle readability?

CPS stands for characters per second — how fast the viewer must read a subtitle line before it disappears. The readable range is 14–17 CPS for most general audiences. Above 21 CPS, a significant percentage of viewers cannot finish reading the line before it vanishes, even if the text is accurate. Most automated subtitle tools generate lines without a CPS check, which is why reviewing reading speed is part of subtitle QA before any platform upload.

Why does Instagram not show my uploaded subtitle file?

Instagram does not display soft subtitle tracks during autoplay in Feed or Reels, so external SRT or VTT files are not surfaced for most viewers. For Reels content, burned (hardcoded) captions are the only reliable method for ensuring captions appear. Instagram's built-in auto-caption feature can be applied after upload as an alternative, but its accuracy varies.

My burned subtitles look different after video encoding — what happened?

Video encoding (particularly H.264 and H.265) applies compression that slightly degrades text edges. Font stroke weight, drop shadow contrast, and subtitle position all shift after encoding. Captions that look clean in an editing preview may develop legibility issues in the exported file. A post-encode QA pass — watching 2–3 minutes of the encoded video at actual playback size — should be part of any burned caption workflow before the file is published.

What is the difference between SRT and VTT timestamp formats?

SRT uses comma separators in timestamps: 00:01:23,456 --> 00:01:25,123. VTT uses period separators: 00:01:23.456 --> 00:01:25.123. Swapping the separator character causes parsing failures in most players, so the file appears empty or throws an error. SRT files must not begin with a header; VTT files must begin with the line "WEBVTT". The two formats are otherwise structurally similar but not interchangeable.

How do I fix subtitle timing that drifts progressively later in the video?

Progressive timing drift, where captions start slightly late early in the video and fall further behind as it plays, usually indicates an audio-video sync issue in the source file, not a transcription error. The subtitle timestamps were generated against audio that does not match the video track timing; a frame-rate mismatch such as 25 fps versus 23.976 fps is a common cause. Fix: a constant shift from the subtitle timing fixer, measured at a known sync point early in the video, corrects only a constant offset. For drift that grows at a steady rate, measure the offset at two sync points, one early and one late, and rescale all timestamps linearly between them. If the drift accelerates over time, the source file has a variable frame rate issue that requires frame-rate normalization before re-captioning.
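
For drift that grows at a steady rate, one common correction is a linear rescale anchored on two sync points. The sketch below assumes you have measured, for an early and a late caption, both the current timestamp and the time it should have; the function names and sample values are hypothetical, and this is not a description of how the subtitle timing fixer works internally.

```python
def make_rescaler(a_old: int, a_new: int, b_old: int, b_new: int):
    """Build a linear map (in ms) that sends a_old to a_new and b_old to b_new."""
    scale = (b_new - a_new) / (b_old - a_old)

    def rescale(ms: int) -> int:
        return round(a_new + (ms - a_old) * scale)

    return rescale


def fix_drift(cues, rescale):
    """Apply the rescale to every (start_ms, end_ms, text) cue."""
    return [(rescale(start), rescale(end), text) for start, end, text in cues]


# Early caption is already in sync at 10 s; a late caption stamped at
# 600.0 s should appear at 599.4 s.
rescale = make_rescaler(10_000, 10_000, 600_000, 599_400)
print(fix_drift([(305_000, 307_000, "midpoint cue")], rescale))
```

A constant offset is the special case where both sync points need the same correction, so the scale works out to exactly 1.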

What subtitle formats does VideoText export?

VideoText exports SRT and VTT subtitle files. SRT works with YouTube, Vimeo, LinkedIn, and most video editors. VTT is the standard for HTML5 web players and streaming platforms — it also supports styling and positioning metadata that SRT does not carry. Burned caption output is available for social clips that require permanently embedded text. All exports are UTF-8 encoded.
