Supported Media Formats for Video to Text
See which audio and video file formats Video to Text supports, plus upload size and duration limits to check before you upload.
Video to Text supports a focused set of audio and video formats so you can upload common files without extra setup. Before you start, check two hard limits: each file must be 5 GB or smaller, and the media duration must stay under 10 hours.
If you want a full walkthrough of the upload flow, read How to Use Video to Text. If your file includes multiple languages, Supported Languages in Video to Text explains the language options.
Supported audio formats
You can upload these audio file types:
- .aac
- .flac
- .m4a
- .mp3
- .oga
- .ogg
- .opus
- .wav
These formats cover most voice notes, podcasts, interviews, lectures, and exported audio tracks from editing tools.
Supported video formats
You can upload these video file types:
- .mp4
- .m4v
- .webm
If your source file comes from a screen recorder, camera export, or browser capture tool, one of these formats is usually the easiest path.
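If you prepare many files at once, a small script can sort them by extension before you upload. The following is a minimal sketch, not part of Video to Text itself: the function name classify_media is hypothetical, and the case-insensitive comparison is an assumption, since this article only lists lowercase extensions.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported audio and video lists in this article.
AUDIO_EXTENSIONS = {".aac", ".flac", ".m4a", ".mp3", ".oga", ".ogg", ".opus", ".wav"}
VIDEO_EXTENSIONS = {".mp4", ".m4v", ".webm"}


def classify_media(filename: str) -> Optional[str]:
    """Return "audio", "video", or None when the extension is not listed."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None


if __name__ == "__main__":
    for name in ["interview.m4a", "lesson.mp4", "clip.mov"]:
        print(name, "->", classify_media(name))
```

This only inspects the file name. It does not confirm that the file contents match the extension.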
Upload limits you should know
Video to Text checks both file size and media duration before the transcription starts.
- Maximum file size: 5 GB
- Maximum media duration: less than 10 hours
Because both checks run before the file is submitted for transcription, you catch problems early instead of waiting through a long upload.
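To check the same two limits on your own machine first, something like the sketch below may help. It is an illustration only and does not reproduce how Video to Text performs its checks. The name check_limits is hypothetical, and the sketch assumes "5 GB" means decimal gigabytes (5 × 1000³ bytes), which is the stricter reading if the service actually counts in binary units.

```python
from typing import List

# Assumption: 5 GB is read as decimal gigabytes, the stricter interpretation.
MAX_FILE_BYTES = 5 * 1000**3
# The duration must be strictly less than 10 hours.
MAX_DURATION_SECONDS = 10 * 60 * 60


def check_limits(size_bytes: int, duration_seconds: float) -> List[str]:
    """Return a list of limit problems; an empty list means both limits pass."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 5 GB")
    if duration_seconds >= MAX_DURATION_SECONDS:
        problems.append("media is 10 hours or longer")
    return problems


if __name__ == "__main__":
    # A 1.2 GB file that runs for 90 minutes passes both checks.
    print(check_limits(1_200_000_000, 90 * 60))
```

You supply the duration yourself here, for example from your media player or editing tool. The sketch does not read it from the file.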
Which format should you choose?
If you can export your media in more than one format, use these simple rules:
- Choose MP3 or M4A for spoken audio when you want smaller files.
- Choose WAV or FLAC when you want lossless audio.
- Choose MP4 for video in most cases because it is widely supported.
- Keep the file as close to the original recording as possible. Re-encoding a file several times can lower speech clarity.
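The rules above can be written as a small lookup, which may be handy in an export script. This is a sketch under stated assumptions: the function name recommend_formats and its parameters are hypothetical, and it only encodes the guidance in this section.

```python
from typing import List


def recommend_formats(kind: str, lossless: bool = False) -> List[str]:
    """Suggest export formats for "audio" or "video" based on the rules above."""
    if kind == "video":
        # MP4 is recommended for video in most cases.
        return [".mp4"]
    if kind == "audio":
        # WAV or FLAC for lossless audio, MP3 or M4A for smaller files.
        return [".wav", ".flac"] if lossless else [".mp3", ".m4a"]
    raise ValueError("kind must be 'audio' or 'video'")


if __name__ == "__main__":
    print(recommend_formats("audio"))
    print(recommend_formats("audio", lossless=True))
    print(recommend_formats("video"))
```

If your recording already exists in a supported format, uploading it as-is avoids another round of re-encoding.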
Common upload scenarios
Meetings and interviews
Audio-first formats such as .m4a, .mp3, and .wav are often the best fit. They are easy to export from mobile recorders, meeting tools, and editing apps.
Podcasts and lectures
.mp3, .m4a, .flac, and .wav are all supported. If storage matters, a clean .mp3 export is usually enough. If you want the original quality, use .wav or .flac.
Recorded video lessons and webinars
.mp4 is the safest choice for most exported video files. If your recording tool gives you .webm, you can upload that directly.
Tips before you upload
Use these checks to avoid a failed upload:
- Confirm the file extension matches one of the supported formats above.
- Keep the file below 5 GB.
- Make sure the media is shorter than 10 hours.
- If you know the spoken language, select it before you start. That can help keep the workflow simple.
- Turn on speaker labels when you need speaker-separated output for meetings or interviews.
What happens after upload?
After the file upload finishes, Video to Text starts transcription and then sends you to the export page when processing is done. You can export the transcript as:
- csv
- srt
- vtt
- txt
For the full process, including language choice and export steps, see How to Use Video to Text.
