How to Use Video to Text | Upload, Transcribe & Export

Using Video to Text is straightforward: sign in, upload a supported file, choose your language settings, wait for transcription, and export the result in the format you need.

This guide covers the current flow in the app. If you want format details first, see Supported Media Formats for Video to Text. If you want language guidance, read Supported Languages in Video to Text.

Step 1: Sign in and prepare your minutes

Before you start a transcription, sign in to your account. The app checks account access and available minutes before it begins processing.

New users receive 30 free minutes. If your remaining minutes are not enough for the file you selected, the app prompts you to top up before transcription starts.
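As a rough illustration of that check, the sketch below compares a file's duration with the minutes left on an account. The function name and the rounding rule (partial minutes rounded up) are assumptions made for this example. They are not a description of how the app actually bills.

```python
import math

FREE_MINUTES_FOR_NEW_USERS = 30  # figure stated in this guide


def minutes_shortfall(file_duration_seconds: float, remaining_minutes: int) -> int:
    """Return how many extra minutes are needed; 0 means the balance is enough."""
    # Assumption: a partial minute counts as a whole minute.
    needed = math.ceil(file_duration_seconds / 60)
    return max(0, needed - remaining_minutes)


# A 45-minute recording on a new account with the 30 free minutes:
shortfall = minutes_shortfall(45 * 60, FREE_MINUTES_FOR_NEW_USERS)
print(f"Top up needed: {shortfall} minutes")
```

A result above zero corresponds to the case where the app asks you to top up before transcription starts.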

Step 2: Upload a supported audio or video file

Click the upload area and choose your file.

Video to Text currently accepts these file types:

Audio: .aac, .flac, .m4a, .mp3, .oga, .ogg, .opus, .wav
Video: .mp4, .m4v, .mov, .mkv, .webm

Two upload limits also apply:

maximum file size: 5 GB
media duration: under 10 hours

If your file breaks either limit, the app stops the workflow before transcription begins.
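If you prepare files in bulk, you can check them against these rules before uploading. The sketch below is a local pre-check only. The function name is hypothetical, and treating 5 GB as 5 × 1024³ bytes is an assumption, since the guide does not say how the app counts a gigabyte.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".aac", ".flac", ".m4a", ".mp3", ".oga", ".ogg", ".opus", ".wav"}
VIDEO_EXTENSIONS = {".mp4", ".m4v", ".mov", ".mkv", ".webm"}
MAX_SIZE_BYTES = 5 * 1024**3          # assumption: 5 GB counted in binary units
MAX_DURATION_SECONDS = 10 * 60 * 60   # duration must stay under 10 hours


def upload_problems(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of reasons the file would be rejected (empty if none)."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 5 GB")
    if duration_seconds >= MAX_DURATION_SECONDS:
        problems.append("media is not under 10 hours")
    return problems


print(upload_problems("team-meeting.MP4", 800_000_000, 2 * 60 * 60))
```

An empty list means the file passes all three checks. The duration has to come from your own tooling, because this sketch does not read media metadata.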

Step 3: Choose your language option

After selecting a file, choose the language setup that matches your recording:

pick a specific language if you know it
use auto detection if you are not sure
use multilingual detection if the recording switches between languages

This choice matters most for long interviews, podcasts, and international team meetings.
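The three options above amount to a small decision rule, sketched here. The function name and the returned labels ("multilingual", "auto") are placeholders for illustration and may not match the wording in the app.

```python
from typing import Optional


def choose_language_option(known_language: Optional[str], switches_languages: bool) -> str:
    """Pick a language setting following the guidance in this step."""
    if switches_languages:
        # Recordings that move between languages need multilingual detection.
        return "multilingual"
    if known_language:
        # A known, single language is the most specific choice.
        return known_language
    # Otherwise fall back to automatic detection.
    return "auto"


print(choose_language_option("Spanish", switches_languages=False))
```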

Step 4: Turn on speaker labels if you need them

Speaker labels are useful for conversations with more than one person. When this option is enabled, the transcript can separate content by speaker, which makes reviews and exports easier to read.

This is especially helpful for:

meetings
interviews
panel discussions
classroom recordings
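To show why speaker separation helps readability, the sketch below groups consecutive lines from the same speaker into one labeled paragraph. The segment data and the "Speaker 1" style labels are made up for this example. The app's actual export layout may differ.

```python
from itertools import groupby

# Hypothetical transcript segments as (speaker label, text) pairs.
segments = [
    ("Speaker 1", "Thanks for joining."),
    ("Speaker 1", "Let's start with the roadmap."),
    ("Speaker 2", "Sounds good."),
]


def render_by_speaker(items):
    """Merge consecutive segments from the same speaker into one line each."""
    lines = []
    for speaker, group in groupby(items, key=lambda item: item[0]):
        text = " ".join(item[1] for item in group)
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)


print(render_by_speaker(segments))
```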

Step 5: Wait for upload and transcription to finish

Once you submit the file, the app uploads it to storage and then starts transcription. Progress messages appear while the job is running.

Video to Text is built for quick turnaround, but total time still depends on a few things:

file duration
upload speed
file size
audio clarity

For a closer look at expected timing, read How Long Does Video to Text Take?

Step 6: Export the transcript

When transcription is complete, the app takes you to the export page. You can export the transcript as:

csv
srt
vtt
txt

Choose the format that matches your next step:

use srt or vtt for subtitles
use csv for spreadsheet review or structured handoff
use txt for plain text editing and note-taking
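The two subtitle formats look similar but are not interchangeable. In general, an srt cue has a numeric index and uses a comma before the milliseconds, while a vtt file begins with a WEBVTT header line and uses a period. The sketch below shows that difference with hypothetical helper names. It illustrates the formats in general, not the exact output of the app.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one srt cue: index line, comma-style timestamps, then the text."""
    return f"{index}\n{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}\n{text}\n"


def vtt_cue(start: float, end: float, text: str) -> str:
    """Build one vtt cue: period-style timestamps, then the text."""
    return f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n{text}\n"


print(srt_cue(1, 1.5, 4, "Welcome to the meeting."))
print("WEBVTT\n\n" + vtt_cue(1.5, 4, "Welcome to the meeting."))
```

If your video player or editor expects one of these formats, export that one directly rather than renaming a file from the other.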

Quick workflow summary

If you just need the short version, here it is:

Sign in.
Upload a supported file.
Confirm the file is no larger than 5 GB and under 10 hours long.
Pick a language option.
Turn on speaker labels if needed.
Start transcription.
Export the finished transcript in csv, srt, vtt, or txt.

Tips for a smoother result

Upload the cleanest source file you have.
Use the correct language option whenever possible.
Turn on speaker labels for group conversations.
Export in more than one format if different teammates need different outputs.