Example: K-pop
This example transcribes a korean song file using Whisper to its source language in TXT, VTT, SRT, TSV and JSON formats.
Then uses DeepL API to translate the generated transcripts to english.
Input
task: Transcribe
audio_file: kpop.wav
use_model: small
language: Korean
Output
[00:01.000 --> 00:05.000] 어둠 많이 나의 전부였던 동안
[00:05.000 --> 00:08.000] 숨이 뻗자도록 달려왔잖아
[00:08.000 --> 00:10.000] Never say, time's up
[00:10.000 --> 00:12.000] 경계의 끝으로
[00:12.000 --> 00:30.840] 지금 이따가
output_formats: txt,vtt,srt,tsv,json
deepl_target_language: English (American)
deepl_formality: default
deepl_coherence_preference: Translate each line independently
DeepL: Translate results from Korean [KO] to English (American) [EN-US]
[00:01.000 --> 00:05.000] While the darkness was all I had
[00:05.000 --> 00:08.000] You've been running to catch your breath
[00:08.000 --> 00:10.000] Never say, time's up
[00:10.000 --> 00:12.000] To the end of the boundary
[00:12.000 --> 00:30.840] Now, later
DeepL: Character usage: 62 / 500000 (0.01%)
audio_transcription/kpop_English (American).txt
audio_transcription/kpop_English (American).vtt
audio_transcription/kpop_English (American).srt