Whisper is an automatic speech recognition system
trained on 680,000 hours of multilingual and multitask data collected from the Internet.
We show that the scale and diversity of this data
is what lets the system handle many accents,
cope with background noise, recognize technical vocabulary,
and translate from several languages into English.
We are releasing our models and inference code as open source,
so that they can serve as a starting point for building useful applications
and help advance research in speech processing.