Whisper Extension
Whisper is OpenAI's pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning. See more on Whisper's official page.
The Whisper extension is an advanced speech-to-text solution designed to transcribe spoken language into text with high accuracy. It is particularly useful in applications requiring real-time transcription, voice command processing, or accessibility features for users with hearing impairments.
- Multiple Models Available: choose from a variety of Whisper models to suit your needs, whether you require higher accuracy from a larger model or a lightweight model optimized for faster performance and lower resource usage.
- Wide Language Coverage: a diverse set of languages is supported.
- CUDA Acceleration: run Whisper inference on CUDA to harness the power of NVIDIA GPUs for more efficient speech processing.
- OpenVINO Acceleration: run Whisper inference on OpenVINO to harness the power of Intel hardware (CPUs, GPUs, and NPUs) for more efficient speech processing (in beta).
Nodes
The Whisper Extension provides the following audio nodes for a Switchboard SDK audio graph:
Node | Description |
---|---|
STT | Transcribes a voice audio signal to text. |
Download
You can find the download links for this extension on our Downloads page.