
Whisper Extension

Whisper is OpenAI's pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning. See more on Whisper's official page.

The Whisper extension is an advanced speech-to-text solution designed to transcribe spoken language into text with high accuracy. It is particularly useful in applications requiring real-time transcription, voice command processing, or accessibility features for users with hearing impairments.

  • Multiple Models Available: choose from a variety of Whisper models to suit your needs, whether you require a larger model for higher accuracy or a lightweight model optimized for faster performance and lower resource usage.

  • Wide Language Coverage: a diverse set of languages is supported.

  • CUDA Acceleration: run Whisper inference on CUDA to harness the power of NVIDIA GPUs for more efficient speech processing.

  • OpenVINO Acceleration (in beta): run Whisper inference on OpenVINO to harness the power of Intel hardware (CPUs, GPUs and NPUs) for more efficient speech processing.
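The accuracy-versus-resources trade-off behind the model choice can be sketched in a few lines of Python. This is only an illustration and not part of the Whisper Extension or the Switchboard SDK: the `WhisperModel` class and the `pick_model` helper are hypothetical names, and the model sizes and approximate memory figures are taken from the upstream OpenAI Whisper README, on the assumption that the extension ships comparable variants.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WhisperModel:
    """Hypothetical record describing one upstream Whisper model size."""
    name: str
    params_millions: int  # approximate parameter count
    vram_gb: float        # approximate memory needed (upstream README figure)


# Ordered from smallest/fastest to largest/most accurate.
# Figures are approximations from the upstream README, not from this extension.
MODELS = [
    WhisperModel("tiny", 39, 1.0),
    WhisperModel("base", 74, 1.0),
    WhisperModel("small", 244, 2.0),
    WhisperModel("medium", 769, 5.0),
    WhisperModel("large", 1550, 10.0),
]


def pick_model(memory_budget_gb: float) -> Optional[WhisperModel]:
    """Return the largest model whose approximate memory need fits the budget.

    Returns None when even the smallest model does not fit.
    """
    fitting = [m for m in MODELS if m.vram_gb <= memory_budget_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for budget in (0.5, 1.0, 4.0, 16.0):
        choice = pick_model(budget)
        print(budget, "GB ->", choice.name if choice else "no model fits")
```

Picking the largest model that fits favours accuracy; an application that prioritises latency, such as real-time transcription, might instead cap the choice at a smaller model even when more memory is available.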

Nodes

The Whisper Extension provides the following audio nodes for a Switchboard SDK audio graph:

Node    Description
STT     Transcribes a voice audio signal to text.

Download

You can find the download links for this extension on our Downloads page.