Speech-to-Text
Accurately convert
speech into text using an API powered by Google’s AI
technologies.
- Transcribe your content in real time or from stored files
- Deliver a better user experience in products through voice commands
- Gain insights from customer interactions to improve your service
Benefits
State-of-the-art accuracy
Apply Google’s most advanced deep learning neural network
algorithms for automatic speech recognition (ASR).
Global reach
Meet your users where they are, globally, with voice
recognition that supports more than
125 languages and variants.
Flexible deployment
Deploy speech recognition wherever you need, whether in
the cloud with the API or on-premises with
Speech-to-Text On-Prem.
Key features
Speech adaptation
Customize speech recognition to transcribe
domain-specific terms and rare words by providing hints,
and boost your transcription accuracy for specific words or
phrases. Automatically convert spoken numbers into
addresses, years, currencies, and more using classes.
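As a minimal sketch of how hints, boost, and classes might be expressed, the snippet below builds a JSON body for a REST `speech:recognize` request using only the standard library. The helper name `build_adaptation_request` and the bucket path are hypothetical, and the `speechContexts`, `phrases`, and `boost` fields and the `$ADDRESSNUM` and `$YEAR` class tokens are written from memory of the public API, so check them against the current reference before relying on them.

```python
import json


def build_adaptation_request(gcs_uri, phrases, boost=None, language_code="en-US"):
    """Build a JSON-serializable request body with one speech context.

    Hypothetical helper: it only assembles a dict and sends nothing.
    """
    context = {"phrases": list(phrases)}
    if boost is not None:
        if boost <= 0:
            raise ValueError("boost must be a positive number")
        context["boost"] = boost
    return {
        "config": {
            "languageCode": language_code,
            "speechContexts": [context],
        },
        "audio": {"uri": gcs_uri},
    }


body = build_adaptation_request(
    "gs://example-bucket/order-call.flac",  # hypothetical Cloud Storage path
    phrases=[
        "Speech-to-Text On-Prem",      # a domain-specific term used as a hint
        "my address is $ADDRESSNUM",   # class token (assumed name) for street numbers
        "back in $YEAR",               # class token (assumed name) for years
    ],
    boost=10.0,
)
print(json.dumps(body, indent=2))
```

Keeping the request as a plain dictionary makes it easy to inspect before it is sent with whichever HTTP client or client library you prefer.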
Domain-specific models
Choose from a selection of trained models for voice
control, phone call transcription, and video transcription,
each optimized for domain-specific quality requirements. For
example, our enhanced phone call model is tuned for audio
originating from telephony, such as phone calls recorded at
an 8 kHz sampling rate.
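Model selection can be sketched as a small lookup that maps a use case to a recognition config. The function `build_model_config` and the use-case keys are hypothetical. The model identifiers (`command_and_search`, `phone_call`, `video`, `default`) and the `useEnhanced` field reflect the REST `RecognitionConfig` as best recalled, so treat them as assumptions.

```python
# Assumed mapping from the use cases named above to API model identifiers.
MODEL_BY_USE_CASE = {
    "voice_control": "command_and_search",
    "phone_call": "phone_call",
    "video": "video",
}


def build_model_config(use_case, language_code="en-US",
                       sample_rate_hertz=None, enhanced=False):
    """Return a recognition config dict for a given use case (hypothetical helper)."""
    config = {
        "languageCode": language_code,
        "model": MODEL_BY_USE_CASE.get(use_case, "default"),
    }
    if sample_rate_hertz is not None:
        config["sampleRateHertz"] = sample_rate_hertz
    if enhanced:
        config["useEnhanced"] = True
    return config


# Telephony audio recorded at 8 kHz, using the enhanced phone call model.
phone_config = build_model_config("phone_call", sample_rate_hertz=8000, enhanced=True)
print(phone_config)
```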
Streaming speech recognition
Receive real-time speech recognition results as the API
processes the audio input streamed from your application’s
microphone or sent from a prerecorded audio file (inline
or through Cloud Storage).
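On the client side, streaming usually means cutting the audio into small chunks and sending them as they become available. The sketch below shows only that chunking step with the standard library. The generator `audio_chunks` is hypothetical, the 100 ms chunk length is an assumed default, and the actual streaming call to the API is not shown.

```python
import io


def audio_chunks(stream, sample_rate_hertz=16000, bytes_per_sample=2, chunk_ms=100):
    """Yield fixed-duration chunks of raw mono PCM audio from a binary stream."""
    chunk_bytes = sample_rate_hertz * bytes_per_sample * chunk_ms // 1000
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk


# One second of 16-bit, 16 kHz silence stands in for microphone or file input.
fake_audio = io.BytesIO(bytes(16000 * 2))
chunks = list(audio_chunks(fake_audio))
print(len(chunks), len(chunks[0]))  # prints "10 3200"
```

Each chunk would then be wrapped in one streaming request message. Smaller chunks lower latency at the cost of more messages.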
Speech-to-Text On-Prem
Have full control over your infrastructure and protected
speech data while leveraging Google’s speech recognition
technology on-premises, right in your own private data
centers. Contact sales to get started.
What's new
Sign up
for Google Cloud newsletters to receive product updates,
event information, special offers, and more.
Documentation
Speech-to-Text basics
Learn the fundamental
concepts in Speech-to-Text.
Quickstart: Using the gcloud tool
Send an audio
transcription request to Speech-to-Text using the
gcloud tool from the command line.
Best practices
Review the best
practices for transcribing audio with
Speech-to-Text.
Supported languages
Learn which languages
are available for Speech-to-Text, plus the features
and recognition models available for each.
Speech-to-Text On-Prem
Learn more about
Speech-to-Text On-Prem, which enables easy integration
of Google speech recognition technology into your
on-premises solutions.
Use cases
Improve customer service
Empower your customer service system by adding IVR
(interactive voice response) and agent conversations to your
call centers. Perform analytics on your conversation data to
gain more insights into the calls and your customers.
Speech-to-Text and its enhanced phone call models
already power Google Cloud’s Contact Center AI solution.
Enable voice control
Implement voice commands such as “turn the volume up,” and
voice search such as saying “what is the temperature in
Paris?” Combine this with the
Text-to-Speech API
to deliver voice-enabled experiences in IoT (Internet of
Things) applications.
Transcribe multimedia content
Transcribe your audio and video to add captions and
improve your audience reach and experience. Add subtitles to
your streaming content in real time. Our video transcription
model is ideal for indexing or subtitling video and/or
multispeaker content and uses machine learning technology
that is similar to video captioning on YouTube.
All features
| Feature | Description |
| --- | --- |
| Global vocabulary | Support your global user base with Speech-to-Text’s extensive language support in over 125 languages and variants. |
| Streaming speech recognition | Receive real-time speech recognition results as the API processes the audio input streamed from your application’s microphone or sent from a prerecorded audio file (inline or through Cloud Storage). |
| Speech adaptation | Customize speech recognition to transcribe domain-specific terms and rare words by providing hints, and boost your transcription accuracy for specific words or phrases. Automatically convert spoken numbers into addresses, years, currencies, and more using classes. |
| Speech-to-Text On-Prem | Have full control over your infrastructure and protected speech data while leveraging Google’s speech recognition technology on-premises, right in your own private data centers. Contact sales to get started. |
| Multichannel recognition | Speech-to-Text can recognize distinct channels in multichannel situations (e.g., video conference) and annotate the transcripts to preserve the order. |
| Noise robustness | Speech-to-Text can handle noisy audio from many environments without requiring additional noise cancellation. |
| Domain-specific models | Choose from a selection of trained models for voice control, phone call transcription, and video transcription, each optimized for domain-specific quality requirements. For example, our enhanced phone call model is tuned for audio originating from telephony, such as phone calls recorded at an 8 kHz sampling rate. |
| Content filtering | Profanity filter helps you detect inappropriate or unprofessional content in your audio data and filter out profane words in text results. |
| Auto-detect language (beta) | Specify up to four language codes and Speech-to-Text will identify the correct language spoken in multilingual scenarios. |
| Automatic punctuation (beta) | Speech-to-Text accurately punctuates transcriptions (e.g., commas, question marks, and periods). |
| Speaker diarization (beta) | Know who said what by receiving automatic predictions about which of the speakers in a conversation spoke each utterance. |
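Several of the features in the table above are switched on through fields in the recognition config. The following is a sketch under stated assumptions: `build_feature_config` is a hypothetical helper, and the field names (`alternativeLanguageCodes`, `audioChannelCount`, `enableSeparateRecognitionPerChannel`, `enableAutomaticPunctuation`, `profanityFilter`, `diarizationConfig`) are recalled from the REST reference and may differ between API versions.

```python
def build_feature_config(language_codes, channels=1, punctuation=False,
                         profanity_filter=False, max_speakers=None):
    """Assemble a recognition config dict that enables optional features."""
    # The table above allows up to four language codes for auto-detection.
    if not 1 <= len(language_codes) <= 4:
        raise ValueError("provide between one and four language codes")
    primary, *alternatives = language_codes
    config = {"languageCode": primary}
    if alternatives:
        config["alternativeLanguageCodes"] = alternatives
    if channels > 1:
        # Multichannel recognition: transcribe each channel separately.
        config["audioChannelCount"] = channels
        config["enableSeparateRecognitionPerChannel"] = True
    if punctuation:
        config["enableAutomaticPunctuation"] = True
    if profanity_filter:
        config["profanityFilter"] = True
    if max_speakers is not None:
        config["diarizationConfig"] = {
            "enableSpeakerDiarization": True,
            "maxSpeakerCount": max_speakers,
        }
    return config


meeting_config = build_feature_config(
    ["en-US", "fr-FR", "de-DE"], channels=2, punctuation=True, max_speakers=4
)
print(meeting_config)
```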
Pricing
The first 60 minutes of audio that Speech-to-Text
successfully processes each month are free. After that,
usage is priced per 15 seconds of audio. Specific rates
vary depending on the model used, whether data logging is
enabled, and the number of audio channels.
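The arithmetic behind this pricing model can be sketched as follows. Two details are assumptions rather than statements from this page: each request is rounded up to the next 15-second increment, and the free 60 minutes are subtracted after rounding. The rate passed to `monthly_cost` is a hypothetical placeholder, not a published price.

```python
import math

FREE_SECONDS_PER_MONTH = 60 * 60   # first 60 minutes are free
BILLING_INCREMENT_SECONDS = 15     # usage is priced per 15 seconds


def billable_increments(request_durations_seconds):
    """Count the 15-second increments charged after the monthly free tier."""
    billed_seconds = sum(
        math.ceil(d / BILLING_INCREMENT_SECONDS) * BILLING_INCREMENT_SECONDS
        for d in request_durations_seconds
    )
    chargeable = max(0, billed_seconds - FREE_SECONDS_PER_MONTH)
    return chargeable // BILLING_INCREMENT_SECONDS


def monthly_cost(request_durations_seconds, rate_per_increment):
    """Multiply billable increments by a caller-supplied (hypothetical) rate."""
    return billable_increments(request_durations_seconds) * rate_per_increment


# One 60-minute file plus one 20-second clip: the clip rounds up to 30 seconds.
print(billable_increments([3600, 20]))  # prints 2
```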
Take the next step
Start building on Google Cloud with $300 in free credits
and 20+ always-free products.
