
Google boosts Cloud Speech API with word-level timestamps and support for 30 new languages

Google has announced a number of notable updates to its Cloud Speech API, a product first unveiled as part of the company’s Cloud Machine Learning platform last year.

The Cloud Speech API, in a nutshell, allows third-party developers and companies to integrate Google’s speech recognition smarts into their own products. For example, contact centers may wish to use the API to automatically route calls to specific departments by “listening” to a caller’s commands. Earlier this year, Twilio tapped the API for its voice platform, enabling its own developer customers to transform speech into text within their products.

Now Google has announced three new updates to the Cloud Speech API. Top of the list, arguably, is word-level time offsets, or timestamps. These are particularly useful for longer audio files, where the user may need to find a specific word in the audio. In effect, they map the audio directly to the text, so anyone from researchers to reporters can find exactly where a word or phrase was used in, say, an interview. They will also enable text to be displayed in real time as the audio is playing.
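To illustrate what word-level timestamps make possible, here is a minimal sketch. It assumes the transcript has already been returned as a list of words, each with a start and end offset in seconds. Google's client libraries expose per-word timing when word time offsets are enabled (the `enable_word_time_offsets` option on the recognition config), but the `WordTiming` class, the helper names and the sample transcript below are hypothetical and do not call Google's API. The sketch shows the two uses described above: locating every occurrence of a word, and working out which words should be on screen at a given playback position.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class WordTiming:
    """Hypothetical stand-in for one timed word in a transcript (offsets in seconds)."""
    word: str
    start: float
    end: float


def find_word(words, target):
    """Return the start offset of every occurrence of `target`.

    Matching is case-insensitive and ignores trailing punctuation.
    """
    t = target.lower()
    return [w.start for w in words if w.word.lower().strip(".,!?") == t]


def words_spoken_by(words, playback_time):
    """Return the words that have started by `playback_time`.

    Assumes `words` is sorted by start offset, which lets us binary-search it.
    """
    starts = [w.start for w in words]
    return [w.word for w in words[:bisect_right(starts, playback_time)]]


# Hypothetical transcript, as it might look after parsing a recognition response.
transcript = [
    WordTiming("Our", 0.0, 0.3),
    WordTiming("number", 0.3, 0.7),
    WordTiming("one", 0.7, 0.9),
    WordTiming("request", 0.9, 1.5),
    WordTiming("is", 1.6, 1.8),
    WordTiming("timestamps.", 1.8, 2.6),
]

print(find_word(transcript, "timestamps"))  # [1.8]
print(words_spoken_by(transcript, 1.0))     # ['Our', 'number', 'one', 'request']
```

Because the words arrive in time order, a binary search over the start offsets is enough to sync the displayed text with playback. There is no need to rescan the whole transcript on every tick.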

“Our number one most requested feature has been providing timestamp information for each word in the transcript,” explained Google product manager Dan Aharon, in a blog post.

Somewhat related to this, Google has also now extended long-form audio support from 80 minutes to 180 minutes, and it may support longer files on a "case-by-case" basis upon request, according to Aharon.

The final piece of the…
