Google Cloud’s Speech-to-Text AI

The University of Oslo has an institutional agreement with Google that gives you access, through your UiO user, not only to Google Docs, Google Sheets, and Google Drive but also to the Google Cloud Platform. This lets you use a range of ready-made models that can make your data acquisition and analysis easier. One of them is the Speech-to-Text service.

Start using GCP Speech to text.

With the Google Cloud Platform, usually referred to simply as GCP, you also get access to the ready-made machine learning (ML) solutions that Google has developed and made available to its users. The Speech-to-Text service is one of them.

Automatic transcription

The service is based on neural networks, but you do not need to know anything about training a machine learning model to use it. All you need is a recording that you want to transcribe. You can use it for free on smaller files, but for anything longer than 60 minutes there is a small fee for every additional 15 seconds ($0.004-0.009).
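To get a feel for what the fee amounts to, the arithmetic can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: it treats the 60 free minutes as a simple allowance on one recording, assumes that a partly used 15-second block is billed as a whole block, and uses the two ends of the price range quoted above. The function name `estimate_cost` is made up for this illustration; check Google's current price list before budgeting.

```python
import math

FREE_SECONDS = 60 * 60   # free allowance mentioned above: 60 minutes
BILLING_INCREMENT = 15   # fee is charged per 15 seconds


def estimate_cost(duration_seconds: float, rate_per_increment: float) -> float:
    """Estimate the fee in USD for a recording of the given length."""
    billable = max(0.0, duration_seconds - FREE_SECONDS)
    # Assumption: a partly used 15-second block counts as a full block.
    increments = math.ceil(billable / BILLING_INCREMENT)
    return increments * rate_per_increment


two_hours = 2 * 60 * 60
low = estimate_cost(two_hours, 0.004)    # lower end of the quoted range
high = estimate_cost(two_hours, 0.009)   # upper end of the quoted range
print(f"Two-hour recording: ${low:.2f} to ${high:.2f}")
```

Under these assumptions a two-hour recording has one billable hour, which is 240 blocks of 15 seconds, so the fee lands somewhere between roughly one and two dollars.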

In theory the service can recognize which language is being spoken, but you can make its job easier (and improve the odds of a good result) by choosing from among the 120 languages that the model is trained for. The service also lets you indicate the number of speakers, and it is claimed to be able to distinguish up to five different speakers.
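As an illustration of how the language and the number of speakers are specified, the sketch below builds the JSON body for a request to the service's v1 REST API (the `speech:longrunningrecognize` method). The helper `build_request`, the bucket path, and the choice of `en-US` are placeholders for this example, and the upper limit of five speakers is simply the figure quoted above. The sketch only constructs the request; sending it requires a GCP project and credentials, which are not shown.

```python
import json


def build_request(gcs_uri: str, language_code: str, max_speakers: int) -> dict:
    """Build a request body that sets the language and the speaker count."""
    # Limit taken from the claim above, not checked against the API itself.
    if not 1 <= max_speakers <= 5:
        raise ValueError("max_speakers must be between 1 and 5")
    return {
        "config": {
            # Stating the language explicitly instead of relying on detection.
            "languageCode": language_code,
            # Speaker separation ("diarization") with a hint about how many
            # speakers to expect.
            "diarizationConfig": {
                "enableSpeakerDiarization": True,
                "minSpeakerCount": 1,
                "maxSpeakerCount": max_speakers,
            },
        },
        # Hypothetical location of the recording in a Cloud Storage bucket.
        "audio": {"uri": gcs_uri},
    }


request_body = build_request("gs://my-bucket/interview.flac", "en-US", 2)
print(json.dumps(request_body, indent=2))
```

The language code for your own recordings should be looked up in Google's list of supported languages, since not every feature is available for every language.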

How well does it work?

If you have a Google Home, or have ever tried one, you have already indirectly used this service. This means that you have already experienced some of its limitations as well. For one thing, it understands English much better than any other language. 

Transcription quality is better for more widely spoken languages, English in particular; our initial tests with Norwegian were not nearly as impressive (yet). Quality also tends to be better the fewer speakers you have, especially when the speakers have different dialects. A recording of a single person speaking clearly and slowly, as in a well-rehearsed lecture, is transcribed significantly better than a recording of two people who sometimes interrupt each other. The quality of the sound affects the results as well.

Hopefully the service will become more useful across the board over time.

Try it out

If you want to try it out on your own data, all you need to do is apply for a project in Google Cloud from the UiO GCP website.

Google is constantly feeding the model new data, but any data you use this service on through our UiO agreement will not end up improving the model. Still, the service should only be used on green or yellow data. You should not have red (sensitive) data in Google Drive, and you should not have red data in GCP either.

Published June 3, 2020 2:52 PM - Last modified June 3, 2020 2:57 PM