Posted by Marta on February 1, 2023
In this tutorial, you will learn step by step how to perform speech recognition (voice to text) in Python using the Google Speech-to-Text API.
Speech recognition means that the program captures the words spoken by a person and converts them into written words. It can be handy for generating subtitles, transcribing a meeting discussion, and many other use cases.
Converting speech to text is quite a complex machine learning problem: an algorithm needs to take in every sound a person produces and identify the corresponding written letters. Plus, depending on the language, the same sound might correspond to different characters. As a result, speech recognition is too complex to solve with a traditional, rule-based programming approach.
Fortunately, big companies like Google, Amazon, IBM, and others have already tackled this problem. They collected large amounts of audio, fed this data to machine learning algorithms, and produced trained models that convert speech to text with high accuracy. Plus, these models are available through APIs, so you can easily integrate them into your programs.
This article will show you how to transcribe audio with a few lines of code using Python and the Google API. Let’s get started!
Google offers a Speech-to-Text service through an API, meaning that you can send a request with an audio file and receive its transcription in return. This makes it simple to include speech recognition functionality in your Python programs.
Here are the steps you need to follow to integrate your program with the Google Speech-To-Text API.
To access Google APIs, you first need a Google account and a Google application (the console calls it a project). You can create a Google application using the Google Cloud Console.
Once you open the Google console, click on the dropdown at the top, which displays your existing Google applications. A pop-up will appear; click on “New Project.”
Then enter your application name and click on Create.
Once you have created your Google application, you need to enable the “Google Cloud Speech-To-Text” API for it. To do so, go to the application dashboard and, from there, go to the APIs overview.
Click on “Enable APIs and Services,” then search for “speech”; all Google APIs related to speech will be listed. Select the “Cloud Speech-to-Text API.”
Then click “Enable.” Once the API is enabled, your application has permission to access the “Google Cloud Speech-to-Text API.”
The next step is downloading your Google credentials. The credentials are necessary so Google can authenticate your application and know that you are the one accessing their API. This way, Google can measure how much you use their APIs and charge you if your usage exceeds the free threshold.
Here are the steps to download the Google credentials. First, from the home dashboard, click “Go to APIs overview,” just like before, and on the left-hand side menu, click on “Credentials.”
Then click on “Create Credentials” and create a “Service Account.”
Enter any service account name you like, and click Create.
Now click on the service account you just created to open its details. Go to the “Keys” section, click on “Add Key” and then “Create New Key.” This creates a new key, which is associated with your application through the service account.
In the pop-up, select JSON and click on Create, which will download a JSON file containing the key to your machine. Please make a note of where you save this file since you will need it next.
Now, the last step is setting up the environment variable GOOGLE_APPLICATION_CREDENTIALS, which the Google client library reads to authenticate your program with Google. Run the following command from your terminal to create the environment variable:
>> export GOOGLE_APPLICATION_CREDENTIALS="/<replace-with-the-path-where-the-key-is>/key-file.json"
Perfect! You have now done all the configuration needed to use the Google Speech-to-Text API, so we have reached the last step: writing the Python code!
Our program needs the third-party library google-cloud-speech, which sends the requests to Google. You can install this library by running the following command from your terminal:
>> pip install --upgrade google-cloud-speech
Lastly, copy the code below and save it as a Python script. Please note the audio file should be in the same folder as the script. Also, you will need to replace the file name test.wav with your own file name.
from google.cloud import speech
import os

# Creates the Google client
client = speech.SpeechClient()

# Full path of the audio file. Replace with your file name
file_name = os.path.join(os.path.dirname(__file__), "test.wav")

# Loads the audio file into memory
with open(file_name, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # 16-bit PCM WAV
    audio_channel_count=2,  # number of channels in the file (2 = stereo)
    language_code="en-US",
)

# Sends the request to Google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})

# Reads the response: prints the most likely transcript of each result
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
If your audio file is in a different format (for example, .m4a), convert it to WAV first, for instance with an online m4a-to-wav converter.
If your program is working correctly, you should see output similar to the following after executing your script:
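The script’s RecognitionConfig assumes 16-bit PCM audio (LINEAR16) with two channels, and client.recognize() is a synchronous request, which Google documents as handling roughly one minute of audio. Before sending a file, you could inspect it locally. Below is a minimal sketch using only Python’s standard wave module; describe_wav and check_for_linear16 are hypothetical helper names, not part of google-cloud-speech, and the 60-second constant is an assumption based on that documented limit. Note that wave only reads uncompressed PCM WAV files and raises wave.Error for other formats.

```python
import wave

# Assumed limit for synchronous recognize() requests (about one minute of audio).
MAX_SYNC_SECONDS = 60


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample; LINEAR16 means 2
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_width_bytes": sample_width,
        "sample_rate_hertz": rate,
        "duration_seconds": frames / rate,
    }


def check_for_linear16(info):
    """List reasons the file may not suit the script's LINEAR16 config."""
    problems = []
    if info["sample_width_bytes"] != 2:
        problems.append("LINEAR16 expects 16-bit (2-byte) samples")
    if info["duration_seconds"] > MAX_SYNC_SECONDS:
        problems.append("audio is longer than about one minute; "
                        "synchronous recognize() may reject it")
    return problems
```

For example, describe_wav("test.wav")["channels"] tells you which value to use for audio_channel_count; if the two differ, the request may fail or only part of the audio may be recognized.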
>> python speech_to_text.py # Replace with your program file name
Output
Transcript: hey there in this area you will learn how you can set your django version there are a few ways
Transcript: there are a few ways to check your django version and in this video I will show you a few of them I will also show you how you can upgrade and downgrade your django version
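The output contains two Transcript lines because the API returns its results as a sequence, each result covering a consecutive portion of the audio. If you would rather have a single string, one option is to join the most likely alternative of each result. This is a minimal sketch; join_transcript is a hypothetical helper that relies only on the results, alternatives and transcript fields the script already uses.

```python
def join_transcript(results):
    """Join the top alternative of each result into one string.

    `results` is assumed to be `response.results` from client.recognize();
    each result's alternatives are ordered from most to least likely.
    """
    parts = []
    for result in results:
        if result.alternatives:  # skip any result without alternatives
            parts.append(result.alternatives[0].transcript.strip())
    return " ".join(parts)
```

In the script, you would call print(join_transcript(response.results)) instead of the final for loop.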
What can go wrong? There are a few errors that you might encounter when executing this script. Let’s see some of them.
Here is one of the errors that you might receive. It means that your program couldn’t authenticate with the Google API.
last exception: 503 Getting metadata from plugin failed with error: ('invalid_grant: Bad Request', '{\n "error": "invalid_grant",\n "error_description": "Bad Request"\n}')
To fix it, first go to the “Google APIs overview” and make sure the “Google Cloud Speech-to-Text” API is enabled.
Second, run the following commands from your terminal to list the environment variables and check that GOOGLE_APPLICATION_CREDENTIALS exists and points to your JSON file:
# List environment variables
>> env
# See the variable value
>> echo $GOOGLE_APPLICATION_CREDENTIALS
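You can run the same check from Python, which is handy when the script runs from an IDE whose environment differs from your terminal’s. This is a minimal sketch using only the standard library; check_credentials_env is a hypothetical helper. It assumes the key file is the JSON service account key downloaded earlier, which contains fields such as "type": "service_account", "client_email" and "private_key". It only checks the file locally and cannot confirm that Google will accept the key.

```python
import json
import os


def check_credentials_env(var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return a list of problems found (an empty list means none were found)."""
    path = os.environ.get(var)
    if not path:
        return [f"{var} is not set"]
    if not os.path.isfile(path):
        return [f"{var} points to a missing file: {path}"]
    try:
        with open(path, encoding="utf-8") as f:
            key = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"could not read {path} as JSON: {exc}"]
    if not isinstance(key, dict):
        return ["key file is not a JSON object"]
    problems = []
    # Service account key files are expected to declare this type.
    if key.get("type") != "service_account":
        problems.append("JSON file does not look like a service account key")
    for field in ("client_email", "private_key"):
        if field not in key:
            problems.append(f"key file has no '{field}' field")
    return problems


if __name__ == "__main__":
    print(check_credentials_env() or "Credentials variable looks OK")
```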
To summarise, in this tutorial, we have seen how you can perform speech recognition with Python using the “Google Cloud Speech-to-Text” API. Although speech to text is a hard problem, you can solve it in a few lines of code by taking advantage of Google’s pre-trained models.
I hope you enjoyed this tutorial, and thank you so much for reading and supporting this blog! 🙂