Speech recognition is the ability of a computer to convert spoken language into written text. With Python and the Google Cloud Speech-to-Text API, you can build a speech recognition app that transcribes audio files or live speech.
In this tutorial, you will set up the API, upload an audio file and transcribe it, and then perform live speech recognition from a microphone.
Prerequisites
Before you start, you will need the following:
- Python installed on your machine
- Google Cloud account with Speech-to-Text API enabled
- API credentials for the Speech-to-Text API
Step 1: Setting up the Google Cloud API
To use the Google Cloud API for speech recognition, you first need to set up the API and obtain the necessary credentials.
1.1 Enable the Speech-to-Text API
- Go to the Google Cloud Console.
- Create a new project or select an existing project.
- In the left sidebar, click on “APIs & Services” > “Library”.
- Search for “Speech-to-Text API” and click on it.
- Click the “Enable” button.
1.2 Create API Credentials
- In the left sidebar, click on “APIs & Services” > “Credentials”.
- Click the “+ Create Credentials” button and select “Service Account”.
- Enter a name for your service account and click the “Create” button.
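- Select the role “Project” > “Owner” and click the “Continue” button. Owner grants full access to the project, which is convenient for a tutorial; for anything longer-lived, choose a narrower role.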
- Click the “Create Key” button and select the key type “JSON”.
- The credentials JSON file will be downloaded to your machine. Keep it safe, as you will need it later.
Step 2: Installing Required Libraries
To interact with the Google Cloud API from Python, you need to install the google-cloud-speech library.
Open a terminal window and run the following command to install the library:
pip install google-cloud-speech
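A quick way to confirm that the install worked is to ask Python whether it can locate the module. The sketch below uses only the standard library; the helper name is_installed is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (for example `google`) raises
        # ModuleNotFoundError, which is a subclass of ImportError.
        return False


# The google-cloud-speech package is imported as `google.cloud.speech`.
print("google-cloud-speech installed:", is_installed("google.cloud.speech"))
```

If the last line prints False, run the pip command again in the same environment you use to run your scripts.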
Step 3: Configuring Authentication
To authenticate your Python script with the Google Cloud API, you need to set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of your credentials JSON file.
If you are using Windows, run the following command in Command Prompt:
set GOOGLE_APPLICATION_CREDENTIALS=C:\path\to\credentials.json
If you are using macOS or Linux, run the following command instead:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
Replace the example path with the actual path to your credentials JSON file.
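Before moving on, it can help to confirm that the variable is visible to Python and points at a service account key. The following is a minimal sketch using only the standard library; the function name check_credentials is hypothetical, and it only inspects the "type" and "client_email" fields of the key file without contacting Google Cloud.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return the service account email if the key file looks usable.

    Raises ValueError with a readable message otherwise.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError(f"No file found at {path}")
    with open(path, encoding="utf-8") as f:
        try:
            info = json.load(f)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path} is not valid JSON") from exc
    # Service account keys are JSON objects with "type": "service_account".
    if not isinstance(info, dict) or info.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key")
    return info.get("client_email")
```

Calling check_credentials() in the same terminal where you set the variable should return the email address of the service account you created in Step 1.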
Step 4: Uploading Audio File to Google Cloud Storage
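To transcribe an audio file using the Speech-to-Text API, you first upload the audio file to Google Cloud Storage. This is required for longer recordings (roughly anything over one minute); shorter clips can also be sent directly in the request, but this tutorial uploads the file in both cases.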
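The length of a recording matters here: at the time of writing, the API documentation limits audio sent directly in a request to about one minute, and longer audio has to be referenced from Cloud Storage. The sketch below reads the duration of a WAV file with the standard library so you can tell which case applies. The 60-second constant and the function names are assumptions for illustration; check the current quotas before relying on them.

```python
import wave

# Assumed limit for audio sent directly in a request; verify against current quotas.
SYNC_LIMIT_SECONDS = 60


def wav_duration_seconds(path):
    """Duration of a WAV file, computed from its frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def needs_cloud_storage(path, limit_seconds=SYNC_LIMIT_SECONDS):
    """True if the recording is longer than the assumed inline limit."""
    return wav_duration_seconds(path) > limit_seconds
```

This only works for WAV files; other formats need a different way to read the duration.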
4.1 Create a Bucket in Google Cloud Storage
- Go to the Google Cloud Console.
- Select your project from the project dropdown.
- In the left sidebar, click on “Cloud Storage” > “Buckets” (labelled “Storage” > “Browser” in older versions of the console).
- Click the “+ Create Bucket” button.
- Enter a name for your bucket, select a location, and click the “Create” button.
4.2 Upload the Audio File
- Open a terminal window and run the following command to install the google-cloud-storage library:
pip install google-cloud-storage
- Create a new Python script named upload_audio.py and open it in your preferred code editor.
- Import the necessary libraries:
from google.cloud import storage
- Upload the audio file to the Google Cloud Storage bucket:
def upload_audio(bucket_name, audio_file_path):
    # The client picks up the credentials configured in Step 3.
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    # The object name in the bucket is the same as the local path.
    blob = bucket.blob(audio_file_path)
    blob.upload_from_filename(audio_file_path)
    print(f"Audio file {audio_file_path} uploaded to {bucket_name}.")
Here, bucket_name is the name of your bucket and audio_file_path is the path to your audio file.
- Test the script by calling the upload_audio function:
if __name__ == "__main__":
    bucket_name = "your-bucket-name"
    audio_file_path = "path-to-your-audio-file"
    upload_audio(bucket_name, audio_file_path)
Replace your-bucket-name with the name of your bucket and path-to-your-audio-file with the actual path to your audio file.
- Run the script by executing the following command in the terminal:
python upload_audio.py
If successful, you will see a message indicating that the audio file has been uploaded to the specified bucket.
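Because upload_audio passes the local path to bucket.blob, a file uploaded from recordings/2024/meeting.wav is stored under that whole path as its object name. The sketch below shows one way to derive a shorter object name and the gs:// URI that the Speech-to-Text API uses to refer to files in Cloud Storage. The helper names and the optional prefix are illustrative assumptions, not part of either library.

```python
from pathlib import PurePath


def object_name_for(audio_file_path, prefix=""):
    """Object name for the bucket: the file name, optionally under a prefix."""
    name = PurePath(audio_file_path).name
    cleaned = prefix.strip("/")
    return f"{cleaned}/{name}" if cleaned else name


def gcs_uri(bucket_name, object_name):
    """Cloud Storage URI of the form gs://bucket/object."""
    return f"gs://{bucket_name}/{object_name}"
```

To use it, you could call bucket.blob(object_name_for(audio_file_path)) in upload_audio instead of bucket.blob(audio_file_path), and keep the resulting URI for the transcription step.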
Step 5: Transcribing Audio using the Speech-to-Text API
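Now that you have uploaded the audio file to Google Cloud Storage, you can transcribe it using the Speech-to-Text API. The steps below use the Google Cloud Console; menu labels there change between console versions, so the exact names may differ from what you see.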
5.1 Create a Speech-to-Text Transcription Job
- Go to the Google Cloud Console.
- In the left sidebar, click on “Cloud Storage” > “Buckets”.
- Click on your bucket name to open it.
- Click on the audio file you uploaded in the previous step.
- In the top bar, click on “More” (represented by three dots) and select “Request transcription”.
- Configure the transcription job by selecting the language, the transcript output format, and other settings.
- Click the “Submit” button to start the transcription job.
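The same kind of job can also be started from Python with the google-cloud-speech library installed in Step 2. The following is a minimal sketch, assuming a WAV or FLAC file whose header carries the encoding and sample rate, and English audio. The function name start_transcription and the default language are assumptions for illustration.

```python
def start_transcription(gcs_uri, language_code="en-US"):
    """Start a long-running transcription of a file in Cloud Storage.

    Returns the operation object; the transcript is available once it finishes.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI such as gs://bucket/file.wav")

    # Imported here so the URI check above works even without the library.
    from google.cloud import speech

    client = speech.SpeechClient()
    # Other audio formats also need `encoding` and `sample_rate_hertz` here.
    config = speech.RecognitionConfig(language_code=language_code)
    audio = speech.RecognitionAudio(uri=gcs_uri)
    return client.long_running_recognize(config=config, audio=audio)
```

The call returns immediately with an operation object; calling its result() method blocks until the transcription is done.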
5.2 Check Transcription Job Status
- Go to the Google Cloud Console.
- In the left sidebar, click on “Cloud Storage” > “Buckets”.
- Click on your bucket name to open it.
- Click on the audio file with the transcription job.
- In the top bar, click on “More” (represented by three dots) and select “Listening”.
- You will see the status of the transcription job: “Processing”, “Succeeded”, or “Failed”. Wait until the status is “Succeeded” before proceeding.
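If you run the transcription from Python with the google-cloud-speech library rather than from the console, the long_running_recognize call returns an operation object that can be polled in much the same way. The sketch below waits on any object with done() and result() methods, which the operation object provides. The function name, polling interval, and timeout are assumptions for illustration.

```python
import time


def wait_for_operation(operation, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll `operation` until it finishes, then return its result.

    `operation.result()` raises if the job failed, so a normal return
    means the transcription succeeded.
    """
    waited = 0.0
    while not operation.done():
        if waited >= timeout_seconds:
            raise TimeoutError(f"transcription still running after {waited:.0f} seconds")
        print("Processing...")
        sleep(poll_seconds)
        waited += poll_seconds
    return operation.result()
```

The sleep parameter exists only so the loop can be exercised without real waiting; in normal use you can leave it at its default.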
5.3 Download the Transcription Result
- In the “Listening” dialog, click on the “Transcripts” tab.
- Click the “Download” button next to the transcription you want to download.
- The transcription result will be downloaded to your machine in the specified format.
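When the transcription is run from Python with the google-cloud-speech library, the response holds a list of results, each with one or more alternatives ordered by likelihood. The sketch below keeps the first alternative of each result and writes the text to a file. It relies on the results, alternatives, and transcript attributes of the response objects; the function names and the output path are made up for this example.

```python
def collect_transcript(response):
    """Join the top alternative of every result into one text, one line per result."""
    lines = []
    for result in response.results:
        if result.alternatives:
            lines.append(result.alternatives[0].transcript.strip())
    return "\n".join(lines)


def save_transcript(response, output_path):
    """Write the collected transcript to `output_path` and return the text."""
    text = collect_transcript(response)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(text + "\n")
    return text
```

Each result usually covers a consecutive portion of the audio, so the lines appear in spoken order.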
Step 6: Performing Live Speech Recognition
In addition to transcribing audio files, you can also perform live speech recognition using the Speech-to-Text API.
6.1 Install the Required Libraries
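To perform live speech recognition, you need to install the SpeechRecognition library, which provides the speech_recognition module used below, and the pyaudio library, which it uses for microphone access. Open a terminal window and run the following command:
pip install SpeechRecognition pyaudio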
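If your machine has more than one input device, the default microphone may not be the one you want. The speech_recognition module can list device names with sr.Microphone.list_microphone_names(), where the position of each name in the list is its device index. The sketch below picks a device by keyword; the helper name pick_device_index and the keyword are assumptions for illustration.

```python
def pick_device_index(names, keyword):
    """Return the index of the first device whose name contains `keyword`.

    Returns None when nothing matches, which lets sr.Microphone fall back
    to the default input device.
    """
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name and keyword in name.lower():
            return index
    return None


# Typical use, once the libraries above are installed:
#   import speech_recognition as sr
#   names = sr.Microphone.list_microphone_names()
#   microphone = sr.Microphone(device_index=pick_device_index(names, "usb"))
```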
6.2 Create a Live Speech Recognition Script
- Create a new Python script named live_speech_recognition.py and open it in your preferred code editor.
- Import the necessary libraries:
import speech_recognition as sr
- Create a function to perform live speech recognition:
def live_speech_recognition():
    recognizer = sr.Recognizer()
    microphone = sr.Microphone()
    with microphone as source:
        # Calibrate the energy threshold against background noise.
        recognizer.adjust_for_ambient_noise(source)
        while True:
            print("Say something...")
            audio = recognizer.listen(source)
            try:
                # Sends the audio to Google Cloud; relies on the credentials from Step 3.
                text = recognizer.recognize_google_cloud(audio)
                print("You said:", text)
            except sr.UnknownValueError:
                print("Could not understand audio")
            except sr.RequestError as e:
                print("Error:", str(e))
- Test the live speech recognition script by calling the live_speech_recognition function:
if __name__ == "__main__":
    live_speech_recognition()
- Run the script by executing the following command in the terminal:
python live_speech_recognition.py
The script will continuously listen for your speech input and display the recognized text on the console until you stop it, for example by pressing Ctrl+C.
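Because the loop in live_speech_recognition runs forever, the only way out is to interrupt the program. One alternative is to end the loop when a stop phrase is heard. The sketch below shows the idea in isolation. The names should_stop, listen_until_stopped, and recognize_once are hypothetical; recognize_once stands for any function you write that listens once and returns the recognized text, or None if nothing was understood.

```python
def should_stop(text, stop_phrases=("stop listening", "goodbye")):
    """True if the recognized text contains one of the stop phrases."""
    normalized = text.strip().lower()
    return any(phrase in normalized for phrase in stop_phrases)


def listen_until_stopped(recognize_once, stop_phrases=("stop listening", "goodbye")):
    """Call `recognize_once` repeatedly until a stop phrase or Ctrl+C.

    Returns the list of everything that was recognized.
    """
    heard = []
    try:
        while True:
            text = recognize_once()
            if text is None:
                continue  # nothing understood, keep listening
            heard.append(text)
            if should_stop(text, stop_phrases):
                break
    except KeyboardInterrupt:
        pass  # Ctrl+C also ends the session cleanly
    return heard
```

In the script above, recognize_once would wrap the recognizer.listen and recognizer.recognize_google_cloud calls and return None from the UnknownValueError branch.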
Conclusion
In this tutorial, you learned how to create a speech recognition app using Python and the Google Cloud API. You learned how to set up the Google Cloud API, install the required libraries, configure authentication, upload an audio file to Google Cloud Storage, transcribe the audio using the Speech-to-Text API, and perform live speech recognition.
With this knowledge, you can now create your own speech recognition apps and integrate them into your projects.