How to Create a Speech Recognition App with Python and Google Cloud API

Speech recognition converts spoken language into written text. With Python and the Google Cloud Speech-to-Text API, you can build an app that transcribes audio files or live speech.

This tutorial covers the setup, then shows how to transcribe an uploaded audio file and how to recognize live speech from a microphone.

Prerequisites

Before you start, you will need the following:

  • Python 3 and pip installed on your machine (the examples use f-strings, which require Python 3.6 or later)
  • Google Cloud account with Speech-to-Text API enabled
  • API credentials for the Speech-to-Text API

Step 1: Setting up the Google Cloud API

To use the Google Cloud API for speech recognition, you first need to set up the API and obtain the necessary credentials.

1.1 Enable the Speech-to-Text API

  1. Go to the Google Cloud Console.
  2. Create a new project or select an existing project.
  3. In the left sidebar, click on “APIs & Services” > “Library”.
  4. Search for “Speech-to-Text API” and click on it.
  5. Click the “Enable” button.

1.2 Create API Credentials

  1. In the left sidebar, click on “APIs & Services” > “Credentials”.
  2. Click the “+ Create Credentials” button and select “Service Account”.
  3. Enter a name for your service account and click the “Create” button.
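  4. Select the role “Project” > “Owner” and click the “Continue” button. This role works for the tutorial but grants full access to the project, so prefer a narrower role for anything beyond experiments.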
  5. Click the “Create Key” button and select the key type “JSON”.
  6. The credentials JSON file will be downloaded to your machine. Keep it safe, as you will need it later.
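
A quick way to confirm that the downloaded file is a service account key is to inspect a few of its fields. The sketch below assumes the usual key layout, in which the JSON contains a "type" of "service_account" along with "project_id", "client_email", and "private_key" entries. The function name check_key_file is made up for this example.

```python
import json

# Fields assumed to be present in a service account key file.
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def check_key_file(path):
    """Return the service account email if the file looks like a key file.

    Raises ValueError when a required field is missing or the type is wrong.
    """
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)

    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key["client_email"]
```

Calling check_key_file("credentials.json") should return the service account's email address, which is the identity the API calls will run as.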

Step 2: Installing Required Libraries

To interact with the Google Cloud API from Python, you need to install the google-cloud-speech library.

Open a terminal window and run the following command to install the library:

pip install google-cloud-speech
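
To check that the installation worked, you can ask Python whether the module can be found. This is a small sketch using only the standard library; the helper name missing_modules is an assumption for illustration, and google.cloud.speech is the module that the google-cloud-speech package provides.

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (for example "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# An empty list means the Speech-to-Text client library is importable.
print(missing_modules(["google.cloud.speech"]))
```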

Step 3: Configuring Authentication

To authenticate your Python script with the Google Cloud API, you need to set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of your credentials JSON file.

If you are using Windows, run the following command in Command Prompt:

set GOOGLE_APPLICATION_CREDENTIALS=C:\path\to\credentials.json

If you are using macOS or Linux, run the following command instead:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

Replace the example path with the actual path to your credentials JSON file. The variable applies only to the current terminal session, so set it again whenever you open a new terminal.
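
Because a wrong path only surfaces later as an authentication error, it can help to check the variable before making any API call. The following is a minimal sketch; credentials_path is a hypothetical helper, not part of any Google library.

```python
import os


def credentials_path(environ=os.environ):
    """Return the configured credentials path, or raise if it is unusable."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"credentials file not found: {path}")
    return path
```

Calling credentials_path() at the top of a script turns a missing or mistyped path into an immediate, readable error.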

Step 4: Uploading Audio File to Google Cloud Storage

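To transcribe a recording longer than about one minute, the Speech-to-Text API has to read the audio from Google Cloud Storage, so you first need to upload the file to a bucket. Shorter clips can be sent directly in the request, but this step uses the upload route.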
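
The upload matters mainly for long recordings. For a clip of up to roughly a minute, the audio bytes can be placed in the request itself. The sketch below assumes a WAV or FLAC file and a 10 MB limit on inline audio; fits_inline, transcribe_local, and MAX_INLINE_BYTES are names invented for this example.

```python
import os

MAX_INLINE_BYTES = 10 * 1024 * 1024  # assumed request limit for inline audio


def fits_inline(path, limit=MAX_INLINE_BYTES):
    """Return True if the file is small enough to send in the request body."""
    return os.path.getsize(path) <= limit


def transcribe_local(path, language_code="en-US"):
    """Transcribe a short local WAV or FLAC file without Cloud Storage."""
    # Imported here so fits_inline works without the client library.
    from google.cloud import speech

    if not fits_inline(path):
        raise ValueError("file too large to send inline; upload it instead")

    with open(path, "rb") as handle:
        audio = speech.RecognitionAudio(content=handle.read())
    # For WAV and FLAC the encoding and sample rate are read from the header.
    config = speech.RecognitionConfig(language_code=language_code)

    response = speech.SpeechClient().recognize(config=config, audio=audio)
    return " ".join(r.alternatives[0].transcript for r in response.results)
```

Other audio formats would likely need the encoding and sample rate set explicitly in the RecognitionConfig.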

4.1 Create a Bucket in Google Cloud Storage

  1. Go to the Google Cloud Console.
  2. Select your project from the project dropdown.
  3. In the left sidebar, click on “Storage” > “Browser”.
  4. Click the “+ Create Bucket” button.
  5. Enter a name for your bucket, select a location, and click the “Create” button.
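
The bucket can also be created from Python with the google-cloud-storage library, which is installed in the next step. This is a sketch under a few assumptions: the name check is a simplified version of the bucket naming rules, "US" is just an example location, and looks_like_bucket_name and create_bucket are illustrative helper names.

```python
import re

# Simplified naming check: 3 to 63 characters, lowercase letters, digits,
# dashes, underscores and dots, starting and ending with a letter or digit.
# The full naming rules contain further restrictions not covered here.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def looks_like_bucket_name(name):
    """Return True if the name passes the simplified check above."""
    return _BUCKET_NAME.fullmatch(name) is not None


def create_bucket(bucket_name, location="US"):
    """Create a bucket in the current project and return its name."""
    # Imported here so the name check works without the client library.
    from google.cloud import storage

    if not looks_like_bucket_name(bucket_name):
        raise ValueError(f"invalid bucket name: {bucket_name!r}")

    client = storage.Client()
    bucket = client.create_bucket(bucket_name, location=location)
    return bucket.name
```

Bucket names are shared across all of Google Cloud, so create_bucket can still fail with a conflict if the name is already taken.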

4.2 Upload the Audio File

  1. Open a terminal window and run the following command to install the google-cloud-storage library:
    pip install google-cloud-storage
    
  2. Create a new Python script named upload_audio.py and open it in your preferred code editor.
  3. Import the necessary libraries:

    import os

    from google.cloud import storage
    
  4. Upload the audio file to the Google Cloud Storage bucket:
    def upload_audio(bucket_name, audio_file_path):
        """Upload a local audio file to the given bucket."""
        client = storage.Client()
        bucket = client.bucket(bucket_name)
        # Name the object after the file itself, not its full local path.
        blob = bucket.blob(os.path.basename(audio_file_path))
    
        blob.upload_from_filename(audio_file_path)
    
        print(f"Audio file {audio_file_path} uploaded to {bucket_name} as {blob.name}.")
    

    Here bucket_name is the name of your bucket and audio_file_path is the local path to your audio file. Both are passed in when the function is called.

  5. Test the script by calling the upload_audio function:

    if __name__ == "__main__":
        bucket_name = "your-bucket-name"
        audio_file_path = "path-to-your-audio-file"
    
        upload_audio(bucket_name, audio_file_path)
    

    Replace your-bucket-name with the name of your bucket and path-to-your-audio-file with the actual path to your audio file.

  6. Run the script by executing the following command in the terminal:

    python upload_audio.py
    

    If successful, you will see a message indicating that the audio file has been uploaded to the specified bucket.
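
Once the file is uploaded, the Speech-to-Text API refers to it by a gs:// URI made of the bucket name and the object name, where the object name is whatever was passed to bucket.blob(...) during the upload. The helpers below are a minimal sketch; gcs_uri and object_exists are names chosen for this example.

```python
def gcs_uri(bucket_name, object_name):
    """Build the gs:// URI that identifies an object in a bucket."""
    return f"gs://{bucket_name}/{object_name}"


def object_exists(bucket_name, object_name):
    """Return True if the object is present in the bucket."""
    # Imported here so gcs_uri works without the client library.
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(object_name)
    return blob.exists()
```

For example, gcs_uri("your-bucket-name", "audio.wav") gives "gs://your-bucket-name/audio.wav".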

Step 5: Transcribing Audio using the Speech-to-Text API

Now that the audio file is in Google Cloud Storage, you can transcribe it with the Speech-to-Text API, which reads the file through a URI of the form gs://your-bucket-name/object-name.

5.1 Create a Speech-to-Text Transcription Job

  1. Go to the Google Cloud Console.
  2. In the left sidebar, click on “Storage” > “Browser”.
  3. Click on your bucket name to open it.
  4. Click on the audio file you uploaded in the previous step.
  5. In the top bar, click on “More” (represented by three dots) and select “Request transcription”.
  6. Configure the transcription job by selecting the language, the transcript output format, and other settings.
  7. Click the “Submit” button to start the transcription job.

5.2 Check Transcription Job Status

  1. Go to the Google Cloud Console.
  2. In the left sidebar, click on “Storage” > “Browser”.
  3. Click on your bucket name to open it.
  4. Click on the audio file with the transcription job.
  5. In the top bar, click on “More” (represented by three dots) and select “Listening”.
  6. You will see the status of the transcription job: “Processing”, “Succeeded”, or “Failed”. Wait until the status is “Succeeded” before proceeding.

5.3 Download the Transcription Result

  1. In the “Listening” dialog, click on the “Transcripts” tab.
  2. Click the “Download” button next to the transcription you want to download.
  3. The transcription result will be downloaded to your machine in the specified format.
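
The same transcription can be requested from Python with the google-cloud-speech library installed in Step 2. The sketch below assumes a WAV or FLAC file already uploaded to a bucket, English audio, and a 10-minute wait limit; transcribe_gcs and join_transcripts are illustrative names. It uses the long-running recognition call, which is intended for recordings too long for a single synchronous request.

```python
def join_transcripts(results):
    """Concatenate the top alternative of each result into one string."""
    return " ".join(
        result.alternatives[0].transcript.strip()
        for result in results
        if result.alternatives
    )


def transcribe_gcs(gcs_uri, language_code="en-US", timeout=600):
    """Transcribe an audio file stored in Cloud Storage and return the text."""
    # Imported here so join_transcripts works without the client library.
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    # For WAV and FLAC the encoding and sample rate are read from the header.
    config = speech.RecognitionConfig(language_code=language_code)

    # Starts the job on the server and returns a handle to wait on.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout)  # blocks until finished
    return join_transcripts(response.results)
```

A call such as transcribe_gcs("gs://your-bucket-name/audio.wav") returns the transcript as a single string once the job completes.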

Step 6: Performing Live Speech Recognition

In addition to transcribing audio files, you can also perform live speech recognition using the Speech-to-Text API.

6.1 Install the Required Libraries

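To perform live speech recognition, you need two more libraries: SpeechRecognition, which provides the speech_recognition module used below, and PyAudio, which gives it access to the microphone. Open a terminal window and run the following command:

pip install SpeechRecognition pyaudio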
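
Before writing the recognition script, it can be useful to confirm that Python sees your microphone. This sketch assumes the speech_recognition module used later in this step is installed along with PyAudio; find_device_index and list_microphones are hypothetical helper names.

```python
def find_device_index(device_names, keyword):
    """Return the index of the first device whose name contains keyword."""
    keyword = keyword.lower()
    for index, name in enumerate(device_names):
        # Entries can be None when a device name cannot be retrieved.
        if name and keyword in name.lower():
            return index
    return None


def list_microphones():
    """Return the names of the audio devices that PyAudio reports."""
    # Imported here so find_device_index works without the library.
    import speech_recognition as sr

    return sr.Microphone.list_microphone_names()
```

The index returned by find_device_index(list_microphones(), "usb") can be passed to sr.Microphone(device_index=...) to select a specific device instead of the system default.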

6.2 Create a Live Speech Recognition Script

  1. Create a new Python script named live_speech_recognition.py and open it in your preferred code editor.
  2. Import the necessary libraries:
    import speech_recognition as sr
    
  3. Create a function to perform live speech recognition:
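    def live_speech_recognition():
        recognizer = sr.Recognizer()
        microphone = sr.Microphone()  # default input device; requires PyAudio
    
        with microphone as source:
            # Sample background noise briefly to calibrate the energy threshold.
            recognizer.adjust_for_ambient_noise(source)
    
            while True:
                print("Say something...")
                # Blocks until a phrase is heard and followed by a pause.
                audio = recognizer.listen(source)
    
                try:
                    # Sends the recorded phrase to Google Cloud Speech-to-Text.
                    text = recognizer.recognize_google_cloud(audio)
                    print("You said:", text)
                except sr.UnknownValueError:
                    print("Could not understand audio")
                except sr.RequestError as e:
                    # Raised for network, quota, or credential problems.
                    print("Error:", str(e))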
    
  4. Test the live speech recognition script by calling the live_speech_recognition function:
    if __name__ == "__main__":
        live_speech_recognition()
    
  5. Run the script by executing the following command in the terminal:
    python live_speech_recognition.py
    

    The script listens continuously and prints the recognized text to the console after each phrase. Press Ctrl+C to stop it.
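
Since the loop above never ends by itself, one possible refinement is to stop on a spoken phrase or on Ctrl+C. The variant below is a sketch, not a replacement for the script: the stop phrases, the 5-second wait, the 15-second phrase limit, and the names is_stop_phrase and listen_until_stopped are all assumptions made for illustration.

```python
def is_stop_phrase(text, stop_phrases=("stop listening", "quit")):
    """Return True if the recognized text is one of the stop phrases."""
    normalized = " ".join(text.lower().split()).strip(".!?")
    return normalized in stop_phrases


def listen_until_stopped():
    """Print recognized phrases until a stop phrase or Ctrl+C."""
    # Imported here so is_stop_phrase works without the library.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        try:
            while True:
                try:
                    audio = recognizer.listen(
                        source, timeout=5, phrase_time_limit=15
                    )
                    text = recognizer.recognize_google_cloud(audio)
                except sr.WaitTimeoutError:
                    continue  # nobody spoke within the timeout; listen again
                except sr.UnknownValueError:
                    continue  # speech was unintelligible; listen again
                except sr.RequestError as error:
                    print("Error:", error)
                    break
                print("You said:", text)
                if is_stop_phrase(text):
                    break
        except KeyboardInterrupt:
            pass  # Ctrl+C ends the session quietly
```

Capping the phrase length keeps each request short, which also keeps a single recognition call within the limits for inline audio.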

Conclusion

In this tutorial, you learned how to create a speech recognition app using Python and the Google Cloud API. You learned how to set up the Google Cloud API, install the required libraries, configure authentication, upload an audio file to Google Cloud Storage, transcribe the audio using the Speech-to-Text API, and perform live speech recognition.

With this knowledge, you can now create your own speech recognition apps and integrate them into your projects.
