How to Create a Voice Assistant with Python and Google Speech API

Introduction

Voice assistants have become increasingly popular in recent years, allowing users to interact with computers and other smart devices using only their voice. In this tutorial, we will learn how to create a voice assistant using Python and the Google Speech API.

The Google Speech API, offered on Google Cloud as the Speech-to-Text API, converts spoken language into written text. With its Python client library, we can add speech recognition to our own applications.

Prerequisites

To follow along with this tutorial, you will need the following:

Python (version 3.6 or higher)
Google Cloud account with the Speech-to-Text API enabled
Google Cloud SDK installed and authenticated

Setting up the Google Cloud Platform

Before we can start using the Google Speech API, we need to set up a project in the Google Cloud Platform and enable the Speech-to-Text API.

Go to the Google Cloud Console and sign in with your Google account.
Create a new project by clicking the project drop-down and selecting “New Project”. Enter a name for your project and click “Create”.
Once the project is created, click on the project drop-down again and select your newly created project.
Enable the Speech-to-Text API by clicking on the navigation menu (☰) and selecting “APIs & Services > Library”. Search for “Speech-to-Text API” and click on the result.
On the Speech-to-Text API page, click “Enable” to enable the API for your project.
We now need to set up authentication. Click on the navigation menu (☰) and select “APIs & Services > Credentials”.
On the Credentials page, click on “Create Credentials” and select “Service Account”.
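Enter a name for your service account and click “Create”. Grant the account a role that allows it to call the Speech-to-Text API. The “Editor” role works for this tutorial, but it is much broader than the assistant needs, so prefer a narrower role for anything beyond experiments.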
Once the service account is created, click on the “Actions” button in the “Actions” column and select “Create Key”.
Choose the key type as JSON and click “Create”. This will download a JSON file containing your service account credentials. Keep this file secure as it contains sensitive information.
Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the path of your service account JSON file. This can be done by running the following command in your terminal:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/credentials.json
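
Before moving on, it can help to confirm that the variable is set and points at a usable key file. The following is a minimal sketch, not part of the Google client library: check_credentials is a hypothetical helper name, and the only thing it assumes about the file is that a service account key contains "type": "service_account".

```python
import json
import os


def check_credentials(env=None):
    """Return (ok, message) describing whether the credentials variable is usable.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return False, f"No file found at {path}"
    try:
        with open(path, "r", encoding="utf-8") as key_file:
            key = json.load(key_file)
    except (OSError, ValueError) as exc:
        # ValueError covers both invalid JSON and undecodable bytes.
        return False, f"Could not read {path} as JSON: {exc}"
    # Service account key files carry a "type" field with this value.
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return False, f"{path} does not look like a service account key"
    return True, f"Using service account key at {path}"
```

Calling check_credentials() with no arguments inspects the current environment; printing the returned message gives a quick hint about what to fix.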

With the Google Cloud Platform set up, we can now move on to coding our voice assistant.

Installing the Required Libraries

To interact with the Google Speech API, we need the google-cloud-speech library, and to record from the microphone we need pyaudio. Open a terminal and run the following command:

pip install google-cloud-speech pyaudio

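Note that the google-cloud-speech library authenticates through the service account key referenced by GOOGLE_APPLICATION_CREDENTIALS, so the Google Cloud SDK from the prerequisites is useful for managing the project but is not needed by the script at runtime. The pyaudio library used later for recording depends on PortAudio, which you may have to install separately on macOS and Linux.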
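
To confirm that the third-party modules used in this tutorial can be imported before writing any assistant code, a small check like the one below may help. It is only a sketch: missing_modules and REQUIRED are hypothetical names introduced here, and the module list simply mirrors the imports used by voice_assistant.py.

```python
import importlib


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError.
            missing.append(name)
    return missing


# The two third-party modules that voice_assistant.py imports.
REQUIRED = ["google.cloud.speech", "pyaudio"]
```

Running print(missing_modules(REQUIRED)) prints an empty list when both modules are importable.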

Implementing the Voice Assistant

Now that we have the necessary setup and libraries installed, we can start implementing our voice assistant. In this tutorial, we will create a simple voice assistant that listens to the user’s command, converts the speech to text, and responds accordingly.

First, create a new Python file called voice_assistant.py and open it in your favorite text editor or IDE.

Importing the Required Libraries

Start by importing the necessary libraries:

from google.cloud import speech

import os
import pyaudio
import wave

We import the speech module from google.cloud to use the Google Speech-to-Text API. We also import os, pyaudio, and wave to record and play audio.

Setting up the Google Speech-to-Text API

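Before we can use the Google Speech API, we need to create a client that will send requests to it. The client picks up the service account key from the GOOGLE_APPLICATION_CREDENTIALS environment variable we set earlier. Add the following code to your voice_assistant.py file: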

# Set up Google Speech-to-Text client
client = speech.SpeechClient()

Recording Audio

Next, we need to implement a function that records audio from the user’s microphone. We will use the pyaudio library for this. Add the following code to your voice_assistant.py file:

def record_audio(file_path, duration=5):
    """
    Record audio from the user's microphone and save it to a file.

    Args:
        file_path (str): Path to save the audio file.
        duration (int): Duration of the recording in seconds (default: 5).
    """
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000

    p = pyaudio.PyAudio()

    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)

    print("Recording audio...")
    frames = []

    for i in range(0, int(RATE / CHUNK * duration)):
        data = stream.read(CHUNK)
        frames.append(data)

    print("Finished recording audio.")

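    stream.stop_stream()
    stream.close()
    # Look up the sample width (2 bytes for paInt16) before releasing PortAudio.
    sample_width = p.get_sample_size(FORMAT)
    p.terminate()

    # Write the captured frames as a mono 16-bit WAV file.
    with wave.open(file_path, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))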

This function takes a file path and a duration as parameters. It uses the pyaudio library to record audio from the user’s microphone and save it to the specified file path.
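
Because the loop bound in record_audio is truncated to a whole number of chunks, the recording comes out slightly shorter than the requested duration. The arithmetic can be checked with a short sketch; recording_stats is a hypothetical helper written for this illustration, and its defaults copy the constants from record_audio.

```python
def recording_stats(rate=16000, chunk=1024, duration=5, sample_width=2, channels=1):
    """Mirror the loop bound in record_audio and report what gets captured."""
    n_chunks = int(rate / chunk * duration)   # same expression as the for loop
    n_frames = n_chunks * chunk               # one frame = one sample per channel
    seconds = n_frames / rate                 # actual length of the recording
    n_bytes = n_frames * sample_width * channels
    return n_chunks, n_frames, seconds, n_bytes


print(recording_stats())  # (78, 79872, 4.992, 159744)
```

With the default settings, 78 chunks of 1024 frames are read, which is 4.992 seconds of audio and about 156 KB of raw samples.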

Converting Speech to Text

Now that we are able to record audio, we can use the Google Speech-to-Text API to convert the recorded speech into text. Add the following code to your voice_assistant.py file:

def transcribe_audio(file_path):
    """
    Transcribe speech from an audio file using the Google Speech-to-Text API.

    Args:
        file_path (str): Path to the audio file.

    Returns:
        str: Transcribed text.
    """
    with open(file_path, 'rb') as audio_file:
        audio = speech.RecognitionAudio(content=audio_file.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)

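    # Longer recordings can come back as several results, one per
    # consecutive portion of the audio, so join the top alternative of each.
    return " ".join(
        result.alternatives[0].transcript.strip()
        for result in response.results
        if result.alternatives
    )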

This function takes a file path as a parameter and transcribes the speech from the audio file using the Google Speech-to-Text API. It returns the transcribed text.
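
The config in transcribe_audio hard-codes 16-bit mono audio at 16000 Hz, which matches what record_audio writes. If you feed it a file from another source, a mismatch can lead to an error or a poor transcript, so it may be worth checking the WAV header first. The sketch below uses only the standard wave module; check_wav_format is a hypothetical helper name, and its defaults are taken from the recording constants used earlier.

```python
import wave


def check_wav_format(file_path, expected_rate=16000, expected_channels=1, expected_width=2):
    """Return a list of mismatches between the WAV header and the expected format."""
    problems = []
    with wave.open(file_path, "rb") as wf:
        if wf.getframerate() != expected_rate:
            problems.append(f"sample rate is {wf.getframerate()} Hz, expected {expected_rate}")
        if wf.getnchannels() != expected_channels:
            problems.append(f"{wf.getnchannels()} channel(s), expected {expected_channels}")
        if wf.getsampwidth() != expected_width:
            problems.append(f"sample width is {wf.getsampwidth()} byte(s), expected {expected_width}")
        if wf.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems
```

An empty list means the file matches the settings passed to RecognitionConfig; otherwise each entry names one difference.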

Playing Audio

Lastly, we need a function to play audio responses. Rather than decoding the file in Python, we will hand it to the macOS afplay command. Add the following code to your voice_assistant.py file:

def play_audio(file_path):
    """
    Play an audio file.

    Args:
        file_path (str): Path to the audio file.
    """
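    # Local import keeps this function self-contained.
    import subprocess

    # Pass the path as its own argument so that spaces or shell
    # metacharacters in file_path are not interpreted by a shell.
    subprocess.run(["afplay", file_path])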

This function takes a file path as a parameter and plays the audio file using the afplay command on macOS. You can modify this function if you are using a different operating system.
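
If you are not on macOS, one possible adaptation is to pick the player command from the operating system name. This is a sketch under stated assumptions: playback_command and play_audio_portable are hypothetical names, and aplay is assumed to be available on Linux (it ships with the ALSA utilities on many distributions, but not all).

```python
import platform
import subprocess


def playback_command(file_path, system=None):
    """Return the command list used to play a WAV file, or None if unknown."""
    system = platform.system() if system is None else system
    if system == "Darwin":      # macOS
        return ["afplay", file_path]
    if system == "Linux":       # assumes the ALSA aplay tool is installed
        return ["aplay", file_path]
    return None


def play_audio_portable(file_path):
    """Play an audio file with whichever command fits the current system."""
    command = playback_command(file_path)
    if command is None:
        raise RuntimeError("No playback command known for " + platform.system())
    # check=True raises CalledProcessError if the player exits with an error.
    subprocess.run(command, check=True)
```

Keeping the command selection in its own function makes it possible to check the mapping without actually playing any sound.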

Putting it All Together

Now that we have implemented all the necessary functions, let’s put them together in a main function that will use the voice assistant. Add the following code to your voice_assistant.py file:

def main():
    # Record audio from the user
    audio_file = "audio.wav"
    record_audio(audio_file)

    # Convert speech to text
    text = transcribe_audio(audio_file)
    print("You said:", text)

    # Generate a response based on the transcribed text
    response = generate_response(text)
    print("Response:", response)

    # Convert text to speech and play the response
    response_file = "response.wav"
    generate_audio(response, response_file)
    play_audio(response_file)

if __name__ == "__main__":
    main()

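In the main function, we first record audio from the user and save it to a file. Then we convert the recorded speech to text using the Google Speech-to-Text API. Next, we generate a response from the transcribed text, and finally we convert that response to speech and play it back to the user. Note that main relies on two helpers that this tutorial leaves for you to implement: generate_response(text), which returns the reply as a string, and generate_audio(text, file_path), which writes the spoken reply to an audio file. Both must be defined in voice_assistant.py above main before the script will run.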
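
As a starting point, the two helpers could look like the sketch below, placed above main in voice_assistant.py. Everything here is an assumption rather than part of the original design: the keyword table, the fallback strings, and the build_say_command helper are invented for illustration, and speech synthesis is delegated to the macOS say command (chosen only because playback already relies on afplay). The --data-format value requests 16-bit little-endian PCM at 16000 Hz.

```python
import subprocess

# Hypothetical keyword -> reply table; replace it with your own commands.
RESPONSES = [
    ("hello", "Hello! How can I help you?"),
    ("your name", "I am a simple Python voice assistant."),
    ("goodbye", "Goodbye!"),
]
FALLBACK = "Sorry, I did not understand that."
NO_SPEECH = "I did not hear anything."


def generate_response(text):
    """Return a reply for the transcribed text using simple keyword matching."""
    normalized = text.lower().strip()
    if not normalized:
        return NO_SPEECH
    for keyword, reply in RESPONSES:
        if keyword in normalized:
            return reply
    return FALLBACK


def build_say_command(text, file_path, rate=16000):
    """Build the macOS `say` command that writes `text` to a WAV file."""
    return ["say", "-o", file_path, f"--data-format=LEI16@{rate}", text]


def generate_audio(text, file_path):
    """Synthesize `text` into `file_path` (macOS only, via the `say` command)."""
    subprocess.run(build_say_command(text, file_path), check=True)
```

The first matching keyword wins, so order the table from most to least specific. On other systems, generate_audio would need a different text-to-speech backend.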

Testing the Voice Assistant

To test our voice assistant, simply run the voice_assistant.py script from the terminal:

python voice_assistant.py

When the script prints “Recording audio...”, speak your command; it records for five seconds by default. It then prints the transcript and the response, and plays the response audio. You can modify the generate_response function to return appropriate responses for the user’s commands.
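
The flow of main can also be exercised without a microphone, speakers, or network access by passing the individual steps in as functions and substituting stubs. The sketch below is one way to do that; run_once is a hypothetical helper that follows the same order of steps as main, and the lambdas stand in for the real functions.

```python
def run_once(record, transcribe, respond, speak, play,
             audio_file="audio.wav", response_file="response.wav"):
    """Run one record -> transcribe -> respond -> speak -> play cycle."""
    record(audio_file)
    text = transcribe(audio_file)
    reply = respond(text)
    speak(reply, response_file)
    play(response_file)
    return text, reply


# Dry run with stubs: nothing is recorded, sent to the API, or played.
calls = []
text, reply = run_once(
    record=lambda path: calls.append(("record", path)),
    transcribe=lambda path: "hello",
    respond=lambda heard: "You said " + heard,
    speak=lambda words, path: calls.append(("speak", words, path)),
    play=lambda path: calls.append(("play", path)),
)
print(text, "->", reply)
print(calls)
```

Passing record_audio, transcribe_audio, generate_response, generate_audio, and play_audio instead of the stubs runs the real assistant once.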

Conclusion

In this tutorial, we have learned how to create a simple voice assistant using Python and the Google Speech API. We set up the Google Cloud Platform, recorded audio from the user’s microphone, transcribed the speech to text using the Google Speech-to-Text API, generated responses based on the transcribed text, and played the response audio back to the user.

With the Google Speech API and Python, you have the tools to build your own voice assistant that understands and responds to user commands, and that can be integrated into a wide range of applications as a more natural and intuitive interface.