How to Create a Text-to-Speech App with Python and Google Cloud API

In this tutorial, we will build a text-to-speech application using Python and the Google Cloud Text-to-Speech API, which converts text into natural-sounding speech.

By the end of this tutorial, you will have a working app that converts any text you provide into an audio file you can play back.

Prerequisites

Before we begin, make sure you have the following:

Basic knowledge of Python programming.
A Google Cloud Platform (GCP) account.
Python 3.x installed on your machine.

Setting Up the Google Cloud Project

To get started, we need to create a new project in the Google Cloud Platform Console.

Go to the Google Cloud Platform Console and sign in with your Google account.
Click on the project drop-down and select “New Project”.
Give your project a name and click on the “Create” button to create the project.
Once the project is created, select it from the project drop-down.
Enable the Text-to-Speech API by going to the API Library and searching for “Text-to-Speech”. Click on the API and then click on the “Enable” button.
Next, we need to create a service account key. Go to the IAM & Admin > Service Accounts section and click on the “Create Service Account” button.
Give your service account a name and click on the “Create and Continue” button.
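If the role picker lists a “Text-to-Speech Admin” role, add it to the service account. Role names in the console change over time, so if you cannot find it, leave the role empty. Then click on the “Done” button.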
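Click on the three dots next to the newly created service account and open its key management page (labelled “Manage keys” at the time of writing; console labels may differ slightly). Add a new key, select “JSON” as the key type, and click on the “Create” button. This will download a JSON key file to your computer. Keep this file private, since anyone who has it can make API calls on behalf of your project.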
Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the downloaded service account key file. You can do this by executing the following command in your terminal:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json

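Make sure to replace /path/to/key.json with the actual path to the downloaded key file. The export command works in macOS and Linux shells and only lasts for the current terminal session. In the Windows Command Prompt, use set GOOGLE_APPLICATION_CREDENTIALS=C:\path\to\key.json instead.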
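A missing or mistyped credentials path is a common cause of authentication errors, so it can help to check the variable before making any API call. The following is a minimal sketch using only the standard library; the check_credentials helper is a hypothetical name introduced here for illustration and is not part of the Google Cloud library.

```python
import json
import os


def check_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the key file path if env_var points to a readable JSON file.

    Hypothetical helper for this tutorial; raises RuntimeError otherwise.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    try:
        # Only checks that the file parses as JSON, not that the key is valid.
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
    except (OSError, ValueError) as exc:
        raise RuntimeError(f"Could not read {path} as JSON: {exc}") from exc
    return path
```

Calling check_credentials() at the start of a script would surface a configuration problem early, with a clearer message than a failed API request.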

Installing the Required Libraries

Next, we need to install the required libraries to interact with the Google Cloud Text-to-Speech API.

Open your terminal and run the following command:

pip install google-cloud-texttospeech

This command will install the google-cloud-texttospeech library, which we will use to programmatically interface with the Text-to-Speech API.
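To confirm that the installation succeeded in the Python environment you intend to use, you can query the installed version. This is a small sketch that assumes Python 3.8 or later (for importlib.metadata); the installed_version helper is a hypothetical name used only for illustration.

```python
from importlib import metadata


def installed_version(package="google-cloud-texttospeech"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version number, or a hint if the package is missing.
print(installed_version() or "google-cloud-texttospeech is not installed")
```

If the hint is printed even though pip reported success, pip and python most likely belong to different environments.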

Creating the Python Text-to-Speech App

Now that we have set up the Google Cloud project and installed the required libraries, let’s start building our text-to-speech app.

First, create a new Python file called text_to_speech_app.py. Open the file in your favorite text editor or IDE.

Next, let’s import the required modules and create an instance of the Text-to-Speech client:

from google.cloud import texttospeech

# Create the Text-to-Speech client
text_to_speech_client = texttospeech.TextToSpeechClient()

The texttospeech module provides the necessary classes and methods to interact with the Google Cloud Text-to-Speech API. The above code creates an instance of the TextToSpeechClient class, which we will use to make API requests.
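The client can also report which voices are available, which is useful before choosing voice parameters. The sketch below assumes the client's list_voices method accepts a language_code argument and returns a response with a voices list whose items have a name attribute. The voice_names helper is a hypothetical name, and it takes the client as a parameter so that it can be tried without network access.

```python
def voice_names(client, language_code="en-US"):
    """Return the sorted names of the voices reported for a language.

    `client` is expected to be a texttospeech.TextToSpeechClient, or any
    object that offers a compatible list_voices method.
    """
    response = client.list_voices(language_code=language_code)
    return sorted(voice.name for voice in response.voices)
```

With the client created above, voice_names(text_to_speech_client) would return the names offered for en-US. One of those names can then be passed as the name field of VoiceSelectionParams to request a specific voice.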

Now, let’s define a function that takes a text input and converts it to speech:

def text_to_speech(text, output_file):
    # Set the input text
    synthesis_input = texttospeech.SynthesisInput(text=text)

    # Set the voice parameters
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
    )

    # Set the audio file format
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    # Perform the text-to-speech conversion
    response = text_to_speech_client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Write the response to the output file
    with open(output_file, "wb") as f:
        f.write(response.audio_content)

In the above code, we define the text_to_speech function that takes two parameters: the text to convert and the path to the output file.

Inside the function, we create an instance of the SynthesisInput class with the input text. We also set the voice parameters using the VoiceSelectionParams class and specify the language code and gender.

Next, we set the audio file format to MP3 using the AudioConfig class.

Finally, we call the synthesize_speech method on the TextToSpeechClient instance and pass in the synthesis input, voice, and audio configuration. The method returns a SynthesizeSpeechResponse object, which contains the synthesized audio content.

We then write the audio content to the output file.
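The API limits how much text a single request may contain. To my knowledge the documented limit is 5,000 bytes per request, but treat that figure as an assumption and check the current quota page. For longer texts, one option is to split the input into smaller pieces and synthesize each piece separately. The split_text helper below is a hypothetical sketch of that idea, not part of the library.

```python
def split_text(text, max_bytes=5000):
    """Split text into chunks whose UTF-8 size is at most max_bytes.

    Splits on whitespace, so runs of spaces and newlines collapse to single
    spaces. A single word longer than max_bytes raises ValueError.
    max_bytes defaults to an assumed API limit; adjust it as needed.
    """
    chunks = []
    current = ""
    for word in text.split():
        if len(word.encode("utf-8")) > max_bytes:
            raise ValueError("a single word exceeds max_bytes")
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            # The current chunk is full; start a new one with this word.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to text_to_speech with its own output file name, for example part_1.mp3, part_2.mp3, and so on. Splitting on whitespace can break a sentence in the middle, so splitting on sentence boundaries would likely sound more natural.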

Now, let’s add the main part of our script that interacts with the user:

if __name__ == "__main__":
    text = input("Enter the text to convert to speech: ")
    output_file = input("Enter the path for the output audio file: ")

    text_to_speech(text, output_file)

    print("Text-to-speech conversion successful!")

In the above code, we use the input function to get the text and output file path from the user. We then call the text_to_speech function with these inputs.

Finally, we print a success message to the console.
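The script passes user input straight to the API, so an empty text or a path in a missing directory only fails after the request has been made. A small validation step can catch these cases first. The validate_inputs helper below is a hypothetical sketch under the assumption that the output should always be an MP3 file, as in this tutorial.

```python
from pathlib import Path


def validate_inputs(text, output_file):
    """Return (clean_text, output_path) or raise ValueError for unusable input."""
    clean_text = text.strip()
    if not clean_text:
        raise ValueError("text must not be empty")
    output_path = Path(output_file.strip()).expanduser()
    if not output_path.name:
        raise ValueError("output file path must not be empty")
    if output_path.suffix.lower() != ".mp3":
        # The tutorial requests MP3 audio, so keep the extension consistent.
        output_path = output_path.with_suffix(".mp3")
    if not output_path.parent.is_dir():
        raise ValueError(f"directory does not exist: {output_path.parent}")
    return clean_text, output_path
```

In the main block, the two input values could be passed through validate_inputs before calling text_to_speech, with the returned path converted via str() if a plain string is preferred.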

Running the Text-to-Speech App

To run the text-to-speech app, open your terminal and navigate to the directory where you saved the text_to_speech_app.py file.

Execute the following command:

python text_to_speech_app.py

You will be prompted for the text and then for the output file path. Type each value and press Enter; the conversion starts after the second prompt.

The app will send a request to the Google Cloud Text-to-Speech API and save the synthesized audio to the specified output file.

Once the conversion is complete, you will see the success message on the console. If the request fails, Python raises an exception and the success message is not printed.

Congratulations! You have created a text-to-speech app using Python and the Google Cloud Text-to-Speech API.

Conclusion

In this tutorial, we set up a Google Cloud project, installed the google-cloud-texttospeech library, and wrote a Python script that converts text to speech and saves the result as an MP3 file.

From here, you can build your own text-to-speech applications and explore the other options the Google Cloud Text-to-Speech API provides, such as different voices, languages, and audio encodings.