How to Create a Speech Synthesis App with Python and Google Text-to-Speech API

Speech synthesis, also known as text-to-speech (TTS), is the process of converting written content into spoken words. It has countless applications, from voice assistants to audiobook production. In this tutorial, we will explore how to create a speech synthesis app using Python and the Google Text-to-Speech API.

Prerequisites

To follow along with this tutorial, you will need:

Python 3.6 or higher installed on your machine
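A Google Cloud Platform (GCP) account with billing enabled. This is needed only for the official Cloud Text-to-Speech API, which is free up to a monthly quota and billed beyond it; the gTTS library itself works without an account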
The gTTS library, which you can install by running pip install gTTS

Once you have these prerequisites, you’re ready to dive into building your speech synthesis app.

Setting up Google Cloud Platform

Before we can start using the Google Text-to-Speech API, we need to set up a project on the Google Cloud Platform console and enable the API. Here’s how:

Go to the Google Cloud Platform Console and sign in with your GCP account.
Create a new project by clicking the project name dropdown at the top of the page and selecting “New Project.” Give your project a unique name, and make note of the project ID.
Once your new project is created, select it from the project name dropdown.
Enable billing for your project by clicking the three-bar menu in the top-left corner, selecting “Billing,” and following the prompts to enable billing.
Enable the Google Text-to-Speech API by clicking the three-bar menu again, selecting “APIs & Services,” then “Library.” Search for “Text-to-Speech API,” click on it, and enable it for your project.
Create service account credentials by clicking the three-bar menu, selecting “APIs & Services,” then “Credentials.” Click “Create Credentials” and choose “Service Account.” Enter a name for your service account, select a role (we use “Project Owner” for simplicity in this tutorial, but it grants full access to the project, so choose a narrower role for anything beyond experimentation), and click “Continue.” Follow the prompts until you see your newly created service account.
Click on your newly created service account, then switch to the “Keys” tab. Click “Add Key,” choose “Create new key,” and select “JSON.” Save the downloaded JSON file somewhere safe and keep it out of version control, as we will need it later.

With these steps completed, you have set up the necessary resources on the Google Cloud Platform to use the Text-to-Speech API.
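Before moving on, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch using only the standard library; the function name check_service_account_key is hypothetical, and the list of required fields is an assumption based on what a downloaded key file normally contains.

```python
import json
from pathlib import Path

# Fields a downloaded service account key normally contains (assumed list).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return the project ID if the file looks like a service account key.

    Raises ValueError describing the first problem found.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("key file should contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"unexpected credential type: {data['type']!r}")
    return data["project_id"]
```

Calling check_service_account_key('path/to/your/service-account-key.json') should return the project ID you noted when creating the project.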

Installing Dependencies

Before we dive into writing code, let’s install the necessary dependencies. Open a terminal or command prompt and run the following command:

pip install gTTS

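This will install the gTTS library, whose name stands for “Google Text-to-Speech.” It is an unofficial library that provides a simple Python interface to the text-to-speech endpoint of Google Translate. It is separate from the Cloud Text-to-Speech API and does not use a GCP project or credentials.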
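To confirm that the installation worked before writing any code, a quick check along these lines may help. This is a small sketch using only the standard library; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# gTTS is imported under the lowercase name "gtts".
if not is_installed("gtts"):
    print("gTTS is not installed; run: pip install gTTS")
```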

Writing the Code

Now it’s time to write the code for our speech synthesis app. Create a new Python file and open it in your favorite text editor. We will start by importing the necessary libraries and pointing to the service account key.

from gtts import gTTS
import os

# Path to the service account key. gTTS does not read this variable; it is
# only used by Google's official Cloud Text-to-Speech client library.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/your/service-account-key.json'

In the code above, we import the gTTS library, which we installed earlier. We also import the os module, which we use to set the GOOGLE_APPLICATION_CREDENTIALS environment variable. gTTS takes no credentials (its constructor has no credentials parameter), so nothing else is required for authentication. The environment variable matters only if you later call the Cloud Text-to-Speech API through Google's official client library, which reads the key path from it.

Replace 'path/to/your/service-account-key.json' with the path to the JSON file you downloaded when creating the service account credentials. This file contains the authentication information for the Cloud Text-to-Speech API.

Now let’s define a function that takes a text input and uses gTTS to synthesize it into speech. Add the following code to your script:

def synthesize_text(text, language='en', output_file='output.mp3'):
    # Create the gTTS object for the given text and language
    tts = gTTS(text, lang=language)

    # Fetch the audio and save it as an MP3 file
    tts.save(output_file)

In this function, we first create a gTTS object by passing in the input text and the language code (defaults to 'en'). gTTS expects short language tags such as 'en' or 'es' rather than region-qualified codes like 'en-US'. This object represents the request to synthesize speech.

Next, we call the save() method on the gTTS object and pass in the output file name. This method sends the request to Google's servers and saves the returned MP3 audio to the specified file.

Now that we have defined the function to synthesize text into speech, let’s test it by calling the function with some sample text. Add the following code to the end of your script:

synthesize_text('Hello, world!', output_file='hello.mp3')

In this example, we pass the text 'Hello, world!' and specify the output file name as 'hello.mp3'. Feel free to change the text and output file name to suit your needs.
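Because the synthesis step depends on a network request, it may be worth validating the inputs first and handling failures explicitly. The sketch below is one way to do that, assuming gTTS is installed; validate_request and safe_synthesize are hypothetical helper names, while gTTSError is the exception class exported by the gtts package.

```python
from pathlib import Path


def validate_request(text, output_file):
    """Return (clean_text, output_path), or raise ValueError before any network call."""
    clean_text = text.strip()
    if not clean_text:
        raise ValueError("text must contain at least one non-whitespace character")
    output_path = Path(output_file)
    if output_path.suffix.lower() != ".mp3":
        raise ValueError("output_file should end in .mp3 (gTTS writes MP3 audio)")
    return clean_text, output_path


def safe_synthesize(text, output_file="output.mp3", language="en"):
    """Synthesize text with gTTS; return True on success, False on a gTTS failure."""
    clean_text, output_path = validate_request(text, output_file)
    # Imported here so the validation helper works even without gTTS installed.
    from gtts import gTTS, gTTSError

    try:
        gTTS(clean_text, lang=language).save(str(output_path))
    except gTTSError as error:
        # gTTS raises this when the request to Google's servers fails.
        print(f"Synthesis failed: {error}")
        return False
    return True
```

With this in place, safe_synthesize('Hello, world!', output_file='hello.mp3') behaves like the earlier call but reports a failed request instead of ending the script with a traceback.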

Running the App

To run the speech synthesis app, save the script file and open a terminal or command prompt in the same directory. Run the following command:

python your_script_file_name.py

Replace your_script_file_name.py with the name you gave to your script file.

If everything is set up correctly, you should see a new file named hello.mp3 (or whatever output file name you specified) in the same directory as your script file. Open the file, and you should hear the synthesized speech saying “Hello, world!”
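To avoid editing the script every time the text changes, the app can take its input from the command line. The following is a minimal sketch using argparse, assuming gTTS is installed; the option names and the build_parser and main functions are illustrative choices, not part of gTTS.

```python
import argparse


def build_parser():
    """Describe the command-line options of the app."""
    parser = argparse.ArgumentParser(
        description="Convert text to an MP3 file with gTTS."
    )
    parser.add_argument("text", help="text to speak")
    parser.add_argument("-o", "--output", default="output.mp3",
                        help="output MP3 file")
    parser.add_argument("-l", "--lang", default="en",
                        help="gTTS language code, e.g. en or es")
    return parser


def main(argv=None):
    """Parse the arguments, synthesize the text, and save the audio."""
    args = build_parser().parse_args(argv)
    # Imported late so that --help works even if gTTS is missing.
    from gtts import gTTS

    gTTS(args.text, lang=args.lang).save(args.output)
    print(f"Saved {args.output}")
```

Call main() at the bottom of the script, then run it as python your_script_file_name.py "Hello, world!" -o hello.mp3. gTTS also installs its own gtts-cli command, which covers similar ground if a custom script is not needed.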

Congratulations! You have successfully created a speech synthesis app using Python and the Google Text-to-Speech API.

Advanced Usage

The basic usage we covered so far is just the beginning. The gTTS library provides several additional features and options for more advanced usage. Here are a few examples:

Modifying Speech Parameters

gTTS exposes only one speech parameter: the slow flag, which switches to a slower speaking rate. It has no arguments for pitch, volume, or an arbitrary speed multiplier. For example:

tts = gTTS('Hello!', lang='en', slow=True)

In this example, the text is read at the slower rate. Finer control over rate, pitch, and volume is a feature of the Cloud Text-to-Speech API (its speaking_rate, pitch, and volume_gain_db audio settings), not of gTTS.
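For finer control over rate, pitch, and volume, Google's official Cloud Text-to-Speech client library accepts these as audio settings. The following is a minimal sketch, assuming the google-cloud-texttospeech package is installed (pip install google-cloud-texttospeech) and GOOGLE_APPLICATION_CREDENTIALS points to the service account key; the function names build_audio_settings and synthesize_with_cloud_api are hypothetical.

```python
def build_audio_settings(speaking_rate=None, pitch=None, volume_gain_db=None):
    """Collect only the audio settings the caller actually supplied."""
    settings = {
        "speaking_rate": speaking_rate,
        "pitch": pitch,
        "volume_gain_db": volume_gain_db,
    }
    return {name: value for name, value in settings.items() if value is not None}


def synthesize_with_cloud_api(text, output_file="output.mp3",
                              language_code="en-US", **audio_settings):
    """Synthesize text through the Cloud Text-to-Speech API and save an MP3."""
    # Imported here because this package is separate from gTTS.
    from google.cloud import texttospeech

    # The client picks up the key from GOOGLE_APPLICATION_CREDENTIALS.
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(language_code=language_code),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
            **build_audio_settings(**audio_settings),
        ),
    )
    with open(output_file, "wb") as audio_file:
        audio_file.write(response.audio_content)
```

A call such as synthesize_with_cloud_api('Hello!', speaking_rate=1.5, pitch=2.0) would request faster, higher-pitched speech. Unlike gTTS, this path uses the GCP project and service account set up earlier, and usage counts against the project's quota.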

Combining Multiple Text Inputs

You can combine multiple text inputs into a single synthesized speech output by concatenating the strings with the + operator before passing them to gTTS. For example:

text = 'Hello!' + ' How are you?'
tts = gTTS(text, lang='en')

In this example, we concatenate the strings 'Hello!' and ' How are you?' (note the leading space) to create the input text 'Hello! How are you?'.
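When the fragments come from a list, joining them with normalized spacing avoids having to remember leading or trailing spaces. The following is a small sketch, assuming gTTS is installed for the second function; join_fragments and speak_fragments are hypothetical helper names.

```python
def join_fragments(fragments):
    """Join text fragments with single spaces, skipping empty ones."""
    cleaned = (" ".join(fragment.split()) for fragment in fragments)
    return " ".join(piece for piece in cleaned if piece)


def speak_fragments(fragments, output_file="output.mp3", language="en"):
    """Synthesize all fragments as one utterance and save it as an MP3."""
    # Imported here so join_fragments can be used without gTTS installed.
    from gtts import gTTS

    gTTS(join_fragments(fragments), lang=language).save(output_file)
```

For example, speak_fragments(['Hello!', 'How are you?']) produces the same input text as the concatenation shown above.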

Language Support

gTTS supports a wide range of languages. You can change the language by setting the lang parameter to a language tag such as 'en' for English, 'es' for Spanish, or 'zh-CN' for Mandarin. Regional accents are selected with the separate tld parameter (for example, tld='co.uk' for British English) rather than with region-qualified codes like 'en-US'.

tts = gTTS('¡Hola!', lang='es')

In this example, we set the language code to 'es', which represents Spanish.
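Regional accents in gTTS are chosen by combining the lang parameter with the tld parameter, which selects the Google Translate domain to query. The sketch below wraps a few such pairs in a lookup table. The accent names are made up for this example, and the lang and tld values are assumptions based on the localized accents listed in the gTTS documentation, so verify them there before relying on them.

```python
# Illustrative accent names mapped to (lang, tld) pairs; values are assumptions
# based on the gTTS documentation and may change with Google's services.
ACCENTS = {
    "english-us": ("en", "com"),
    "english-uk": ("en", "co.uk"),
    "english-au": ("en", "com.au"),
    "english-in": ("en", "co.in"),
    "spanish-es": ("es", "es"),
    "spanish-mx": ("es", "com.mx"),
}


def accent_options(name):
    """Return the keyword arguments for gTTS that select the named accent."""
    try:
        lang, tld = ACCENTS[name]
    except KeyError:
        known = ", ".join(sorted(ACCENTS))
        raise ValueError(f"unknown accent {name!r}; choose one of: {known}") from None
    return {"lang": lang, "tld": tld}
```

The result can be unpacked straight into the constructor, for example gTTS('Hello!', **accent_options('english-uk')).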

Conclusion

In this tutorial, you learned how to create a speech synthesis app using Python and the Google Text-to-Speech API. We covered the necessary setup steps on the Google Cloud Platform, installing the required dependencies, writing the code to interact with the API, and running the app to synthesize speech from text. We also explored some advanced usage options provided by the gTTS library.

Speech synthesis opens up a world of possibilities for applications such as accessibility, voice assistants, and content creation. With Python and the Google Text-to-Speech API, you have the tools to bring your ideas to life. Happy coding!