Would you like to get your products more deeply integrated into your customers’ daily lives?
Of course. What business wouldn’t?
An important first step towards making that happen is to give your products the ability to interact with your customers on their own terms. And the easiest way to do that is through natural speech.
The Power of Spoken Language
Human beings have been speaking to each other since the dawn of time. Speech is our most natural form of communication — and one of the reasons why we’ve been so successful as a species.
So let’s dive into what it takes to give your applications and devices the ability to speak in a manner that’s natural and comfortable to your customers.
Recent advancements in artificial intelligence have made this super easy, so it’ll be quick.
Got 5 Minutes?
This short guide will walk you through converting written text into a spoken audio file using the Amazon Polly text-to-speech service.
Note: Amazon Polly only provides a one-way speech capability — converting written text into spoken audio (text-to-speech). If you want to be able to understand spoken audio as well, you’ll additionally need a speech-to-text service, like Amazon Lex.
An Easy On-Ramp for A.I.
This is a how-to guide intended for developers or tech-savvy business leaders looking for a proven entry point into A.I.-powered business systems.
The scripts we’ll be using are simple and easy to read — Amazon’s SDK has already done most of the heavy lifting for you.
So let’s get right to it…
What You’ll Need
Right off the bat, let’s get the initial requirements knocked out.
Download the source repository.
To start, let’s pull down the source files. (You’ll need a git client installed on your computer for this step.)
Note: If you prefer a different programming language, AWS provides SDKs for nearly every major language — and the scripts are very easy to port over.
Move to the directory you want to use for this demo and run the following commands in a terminal…
# Download source repository & install dependencies
git clone https://github.com/10xNation/amazon-polly-demo-php.git
Feel free to leave the terminal window open — you’ll need it soon.
Create an AWS account.
If you don’t already have an AWS account, go ahead and set one up.
Verify user permissions.
And if you aren’t using an administrator-level user account for AWS, you’ll need to make sure your account has full access to the Polly service (for example, through the AmazonPollyFullAccess managed policy).
Enter your credentials.
You’ll need to enter your API credentials into the script files. To do that, open speak_text.php and speak_ssml.php and edit the following section in both files…
'credentials' => [ // Change these to your respective AWS credentials
    'key' => 'XXXXXXXXXXXXXXXXXXXX',
    'secret' => 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
],
Replace XXXXXXXXXXXXXXXXXXXX with your user account’s “Access key ID” and XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX with your respective “secret.”
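Hard-coding keys is fine for a quick demo, but it is easy to commit them by accident. Here is a minimal sketch of one alternative, assuming you export your keys as the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The pollyConfigFromEnv helper, the AWS_REGION lookup, and the us-east-1 fallback are illustrative choices and are not part of the demo repository.

```php
<?php
// Hypothetical helper (not from the demo repository): builds the client
// configuration from environment variables instead of hard-coded strings.
function pollyConfigFromEnv(string $defaultRegion = 'us-east-1'): array
{
    $key = getenv('AWS_ACCESS_KEY_ID');
    $secret = getenv('AWS_SECRET_ACCESS_KEY');

    // Fail early with a readable message if either value is missing.
    if ($key === false || $key === '' || $secret === false || $secret === '') {
        throw new RuntimeException('Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first.');
    }

    // The region is optional here; fall back to an assumed default.
    $region = getenv('AWS_REGION');

    return [
        'version' => 'latest',
        'region' => ($region === false || $region === '') ? $defaultRegion : $region,
        'credentials' => [
            'key' => $key,
            'secret' => $secret,
        ],
    ];
}
```

The 'credentials' part of the returned array has the same shape as the section you just edited, so (assuming the scripts build their client from an array like this) it can stand in for the hard-coded values.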
Now that the formalities are out of the way, let’s get to the good stuff…
Plain Text to Speech
Go to your Polly Dashboard.
Once you’re signed in, select a language and voice then hit “Listen to speech” to test them out — there are close to 50 different voices.
Interacting with the API.
Open up the speak_text.php file and edit the text you want to speak…
// Change this to whatever text you want to convert to audio
'Text' => 'Hi! My name is Emma. Welcome to the Amazon Polly demo.',
Replace the sample sentence (Hi! My name is Emma. Welcome to the Amazon Polly demo.) with whatever text you want spoken.
You can also change the “VoiceId” if you want to use a different voice…
'VoiceId' => 'Emma'
Then to send your request to the API, simply run the following command (in the terminal you set up in What You’ll Need)…
And that should deposit an audio file called text.mp3 in the same directory. Play it.
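If you are curious what a script like this does under the hood, here is a rough sketch of the request-and-save step. It is not the repository's actual code: speakToFile is a hypothetical helper, and it assumes the client is an Aws\Polly\PollyClient from the AWS SDK for PHP, whose synthesizeSpeech() result carries the audio under the 'AudioStream' key.

```php
<?php
// Hypothetical helper (not from the demo repository): sends one request and
// writes the returned audio to disk. $client is expected to be an
// Aws\Polly\PollyClient, or any object exposing the same synthesizeSpeech()
// method, which keeps the function easy to test without network access.
function speakToFile($client, string $text, string $path, string $voiceId = 'Emma'): int
{
    $result = $client->synthesizeSpeech([
        'OutputFormat' => 'mp3',
        'Text' => $text,
        'TextType' => 'text',
        'VoiceId' => $voiceId,
    ]);

    // The audio comes back as a stream; read it fully into a string.
    $audio = $result->get('AudioStream')->getContents();

    $bytes = file_put_contents($path, $audio);
    if ($bytes === false) {
        throw new RuntimeException("Could not write audio to $path");
    }

    return $bytes; // number of bytes written
}

// Usage, assuming the SDK is installed via Composer and $config holds your
// region and credentials:
//   require 'vendor/autoload.php';
//   $client = new Aws\Polly\PollyClient($config);
//   speakToFile($client, 'Hi! My name is Emma.', 'text.mp3');
```

Passing the client in as an argument is a design choice for this sketch. It lets you swap in a stand-in object while experimenting, so you only hit the real API when you mean to.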
Easy enough. Let’s try it using a little Speech Synthesis Markup Language (SSML)…
SSML to Speech
Go back to your Polly Dashboard.
Still signed in? Select a language and voice then hit “Listen to speech” to test them out in SSML mode.
Interacting with the API.
Open up the speak_ssml.php file and edit the text you want to speak…
// Change this to whatever SSML you want to convert to audio
'Text' => '<speak>
    Hi! My name is Emma.
    Welcome to the Amazon Polly demo.
    Today is <say-as interpret-as="date">????0406</say-as>
</speak>',
Change the sample SSML above (Hi! My name is Emma. Welcome to the Amazon Polly demo. Today is <say-as interpret-as="date">????0406</say-as>) to your desired output.
If you’d like to dive deeper into the markup syntax, which I highly recommend, here is an SSML reference. Compared to plain text, SSML gives you more granular control over pronunciation, volume, and speech rate.
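To make the volume and speech rate controls concrete, here is a small sketch that wraps plain text in SSML using the prosody tag. The buildSsml helper is hypothetical and not part of the demo scripts. It escapes the input first, since characters like & and < would otherwise break the markup.

```php
<?php
// Hypothetical helper (not from the demo repository): wraps plain text in a
// <speak> document and sets speech rate and volume through <prosody>.
function buildSsml(string $text, string $rate = 'medium', string $volume = 'medium'): string
{
    // Escape &, <, > and quotes so arbitrary text cannot break the XML.
    $safeText = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $safeRate = htmlspecialchars($rate, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $safeVolume = htmlspecialchars($volume, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return '<speak>'
        . '<prosody rate="' . $safeRate . '" volume="' . $safeVolume . '">'
        . $safeText
        . '</prosody>'
        . '</speak>';
}

// Speak slowly and loudly; the ampersand is escaped automatically.
$ssml = buildSsml('Tom & Jerry say hi!', 'slow', 'loud');
```

The resulting string would go in the 'Text' field of the request. For the API to treat it as markup rather than literal text, the request also needs 'TextType' => 'ssml'.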
And as above, you can change the “VoiceId” if you want to use a different voice…
'VoiceId' => 'Emma'
And again, to send your request to the API, simply run the following command (in the terminal you set up in What You’ll Need)…
That should deposit an audio file called ssml.mp3 in the same directory. Play it.
Custom pronunciation lexicons give you the ability to control how the system pronounces words in your text and SSML.
Once again, go back to your Polly Dashboard.
Assuming you’re still signed in, click on the “Lexicons” link.
Upload the metals.pls file from the source code you downloaded in What You’ll Need. Here’s what it looks like…
<?xml version="1.0" encoding="UTF-8"?>
This lexicon tells the system to pronounce ‘Au,’ ‘Ag,’ and ‘Fe’ using their common names — assuming you activate the lexicon when making the API call. Make sure your lexicon and speech languages match as well.
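If you want to generate a lexicon of your own rather than write the XML by hand, the sketch below shows one way to do it. It is an illustration of the general PLS (Pronunciation Lexicon Specification) structure, not a copy of the repository's metals.pls. The buildAliasLexicon helper is hypothetical, and the en-GB default is an assumption chosen because Emma is a British English voice.

```php
<?php
// Escapes text for safe use inside XML elements and attributes.
function xmlEscape(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper (not from the demo repository): renders a PLS lexicon
// that maps each grapheme (the written form) to an alias (the words to say).
function buildAliasLexicon(array $aliases, string $lang = 'en-GB'): string
{
    $lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<lexicon version="1.0"',
        '    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"',
        '    alphabet="ipa" xml:lang="' . xmlEscape($lang) . '">',
    ];

    foreach ($aliases as $grapheme => $alias) {
        $lines[] = '  <lexeme>';
        $lines[] = '    <grapheme>' . xmlEscape((string) $grapheme) . '</grapheme>';
        $lines[] = '    <alias>' . xmlEscape((string) $alias) . '</alias>';
        $lines[] = '  </lexeme>';
    }

    $lines[] = '</lexicon>';

    return implode("\n", $lines) . "\n";
}

// Chemical symbols mapped to their common names.
$lexicon = buildAliasLexicon(['Au' => 'gold', 'Ag' => 'silver', 'Fe' => 'iron']);
```

You could save the result to a .pls file and upload it through the dashboard. The SDK's putLexicon call, which takes a 'Name' and the XML as 'Content', should work as well.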
You can test it by activating the lexicon on the Polly Dashboard…
Click on “Customize pronunciation,” then select your lexicon from the drop-down menu, enter Au, Ag, and Fe are metals. in the text field, and hit “Listen to speech.”
Note: Currently, you can apply up to five lexicons to any given chunk of text.
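Activating a lexicon through the API comes down to one extra request parameter, LexiconNames. Here is a minimal sketch that adds it and guards against the five-lexicon limit. The withLexicons helper is hypothetical, and the name 'metals' assumes that is what you called the lexicon when you uploaded it.

```php
<?php
// Hypothetical helper (not from the demo repository): adds lexicon names to
// a request parameter array and rejects more than five per request.
function withLexicons(array $params, array $lexiconNames): array
{
    // Drop duplicates and reindex so the list is a clean 0-based array.
    $names = array_values(array_unique($lexiconNames));

    if (count($names) > 5) {
        throw new InvalidArgumentException('At most five lexicons can be applied per request.');
    }

    // Only add the key when there is something to apply.
    if ($names !== []) {
        $params['LexiconNames'] = $names;
    }

    return $params;
}

// 'metals' is an assumed lexicon name; use whatever name you uploaded it under.
$params = withLexicons([
    'OutputFormat' => 'mp3',
    'Text' => 'Au, Ag, and Fe are metals.',
    'VoiceId' => 'Emma',
], ['metals']);
```

The resulting $params array is what you would hand to the client's synthesizeSpeech() call.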
You’ve built a text-to-speech engine that you can use for nearly anything — a mobile app, an IoT device, a chatbot — anything with access to a speaker.
And just a quick reminder: if you also integrate a speech-to-text engine like Amazon Lex alongside Polly, you can give your products and apps the full two-way power of conversation.
You can dig deeper into Amazon’s Polly API — including additional tutorials — in the developer documentation.