We recently discussed an update to one of our beloved AI Robots, and today, we're building on that momentum by adding another sibling to the AI Robot family. Say hi to our new /text/speak Robot! Like the last AI Robot we covered, the /speech/transcribe Robot, this Robot's key functionality revolves around speech. But instead of producing text from speech, it lets you convert text to speech (TTS).
Since this is such an exciting addition, my co-worker Joseph and I decided we would both create a project with this Robot at the heart of the design and then create a blog covering our design process to show this Robot's versatility. So if you enjoy this blog, look out for Joseph's coming soon!
For my project, I'll create a screen reader that can be easily plugged into your website to convert an entire page worth of text into speech at the click of a button.
Before getting started, you need to set up a standard website folder. Go ahead and create the following files: index.html, index.js and, optionally, style.css. To avoid bloating this blog, we'll only be discussing the contents of the JS file and where it interfaces with our HTML file. Still, if you want to copy the entire contents of our other files, along with the loading animation GIF that is used later on in this showcase, you can view the following repository.
<button id="button_event" onclick="runScript()">Generate</button>
<div data-screenreaderLanguage="en-US">
  <p>Sample Text</p>
</div>
<script src="index.js"></script>
Getting our text
In this code, we use the Document.querySelectorAll() method to collect every element that carries the screenreaderLanguage data attribute; it returns an array-like NodeList. We use a data attribute so that, if we want to exclude any text from our screen reader output (such as text inside a <code> tag), we can close our div tag before that unwanted text and then open a new div tag with the same data attribute where we want our screen reader to continue processing text.
With that base information gathered, we still need to parse the readable text from our collected data. So first, we initialize an array variable to store our readable text, then use the forEach() method to loop through each element in our result array and save each element's innerText value.
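The collection step described above can be sketched roughly as follows. This is a minimal sketch rather than the project's exact code: the helper name collectReadableText and its doc parameter are assumptions, added so the logic can also be exercised outside a browser. In a page, you would pass in document.

```javascript
// Hypothetical helper: gathers the readable text of every element that
// carries the data-screenreaderLanguage attribute.
function collectReadableText(doc) {
  // querySelectorAll() returns a NodeList of all matching elements.
  const result = doc.querySelectorAll('[data-screenreaderLanguage]');

  // Array that will hold one string per matched element.
  const text = [];
  result.forEach((element) => {
    text.push(element.innerText);
  });
  return text;
}

// In the browser: const text = collectReadableText(document);
```

Each tagged div contributes one entry, so text inside untagged parts of the page never reaches the array.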
Closing this section off, we set a language variable so our /text/speak Robot knows our target language later in our program. To do this, we declare our language variable with our dataset's language value (en-US in the sample HTML above).
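One way to read the language is sketched below, under stated assumptions: the helper name readLanguage and the 'en-US' fallback are hypothetical and not taken from the project. One thing to watch out for is that browsers lowercase attribute names when parsing HTML, so the dataset key for data-screenreaderLanguage ends up as screenreaderlanguage. The sketch sidesteps this by using getAttribute(), which handles the casing itself in HTML documents.

```javascript
// Hypothetical helper: reads the target language from the first element
// that carries the data-screenreaderLanguage attribute.
function readLanguage(elements, fallback = 'en-US') {
  const first = elements[0];
  if (!first) return fallback; // no tagged element on the page

  // getAttribute() lowercases the name itself in HTML documents, so the
  // mixed-case spelling from the markup can be used as-is.
  return first.getAttribute('data-screenreaderLanguage') || fallback;
}

// In the browser:
// const language = readLanguage(
//   document.querySelectorAll('[data-screenreaderLanguage]')
// );
```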
Converting for Uppy
With our data in place, we need to make it suitable for our Robodog instance to handle.
This line creates a new text file object that contains our previous text array.
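As a rough illustration of that conversion, the sketch below wraps the collected strings in a File object. The helper name makeTextFile, the file name text.txt, and the choice to join the entries with newlines are all assumptions made for this example.

```javascript
// Hypothetical helper: wraps the collected strings in a plain-text File.
function makeTextFile(text, fileName = 'text.txt') {
  // Join with newlines so separate elements do not run into each other.
  const body = text.join('\n');
  return new File([body], fileName, { type: 'text/plain' });
}

// Example: const textFile = makeTextFile(['Sample Text']);
```

The MIME type is set explicitly so the uploaded file is clearly marked as plain text.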
To make users aware that processing is going on, we want a loading animation to play while our /text/speak Robot is working. To do this, we need some placeholder variables so we can replace our button element with a GIF and then, if our text processing is successful, replace that GIF element with an audio player.
Here we have declared three variables to store elements of our HTML document: the first variable, buttonEl, references an existing element, whereas the other two variables create new elements. Below that, we assign our replacement GIF element several attributes: a source GIF file from our working directory, its dimensions, and an ID to refer to later in our program.
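A minimal sketch of these declarations might look like the following. Only buttonEl and the button_event ID come from this post; the names gifEl and audioEl, the file name loading.gif, the 64-pixel dimensions, and the loading_gif ID are placeholders chosen for illustration.

```javascript
// Hypothetical helper: looks up the button and prepares its replacements.
function createPlaceholders(doc) {
  const buttonEl = doc.getElementById('button_event'); // existing element
  const gifEl = doc.createElement('img'); // shown while processing
  const audioEl = doc.createElement('audio'); // shown once speech is ready

  gifEl.src = 'loading.gif'; // assumed file name in the working directory
  gifEl.width = 64; // assumed dimensions
  gifEl.height = 64;
  gifEl.id = 'loading_gif'; // assumed ID, used to find the GIF later

  return { buttonEl, gifEl, audioEl };
}

// In the browser:
// const { buttonEl, gifEl, audioEl } = createPlaceholders(document);
```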
With those preliminary steps in place, we can now integrate the main functionality of our program. Let's dive into creating the "generate" button script.
To get started, declare a function with the same name as the function we referenced earlier in our HTML page.
With our function set up, we can use our first DOM manipulation method, parentNode.replaceChild(). This replaces our page's button with the new GIF element when our function is executed.
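The swap can be sketched as a small helper. The name swapElement is hypothetical, and gifEl stands in for whatever the GIF element variable is called in your program.

```javascript
// Hypothetical helper: swaps one element for another in the same spot.
function swapElement(oldEl, newEl) {
  // replaceChild(newChild, oldChild) is called on the shared parent.
  oldEl.parentNode.replaceChild(newEl, oldEl);
}

// Inside runScript(), the button would be swapped for the GIF like so:
// function runScript() {
//   swapElement(buttonEl, gifEl);
//   // speech generation follows here
// }
```

Note the argument order of replaceChild(): the new node comes first and the node being replaced comes second.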
Now that the GIF is in place, signifying that data is being processed, we can now use the Transloadit API to synthesize the speech for our program. To do this, we use the Robodog script that we imported into our HTML file.
In our new Robodog instance, there are two parameters we need to set up. The first is for our uploaded file, which we declare as the text file variable that we created earlier in our program. The other parameter is an object containing all other additional options.
In our object parameter, we need to set up a few options. First, waitForEncoding needs to be set to true so we can access our Assembly results later on in our program. Next, we need to insert our authentication key into the auth parameter. This key can be found under your Transloadit console's Credentials tab. Lastly, we need to insert our Template Steps: one for uploading and the other for the actual speech synthesis.
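As a sketch of how that options object could be laid out, the helper below assembles it from an auth key and a set of Steps. The helper name buildUploadOptions is hypothetical, and the layout reflects our understanding of the options accepted by Robodog.upload(files, options) rather than code copied from the project.

```javascript
// Hypothetical helper: builds the options object passed as the second
// argument of Robodog.upload(files, options).
function buildUploadOptions(authKey, steps) {
  return {
    // Wait until the Assembly has finished so its results are available.
    waitForEncoding: true,
    params: {
      auth: { key: authKey }, // from the Credentials tab of the console
      steps, // Assembly Instructions, one entry per Step
    },
  };
}

// Example (assumes Robodog is loaded on the page and textFile exists):
// const options = buildUploadOptions('YOUR_AUTH_KEY', steps);
// const resultPromise = Robodog.upload([textFile], options);
```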
A few parameters make up the speech Step. First, we must inform this Step that we want to use the text file we uploaded; this is accomplished with the "use" parameter. Next, we set the "robot" parameter to select our /text/speak Robot. The following parameter, "provider", lets us decide the backend API used for the speech synthesis. The available values are "gcp" and "aws". Each provider has a variety of different voices, but we'll just be sticking with the default. The final parameter we'll use, "target_language", tells our API in which language we want our text pronounced. This value comes from the language variable we set up at the beginning of our program.
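Put together, the two Steps might look something like the sketch below. The helper name buildSteps, the Step name speech, and the choice of "aws" as the provider default are assumptions made for this example; the parameter names follow the description above.

```javascript
// Hypothetical helper: returns the Steps for uploading the text file and
// turning it into speech.
function buildSteps(language, provider = 'aws') {
  return {
    ':original': {
      robot: '/upload/handle', // receives the uploaded text file
    },
    speech: {
      use: ':original', // feed the uploaded file into this Step
      robot: '/text/speak',
      provider, // 'aws' or 'gcp'
      target_language: language, // for example 'en-US'
    },
  };
}

// Example: const steps = buildSteps('en-US');
```

Leaving the voice unset keeps the provider's default voice, which matches the approach taken in this showcase.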
Finally, we call the then() method to return a promise containing our speech data so we can refer to it later on.
Using the result
With Robodog all set up, we can now store the resulting speech so we can later play it from the browser.
In our then() method, using the promise provided by Robodog, we can parse the Assembly's JSON results and save the result URL to a new variable. With this new variable, audio_url, we can use the same method for replacing elements that we used earlier in this showcase. Then, to get the audio player up and running, we set some attributes that provide the audio's source, tweak some control options, and instruct the audio to begin playing as soon as the element has loaded. With that all in place, a nifty screen reader has now been generated and integrated into your website!
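The result handling could be sketched along these lines. Both helper names are hypothetical, and the shape of the resolved value is an assumption: the sketch expects a flat results array whose entries carry a stepName and an ssl_url, so check the actual Assembly result in your console before relying on it.

```javascript
// Hypothetical helper. Assumption: the resolved bundle exposes a flat
// `results` array whose entries carry `stepName` and `ssl_url`.
function extractAudioUrl(bundle, stepName = 'speech') {
  const results = (bundle && bundle.results) || [];
  const hit = results.find((r) => r.stepName === stepName);
  if (!hit) throw new Error(`No result found for Step "${stepName}"`);
  return hit.ssl_url;
}

// Hypothetical helper: swaps the loading GIF for a ready-to-play player.
function showAudioPlayer(gifEl, audioEl, audioUrl) {
  gifEl.parentNode.replaceChild(audioEl, gifEl);
  audioEl.src = audioUrl; // where the generated speech lives
  audioEl.controls = true; // show the browser's playback controls
  audioEl.autoplay = true; // start playing once the element has loaded
}

// Example:
// resultPromise.then((bundle) => {
//   const audio_url = extractAudioUrl(bundle);
//   showAudioPlayer(gifEl, audioEl, audio_url);
// });
```

Keep in mind that some browsers restrict autoplay, so the controls are worth keeping visible as a fallback.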
Finally, we add a catch() method to handle any errors.
Any errors that may occur will be posted to the console, and our processing GIF will be swapped back for the original button element. This means if you were to momentarily lose a network connection, but then regain it, you could click the generate button for the speech generation to unfold again.
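The error path can be sketched as follows. The helper name handleFailure and its logger parameter are assumptions, the latter added so the behavior can be checked without a real console.

```javascript
// Hypothetical helper: logs the error and restores the original button.
function handleFailure(error, gifEl, buttonEl, logger = console) {
  logger.error(error);

  // Only swap back if the GIF is actually on the page right now.
  if (gifEl.parentNode) {
    gifEl.parentNode.replaceChild(buttonEl, gifEl);
  }
}

// Example:
// resultPromise.catch((error) => handleFailure(error, gifEl, buttonEl));
```

Because the button is put back in place, the user can simply click it again to retry.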
And with that, this showcase comes to an end! We hope you agree it has further demonstrated the sheer versatility of our API. Of course, you can simply add the program we created to any website for a quick and easy screen reader – but don't just stop there! Instead, use this blog as a set of building blocks to expand your projects with, and be sure to let us know about the results :) Our /text/speak Robot is available to all our paying customers, so you may want to consider upgrading your account if this blog was of interest to you. Our first paid tier comes in at $49/mo with 10GB of encoding data!