The goal of text2speech is to harmonize various text-to-speech engines, including Amazon Polly, Coqui TTS, Google Cloud Text-to-Speech API, and Microsoft Cognitive Services Text to Speech REST API.
With the exception of Coqui TTS, all these engines are accessible as R packages:
- aws.polly is a client for Amazon Polly
- googleLanguageR is a client for the Google Cloud Text-to-Speech API
- conrad is a client for the Microsoft Cognitive Services Text to Speech REST API
text2speech directly incorporates the functionality of Coqui TTS.
Installation
You can install the released version from CRAN or the development version from GitHub:
# Install from CRAN
install.packages("text2speech")
# or the development version from GitHub
# install.packages("devtools")
devtools::install_github("jhudsl/text2speech")
Authentication
Use tts_auth() to check whether each service is authenticated. Any service that is not yet authenticated must be configured individually.
library(text2speech)
# Amazon Polly
tts_auth("amazon")
#> [1] TRUE
# Coqui TTS
tts_auth("coqui")
#> [1] TRUE
# Google Cloud Text-to-Speech API
tts_auth("google")
#> [1] TRUE
# Microsoft Cognitive Services Text to Speech REST API
# tts_auth("microsoft")
Voices
List different voice options for each service.
# Amazon Polly
voices_amazon <- tts_amazon_voices()
head(voices_amazon)
#> voice language language_code gender service
#> 1 Zeina Arabic arb Female amazon
#> 2 Zhiyu Chinese Mandarin cmn-CN Female amazon
#> 3 Naja Danish da-DK Female amazon
#> 4 Mads Danish da-DK Male amazon
#> 5 Ruben Dutch nl-NL Male amazon
#> 6 Lotte Dutch nl-NL Female amazon
# Coqui TTS
voices_coqui <- tts_coqui_voices()
#> ℹ Test out different voices on the CoquiTTS Demo (<https://huggingface.co/spaces/coqui/CoquiTTS>)
head(voices_coqui)
#> # A tibble: 6 × 4
#> language dataset model_name service
#> <chr> <chr> <chr> <chr>
#> 1 multilingual multi-dataset your_tts [already downloaded] coqui
#> 2 bg cv vits coqui
#> 3 cs cv vits coqui
#> 4 da cv vits coqui
#> 5 et cv vits coqui
#> 6 ga cv vits coqui
# Google Cloud Text-to-Speech API
voices_google <- tts_google_voices()
head(voices_google)
#> voice language language_code gender service
#> 1 af-ZA-Standard-A <NA> af-ZA FEMALE google
#> 2 ar-XA-Standard-D Arabic ar-XA FEMALE google
#> 3 ar-XA-Wavenet-D Arabic ar-XA FEMALE google
#> 4 ar-XA-Wavenet-C Arabic ar-XA MALE google
#> 5 ar-XA-Wavenet-A Arabic ar-XA FEMALE google
#> 6 ar-XA-Standard-B Arabic ar-XA MALE google
# Microsoft Cognitive Services Text to Speech REST API
# voices_microsoft <- tts_microsoft_voices()
# head(voices_microsoft)
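Because the Amazon and Google listings are ordinary data frames, a voice can be picked with base R subsetting. The following is a minimal sketch, not part of the package: it rebuilds the six Amazon Polly rows shown above by hand so that it runs without credentials, and it assumes that tts() forwards a voice argument to the underlying engine.

```r
# The first rows of the Amazon Polly listing shown above, rebuilt by hand so
# this sketch runs without credentials. In real use: tts_amazon_voices().
voices_amazon <- data.frame(
  voice         = c("Zeina", "Zhiyu", "Naja", "Mads", "Ruben", "Lotte"),
  language      = c("Arabic", "Chinese Mandarin", "Danish", "Danish", "Dutch", "Dutch"),
  language_code = c("arb", "cmn-CN", "da-DK", "da-DK", "nl-NL", "nl-NL"),
  gender        = c("Female", "Female", "Female", "Male", "Male", "Female"),
  service       = "amazon",
  stringsAsFactors = FALSE
)

# Keep only female Dutch voices and take the first match.
dutch_female <- subset(voices_amazon, language_code == "nl-NL" & gender == "Female")
chosen_voice <- dutch_female$voice[1]
chosen_voice
#> [1] "Lotte"

# Assumption: tts() passes `voice` through to the engine-specific function.
# tts("Hallo wereld!", service = "amazon", voice = chosen_voice)
```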
Convert text to speech
Synthesize speech with tts(text = "TEXT", service = "ENGINE"):
# Amazon Polly
tts("Hello world!", service = "amazon")
# Coqui TTS
tts("Hello world!", service = "coqui")
# Google Cloud Text-to-Speech API
tts("Hello world!", service = "google")
# Microsoft Cognitive Services Text to Speech REST API
tts("Hello world!", service = "microsoft")
The output is a standardized tibble with the following columns:
- index: Sequential identifier number
- original_text: The text input provided by the user
- text: In case original_text exceeds the character limit, text represents the outcome of splitting original_text. Otherwise, text remains the same as original_text.
- wav: Wave object (S4 class)
- file: File path to the audio file
- audio_type: The audio format, either mp3 or wav
- duration: The duration of the audio file
- service: The text-to-speech engine used
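Since every engine returns the same columns, downstream code can be written once and reused across services. The sketch below illustrates this with a hypothetical helper, summarise_tts(), which is not part of text2speech. It runs on a hand-made stand-in for a tts() result (placeholder values, wav column omitted) so that no credentials are needed, and it assumes duration is numeric.

```r
# Stand-in for a tts() result. All values are made-up placeholders, and the
# `wav` column is left out because it would hold Wave objects.
demo_result <- data.frame(
  index         = 1:2,
  original_text = c("Hello world!", "Goodbye world!"),
  text          = c("Hello world!", "Goodbye world!"),
  file          = c("hello.mp3", "goodbye.mp3"),
  audio_type    = "mp3",
  duration      = c(1.5, 2.0),
  service       = "amazon",
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of text2speech): condense a result into a
# one-row summary. Assumes `duration` is numeric.
summarise_tts <- function(res) {
  needed <- c("index", "audio_type", "duration", "service")
  stopifnot(all(needed %in% names(res)))
  data.frame(
    service        = paste(unique(res$service), collapse = ", "),
    audio_type     = paste(unique(res$audio_type), collapse = ", "),
    n_rows         = nrow(res),
    total_duration = sum(res$duration),
    stringsAsFactors = FALSE
  )
}

summary_row <- summarise_tts(demo_result)
summary_row
```

With a configured service, the same helper would be applied to a real result, for example summarise_tts(tts("Hello world!", service = "coqui")).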