Convert text to speech using various engines, including Amazon Polly, Coqui TTS, Google Cloud Text-to-Speech API, and Microsoft Cognitive Services Text to Speech REST API.
With the exception of Coqui TTS, all these engines are accessible through R packages:
- aws.polly is a client for Amazon Polly.
- googleLanguageR is a client for the Google Cloud Text-to-Speech API.
- conrad is a client for the Microsoft Cognitive Services Text to Speech REST API.
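Before choosing a service, it can help to check which of the client packages listed above are installed. The following is a minimal sketch; the `backends` and `available` names are made up for illustration, and Coqui TTS is left out because it is not an R package.

```r
# Client packages named above, keyed by the corresponding `service` value.
backends <- c(
  amazon    = "aws.polly",
  google    = "googleLanguageR",
  microsoft = "conrad"
)

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path.
available <- vapply(
  backends,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)

print(available)

# Assumption: the missing packages are available from your configured repository.
# install.packages(backends[!available])
```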
Usage
tts(
  text,
  output_format = c("mp3", "wav"),
  service = c("amazon", "google", "microsoft", "coqui"),
  bind_audio = TRUE,
  ...
)

tts_amazon(
  text,
  output_format = c("mp3", "wav"),
  voice = "Joanna",
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

tts_google(
  text,
  output_format = c("mp3", "wav"),
  voice = "en-US-Standard-C",
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

tts_microsoft(
  text,
  output_format = c("mp3", "wav"),
  voice = NULL,
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

tts_coqui(
  text,
  exec_path,
  output_format = c("wav", "mp3"),
  model_name = "tacotron2-DDC_ph",
  vocoder_name = "ljspeech/univnet",
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)
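As a rough illustration of the signatures above, the sketch below wraps `tts()` in a small helper. It assumes that these functions are exported by the text2speech package and that credentials for the chosen service are already configured; the helper name `synthesize_sentences` is hypothetical.

```r
# Hypothetical helper around tts(); the package name is an assumption.
# The call is only evaluated when the helper is invoked, so defining it
# does not require network access or credentials.
synthesize_sentences <- function(sentences, service = "amazon") {
  text2speech::tts(
    text = sentences,
    output_format = "mp3",
    service = service,
    bind_audio = TRUE
  )
}

# Not run: needs network access and valid credentials for the service.
# res <- synthesize_sentences(c("Hello world.", "This is a second sentence."))
# res$file
```

Calling a service-specific function such as `tts_amazon()` directly exposes the extra arguments shown in its signature, for example `voice` and `save_local`.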
Arguments
- text
A character vector of text to be spoken
- output_format
Format of output files: "mp3" or "wav"
- service
Service to use: "amazon", "google", "microsoft", or "coqui"
- bind_audio
Should tts_bind_wav() be run after the audio has been created, to ensure that the length of text and the number of rows are consistent?
- ...
Additional arguments
- voice
Full voice name
- save_local
Should the audio file be saved locally?
- save_local_dest
Destination where the output file will be saved when save_local is TRUE
- exec_path
System path to Coqui TTS executable
- model_name
(Coqui TTS only) Deep Learning model for Text-to-Speech Conversion
- vocoder_name
(Coqui TTS only) Voice coder used for speech coding and transmission
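Because `exec_path` has no default, a Coqui call needs the location of the executable. The sketch below assumes the Coqui command-line tool is installed under the name `tts` and is on the `PATH`, and that `tts_coqui()` is exported by the text2speech package; both are assumptions, and another program with the same name would also be picked up.

```r
# Sys.which() returns "" when the command is not found on the PATH.
# Assumption: the Coqui TTS command-line tool is called "tts".
coqui_path <- unname(Sys.which("tts"))

if (nzchar(coqui_path) && requireNamespace("text2speech", quietly = TRUE)) {
  # Model and vocoder are the defaults shown in the Usage section.
  res <- text2speech::tts_coqui(
    text = "Hello world.",
    exec_path = coqui_path,
    output_format = "wav",
    model_name = "tacotron2-DDC_ph",
    vocoder_name = "ljspeech/univnet"
  )
} else {
  message("Coqui TTS executable or text2speech package not found; skipping synthesis.")
}
```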
Value
A standardized tibble featuring the following columns:
- index: Sequential identifier number
- original_text: The text input provided by the user
- text: In case original_text exceeds the character limit, text represents the outcome of splitting original_text. Otherwise, text remains the same as original_text.
- wav: Wave object (S4 class)
- file: File path to the audio file
- audio_type: The audio format, either mp3 or wav
- duration: The duration of the audio file
- service: The text-to-speech engine used