Text-to-Speech (Speech Synthesis) — tts • text2speech

Convert text to speech using one of several engines: Amazon Polly, Coqui TTS, the Google Cloud Text-to-Speech API, or the Microsoft Cognitive Services Text to Speech REST API.

With the exception of Coqui TTS, all these engines are accessible as R packages:

aws.polly is a client for Amazon Polly.
googleLanguageR is a client for the Google Cloud Text-to-Speech API.
conrad is a client for the Microsoft Cognitive Services Text to Speech REST API.

Usage

tts(
  text,
  output_format = c("mp3", "wav"),
  service = c("amazon", "google", "microsoft", "coqui"),
  bind_audio = TRUE,
  ...
)

tts_amazon(
  text,
  output_format = c("mp3", "wav"),
  voice = "Joanna",
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

tts_google(
  text,
  output_format = c("mp3", "wav"),
  voice = "en-US-Standard-C",
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

tts_microsoft(
  text,
  output_format = c("mp3", "wav"),
  voice = NULL,
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

tts_coqui(
  text,
  exec_path,
  output_format = c("wav", "mp3"),
  model_name = "tacotron2-DDC_ph",
  vocoder_name = "ljspeech/univnet",
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

Arguments

text: A character vector of text to be spoken
output_format: Format of output files: "mp3" or "wav"
service: Service to use: "amazon", "google", "microsoft", or "coqui"
bind_audio: Should tts_bind_wav() be run after the audio has been created, so that the number of rows in the output matches the length of text?
...: Additional arguments
voice: Full voice name
save_local: Should the audio file be saved locally?
save_local_dest: Destination path for the output file when save_local is TRUE
exec_path: System path to Coqui TTS executable
model_name: (Coqui TTS only) Deep learning model used for text-to-speech conversion
vocoder_name: (Coqui TTS only) Voice coder used for speech coding and transmission
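
As a rough illustration of how these arguments fit together, the sketch below calls tts_google() with the documented defaults written out. It assumes the text2speech package is installed and that credentials for the Google service have already been configured. The run_example flag is a hypothetical guard added for this sketch (it is not part of the package), so the snippet does nothing until you enable it.

```r
# Hypothetical guard (not part of text2speech): set to TRUE once
# credentials for the chosen service are configured.
run_example <- FALSE

sentences <- c(
  "Hello, this is the first sentence.",
  "And this is the second one."
)

if (run_example && requireNamespace("text2speech", quietly = TRUE)) {
  res <- text2speech::tts_google(
    text = sentences,
    output_format = "mp3",
    voice = "en-US-Standard-C",  # default voice shown under Usage
    bind_audio = TRUE            # keep rows consistent with length(text)
  )
  # Inspect a few of the columns described under Value
  print(res[, c("index", "original_text", "audio_type", "duration")])
}
```

The same pattern should carry over to tts_amazon() and tts_microsoft(), since they share these arguments. tts_coqui() additionally needs exec_path pointing at a local Coqui TTS executable.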

Value

A standardized tibble featuring the following columns:

index : Sequential identifier number
original_text : The text input provided by the user
text : If original_text exceeds the character limit, text holds the pieces produced by splitting original_text. Otherwise, text is identical to original_text.
wav : Wave object (S4 class)
file : File path to the audio file
audio_type : The audio format, either mp3 or wav
duration : The duration of the audio file
service : The text-to-speech engine used
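
To make the relationship between original_text and text more concrete, here is a minimal sketch that works on a hand-made stand-in for the returned table. The data frame below is not real output: it carries only a subset of the documented columns, and its strings and durations are placeholder values chosen for illustration. It mimics the case where one long input was split into two pieces and a short input was left as is.

```r
# Placeholder data imitating part of the documented result structure.
# Values are invented for illustration; a real result also has
# index, wav, and file columns.
res <- data.frame(
  original_text = c("a long input", "a long input", "a short input"),
  text          = c("a long", "input", "a short input"),
  audio_type    = c("mp3", "mp3", "mp3"),
  duration      = c(2.5, 1.5, 1.0),
  service       = c("google", "google", "google"),
  stringsAsFactors = FALSE
)

# Rows whose text differs from original_text came from a split input
was_split <- res$text != res$original_text

# Total audio duration per original input, summing over split pieces
total <- aggregate(duration ~ original_text, data = res, FUN = sum)
print(total)
```

Summing per original_text is one way to recover a per-input duration when the pieces have not been bound back together.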