
Convert text to speech using one of several engines: Amazon Polly, Coqui TTS, the Google Cloud Text-to-Speech API, or the Microsoft Cognitive Services Text to Speech REST API.

With the exception of Coqui TTS, each of these engines is accessed through an R client package:

  • aws.polly is a client for Amazon Polly.

  • googleLanguageR is a client for the Google Cloud Text-to-Speech API.

  • conrad is a client for the Microsoft Cognitive Services Text to Speech REST API.
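
Before choosing a service, it can help to check which of the client packages listed above are installed. The following is a minimal sketch using only base R; the `clients` vector is an illustrative name, not part of the package.

```r
# Client packages named in the list above, keyed by the matching
# `service` value accepted by tts().
clients <- c(
  amazon    = "aws.polly",
  google    = "googleLanguageR",
  microsoft = "conrad"
)

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path.
installed <- vapply(
  clients,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)

print(installed)
```

A `FALSE` entry means the corresponding client package would need to be installed before that service can be used.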

Usage

tts(
  text,
  output_format = c("mp3", "wav"),
  service = c("amazon", "google", "microsoft", "coqui"),
  bind_audio = TRUE,
  ...
)

tts_amazon(
  text,
  output_format = c("mp3", "wav"),
  voice = "Joanna",
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

tts_google(
  text,
  output_format = c("mp3", "wav"),
  voice = "en-US-Standard-C",
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

tts_microsoft(
  text,
  output_format = c("mp3", "wav"),
  voice = NULL,
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

tts_coqui(
  text,
  exec_path,
  output_format = c("wav", "mp3"),
  model_name = "tacotron2-DDC_ph",
  vocoder_name = "ljspeech/univnet",
  bind_audio = TRUE,
  save_local = FALSE,
  save_local_dest = NULL,
  ...
)

Arguments

text

A character vector of text to be spoken

output_format

Format of output files: "mp3" or "wav"

service

Service to use: "amazon", "google", "microsoft", or "coqui"

bind_audio

Should tts_bind_wav() be run after the audio has been created, to ensure that the length of text and the number of rows are consistent?

...

Additional arguments

voice

Full voice name, for example "Joanna" (Amazon) or "en-US-Standard-C" (Google)

save_local

Should the audio file be saved locally?

save_local_dest

If save_local is TRUE, the destination path where the output file will be saved

exec_path

System path to Coqui TTS executable

model_name

(Coqui TTS only) Deep Learning model for Text-to-Speech Conversion

vocoder_name

(Coqui TTS only) Voice coder used for speech coding and transmission
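
Because tts_coqui() needs exec_path, one way to supply it is to look the executable up on the PATH. This is a sketch under two assumptions that the text above does not state: that the Coqui TTS command-line tool is installed under the name "tts", and that the functions documented here are exported by the text2speech package. The model and vocoder names are simply the defaults shown in Usage.

```r
# Assumption: the Coqui TTS command-line tool is called "tts" and is on
# the PATH. Sys.which() returns "" when no such executable is found.
coqui_path <- unname(Sys.which("tts"))

# Assumption: tts_coqui() is provided by the text2speech package.
if (nzchar(coqui_path) && requireNamespace("text2speech", quietly = TRUE)) {
  res <- text2speech::tts_coqui(
    text          = "Hello from Coqui TTS.",
    exec_path     = coqui_path,
    output_format = "wav",
    model_name    = "tacotron2-DDC_ph",   # default from Usage
    vocoder_name  = "ljspeech/univnet"    # default from Usage
  )
  # Show a few of the columns described under Value.
  print(res[, c("index", "text", "file", "duration")])
} else {
  message("Coqui TTS executable or text2speech package not found; skipping.")
}
```

If the executable lives elsewhere, its full path can be passed to exec_path directly instead of relying on Sys.which().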

Value

A standardized tibble featuring the following columns:

  • index : Sequential identifier number

  • original_text : The text input provided by the user

  • text : If original_text exceeds the service's character limit, it is split into pieces and text holds one piece per row. Otherwise, text is identical to original_text.

  • wav : Wave object (S4 class)

  • file : File path to the audio file

  • audio_type : The audio format, either mp3 or wav

  • duration : The duration of the audio file

  • service : The text-to-speech engine used
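
To illustrate how the returned tibble might be used, here is a minimal sketch of a wrapper that synthesizes speech and keeps the descriptive columns listed above. The wrapper name `speak()` is hypothetical, and the sketch assumes that tts() is exported by the text2speech package and that credentials for the chosen service are already configured. The wrapper is only defined here, not called, since a real call needs network access and credentials.

```r
# Hypothetical helper: synthesize `text` and return the descriptive
# columns of the result, dropping the Wave objects in the `wav` column.
speak <- function(text, service = "google", output_format = "mp3") {
  # Assumption: tts() comes from the text2speech package.
  res <- text2speech::tts(
    text          = text,
    service       = service,
    output_format = output_format
  )
  res[, c("index", "original_text", "text", "file",
          "audio_type", "duration", "service")]
}

# Not run: requires credentials for the chosen service.
# out <- speak(c("Hello.", "This is a second sentence."))
# out$file   # paths to the generated audio files
```

Dropping the wav column keeps the printed result compact; the audio itself remains available through the paths in file.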