Currently, Facebook's Spark AR Studio is restrictive about supported audio formats: only M4A files with specific settings are accepted. This short tutorial shows how to convert artificially generated neural voices (in this case, an MP3 file produced by Amazon Polly) to the M4A format that Spark AR accepts. I'm using the free Audacity tool, which integrates the open-source FFmpeg library.

Spark AR has the following requirements on audio files:

Generating Audio using Text-to-Speech (mp3 / PCM)

Neither Amazon Polly nor the Microsoft Azure Text-to-Speech cognitive service can directly produce an M4A audio file. In its additional settings, Polly offers MP3, OGG, PCM and Speech Marks (Speech Marks are JSON metadata, not audio). Choose either PCM for uncompressed sound or go with MP3: MP3 goes up to a sample rate of 24000 Hz, while PCM is limited to 16000 Hz. The choice doesn't make much difference unless you plan many intermediate processing steps, because the final file has to be very small anyway to publish your finished project on Facebook or Instagram.

Configure Amazon Polly with a suitable file format (e.g., MP3), enter the text, and choose the neural speech engine.
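If you prefer to script these steps instead of clicking through the console and Audacity, the sketch below shows one possible approach. It builds a Polly `synthesize_speech` request with the neural engine via boto3 and checks the sample rate against what Polly allows per format (MP3 up to 24000 Hz, PCM 8000 or 16000 Hz). It then re-encodes the result to AAC in an M4A container by calling the `ffmpeg` command-line tool directly, which is a swapped-in alternative to the Audacity route described above. Assumptions: the helper names (`polly_request`, `synthesize`, `m4a_command`, `convert_to_m4a`) are hypothetical, the voice `Joanna` is just an example neural voice, and the mono 44100 Hz target is a guess at typical Spark AR settings, not a confirmed list of its requirements.

```python
import shutil
import subprocess

# Sample rates (as strings, the way the Polly API expects them) per output format.
POLLY_SAMPLE_RATES = {
    "mp3": {"8000", "16000", "22050", "24000"},
    "ogg_vorbis": {"8000", "16000", "22050", "24000"},
    "pcm": {"8000", "16000"},
}


def polly_request(text, voice_id="Joanna", output_format="mp3", sample_rate="24000"):
    """Build keyword arguments for boto3's Polly synthesize_speech call (neural engine)."""
    if output_format not in POLLY_SAMPLE_RATES:
        raise ValueError(f"unsupported output format: {output_format}")
    if sample_rate not in POLLY_SAMPLE_RATES[output_format]:
        raise ValueError(f"{output_format} does not support {sample_rate} Hz")
    return {
        "Engine": "neural",
        "OutputFormat": output_format,
        "SampleRate": sample_rate,
        "Text": text,
        "VoiceId": voice_id,  # example voice; pick any voice that supports the neural engine
    }


def synthesize(text, out_path, **kwargs):
    """Call Polly and save the returned audio stream (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helpers work without boto3 installed

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**polly_request(text, **kwargs))
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())


def m4a_command(src, dst, sample_rate=44100, channels=1):
    """Build an ffmpeg command that re-encodes src to AAC audio in an .m4a file.

    The mono / 44100 Hz defaults are assumed Spark AR-friendly settings.
    """
    return [
        "ffmpeg", "-y", "-i", src,
        "-ac", str(channels),     # number of output channels
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-c:a", "aac",            # encode with ffmpeg's built-in AAC encoder
        dst,
    ]


def convert_to_m4a(src, dst, **kwargs):
    """Run the ffmpeg conversion; raises if ffmpeg is missing or the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(m4a_command(src, dst, **kwargs), check=True)
```

For example, `synthesize("Hello from Spark AR!", "voice.mp3")` followed by `convert_to_m4a("voice.mp3", "voice.m4a")` would produce an M4A file, assuming AWS credentials and ffmpeg are available. Keeping the request and command builders separate from the calls that touch the network or the file system makes them easy to check before spending any Polly characters.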