The way I’ve worked around the need for a pre-recorded file is to create a class which calls the API, writes the returned stream to a WAV file on the server, then returns the location of that file so that the Teams API can play it. The Bing Speech API is pretty fast, so all of this can happen in real time without the user noticing.

Disclaimer: this is the point at which I should warn you that this is all sample code. It generates a new file on your server every time you call it, which may have an impact for you, so have a good think about whether this approach will work for you.
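The work-around described above (call the API, write the stream to a file, hand back its location) can be sketched roughly as follows. This is a minimal Python sketch rather than the C# class the article refers to. The names `save_prompt_audio`, `synthesize`, `audio_dir` and `public_base_url` are all hypothetical, and it assumes the audio directory is served by your web server at `public_base_url`.

```python
import uuid
from pathlib import Path
from typing import Callable


def save_prompt_audio(
    text: str,
    synthesize: Callable[[str], bytes],  # hypothetical: returns WAV bytes for the text
    audio_dir: Path,                     # assumption: a folder your web server exposes
    public_base_url: str,                # assumption: the URL that folder is served from
) -> str:
    """Synthesize text, store the audio as a new WAV file, return its URL."""
    audio = synthesize(text)

    # A riff-* output format should yield a RIFF/WAVE container; fail early if not.
    if audio[:4] != b"RIFF" or audio[8:12] != b"WAVE":
        raise ValueError("synthesis response is not a RIFF/WAVE file")

    audio_dir.mkdir(parents=True, exist_ok=True)

    # A fresh name per call: this is why files pile up on the server over time.
    file_name = f"{uuid.uuid4().hex}.wav"
    (audio_dir / file_name).write_bytes(audio)

    return f"{public_base_url.rstrip('/')}/{file_name}"
```

Because every call writes a new file, a real deployment would probably also need some clean-up job or reuse strategy; that part is left out here.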
![microsoft tts voices and streaming microsoft tts voices and streaming](https://miro.medium.com/max/1200/1*sxtNGKgRUvezCfxrvbyAww.png)
Beyond the 7-day trial, you can create a key in Azure under one of several pricing tiers. To get started, go to /en-us/try/cognitive-services. If you already have an Azure account and just want to jump straight to creating a key there, create a new Bing Speech resource, then go to Keys to copy your subscription key.

Once you have a subscription key, you have to exchange it for a Bearer token. You do this by POSTing to the token endpoint with your subscription key set in a header named Ocp-Apim-Subscription-Key:

Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

Once you have your Bearer token, you can POST to the speech synthesis endpoint, using the token as the Authorization header. The body of the POST message is Speech Synthesis Markup Language (SSML), an XML-based markup language for defining how TTS should be performed. The response from this POST is a stream representing a WAV audio file; exactly what type depends on headers you set on the request, for example:

X-Microsoft-OutputFormat: riff-16khz-16bit-mono-pcm

The tricky part here is that the Teams Calls & Meeting API will only accept a pre-recorded audio file, but we want to dynamically generate one.
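The token exchange and synthesis POST described above could look something like the following. It is a Python sketch using only the standard library, not the article's own code. The two endpoint URLs are deliberately left as parameters (`token_url`, `synthesize_url`) because they are not given here, the `voice_name` value has to come from the service's voice list, and the `Content-Type` value and the assumption that the token comes back as plain text in the response body are mine. The service may also require further headers not shown.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

OUTPUT_FORMAT = "riff-16khz-16bit-mono-pcm"  # the value quoted in the text above


def token_request_headers(subscription_key: str) -> dict:
    """Headers for the POST that swaps a subscription key for a Bearer token."""
    return {"Ocp-Apim-Subscription-Key": subscription_key}


def build_ssml(text: str, voice_name: str, lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document, escaping XML characters."""
    return (
        f'<speak version="1.0" xml:lang={quoteattr(lang)}>'
        f"<voice xml:lang={quoteattr(lang)} name={quoteattr(voice_name)}>"
        f"{escape(text)}"
        "</voice></speak>"
    )


def synthesis_request_headers(bearer_token: str) -> dict:
    """Headers for the synthesis POST (Content-Type value is an assumption)."""
    return {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": OUTPUT_FORMAT,
    }


def post(url: str, headers: dict, body: bytes = b"") -> bytes:
    """Send a POST and return the raw response body."""
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read()


def synthesize(text: str, voice_name: str, subscription_key: str,
               token_url: str, synthesize_url: str) -> bytes:
    """Exchange the key for a token, then POST SSML and return the audio bytes."""
    # Assumption: the token endpoint returns the token as plain text.
    token = post(token_url, token_request_headers(subscription_key)).decode("utf-8")
    ssml = build_ssml(text, voice_name).encode("utf-8")
    return post(synthesize_url, synthesis_request_headers(token), ssml)
```

Building the SSML with proper escaping matters as soon as the text to speak comes from user input, since a stray `&` or `<` would otherwise produce an invalid XML body.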
The Bing Speech API is part of Azure Cognitive Services and is free to try for 7 days with a trial key.
The newly released Calls & Meeting API includes the ability for bots to play media over audio to listening humans. The media must already exist, though: the input to the API is the file location of a previously recorded WAV file. This is done via the PlayPrompt call, either via a RESTful API call or via the C# wrapper. I’m going to concentrate on the C# wrapper, but most of what I’m going to say is equally applicable to the RESTful API call, as it takes the same input.

The PlayMediaPrompt takes (oddly) an array of media to play (probably because it’s been built from the JSON of the HTTP call, which has an array of prompts):

ICall call = ...;
await call.PlayPromptAsync(*A list of media prompts to play*).ConfigureAwait(false);

Each item in that list is a POCO containing a URI which is the location of the file to play.

This is all fine, but what if we want to do TTS and dynamically specify what we want to say? We can’t use pre-created audio files in this case.

The Microsoft Bing Speech API is a simple API that nicely abstracts the whole process of performing TTS. This is a really brief summary of how it works, so that when we go through the code you’ll know what’s happening.
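To make the "array of prompts" point concrete, here is a rough Python sketch of what the JSON body of the RESTful PlayPrompt call might look like. The field names (`clientContext`, `prompts`, `mediaInfo`, `uri`, and the `@odata.type` values) are assumptions modelled on Microsoft Graph's playPrompt action and may not match the API version this article was written against. The helper name `play_prompt_body` is hypothetical.

```python
import json
from typing import List


def play_prompt_body(audio_urls: List[str], client_context: str) -> str:
    """Build a JSON body with one media prompt per audio file URL.

    Field names are assumptions; check them against the API version you use.
    """
    prompts = [
        {
            "@odata.type": "#microsoft.graph.mediaPrompt",
            "mediaInfo": {
                "@odata.type": "#microsoft.graph.mediaInfo",
                "uri": url,  # location of the WAV file to play
            },
        }
        for url in audio_urls
    ]
    # Even a single file is wrapped in a list, mirroring the array in the API.
    return json.dumps({"clientContext": client_context, "prompts": prompts})
```

The point of the sketch is only the shape: the call takes a list, and each entry carries the URI of an audio file that must already exist.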
How to: perform Text-To-Speech (TTS) with a Microsoft Teams bot using Bing Speech API and Teams Calls & Meetings API

Introduction

Text-to-Speech (TTS) is the ability of a system to convert a string into an audio file. This can be very useful when working with audio-based bots, such as when creating IVR solutions or other automated workflows that involve a system ‘talking’ to a user. In Skype for Business, UCMA provided TTS capabilities, enabling a UCMA bot to dynamically ‘speak’ to a user. Today, the Calls & Meeting API for Microsoft Teams does not include the ability to perform TTS. However, using the abilities it does have, plus the Bing Speech API, we can recreate the same functionality, enabling us to create similarly rich solutions in Microsoft Teams.