Thanks to Josh Phillips for hosting last week while I was out! I should have the recording up soon at https://hsv.ai/videos/
This week we will take a look at text-to-speech models in three different categories:
- Products that create audio from text in an offline mode
- APIs that can be integrated into a product
- Open source models that you can host
Each of these presents different challenges that we’ll cover, such as latency, realism, and hallucination. Here’s the list of products and models so far. If you don’t see your favorite in the list, let me know and we’ll check it out as well:
- Parler
- Coqui
- Bark
- OpenAI (6 models)
- BASE TTS (Amazon)
- MetaVoice
- MeloTTS
- ElevenLabs
- Facebook MMS
Also, a few of us went to the Huntsville AI and Machine Learning Technology Exchange and Expo last week, so we might do an overview of those topics if time permits.