Moshi is the lowest latency conversational AI ever released.
On July 4, kyutai_labs introduced Moshi, the lowest latency conversational AI ever released. Moshi can perform small talk, explain various concepts and engage in roleplay using many emotions and speaking styles. In this video, watch Moshi talk like a pirate and in a spooky whisper!
Talk to Moshi here: https://moshi.chat/?queue_id=talktomoshi .
____________________________________
More Info:
According to Philipp Schmid, @_philschmid on X,
Moshi:
> Expresses and understands emotions and speaking styles, e.g. speaking with a “French accent”
> Listens and generates Audio/Speech
> Generates realistic, human-like speech in a variety of accents
> Supports 2 streams of audio to listen and speak at the same time
> Used joint pre-training on a mix of text and audio
> Used synthetic text data from Helium, a 7B LLM created by Kyutai
> Is fine-tuned on 100k “oral-style” synthetic conversations converted with TTS
> Learned its voice from synthetic data generated by a separate TTS model
> Achieves an end-to-end latency of 200 ms
> Has a smaller variant that runs on a MacBook or consumer-size GPU. 🤯
> Uses watermarking to detect AI-generated audio (WIP)
> Will be released open source!!!
____________________________________
All clips used for fair use commentary, criticism, and educational purposes. See Hosseinzadeh v. Klein, 276 F.Supp.3d 34 (S.D.N.Y. 2017); Equals Three, LLC v. Jukin Media, Inc., 139 F. Supp. 3d 1094 (C.D. Cal. 2015).
____________________________________
artificial intelligence, technology, AI, large language models, LLMs, interactive





