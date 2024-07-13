
Open Source AI Talks Like a Human - In Real Time!
AmazingAI

Moshi is the lowest-latency conversational AI ever released.

On July 4, kyutai_labs introduced Moshi. It can make small talk, explain various concepts, and engage in roleplay using many emotions and speaking styles. In this video, watch Moshi talk like a pirate and in a spooky whisper!

Talk to Moshi here: https://moshi.chat/?queue_id=talktomoshi

More Info:


According to Philipp Schmid, @_philschmid on X,

Moshi:

> Expresses and understands emotions, e.g. it can speak with a “French accent”

> Listens and generates Audio/Speech

> Generates realistic, human-like speech in a variety of accents

> Supports 2 streams of audio to listen and speak at the same time

> Used joint pre-training on a mix of text and audio

> Used synthetic text data from Helium, a 7B LLM created by Kyutai

> Is fine-tuned on 100k “oral-style” synthetic conversations converted to audio with TTS

> Learned its voice from synthetic data generated by a separate TTS model

> Achieves an end-to-end latency of 200 ms

> Has a smaller variant that runs on a MacBook or consumer-size GPU. 🤯

> Uses watermarking to detect AI-generated audio (WIP)

> Will be released open source!!!


All clips used for fair use commentary, criticism, and educational purposes. See Hosseinzadeh v. Klein, 276 F.Supp.3d 34 (S.D.N.Y. 2017); Equals Three, LLC v. Jukin Media, Inc., 139 F. Supp. 3d 1094 (C.D. Cal. 2015).


artificial intelligence, technology, AI, large language models, LLMs, interactive

