Text-to-speech options?

friendly_ghost ( @friendly_ghost@beehaw.org ) · 2 years ago

Shareni ( @Shareni@programming.dev ) · 2 years ago

> It seems like most of the nice-sounding ones are proprietary.

That’s pretty standard. Most FOSS projects don’t have corporations feeding them hundreds of thousands of dollars. Even when they do, people still say GIMP is far worse than Photoshop. Blender is one of the rare complex projects that can compete with proprietary alternatives.

> And any ideas why this is an underdeveloped area for open source?

My best guess is that it’s really expensive and time-consuming. I’d be surprised if those really good proprietary models didn’t cost $100k+ just for training.

Lemongrab ( @Lemongrab@lemmy.one ) · 2 years ago

It’s underdeveloped because it isn’t flashy, even though it’s quite necessary. Accessibility in general is often neglected and rarely gets large-scale support, in OSS or otherwise.

Piper TTS has quality models. Here are 2 references using it with speech-dispatcher:

Hacked together: https://github.com/rhasspy/piper/discussions/328

Ready-made: https://github.com/Elleo/pied
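
If you just want to check that a Piper voice sounds OK before wiring it into speech-dispatcher, something like this rough Python sketch might help. It assumes a `piper` binary is on your PATH and that you have already downloaded a voice model; the model file name below is only an example, and the helper names are my own, not part of Piper.

```python
import shlex
import subprocess
from pathlib import Path


def build_piper_command(model_path: str, output_wav: str) -> list:
    """Return the argv for a Piper CLI call that writes a WAV file."""
    # --model and --output_file are the flags shown in Piper's README example.
    return ["piper", "--model", model_path, "--output_file", output_wav]


def speak_to_file(text: str, model_path: str, output_wav: str) -> Path:
    """Pipe `text` to Piper on stdin and return the path of the WAV file."""
    cmd = build_piper_command(model_path, output_wav)
    # Piper reads the text to synthesize from standard input.
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)
    return Path(output_wav)


if __name__ == "__main__":
    # Example model file name (an assumption): use whichever voice you downloaded.
    cmd = build_piper_command("en_US-lessac-medium.onnx", "hello.wav")
    # Only prints the command; call speak_to_file() to actually run Piper.
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to print the command and try it by hand in a shell first.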

ninpnin ( @ninpnin@sopuli.xyz ) · 2 years ago

AFAIK all of the state-of-the-art TTS models are openly available on huggingface.co or similar. However, I’m not sure if there are nice front ends/UIs for them.

h3ndrik ( @h3ndrik@feddit.de ) · 2 years ago

It’s been an underdeveloped topic for some time. espeak-ng is available on most distros and has some integrations that somewhat tie it into the desktop. There are more modern solutions that sound way better, for example Coqui’s XTTS v2, or maybe Piper, which is part of Home Assistant nowadays. If your language is English, you have quite a few more solutions to choose from. But it’s a mixed bag whether they sound nice, whether they are easy to install (that also depends on which Linux distro you use and whether it’s available as a package), and whether they tie into the rest of the system. I’m not an expert on this, but I’d also like to have TTS and STT available on my Linux desktop without putting too much effort into it.
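
As a baseline, espeak-ng can be scripted with very little effort. Here is a minimal Python sketch, assuming the `espeak-ng` binary from your distro’s package is installed; the function names are made up for illustration. It uses the `-v` (voice), `-s` (speed in words per minute) and `-w` (write to WAV file) options.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en", words_per_minute=175, wav_path=None):
    """Return the argv for an espeak-ng call."""
    # -v selects the voice/language, -s sets the speed in words per minute.
    cmd = ["espeak-ng", "-v", voice, "-s", str(words_per_minute)]
    if wav_path is not None:
        # -w writes the audio to a WAV file instead of playing it.
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd


def speak(text, **options):
    """Run espeak-ng, failing early if it is not installed."""
    if shutil.which("espeak-ng") is None:
        raise RuntimeError("espeak-ng not found on PATH")
    subprocess.run(build_espeak_command(text, **options), check=True)


if __name__ == "__main__":
    # Only prints the argument list; call speak() to hear the result.
    print(build_espeak_command("Hello from the Linux desktop", wav_path="hello.wav"))
```

It won’t sound as natural as the neural models, but it is handy for checking that audio output works at all before trying the heavier options.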

valvin ( @valvin@beehaw.org ) · 2 years ago

TTS with Coqui XTTS is fun to run with a known voice (a 10-second WAV file is enough). It requires some resources, but far less than STT like faster-whisper. I think the main issue is not running them but integrating them with the OS and other software.
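
For anyone who wants to try the voice cloning, here is a rough sketch using the Coqui `TTS` Python package, assuming it is installed and that the XTTS v2 model can be downloaded on first use. The helper names and the 10-second minimum are my own choices based on the number above, not a requirement of the library. The reference clip is checked with the standard `wave` module first, which also catches files Python cannot parse as WAV.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    # wave.open raises wave.Error if the file cannot be parsed as WAV.
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clone_and_speak(text, reference_wav, output_wav, language="en", min_seconds=10.0):
    """Synthesize `text` in the voice of `reference_wav` with XTTS v2."""
    duration = wav_duration_seconds(reference_wav)
    if duration < min_seconds:
        # The threshold is an assumption taken from the "10 seconds" rule of thumb.
        raise ValueError(f"reference clip is only {duration:.1f}s long")

    # Imported lazily so the duration check works without Coqui TTS installed.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=reference_wav,
        language=language,
        file_path=output_wav,
    )
    return output_wav
```

Calling `clone_and_speak("Hello there", "my_voice.wav", "out.wav")` should write the result to `out.wav`, where `my_voice.wav` is a placeholder for your own recording.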

mariah ( @mariah@feddit.rocks ) · 2 years ago

I tried TTS, but I get an error when trying to do TTS from a WAV file.

Paragone ( @Paragone@beehaw.org ) · 2 years ago

IF they have the horsepower to run it, I gather there is a reversal of Whisper, called WhisperSpeech or something like that, which uses an LLM to convert text to speech.

…

Here: found it for you.

https://github.com/collabora/WhisperSpeech

shortwavesurfer ( @shortwavesurfer@monero.town ) · 2 years ago

As a blind person, I think it’s mostly due to the fact that Linux is only on 4% of desktops. So not many blind people are using it, and therefore not many are demanding better software.