For a while I've wanted to try the new AI voices for long-form narration, but everything I found required a subscription that my limited usage didn't justify. I came across the open Kokoro model [0], and the voices are very good -- good enough to listen to for hours without the fatigue I got from legacy, robotic TTS voices. The model has 82M parameters and is designed to run fast, but I still struggled to get reasonable generation times from CPU inference on my 12-core laptop. I figured a cloud-based GPU service would let me generate audiobooks fast enough to feed my own self-hosted library, and that the same pipeline could become a product other people could use.

I had two goals in building this: get some exposure to AI multi-agent coding workflows, and build a TTS product specifically targeting ebook-to-audiobook conversion. 99% of ebookaloud was written by DeepSeek v4 in OpenCode. I've used about 750 million tokens costing $12 in credits over the course of a month, and I'm very pleased with the resu...