I built this primarily to solve my own reading fatigue from jumping between HN, Lobsters, and Reddit, while keeping data completely within my own infrastructure.

Architecture & Storage Choices: The entire stack is built to live on Cloudflare’s free/low-cost tier to make self-hosting accessible.

Backend & API: Implemented using Hono on Workers.
Database & State: Cloudflare D1 for strict relational storage (sources, cron tracking, user state).
Vector Search: Cloudflare Vectorize managing the 768-dimension embeddings generated natively via @cf/baai/bge-base-en-v1.5.

Ingestion Details: For Lobsters specifically, the background cron job polls the .json endpoints rather than scraping raw HTML. For bootstrapping historical preferences, I provided endpoints to ingest past JSON or RSS activity exports so the Cosine Similarity calculation actually has a baseline vector profile to match against.

Current Constraints & Trade-offs:
Model Choice: I opted for bge-base-en-v1.5 because it executes compl...
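
To make the cosine similarity matching from the ingestion notes concrete, here is a minimal TypeScript sketch. It assumes the baseline profile is the element-wise mean of the embeddings of previously ingested items; the post does not say how the profile is actually built, and the names `cosineSimilarity`, `meanVector`, and `rankBySimilarity` are hypothetical rather than taken from the project. In the real stack the vectors would be the 768-dimension embeddings and the nearest-neighbour lookup would presumably go through Vectorize; short vectors are used here only to keep the example readable.

```typescript
// Illustrative helpers only; none of these names come from the project.
type Vec = number[];

interface Candidate {
  id: string;
  embedding: Vec;
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: Vec, b: Vec): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// ASSUMPTION: the baseline profile is the element-wise mean of the
// embeddings of items the user interacted with in the past.
function meanVector(vectors: Vec[]): Vec {
  if (vectors.length === 0) throw new Error("need at least one vector");
  const dim = vectors[0].length;
  const out: number[] = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    if (v.length !== dim) throw new Error("dimension mismatch");
    for (let i = 0; i < dim; i++) out[i] += v[i] / vectors.length;
  }
  return out;
}

// Sort candidates from most to least similar to the profile vector.
function rankBySimilarity(profile: Vec, candidates: Candidate[]): Candidate[] {
  return [...candidates].sort(
    (x, y) =>
      cosineSimilarity(profile, y.embedding) -
      cosineSimilarity(profile, x.embedding),
  );
}
```

Without imported history, `meanVector` has nothing to average, which is one way to read why the import endpoints are needed before matching is useful.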