
Show HN: Llm.sql – Run a 640MB LLM on SQLite, with 210MB peak RSS and 7.4 tok/s


Hi HN,

I built llm.sql, an LLM inference framework that reimagines the LLM execution pipeline as a series of structured SQL queries atop SQLite.

The motivation: edge LLMs are getting better, but hardware remains a bottleneck, especially RAM (both size and bandwidth). When available memory is smaller than the model weights plus KV cache, the OS incurs page faults and swaps pages using LRU-like strategies, causing throughput degradation that is hard to notice and even harder to debug. But the memory access pattern of LLM inference is deterministic: we know exactly which weights are needed and when. This means even Bélády's optimal page replacement algorithm is applicable here.

So instead of letting the OS manage memory, llm.sql takes over:

- Model parameters are stored in SQLite BLOB tables
- Computational logic is implemented as SQLite C extensions
- Memory management is handled explicitly, not by the OS
- Zero heavy dependencies: no PyTorch, no Transformers. Just Python, C, and C++

This gives ...
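To illustrate why a deterministic access pattern matters: Bélády's algorithm evicts the page whose next use lies farthest in the future, which is only computable when the full access sequence is known in advance, as it is for LLM weight reads. A minimal simulation sketch (names are illustrative, not from llm.sql):

```python
def belady_evict(cache, accesses, pos):
    """Pick the cached page whose next use is farthest in the future."""
    farthest, victim = -1, None
    for page in cache:
        try:
            nxt = accesses.index(page, pos)  # next future use of this page
        except ValueError:
            return page  # never used again: the perfect eviction victim
        if nxt > farthest:
            farthest, victim = nxt, page
    return victim

def simulate(accesses, capacity):
    """Count page faults under Bélády's optimal replacement."""
    cache, faults = set(), 0
    for i, page in enumerate(accesses):
        if page in cache:
            continue
        faults += 1
        if len(cache) >= capacity:
            cache.remove(belady_evict(cache, accesses, i + 1))
        cache.add(page)
    return faults
```

For a known weight-access trace, this gives a lower bound on faults that LRU generally cannot match.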
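For a rough picture of the BLOB-table idea (this is an illustrative sketch, not llm.sql's actual schema): weights live as raw bytes in a keyed table, and only the tensor needed for the current step is materialized in memory.

```python
import array
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weights (name TEXT PRIMARY KEY, data BLOB)")

# Store a hypothetical layer's weights as raw float32 bytes.
w = array.array("f", [0.1, -0.2, 0.3, 0.4])
conn.execute("INSERT INTO weights VALUES (?, ?)", ("layer0.attn.q", w.tobytes()))

# Load only the tensor needed right now; nothing else is resident.
(blob,) = conn.execute(
    "SELECT data FROM weights WHERE name = ?", ("layer0.attn.q",)
).fetchone()
loaded = array.array("f")
loaded.frombytes(blob)
```

Because each tensor is fetched by key on demand, peak RSS tracks the working set of the current layer rather than the full model file.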
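The post implements its compute kernels as SQLite C extensions; as a rough Python-level analogue (not the real mechanism), SQLite lets you register application-defined scalar functions that are callable from SQL, which is how computational logic can ride inside queries:

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Register a scalar function usable in SQL; a C extension would
# register kernels the same way via sqlite3_create_function().
conn.create_function("sigmoid", 1, lambda x: 1.0 / (1.0 + math.exp(-x)))

(val,) = conn.execute("SELECT sigmoid(0.0)").fetchone()  # -> 0.5
```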

HackerNews Show · AI · about 5 hours ago