

You can’t just write off capital expenditure though. The hardware, even for “efficient” MoE inference, is still very expensive to buy, house, run, and cool. Even assuming open-weight model serving at $0 R&D for the models themselves, high-prefill workloads don’t batch well with decode-heavy concurrency (or with other prefill-heavy jobs). The moment you do anything nontrivial, you start running into architectural problems that are very complicated to solve efficiently at scale.
Hardware that is useful for 5-10 years at most, plus development and support for the inference workflows, doesn’t leave a lot of margin on the table.
My gut, along with basically everything I read, suggests that most shops (even pure-inference ones) are not profitable and are still floating on loans or VC money.


Very cool project!
What was the motivation to make it cross-platform? On the Linux side I see immediate value, but I’m struggling to understand the use case for Windows/Mac.